This patch allows supplying an optional memory space for the promoted buffer.
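For context, a minimal sketch of what this changes (shapes and the memory space value are made up for illustration): promotion allocates a local buffer, and with this patch that allocation can be placed in a caller-chosen memory space instead of the default one.

```mlir
// Default promotion: the buffer lives in the default memory space.
%buf0 = memref.alloc() : memref<32x32xf32>
// With an explicit memory space (here 3, e.g. GPU workgroup memory):
%buf1 = memref.alloc() : memref<32x32xf32, 3>
```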
Diff Detail
- Repository: rG LLVM Github Monorepo
Event Timeline
Hello @AviadCo, thanks for your contribution.
I have a question regarding your intended usage of this: do you have a pass pipeline in which this fits (and if so, is it possible to share it)?
The reason I am asking is that most passes in Linalg should be considered "test" passes, as we do not have automatic profitability heuristics upstream.
Instead we have been switching to implementing key functionality as transform dialect ops and apply / test them independently of a pass pipeline.
Separately, pass pipelines with proper heuristics can be built from these building blocks.
Here is a recent example of adding a new transform and a new test, in case it is useful: https://reviews.llvm.org/D153420.
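For concreteness, a minimal sketch of driving promotion through the transform dialect, independently of any pass pipeline (op names follow the upstream transform dialect at the time of writing; exact types and syntax may differ between revisions):

```mlir
transform.sequence failures(propagate) {
^bb0(%arg0: !transform.any_op):
  // Match the ops to transform, then apply promotion to them directly.
  %matmul = transform.structured.match ops{["linalg.matmul"]} in %arg0
      : (!transform.any_op) -> !transform.any_op
  %promoted = transform.structured.promote %matmul
      : (!transform.any_op) -> !transform.any_op
}
```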
You could also directly target a memory space attribute rather than using hardcoded constants that are backend-specific and can be confusing (see also: https://discourse.llvm.org/t/confused-by-inconsistencies-in-gpu-magic-constants/72041).
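To illustrate the difference (types are illustrative):

```mlir
// Backend-specific magic constant; its meaning depends on the target:
%a = memref.alloc() : memref<16xf32, 3>
// Named memory space attribute; the intent is explicit:
%b = memref.alloc() : memref<16xf32, #gpu.address_space<workgroup>>
```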
mlir/include/mlir/Dialect/Linalg/Passes.td
- Line 157: nit: dimensions

mlir/lib/Dialect/Linalg/Transforms/UpdateAddressSpace.cpp
- Line 147 (On Diff #555006): MemRefType::Builder(srcMemRefType).setAddressSpace(...) or something similar, please.
- Line 159 (On Diff #555006): you would likely be better off here with a linalg::CopyOp, as it can further be mapped to e.g. GPU thread ids and vector load/store using something like: https://reviews.llvm.org/D154836
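For reference, a minimal sketch of the suggested linalg::CopyOp-based copy into a promoted buffer (the shapes and the workgroup memory space are illustrative assumptions):

```mlir
// A linalg.copy into the promoted buffer; unlike a raw loop or memref.copy,
// it can later be mapped to GPU thread ids or vectorized.
linalg.copy ins(%src : memref<32x32xf32>)
            outs(%promoted : memref<32x32xf32, #gpu.address_space<workgroup>>)
```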
Actually, here is a contribution that seems related: https://reviews.llvm.org/D144666
Do you see a way to generalize and/or reuse the existing work?
@nicolasvasilache thanks for the feedback, I really appreciate it!
I will take a look at the transform dialect and the references you added here and see how I can generalize a solution.
Thanks,
Aviad Cohen
I really appreciate your references. I now see that I may use the Linalg transform::PromoteOp with a small modification to add memory space support.
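A hypothetical sketch of what that modification could look like (the `memory_space` attribute name and syntax here are assumptions for illustration, not the final upstream API):

```mlir
// Promote the matched op's operands into buffers allocated in workgroup
// memory (the `memory_space` attribute is assumed, not confirmed upstream).
%promoted = transform.structured.promote %matmul
    { memory_space = #gpu.address_space<workgroup> }
    : (!transform.any_op) -> !transform.any_op
```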
Unrelated to this patch: I am a bit confused about why the GPU dialect is so deeply coupled to this transform. IMO we should avoid that coupling. Moreover, I am not using the GPU dialect at all but still need to specify the memory space.
Can you please review this new patch?
@AviadCo thanks, this looks good to me.
As to the existing state, I suspect the person who added these patterns wanted to implement a simple heuristic to raise the level of control/automation in the transform, but ended up coupling the GPU dialect with it in the process.
I'd welcome a refactoring to decouple these concerns indeed.
The operations used for alloc and copy could also be made parametric to avoid hardcoding.
Thanks for reviewing so fast!
I actually have some ideas to improve the promotion a bit (e.g. not copying outputs before the linalg operation when there are no uses inside the linalg block, meaning the output is not an "in/out" operand), so I believe I will make further commits to PromoteOp.
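To illustrate the write-only-output case (a made-up minimal example):

```mlir
#id = affine_map<(d0) -> (d0)>
// The output block argument %o is never read in the body, so the output is
// write-only (not an "in/out" operand) and the pre-op copy into the
// promoted buffer could be skipped.
linalg.generic {indexing_maps = [#id, #id], iterator_types = ["parallel"]}
    ins(%in : memref<32xf32>) outs(%out : memref<32xf32>) {
^bb0(%i: f32, %o: f32):
  linalg.yield %i : f32
}
```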