Details

Reviewers

mravishankar
nicolasvasilache

Commits

rG4cd7ff6728f4: [mlir][linalg] Constant fold linalg.generic that are transposes

Summary

This commit adds a pattern to perform constant folding on linalg
generic ops which are essentially transposes. We see real cases
where model importers may generate such patterns.

Diff Detail

Repository

rG LLVM Github Monorepo

Build Status

Buildable 126825
Build 184168: arc lint + arc unit

Event Timeline

antiagainst created this revision. Sep 27 2021, 4:21 PM

Herald added a reviewer: mravishankar. Sep 27 2021, 4:21 PM

Herald added subscribers: wenzhicui, wrengr, Chia-hungDuan and 19 others.

antiagainst requested review of this revision. Sep 27 2021, 4:21 PM

Herald added a reviewer: nicolasvasilache. Sep 27 2021, 4:21 PM

Herald added a project: Restricted Project.

Herald added subscribers: limo1996, stephenneuendorffer, nicolasvasilache.

Harbormaster completed remote builds in B126000: Diff 375432. Sep 27 2021, 4:33 PM

mravishankar requested changes to this revision. Sep 28 2021, 11:30 AM

mravishankar added inline comments.

mlir/lib/Dialect/Linalg/IR/LinalgOps.cpp
934 ↗	(On Diff #375432)	Will review this, but I am not sure that this should go in as a general canonicalization pattern. This can increase your memory footprint by quite a lot: you could end up with the same constant both with and without the transpose.

This revision now requires changes to proceed. Sep 28 2021, 11:30 AM

mravishankar added inline comments. Sep 28 2021, 11:33 AM

mlir/lib/Dialect/Linalg/IR/LinalgOps.cpp
868 ↗	(On Diff #375432)	This is a heuristic. I would rather use a control function and allow the caller to decide when this should be done. Also, I would move this out of canonicalization and into the `ElementwiseOpsFusion` patterns, where all the patterns already use such a control function.

Address comments

mlir/lib/Dialect/Linalg/IR/LinalgOps.cpp
868 ↗	(On Diff #375432)	Done.
934 ↗	(On Diff #375432)	The logic checks that there is only one user, and the number of elements stays the same, so it should not increase or decrease the memory footprint at runtime. It can for the compiler, but that is because attributes are never released in MLIR. This is no longer relevant, though, now that the decision is deferred to the caller's policy.

Harbormaster completed remote builds in B126360: Diff 375920. Sep 29 2021, 9:53 AM

Thanks for modifications. Some more changes here will help land something really nice!

This has some of the pieces needed for the general implementation of not just transpose but all elementwise operations. All you need for the next step here is:

Use the affine map + linearization to get the index for each input at every point in the iteration space.
"Interpret" the body of the linalg op to get the final yield value.

The space of ops to interpret can be large, but I think the code here could be restructured to set those things up so that generalizing this is just a matter of adding interpretation for different operations. So we can start with just the transpose, and generalizing that would be straightforward.
Also note that this doesn't have to anchor on Generic ops. This could work for any LinalgOp. All LinalgOps have a region, so you could just use that.
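
For reference, the first step above (mapping each point of the iteration space to an index into each operand) can be sketched in plain C++ without any MLIR dependency. This is only an illustration under stated assumptions: every indexing map is a permutation, represented here as a vector of dimension positions, the body simply yields the input, and the names `delinearize`, `linearize`, `applyPermutation`, and `foldPermutationCopy` are hypothetical helpers, not functions from the patch or from MLIR.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using Index = std::vector<int64_t>;

// Splits a row-major linear index into one index per dimension of `bounds`.
Index delinearize(int64_t linear, const Index &bounds) {
  Index indices(bounds.size(), 0);
  for (std::size_t dim = bounds.size(); dim-- > 0;) {
    indices[dim] = linear % bounds[dim];
    linear /= bounds[dim];
  }
  return indices;
}

// Row-major linearization of `indices` within `shape`.
int64_t linearize(const Index &indices, const Index &shape) {
  assert(indices.size() == shape.size());
  int64_t linear = 0;
  for (std::size_t dim = 0; dim < shape.size(); ++dim)
    linear = linear * shape[dim] + indices[dim];
  return linear;
}

// Applies a permutation map to an iteration point: result[i] = iter[perm[i]].
Index applyPermutation(const Index &iter, const std::vector<unsigned> &perm) {
  Index result;
  result.reserve(perm.size());
  for (unsigned pos : perm)
    result.push_back(iter[pos]);
  return result;
}

// Copies every input element to the output position selected by the two
// permutation maps (assumption: the op body only yields its input).
std::vector<float> foldPermutationCopy(const std::vector<float> &input,
                                       const Index &inputShape,
                                       const Index &outputShape,
                                       const Index &loopBounds,
                                       const std::vector<unsigned> &inputMap,
                                       const std::vector<unsigned> &outputMap) {
  int64_t numElements = 1;
  for (int64_t bound : loopBounds)
    numElements *= bound;
  assert(static_cast<int64_t>(input.size()) == numElements);

  std::vector<float> output(input.size());
  for (int64_t linear = 0; linear < numElements; ++linear) {
    Index iter = delinearize(linear, loopBounds);
    Index src = applyPermutation(iter, inputMap);
    Index dst = applyPermutation(iter, outputMap);
    output[linearize(dst, outputShape)] = input[linearize(src, inputShape)];
  }
  return output;
}
```

With loop bounds {3, 2}, an input map of {1, 0}, and an identity output map, a 2x3 input holding 0..5 comes out in the order 0, 3, 1, 4, 2, 5, which matches the `transpose_fold_2d_fp32` test in the diff.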

mlir/lib/Dialect/Linalg/Transforms/ElementwiseOpFusion.cpp
1332	If I am reading this right, this is going to create a new `Attribute` for every element of the constant. That's probably going to blow up the compilation time. Can we create the container that the `DenseElementsAttr` can hold directly, instead of first creating an attribute for each value and then using that to create the `DenseElementsAttr` by individually extracting the value of each attribute?

This revision now requires changes to proceed. Sep 30 2021, 11:11 AM

Use APInt/APFloat instead of Attribute

Harbormaster completed remote builds in B126825: Diff 376889. Oct 4 2021, 7:18 AM

In D110597#3034284, @mravishankar wrote:

Thanks for modifications. Some more changes here will help land something really nice!

Do you have a concrete real-world pattern that would fit into this category? My approach is generally to avoid abstracting too early because YAGNI. :) Without concrete examples, it's also hard to see how much we can reuse here. The checks over input/output types, indexing maps, and region contents are likely all specific to the particular pattern. It looks like the index linearization and de-linearization might be reused? Not exactly sure how we should best modularize it though. Maybe some utility functions would be fine, or a function taking functors, or templates.

Anyway, it looks like we should defer this until we see another concrete real-world example.

This has some of the pieces needed for the general implementation of not just transpose but all elementwise operations. All you need for the next step here is:

Use the affine map + linearization to get the index for each input at every point in the iteration space.

"interpret" the body of the linalg op to get the final yield value.

Generally speaking, it would be nice to have a constexpr-like mode in MLIR. Effectively what we want is to constant fold the region content at compile time. Simply pattern matching the region content can become unwieldy quickly.

mlir/lib/Dialect/Linalg/Transforms/ElementwiseOpFusion.cpp
1332	Right. I've changed this to avoid creating Attributes all the time. Holding different element types in the same attribute is always hard to handle, but at least APInt/APFloat provides a common abstraction that avoids a full-blown switch over all possible int/float bitwidths.

In D110597#3039856, @antiagainst wrote:

In D110597#3034284, @mravishankar wrote:

Thanks for modifications. Some more changes here will help land something really nice!

Do you have a concrete real-world pattern that would fit into this category? My approach is generally to avoid abstracting too early because YAGNI. :) Without concrete examples, it's also hard to see how much we can reuse here. The checks over input/output types, indexing maps, and region contents are likely all specific to the particular pattern. It looks like the index linearization and de-linearization might be reused? Not exactly sure how we should best modularize it though. Maybe some utility functions would be fine, or a function taking functors, or templates.

I am not asking to generalize right now, but rather to restructure the code so that it can handle such cases (not sure how to answer "real world pattern". I can give you an example that is real world that is also something I made up). The current implementation is very transpose specific, and it will either have to be removed or reworked significantly even for slight variations. Here are some more concrete changes that might help:

For now check that the op has single input and a single output with permutation maps (this is how it is right now).
Instead of just checking for the body being a linalg.yield, just iterate over the body with a map from Value -> {int64_t/float}. Then you have a dispatch to "evaluate" each instruction to update this map. Finally you just return the value the yield is mapped to. Today this dispatch could have no statements, and it just returns the yield.

This is a minor restructuring that could be left as a placeholder for people if this needs to be extended AFAICS.
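
To make the suggested dispatch concrete, here is a minimal sketch of evaluating a region body with a map from values to numbers. It assumes a toy representation rather than MLIR: `OpKind`, `Instruction`, `Body`, and `evaluateBody` are hypothetical names, values are keyed by string name, and everything is computed as `double` instead of distinguishing int64_t from float. With no instructions in the body, the function just returns whatever the yield refers to, which is the transpose case.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-ins for IR constructs; these are not MLIR types.
enum class OpKind { AddF, MulF };

struct Instruction {
  OpKind kind;
  std::string lhs;    // name of the first operand value
  std::string rhs;    // name of the second operand value
  std::string result; // name of the value this instruction defines
};

struct Body {
  std::vector<Instruction> instructions; // empty for a pure transpose
  std::string yielded;                   // name of the value the yield returns
};

// Evaluates `body` for one iteration point. `values` starts out holding the
// block arguments and is extended as each instruction is "interpreted".
double evaluateBody(const Body &body, std::map<std::string, double> values) {
  assert(!body.yielded.empty());
  for (const Instruction &inst : body.instructions) {
    double lhs = values.at(inst.lhs);
    double rhs = values.at(inst.rhs);
    switch (inst.kind) {
    case OpKind::AddF:
      values[inst.result] = lhs + rhs;
      break;
    case OpKind::MulF:
      values[inst.result] = lhs * rhs;
      break;
    }
  }
  return values.at(body.yielded);
}
```

Supporting another operation in this sketch means adding one enumerator and one `case`, which is the kind of placeholder the comment above describes.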

Anyway, it looks we should defer this until we see another concrete real-world example.

This has some of the pieces needed for the general implementation of not just transpose but all elementwise operations. All you need for the next step here is:

Use the affine map + linearization to get the index for each input at every point in the iteration space.

"interpret" the body of the linalg op to get the final yield value.

Generally speaking, it would be nice to have a constexpr-like mode in MLIR. Effectively what we want is to constant fold the region content at compile time. Simply pattern matching the region content can become unwieldy quickly.

mlir/lib/Dialect/Linalg/Transforms/ElementwiseOpFusion.cpp
1335	Nit: Some comments on the reasoning here would be good as a reference, i.e. state that this is done to save compile time, etc.

Address comments - rewrite the pattern to have a common base

In D110597#3041216, @mravishankar wrote:

I am not asking to generalize right now, but rather to restructure the code so that it can handle such cases (not sure how to answer "real world pattern". I can give you an example that is real world that is also something I made up). The current implementation is very transpose specific, and it will either have to be removed or reworked significantly even for slight variations. Here are some more concrete changes that might help:

For now check that the op has single input and a single output with permutation maps (this is how it is right now).

Instead of just checking for the body being a linalg.yield, just iterate over the body with a map from Value -> {int64_t/float}. Then you have a dispatch to "evaluate" each instruction to update this map. Finally you just return the value the yield is mapped to. Today this dispatch could have no statements, and it just returns the yield.

This is a minor restructuring that could be left as a placeholder for people if this needs to be extended AFAICS.

Okay, I basically rewrote the patterns to extract a common base that can support N inputs, 1 output, and all permutation maps. Hopefully this abstraction is good for future cases.

Harbormaster completed remote builds in B127338: Diff 377600. Oct 6 2021, 11:08 AM

Thanks! We can iterate on this if we need to.

mlir/lib/Dialect/Linalg/Transforms/ElementwiseOpFusion.cpp
1329	nit: s/require all permutation maps for now./require all permutation maps for now to be permutations.
1343	I don't see what this check is really doing. The actual implementation below always returns true.

This revision is now accepted and ready to land. Oct 7 2021, 3:51 PM

antiagainst marked 2 inline comments as done. Oct 8 2021, 5:08 AM

antiagainst added inline comments.

mlir/lib/Dialect/Linalg/Transforms/ElementwiseOpFusion.cpp
1343	It's meant for subclasses to further check the indexing maps. For the only subclass we have, it checks `return genericOp.getIndexingMaps().size() == 2;`. That makes sure we have 1 input and 1 output. It can be false for cases where we have more inputs (which is allowed in the base class).
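
The base-class hook described in this reply can be illustrated with a small standalone sketch. It does not reproduce the committed code: `MockGenericOp`, `FoldConstantBaseSketch`, and `FoldConstantTransposeSketch` are hypothetical names, a virtual function is assumed as the extension mechanism, and each indexing map is reduced to a vector of dimension positions. Only the subclass check `getIndexingMaps().size() == 2` is taken from the reply above.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a linalg.generic op: each indexing map is
// represented only by the dimension positions of its results.
struct MockGenericOp {
  std::vector<std::vector<unsigned>> indexingMaps;
};

// Returns true if `map` uses every position in [0, map.size()) exactly once.
bool isPermutation(const std::vector<unsigned> &map) {
  std::vector<bool> seen(map.size(), false);
  for (unsigned pos : map) {
    if (pos >= map.size() || seen[pos])
      return false;
    seen[pos] = true;
  }
  return true;
}

// Common base: accepts any number of inputs as long as every indexing map is
// a permutation, then defers to the subclass for further restrictions.
class FoldConstantBaseSketch {
public:
  virtual ~FoldConstantBaseSketch() = default;

  bool matches(const MockGenericOp &op) const {
    for (const std::vector<unsigned> &map : op.indexingMaps)
      if (!isPermutation(map))
        return false;
    return matchIndexingMaps(op);
  }

protected:
  // Hook for subclasses to further check the indexing maps.
  virtual bool matchIndexingMaps(const MockGenericOp &op) const = 0;
};

// Transpose case: exactly one input map and one output map.
class FoldConstantTransposeSketch : public FoldConstantBaseSketch {
protected:
  bool matchIndexingMaps(const MockGenericOp &op) const override {
    return op.indexingMaps.size() == 2;
  }
};
```

In this sketch an op with three permutation maps passes the base-class check but is rejected by the transpose subclass, which is the situation where the hook returns false.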

Closed by commit rG4cd7ff6728f4: [mlir][linalg] Constant fold linalg.generic that are transposes (authored by antiagainst). Oct 8 2021, 5:12 AM

This revision was automatically updated to reflect the committed changes.

antiagainst marked an inline comment as done.

antiagainst added a commit: rG4cd7ff6728f4: [mlir][linalg] Constant fold linalg.generic that are transposes.

Diff 376889

mlir/lib/Dialect/Linalg/Transforms/ElementwiseOpFusion.cpp

[1,158 lines not shown]	struct FoldReshapeWithGenericOpByExpansion
}		}

private:		private:
ControlElementwiseOpsFusionFn controlFoldingReshapes;		ControlElementwiseOpsFusionFn controlFoldingReshapes;
};		};

/// Pattern to fold a generic op with a splat constant/scalar constant. Does not		/// Pattern to fold a generic op with a splat constant/scalar constant. Does not
/// handle cases where the constant is not single-valued.		/// handle cases where the constant is not single-valued.
class FoldConstants : public OpRewritePattern<GenericOp> {		class FoldScalarOrSplatConstant : public OpRewritePattern<GenericOp> {
public:		public:
FoldConstants(MLIRContext *context, ControlElementwiseOpsFusionFn &fun,		FoldScalarOrSplatConstant(MLIRContext *context,
		ControlElementwiseOpsFusionFn &fun,
PatternBenefit benefit = 1)		PatternBenefit benefit = 1)
: OpRewritePattern<GenericOp>(context, benefit), controlFn(fun) {}		: OpRewritePattern<GenericOp>(context, benefit), controlFn(fun) {}

LogicalResult matchAndRewrite(GenericOp genericOp,		LogicalResult matchAndRewrite(GenericOp genericOp,
PatternRewriter &rewriter) const override {		PatternRewriter &rewriter) const override {
if (!genericOp.hasTensorSemantics())		if (!genericOp.hasTensorSemantics())
return failure();		return failure();
for (OpOperand *opOperand : genericOp.getInputOperands()) {		for (OpOperand *opOperand : genericOp.getInputOperands()) {
Operation *def = opOperand->get().getDefiningOp();		Operation *def = opOperand->get().getDefiningOp();
[80 lines not shown]	for (OpOperand *opOperand : genericOp.getInputOperands()) {
return success();		return success();
}		}
return failure();		return failure();
}		}

private:		private:
ControlElementwiseOpsFusionFn controlFn;		ControlElementwiseOpsFusionFn controlFn;
};		};

		// Folds linalg.generic ops that are actually transposes on constant values.
		class FoldConstantTranspose : public OpRewritePattern<GenericOp> {
		public:
		FoldConstantTranspose(MLIRContext *context,
		const ControlElementwiseOpsFusionFn &controlFn,
		PatternBenefit benefit = 1)
		: OpRewritePattern<GenericOp>(context, benefit), controlFn(controlFn) {}

		LogicalResult matchAndRewrite(GenericOp genericOp,
		PatternRewriter &rewriter) const override {
		if (genericOp.hasBufferSemantics())
		return failure();

		// Transpose should just have one input and one output.
		if (genericOp.inputs().size() != 1 || genericOp.outputs().size() != 1)
		return failure();

		// All indexing maps should be permutations.
		if (!llvm::all_of(genericOp.getIndexingMaps(),
		[](AffineMap map) { return map.isPermutation(); }))
		return failure();

		Value input = genericOp.inputs().front();
		Value output = genericOp.getResult(0);
		auto inputType = input.getType().dyn_cast<ShapedType>();
		auto outputType = output.getType().dyn_cast<ShapedType>();
		if (!inputType || !outputType)
		return failure();
		// Query the element type only after the null check above.
		auto elementType = inputType.getElementType();
		if (!elementType.isIntOrFloat())
		return failure();
		if (!outputType.hasStaticShape())
		return failure();

		// Make sure the region only contains a yield op.
		Block &body = genericOp.region().front();
		if (!llvm::hasSingleElement(body))
		return failure();
		auto yieldOp = dyn_cast<linalg::YieldOp>(body.getTerminator());
		if (!yieldOp)
		return failure();

		// The yield op should return the block argument that corresponds to the input.
		for (Value yieldVal : yieldOp.values()) {
		auto yieldArg = yieldVal.dyn_cast<BlockArgument>();
		if (!yieldArg || yieldArg.getOwner() != &body)
		return failure();
		if (yieldArg.getArgNumber() != 0)
		return failure();
		}

		DenseIntOrFPElementsAttr inputValues;
		if (!matchPattern(input, m_Constant(&inputValues)))
		return failure();

		// Identified this as a potential candidate for folding. Now check the
		// policy to see whether we are allowed to proceed.
		OpOperand *consumer = genericOp.getInputOperand(0);
		OpResult producer = consumer->get().cast<OpResult>();
		if (!controlFn(producer, *consumer))
		return failure();

		auto linalgOp = cast<LinalgOp>(genericOp.getOperation());
		SmallVector<int64_t, 4> loopBounds = linalgOp.computeStaticLoopSizes();
		int64_t numElements = inputType.getNumElements();

		SmallVector<APInt> intOutputValues;
		SmallVector<APFloat> fpOutputValues;
		if (elementType.isa<FloatType>())
		fpOutputValues.resize(numElements, APFloat(0.f));
		else
		intOutputValues.resize(numElements);

		// Return the constant dim positions from the given permutation map.
		auto getDimPositions = [](AffineMap map) {
		SmallVector<unsigned> dims;
		dims.reserve(map.getNumResults());
		for (AffineExpr result : map.getResults()) {
		dims.push_back(result.cast<AffineDimExpr>().getPosition());
		}
		return dims;
		};

		auto inputDims = getDimPositions(genericOp.getIndexingMaps()[0]);
		auto outputDims = getDimPositions(genericOp.getIndexingMaps()[1]);
		auto outputShape = outputType.getShape();

		// Transpose the input constant. Because we don't know its rank in advance,
		// we need to loop over the range [0, element count) and delinearize the
		// index.
		for (int linearIndex0 = 0; linearIndex0 < numElements; ++linearIndex0) {
		SmallVector<uint64_t, 6> indices(loopBounds.size(), 0);
		int totalCount = linearIndex0;
		for (int dim = loopBounds.size() - 1; dim >= 0; --dim) {
		indices[dim] = totalCount % loopBounds[dim];
		totalCount /= loopBounds[dim];
		}

		SmallVector<uint64_t, 6> srcIndices(loopBounds.size(), 0);
		SmallVector<uint64_t, 6> dstIndices(loopBounds.size(), 0);
		for (int dim = loopBounds.size() - 1; dim >= 0; --dim) {
		srcIndices[dim] = indices[inputDims[dim]];
		dstIndices[dim] = indices[outputDims[dim]];
		}

		uint64_t linearIndex1 = dstIndices.front();
		for (int dim = 1; dim < outputType.getRank(); ++dim)
		linearIndex1 = linearIndex1 * outputShape[dim] + dstIndices[dim];

		if (elementType.isa<FloatType>()) {
		fpOutputValues[linearIndex1] =
		inputValues.getValue<APFloat>(srcIndices);
		} else {
		intOutputValues[linearIndex1] = inputValues.getValue<APInt>(srcIndices);
		}
		}

		DenseIntOrFPElementsAttr outputAttr;
		if (elementType.isa<FloatType>()) {
		outputAttr = DenseFPElementsAttr::get(outputType, fpOutputValues);
		} else {
		outputAttr = DenseIntElementsAttr::get(outputType, intOutputValues);
		}
		rewriter.replaceOpWithNewOp<ConstantOp>(genericOp, outputAttr);
		return success();
		}

		private:
		ControlElementwiseOpsFusionFn controlFn;
		};

} // namespace		} // namespace

static Optional<SmallVector<Value>>		static Optional<SmallVector<Value>>
fuseElementwiseOps(PatternRewriter &rewriter, OpOperand *consumerOpOperand,		fuseElementwiseOps(PatternRewriter &rewriter, OpOperand *consumerOpOperand,
GenericOp producer,		GenericOp producer,
const ControlElementwiseOpsFusionFn &controlFn) {		const ControlElementwiseOpsFusionFn &controlFn) {
if (producer->getNumResults() != 1)		if (producer->getNumResults() != 1)
return llvm::None;		return llvm::None;
[158 lines not shown]	patterns.add<FoldReshapeWithGenericOpByExpansion>(patterns.getContext(),
controlFoldingReshapes);		controlFoldingReshapes);
patterns.add<FoldWithProducerReshapeOpByExpansion>(patterns.getContext(),		patterns.add<FoldWithProducerReshapeOpByExpansion>(patterns.getContext(),
controlFoldingReshapes);		controlFoldingReshapes);
}		}

void mlir::linalg::populateElementwiseOpsFusionPatterns(		void mlir::linalg::populateElementwiseOpsFusionPatterns(
RewritePatternSet &patterns, LinalgElementwiseFusionOptions options) {		RewritePatternSet &patterns, LinalgElementwiseFusionOptions options) {
auto *context = patterns.getContext();		auto *context = patterns.getContext();
patterns.add<FuseElementwiseOps, FoldConstants>(		patterns.add<FuseElementwiseOps, FoldScalarOrSplatConstant,
context, options.controlElementwiseOpsFusionFn);		FoldConstantTranspose>(context,
		options.controlElementwiseOpsFusionFn);
patterns.add<RemoveOutsDependency>(context);		patterns.add<RemoveOutsDependency>(context);
populateFoldReshapeOpsByExpansionPatterns(patterns,		populateFoldReshapeOpsByExpansionPatterns(patterns,
options.controlFoldingReshapesFn);		options.controlFoldingReshapesFn);
AffineApplyOp::getCanonicalizationPatterns(patterns, context);		AffineApplyOp::getCanonicalizationPatterns(patterns, context);
GenericOp::getCanonicalizationPatterns(patterns, context);		GenericOp::getCanonicalizationPatterns(patterns, context);
TensorExpandShapeOp::getCanonicalizationPatterns(patterns, context);		TensorExpandShapeOp::getCanonicalizationPatterns(patterns, context);
TensorCollapseShapeOp::getCanonicalizationPatterns(patterns, context);		TensorCollapseShapeOp::getCanonicalizationPatterns(patterns, context);
context->getLoadedDialect<LinalgDialect>()->getCanonicalizationPatterns(		context->getLoadedDialect<LinalgDialect>()->getCanonicalizationPatterns(
[15 lines not shown]

mlir/test/Dialect/Linalg/fusion-elementwise-ops.mlir

[749 lines not shown]	func @fuse_scalar_constant(%arg0 : tensor<?x?xf32>) -> (tensor<?x?xf32>, tensor<?x?xi32>) {
%c1 = constant 1 : index		%c1 = constant 1 : index
%d0 = tensor.dim %arg0, %c0 : tensor<?x?xf32>		%d0 = tensor.dim %arg0, %c0 : tensor<?x?xf32>
%d1 = tensor.dim %arg0, %c1 : tensor<?x?xf32>		%d1 = tensor.dim %arg0, %c1 : tensor<?x?xf32>
%0 = linalg.init_tensor[%d0, %d1] : tensor<?x?xf32>		%0 = linalg.init_tensor[%d0, %d1] : tensor<?x?xf32>
%1 = linalg.init_tensor[%d0, %d1] : tensor<?x?xi32>		%1 = linalg.init_tensor[%d0, %d1] : tensor<?x?xi32>
%2:2 = linalg.generic {		%2:2 = linalg.generic {
indexing_maps = [affine_map<(d0, d1) -> (d0, d1)>,		indexing_maps = [affine_map<(d0, d1) -> (d0, d1)>,
affine_map<(d0, d1) -> ()>,		affine_map<(d0, d1) -> ()>,
affine_map<(d0, d1) -> ()>,		affine_map<(d0, d1) -> ()>,
affine_map<(d0, d1) -> (d0, d1)>,		affine_map<(d0, d1) -> (d0, d1)>,
affine_map<(d0, d1) -> (d0, d1)>],		affine_map<(d0, d1) -> (d0, d1)>],
iterator_types = ["parallel", "parallel"]}		iterator_types = ["parallel", "parallel"]}
ins(%arg0, %cst, %c42 : tensor<?x?xf32>, f32, i32)		ins(%arg0, %cst, %c42 : tensor<?x?xf32>, f32, i32)
outs(%0, %1 : tensor<?x?xf32>, tensor<?x?xi32>) {		outs(%0, %1 : tensor<?x?xf32>, tensor<?x?xi32>) {
^bb0(%arg1 : f32, %arg2 : f32, %arg3 : i32, %arg4 : f32, %arg5 : i32) :		^bb0(%arg1 : f32, %arg2 : f32, %arg3 : i32, %arg4 : f32, %arg5 : i32) :
%3 = addf %arg1, %arg2 : f32		%3 = addf %arg1, %arg2 : f32
linalg.yield %3, %arg3 : f32, i32		linalg.yield %3, %arg3 : f32, i32
} -> (tensor<?x?xf32>, tensor<?x?xi32>)		} -> (tensor<?x?xf32>, tensor<?x?xi32>)
return %2#0, %2#1 : tensor<?x?xf32>, tensor<?x?xi32>		return %2#0, %2#1 : tensor<?x?xf32>, tensor<?x?xi32>
}		}
// CHECK-LABEL: func @fuse_scalar_constant		// CHECK-LABEL: func @fuse_scalar_constant
// CHECK-DAG: %[[CST:.+]] = constant 4.000000e+00 : f32		// CHECK-DAG: %[[CST:.+]] = constant 4.000000e+00 : f32
// CHECK-DAG: %[[C42:.+]] = constant 42 : i32		// CHECK-DAG: %[[C42:.+]] = constant 42 : i32
// CHECK: linalg.generic		// CHECK: linalg.generic
// CHECK-SAME: ins(%{{.+}} : tensor<?x?xf32>)		// CHECK-SAME: ins(%{{.+}} : tensor<?x?xf32>)
// CHECK: %[[YIELD:.+]] = addf %{{.+}}, %[[CST]] : f32		// CHECK: %[[YIELD:.+]] = addf %{{.+}}, %[[CST]] : f32
// CHECK: linalg.yield %[[YIELD]], %[[C42]] : f32, i32		// CHECK: linalg.yield %[[YIELD]], %[[C42]] : f32, i32

		// -----

		// CHECK-LABEL: @transpose_fold_2d_fp32
		func @transpose_fold_2d_fp32(%init: tensor<3x2xf32>) -> tensor<3x2xf32> {
		%input = constant dense<[[0.0, 1.0, 2.0], [3.0, 4.0, 5.0]]> : tensor<2x3xf32>
		// CHECK: %[[CST:.+]] = constant
		// CHECK-SAME{LITERAL}: dense<[[0.000000e+00, 3.000000e+00], [1.000000e+00, 4.000000e+00], [2.000000e+00, 5.000000e+00]]> : tensor<3x2xf32>
		%1 = linalg.generic {
		indexing_maps = [affine_map<(d0, d1) -> (d1, d0)>, affine_map<(d0, d1) -> (d0, d1)>],
		iterator_types = ["parallel", "parallel"]
		} ins(%input : tensor<2x3xf32>) outs(%init : tensor<3x2xf32>) {
		^bb0(%arg1: f32, %arg2: f32):
		linalg.yield %arg1 : f32
		} -> tensor<3x2xf32>
		// CHECK: return %[[CST]]
		return %1 : tensor<3x2xf32>
		}

		// -----

		// CHECK-LABEL: @transpose_fold_2d_fp64
		func @transpose_fold_2d_fp64(%init: tensor<3x2xf64>) -> tensor<3x2xf64> {
		%input = constant dense<[[0.0, 1.0, 2.0], [3.0, 4.0, 5.0]]> : tensor<2x3xf64>
		// CHECK: %[[CST:.+]] = constant
		// CHECK-SAME{LITERAL}: dense<[[0.000000e+00, 3.000000e+00], [1.000000e+00, 4.000000e+00], [2.000000e+00, 5.000000e+00]]> : tensor<3x2xf64>
		%1 = linalg.generic {
		indexing_maps = [affine_map<(d0, d1) -> (d1, d0)>, affine_map<(d0, d1) -> (d0, d1)>],
		iterator_types = ["parallel", "parallel"]
		} ins(%input : tensor<2x3xf64>) outs(%init : tensor<3x2xf64>) {
		^bb0(%arg1: f64, %arg2: f64):
		linalg.yield %arg1 : f64
		} -> tensor<3x2xf64>
		// CHECK: return %[[CST]]
		return %1 : tensor<3x2xf64>
		}

		// -----

		// CHECK-LABEL: @transpose_fold_4d_i32
		func @transpose_fold_4d_i32(%init: tensor<3x1x4x2xi32>) -> tensor<3x1x4x2xi32> {
		%input = constant dense<[[
		[[ 0, 1, 2, 3], [ 4, 5, 6, 7], [ 8, 9, 10, 11]],
		[[12, 13, 14, 15], [16, 17, 18, 19], [20, 21, 22, 23]]
		]]> : tensor<1x2x3x4xi32>
		// CHECK: %[[CST:.+]] = constant dense<[
		// CHECK-SAME{LITERAL}: [[[0, 12], [1, 13], [2, 14], [3, 15]]],
		// CHECK-SAME{LITERAL}: [[[4, 16], [5, 17], [6, 18], [7, 19]]],
		// CHECK-SAME{LITERAL}: [[[8, 20], [9, 21], [10, 22], [11, 23]]]
		// CHECK-SAME{LITERAL}: ]>
		%1 = linalg.generic {
		indexing_maps = [affine_map<(d0, d1, d2, d3) -> (d0, d1, d2, d3)>, affine_map<(d0, d1, d2, d3) -> (d2, d0, d3, d1)>],
		iterator_types = ["parallel", "parallel", "parallel", "parallel"]
		} ins(%input : tensor<1x2x3x4xi32>) outs(%init : tensor<3x1x4x2xi32>) {
		^bb0(%arg1: i32, %arg2: i32):
		linalg.yield %arg1 : i32
		} -> tensor<3x1x4x2xi32>
		// CHECK: return %[[CST]]
		return %1 : tensor<3x1x4x2xi32>
		}

		// -----

		// CHECK-LABEL: @transpose_fold_4d_i16
		func @transpose_fold_4d_i16(%init: tensor<3x1x4x2xi16>) -> tensor<3x1x4x2xi16> {
		%input = constant dense<[[
		[[ 0, 1, 2, 3], [ 4, 5, 6, 7], [ 8, 9, 10, 11]],
		[[12, 13, 14, 15], [16, 17, 18, 19], [20, 21, 22, 23]]
		]]> : tensor<1x2x3x4xi16>
		// CHECK: %[[CST:.+]] = constant dense<[
		// CHECK-SAME{LITERAL}: [[[0, 12], [1, 13], [2, 14], [3, 15]]],
		// CHECK-SAME{LITERAL}: [[[4, 16], [5, 17], [6, 18], [7, 19]]],
		// CHECK-SAME{LITERAL}: [[[8, 20], [9, 21], [10, 22], [11, 23]]]
		// CHECK-SAME{LITERAL}: ]>
		%1 = linalg.generic {
		indexing_maps = [affine_map<(d0, d1, d2, d3) -> (d0, d1, d2, d3)>, affine_map<(d0, d1, d2, d3) -> (d2, d0, d3, d1)>],
		iterator_types = ["parallel", "parallel", "parallel", "parallel"]
		} ins(%input : tensor<1x2x3x4xi16>) outs(%init : tensor<3x1x4x2xi16>) {
		^bb0(%arg1: i16, %arg2: i16):
		linalg.yield %arg1 : i16
		} -> tensor<3x1x4x2xi16>
		// CHECK: return %[[CST]]
		return %1 : tensor<3x1x4x2xi16>
		}

		// -----

		// CHECK-LABEL: @transpose_nofold_non_cst_input
		func @transpose_nofold_non_cst_input(%input: tensor<2x3xf32>, %init: tensor<3x2xf32>) -> tensor<3x2xf32> {
		// CHECK: linalg.generic
		%1 = linalg.generic {
		indexing_maps = [affine_map<(d0, d1) -> (d1, d0)>, affine_map<(d0, d1) -> (d0, d1)>],
		iterator_types = ["parallel", "parallel"]
		} ins(%input : tensor<2x3xf32>) outs(%init : tensor<3x2xf32>) {
		^bb0(%arg1: f32, %arg2: f32):
		linalg.yield %arg1 : f32
		} -> tensor<3x2xf32>
		return %1 : tensor<3x2xf32>
		}

		// -----

		// CHECK-LABEL: @transpose_nofold_yield_const
		func @transpose_nofold_yield_const(%init: tensor<3x2xf32>) -> tensor<3x2xf32> {
		%input = constant dense<[[0.0, 1.0, 2.0], [3.0, 4.0, 5.0]]> : tensor<2x3xf32>
		%cst = constant 8.0 : f32
		// CHECK: linalg.generic
		%1 = linalg.generic {
		indexing_maps = [affine_map<(d0, d1) -> (d1, d0)>, affine_map<(d0, d1) -> (d0, d1)>],
		iterator_types = ["parallel", "parallel"]
		} ins(%input : tensor<2x3xf32>) outs(%init : tensor<3x2xf32>) {
		^bb0(%arg1: f32, %arg2: f32):
		linalg.yield %cst : f32
		} -> tensor<3x2xf32>
		return %1 : tensor<3x2xf32>
		}

		// -----

		// CHECK-LABEL: @transpose_nofold_multi_ops_in_region
		func @transpose_nofold_multi_ops_in_region(%init: tensor<3x2xf32>) -> tensor<3x2xf32> {
		%input = constant dense<[[0.0, 1.0, 2.0], [3.0, 4.0, 5.0]]> : tensor<2x3xf32>
		// CHECK: linalg.generic
		%1 = linalg.generic {
		indexing_maps = [affine_map<(d0, d1) -> (d1, d0)>, affine_map<(d0, d1) -> (d0, d1)>],
		iterator_types = ["parallel", "parallel"]
		} ins(%input : tensor<2x3xf32>) outs(%init : tensor<3x2xf32>) {
		^bb0(%arg1: f32, %arg2: f32):
		%add = addf %arg1, %arg1 : f32
		linalg.yield %add : f32
		} -> tensor<3x2xf32>
		return %1 : tensor<3x2xf32>
		}

This is an archive of the discontinued LLVM Phabricator instance.

[mlir][linalg] Constant fold linalg.generic that are transposes
ClosedPublic
