This is an archive of the discontinued LLVM Phabricator instance.

Differential D87696

[mlir][Standard] Canonicalize chains of tensor_cast operations
Closed · Public

Authored by herhut on Sep 15 2020, 7:47 AM.

Details

Reviewers

frgossen
mehdi_amini
jpienaar
ftynse

Summary

Adds a pattern that replaces a chain of two tensor_cast operations by a single
tensor_cast operation if doing so will not remove constraints on the shapes.
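
The condition in the summary can be sketched outside of MLIR. The following is a minimal stand-alone model, not the code from this revision: `Shape`, `kDynamic`, `ranked`, `Join`, and `canDropIntermediateCast` are hypothetical names, a dynamic extent is modelled as -1, and an unranked tensor is modelled as an empty `std::optional`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <utility>
#include <vector>

// Stand-alone model (assumption, not MLIR): kDynamic marks an unknown
// extent (`?`), and std::nullopt models an unranked tensor (`*`).
constexpr int64_t kDynamic = -1;
using Shape = std::optional<std::vector<int64_t>>;

// Hypothetical helper to build a ranked shape.
Shape ranked(std::vector<int64_t> dims) { return Shape(std::move(dims)); }

// Result of a join; `ok` is false when the two shapes contradict each other.
struct Join {
  bool ok;
  Shape shape;
};

// Combine the static knowledge of two shapes, dimension by dimension.
Join joinShapes(const Shape &one, const Shape &two) {
  if (!one)
    return {true, two};
  if (!two)
    return {true, one};
  if (one->size() != two->size())
    return {false, std::nullopt};

  std::vector<int64_t> result;
  result.reserve(one->size());
  for (std::size_t i = 0; i < one->size(); ++i) {
    int64_t a = (*one)[i], b = (*two)[i];
    if (a == kDynamic) {
      result.push_back(b);
    } else if (b == kDynamic || a == b) {
      result.push_back(a);
    } else {
      return {false, std::nullopt}; // Conflicting static extents.
    }
  }
  return {true, Shape(std::move(result))};
}

// The intermediate cast can go only if joining all three shapes yields
// the same knowledge as joining source and result directly.
bool canDropIntermediateCast(const Shape &source, const Shape &intermediate,
                             const Shape &result) {
  Join first = joinShapes(source, intermediate);
  if (!first.ok)
    return false;
  Join all = joinShapes(first.shape, result);
  if (!all.ok)
    return false;
  Join direct = joinShapes(source, result);
  return direct.ok && direct.shape == all.shape;
}
```

Under this model, a chain `* -> 4x? -> 4x8` can be collapsed, while `?x? -> 4x? -> ?x8` cannot, because dropping the middle cast would lose the check that the first extent is 4.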

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

herhut created this revision. Sep 15 2020, 7:47 AM

Herald added a project: Restricted Project. Sep 15 2020, 7:47 AM

Herald added subscribers: tatianashp, msifontes, jurahul and 13 others.

herhut requested review of this revision. Sep 15 2020, 7:47 AM

Herald added subscribers: stephenneuendorffer, nicolasvasilache. Sep 15 2020, 7:47 AM

I considered placing a templated version of joinShapes somewhere (to be used with ShapedType and a template parameter to figure out the type to create) but wasn't sure where, or whether it would be useful.

Harbormaster completed remote builds in B71739: Diff 291924. Sep 15 2020, 8:02 AM

jpienaar added inline comments. Sep 15 2020, 8:08 AM

mlir/lib/Dialect/StandardOps/IR/Ops.cpp
3143	This is a little restrictive, especially if one considers element types with nested types. Something operating on shapes instead could be reused, with no need for templating or special restrictions on the element type. For example, a join of two shapes could return or populate a vector (where the first element is the rank, to allow returning unranked), or it could use ShapedTypeComponents (https://github.com/llvm/llvm-project/blob/8985755762a429573af2ce657274772339d3b9db/mlir/include/mlir/Interfaces/InferTypeOpInterface.h#L34). The latter also carries an element type and an attribute: the element type can be left empty unless it is equal in the join, and the attribute is something we haven't used yet, so it could be removed too.
3197	Perhaps expand to say that it won't be used if it contains less information, and remove the empty line below.

ftynse accepted this revision. Sep 15 2020, 8:11 AM

ftynse added a subscriber: ftynse.

ftynse added inline comments.

mlir/lib/Dialect/StandardOps/IR/Ops.cpp
3150	Drop `llvm::`
3151	assert that ranks are equal
3162	Reserve space before pushing in a loop, you are guaranteed to push back exactly once on each iteration
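
The effect of the reserve suggestion can be sketched with `std::vector` standing in for `SmallVector` (an assumption made so the example has no LLVM dependency). The helper `countCapacityChanges` is hypothetical and only exists to observe reallocations.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: pushes `n` elements, one per iteration, and counts
// how often the capacity changes (each change implies a reallocation).
std::size_t countCapacityChanges(std::size_t n, bool reserveFirst) {
  std::vector<int64_t> values;
  if (reserveFirst)
    values.reserve(n); // Exactly one push_back per iteration follows.

  std::size_t changes = 0;
  std::size_t capacity = values.capacity();
  for (std::size_t i = 0; i < n; ++i) {
    values.push_back(static_cast<int64_t>(i));
    if (values.capacity() != capacity) {
      ++changes;
      capacity = values.capacity();
    }
  }
  return changes;
}
```

With the reservation in place, the loop never grows the buffer. Without it, the container typically reallocates several times as it grows.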

This revision is now accepted and ready to land. Sep 15 2020, 8:11 AM

frgossen added inline comments. Sep 15 2020, 8:35 AM

mlir/lib/Dialect/StandardOps/IR/Ops.cpp
3142	I think you could get away without materialising the joined shapes here. If the cast chain is A -> B -> C, then A <= B or C <= B is a sufficient condition for this rewrite (smaller meaning more concrete shapes).
3150	Would `resize(one.getRank())` be useful?
3200	A comment explaining why this is needed could help: the intermediate cast cannot be eliminated if it is stricter than the resulting cast.

frgossen accepted this revision. Sep 15 2020, 8:35 AM

Comments

I will do the change to use ShapedTypeComponents in a follow-up to enable reuse for the memref case.

mlir/lib/Dialect/StandardOps/IR/Ops.cpp
3142	That is true if computed for each element of the shape. It is not required for the shapes as a whole, e.g. A = [1,?], B = [?,2] and C = [1,2]. So implementing `lessPrecise(ShapeType, ShapeType)` and composing the above from it does not work. I would need a specialized `castingAtoCisEquivalentToCastingAtoBtoC()`, which I shied away from as `join` seemed more reusable. So how much do we value performance over code reuse?
3143	Ah, thanks! How about moving `ShapedTypeComponents` to TypeUtilities.h, or even cooking up a ShapeUtilities.h? If we use this more broadly than just the InferTypeOpInterface, it should live in a shared location.

Harbormaster completed remote builds in B71832: Diff 292128. Sep 16 2020, 1:35 AM

frgossen added inline comments. Sep 16 2020, 2:32 AM

mlir/lib/Dialect/StandardOps/IR/Ops.cpp
3142	Right, the order on shapes is not complete. The `join` version is a lot easier to read :-)
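
To illustrate why an order on whole shapes is not enough, the following stand-alone C++ sketch compares a whole-shape check with a per-dimension check. It is not MLIR code: `Dims`, `kDynamic`, and all function names are hypothetical, and the example shapes in the note below are chosen for this illustration rather than taken from the review. It also assumes the three shapes are mutually compatible, so contradictory static extents are not detected here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr int64_t kDynamic = -1; // Models a `?` extent (assumption).
using Dims = std::vector<int64_t>;

// True if extent `a` carries at least as much information as extent `b`.
bool dimAtLeastAsConcrete(int64_t a, int64_t b) {
  return b == kDynamic || a == b;
}

// Whole-shape order: every extent of `a` is at least as concrete as in `b`.
bool shapeAtLeastAsConcrete(const Dims &a, const Dims &b) {
  if (a.size() != b.size())
    return false;
  for (std::size_t i = 0; i < a.size(); ++i)
    if (!dimAtLeastAsConcrete(a[i], b[i]))
      return false;
  return true;
}

// Condition on whole shapes for a chain A -> B -> C: A <= B or C <= B.
bool wholeShapeCheck(const Dims &a, const Dims &b, const Dims &c) {
  return shapeAtLeastAsConcrete(a, b) || shapeAtLeastAsConcrete(c, b);
}

// The same condition, but decided separately for every dimension.
bool perDimensionCheck(const Dims &a, const Dims &b, const Dims &c) {
  if (a.size() != b.size() || c.size() != b.size())
    return false;
  for (std::size_t i = 0; i < b.size(); ++i)
    if (!dimAtLeastAsConcrete(a[i], b[i]) &&
        !dimAtLeastAsConcrete(c[i], b[i]))
      return false;
  return true;
}
```

With A = [?,2], B = [1,2] and C = [1,?], neither A nor C is at least as concrete as B as a whole, so `wholeShapeCheck` rejects the chain. `perDimensionCheck` accepts it, because each extent of B is already known from A or from C.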

This landed a while back in 5e0ded268929b87ddf2c5e077c9185554342f602.

Revision Contents

Path	Size
mlir/include/mlir/Dialect/StandardOps/IR/Ops.td	2 lines
mlir/lib/Dialect/StandardOps/IR/Ops.cpp	81 lines
mlir/test/Transforms/canonicalize.mlir	48 lines

Diff 292128

mlir/include/mlir/Dialect/StandardOps/IR/Ops.td

 def TensorCastOp : CastOp<"tensor_cast"> {
 ...
   let extraClassDeclaration = [{
     /// Return true if `a` and `b` are valid operand and result pairs for
     /// the operation.
     static bool areCastCompatible(Type a, Type b);

     /// The result of a tensor_cast is always a tensor.
     TensorType getType() { return getResult().getType().cast<TensorType>(); }
   }];
+
+  let hasCanonicalizer = 1;
 }

 //===----------------------------------------------------------------------===//
 // TensorLoadOp
 //===----------------------------------------------------------------------===//

 def TensorLoadOp : Std_Op<"tensor_load",
     [SameOperandsAndResultShape, SameOperandsAndResultElementType,
 ...

mlir/lib/Dialect/StandardOps/IR/Ops.cpp

 bool TensorCastOp::areCastCompatible(Type a, Type b) {
 ...
   return succeeded(verifyCompatibleShape(aT, bT));
 }

 OpFoldResult TensorCastOp::fold(ArrayRef<Attribute> operands) {
   return impl::foldCastOp(*this);
 }

+/// Compute a TensorType that has the joined shape knowledge of the two
+/// given TensorTypes. The element types need to match.
+static TensorType joinShapes(TensorType one, TensorType two) {
+  assert(one.getElementType() == two.getElementType());
+
+  if (!one.hasRank())
+    return two;
+  if (!two.hasRank())
+    return one;
+
+  int64_t rank = one.getRank();
+  if (rank != two.getRank())
+    return {};
+
+  SmallVector<int64_t, 4> join;
+  join.reserve(rank);
+  for (int64_t i = 0; i < rank; ++i) {
+    if (one.isDynamicDim(i)) {
+      join.push_back(two.getDimSize(i));
+      continue;
+    }
+    if (two.isDynamicDim(i)) {
+      join.push_back(one.getDimSize(i));
+      continue;
+    }
+    if (one.getDimSize(i) != two.getDimSize(i))
+      return {};
+    join.push_back(one.getDimSize(i));
+  }
+  return RankedTensorType::get(join, one.getElementType());
+}
+
+namespace {
+
+/// Replaces chains of two tensor_cast operations by a single tensor_cast
+/// operation if doing so does not remove runtime constraints.
+struct ChainedTensorCast : public OpRewritePattern<TensorCastOp> {
+  using OpRewritePattern<TensorCastOp>::OpRewritePattern;
+
+  LogicalResult matchAndRewrite(TensorCastOp tensorCast,
+                                PatternRewriter &rewriter) const final {
+    auto tensorCastOperand =
+        tensorCast.getOperand().getDefiningOp<TensorCastOp>();
+
+    if (!tensorCastOperand)
+      return failure();
+
+    auto sourceType =
+        tensorCastOperand.getOperand().getType().cast<TensorType>();
+    auto intermediateType = tensorCastOperand.getType().cast<TensorType>();
+    auto resultType = tensorCast.getType().cast<TensorType>();
+
+    // We can remove the intermediate cast if joining all three produces the
+    // same result as just joining the source and result shapes.
+    auto firstJoin =
+        joinShapes(joinShapes(sourceType, intermediateType), resultType);
+
+    // The join might not exist if the cast sequence would fail at runtime.
+    if (!firstJoin)
+      return failure();
+
+    // The newJoin always exists if the above join exists, it might just contain
+    // less information. If so, we cannot drop the intermediate cast, as doing
+    // so would remove runtime checks.
+    auto newJoin = joinShapes(sourceType, resultType);
+    if (firstJoin != newJoin)
+      return failure();
+
+    rewriter.replaceOpWithNewOp<TensorCastOp>(tensorCast, resultType,
+                                              tensorCastOperand.getOperand());
+    return success();
+  }
+};
+
+} // namespace
+
+void TensorCastOp::getCanonicalizationPatterns(
+    OwningRewritePatternList &results, MLIRContext *context) {
+  results.insert<ChainedTensorCast>(context);
+}
+
 //===----------------------------------------------------------------------===//
 // Helpers for Tensor[Load|Store]Op
 //===----------------------------------------------------------------------===//

 static Type getTensorTypeFromMemRefType(Type type) {
   if (auto memref = type.dyn_cast<MemRefType>())
     return RankedTensorType::get(memref.getShape(), memref.getElementType());
   if (auto memref = type.dyn_cast<UnrankedMemRefType>())
 ...

mlir/test/Transforms/canonicalize.mlir

 ...
   %0 = dynamic_tensor_from_elements %size1, %c5, %size4 {
 ...
     %1 = constant 32 : index
     yield %1 : index
   // CHECK: : tensor<3x?x5x7x?xindex>
   } : tensor<3x?x?x7x?xindex>
   // CHECK: tensor_cast %{{.*}} : tensor<3x?x5x7x?xindex> to tensor<3x?x?x7x?xindex>
   return %0 : tensor<3x?x?x7x?xindex>
 }

+// -----
+
+// CHECK-LABEL: @tensor_cast_chain_ok
+// CHECK-SAME: %[[IN:.*]]: tensor<*xi32>
+func @tensor_cast_chain_ok(%input: tensor<*xi32>) -> tensor<4x8xi32> {
+  // CHECK-NEXT: %[[RES:.*]] = tensor_cast %[[IN]] : tensor<*xi32> to tensor<4x8xi32>
+  %0 = tensor_cast %input : tensor<*xi32> to tensor<4x?xi32>
+  %1 = tensor_cast %0 : tensor<4x?xi32> to tensor<4x8xi32>
+  // CHECK-NEXT: return %[[RES]]
+  return %1 : tensor<4x8xi32>
+}
+
+// -----
+
+// CHECK-LABEL: @tensor_cast_chain_regain
+// CHECK-SAME: %[[IN:.*]]: tensor<4xi32>
+func @tensor_cast_chain_regain(%input: tensor<4xi32>) -> tensor<4xi32> {
+  %0 = tensor_cast %input : tensor<4xi32> to tensor<?xi32>
+  %1 = tensor_cast %0 : tensor<?xi32> to tensor<4xi32>
+  // CHECK-NEXT: return %[[IN]]
+  return %1 : tensor<4xi32>
+}
+
+// -----
+
+// CHECK-LABEL: @tensor_cast_chain_keep
+// CHECK-SAME: %[[IN:.*]]: tensor<?x?xi32>
+func @tensor_cast_chain_keep(%input: tensor<?x?xi32>) -> tensor<?x8xi32> {
+  // CHECK-NEXT: %[[C1:.*]] = tensor_cast %[[IN]]
+  %0 = tensor_cast %input : tensor<?x?xi32> to tensor<4x?xi32>
+  // CHECK-NEXT: %[[C2:.*]] = tensor_cast %[[C1]]
+  %1 = tensor_cast %0 : tensor<4x?xi32> to tensor<?x8xi32>
+  // CHECK-NEXT: return %[[C2]]
+  return %1 : tensor<?x8xi32>
+}
+
+// -----
+
+// CHECK-LABEL: @tensor_cast_chain_invalid
+// CHECK-SAME: %[[IN:.*]]: tensor<4x8xi32>
+func @tensor_cast_chain_invalid(%input: tensor<4x8xi32>) -> tensor<8x4xi32> {
+  // CHECK-NEXT: %[[C1:.*]] = tensor_cast %[[IN]]
+  %0 = tensor_cast %input : tensor<4x8xi32> to tensor<?x?xi32>
+  // CHECK-NEXT: %[[C2:.*]] = tensor_cast %[[C1]]
+  %1 = tensor_cast %0 : tensor<?x?xi32> to tensor<8x4xi32>
+  // CHECK-NEXT: return %[[C2]]
+  return %1 : tensor<8x4xi32>
+}