We should only fold tensor.casts that provide some new static information about
shapes, instead of looking for a symmetric pattern cast(for(cast)).
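For illustration only, a condition along these lines (a hypothetical sketch with a made-up helper name, assuming recent MLIR C++ APIs; this is not the code in the patch) captures when folding a tensor.cast into its consumer provides new static information: the cast's source type must be static in at least one dimension where its result type is dynamic, and must never be dynamic where the result is static.

#include "mlir/Dialect/Tensor/IR/Tensor.h"
#include "mlir/IR/BuiltinTypes.h"

using namespace mlir;

// Hypothetical helper (not part of the patch): folding `castOp` into a
// consumer provides new static shape information iff the source type is
// static in some dimension where the result type is dynamic, and the source
// is never dynamic where the result is static.
static bool foldingProvidesStaticInformation(tensor::CastOp castOp) {
  auto source = dyn_cast<RankedTensorType>(castOp.getSource().getType());
  auto result = dyn_cast<RankedTensorType>(castOp.getType());
  if (!source || !result || source.getRank() != result.getRank())
    return false;
  bool refines = false;
  for (int64_t i = 0, e = source.getRank(); i < e; ++i) {
    if (!source.isDynamicDim(i) && result.isDynamicDim(i))
      refines = true;  // the consumer learns a static size for this dimension
    else if (source.isDynamicDim(i) && !result.isDynamicDim(i))
      return false;    // the cast asserts a static size; folding would drop it
  }
  return refines;
}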
Event Timeline
This is in line with how casts are folded in other parts of the system, so LGTM.
Please add a minimal test before landing!
@pifon I apologize; this approval was a big oversight on my part. The change is incorrect: scf.for provides no guarantee that the iter_arg's type stays the same dynamically.
This is why the original pattern was looking for pairs of casts.
We need to revert this.
Indeed, the use case that exposed the bug will be fixed independently.
Here is a problematic example:
%1 = tensor.cast %0 : !static_tensor to !dynamic_tensor
%2 = scf.for ... (%iter = %1) -> {
  %3 = take_a_random_slice(%iter) : !dynamic_tensor -> !dynamic_tensor
  scf.yield %3 : !dynamic_tensor
} -> !dynamic_tensor
%3 = do_something(%2) : !dynamic_tensor
take_a_random_slice can return a slice of any size, and casting it to !static_tensor is only valid when the last yielded tensor has the expected dynamic size at runtime.
This easily propagates miscompiles that result in out-of-bounds accesses much later in the program.
This PR assumes that the dynamic type of an scf.for iter_arg remains constant; that assumption is incorrect.
Doesn't that mean that in this example we yield values of different sizes on every iteration? I thought that type(iter_arg), type(result), and the type in scf.yield should all be compatible.
Correct
I thought that type(iter_arg), type(result) and the type in scf.yield should all be compatible.
The static types are indeed compatible, but there is no guarantee about the dynamic value of the ?.
I think @springerm ran into similar assumption errors in the past.
This folding is valid only if the loop is "dynamic-shape-preserving". In your example, this cast may fail at runtime depending on what do_something is doing:
%4 = tensor.cast %3 : tensor<?x?xf32> to tensor<32x1024xf32>
I submitted a dim(iter_arg) folding some time ago and then noticed that a blanket folding (without dynamic shape analysis) is incorrect. This is how we fixed it: https://reviews.llvm.org/D109430. The code contains a TypeSwitch for ops for which we know that they conserve the dynamic shape. We can do a bit better these days: We know that destination style ops preserve the dynamic shape, so we could query that interface. But it won't work with CallOps, because those are not destination style ops.
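As a rough sketch (assuming recent MLIR API names such as DestinationStyleOpInterface and getDpsInitOperand; this is not the actual code from D109430), such a shape-preservation check could look like this:

#include "mlir/Dialect/SCF/IR/SCF.h"
#include "mlir/Interfaces/DestinationStyleOpInterface.h"

using namespace mlir;

// Sketch: returns true if the value yielded for iter_arg `argIdx` is known to
// have the same dynamic shape as the corresponding init operand. The walk
// only follows destination-style ops, whose i-th result has the same dynamic
// shape as their i-th init operand; anything else is handled conservatively.
static bool isShapePreserving(scf::ForOp forOp, int64_t argIdx) {
  auto yieldOp = cast<scf::YieldOp>(forOp.getBody()->getTerminator());
  Value value = yieldOp->getOperand(argIdx);
  while (value) {
    // Reached the iter_arg itself: the shape is trivially preserved.
    if (value == forOp.getRegionIterArgs()[argIdx])
      return true;
    auto opResult = dyn_cast<OpResult>(value);
    if (!opResult)
      return false; // some other block argument; give up conservatively.
    auto dstOp = dyn_cast<DestinationStyleOpInterface>(opResult.getOwner());
    if (!dstOp)
      return false; // unknown op (e.g. a call); give up conservatively.
    // Keep following the init operand tied to this result.
    value = dstOp.getDpsInitOperand(opResult.getResultNumber())->get();
  }
  return false;
}

Ops that do not implement the interface, including calls, are still rejected conservatively, matching the CallOp caveat above.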