This is an archive of the discontinued LLVM Phabricator instance.

[mlir] Fix folding for scf.for(tensor.cast).
Accepted · Public

Authored by pifon2a on Feb 22 2023, 10:53 AM.

Details

Summary

We should only fold a tensor.cast into scf.for when it provides new static information about the shapes, instead of requiring the symmetric pattern cast(for(cast)).
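
For context, a rough before/after sketch of that rewrite (illustrative only; the function, names, shapes, and the exact placement of the compensating cast are assumptions, not taken from the patch):

// Before: the incoming cast erases static shape information before the loop.
func.func @before(%arg: tensor<32x1024xf32>, %lb: index, %ub: index, %step: index) -> tensor<?x?xf32> {
  %cast = tensor.cast %arg : tensor<32x1024xf32> to tensor<?x?xf32>
  %res = scf.for %i = %lb to %ub step %step iter_args(%iter = %cast) -> (tensor<?x?xf32>) {
    scf.yield %iter : tensor<?x?xf32>
  }
  return %res : tensor<?x?xf32>
}

// After the fold: the loop carries the more static type, and a cast back to the
// dynamic type is inserted only where the old result type is still needed.
func.func @after(%arg: tensor<32x1024xf32>, %lb: index, %ub: index, %step: index) -> tensor<?x?xf32> {
  %res = scf.for %i = %lb to %ub step %step iter_args(%iter = %arg) -> (tensor<32x1024xf32>) {
    scf.yield %iter : tensor<32x1024xf32>
  }
  %cast = tensor.cast %res : tensor<32x1024xf32> to tensor<?x?xf32>
  return %cast : tensor<?x?xf32>
}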

Diff Detail

Event Timeline

pifon2a created this revision · Feb 22 2023, 10:53 AM
pifon2a requested review of this revision · Feb 22 2023, 10:53 AM
Herald added a project: Restricted Project. · View Herald Transcript · Feb 22 2023, 10:53 AM
pifon2a retitled this revision from [mlir] Fix foling for scf.for(tensor.cast). to [mlir] Fix folding for scf.for(tensor.cast). · Feb 22 2023, 10:58 AM
nicolasvasilache accepted this revision · Feb 23 2023, 2:03 AM

This is in line with how casts are folded in other parts of the system so LGTM.
Please add a minimal test before landing!

This revision is now accepted and ready to land · Feb 23 2023, 2:03 AM
pifon2a updated this revision to Diff 499774 · Feb 23 2023, 2:12 AM

Add a test.

This revision was landed with ongoing or failed builds · Feb 23 2023, 2:23 AM
This revision was automatically updated to reflect the committed changes.
nicolasvasilache reopened this revision · Apr 3 2023, 2:00 AM

@pifon I apologize, this approval was a big oversight on my part, the change is incorrect: scf.for provides no guarantee that the dynamic shape behind the iter_arg's type stays the same at runtime.

This is why the original pattern was looking for pairs of casts.

We need to revert this.

This revision is now accepted and ready to land · Apr 3 2023, 2:00 AM

Is there an example of such an scf.for? Maybe there is another way to fix it.

The use case that exposed the bug will be fixed independently indeed.

Here is a problematic example:

%1 = tensor.cast %0 : !static_tensor to !dynamic_tensor
%2 = scf.for ... iter_args(%iter = %1) -> (!dynamic_tensor) {
  %slice = take_a_random_slice(%iter) : !dynamic_tensor -> !dynamic_tensor
  scf.yield %slice : !dynamic_tensor
}
%3 = do_something(%2) : !dynamic_tensor

take_a_random_slice can return a slice of any size, and casting the result to !static_tensor is only valid when the last yielded tensor has the expected size at runtime.

This easily propagates miscompiles that result in out-of-bounds accesses much later in the program.

This PR assumes that the dynamic type carried through scf.for remains constant across iterations, which is an incorrect assumption.
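
A rough sketch of what that fold would produce here (assuming it rewrites the loop to carry the static type and inserts compensating casts; the exact rewrite may differ):

%2 = scf.for ... iter_args(%iter = %0) -> (!static_tensor) {
  %dyn = tensor.cast %iter : !static_tensor to !dynamic_tensor     // always valid
  %slice = take_a_random_slice(%dyn) : !dynamic_tensor -> !dynamic_tensor
  // Only valid if %slice happens to have the static shape at runtime.
  %static = tensor.cast %slice : !dynamic_tensor to !static_tensor
  scf.yield %static : !static_tensor
}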

Does not that mean that in this example we yield values of different sizes on every iteration? I thought that type(iter_arg), type(result) and the type in scf.yield should all be compatible.

> Does not that mean that in this example we yield values of different sizes on every iteration?

Correct

> I thought that type(iter_arg), type(result) and the type in scf.yield should all be compatible.

The static types are indeed compatible, but there is no guarantee about the dynamic value of the ?.
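
For illustration (snippet not from the review; the op choice is arbitrary): both values below have the static type tensor<?xf32>, so either is a type-correct operand for scf.yield, yet their runtime sizes differ:

%c4 = arith.constant 4 : index
%c100 = arith.constant 100 : index
%a = tensor.empty(%c4) : tensor<?xf32>    // runtime size 4
%b = tensor.empty(%c100) : tensor<?xf32>  // runtime size 100
// Nothing in the type system forces the yielded size to stay constant across iterations.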

I think @springerm ran into similar assumption errors in the past.

This folding is valid only if the loop is "dynamic-shape-preserving". In your example, this cast may fail at runtime depending on what @do is doing:

%4 = tensor.cast %3 : tensor<?x?xf32> to tensor<32x1024xf32>

I submitted a dim(iter_arg) folding some time ago and then noticed that a blanket folding (without dynamic shape analysis) is incorrect. This is how we fixed it: https://reviews.llvm.org/D109430. The code contains a TypeSwitch over ops that are known to preserve the dynamic shape. We could do a bit better these days: destination-style ops preserve the dynamic shape, so we could query that interface. But that won't work with CallOps, because those are not destination-style ops.
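
To illustrate the distinction (hypothetical snippets, not taken from D109430; %init, %cst and the loop bounds are assumed to be defined elsewhere): a destination-style op such as linalg.fill ties its result shape to the init operand, so the first loop preserves the dynamic shape of its iter_arg on every iteration, while the second does not because the size of the extracted slice depends on the induction variable.

// Dynamic-shape-preserving: the yielded value always has exactly the shape of %iter.
%r1 = scf.for %i = %lb to %ub step %c1 iter_args(%iter = %init) -> (tensor<?xf32>) {
  %filled = linalg.fill ins(%cst : f32) outs(%iter : tensor<?xf32>) -> tensor<?xf32>
  scf.yield %filled : tensor<?xf32>
}

// Not shape-preserving: the size of the yielded slice depends on %i.
%r2 = scf.for %i = %lb to %ub step %c1 iter_args(%iter = %init) -> (tensor<?xf32>) {
  %slice = tensor.extract_slice %iter[0] [%i] [1] : tensor<?xf32> to tensor<?xf32>
  scf.yield %slice : tensor<?xf32>
}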