This is an archive of the discontinued LLVM Phabricator instance.

[mlir][affine-loop-fusion] Fix a bug that AffineIfOp prevents fusion of the other loops
ClosedPublic

Authored by tungld on Jul 14 2021, 12:04 AM.

Details

Summary

The presence of AffineIfOp inside AffineFor prevents fusion of the other loops to happen. For example:

affine.for %i0 = 0 to 10 {
  affine.store %cf7, %a[%i0] : memref<10xf32>
}
affine.for %i1 = 0 to 10 {
  %v0 = affine.load %a[%i1] : memref<10xf32>
  affine.store %v0, %b[%i1] : memref<10xf32>
}
affine.for %i2 = 0 to 10 {
  affine.if #set(%i2) {
    %v0 = affine.load %b[%i2] : memref<10xf32>
  }
}

The first two loops were not be fused because of affine.if inside the last affine.for.

The issue seems to come from a conservative constraint that does not allow fusion if there are ops whose number of regions != 0 (affine.if is one of them).

This patch just removes such a constraint when`affine.if` is inside affine.for. The existing canFuseLoops method is able to handle affine.if correctly.

Diff Detail

Event Timeline

tungld created this revision.Jul 14 2021, 12:04 AM
tungld requested review of this revision.Jul 14 2021, 12:04 AM
bondhugula requested changes to this revision.EditedJul 15 2021, 12:39 AM

This would be wrong to do - the checks can't be simply removed. Please see below. Region holding ops would need to be properly handled and they can be in the cases we know which ops those are.

mlir/lib/Transforms/LoopFusion.cpp
747–750

It's not okay to simply remove this check. It would just lead to incorrect fusion. An if op constrains the execution inside and I think the fusion here isn't/may not be currently geared to handle fusion when there are if ops inside.

778–780

We can't simply remove this check without know what kind of op it is! Instead we may just want to allow the ops that we know would not create an issue instead of allowing everything. For eg. this would be wrong to do if there was an scf.execute_region here (you'd have to look inside the op to see what memrefs are accessed) before you do anything at the outer level.

This is also clearly wrong even for affine.if because the if op might be reading/writing to memrefs and you can't fuse other things in general without considering that. Your test cases don't cover that.

This revision now requires changes to proceed.Jul 15 2021, 12:39 AM

Thank you for your comments. I will update the patch to check affine.if only.

This is also clearly wrong even for affine.if because the if op might be reading/writing to memrefs and you can't fuse other things in general without considering that. Your test cases don't cover that.

When I removed these checks, I saw that canFuseLoops method was able to handle affine.if. Actually canFuseLoops checks affine.if in the source and dest ops via calling gatherLoadsAndStores. For example, if destOp or srcOp contains affine.if, there is no fusion for them.
Also, if there is affine.if sandwiched between srcOp and destOp, there is no fusion:

affine.for %arg0 = 0 to 10 {
  affine.for %arg1 = 0 to 10 {
    affine.store %cst, %0[%arg0, %arg1] : memref<10x10xf32>
  }
}
affine.for %arg0 = 0 to 10 {
  affine.for %arg1 = 0 to 10 {
    affine.if #set(%arg1) {
      affine.store %cst, %0[%arg0, %arg1] : memref<10x10xf32>
    }
  }
}
affine.for %arg0 = 0 to 10 {
  affine.for %arg1 = 0 to 10 {
    %2 = affine.load %0[%arg1, %arg0] : memref<10x10xf32>
    affine.store %2, %1[%arg0, %arg1] : memref<10x10xf32>
  }
}

So, it seems that returning false in MemRefDependenceGraph::init is too early for affine.if, which causes all the handling of affine.if inside canFuseLoops unreachable.

tungld updated this revision to Diff 359242.Jul 16 2021, 12:49 AM

Only handle AffineIfOp

tungld marked 2 inline comments as done.Jul 16 2021, 12:59 AM

@bondhugula I updated the patch to handle only the case that affine.if is inside affine.for. One more test is added for the case where affine.if is sandwiched by the source and dest ops.

bondhugula requested changes to this revision.Jul 21 2021, 7:49 PM

Some comments. @vinayaka-polymage could you please review this?

mlir/lib/Transforms/LoopFusion.cpp
751

No need of dyn_cast here - instead use isa.

757–758

op here is hiding another variable of the same name. Please rename.

757–762

You don't need to use a flag. Instead, please use interrupt() and wasInterrupted() pattern. Several references for this at other places in the codebase.

This revision now requires changes to proceed.Jul 21 2021, 7:49 PM
tungld updated this revision to Diff 360703.Jul 21 2021, 10:44 PM
Simplify the code according to Uday's comments.
tungld marked 3 inline comments as done.Jul 21 2021, 10:45 PM
vinayaka-polymage requested changes to this revision.Jul 21 2021, 11:56 PM

Thanks for this, occurrence an AffineIfOp doesn't need to prevent fusion happening elsewhere, and canFuseLoops already checks for AffineIfOps.

Left a couple of minor suggestions. Also, the commit summary and the example there will have to change now I think, as a top-level AffineIfOp still prevents fusions.

mlir/lib/Transforms/LoopFusion.cpp
756

LoopNestStateCollector is walking the forOp anyway, so you can in fact, set a state inside that itself if AffineIf op is the only reason hasNonForRegion is being set. Only when it is, you can continue with fusion, else return false.

The main reason for suggesting is avoiding multiple walks over forOps.

mlir/test/Transforms/loop-fusion.mlir
526

Can you please change this to contain non-empty regions ? - I am afraid, this AffineIf might get folded away rendering the test case without if.

Also, unless you really want 2D loop nests, even a simpler 1D test case will also simplify checks.

583–601

Check return at the end.

This revision now requires changes to proceed.Jul 21 2021, 11:56 PM
tungld updated this revision to Diff 360789.Jul 22 2021, 6:38 AM

Move the check of AffineIf into LoopNestStateCollector and use 1D tensors in the lit tests.

tungld edited the summary of this revision. (Show Details)Jul 22 2021, 6:43 AM
tungld marked 3 inline comments as done.Jul 22 2021, 6:45 AM

@vinayaka-polymage I updated the patch according to your comments. Thanks!

Thanks for addressing all the comments. Changes look good!

Really minor suggestion again on test cases: As you are not capturing the values and reusing them, you can get rid of all %{{.*}} and only check affine.for, affine.load etc..

tungld updated this revision to Diff 361009.Jul 22 2021, 3:53 PM

Thanks, @vinayaka-polymage, for your comments! I simplified the lit tests.

@bondhugula @vinayaka-polymage is it the current patch good enough? Thanks!

Thanks @tungld ! It looks good to me.

@bondhugula is this patch ok to be landed? Thanks!

bondhugula requested changes to this revision.Jul 29 2021, 10:04 AM
bondhugula added inline comments.
mlir/lib/Transforms/LoopFusion.cpp
79

This name has become very convoluted. Isn't the same as:

hasNonAffineRegionOp

?

747–751

Again, really convoluted.

mlir/test/Transforms/loop-fusion.mlir
526

ForInst -> ForOp

This revision now requires changes to proceed.Jul 29 2021, 10:04 AM
bondhugula added inline comments.Jul 29 2021, 10:04 AM
mlir/test/Transforms/loop-fusion.mlir
566

There's no ForInst.

ForInst -> ForOp everywhere.

bondhugula added inline comments.Jul 29 2021, 10:05 AM
mlir/lib/Transforms/LoopFusion.cpp
747–751

Isn't this the same as:

Returns false if a region holding op other than affine.for and affine.if ...
tungld updated this revision to Diff 362944.Jul 29 2021, 6:25 PM

Thanks @bondhugula! I changed variable names and comments accoring to your comments.

I used ForInst since the existing tests used it. Now all ForInst are renamed to ForOp, also if_inst to if_op.

tungld marked 5 inline comments as done.Jul 29 2021, 6:26 PM
bondhugula accepted this revision.Jul 30 2021, 2:51 AM

LGTM - thanks.

This revision is now accepted and ready to land.Jul 30 2021, 2:51 AM