This is an archive of the discontinued LLVM Phabricator instance.

[MLIR] [affine-loop-fusion] Fix a bug about non-result ops in affine-loop-fusion
ClosedPublic

Authored by tungld on Jan 28 2021, 11:21 PM.

Details

Summary

This patch fixes the following bug when calling --affine-loop-fusion

Input program:

mlir
func @should_not_fuse_since_top_level_non_affine_non_result_users(
    %in0 : memref<32xf32>, %in1 : memref<32xf32>) {
  %c0 = constant 0 : index
  %cst_0 = constant 0.000000e+00 : f32

  affine.for %d = 0 to 32 {
    %lhs = affine.load %in0[%d] : memref<32xf32>
    %rhs = affine.load %in1[%d] : memref<32xf32>
    %add = addf %lhs, %rhs : f32
    affine.store %add, %in0[%d] : memref<32xf32>
  }
  store %cst_0, %in0[%c0] : memref<32xf32>
  affine.for %d = 0 to 32 {
    %lhs = affine.load %in0[%d] : memref<32xf32>
    %rhs = affine.load %in1[%d] : memref<32xf32>
    %add = addf %lhs, %rhs: f32
    affine.store %add, %in0[%d] : memref<32xf32>
  }
  return
}

call --affine-loop-fusion, we got an incorrect output:

mlir
func @should_not_fuse_since_top_level_non_affine_non_result_users(%arg0: memref<32xf32>, %arg1: memref<32xf32>) {
  %c0 = constant 0 : index
  %cst = constant 0.000000e+00 : f32
  store %cst, %arg0[%c0] : memref<32xf32>
  affine.for %arg2 = 0 to 32 {
    %0 = affine.load %arg0[%arg2] : memref<32xf32>
    %1 = affine.load %arg1[%arg2] : memref<32xf32>
    %2 = addf %0, %1 : f32
    affine.store %2, %arg0[%arg2] : memref<32xf32>
    %3 = affine.load %arg0[%arg2] : memref<32xf32>
    %4 = affine.load %arg1[%arg2] : memref<32xf32>
    %5 = addf %3, %4 : f32
    affine.store %5, %arg0[%arg2] : memref<32xf32>
  }
  return
}

This happened because when analyzing the source and destination nodes,
affine loop fusion ignored non-result ops sandwitched between them. In
other words, the MemRefDependencyGraph in the affine loop fusion ignored
these non-result ops.

This patch solves the issue by adding these non-result ops to the
MemRefDependencyGraph.

Diff Detail

Event Timeline

tungld created this revision.Jan 28 2021, 11:21 PM
tungld requested review of this revision.Jan 28 2021, 11:21 PM
tungld retitled this revision from This patch fixes the following bug when calling --affine-loop-fusion to [MLIR] [affine-loop-fusion] Fix a bug about non-result ops in affine-loop-fusion.Jan 28 2021, 11:23 PM
tungld edited the summary of this revision. (Show Details)
bondhugula requested changes to this revision.Jan 30 2021, 5:15 AM

Thanks for noticing this issue. A couple of comments.

mlir/lib/Transforms/LoopFusion.cpp
772

"which could be memory side-effect" -> "which could have a memory side effect"

773

You should really be checking for the Write MemoryEffect via the MemoryEffectOpInterface. Using the interface, you can even get hold of the actual memref Value. Otherwise, this is just overly conservative. If an op doesn't implement the MemoryEffectOpInterface, we have the guarantee to assume it's not writing to a memref.

This revision now requires changes to proceed.Jan 30 2021, 5:15 AM
bondhugula added inline comments.Jan 30 2021, 5:16 AM
mlir/lib/Transforms/LoopFusion.cpp
773

-> which could have a memory write side effect

bondhugula added inline comments.Jan 30 2021, 5:21 AM
mlir/lib/Transforms/LoopFusion.cpp
771

There is nothing special about zero result ops for what you are addressing. An op with results can also have memory write side effects. The case right above op.getNumResults() > 0 && !op.use_empty() will miss it if the results don't have uses, i.e., you could have an op with memory write side effects, one or more results, and with those results being unused: such an op can't be ignored.

What you need is:

else if (auto memOp = dyn_cast<MemoryEffectOpInterface>(op)) {
   // Get hold of the effects; check for write effect.
   ...
}
tungld updated this revision to Diff 320387.Jan 31 2021, 5:53 PM

Use MemWrite side effect to precisely capture the situation.

@bondhugula Thank you for your comments! I updated the patch.

tungld updated this revision to Diff 320395.Jan 31 2021, 10:15 PM

Readd the lit test into the new revision

tungld updated this revision to Diff 320408.Feb 1 2021, 12:56 AM

Change the lit test name

tungld added a comment.Feb 2 2021, 7:20 PM

@bondhugula any additional modification needed? thanks!

@bondhugula any additional modification needed? thanks!

Can you please mark the comments that are done as "Done"?

bondhugula added inline comments.Feb 4 2021, 7:31 AM
mlir/lib/Transforms/LoopFusion.cpp
776–780

Btw, for a write effect, doing a getValue() on it also gives you the operand Value that is actually written to. Would this be useful in any way? The previous case was about things that generate SSA values and so such SSA edges are accounted for. But I can't recall what the impact of just adding a node like this would be. Would the memref being written to get looked at among the operands?

tungld marked 4 inline comments as done.Feb 4 2021, 5:09 PM
tungld added inline comments.
mlir/lib/Transforms/LoopFusion.cpp
776–780

By adding this node, it is used when checking whether a src and dest loop can be fused or not (i.e. inside canFuseSrcWhichWritesToLiveOut), which calls hasNonAffineUsersOnThePath to check non-affine users.

This revision is now accepted and ready to land.Feb 5 2021, 2:59 AM

Thanks for improving this. Let me know if you want me to commit this for you.

tungld added a comment.Feb 5 2021, 3:16 AM

Thank you! Please help me to land it. My "name (email)" : "Tung D. Le (tung@jp.ibm.com)".