This patch fixes the following bug when calling --affine-loop-fusion
Input program:
mlir func @should_not_fuse_since_top_level_non_affine_non_result_users( %in0 : memref<32xf32>, %in1 : memref<32xf32>) { %c0 = constant 0 : index %cst_0 = constant 0.000000e+00 : f32 affine.for %d = 0 to 32 { %lhs = affine.load %in0[%d] : memref<32xf32> %rhs = affine.load %in1[%d] : memref<32xf32> %add = addf %lhs, %rhs : f32 affine.store %add, %in0[%d] : memref<32xf32> } store %cst_0, %in0[%c0] : memref<32xf32> affine.for %d = 0 to 32 { %lhs = affine.load %in0[%d] : memref<32xf32> %rhs = affine.load %in1[%d] : memref<32xf32> %add = addf %lhs, %rhs: f32 affine.store %add, %in0[%d] : memref<32xf32> } return }
call --affine-loop-fusion, we got an incorrect output:
mlir func @should_not_fuse_since_top_level_non_affine_non_result_users(%arg0: memref<32xf32>, %arg1: memref<32xf32>) { %c0 = constant 0 : index %cst = constant 0.000000e+00 : f32 store %cst, %arg0[%c0] : memref<32xf32> affine.for %arg2 = 0 to 32 { %0 = affine.load %arg0[%arg2] : memref<32xf32> %1 = affine.load %arg1[%arg2] : memref<32xf32> %2 = addf %0, %1 : f32 affine.store %2, %arg0[%arg2] : memref<32xf32> %3 = affine.load %arg0[%arg2] : memref<32xf32> %4 = affine.load %arg1[%arg2] : memref<32xf32> %5 = addf %3, %4 : f32 affine.store %5, %arg0[%arg2] : memref<32xf32> } return }
This happened because when analyzing the source and destination nodes,
affine loop fusion ignored non-result ops sandwitched between them. In
other words, the MemRefDependencyGraph in the affine loop fusion ignored
these non-result ops.
This patch solves the issue by adding these non-result ops to the
MemRefDependencyGraph.
There is nothing special about zero result ops for what you are addressing. An op with results can also have memory write side effects. The case right above op.getNumResults() > 0 && !op.use_empty() will miss it if the results don't have uses, i.e., you could have an op with memory write side effects, one or more results, and with those results being unused: such an op can't be ignored.
What you need is: