A producer loop that is a reduction into a scalar value is inserted into the consumer as a single iteration. The producer loop needs to be inserted with its original bounds to be correct.
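For illustration, a minimal sketch of the kind of input affected (this is not the patch's test case; the names, shapes, and bounds are made up): the producer accumulates into a scalar held in a memref, and the consumer reads the accumulated value.

```mlir
// Hypothetical input: the producer reduces %in into %sum (a scalar held in a
// memref<1xf32>), and the consumer reads the accumulated value.
affine.for %i = 0 to 128 {
  %v   = affine.load %in[%i] : memref<128xf32>
  %acc = affine.load %sum[0] : memref<1xf32>
  %add = arith.addf %acc, %v : f32
  affine.store %add, %sum[0] : memref<1xf32>
}
affine.for %j = 0 to 128 {
  %r = affine.load %sum[0] : memref<1xf32>
  %o = arith.mulf %r, %r : f32
  affine.store %o, %out[%j] : memref<128xf32>
}
```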
Before I go for an in-depth review, one question on the approach:
If the fusion was incorrect, was the dependence analysis at fault? If that's the case, the dependence analysis needs to be enhanced as a generic solution. I see some pattern matching being done, and that made me wonder whether matching is a robust enough solution.
@vinayaka-polymage Thanks for taking a look! Perhaps I should have worded it differently. The result of the entire fusion is what I am referring to here. While it is a valid fusion candidate, what is incorrect is not using the entire slice bounds. A similar fixing of bounds happens when there is a read-read dependency; see https://github.com/llvm/llvm-project/blob/3e6ba89055c80e6360b7605464520711b30084a6/mlir/lib/Dialect/Affine/Analysis/Utils.cpp#L1107. My change is along those lines and keeps the original bounds when the producer is a memref-based reduction. The pattern check verifies that the producer is a memref-based reduction, as suggested in https://discourse.llvm.org/t/detecting-reduction-loops-in-affine-dialect/2527.
Thanks @sumesh13 for the clarification. Sorry for the late response here; I am unavailable (traveling) until the 12th. I will check this thread after that.
I would like to help, but I'm having a hard time guessing what the wrong code looks like and what this is actually fixing. Something that could help, and that is very common in LLVM, is to pre-commit a test matching the wrong output and then introduce the fix, so that we can clearly see how the test evolves. Could you please do that? It would be very helpful.
Thanks,
Diego
@dcaballe Thanks for the suggestion. I have added the example that shows the incorrect fusion. You can see how the initialization of the reduction is merged into the reduction body because the bounds of the slice are set to 1.
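Roughly, and assuming the hypothetical input sketched above (again, not the actual test case added in the patch), the bad slice bounds produce something along these lines, with the producer reduction inserted as a single iteration rather than with its original range:

```mlir
// Hedged sketch of the incorrect fused form: the producer's bounds collapse
// to a single iteration inside the consumer, so only one element of the
// reduction is computed per consumer iteration instead of the full range.
affine.for %j = 0 to 128 {
  affine.for %i = 0 to 1 {   // wrong: should keep the original 0 to 128 bounds
    %v   = affine.load %in[%i] : memref<128xf32>
    %acc = affine.load %sum[0] : memref<1xf32>
    %add = arith.addf %acc, %v : f32
    affine.store %add, %sum[0] : memref<1xf32>
  }
  %r = affine.load %sum[0] : memref<1xf32>
  %o = arith.mulf %r, %r : f32
  affine.store %o, %out[%j] : memref<128xf32>
}
```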
@dcaballe Sorry I couldn't ping about this earlier; I was on vacation myself. Did you need anything else for the review?
I think we need feedback from @bondhugula or @vinayaka-polymage. IIRC, some of the utilities that this patch modifies are used in different places and I think they assume that reductions are represented by SSA values through iter_args. @bondhugula had some thoughts about converting reductions back and forth from SSA to memory representation based on the requirements of each analysis/transformation. I'm missing some context here.
I'm also wondering if we should use affine analysis to detect memory reductions instead of just pattern-matching the IR; that might be less error-prone.
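For context on the representation question: a reduction carried as SSA values through iter_args looks roughly like the snippet below, whereas the case this patch targets keeps the accumulator in a memref (as in the sketches above), which is why it has to be pattern-matched rather than read off the loop's results.

```mlir
// SSA form of the same reduction: the accumulator is carried via iter_args,
// so utilities that assume this form can see the reduction directly in the
// loop's region arguments and results.
%sum = affine.for %i = 0 to 128 iter_args(%acc = %init) -> (f32) {
  %v   = affine.load %in[%i] : memref<128xf32>
  %add = arith.addf %acc, %v : f32
  affine.yield %add : f32
}
```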