Details
- Reviewers
herhut nicolasvasilache rriddle
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
mlir/lib/Transforms/ParallelLoopFusion.cpp | ||
---|---|---|
22 | using namespace mlir; | |
27 | Use /// for top-level comments. | |
28 | Only classes should be in anonymous namespaces, functions should be in the global namespace and marked static. https://llvm.org/docs/CodingStandards.html#anonymous-namespaces | |
30 | nit: -> WalkResult is unnecessary. | |
41 | nit: Can you just use std::equal for now? | |
57 | nit: Use LogicalResult instead of boolean. | |
76 | Drop trivial braces. | |
86 | nit: Remove this temporary value and just return the result directly. It doesn't really help readability at all. | |
118 | nit: block.getOps<ParallelOp>() | |
124 | nit: Drop all of these trivial braces. | |
137 | Drop the mlir:: on each of these. |
mlir/lib/Transforms/ParallelLoopFusion.cpp | ||
---|---|---|
138 | Can this pass be located under the loop dialect? |
mlir/lib/Transforms/ParallelLoopFusion.cpp | ||
---|---|---|
138 | @mehdi_amini Do you mean moving this code to mlir/lib/Transforms/LoopFusion.cpp where fusion on affine loops is implemented? That file is quite big already. Or do you mean having it in Dialect/LoopOps/Transforms similarly to Dialect/Linalg/Transforms? If it is the latter, then there might be several other passes in lib/Transforms that have to be moved to respective directories. |
mlir/lib/Transforms/ParallelLoopFusion.cpp | ||
---|---|---|
138 | yes there are several passes we need to move under their respective dialects. |
Cool, great start!
mlir/include/mlir/Dialect/LoopOps/Passes.h | ||
---|---|---|
1 ↗ | (On Diff #244609) | This is lacking the LLVM header comment. |
mlir/lib/Dialect/LoopOps/Transforms/ParallelLoopFusion.cpp | ||
39 ↗ | (On Diff #244609) | Avoid named TODO in llvm code base. I am an offender, as well, but I was told to avoid this in the future. |
48 ↗ | (On Diff #244609) | You also need that ploop2 does not write to buffers that ploop1 reads. Both loops are fully independent if WRITES(ploop1) U READS(ploop2) is empty and READS(ploop1) U WRITES(ploop2) is empty. As we have sequential execution order within one iteration, we can ignore reads and writes between the two loops that are on the same index in the context of fusion. |
49 ↗ | (On Diff #244609) | Can you add a comment what the map contains? This is a map from ploop1 indices to ploop2 indices I assume? |
65 ↗ | (On Diff #244609) | This is sufficient for memrefs that are defined outside of the loops. However, memrefs that are created within the loop, e.g. by a subview, will not be handled by this. As a starter, it would be ok to bail out in these cases. |
70 ↗ | (On Diff #244609) | Does something like llvm::all_of(llvm::zip(storeIndices, loadIndices), std::equal) work? Just curious, no need to go there. |
110 ↗ | (On Diff #244609) | What happens if there is a read or write inbetween the ploops? As a start, you could consider immediately adjacent loops or loops where we have no sideffecting ops inbetween. |
mlir/lib/Dialect/LoopOps/Transforms/ParallelLoopFusion.cpp | ||
---|---|---|
70 ↗ | (On Diff #244609) | should work with a custom comparator |
mlir/lib/Dialect/LoopOps/Transforms/ParallelLoopFusion.cpp | ||
---|---|---|
55 ↗ | (On Diff #245106) | Storing to a locally defined buffer also should bail. |
90 ↗ | (On Diff #245106) | This should be the dual to the above function with the same comments regarding aliasing. Without alias information, we have to bail on the case where we load or store to a locally defined buffer. |
mlir/test/Dialect/Loops/parallel-loop-fusion.mlir | ||
32 ↗ | (On Diff #245106) | Can you drop the {temp = true} from the tests. They are just noise here. |
276 ↗ | (On Diff #245106) | This would already fail due to not-matching indices. Maybe load from A instead? |
279 ↗ | (On Diff #245106) | Why is this case illegal? If they store to the same buf but at the same index it should be fine. |
291 ↗ | (On Diff #245106) | There are 4 cases for this. Read/write to local allocation in loop1/loop2. Essentially any effect on a memref we do not understand. |
mlir/lib/Dialect/LoopOps/Transforms/ParallelLoopFusion.cpp | ||
---|---|---|
48 ↗ | (On Diff #245367) | the the -> the |
142 ↗ | (On Diff #245367) | Can you extend this so it applies to any operation and traverses its regions? That way it can also be used, e.g., on a ParallelLoop to fuse loops in its body. |
172 ↗ | (On Diff #245367) | This could be a pass that runs on any Operation then, not necessarily a function. |
183 ↗ | (On Diff #245367) | This is over-committing a bit. It does not fuse loop nests. |
mlir/test/Dialect/Loops/parallel-loop-fusion.mlir | ||
261 ↗ | (On Diff #245367) | How is this different from above? |
mlir/lib/Dialect/LoopOps/Transforms/ParallelLoopFusion.cpp | ||
---|---|---|
100 ↗ | (On Diff #245139) | No, because we are using BlockAndValueMapping that has only a map from IVs of ploop1 to IVs to ploop2. We can have the inverse map though. |
mlir/test/Dialect/Loops/parallel-loop-fusion.mlir | ||
261 ↗ | (On Diff #245367) | renamed to common_buf to make it more visible. |
Are we sure we want another parallel SW stack to do fusion here?
I'd love to understand the use cases that are fundamentally different from what we'd expect to do with Linalg + Affine (with or without Intel's multi-for).
Marking as blocker to be sure my question is addressed, sorry for being late to the party as I was away last week.
I do not see a parallel SW stack. This is pretty much structural fusion on parallel loop nests, which are the bottom layer before we go to GPU. The reason this exists is that we do not have all code generation come through LinAlg and would like to combine loops at the lowest level. I do not expect for this to grow sophisticated dependency analysis and get complex. I agree that we will need a common analysis engine that can be reused for different loop-like structures. That is probably easier designed once we have more approaches implemented in full.
Regarding affine, I'd be happy to use loop fusion on affine.parallel loops, as well. It would need to support reduction, though.
Until then, I'd like to land this structural fusion to serve a current code-generation need we have.
I had a similar comment on the ploop tiling diff, @nicolasvasilache... Linalg fusion is also structural in that sense, but we need to decouple the transformation implementation from any of the dialect-specific concerns. I don't see how many abstraction layers and templates we'd have to add to reuse code across dialects and whether it will be worth the effort. I suppose somebody with deep experience in loop stuff (that is you or me) should look into refactoring this.
using namespace mlir;