This patch adds a new transform operation transform.loop.fuse_sibling,
which given two loops, fuses them, assuming that they are independent.
The transform operation itself performs very basic checks to ensure
IR legality, and leaves the responsibility of ensuring independence on the user.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
mlir/include/mlir/Dialect/SCF/TransformOps/SCFTransformOps.td | ||
---|---|---|
319–320 | Mention which kind of "loop" ops are supported. Also mention that the bounds must match and that a silenceable failure is produced otherwise. Also mention that both input handles are mapped to exactly one op and what happens if they are not. | |
324–325 | Add the following: "This operation consumes the loop and sibling handles and produces the fused_loop handle." | |
328–329 | I would rename these to indicate which loop is fused into which one. Maybe something like target and source. | |
mlir/lib/Dialect/SCF/TransformOps/SCFTransformOps.cpp | ||
377–379 | What happens if the result of one loop is used as an "shared_outs" operand of the other loop? Also, could there be issues where the resulting IR violates dominance because some operands of ops inside the loop are not defined after the other loop. I think it should be possible to write a relatively simple check with getUsedValuesDefinedAbove. | |
382–383 | I would make this and the isOpSibiling check a silenceable error. When there is an obvious error like "op type does not match", indicating there's something inherently wrong with the transform dialect script, I usually make it a definite error. | |
mlir/lib/Dialect/SCF/Utils/Utils.cpp | ||
974–975 | Can you give this function a RewriterBase & and use that rewriter to modify the IR. That would make it easier to reuse this helper function. | |
1025–1033 | These should then also go through the rewriter. |
mlir/include/mlir/Dialect/SCF/TransformOps/SCFTransformOps.td | ||
---|---|---|
320 | The legality of loop fusion is a tricky topic, we can't just approximate with "neither is an ancestor of another". You should at least restrict the transform to only work in the case of loops with no recursive side effects (i.e. on tensors only and without function calls like print). |
mlir/include/mlir/Dialect/SCF/TransformOps/SCFTransformOps.td | ||
---|---|---|
319–320 | Can this description be more extensive? It should mention that only forall loops can be fused, and which preconditions should hold for the loops. Also mention that there is no real legality check performed (see below). | |
mlir/lib/Dialect/SCF/TransformOps/SCFTransformOps.cpp | ||
331–332 | The bounds being the same is not a sufficient criterion for the fusion to be legal. Consider scf.forall %i in %bounds store %i, %mem[%i] scf.forall %i in %bounds load %mem[%bounds - %i] that would trivially be incorrect if two loops are fused. I'm not necessary opposed to having a "dangerous" transform exposed as an op, but it must be very clearly documented as such. | |
335–340 | isa followed by cast, even at a distance, is an anti-pattern. Call dyn_cast and check the resulting variables for being non-null instead. | |
369–372 | This should be documented in the op description. | |
379 | Nit: don't start error messages with a capital letter. | |
389 | Nit: it's possible to construct an ArrayRef temporary from a single element, no need to use a vector here. | |
mlir/lib/Dialect/SCF/Utils/Utils.cpp | ||
977 | Pass in RewriterBase & as an argument to this function and use the rewriter available in the apply method. | |
980 | Nit: expand auto unless the type is obvious from context (e.g., there's a cast on the RHS). | |
982 | Nit: don't specify the number of vector stack elements unless there is a very good reason to. | |
983–985 | Something like llvm::to_vector(llvm::concat(range1, range2)) would probably work here. | |
1025–1033 | This must use proper rewriting API, e.g. replaceOp and eraseOp. Low-level Operation API will result to catastrophic memory problems when used with proper rewriters. |
mlir/lib/Dialect/SCF/TransformOps/SCFTransformOps.cpp | ||
---|---|---|
327–329 | This will accept loops from, e.g., different functions. A better, and faster, condition is to check that both operations belong to the same block. Also, depending on the understanding of "sibling", this may accept as siblings two loops that have other operations (including other loops!) between them, which may not be legal. |
- Address comments
mlir/include/mlir/Dialect/SCF/TransformOps/SCFTransformOps.td | ||
---|---|---|
320 | Right. I don't really want to make this operation check for the legality of loop fusion other than something that may produce invalid IR as that is not easy, but I still want this operation to be useable in those cases. I have changed the description to reflect this better. | |
mlir/lib/Dialect/SCF/TransformOps/SCFTransformOps.cpp | ||
327–329 | I updated the documentation to describe what this operation is meant to do better. | |
331–332 | Added documentation to make it clear that it is the responsibility of the user to consider these. | |
377–379 | If the shared_outs use happens, then it's a dependence. This is what I wanted to check with this operation but wrongly assumed that isAncestor does this. What I want to check is if any result produced by the first loop is used by the second loop. For the dominance problem, I added a check that all values used by target must be defined before source. | |
mlir/lib/Dialect/SCF/Utils/Utils.cpp | ||
983–985 | For some reason, to_vector refuses to work with this. Tried: llvm::to_vector(llvm::concat<Value>(range1, range2)) But that fails with: error: taking the address of a temporary object of type 'mlir::Value' [-Waddress-of-temporary] Using the old implementation for now. |
mlir/include/mlir/Dialect/SCF/TransformOps/SCFTransformOps.td | ||
---|---|---|
316 | Nit: I'd just say "assuming the fusion is legal". Consumer/producer fusion is also fine here IMO. | |
326 | s/matching/mapping | |
mlir/include/mlir/Dialect/SCF/Utils/Utils.h | ||
189–190 | Nit: in my understanding, "siblings" doesn't imply independence and vice versa, these are two separate conditions that must be met. Two adjacent loops can depend on each other, e.g. in producer/consumer way or have the first loop compute the trip count of the second loop. | |
mlir/lib/Dialect/SCF/TransformOps/SCFTransformOps.cpp | ||
328 | ||
335 | ||
340 | Not all operands necessarily have a defining op, they may be block arguments, at which point this will be dereferencing a null pointer and likely crash. Furthermore, I suspect this condition one of the conditions here is either incorrect or unnecessary. I'm confused by the use of "source" and "target" here, so I'll just my own example: %results = scf.forall { // first_loop } // something in-between scf.forall %operands { // second_loop } If the body of first_loop is merged into second_loop, then the condition that has to be ensured is that all uses of %results are after second_loop. Since second_loop already follows first_loop, all operands of first_loop are already visible at second_loop if the input IR is valid, no need to check for that. Also note that dominance is more complex than just order of operations in blocks. Operands may be defined in surrounding regions, or in dominating blocks that transfer control to the given block. Results can be used in nested regions, or dominated blocks to which the control is transferred. It is possible to assume (and actually check) that the current block belongs to a single-block region, which is the most common case in presence of structured control flow. When done carefully, this entire test will conservatively return failure. However, I am encouraging you to consider at least nested regions, otherwise the utility of this transform is quite limited. | |
358 | Nit: I would rename this function to make it better correspond to what it actually does, e.g., isForallWithIdenticalConfiguration. Otherwise somebody will eventually try using it for actual fusion legality tests. | |
359–360 | Nit: consider https://llvm.org/docs/CodingStandards.html#use-early-exits-and-continue-to-simplify-code. Here and below. | |
391 | Nit: please follow https://llvm.org/docs/CodingStandards.html#don-t-use-braces-on-simple-single-statement-bodies-of-if-else-loop-statements here and below. | |
403 | I may sound obsessed with the term, but "not siblings" doesn't describe the actual check that has taken place, which also includes dominance checks. Couldn't you just return DiagnosedSilenceableFailure from the helper function and put a more descriptive error message there? |
mlir/include/mlir/Dialect/SCF/Utils/Utils.h | ||
---|---|---|
189–190 | My understanding of siblings is that they are two loops that do not behave in a producer/consumer fashion. A result/side-effect produced by the first loop, should not be consumed by the other loop. Isn't this the same as being independent? Or am I missing something? I'm happy to change the name if you feel this is not the right word. | |
mlir/lib/Dialect/SCF/TransformOps/SCFTransformOps.cpp | ||
340 | Regarding the unnecessary conditions, I now check the ordering and only do the required checks. Regarding dominance, I switched to using DominanceInfo, will that be enough? Also, I added some more test cases which check for dominance failures. While adding these tests, I also noticed that not only do the operands need to be checked, but we also need to check if any value used by the second_loop should be visible by the first loop. So, I added another check for this. |
mlir/include/mlir/Dialect/SCF/Utils/Utils.h | ||
---|---|---|
189–190 | In my understanding, sibling operations are merely operations that have a common parent. Nothing more. This is a natural extension of the parent/child terminology used extensively for operations nested in regions of other operations: two "child" operations that have a common parent are siblings. It doesn't imply anything about data flow between them. In non-polyhedral loop transformation terminology, "sibling fusion" is sometimes used in opposition to "producer/consumer fusion", but this is a very niche usage compared to the basic IR structure usage that is more pervasive. I could live with something like doIndependentSiblingFusion so it is clearer that "sibling" qualifies the kind of fusion and not the relation between operations, but a more verbose and less ambiguous name is welcome. |
mlir/include/mlir/Dialect/SCF/Utils/Utils.h | ||
---|---|---|
189–190 | Changed it to fuseIndependentSiblingForallLoops. |
Nit: I'd just say "assuming the fusion is legal". Consumer/producer fusion is also fine here IMO.