This is an archive of the discontinued LLVM Phabricator instance.

[SCF][Transform] Add transform.loop.fuse_sibling
ClosedPublic

Authored by Groverkss on Aug 3 2023, 10:46 PM.

Details

Summary

This patch adds a new transform operation transform.loop.fuse_sibling,
which given two loops, fuses them, assuming that they are independent.
The transform operation itself performs very basic checks to ensure
IR legality, and leaves the responsibility of ensuring independence on the user.

Diff Detail

Event Timeline

Groverkss created this revision.Aug 3 2023, 10:46 PM
Herald added a project: Restricted Project. · View Herald TranscriptAug 3 2023, 10:46 PM
Groverkss requested review of this revision.Aug 3 2023, 10:46 PM
springerm added inline comments.Aug 4 2023, 1:05 AM
mlir/include/mlir/Dialect/SCF/TransformOps/SCFTransformOps.td
319–320

Mention which kind of "loop" ops are supported. Also mention that the bounds must match and that a silenceable failure is produced otherwise. Also mention that both input handles are mapped to exactly one op and what happens if they are not.

324–325

Add the following: "This operation consumes the loop and sibling handles and produces the fused_loop handle."

328–329

I would rename these to indicate which loop is fused into which one. Maybe something like target and source.

mlir/lib/Dialect/SCF/TransformOps/SCFTransformOps.cpp
377–379

What happens if the result of one loop is used as an "shared_outs" operand of the other loop? Also, could there be issues where the resulting IR violates dominance because some operands of ops inside the loop are not defined after the other loop. I think it should be possible to write a relatively simple check with getUsedValuesDefinedAbove.

382–383

I would make this and the isOpSibiling check a silenceable error. When there is an obvious error like "op type does not match", indicating there's something inherently wrong with the transform dialect script, I usually make it a definite error.

mlir/lib/Dialect/SCF/Utils/Utils.cpp
974–975

Can you give this function a RewriterBase & and use that rewriter to modify the IR. That would make it easier to reuse this helper function.

1025–1033

These should then also go through the rewriter.

mlir/include/mlir/Dialect/SCF/TransformOps/SCFTransformOps.td
320

The legality of loop fusion is a tricky topic, we can't just approximate with "neither is an ancestor of another".

You should at least restrict the transform to only work in the case of loops with no recursive side effects (i.e. on tensors only and without function calls like print).
And this should also be well documented.

ftynse requested changes to this revision.Aug 4 2023, 1:22 AM
ftynse added inline comments.
mlir/include/mlir/Dialect/SCF/TransformOps/SCFTransformOps.td
319–320

Can this description be more extensive? It should mention that only forall loops can be fused, and which preconditions should hold for the loops. Also mention that there is no real legality check performed (see below).

mlir/lib/Dialect/SCF/TransformOps/SCFTransformOps.cpp
331–332

The bounds being the same is not a sufficient criterion for the fusion to be legal. Consider

scf.forall %i in %bounds
  store %i, %mem[%i]

scf.forall %i in %bounds
  load %mem[%bounds - %i]

that would trivially be incorrect if two loops are fused.

I'm not necessary opposed to having a "dangerous" transform exposed as an op, but it must be very clearly documented as such.

335–340

isa followed by cast, even at a distance, is an anti-pattern. Call dyn_cast and check the resulting variables for being non-null instead.

369–372

This should be documented in the op description.

379

Nit: don't start error messages with a capital letter.

389

Nit: it's possible to construct an ArrayRef temporary from a single element, no need to use a vector here.

mlir/lib/Dialect/SCF/Utils/Utils.cpp
977

Pass in RewriterBase & as an argument to this function and use the rewriter available in the apply method.

980

Nit: expand auto unless the type is obvious from context (e.g., there's a cast on the RHS).

982

Nit: don't specify the number of vector stack elements unless there is a very good reason to.

983–985

Something like llvm::to_vector(llvm::concat(range1, range2)) would probably work here.

1025–1033

This must use proper rewriting API, e.g. replaceOp and eraseOp. Low-level Operation API will result to catastrophic memory problems when used with proper rewriters.

This revision now requires changes to proceed.Aug 4 2023, 1:22 AM
ftynse added inline comments.Aug 4 2023, 1:24 AM
mlir/lib/Dialect/SCF/TransformOps/SCFTransformOps.cpp
327–329

This will accept loops from, e.g., different functions. A better, and faster, condition is to check that both operations belong to the same block.

Also, depending on the understanding of "sibling", this may accept as siblings two loops that have other operations (including other loops!) between them, which may not be legal.

Groverkss edited the summary of this revision. (Show Details)Aug 4 2023, 7:19 AM
Groverkss updated this revision to Diff 547203.Aug 4 2023, 7:28 AM
Groverkss marked 20 inline comments as done.
  • Address comments
mlir/include/mlir/Dialect/SCF/TransformOps/SCFTransformOps.td
320

Right. I don't really want to make this operation check for the legality of loop fusion other than something that may produce invalid IR as that is not easy, but I still want this operation to be useable in those cases. I have changed the description to reflect this better.

mlir/lib/Dialect/SCF/TransformOps/SCFTransformOps.cpp
327–329

I updated the documentation to describe what this operation is meant to do better.

331–332

Added documentation to make it clear that it is the responsibility of the user to consider these.

377–379

If the shared_outs use happens, then it's a dependence. This is what I wanted to check with this operation but wrongly assumed that isAncestor does this.

What I want to check is if any result produced by the first loop is used by the second loop.

For the dominance problem, I added a check that all values used by target must be defined before source.

mlir/lib/Dialect/SCF/Utils/Utils.cpp
983–985

For some reason, to_vector refuses to work with this. Tried: llvm::to_vector(llvm::concat<Value>(range1, range2))

But that fails with:

error: taking the address of a temporary object of type 'mlir::Value' [-Waddress-of-temporary]

Using the old implementation for now.

ftynse requested changes to this revision.Aug 7 2023, 2:18 AM
ftynse added inline comments.
mlir/include/mlir/Dialect/SCF/TransformOps/SCFTransformOps.td
316

Nit: I'd just say "assuming the fusion is legal". Consumer/producer fusion is also fine here IMO.

326

s/matching/mapping

mlir/include/mlir/Dialect/SCF/Utils/Utils.h
189–190

Nit: in my understanding, "siblings" doesn't imply independence and vice versa, these are two separate conditions that must be met. Two adjacent loops can depend on each other, e.g. in producer/consumer way or have the first loop compute the trip count of the second loop.

mlir/lib/Dialect/SCF/TransformOps/SCFTransformOps.cpp
328
335
340

Not all operands necessarily have a defining op, they may be block arguments, at which point this will be dereferencing a null pointer and likely crash.

Furthermore, I suspect this condition one of the conditions here is either incorrect or unnecessary. I'm confused by the use of "source" and "target" here, so I'll just my own example:

%results = scf.forall {  // first_loop
}
// something in-between
scf.forall %operands {  // second_loop
}

If the body of first_loop is merged into second_loop, then the condition that has to be ensured is that all uses of %results are after second_loop. Since second_loop already follows first_loop, all operands of first_loop are already visible at second_loop if the input IR is valid, no need to check for that.
If the body of the second_loop is merged into the first_loop, then the condition is that operands of second_loop are visible at first_loop, but the results are already okay.

Also note that dominance is more complex than just order of operations in blocks. Operands may be defined in surrounding regions, or in dominating blocks that transfer control to the given block. Results can be used in nested regions, or dominated blocks to which the control is transferred. It is possible to assume (and actually check) that the current block belongs to a single-block region, which is the most common case in presence of structured control flow. When done carefully, this entire test will conservatively return failure. However, I am encouraging you to consider at least nested regions, otherwise the utility of this transform is quite limited.

358

Nit: I would rename this function to make it better correspond to what it actually does, e.g., isForallWithIdenticalConfiguration. Otherwise somebody will eventually try using it for actual fusion legality tests.

359–360
391
403

I may sound obsessed with the term, but "not siblings" doesn't describe the actual check that has taken place, which also includes dominance checks. Couldn't you just return DiagnosedSilenceableFailure from the helper function and put a more descriptive error message there?

This revision now requires changes to proceed.Aug 7 2023, 2:18 AM
Groverkss updated this revision to Diff 548480.Aug 8 2023, 11:51 PM
Groverkss marked 10 inline comments as done.
  • Add better dominance checking startergy, add more tests, address comments
Groverkss added inline comments.Aug 8 2023, 11:54 PM
mlir/include/mlir/Dialect/SCF/Utils/Utils.h
189–190

My understanding of siblings is that they are two loops that do not behave in a producer/consumer fashion. A result/side-effect produced by the first loop, should not be consumed by the other loop. Isn't this the same as being independent? Or am I missing something?

I'm happy to change the name if you feel this is not the right word.

mlir/lib/Dialect/SCF/TransformOps/SCFTransformOps.cpp
340

Regarding the unnecessary conditions, I now check the ordering and only do the required checks.

Regarding dominance, I switched to using DominanceInfo, will that be enough? Also, I added some more test cases which check for dominance failures.

While adding these tests, I also noticed that not only do the operands need to be checked, but we also need to check if any value used by the second_loop should be visible by the first loop. So, I added another check for this.

Groverkss updated this revision to Diff 548483.Aug 8 2023, 11:59 PM
  • Add comment to explain context of isOpSibling
ftynse accepted this revision.Aug 16 2023, 5:59 AM
ftynse added inline comments.
mlir/include/mlir/Dialect/SCF/Utils/Utils.h
189–190

In my understanding, sibling operations are merely operations that have a common parent. Nothing more. This is a natural extension of the parent/child terminology used extensively for operations nested in regions of other operations: two "child" operations that have a common parent are siblings. It doesn't imply anything about data flow between them. In non-polyhedral loop transformation terminology, "sibling fusion" is sometimes used in opposition to "producer/consumer fusion", but this is a very niche usage compared to the basic IR structure usage that is more pervasive.

I could live with something like doIndependentSiblingFusion so it is clearer that "sibling" qualifies the kind of fusion and not the relation between operations, but a more verbose and less ambiguous name is welcome.

This revision is now accepted and ready to land.Aug 16 2023, 5:59 AM
Groverkss updated this revision to Diff 551729.Aug 19 2023, 2:47 AM
Groverkss marked an inline comment as done.
  • Change method name
Groverkss added inline comments.Aug 19 2023, 2:49 AM
mlir/include/mlir/Dialect/SCF/Utils/Utils.h
189–190

Changed it to fuseIndependentSiblingForallLoops.

This revision was landed with ongoing or failed builds.Aug 19 2023, 3:03 AM
This revision was automatically updated to reflect the committed changes.