Page MenuHomePhabricator

[MLIR] Fix op folding to not run pre-replace when not constant folding
ClosedPublic

Authored by bondhugula on Mon, Mar 16, 10:29 PM.

Details

Summary

OperationFolder::tryToFold was running the pre-replacement action even
when there was no constant folding, i.e., when the operation was just
being updated in place but was not going to be replaced. This led to
nested ops being unnecessarily removed from the worklist and only being
processed in the next outer iteration of the greedy pattern rewriter,
which is also why this didn't affect the final output IR but only the
convergence rate. It also led to an op's results' users to be
unnecessarily added to the worklist.

Signed-off-by: Uday Bondhugula <uday@polymagelabs.com>

Diff Detail

Event Timeline

bondhugula created this revision.Mon, Mar 16, 10:29 PM
rriddle added inline comments.Mon, Mar 16, 11:16 PM
mlir/lib/Transforms/Utils/FoldUtils.cpp
92–98

Hmmm, can we instead add a flag to the callback detailing if the op was just updated in-place? I'd say for canonicalization we still want to reconsider the original operands and the users of the operation even when it gets updated.

bondhugula marked an inline comment as done.Tue, Mar 17, 1:40 AM
bondhugula added inline comments.
mlir/lib/Transforms/Utils/FoldUtils.cpp
92–98

There is a tradeoff to make here. If the op gets updated, but doesn't really fold to a constant: would you really want to add its users and operands to the worklist? Although some of the original operands might have been removed and will have a lower use count, we won't be sure if that's true and for how many (because we won't know in what way the op got updated). So you risk adding more and more wasteful worklist items for that iteration. If those operand use values end up with say 0 uses, you'd anyway catch them later - either in that iteration or the next iteration since 'changed' would be set to true.

In striking contrast, in the case of constant folding, you definitely know that all original operands are wiped out, and so it makes perfect sense to add all of them to the worklist as well as add users of its results to the worklist, which the existing code does. But there is something to think about for the cases where there isn't true folding (aka constand folding). As a concrete example, consider forOp bound canonicalization: here, dropping unused bound operands is being done as part of tryToFold and it returns true when that happens. Now, we don't need to add all of the original operands back to the worklist - they'll be caught in the worklist subsequently and a good subset of them are probably still on the operand list after the folding, which means it's wasted work for their pattern matches.

Similarly, adding the users of a result when you don't do a true fold but just an update might make things slower: the situations where the user is affected because the defining op has been updated (as opposed to folded) are even less common(?) (compared to the operand case). So you risk creating 0 work pattern matches in that iteration. And since we set 'changed' to already true, all of these ops would anyway get into the worklist again.

Let me know what you think - I wish we had some precise timing tests on a good mix of heavy use cases. But there aren't many ops where fold does an in-place update - so either choice may potentially have little impact.

rriddle added inline comments.Tue, Mar 17, 4:14 PM
mlir/lib/Transforms/Utils/FoldUtils.cpp
92–98

If those operand use values end up with say 0 uses, you'd anyway catch them later - either in that iteration or the next iteration since 'changed' would be set to true.

An extra iteration is always going to be more expensive. We should be trying to limit the number of iterations, each one involves adding *every* op to the worklist as well as additional(likely unnecessary in these cases) region simplifications.

Now, we don't need to add all of the original operands back to the worklist

You could try and say the same for every op, there is nothing special here. The number of operations that fold in-place is extremely small, and almost all of the ones I know of involve removing operands.

Similarly, adding the users of a result when you don't do a true fold but just an update:

I don't view this one similarly. I could see this being avoided for plain in-place updates, but this is a stark contrast from the common case of in-place folds.

rriddle accepted this revision.Tue, Mar 17, 4:18 PM
This revision is now accepted and ready to land.Tue, Mar 17, 4:18 PM
rriddle added inline comments.Tue, Mar 17, 4:20 PM
mlir/lib/Transforms/Utils/FoldUtils.cpp
92–98

Though, given that there are so few cases of in-place updates ATM I'm fine with punting until we have a case with unnecessary additional iterations.

bondhugula marked an inline comment as done.Tue, Mar 17, 6:36 PM
bondhugula added inline comments.
mlir/lib/Transforms/Utils/FoldUtils.cpp
92–98

An extra iteration is always going to be more expensive. We should
be trying to limit the number of iterations, each one involves adding

Yes, but that extra iteration is anyway happening (irrespective of this change) because we are setting 'changed' to true with in-place updates as well! Did you mean to return failure() instead of success() in the case of in-place updates? Then, the extra iteration won't happen (but that won't be right).

You could try and say the same for every op, there is nothing
special here. The number of operations that fold in-place is

In the case of constant folds, the use count of each value involved in the operands reduces by exactly one - this may not happen for in-place updates and even if it happens, it may not for all its operands. Second, in the case of constant folds, the chances of users needing an update is much higher than in the case of users with in-place updates. So, the two scenarios are quite different. Again, w/ or w/o this patch, note that in both cases, we are setting 'changed' to true and so the extra iteration (which adds every op) will happen.

Let me know if I'm missing something here.

This revision was automatically updated to reflect the committed changes.