This is an archive of the discontinued LLVM Phabricator instance.

[mlir][scf] Add a pass to expand scf.if regions
Needs RevisionPublic

Authored by antiagainst on Jan 11 2022, 5:51 AM.

Details

Summary

This pass expands the scf.if op regions by pulling
in ops before and after the scf.if op into both
regions. This is an useful transformation to put
ops within the same region so that we can optimze
together better, e.g., when we have a fast-slow
path in the same kernel we'd want to optimize
each path separately.

Depends On D117018

Diff Detail

Event Timeline

antiagainst created this revision.Jan 11 2022, 5:51 AM
antiagainst requested review of this revision.Jan 11 2022, 5:51 AM

I wonder if there isn't a name in the literature for this transformation?

Also I would think that a canonicalization could undo this transformation right?

mlir/include/mlir/Dialect/SCF/Passes.td
127 ↗(On Diff #398928)

This seems like an unnecessary restriction to me: can't you just process operation in the same block as the scf.if without further restrictions?

I wonder if there isn't a name in the literature for this transformation?

Good question! I didn't find one. Was tempted to call it sinking, but that only pulls in ops before the region I think..

Also I would think that a canonicalization could undo this transformation right?

Canonicalization typically won't move ops across region boundaries. (Constant is an exception though.) But yes generally the application of this pattern should be controlled precisely w.r.t. other passes, like CSE, which may reverse some of the changes.

mlir/include/mlir/Dialect/SCF/Passes.td
127 ↗(On Diff #398928)

Good point. Though I'm not sure how likely we are gonna see the case where we have multiple basic blocks containing scf.if ops. To me scf.if ops are structured control flow, while basic blocks are normal control flow; they typically don't operate at the same level. But can certainly drop if there is such a case and need though.

mlir/include/mlir/Dialect/SCF/Passes.h
64 ↗(On Diff #406202)

As a transformation and a pattern this makes sense to me.
But I would turn this into a test pass: this is unlikely to grow standalone heuristics and control beyond: just apply greedily.

mlir/lib/Dialect/SCF/Transforms/IfRegionExpansion.cpp
63

typo: no

155

This is unnecessary and an effect of running with a greedy rewriter.

As we move away from this towards better compositions of patterns, this goes against the flow.
How about having a test pass that just walks the scf.if ops in your IR and applies the pattern (e.g. with applyOpPatternsAndFold).

antiagainst marked 3 inline comments as done.

Address comments

mlir/lib/Dialect/SCF/Transforms/IfRegionExpansion.cpp
155

SG! Done.

mlir/lib/Dialect/SCF/Transforms/IfRegionExpansion.cpp
62

Can you explain what DCE has to do here?
I am unclear why you want to limit prevOps to all being noEffect; copying to all branches should be ok no?

Also, why only prevOps and no nextOps ?

72

I don't get this, are you assuming that all ops that follow scf.if in the block will get pulled in and that will be the final type?

First, I am not sure why nextOps let you do that while prevOps don't.

Second, you're also doing this in the context of GPU right?
If so, what about divergent control flow?
Such ops shouldn't be pulled in IIUC?

84

What happens with all the backward slice involved in computing the condition?
Your code and tests seem to assume this is always a bbarg of an enclosing block, which sounds super restrictive.
In any case, this looks like a bug to me atm.

103

This looks fishy, should this be newYieldOp ?
Otherwise you're plugging in old values that would not exist in the cloned code.

antiagainst marked 4 inline comments as done.

Address comments

mlir/lib/Dialect/SCF/Transforms/IfRegionExpansion.cpp
62

The original idea of requiring no side effecting for all prevOps is that we can then free to clone them into both regions without problem. And we can leave the original prevOps just there---given they are not side effecting and after cloning and rewrite users, they will be dead code. So DCE can kick in later to remove them.

Yes indeed copying them to both regions are still fine. So I changed the impl. to not do side effecting analysis here. Instead, we just move them all ops to both regions, as long as they can be moved (according to the filter function) and does not cause invalid IR def-use relationship.

72

Yes this is assuming all ops following the scf.if can be pulled into both regions. Therefore, values escaping the new scf.if regions are can only be used by the original terminator.

The planned use for this pattern is early in the CodeGen flow with tensor level abstractions. So such assumption is fine. It help to simplify the impl. Otherwise we need to analyze the ops following the scf.if op and see what values are now generated from the new scf.if regions and also yield those. I don't have a concrete use case here; it's better to support that when needed.

Good point regarding divergent control flow. For the case at my hand it's not an issue because tensor level abstractions and value semantics. But indeed, if it's used at a lower level, it can be problematic to also pull in ops like barriers. To address this, I introduced a filter function to the pattern to check whether it's possible to pull in ops before/after into both regions. This gives pattern users control. The filter function will be used to decide with prev/next ops to not pull in with the backward/forward slices.

84

The original idea was to require all prevOps to have no side effects, including the backward slice for the scf.if condition. So we can feel free to clone them into scf.if regions, but they will be dead code if not used in scf.if regions and will be removed. The previous impl. also don't explicitly remove prevOps and let DCE did that, so backward slice of the scf.if condition will be kept as-is.

I changed the impl. to explicitly filter the backward slice for the scf.if condition now to make it clear. Also added a test where we have multiple ops computing the scf.if condition.

103

oldYieldOp is intended here. Thew newYieldOp is not created until at the end of this pullIntoBlock lambda. We clone the old block's operations *without* the old yield op when there are nextOps to pull into both regions. The newYieldOp will only be created after pulling in all nextOps too.

nicolasvasilache requested changes to this revision.Mar 21 2022, 3:41 AM
nicolasvasilache added inline comments.
mlir/lib/Dialect/SCF/Transforms/IfRegionExpansion.cpp
38

This is much too large, it should be broken down with the use of a bunch of helper functions.

42

Logically, all checks that operate only on the ifOp should be pulled out of this helper function and become early exits in the matchAndRewrite method.

48

Hoist to match and rewrite.

52

Hoist to match and rewrite and pass ops as argument to helper function.

56

Hoist to match and rewrite.

62

sg!

65

Hoist to match and rewrite.

115

There should be much more idiomatic constructs in llvm, in either SetOperations.h or SetVector.h to do intersection, union, difference.

134

It seems fishy that the terminator of the op enclosing the ifOp is involved in the logic.
Can you explain with doc comments and pseudo-IR ?

179

It seems fishy that the terminator of the op enclosing the ifOp is involved in the logic.
Can you explain with doc comments and pseudo-IR ?

199

It seems fishy that the terminator of the op enclosing the ifOp is involved in the logic.
Can you explain with doc comments and pseudo-IR ?

This revision now requires changes to proceed.Mar 21 2022, 3:41 AM
Herald added a project: Restricted Project. · View Herald TranscriptMar 21 2022, 3:41 AM
mravishankar resigned from this revision.May 9 2022, 3:36 PM