This commit adds a canonicalization pass which inlines any single block execute region
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
mlir/lib/Dialect/SCF/SCF.cpp | ||
---|---|---|
134 | one op -> one block |
I think the description of the operation should be updated, this canonicalization isn't obvious to me.
I think the description of the operation should be updated, this canonicalization isn't obvious to me.
I think this is a pretty obvious canonicalization in mind from the start:
From https://llvm.discourse.group/t/introduce-std-inlined-call-op-proposal/282 ... 5. As examples, abstractions like affine grayboxes, lambdas with implicit captures (or even explicit when possible) could be lowered to this without first lowering out structured loops/ifs or outlining. But the initial use case is the “all implicit capture” one. In particular, an affine.graybox with > 1 block in its region is nicely lowered to such an std.inlined_call (those with 1 block can be readily inlined as is).
Why does the op description need an update?
Even from your description, this isn't an obvious canonicalization to me. There is a loss of information in the IR structure here, and for example extra attributes are lost.
The first example in the description is misleading to me when it gets canonicalized away entirely!
scf.for %i = 0 to 128 step %c1 { %y = scf.execute_region -> i32 { %x = load %A[%i] : memref<128xi32> scf.yield %x : i32 } }
I didn't understand. Did you mean execute_region's attributes? But the op has no (intrinsic) attributes. I'm also missing what IR structure we are losing - it's executing a region exactly once and we are inlining it here.
The first example in the description is misleading to me when it gets canonicalized away entirely!
scf.for %i = 0 to 128 step %c1 { %y = scf.execute_region -> i32 { %x = load %A[%i] : memref<128xi32> scf.yield %x : i32 } }
Since %y is unused, after inlining, it should all be dead.
Yes.
But the op has no (intrinsic) attributes.
Intrinsic is key here.
I'm also missing what IR structure we are losing - it's executing a region exactly once and we are inlining it here.
Yes: so it fits as part of an "inlining" transformation more than a canonicalization.
Other kind of structure lost here is that this operation likely could have an AutomaticMemoryAllocation scope and could be used for this purpose (I thought it was part of the original intent as well!), "inlining" it wouldn't preserve this.
The first example in the description is misleading to me when it gets canonicalized away entirely!
scf.for %i = 0 to 128 step %c1 { %y = scf.execute_region -> i32 { %x = load %A[%i] : memref<128xi32> scf.yield %x : i32 } }Since %y is unused, after inlining, it should all be dead.
This isn't the point: we're documenting a construct outside of its intended use. I am asking for the documentation to be as straight as possible here.
For example in https://reviews.llvm.org/D104960 the description is "The executeregionop is used to allow multiple blocks within SCF constructs"; if this is the main reason this op exists, then it seems to me like this should be the first line in the description!
Also the scf.if/scf.for description could refer to this operation in description of the multi-block support.
I'm also missing what IR structure we are losing - it's executing a region exactly once and we are inlining it here.
Yes: so it fits as part of an "inlining" transformation more than a canonicalization.
Other kind of structure lost here is that this operation likely could have an AutomaticMemoryAllocation scope and could be used for this purpose (I thought it was part of the original intent as well!), "inlining" it wouldn't preserve this.
It could, but it currently doesn't have such a trait! :-) This op doesn't have any special traits or attributes intrinsic to its definition. If and when someone adds it, they do have to worry about all the places the op is being touched - but that's true for any op. On that note, if a function is being inlined, we have to worry about the allocas in that function.
The first example in the description is misleading to me when it gets canonicalized away entirely!
scf.for %i = 0 to 128 step %c1 { %y = scf.execute_region -> i32 { %x = load %A[%i] : memref<128xi32> scf.yield %x : i32 } }Since %y is unused, after inlining, it should all be dead.
This isn't the point: we're documenting a construct outside of its intended use. I am asking for the documentation to be as straight as possible here.
For example in https://reviews.llvm.org/D104960 the description is "The executeregionop is used to allow multiple blocks within SCF constructs"; if this is the main reason this op exists, then it seems to me like this should be the first line in the description!
Also the scf.if/scf.for description could refer to this operation in description of the multi-block support.
Sure - all of this sounds good.
Right the fact that the documentation mentions " it allows representation of inlined calls ..." was misleading to me as well since some traits are missing. But I saw the absence of the traits as an overlook until this revision.
@wsmoses can you update the documentation?
@wsmoses can you update the documentation?
Sure, how does this sound:
The `execute_region` operation is used to allow multiple blocks within SCF and other operations which can hold only one block. The `execute_region` operation executes the region held exactly once and cannot have any operands. As such, its region has no arguments. All SSA values that dominate the op can be accessed inside the op. The op's region can have multiple blocks and the blocks can have multiple distinct terminators. Balues returned from this op's region define the op's results. This makes `execute_region` a good candidate for circumstances that require control flow encapsulation and isolation such as when inlining a call with contain multiple blocks.
If this sounds good to everyone should I open a new PR or just commit the updated documentation?
Balues->Values
region define the op's results. This makes `execute_region` a good candidate for circumstances that require control flow encapsulation and isolation such as when inlining a call with contain multiple blocks.
I'm not sure about when inlining a call with contain multiple blocks: there is the problem of alloca for example.
If this sounds good to everyone should I open a new PR or just commit the updated documentation?
Just push it :)
(you can mention this revision in the commit message for reference)
Does this patch implement this TODO? If so can it be removed? Also, the suggestion here seems to be a fold.