This is an archive of the discontinued LLVM Phabricator instance.

[MLIR][SCF] Inline single block ExecuteRegionOp
ClosedPublic

Authored by wsmoses on Jun 24 2021, 10:03 AM.

Details

Summary

This commit adds a canonicalization pass which inlines any single block execute region

Diff Detail

Event Timeline

wsmoses created this revision.Jun 24 2021, 10:03 AM
wsmoses requested review of this revision.Jun 24 2021, 10:03 AM
Herald added a project: Restricted Project. · View Herald TranscriptJun 24 2021, 10:03 AM
wsmoses updated this revision to Diff 354290.Jun 24 2021, 10:08 AM

Fix tests

ftynse accepted this revision.Jun 24 2021, 10:13 AM
ftynse added inline comments.
mlir/lib/Dialect/SCF/SCF.cpp
134

one op -> one block

This revision is now accepted and ready to land.Jun 24 2021, 10:13 AM
This revision was landed with ongoing or failed builds.Jun 24 2021, 10:17 AM
This revision was automatically updated to reflect the committed changes.
kiranchandramohan added inline comments.
mlir/include/mlir/Dialect/SCF/SCFOps.td
118

Does this patch implement this TODO? If so can it be removed? Also, the suggestion here seems to be a fold.

mlir/lib/Dialect/SCF/SCF.cpp
146

Nit: "test.bar"(%x)

I think the description of the operation should be updated, this canonicalization isn't obvious to me.

bondhugula added inline comments.
mlir/lib/Dialect/SCF/SCF.cpp
77

blockArgs isn't documented.

134

Use triple /// comments.

153

This will take O(N) time when compared to

!llvm::hasSingleElement(op.region())

which will be O(1).

I think the description of the operation should be updated, this canonicalization isn't obvious to me.

I think this is a pretty obvious canonicalization in mind from the start:

From https://llvm.discourse.group/t/introduce-std-inlined-call-op-proposal/282
...
5. As examples, abstractions like affine grayboxes, lambdas with implicit captures (or even explicit when possible) could be lowered to this without first lowering out structured loops/ifs or outlining. But the initial use case is the “all implicit capture” one. In particular, an affine.graybox with > 1 block in its region is nicely lowered to such an std.inlined_call (those with 1 block can be readily inlined as is).

Why does the op description need an update?

I think the description of the operation should be updated, this canonicalization isn't obvious to me.

I think this is a pretty obvious canonicalization in mind from the start:

From https://llvm.discourse.group/t/introduce-std-inlined-call-op-proposal/282
...
5. As examples, abstractions like affine grayboxes, lambdas with implicit captures (or even explicit when possible) could be lowered to this without first lowering out structured loops/ifs or outlining. But the initial use case is the “all implicit capture” one. In particular, an affine.graybox with > 1 block in its region is nicely lowered to such an std.inlined_call (those with 1 block can be readily inlined as is).

Why does the op description need an update?

Even from your description, this isn't an obvious canonicalization to me. There is a loss of information in the IR structure here, and for example extra attributes are lost.
The first example in the description is misleading to me when it gets canonicalized away entirely!

scf.for %i = 0 to 128 step %c1 {
  %y = scf.execute_region -> i32 {
    %x = load %A[%i] : memref<128xi32>
    scf.yield %x : i32
  }
}

I think the description of the operation should be updated, this canonicalization isn't obvious to me.

I think this is a pretty obvious canonicalization in mind from the start:

From https://llvm.discourse.group/t/introduce-std-inlined-call-op-proposal/282
...
5. As examples, abstractions like affine grayboxes, lambdas with implicit captures (or even explicit when possible) could be lowered to this without first lowering out structured loops/ifs or outlining. But the initial use case is the “all implicit capture” one. In particular, an affine.graybox with > 1 block in its region is nicely lowered to such an std.inlined_call (those with 1 block can be readily inlined as is).

Why does the op description need an update?

Even from your description, this isn't an obvious canonicalization to me. There is a loss of information in the IR structure here, and for example extra attributes are lost.

I didn't understand. Did you mean execute_region's attributes? But the op has no (intrinsic) attributes. I'm also missing what IR structure we are losing - it's executing a region exactly once and we are inlining it here.

The first example in the description is misleading to me when it gets canonicalized away entirely!

scf.for %i = 0 to 128 step %c1 {
  %y = scf.execute_region -> i32 {
    %x = load %A[%i] : memref<128xi32>
    scf.yield %x : i32
  }
}

Since %y is unused, after inlining, it should all be dead.

mehdi_amini added a comment.EditedJun 28 2021, 3:39 PM

I think the description of the operation should be updated, this canonicalization isn't obvious to me.

I think this is a pretty obvious canonicalization in mind from the start:

From https://llvm.discourse.group/t/introduce-std-inlined-call-op-proposal/282
...
5. As examples, abstractions like affine grayboxes, lambdas with implicit captures (or even explicit when possible) could be lowered to this without first lowering out structured loops/ifs or outlining. But the initial use case is the “all implicit capture” one. In particular, an affine.graybox with > 1 block in its region is nicely lowered to such an std.inlined_call (those with 1 block can be readily inlined as is).

Why does the op description need an update?

Even from your description, this isn't an obvious canonicalization to me. There is a loss of information in the IR structure here, and for example extra attributes are lost.

I didn't understand. Did you mean execute_region's attributes?

Yes.

But the op has no (intrinsic) attributes.

Intrinsic is key here.

I'm also missing what IR structure we are losing - it's executing a region exactly once and we are inlining it here.

Yes: so it fits as part of an "inlining" transformation more than a canonicalization.

Other kind of structure lost here is that this operation likely could have an AutomaticMemoryAllocation scope and could be used for this purpose (I thought it was part of the original intent as well!), "inlining" it wouldn't preserve this.

The first example in the description is misleading to me when it gets canonicalized away entirely!

scf.for %i = 0 to 128 step %c1 {
  %y = scf.execute_region -> i32 {
    %x = load %A[%i] : memref<128xi32>
    scf.yield %x : i32
  }
}

Since %y is unused, after inlining, it should all be dead.

This isn't the point: we're documenting a construct outside of its intended use. I am asking for the documentation to be as straight as possible here.
For example in https://reviews.llvm.org/D104960 the description is "The executeregionop is used to allow multiple blocks within SCF constructs"; if this is the main reason this op exists, then it seems to me like this should be the first line in the description!
Also the scf.if/scf.for description could refer to this operation in description of the multi-block support.

I'm also missing what IR structure we are losing - it's executing a region exactly once and we are inlining it here.

Yes: so it fits as part of an "inlining" transformation more than a canonicalization.

Other kind of structure lost here is that this operation likely could have an AutomaticMemoryAllocation scope and could be used for this purpose (I thought it was part of the original intent as well!), "inlining" it wouldn't preserve this.

It could, but it currently doesn't have such a trait! :-) This op doesn't have any special traits or attributes intrinsic to its definition. If and when someone adds it, they do have to worry about all the places the op is being touched - but that's true for any op. On that note, if a function is being inlined, we have to worry about the allocas in that function.

The first example in the description is misleading to me when it gets canonicalized away entirely!

scf.for %i = 0 to 128 step %c1 {
  %y = scf.execute_region -> i32 {
    %x = load %A[%i] : memref<128xi32>
    scf.yield %x : i32
  }
}

Since %y is unused, after inlining, it should all be dead.

This isn't the point: we're documenting a construct outside of its intended use. I am asking for the documentation to be as straight as possible here.
For example in https://reviews.llvm.org/D104960 the description is "The executeregionop is used to allow multiple blocks within SCF constructs"; if this is the main reason this op exists, then it seems to me like this should be the first line in the description!
Also the scf.if/scf.for description could refer to this operation in description of the multi-block support.

Sure - all of this sounds good.

Right the fact that the documentation mentions " it allows representation of inlined calls ..." was misleading to me as well since some traits are missing. But I saw the absence of the traits as an overlook until this revision.

@wsmoses can you update the documentation?

@wsmoses can you update the documentation?

Sure, how does this sound:

The `execute_region` operation is used to allow multiple blocks within SCF
and other operations which can hold only one block.  The `execute_region`
operation executes the region held exactly once and cannot have any operands.
As such, its region has no arguments. All SSA values that dominate the op can
be accessed inside the op. The op's region can have multiple blocks and the
blocks can have multiple distinct terminators. Balues returned from this op's
region define the op's results.  This makes `execute_region` a good candidate
for circumstances that require control flow encapsulation and isolation such as
when inlining a call with contain multiple blocks.

If this sounds good to everyone should I open a new PR or just commit the updated documentation?

@wsmoses can you update the documentation?

Sure, how does this sound:

The `execute_region` operation is used to allow multiple blocks within SCF
and other operations which can hold only one block.  The `execute_region`
operation executes the region held exactly once and cannot have any operands.
As such, its region has no arguments. All SSA values that dominate the op can
be accessed inside the op. The op's region can have multiple blocks and the
blocks can have multiple distinct terminators. Balues returned from this op's

Balues->Values

region define the op's results.  This makes `execute_region` a good candidate
for circumstances that require control flow encapsulation and isolation such as
when inlining a call with contain multiple blocks.

I'm not sure about when inlining a call with contain multiple blocks: there is the problem of alloca for example.

If this sounds good to everyone should I open a new PR or just commit the updated documentation?

Just push it :)
(you can mention this revision in the commit message for reference)