This is an archive of the discontinued LLVM Phabricator instance.

mlir/include/mlir/Dialect/SCF/SCFOps.td
118	Does this patch implement this TODO? If so can it be removed? Also, the suggestion here seems to be a fold.
mlir/lib/Dialect/SCF/SCF.cpp
146	Nit: "test.bar"(%x)

Harbormaster completed remote builds in B110855: Diff 354290. (Jun 24 2021, 10:40 AM)

I think the description of the operation should be updated, this canonicalization isn't obvious to me.

wsmoses mentioned this in D104960: [MLIR][SCF] Inline ExecuteRegion if parent can contain multiple blocks. (Jun 25 2021, 11:45 PM)

bondhugula added a subscriber: bondhugula. (Jun 26 2021, 12:00 AM)

bondhugula added inline comments.

mlir/lib/Dialect/SCF/SCF.cpp
77	`blockArgs` isn't documented.
134	Use triple /// comments.
153	This will take O(N) time when compared to !llvm::hasSingleElement(op.region()) which will be O(1).
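For context on the O(N) versus O(1) remark above, here is a minimal standalone sketch of the two ways to ask "does this list hold exactly one element?". It does not use LLVM: `hasSingleElementSketch`, `hasSingleElementByCounting`, and `regionLooksSingleBlock` are hypothetical names, and a `std::forward_list<int>` stands in for a region's block list on the assumption (taken from the comment) that asking such a list for its size walks every element.

```cpp
#include <forward_list>
#include <iterator>

// Constant-time check: look at the first element and at most one successor.
// This mirrors the idea behind llvm::hasSingleElement, but it is only a
// stand-in written for this note, not the LLVM implementation.
template <typename Container>
bool hasSingleElementSketch(const Container &c) {
  auto first = std::begin(c), last = std::end(c);
  return first != last && std::next(first) == last;
}

// Linear-time check: counts every element just to compare the count with 1.
template <typename Container>
bool hasSingleElementByCounting(const Container &c) {
  return std::distance(std::begin(c), std::end(c)) == 1;
}

// Hypothetical usage: the forward_list plays the role of a region's blocks.
inline bool regionLooksSingleBlock(const std::forward_list<int> &blocks) {
  return hasSingleElementSketch(blocks);
}
```

Both helpers return the same answer; the first simply stops after inspecting at most two positions, which is why it is preferable for linked lists that do not cache their size.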

I think the description of the operation should be updated, this canonicalization isn't obvious to me.

I think this is a pretty obvious canonicalization in mind from the start:

From https://llvm.discourse.group/t/introduce-std-inlined-call-op-proposal/282
...
5. As examples, abstractions like affine grayboxes, lambdas with implicit captures (or even explicit when possible) could be lowered to this without first lowering out structured loops/ifs or outlining. But the initial use case is the “all implicit capture” one. In particular, an affine.graybox with > 1 block in its region is nicely lowered to such an std.inlined_call (those with 1 block can be readily inlined as is).

Why does the op description need an update?

In D104865#2842238, @bondhugula wrote:
I think the description of the operation should be updated, this canonicalization isn't obvious to me.

I think this is a pretty obvious canonicalization in mind from the start:
From https://llvm.discourse.group/t/introduce-std-inlined-call-op-proposal/282
...
5. As examples, abstractions like affine grayboxes, lambdas with implicit captures (or even explicit when possible) could be lowered to this without first lowering out structured loops/ifs or outlining. But the initial use case is the “all implicit capture” one. In particular, an affine.graybox with > 1 block in its region is nicely lowered to such an std.inlined_call (those with 1 block can be readily inlined as is).
Why does the op description need an update?

Even from your description, this isn't an obvious canonicalization to me. There is a loss of information in the IR structure here, and for example extra attributes are lost.
The first example in the description is misleading to me when it gets canonicalized away entirely!

scf.for %i = 0 to 128 step %c1 {
  %y = scf.execute_region -> i32 {
    %x = load %A[%i] : memref<128xi32>
    scf.yield %x : i32
  }
}

In D104865#2843276, @mehdi_amini wrote:
In D104865#2842238, @bondhugula wrote:
I think the description of the operation should be updated, this canonicalization isn't obvious to me.

I think this is a pretty obvious canonicalization in mind from the start:
From https://llvm.discourse.group/t/introduce-std-inlined-call-op-proposal/282
...
5. As examples, abstractions like affine grayboxes, lambdas with implicit captures (or even explicit when possible) could be lowered to this without first lowering out structured loops/ifs or outlining. But the initial use case is the “all implicit capture” one. In particular, an affine.graybox with > 1 block in its region is nicely lowered to such an std.inlined_call (those with 1 block can be readily inlined as is).
Why does the op description need an update?
Even from your description, this isn't an obvious canonicalization to me. There is a loss of information in the IR structure here, and for example extra attributes are lost.

I didn't understand. Did you mean execute_region's attributes? But the op has no (intrinsic) attributes. I'm also missing what IR structure we are losing - it's executing a region exactly once and we are inlining it here.

The first example in the description is misleading to me when it gets canonicalized away entirely!
scf.for %i = 0 to 128 step %c1 {
  %y = scf.execute_region -> i32 {
    %x = load %A[%i] : memref<128xi32>
    scf.yield %x : i32
  }
}

Since %y is unused, after inlining, it should all be dead.

In D104865#2843307, @bondhugula wrote:
In D104865#2843276, @mehdi_amini wrote:
In D104865#2842238, @bondhugula wrote:
I think the description of the operation should be updated, this canonicalization isn't obvious to me.

I think this is a pretty obvious canonicalization in mind from the start:
From https://llvm.discourse.group/t/introduce-std-inlined-call-op-proposal/282
...
5. As examples, abstractions like affine grayboxes, lambdas with implicit captures (or even explicit when possible) could be lowered to this without first lowering out structured loops/ifs or outlining. But the initial use case is the “all implicit capture” one. In particular, an affine.graybox with > 1 block in its region is nicely lowered to such an std.inlined_call (those with 1 block can be readily inlined as is).
Why does the op description need an update?
Even from your description, this isn't an obvious canonicalization to me. There is a loss of information in the IR structure here, and for example extra attributes are lost.
I didn't understand. Did you mean execute_region's attributes?

Yes.

But the op has no (intrinsic) attributes.

Intrinsic is key here.

I'm also missing what IR structure we are losing - it's executing a region exactly once and we are inlining it here.

Yes: so it fits as part of an "inlining" transformation more than a canonicalization.

Other kind of structure lost here is that this operation likely could have an AutomaticMemoryAllocation scope and could be used for this purpose (I thought it was part of the original intent as well!), "inlining" it wouldn't preserve this.

The first example in the description is misleading to me when it gets canonicalized away entirely!
scf.for %i = 0 to 128 step %c1 {
  %y = scf.execute_region -> i32 {
    %x = load %A[%i] : memref<128xi32>
    scf.yield %x : i32
  }
}
Since %y is unused, after inlining, it should all be dead.

This isn't the point: we're documenting a construct outside of its intended use. I am asking for the documentation to be as straight as possible here.
For example in https://reviews.llvm.org/D104960 the description is "The executeregionop is used to allow multiple blocks within SCF constructs"; if this is the main reason this op exists, then it seems to me like this should be the first line in the description!
Also the scf.if/scf.for description could refer to this operation in description of the multi-block support.

I'm also missing what IR structure we are losing - it's executing a region exactly once and we are inlining it here.

Yes: so it fits as part of an "inlining" transformation more than a canonicalization.

Other kind of structure lost here is that this operation likely could have an AutomaticMemoryAllocation scope and could be used for this purpose (I thought it was part of the original intent as well!), "inlining" it wouldn't preserve this.

It could, but it currently doesn't have such a trait! :-) This op doesn't have any special traits or attributes intrinsic to its definition. If and when someone adds it, they do have to worry about all the places the op is being touched - but that's true for any op. On that note, if a function is being inlined, we have to worry about the allocas in that function.

The first example in the description is misleading to me when it gets canonicalized away entirely!
scf.for %i = 0 to 128 step %c1 {
  %y = scf.execute_region -> i32 {
    %x = load %A[%i] : memref<128xi32>
    scf.yield %x : i32
  }
}
Since %y is unused, after inlining, it should all be dead.
This isn't the point: we're documenting a construct outside of its intended use. I am asking for the documentation to be as straight as possible here.
For example in https://reviews.llvm.org/D104960 the description is "The executeregionop is used to allow multiple blocks within SCF constructs"; if this is the main reason this op exists, then it seems to me like this should be the first line in the description!
Also the scf.if/scf.for description could refer to this operation in description of the multi-block support.

Sure - all of this sounds good.

Right, the fact that the documentation mentions "it allows representation of inlined calls ..." was misleading to me as well, since some traits are missing. But I had taken the absence of those traits as an oversight until this revision.

@wsmoses can you update the documentation?

@wsmoses can you update the documentation?

Sure, how does this sound:

The `execute_region` operation is used to allow multiple blocks within SCF
and other operations which can hold only one block.  The `execute_region`
operation executes the region held exactly once and cannot have any operands.
As such, its region has no arguments. All SSA values that dominate the op can
be accessed inside the op. The op's region can have multiple blocks and the
blocks can have multiple distinct terminators. Balues returned from this op's
region define the op's results.  This makes `execute_region` a good candidate
for circumstances that require control flow encapsulation and isolation such as
when inlining a call with contain multiple blocks.

If this sounds good to everyone should I open a new PR or just commit the updated documentation?

In D104865#2847977, @wsmoses wrote:

@wsmoses can you update the documentation?

Sure, how does this sound:

The `execute_region` operation is used to allow multiple blocks within SCF
and other operations which can hold only one block.  The `execute_region`
operation executes the region held exactly once and cannot have any operands.
As such, its region has no arguments. All SSA values that dominate the op can
be accessed inside the op. The op's region can have multiple blocks and the
blocks can have multiple distinct terminators. Balues returned from this op's

Balues->Values

region define the op's results.  This makes `execute_region` a good candidate
for circumstances that require control flow encapsulation and isolation such as
when inlining a call with contain multiple blocks.

I'm not sure about "when inlining a call with contain multiple blocks": there is the problem of alloca, for example.

If this sounds good to everyone should I open a new PR or just commit the updated documentation?

Just push it :)
(you can mention this revision in the commit message for reference)

wsmoses mentioned this in rGad4152d1b833: [MLIR] Update description of SCF.execute_region op. (Jun 30 2021, 7:17 AM)

Revision Contents

Path	Size
mlir/include/mlir/Dialect/SCF/SCFOps.td	2 lines
mlir/lib/Dialect/SCF/SCF.cpp	57 lines
mlir/test/Dialect/SCF/canonicalize.mlir	27 lines

Diff 354292

mlir/include/mlir/Dialect/SCF/SCFOps.td

@@ ... @@ def ExecuteRegionOp : SCF_Op<"execute_region"> {

   let results = (outs Variadic<AnyType>);

   let regions = (region AnyRegion:$region);

   // TODO: If the parent is a func like op (which would be the case if all other
   // ops are from the std dialect), the inliner logic could be readily used to
   // inline.
-  let hasCanonicalizer = 0;
+  let hasCanonicalizer = 1;

   // TODO: can fold if it returns a constant.
   // TODO: Single block execute_region ops can be readily inlined irrespective
   // of which op is a parent. Add a fold for this.
[inline] kiranchandramohan: Does this patch implement this TODO? If so can it be removed? Also, the suggestion here seems to be a fold.
   let hasFolder = 0;
 }

 def ForOp : SCF_Op<"for",
       [DeclareOpInterfaceMethods<LoopLikeOpInterface>,
        DeclareOpInterfaceMethods<RegionBranchOpInterface>,
        SingleBlockImplicitTerminator<"scf::YieldOp">,
        RecursiveSideEffects]> {

mlir/lib/Dialect/SCF/SCF.cpp

@@ ... @@
 void mlir::scf::buildTerminatedBody(OpBuilder &builder, Location loc) {
   builder.create<scf::YieldOp>(loc);
 }

 //===----------------------------------------------------------------------===//
 // ExecuteRegionOp
 //===----------------------------------------------------------------------===//

+/// Replaces the given op with the contents of the given single-block region,
+/// using the operands of the block terminator to replace operation results.
[inline] bondhugula: `blockArgs` isn't documented.
+static void replaceOpWithRegion(PatternRewriter &rewriter, Operation *op,
+                                Region &region, ValueRange blockArgs = {}) {
+  assert(llvm::hasSingleElement(region) && "expected single-region block");
+  Block *block = &region.front();
+  Operation *terminator = block->getTerminator();
+  ValueRange results = terminator->getOperands();
+  rewriter.mergeBlockBefore(block, op, blockArgs);
+  rewriter.replaceOp(op, results);
+  rewriter.eraseOp(terminator);
+}
+
 ///
 /// (ssa-id `=`)? `execute_region` `->` function-result-type `{`
 /// block+
 /// `}`
 ///
 /// Example:
 /// scf.execute_region -> i32 {
 /// %idx = load %rI[%i] : memref<128xi32>
@@ ... @@
 static LogicalResult verify(ExecuteRegionOp op) {
   if (op.region().empty())
     return op.emitOpError("region needs to have at least one block");
   if (op.region().front().getNumArguments() > 0)
     return op.emitOpError("region cannot have any arguments");
   return success();
 }

+// Inline an ExecuteRegionOp if it only contains one block.
[inline] ftynse: one op -> one block
[inline] bondhugula: Use triple /// comments.
+// "test.foo"() : () -> ()
+// %v = scf.execute_region -> i64 {
+// %x = "test.val"() : () -> i64
+// scf.yield %x : i64
+// }
+// "test.bar"(%v) : (i64) -> ()
+//
+// becomes
+//
+// "test.foo"() : () -> ()
+// %x = "test.val"() : () -> i64
+// "test.bar"(%v) : (i64) -> ()
[inline] kiranchandramohan: Nit: "test.bar"(%x)
+//
+struct SingleBlockExecuteInliner : public OpRewritePattern<ExecuteRegionOp> {
+  using OpRewritePattern<ExecuteRegionOp>::OpRewritePattern;
+
+  LogicalResult matchAndRewrite(ExecuteRegionOp op,
+                                PatternRewriter &rewriter) const override {
+    if (op.region().getBlocks().size() != 1)
[inline] bondhugula: This will take O(N) time when compared to !llvm::hasSingleElement(op.region()) which will be O(1).
+      return failure();
+    replaceOpWithRegion(rewriter, op, op.region());
+    return success();
+  }
+};
+
+void ExecuteRegionOp::getCanonicalizationPatterns(RewritePatternSet &results,
+                                                  MLIRContext *context) {
+  results.add<SingleBlockExecuteInliner>(context);
+}
+
 //===----------------------------------------------------------------------===//
 // ForOp
 //===----------------------------------------------------------------------===//

 void ForOp::build(OpBuilder &builder, OperationState &result, Value lb,
                   Value ub, Value step, ValueRange iterArgs,
                   BodyBuilderFn bodyBuilder) {
   result.addOperands({lb, ub, step});
@@ ... @@ return buildLoopNest(builder, loc, lbs, ubs, steps, llvm::None,
                                      Location nestedLoc, ValueRange ivs,
                                      ValueRange) -> ValueVector {
                         if (bodyBuilder)
                           bodyBuilder(nestedBuilder, nestedLoc, ivs);
                         return {};
                       });
 }

-/// Replaces the given op with the contents of the given single-block region,
-/// using the operands of the block terminator to replace operation results.
-static void replaceOpWithRegion(PatternRewriter &rewriter, Operation *op,
-                                Region &region, ValueRange blockArgs = {}) {
-  assert(llvm::hasSingleElement(region) && "expected single-region block");
-  Block *block = &region.front();
-  Operation *terminator = block->getTerminator();
-  ValueRange results = terminator->getOperands();
-  rewriter.mergeBlockBefore(block, op, blockArgs);
-  rewriter.replaceOp(op, results);
-  rewriter.eraseOp(terminator);
-}
-
 namespace {
 // Fold away ForOp iter arguments when:
 // 1) The op yields the iter arguments.
 // 2) The iter arguments have no use and the corresponding outer region
 // iterators (inputs) are yielded.
 // 3) The iter arguments have no use and the corresponding (operation) results
 // have no use.
 //

mlir/test/Dialect/SCF/canonicalize.mlir

@@ ... @@ ^bb2:
       %c2 = constant 2 : i64
       br ^bb3(%c2 : i64)

     ^bb3(%x : i64):
       scf.yield %x : i64
     }
     "test.bar"(%v) : (i64) -> ()
     // CHECK: %[[C2:.*]] = constant 2 : i64
-    // CHECK: scf.execute_region -> i64 {
-    // CHECK-NEXT: scf.yield %[[C2]] : i64
-    // CHECK-NEXT: }
+    // CHECK: "test.foo"
+    // CHECK-NEXT: "test.bar"(%[[C2]]) : (i64) -> ()
+  }
+  return
+}
+
+// -----
+
+// CHECK-LABEL: func @execute_region_elim
+func @execute_region_elim() {
+  affine.for %i = 0 to 100 {
+    "test.foo"() : () -> ()
+    %v = scf.execute_region -> i64 {
+      %x = "test.val"() : () -> i64
+      scf.yield %x : i64
+    }
+    "test.bar"(%v) : (i64) -> ()
   }
   return
 }
+
+// CHECK-NEXT: affine.for %arg0 = 0 to 100 {
+// CHECK-NEXT: "test.foo"() : () -> ()
+// CHECK-NEXT: %[[VAL:.*]] = "test.val"() : () -> i64
+// CHECK-NEXT: "test.bar"(%[[VAL]]) : (i64) -> ()
+// CHECK-NEXT: }
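
To make the rewrite in this diff concrete without an MLIR build, the following is a toy model of "inline a single-block execute_region". It is only a sketch under stated assumptions: `Op` and `inlineSingleBlockExecuteRegions` are hypothetical names, values are plain strings, each region yields at most one value, and uses are only rewritten in later ops of the same block (not in nested regions). None of this is the MLIR API or the patch's implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Toy stand-in for an operation: its name, the value it defines (empty if
// none), the values it uses, and an optional region given as a list of blocks.
struct Op {
  std::string name;
  std::string result;
  std::vector<std::string> operands;
  std::vector<std::vector<Op>> region;
};

// Inlines every "execute_region" op in `block` whose region has exactly one
// block. Multi-block regions are left alone. Returns true if anything changed.
bool inlineSingleBlockExecuteRegions(std::vector<Op> &block) {
  bool changed = false;
  for (std::size_t i = 0; i < block.size();) {
    if (block[i].name != "execute_region" || block[i].region.size() != 1) {
      ++i;
      continue;
    }
    // Take the body and split off its terminator (assumed to be "yield").
    std::vector<Op> body = std::move(block[i].region.front());
    assert(!body.empty() && body.back().name == "yield");
    Op yield = std::move(body.back());
    body.pop_back();
    const std::string oldResult = block[i].result;

    // Replace the op with the remaining body operations, in place.
    const auto pos = static_cast<std::ptrdiff_t>(i);
    block.erase(block.begin() + pos);
    block.insert(block.begin() + pos, body.begin(), body.end());

    // Redirect later uses of the op's result to the yielded value.
    if (!oldResult.empty()) {
      assert(yield.operands.size() == 1);
      for (std::size_t j = i + body.size(); j < block.size(); ++j)
        for (std::string &operand : block[j].operands)
          if (operand == oldResult)
            operand = yield.operands.front();
    }
    changed = true; // `i` is not advanced, so the inlined ops are re-scanned.
  }
  return changed;
}
```

Applied to a block holding `test.foo`, an `execute_region` that defines `%v` by yielding `%x` from `test.val`, and `test.bar` using `%v`, the sketch leaves `test.foo`, `test.val`, and `test.bar` using `%x`, which is the shape the new `execute_region_elim` test checks for.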

[MLIR][SCF] Inline single block ExecuteRegionOp
Closed, Public