This is an archive of the discontinued LLVM Phabricator instance.

Differential D105455

[MLIR][DISC] Revise ParallelLoopTilingPass with inbound_check mode
Closed · Public

Authored by linearhit on Jul 5 2021, 8:37 PM.

Details

Reviewers

mehdi_amini
herhut
ftynse

Commits

rG2d45e332ba32: [MLIR][DISC] Revise ParallelLoopTilingPass with inbound_check mode

Summary

Expand ParallelLoopTilingPass with an inbound_check mode.

In the default mode, the upper bound of the inner loop comes from an affine.min op. In inbound_check mode, the upper bound of the inner loop is the step of the outer loop, and an additional in-bound check is emitted inside the inner loop. This was a FIXME in the original code. A typical use is for GPU backends, where the outer and inner loops can then be mapped to blocks and threads separately.
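The difference between the two modes can be sketched in plain C++ on a one-dimensional loop. This is only an illustration, not code from the patch: the function names are hypothetical, the loop step is assumed to be 1, and ordinary loops stand in for scf.parallel.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Default mode: the inner trip count is min(tile, ub - i), so the last
// tile is shorter when the range is not a multiple of the tile size.
std::vector<int64_t> visitWithMinBound(int64_t lb, int64_t ub, int64_t tile) {
  assert(tile > 0 && "tile size must be positive");
  std::vector<int64_t> visited;
  for (int64_t i = lb; i < ub; i += tile) {
    int64_t innerUb = std::min(tile, ub - i);
    for (int64_t j = 0; j < innerUb; ++j)
      visited.push_back(i + j);
  }
  return visited;
}

// Check-in-body mode: every inner loop runs exactly `tile` iterations and
// a guard in the body skips indices past the original upper bound.
std::vector<int64_t> visitWithInBoundCheck(int64_t lb, int64_t ub,
                                           int64_t tile) {
  assert(tile > 0 && "tile size must be positive");
  std::vector<int64_t> visited;
  for (int64_t i = lb; i < ub; i += tile) {
    for (int64_t j = 0; j < tile; ++j) {
      if (i + j < ub) // the in-bound check
        visited.push_back(i + j);
    }
  }
  return visited;
}
```

Both functions visit the same indices in the same order. The second keeps the inner trip count independent of the outer induction variable, which is the property the summary relies on when mapping the inner loop to GPU threads.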

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

linearhit created this revision. · Jul 5 2021, 8:37 PM

Herald added subscribers: dcaballe, cota, teijeong and 15 others. · Jul 5 2021, 8:37 PM

linearhit requested review of this revision. · Jul 5 2021, 8:37 PM

Herald added a project: Restricted Project. · Jul 5 2021, 8:37 PM

Herald added subscribers: stephenneuendorffer, nicolasvasilache.

Harbormaster completed remote builds in B112515: Diff 356593. · Jul 5 2021, 9:19 PM

linearhit updated this revision to Diff 356595. · Jul 5 2021, 11:18 PM

linearhit updated this revision to Diff 356606. · Jul 5 2021, 11:23 PM

Harbormaster completed remote builds in B112526: Diff 356606. · Jul 5 2021, 11:52 PM

mehdi_amini added a reviewer: bondhugula. · Jul 8 2021, 6:53 PM

mehdi_amini added inline comments.

mlir/include/mlir/Dialect/SCF/Passes.h
45	Please document the new param
mlir/lib/Dialect/SCF/Transforms/ParallelLoopTiling.cpp
41	(Nit: Indent this to match the first dimension) I think this is the crux of the change here: the tiles will be all the same size instead of being dynamic. I was wondering if there is a better naming here for the option: right now it is pointing at the inbound check inside the body instead of pointing at the change in the tile size. Adding Uday to have another opinion here.

bondhugula added inline comments. · Jul 9 2021, 9:05 PM

mlir/include/mlir/Dialect/SCF/Passes.td
51–54	The name of this option doesn't really convey what it's doing - in fact, it seems to imply exactly the opposite: it is putting the check in the body instead of in the bound! (i.e., `in-body-check` instead of `in-bound-check`). Please update the various instances of `upperbound`, etc. - I'd recommend space in between upper and bound.

bondhugula edited reviewers, added: ftynse; removed: bondhugula. · Jul 9 2021, 9:05 PM

I think a better name here might be something like use-static-loop-bounds or ensure-static-loop-bounds.

We currently get rid of the affine.min when mapping to gpu. Producing static bounded loops like done here would work, too.

I believe the original comment was more about enabling easy vectorization, which this might not achieve. We are using the loop specialization pass for this, which transforms a loop with an upper bound defined via affine.min into a conditional and two loops. That makes the LLVM vectorizer understand the pattern. That is more for CPU use, though.

Just to be clear, I am not arguing against this change.
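The loop specialization described in the comment above (a conditional and two loops in place of one loop bounded by a min) can be sketched in plain C++ as follows. This is a hedged illustration only: `kTile` and `sumTileSpecialized` are hypothetical names, and the actual pass operates on scf.parallel ops rather than on C++ loops.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

constexpr int64_t kTile = 4; // hypothetical compile-time tile size

// Sums one tile of `data` starting at `offset`. The inner bound is
// min(kTile, n - offset); the conditional splits it into two loops.
int64_t sumTileSpecialized(const std::vector<int64_t> &data, int64_t offset) {
  int64_t n = static_cast<int64_t>(data.size());
  assert(offset >= 0 && offset <= n);
  int64_t bound = std::min(kTile, n - offset);
  int64_t sum = 0;
  if (bound == kTile) {
    // Full tile: constant trip count, easy for a vectorizer to recognize.
    for (int64_t j = 0; j < kTile; ++j)
      sum += data[offset + j];
  } else {
    // Partial tile: keep the dynamic bound.
    for (int64_t j = 0; j < bound; ++j)
      sum += data[offset + j];
  }
  return sum;
}
```

Unlike the check-in-body form in this patch, the specialized form keeps the min computation but moves the full-tile case into a loop whose bound is a compile-time constant.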

nit on the tag in the commit title: What is disc? How does that relate to MLIR upstream (more specifically SCF)?

linearhit updated this revision to Diff 358825. · Jul 14 2021, 7:50 PM

linearhit updated this revision to Diff 358829. · Jul 14 2021, 8:04 PM

linearhit marked 2 inline comments as done. · Jul 14 2021, 8:08 PM

linearhit added inline comments.

mlir/include/mlir/Dialect/SCF/Passes.h
45	done.
mlir/include/mlir/Dialect/SCF/Passes.td
51–54	Done. I renamed withInboundCheck to useStaticLoopBounds, as herhut suggested, and changed "upperbound" to "upper bound".
mlir/lib/Dialect/SCF/Transforms/ParallelLoopTiling.cpp
41	Indent fixed.

In D105455#2871426, @herhut wrote:

I think a better name here might be something like use-static-loop-bounds or ensure-static-loop-bounds.

We currently get rid of the affine.min when mapping to gpu. Producing static bounded loops like done here would work, too.

I believe the original comment was more about enabling easy vectorization, which this might not achieve. We are using the loop specialization pass for this, which transforms a loop with an upper bound defined via affine.min into a conditional and two loops. That makes the LLVM vectorizer understand the pattern. That is more for CPU use, though.

Just to be clear, I am not arguing against this change.

I adopted herhut's suggestion and renamed it to use-static-loop-bounds.

In D105455#2872566, @rriddle wrote:

nit on the tag in the commit title: What is disc? How does that relate to MLIR upstream (more specifically SCF)?

DISC is an end-to-end dynamic shape compiler that is currently being upstreamed. Most of the code will go into the mlir-hlo repo; only a few necessary changes will be submitted to the LLVM repo.
Please refer to https://drive.google.com/file/d/1t6Q_VhZVWBhi--fYTxTOLGklIemlKQmV/view?usp=sharing

In D105455#2878999, @linearhit wrote:

In D105455#2872566, @rriddle wrote:

nit on the tag in the commit title: What is disc? How does that relate to MLIR upstream (more specifically SCF)?

DISC is an end-to-end dynamic shape compiler that is currently being upstreamed.

We just don't need to tag anything "DISC" in the commit message, I don't think it brings much value to the reader. Downstream projects in general send patches as they make sense in the context of the MLIR project.

This revision is now accepted and ready to land. · Jul 14 2021, 8:25 PM

Harbormaster completed remote builds in B114146: Diff 358829. · Jul 14 2021, 8:46 PM

I adopted herhut's suggestion and renamed it to use-static-loop-bounds.

This is a terminology issue but static would be inaccurate here. Throughout the compiler literature pervasively, bounds where you have symbols that do not depend on data are treated as static. They don't have to be constant nor need to be free of min/max. min(0, N) or max(M, N) is also considered statically predictable the same way as 32*N or M if M and N are treated as symbols. Are you specifically avoiding a min/max? In that case, you can make this more descriptive - no-min-max-bounds? "fixed upper bound" would also be confusing in the pass description.

I also notice multiple instances of "inbound" in the code comments which aren't meaningful.

In D105455#2879257, @bondhugula wrote:

I adopted herhut's suggestion and renamed it to use-static-loop-bounds.

This is a terminology issue but static would be inaccurate here. Throughout the compiler literature pervasively, bounds where you have symbols that do not depend on data are treated as static. They don't have to be constant nor need to be free of min/max. min(0, N) or max(M, N) is also considered statically predictable the same way as 32*N or M if M and N are treated as symbols. Are you specifically avoiding a min/max? In that case, you can make this more descriptive - no-min-max-bounds? "fixed upper bound" would also be confusing in the pass description.

I also notice multiple instances of "inbound" in the code comments which aren't meaningful.

I'm also okay with this.
Just to confirm, does everybody agree with the name no-min-max-bounds? @herhut @mehdi_amini

In my

In D105455#2879257, @bondhugula wrote:

I adopted herhut's suggestion and renamed it to use-static-loop-bounds.

This is a terminology issue but static would be inaccurate here. Throughout the compiler literature pervasively, bounds where you have symbols that do not depend on data are treated as static. They don't have to be constant nor need to be free of min/max. min(0, N) or max(M, N) is also considered statically predictable the same way as 32*N or M if M and N are treated as symbols. Are you specifically avoiding a min/max? In that case, you can make this more descriptive - no-min-max-bounds? "fixed upper bound" would also be confusing in the pass description.

I also notice multiple instances of "inbound" in the code comments which aren't meaningful.

no-min-max-bounds is fine with me. Thanks!

In D105455#2895208, @mehdi_amini wrote:

no-min-max-bounds is fine with me. Thanks!

done!

Herald added a subscriber: Chia-hungDuan. · Jul 30 2021, 10:20 PM

In D105455#2879001, @mehdi_amini wrote:

In D105455#2878999, @linearhit wrote:

In D105455#2872566, @rriddle wrote:

nit on the tag in the commit title: What is disc? How does that relate to MLIR upstream (more specifically SCF)?

DISC is an end-to-end dynamic shape compiler that is currently being upstreamed.

We just don't need to tag anything "DISC" in the commit message, I don't think it brings much value to the reader. Downstream projects in general send patches as they make sense in the context of the MLIR project.

done!

linearhit updated this revision to Diff 363266. · Jul 30 2021, 10:53 PM

linearhit marked an inline comment as done.

linearhit requested review of this revision. · Jul 30 2021, 10:57 PM

Harbormaster completed remote builds in B117304: Diff 363266. · Jul 30 2021, 11:24 PM

In D105455#2879257, @bondhugula wrote:

I adopted herhut's suggestion and renamed it to use-static-loop-bounds.

This is a terminology issue but static would be inaccurate here. Throughout the compiler literature pervasively, bounds where you have symbols that do not depend on data are treated as static. They don't have to be constant nor need to be free of min/max. min(0, N) or max(M, N) is also considered statically predictable the same way as 32*N or M if M and N are treated as symbols. Are you specifically avoiding a min/max? In that case, you can make this more descriptive - no-min-max-bounds? "fixed upper bound" would also be confusing in the pass description.

This is also the requirement here. You can have min and max and really any computation as long as such computation is not loop dependent. So min(0, N) would be fine. The current code we have introduces a min between the loop index of the outer loop and a symbol, which I would not consider static.

That said, I am also fine with the current name of the flag.

This revision is now accepted and ready to land. · Aug 2 2021, 2:49 AM

In D105455#2919385, @herhut wrote:

In D105455#2879257, @bondhugula wrote:

I adopted herhut's suggestion and renamed it to use-static-loop-bounds.

This is a terminology issue but static would be inaccurate here. Throughout the compiler literature pervasively, bounds where you have symbols that do not depend on data are treated as static. They don't have to be constant nor need to be free of min/max. min(0, N) or max(M, N) is also considered statically predictable the same way as 32*N or M if M and N are treated as symbols. Are you specifically avoiding a min/max? In that case, you can make this more descriptive - no-min-max-bounds? "fixed upper bound" would also be confusing in the pass description.

This is also the requirement here. You can have min and max and really any computation as long as such computation is not loop dependent. So min(0, N) would be fine. The current code we have introduces a min between the loop index of the outer loop and a symbol, which I would not consider static.

Thanks @herhut
It seems that I don't have permission to merge. Could you help with that?

That said, I am also fine with the current name of the flag.

Herald added a subscriber: wrengr. · Aug 3 2021, 10:40 PM

In D105455#2924400, @linearhit wrote:

Thanks @herhut
It seems that I don't have permission to merge. Could you help with that?

Sure. Do you have a preferred name/email for me to use as the author of the patch? Essentially what git log would show on your end. I could not find that information.

In D105455#2928431, @herhut wrote:

In D105455#2924400, @linearhit wrote:

Thanks @herhut
It seems that I don't have permission to merge. Could you help with that?

Sure. Do you have a preferred name/email for me to use as the author of the patch? Essentially what git log would show on your end. I could not find that information.

Sorry for the late reply. Please use "tashuang.zk" <tashuang.zk@alibaba-inc.com>.
Thanks.

Closed by commit rG2d45e332ba32: [MLIR][DISC] Revise ParallelLoopTilingPass with inbound_check mode (authored by tashuang.zk <tashuang.zk@alibaba-inc.com>, committed by herhut). · Aug 16 2021, 5:03 AM

This revision was automatically updated to reflect the committed changes.

herhut added a commit: rG2d45e332ba32: [MLIR][DISC] Revise ParallelLoopTilingPass with inbound_check mode.

Revision Contents

Path                                                            Size
mlir/include/mlir/Dialect/SCF/Passes.h                          7 lines
mlir/include/mlir/Dialect/SCF/Passes.td                         6 lines
mlir/include/mlir/Dialect/SCF/Transforms.h                      3 lines
mlir/lib/Dialect/SCF/Transforms/ParallelLoopTiling.cpp          93 lines
mlir/test/Dialect/SCF/parallel-loop-tiling-inbound-check.mlir   149 lines

Diff 366595

mlir/include/mlir/Dialect/SCF/Passes.h

	[30 lines not shown]
	/// Creates a loop fusion pass which fuses parallel loops.			/// Creates a loop fusion pass which fuses parallel loops.
	std::unique_ptr<Pass> createParallelLoopFusionPass();			std::unique_ptr<Pass> createParallelLoopFusionPass();

	/// Creates a pass that specializes parallel loop for unrolling and			/// Creates a pass that specializes parallel loop for unrolling and
	/// vectorization.			/// vectorization.
	std::unique_ptr<Pass> createParallelLoopSpecializationPass();			std::unique_ptr<Pass> createParallelLoopSpecializationPass();

	/// Creates a pass which tiles innermost parallel loops.			/// Creates a pass which tiles innermost parallel loops.
				/// If noMinMaxBounds, the upper bound of the inner loop will
				/// be a same value among different outter loop iterations, and
				/// an additional inbound check will be emitted inside the internal
				/// loops.
	std::unique_ptr<Pass>			std::unique_ptr<Pass>
	createParallelLoopTilingPass(llvm::ArrayRef<int64_t> tileSize = {});			createParallelLoopTilingPass(llvm::ArrayRef<int64_t> tileSize = {},
				bool noMinMaxBounds = false);

	/// Creates a pass which folds arith ops on induction variable into			/// Creates a pass which folds arith ops on induction variable into
	/// loop range.			/// loop range.
	std::unique_ptr<Pass> createForLoopRangeFoldingPass();			std::unique_ptr<Pass> createForLoopRangeFoldingPass();

	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//
	// Registration			// Registration
	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//

	/// Generate the code for registering passes.			/// Generate the code for registering passes.
	#define GEN_PASS_REGISTRATION			#define GEN_PASS_REGISTRATION
	#include "mlir/Dialect/SCF/Passes.h.inc"			#include "mlir/Dialect/SCF/Passes.h.inc"

	} // namespace mlir			} // namespace mlir

	#endif // MLIR_DIALECT_SCF_PASSES_H_			#endif // MLIR_DIALECT_SCF_PASSES_H_

mlir/include/mlir/Dialect/SCF/Passes.td

	[41 lines not shown]
	}			}

	def SCFParallelLoopTiling : FunctionPass<"parallel-loop-tiling"> {			def SCFParallelLoopTiling : FunctionPass<"parallel-loop-tiling"> {
	let summary = "Tile parallel loops";			let summary = "Tile parallel loops";
	let constructor = "mlir::createParallelLoopTilingPass()";			let constructor = "mlir::createParallelLoopTilingPass()";
	let options = [			let options = [
	ListOption<"tileSizes", "parallel-loop-tile-sizes", "int64_t",			ListOption<"tileSizes", "parallel-loop-tile-sizes", "int64_t",
	"Factors to tile parallel loops by",			"Factors to tile parallel loops by",
	"llvm::cl::ZeroOrMore, llvm::cl::MiscFlags::CommaSeparated">			"llvm::cl::ZeroOrMore, llvm::cl::MiscFlags::CommaSeparated">,
				Option<"noMinMaxBounds", "no-min-max-bounds", "bool",
				/*default=*/"false",
				"Perform tiling with fixed upper bound with inbound check "
				"inside the internal loops">
	];			];
	let dependentDialects = ["AffineDialect"];			let dependentDialects = ["AffineDialect"];
	}			}

	def SCFForLoopRangeFolding			def SCFForLoopRangeFolding
	: Pass<"for-loop-range-folding"> {			: Pass<"for-loop-range-folding"> {
	let summary = "Fold add/mul ops into loop range";			let summary = "Fold add/mul ops into loop range";
	let constructor = "mlir::createForLoopRangeFoldingPass()";			let constructor = "mlir::createForLoopRangeFoldingPass()";
	}			}

	#endif // MLIR_DIALECT_SCF_PASSES			#endif // MLIR_DIALECT_SCF_PASSES

mlir/include/mlir/Dialect/SCF/Transforms.h

	[81 lines not shown]
	/// scf.parallel (%j0, %j1) = (0, 0) to (min(tileSize[0], %arg2-%j0)			/// scf.parallel (%j0, %j1) = (0, 0) to (min(tileSize[0], %arg2-%j0)
	/// min(tileSize[1], %arg3-%j1))			/// min(tileSize[1], %arg3-%j1))
	/// step (%arg4, %arg5)			/// step (%arg4, %arg5)
	/// The old loop is replaced with the new one.			/// The old loop is replaced with the new one.
	///			///
	/// The function returns the resulting ParallelOps, i.e. {outer_loop_op,			/// The function returns the resulting ParallelOps, i.e. {outer_loop_op,
	/// inner_loop_op}.			/// inner_loop_op}.
	std::pair<ParallelOp, ParallelOp>			std::pair<ParallelOp, ParallelOp>
	tileParallelLoop(ParallelOp op, llvm::ArrayRef<int64_t> tileSizes);			tileParallelLoop(ParallelOp op, llvm::ArrayRef<int64_t> tileSizes,
				bool noMinMaxBounds);

	/// Populates patterns for SCF structural type conversions and sets up the			/// Populates patterns for SCF structural type conversions and sets up the
	/// provided ConversionTarget with the appropriate legality configuration for			/// provided ConversionTarget with the appropriate legality configuration for
	/// the ops to get converted properly.			/// the ops to get converted properly.
	///			///
	/// A "structural" type conversion is one where the underlying ops are			/// A "structural" type conversion is one where the underlying ops are
	/// completely agnostic to the actual types involved and simply need to update			/// completely agnostic to the actual types involved and simply need to update
	/// their types. An example of this is scf.if -- the scf.if op and the			/// their types. An example of this is scf.if -- the scf.if op and the
	[43 lines not shown]

mlir/lib/Dialect/SCF/Transforms/ParallelLoopTiling.cpp

[27 lines not shown]
/// into		/// into
/// scf.parallel (%i0, %i1) = (%arg0, %arg1) to (%arg2, %arg3)		/// scf.parallel (%i0, %i1) = (%arg0, %arg1) to (%arg2, %arg3)
/// step (%arg4*tileSize[0],		/// step (%arg4*tileSize[0],
/// %arg5*tileSize[1])		/// %arg5*tileSize[1])
/// scf.parallel (%j0, %j1) = (0, 0) to (min(%arg4*tileSize[0], %arg2-%i0)		/// scf.parallel (%j0, %j1) = (0, 0) to (min(%arg4*tileSize[0], %arg2-%i0)
/// min(%arg5*tileSize[1], %arg3-%i1))		/// min(%arg5*tileSize[1], %arg3-%i1))
/// step (%arg4, %arg5)		/// step (%arg4, %arg5)
///		///
		/// or, when no-min-max-bounds is true, into
		/// scf.parallel (%i0, %i1) = (%arg0, %arg1) to (%arg2, %arg3)
		/// step (%arg4*tileSize[0],
		/// %arg5*tileSize[1])
		/// scf.parallel (%j0, %j1) = (0, 0) to (%arg4*tileSize[0],
		/// %arg5*tileSize[1])
		/// step (%arg4, %arg5)
		/// %inbound = (%j0 * %arg4 + %i0 < %arg2) &&
		/// (%j1 * %arg5 + %i1 < %arg3)
		/// scf.if (%inbound)
		/// ....
		///
/// where the uses of %i0 and %i1 in the loop body are replaced by		/// where the uses of %i0 and %i1 in the loop body are replaced by
/// %i0 + j0 and %i1 + %j1.		/// %i0 + j0 and %i1 + %j1.
//		//
/// The old loop is replaced with the new one.		/// The old loop is replaced with the new one.
std::pair<ParallelOp, ParallelOp>		std::pair<ParallelOp, ParallelOp>
mlir::scf::tileParallelLoop(ParallelOp op, ArrayRef<int64_t> tileSizes) {		mlir::scf::tileParallelLoop(ParallelOp op, ArrayRef<int64_t> tileSizes,
		bool noMinMaxBounds) {
OpBuilder b(op);		OpBuilder b(op);
auto zero = b.create<ConstantIndexOp>(op.getLoc(), 0);		auto zero = b.create<ConstantIndexOp>(op.getLoc(), 0);
SmallVector<Value, 2> tileSizeConstants;		SmallVector<Value, 2> tileSizeConstants;
tileSizeConstants.reserve(op.upperBound().size());		tileSizeConstants.reserve(op.upperBound().size());
for (size_t i = 0, end = op.upperBound().size(); i != end; ++i) {		for (size_t i = 0, end = op.upperBound().size(); i != end; ++i) {
if (i < tileSizes.size())		if (i < tileSizes.size())
tileSizeConstants.push_back(		tileSizeConstants.push_back(
b.create<ConstantIndexOp>(op.getLoc(), tileSizes[i]));		b.create<ConstantIndexOp>(op.getLoc(), tileSizes[i]));
[9 lines not shown]
for (auto step : llvm::zip(op.step(), tileSizeConstants)) {
newSteps.push_back(		newSteps.push_back(
b.create<MulIOp>(op.getLoc(), std::get<0>(step), std::get<1>(step)));		b.create<MulIOp>(op.getLoc(), std::get<0>(step), std::get<1>(step)));
}		}
auto outerLoop = b.create<ParallelOp>(op.getLoc(), op.lowerBound(),		auto outerLoop = b.create<ParallelOp>(op.getLoc(), op.lowerBound(),
op.upperBound(), newSteps);		op.upperBound(), newSteps);
b.setInsertionPointToStart(outerLoop.getBody());		b.setInsertionPointToStart(outerLoop.getBody());

// Compute min(size, dim - offset) to avoid out-of-bounds accesses.		// Compute min(size, dim - offset) to avoid out-of-bounds accesses.
// FIXME: Instead of using min, we want to replicate the tail. This would give
// the inner loop constant bounds for easy vectorization.
auto minMap = AffineMap::get(		auto minMap = AffineMap::get(
/*dimCount=*/3, /*symbolCount=*/0,		/*dimCount=*/3, /*symbolCount=*/0,
{getAffineDimExpr(/*position=*/0, b.getContext()),		{getAffineDimExpr(/*position=*/0, b.getContext()),
getAffineDimExpr(/*position=*/1, b.getContext()) -		getAffineDimExpr(/*position=*/1, b.getContext()) -
getAffineDimExpr(/*position=*/2, b.getContext())},		getAffineDimExpr(/*position=*/2, b.getContext())},
b.getContext());		b.getContext());

// Create the inner loop with adjusted bounds.		// Create the inner loop with adjusted bounds.
SmallVector<Value, 2> newBounds;		SmallVector<Value, 2> newBounds;
newBounds.reserve(op.upperBound().size());		newBounds.reserve(op.upperBound().size());
		bool needInboundCheck = false;
for (auto dim : llvm::zip(outerLoop.lowerBound(), outerLoop.upperBound(),		for (auto dim : llvm::zip(outerLoop.lowerBound(), outerLoop.upperBound(),
outerLoop.step(), outerLoop.getInductionVars(),		outerLoop.step(), outerLoop.getInductionVars(),
op.step(), tileSizeConstants)) {		op.step(), tileSizeConstants)) {
Value lowerBound, upperBound, newStep, iv, step, tileSizeConstant;		Value lowerBound, upperBound, newStep, iv, step, tileSizeConstant;
std::tie(lowerBound, upperBound, newStep, iv, step, tileSizeConstant) = dim;		std::tie(lowerBound, upperBound, newStep, iv, step, tileSizeConstant) = dim;
// Collect the statically known loop bounds		// Collect the statically known loop bounds
auto lowerBoundConstant =		auto lowerBoundConstant =
dyn_cast_or_null<ConstantIndexOp>(lowerBound.getDefiningOp());		dyn_cast_or_null<ConstantIndexOp>(lowerBound.getDefiningOp());
[9 lines not shown]
if (lowerBoundConstant && upperBoundConstant && stepConstant) {
auto numIterations = llvm::divideCeil(upperBoundConstant.getValue() -		auto numIterations = llvm::divideCeil(upperBoundConstant.getValue() -
lowerBoundConstant.getValue(),		lowerBoundConstant.getValue(),
stepConstant.getValue());		stepConstant.getValue());
if (numIterations % tileSize == 0) {		if (numIterations % tileSize == 0) {
newBounds.push_back(newStep);		newBounds.push_back(newStep);
continue;		continue;
}		}
}		}

		// For InboundCheck mode, just use the variable outer step
		if (noMinMaxBounds) {
		newBounds.push_back(newStep);
		needInboundCheck = true;
		continue;
		}

// Otherwise, we dynamically compute the bound for		// Otherwise, we dynamically compute the bound for
// each iteration of the outer loop.		// each iteration of the outer loop.
newBounds.push_back(		newBounds.push_back(
b.create<AffineMinOp>(op.getLoc(), b.getIndexType(), minMap,		b.create<AffineMinOp>(op.getLoc(), b.getIndexType(), minMap,
ValueRange{newStep, upperBound, iv}));		ValueRange{newStep, upperBound, iv}));
}		}
auto innerLoop = b.create<ParallelOp>(		auto innerLoop = b.create<ParallelOp>(
op.getLoc(), SmallVector<Value, 2>(newBounds.size(), zero), newBounds,		op.getLoc(), SmallVector<Value, 2>(newBounds.size(), zero), newBounds,
op.step());		op.step());

// Steal the body of the old parallel loop and erase it.		if (noMinMaxBounds && needInboundCheck) {
		b.setInsertionPointToStart(innerLoop.getBody());
		// Insert in-bound check
		Value inbound =
		b.create<ConstantOp>(op.getLoc(), b.getIntegerType(1),
		b.getIntegerAttr(b.getIntegerType(1), 1));
		for (auto dim :
		llvm::zip(outerLoop.upperBound(), outerLoop.getInductionVars(),
		innerLoop.getInductionVars(), innerLoop.step())) {
		Value outerUpperBound, outerIV, innerIV, innerStep;
		std::tie(outerUpperBound, outerIV, innerIV, innerStep) = dim;
		// %in_bound = %in_bound &&
		// (%inner_iv * %inner_step + %outer_iv < %outer_upper_bound)
		Value index = b.create<AddIOp>(
		op.getLoc(), b.create<MulIOp>(op.getLoc(), innerIV, innerStep),
		outerIV);
		Value dimInbound = b.create<CmpIOp>(op.getLoc(), CmpIPredicate::ult,
		index, outerUpperBound);
		inbound = b.create<AndOp>(op.getLoc(), inbound, dimInbound);
		}
		auto ifInbound = b.create<IfOp>(op.getLoc(),
		/*resultTypes*/ ArrayRef<Type>{}, inbound,
		/*hasElseRegion*/ false);
		ifInbound.thenRegion().takeBody(op.region());
		Block &thenBlock = ifInbound.thenRegion().front();
		b.setInsertionPointToStart(innerLoop.getBody());
		for (auto ivs : llvm::enumerate(llvm::zip(innerLoop.getInductionVars(),
		outerLoop.getInductionVars()))) {
		AddIOp newIndex = b.create<AddIOp>(op.getLoc(), std::get<0>(ivs.value()),
		std::get<1>(ivs.value()));
		thenBlock.getArgument(ivs.index())
		.replaceAllUsesExcept(newIndex, newIndex);
		}
		thenBlock.eraseArguments(llvm::to_vector<4>(
		llvm::seq((unsigned)0, thenBlock.getNumArguments())));
		} else {
innerLoop.region().takeBody(op.region());		innerLoop.region().takeBody(op.region());

// Insert computation for new index vectors and replace uses.
b.setInsertionPointToStart(innerLoop.getBody());		b.setInsertionPointToStart(innerLoop.getBody());
for (auto ivs :		for (auto ivs : llvm::zip(innerLoop.getInductionVars(),
llvm::zip(innerLoop.getInductionVars(), outerLoop.getInductionVars())) {		outerLoop.getInductionVars())) {
Value inner_index = std::get<0>(ivs);		Value innerIndex = std::get<0>(ivs);
AddIOp newIndex =		AddIOp newIndex =
b.create<AddIOp>(op.getLoc(), std::get<0>(ivs), std::get<1>(ivs));		b.create<AddIOp>(op.getLoc(), std::get<0>(ivs), std::get<1>(ivs));
inner_index.replaceAllUsesExcept(newIndex, newIndex);		innerIndex.replaceAllUsesExcept(newIndex, newIndex);
		}
}		}

op.erase();		op.erase();
return std::make_pair(outerLoop, innerLoop);		return std::make_pair(outerLoop, innerLoop);
}		}

namespace {		namespace {
struct ParallelLoopTiling		struct ParallelLoopTiling
: public SCFParallelLoopTilingBase<ParallelLoopTiling> {		: public SCFParallelLoopTilingBase<ParallelLoopTiling> {
ParallelLoopTiling() = default;		ParallelLoopTiling() = default;
explicit ParallelLoopTiling(ArrayRef<int64_t> tileSizes) {		explicit ParallelLoopTiling(ArrayRef<int64_t> tileSizes,
		bool noMinMaxBounds = false) {
this->tileSizes = tileSizes;		this->tileSizes = tileSizes;
		this->noMinMaxBounds = noMinMaxBounds;
}		}

void runOnFunction() override {		void runOnFunction() override {
SmallVector<ParallelOp, 2> innermostPloops;		SmallVector<ParallelOp, 2> innermostPloops;
getInnermostParallelLoops(getFunction().getOperation(), innermostPloops);		getInnermostParallelLoops(getFunction().getOperation(), innermostPloops);
for (ParallelOp ploop : innermostPloops) {		for (ParallelOp ploop : innermostPloops) {
// FIXME: Add reduction support.		// FIXME: Add reduction support.
if (ploop.getNumReductions() == 0)		if (ploop.getNumReductions() == 0)
tileParallelLoop(ploop, tileSizes);		tileParallelLoop(ploop, tileSizes, noMinMaxBounds);
}		}
}		}
};		};
} // namespace		} // namespace

std::unique_ptr<Pass>		std::unique_ptr<Pass>
mlir::createParallelLoopTilingPass(ArrayRef<int64_t> tileSizes) {		mlir::createParallelLoopTilingPass(ArrayRef<int64_t> tileSizes,
return std::make_unique<ParallelLoopTiling>(tileSizes);		bool noMinMaxBounds) {
		return std::make_unique<ParallelLoopTiling>(tileSizes, noMinMaxBounds);
}		}
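
The two per-dimension decisions made in tileParallelLoop above can be sketched outside MLIR as plain C++. This is an illustration under stated assumptions, not the pass itself: `tilesEvenly` and `inBound` are hypothetical helper names, `divideCeil` is a local stand-in for llvm::divideCeil, and unsigned integers stand in for index values so that `<` behaves like the `ult` comparison in the diff.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Ceiling division for a non-zero denominator (stand-in for llvm::divideCeil).
uint64_t divideCeil(uint64_t numerator, uint64_t denominator) {
  assert(denominator != 0);
  return (numerator + denominator - 1) / denominator;
}

// True when constant bounds give an iteration count that the tile size
// divides evenly; the inner bound can then be the outer step, with no
// min and no in-bound check for that dimension.
bool tilesEvenly(uint64_t lb, uint64_t ub, uint64_t step, uint64_t tileSize) {
  assert(lb <= ub && tileSize != 0);
  return divideCeil(ub - lb, step) % tileSize == 0;
}

// Folds the per-dimension comparisons into one flag, starting from true,
// using the expression written in the diff:
//   inner_iv * inner_step + outer_iv < outer_upper_bound
bool inBound(const std::vector<uint64_t> &outerIVs,
             const std::vector<uint64_t> &innerIVs,
             const std::vector<uint64_t> &innerSteps,
             const std::vector<uint64_t> &upperBounds) {
  assert(outerIVs.size() == innerIVs.size() &&
         outerIVs.size() == innerSteps.size() &&
         outerIVs.size() == upperBounds.size());
  bool flag = true;
  for (std::size_t d = 0; d < outerIVs.size(); ++d) {
    uint64_t index = innerIVs[d] * innerSteps[d] + outerIVs[d];
    flag = flag && (index < upperBounds[d]); // unsigned compare, like ult
  }
  return flag;
}
```

For a loop with constant bounds 0 to 22 and 0 to 24, step 3, and tile sizes 1 and 4, both dimensions have 8 iterations, so `tilesEvenly` holds for each and neither dimension would need the check.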

mlir/test/Dialect/SCF/parallel-loop-tiling-inbound-check.mlir

This file was added.

				// RUN: mlir-opt %s -pass-pipeline='builtin.func(parallel-loop-tiling{parallel-loop-tile-sizes=1,4 no-min-max-bounds=true})' -split-input-file | FileCheck %s

				func @parallel_loop(%arg0 : index, %arg1 : index, %arg2 : index,
				%arg3 : index, %arg4 : index, %arg5 : index,
				%A: memref<?x?xf32>, %B: memref<?x?xf32>,
				%C: memref<?x?xf32>, %result: memref<?x?xf32>) {
				scf.parallel (%i0, %i1) = (%arg0, %arg1) to (%arg2, %arg3) step (%arg4, %arg5) {
				%B_elem = memref.load %B[%i0, %i1] : memref<?x?xf32>
				%C_elem = memref.load %C[%i0, %i1] : memref<?x?xf32>
				%sum_elem = addf %B_elem, %C_elem : f32
				memref.store %sum_elem, %result[%i0, %i1] : memref<?x?xf32>
				}
				return
				}

				// CHECK-LABEL: func @parallel_loop(
				// CHECK-SAME: [[ARG1:%.*]]: index, [[ARG2:%.*]]: index, [[ARG3:%.*]]: index, [[ARG4:%.*]]: index, [[ARG5:%.*]]: index, [[ARG6:%.*]]: index, [[ARG7:%.*]]: memref<?x?xf32>, [[ARG8:%.*]]: memref<?x?xf32>, [[ARG9:%.*]]: memref<?x?xf32>, [[ARG10:%.*]]: memref<?x?xf32>) {
				// CHECK: [[C0:%.*]] = constant 0 : index
				// CHECK: [[C1:%.*]] = constant 1 : index
				// CHECK: [[C4:%.*]] = constant 4 : index
				// CHECK: [[V1:%.*]] = muli [[ARG5]], [[C1]] : index
				// CHECK: [[V2:%.*]] = muli [[ARG6]], [[C4]] : index
				// CHECK: scf.parallel ([[V3:%.*]], [[V4:%.*]]) = ([[ARG1]], [[ARG2]]) to ([[ARG3]], [[ARG4]]) step ([[V1]], [[V2]]) {
				// CHECK: scf.parallel ([[V7:%.*]], [[V8:%.*]]) = ([[C0]], [[C0]]) to ([[V1]], [[V2]]) step ([[ARG5]], [[ARG6]]) {
				// CHECK: [[V9:%.*]] = addi [[V7]], [[V3]] : index
				// CHECK: [[V10:%.*]] = addi [[V8]], [[V4]] : index
				// CHECK: %true = constant true
				// CHECK: [[V11:%.*]] = muli [[V7]], [[ARG5]] : index
				// CHECK: [[V12:%.*]] = addi [[V11]], [[V3]] : index
				// CHECK: [[V13:%.*]] = cmpi ult, [[V12]], [[ARG3]] : index
				// CHECK: [[V14:%.*]] = and %true, [[V13]] : i1
				// CHECK: [[V15:%.*]] = muli [[V8]], [[ARG6]] : index
				// CHECK: [[V16:%.*]] = addi [[V15]], [[V4]] : index
				// CHECK: [[V17:%.*]] = cmpi ult, [[V16]], [[ARG4]] : index
				// CHECK: [[V18:%.*]] = and [[V14]], [[V17]] : i1
				// CHECK: scf.if [[V18]] {
				// CHECK: [[V19:%.*]] = memref.load [[ARG8]]{{\[}}[[V9]], [[V10]]] : memref<?x?xf32>
				// CHECK: [[V20:%.*]] = memref.load [[ARG9]]{{\[}}[[V9]], [[V10]]] : memref<?x?xf32>
				// CHECK: [[V21:%.*]] = addf [[V19]], [[V20]] : f32
				// CHECK: memref.store [[V21]], [[ARG10]]{{\[}}[[V9]], [[V10]]] : memref<?x?xf32>
				// CHECK: }
				// CHECK: }
				// CHECK: }
				// CHECK: return

				// -----

				func @static_loop_with_step() {
				%c0 = constant 0 : index
				%c3 = constant 3 : index
				%c22 = constant 22 : index
				%c24 = constant 24 : index
				scf.parallel (%i0, %i1) = (%c0, %c0) to (%c22, %c24) step (%c3, %c3) {
				}
				return
				}

				// CHECK-LABEL: func @static_loop_with_step() {
				// CHECK: [[C0:%.*]] = constant 0 : index
				// CHECK: [[C3:%.*]] = constant 3 : index
				// CHECK: [[C22:%.*]] = constant 22 : index
				// CHECK: [[C24:%.*]] = constant 24 : index
				// CHECK: [[C0_1:%.*]] = constant 0 : index
				// CHECK: [[C1:%.*]] = constant 1 : index
				// CHECK: [[C4:%.*]] = constant 4 : index
				// CHECK: [[V1:%.*]] = muli [[C3]], [[C1]] : index
				// CHECK: [[V2:%.*]] = muli [[C3]], [[C4]] : index
				// CHECK: scf.parallel ([[V3:%.*]], [[V4:%.*]]) = ([[C0]], [[C0]]) to ([[C22]], [[C24]]) step ([[V1]], [[V2]]) {
				// CHECK: scf.parallel ([[V5:%.*]], [[V6:%.*]]) = ([[C0_1]], [[C0_1]]) to ([[V1]], [[V2]]) step ([[C3]], [[C3]]) {
				// CHECK-NOT: scf.if
				// CHECK: = addi [[V5]], [[V3]] : index
				// CHECK: = addi [[V6]], [[V4]] : index
				// CHECK: }
				// CHECK: }
				// CHECK: return

				// -----

				func @tile_nested_innermost() {
				%c2 = constant 2 : index
				%c0 = constant 0 : index
				%c1 = constant 1 : index
				scf.parallel (%i, %j) = (%c0, %c0) to (%c2, %c2) step (%c1, %c1) {
				scf.parallel (%k, %l) = (%c0, %c0) to (%c2, %c2) step (%c1, %c1) {
				}
				}
				scf.parallel (%i, %j) = (%c0, %c0) to (%c2, %c2) step (%c1, %c1) {
				}
				return
				}

				// CHECK-LABEL: func @tile_nested_innermost() {
				// CHECK: [[C2:%.*]] = constant 2 : index
				// CHECK: [[C0:%.*]] = constant 0 : index
				// CHECK: [[C1:%.*]] = constant 1 : index
				// CHECK: scf.parallel ([[V1:%.*]], [[V2:%.*]]) = ([[C0]], [[C0]]) to ([[C2]], [[C2]]) step ([[C1]], [[C1]]) {
				// CHECK: [[C0_1:%.*]] = constant 0 : index
				// CHECK: [[C1_1:%.*]] = constant 1 : index
				// CHECK: [[C4:%.*]] = constant 4 : index
				// CHECK: [[V3:%.*]] = muli [[C1]], [[C1_1]] : index
				// CHECK: [[V4:%.*]] = muli [[C1]], [[C4]] : index
				// CHECK: scf.parallel ([[V5:%.*]], [[V6:%.*]]) = ([[C0]], [[C0]]) to ([[C2]], [[C2]]) step ([[V3]], [[V4]]) {
				// CHECK: scf.parallel ([[V8:%.*]], [[V9:%.*]]) = ([[C0_1]], [[C0_1]]) to ([[V3]], [[V4]]) step ([[C1]], [[C1]]) {
				// CHECK: = addi [[V8]], [[V5]] : index
				// CHECK: = addi [[V9]], [[V6]] : index
				// CHECK: scf.if
				// CHECK: }
				// CHECK: }
				// CHECK: }
				// CHECK: [[C0_2:%.*]] = constant 0 : index
				// CHECK: [[C1_2:%.*]] = constant 1 : index
				// CHECK: [[C4_1:%.*]] = constant 4 : index
				// CHECK: [[V10:%.*]] = muli [[C1]], [[C1_2]] : index
				// CHECK: [[V11:%.*]] = muli [[C1]], [[C4_1]] : index
				// CHECK: scf.parallel ([[V12:%.*]], [[V13:%.*]]) = ([[C0]], [[C0]]) to ([[C2]], [[C2]]) step ([[V10]], [[V11]]) {
				// CHECK: scf.parallel ([[V15:%.*]], [[V16:%.*]]) = ([[C0_2]], [[C0_2]]) to ([[V10]], [[V11]]) step ([[C1]], [[C1]]) {
				// CHECK: = addi [[V15]], [[V12]] : index
				// CHECK: = addi [[V16]], [[V13]] : index
				// CHECK: scf.if
				// CHECK: }
				// CHECK: }
				// CHECK: return
				// CHECK: }

				// -----

				func @tile_nested_in_non_ploop() {
				%c0 = constant 0 : index
				%c1 = constant 1 : index
				%c2 = constant 2 : index
				scf.for %i = %c0 to %c2 step %c1 {
				scf.for %j = %c0 to %c2 step %c1 {
				scf.parallel (%k, %l) = (%c0, %c0) to (%c2, %c2) step (%c1, %c1) {
				}
				}
				}
				return
				}

				// CHECK-LABEL: func @tile_nested_in_non_ploop
				// CHECK: scf.for
				// CHECK: scf.for
				// CHECK: scf.parallel
				// CHECK: scf.parallel
				// CHECK: }
				// CHECK: }
				// CHECK: }
				// CHECK: }
				// CHECK: }
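
The difference between the two tiling modes exercised by these tests can be sketched in plain C++. This is a simplified one-dimensional model, not the pass implementation: the function names (`untiled`, `tiledWithMin`, `tiledWithInboundCheck`) are hypothetical, a positive step and tile size are assumed, and the guard below compares the reconstructed index against the upper bound rather than reproducing the exact arithmetic the pass emits.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Indices visited by the original loop: for (i = lb; i < ub; i += step).
std::vector<int64_t> untiled(int64_t lb, int64_t ub, int64_t step) {
  std::vector<int64_t> visited;
  for (int64_t i = lb; i < ub; i += step)
    visited.push_back(i);
  return visited;
}

// Default mode (modelled): the inner upper bound is clamped with a min,
// so the last tile simply runs fewer iterations.
std::vector<int64_t> tiledWithMin(int64_t lb, int64_t ub, int64_t step,
                                  int64_t tile) {
  std::vector<int64_t> visited;
  const int64_t outerStep = step * tile;
  for (int64_t outer = lb; outer < ub; outer += outerStep) {
    const int64_t innerUb = std::min(outerStep, ub - outer);
    for (int64_t inner = 0; inner < innerUb; inner += step)
      visited.push_back(outer + inner);
  }
  return visited;
}

// Inbound-check mode (modelled): the inner upper bound is always the outer
// step, and the body is guarded by an explicit bounds check instead.
std::vector<int64_t> tiledWithInboundCheck(int64_t lb, int64_t ub,
                                           int64_t step, int64_t tile) {
  std::vector<int64_t> visited;
  const int64_t outerStep = step * tile;
  for (int64_t outer = lb; outer < ub; outer += outerStep) {
    for (int64_t inner = 0; inner < outerStep; inner += step) {
      if (outer + inner < ub) // plays the role of the scf.if guard
        visited.push_back(outer + inner);
    }
  }
  return visited;
}
```

In this model both tiled variants visit the same indices as the untiled loop; the inbound-check variant trades the data-dependent inner bound for a fixed trip count plus a guard, which is the shape the summary describes as convenient for mapping the outer and inner loops to GPU blocks and threads separately.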
