
Details

Reviewers

herhut
bondhugula

Commits

rG27c201aa1d97: [MLIR] Add parallel loop collapsing.

Summary

This allows conversion of a ParallelLoop from N induction variables to
some number of induction variables fewer than N.

The first intended use of this is for the GPUDialect to convert
ParallelLoops to iterate over 3 dimensions so that they can be launched
as GPU kernels.

To implement this:
- Normalize each iteration space of the ParallelLoop so that it starts
  at 0 with step 1.
- Use the same induction variable in a new ParallelLoop for multiple
  original dimensions.
- Split the new induction variable back into the original set of values
  inside the body of the ParallelLoop.
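The three steps above can be modeled with plain integer arithmetic. The C++ sketch below is a scalar reference model, not MLIR code: `Dim`, `tripCount`, `collapsedTripCount`, and `splitIndex` are hypothetical names, steps are assumed strictly positive (as in the patch's normalization), and letting the last dimension of a group vary fastest is this sketch's own convention, which may differ from the patch's ordering.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// One ParallelLoop dimension modeled as integers: lb, lb + step, ... < ub.
struct Dim {
  int64_t lb;
  int64_t ub;
  int64_t step; // assumed strictly positive
};

// Step 1 (normalize): the dimension becomes 0 .. tripCount with step 1, where
// tripCount = ceildiv(ub - lb, step). Empty ranges are clamped to 0 here.
int64_t tripCount(const Dim &d) {
  assert(d.step > 0 && "only strictly positive steps are modeled");
  int64_t diff = d.ub - d.lb;
  return diff <= 0 ? 0 : (diff + d.step - 1) / d.step;
}

// Step 2 (combine): one induction variable covers a whole group of
// dimensions, so its upper bound is the product of their trip counts.
int64_t collapsedTripCount(const std::vector<Dim> &group) {
  int64_t total = 1;
  for (const Dim &d : group)
    total *= tripCount(d);
  return total;
}

// Step 3 (split): recover each original induction value from the collapsed
// one using rem/div by the normalized trip counts, then undo the
// normalization with iv = normalized * step + lb.
std::vector<int64_t> splitIndex(int64_t collapsed,
                                const std::vector<Dim> &group) {
  assert(collapsed >= 0 && collapsed < collapsedTripCount(group));
  std::vector<int64_t> ivs(group.size());
  for (std::size_t k = group.size(); k-- > 0;) {
    int64_t n = tripCount(group[k]);
    int64_t normalized = collapsed % n; // position within this dimension
    collapsed /= n;                     // remaining, slower dimensions
    ivs[k] = normalized * group[k].step + group[k].lb;
  }
  return ivs;
}
```

For example, the group `{2, 8, 3}` (values 2, 5) and `{0, 5, 2}` (values 0, 2, 4) collapses into a single dimension of 6 iterations, and collapsed index 4 maps back to the pair (5, 2).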

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

tpopp created this revision. Mar 18 2020, 7:22 AM

Herald added a project: Restricted Project. Mar 18 2020, 7:22 AM

Herald added subscribers: llvm-commits, Joonsoo, liufengdb and 12 others.

Harbormaster failed remote builds in B49594: Diff 251076! Mar 18 2020, 8:09 AM

Refactored some code to be more proper.

tpopp added a reviewer: herhut. Mar 18 2020, 8:33 AM

Save loops.getLoc() in a variable and use that variable everywhere instead.

Would it *please* be possible for MLIR patches to be named as such, please?

lebedev.ri retitled this revision from "Add parallel loop coalescing." to "[MLIR] Add parallel loop coalescing." Mar 18 2020, 8:51 AM

Harbormaster failed remote builds in B49604: Diff 251096! Mar 18 2020, 9:15 AM

Harbormaster failed remote builds in B49607: Diff 251099! Mar 18 2020, 9:47 AM

(Just some drive-by nits)

mlir/include/mlir/InitAllPasses.h
115	Let's keep these sorted.
mlir/include/mlir/Transforms/LoopUtils.h
229	Can you please add a comment here?
mlir/lib/Transforms/ParallelLoopCoalescing.cpp
23 ↗	(On Diff #251099)	Can you use pass options instead? https://mlir.llvm.org/docs/WritingAPass/#instance-specific-pass-options
mlir/lib/Transforms/Utils/LoopUtils.cpp
976–983	nit: Use /// for top-level comments
1000–1001	nit: Please drop all trivial braces.
1032	nit: Please use /// for top-level comments.
1119	nit: Cache the end iterator of the loop, and prefer pre-increment.
1152	Same here and below.
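The loop-style nits above follow the LLVM Coding Standards (evaluate `end()` only once per loop, prefer pre-increment, no braces around single-statement bodies). A minimal standalone C++ illustration; `sumAll` and `countPositive` are hypothetical helpers, not code from the patch:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Iterator form: `e` caches the end iterator so end() is evaluated once,
// `++it` is a pre-increment, and the single-statement body has no braces.
int sumAll(const std::vector<int> &values) {
  int total = 0;
  for (auto it = values.begin(), e = values.end(); it != e; ++it)
    total += *it;
  return total;
}

// Index form, the shape used for the dimension loops in this patch, e.g.
// `for (unsigned i = 0, e = loops.getNumLoops(); i < e; ++i)`.
int countPositive(const std::vector<int> &values) {
  int count = 0;
  for (std::size_t i = 0, e = values.size(); i < e; ++i)
    if (values[i] > 0)
      ++count;
  return count;
}
```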

Could you please add a summary to the commit message, even if it's a couple of lines? On a side note, do you want to use the term 'collapse' instead of 'coalesce'? OpenMP uses collapse for such linearization, and coalesce could also imply fusion of loops, which this isn't. I do see that coalesceLoops already existed prior to this patch.

mlir/lib/Transforms/ParallelLoopCoalescing.cpp
52 ↗	(On Diff #251099)	List initialize?
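For comparison with the OpenMP terminology mentioned above: OpenMP's `collapse(n)` clause combines the iteration spaces of n perfectly nested loops into one larger iteration space before dividing it among threads, which is the same kind of linearization this patch applies to a ParallelLoop. A small C++ illustration (`fillGrid` is a hypothetical function); compiled without `-fopenmp`, the pragma is ignored and the loops run sequentially with the same result:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// collapse(2) asks OpenMP to treat the i/j nest as a single iteration space
// of rows * cols iterations and to divide that space among threads.
std::vector<int> fillGrid(int rows, int cols) {
  assert(rows >= 0 && cols >= 0);
  std::vector<int> grid(static_cast<std::size_t>(rows) * cols);
#pragma omp parallel for collapse(2)
  for (int i = 0; i < rows; ++i)
    for (int j = 0; j < cols; ++j)
      grid[static_cast<std::size_t>(i) * cols + j] = i * 10 + j;
  return grid;
}
```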

This revision now requires changes to proceed. Mar 18 2020, 10:48 AM

In D76363#1929217, @lebedev.ri wrote:

Would it *please* be possible for MLIR patches to be named as such, please?

Where is this coming from?
It'd be nice to clearly understand the goal and where it is coming from. In particular, I don't understand why this isn't the role of separate tooling and instead has to be encoded directly in commit messages. Are we going to prefix down to the last component manually? Is something like [LLVM][Backend][X86][Register allocator] to be expected?

In D76363#1929476, @bondhugula wrote:

Could you please add a summary to the commit message - even if it's a couple of lines?

+1, we document it here: https://mlir.llvm.org/getting_started/Contributing/#commit-messages

In D76363#1930039, @mehdi_amini wrote:

In D76363#1929476, @bondhugula wrote:

Could you please add a summary to the commit message - even if it's a couple of lines?

+1, we document it here: https://mlir.llvm.org/getting_started/Contributing/#commit-messages

I'll add to the summary. Is there a reason arc diff didn't pick up my git commit message? Does it only create the summary based on the first arc diff and not any subsequent calls?

tpopp edited the summary of this revision. Mar 19 2020, 2:07 AM

Handle formatting and naming feedback.

Harbormaster completed remote builds in B49714: Diff 251306. Mar 19 2020, 2:41 AM

Use OptionList instead of llvm::cl::list

Thanks. Looks great with some nits.

mlir/lib/Transforms/ParallelLoopCoalescing.cpp
44 ↗	(On Diff #251099)	Can you make this an `OperationPass` instead?
52 ↗	(On Diff #251099)	Use `llvm::SmallVector` instead?
mlir/lib/Transforms/Utils/LoopUtils.cpp
979	Maybe have a little struct here instead of a tuple? Or use `std::tie` at use sites to improve readability.
983	Move these closer to their first use.
1006	Maybe `Value newLowerBound = isZeroBased ? lowerBound : boundsBuilder.create<ConstantIndexOp>(loc, 0)`?
1012	Here, too?
1050	Mega-nit: The order lower, step, upper is strange...
1138	Why not `newUpperBound = cst1` here?
1154	A comment what this computes would help readability.
1158	Should this be the normalized upper bound?
1160	It would read easier for me if updating previous was also done here except for the last case. Would that make sense?
1163	Normalized here, too?
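herhut's suggestions above (a small struct instead of `std::tuple`, or `std::tie` at use sites, plus ternary initialization of the new bounds) can be sketched in plain C++. `LoopParams`, `normalize`, `normalizeAsTuple`, and `normalizedTripCount` are hypothetical names; plain `long` values stand in for the MLIR `Value`s that the real `normalizeLoop` returns, and a strictly positive step with `ub >= lb` is assumed.

```cpp
#include <cassert>
#include <tuple>

// Named fields make call sites self-describing (p.upperBound rather than
// std::get<1>(p)), which also sidesteps the lower/step/upper ordering nit.
struct LoopParams {
  long lowerBound;
  long upperBound;
  long step;
};

LoopParams normalize(long lb, long ub, long step) {
  assert(step > 0 && "only strictly positive steps are modeled");
  bool isZeroBased = lb == 0;
  bool isStepOne = step == 1;
  if (isZeroBased && isStepOne)
    return {lb, ub, step};
  // Ternary initialization, as suggested for newLowerBound and newStep.
  long newLowerBound = isZeroBased ? lb : 0;
  long newStep = isStepOne ? step : 1;
  long newUpperBound = (ub - lb + step - 1) / step; // ceildiv(ub - lb, step)
  return {newLowerBound, newUpperBound, newStep};
}

// Keeping the tuple is also possible; std::tie then names the pieces at the
// use site instead of std::get<0>, std::get<1>, std::get<2>.
std::tuple<long, long, long> normalizeAsTuple(long lb, long ub, long step) {
  LoopParams p = normalize(lb, ub, step);
  return std::make_tuple(p.lowerBound, p.upperBound, p.step);
}

long normalizedTripCount(long lb, long ub, long step) {
  long lower, upper, newStep;
  std::tie(lower, upper, newStep) = normalizeAsTuple(lb, ub, step);
  return (upper - lower) / newStep;
}
```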

This revision now requires changes to proceed. Mar 19 2020, 2:49 AM

Harbormaster failed remote builds in B49717: Diff 251312! Mar 19 2020, 3:45 AM

tpopp marked 3 inline comments as done. Mar 19 2020, 3:56 AM

Handle herhut's comments and fix broken collapsing
logic that used the wrong upper bound value.
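Why the divisor has to be the normalized upper bound (the trip count), as herhut asked in the review: after normalization, the collapsed index enumerates trip-count positions, so splitting it with the original upper bound yields out-of-range or repeated positions whenever a dimension does not already start at 0 with step 1. A hypothetical two-dimensional check in plain C++; for instance, an inner dimension `4 to 10 step 2` has trip count 3 but upper bound 10.

```cpp
#include <cassert>
#include <cstdint>
#include <set>
#include <utility>

// Split a collapsed index into (outer, inner) normalized positions.
std::pair<int64_t, int64_t> split(int64_t collapsed, int64_t divisor) {
  return {collapsed / divisor, collapsed % divisor};
}

// True if splitting every collapsed index in [0, outerTrips * innerTrips)
// with `divisor` gives in-range, pairwise-distinct (outer, inner) positions.
bool splitIsBijective(int64_t outerTrips, int64_t innerTrips,
                      int64_t divisor) {
  std::set<std::pair<int64_t, int64_t>> seen;
  for (int64_t c = 0, e = outerTrips * innerTrips; c != e; ++c) {
    std::pair<int64_t, int64_t> pos = split(c, divisor);
    if (pos.first >= outerTrips || pos.second >= innerTrips)
      return false; // position outside the original iteration space
    if (!seen.insert(pos).second)
      return false; // two collapsed indices map to the same position
  }
  return true;
}
```

With the trip count 3 as divisor, every collapsed index in [0, 6) maps to a distinct valid pair; with the original upper bound 10, collapsed index 3 already maps to inner position 3, which lies outside [0, 3).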

tpopp marked an inline comment as done. Mar 19 2020, 6:14 AM

tpopp added inline comments.

mlir/lib/Transforms/Utils/LoopUtils.cpp
1138	No real reason. I thought it would be easier for debugging purposes if each string of calculations is fully unconnected from other calculations.
1158	Yes
1163	Yes

tpopp marked an inline comment as done. Mar 19 2020, 6:28 AM

tpopp added inline comments.

mlir/lib/Transforms/Utils/LoopUtils.cpp
1160	I think this trades one mess for a different one because then it's just a different bounds check and not all indexing is happening at ivar_idx anymore.

Change loop form inside of collapsePLoops.

tpopp marked 2 inline comments as done. Mar 19 2020, 6:45 AM

tpopp added inline comments.

mlir/lib/Transforms/Utils/LoopUtils.cpp
1160	I tried to restructure it to be more readable.

Harbormaster failed remote builds in B49735: Diff 251361! Mar 19 2020, 6:59 AM

Harbormaster failed remote builds in B49739: Diff 251369! Mar 19 2020, 7:32 AM

Use correct variable to fix undefined variable error.

Harbormaster failed remote builds in B49849: Diff 251574! Mar 20 2020, 3:13 AM

Rename variable for clang-tidy reasons.

Harbormaster completed remote builds in B49870: Diff 251610. Mar 20 2020, 5:55 AM

Thanks.

This revision was not accepted when it landed; it landed in state Needs Review. Mar 26 2020, 1:35 AM

Closed by commit rG27c201aa1d97: [MLIR] Add parallel loop collapsing. (authored by Tres Popp <tpopp@google.com>).

This revision was automatically updated to reflect the committed changes.

In D76363#1930727, @tpopp wrote:

In D76363#1930039, @mehdi_amini wrote:

In D76363#1929476, @bondhugula wrote:

Could you please add a summary to the commit message - even if it's a couple of lines?

+1, we document it here: https://mlir.llvm.org/getting_started/Contributing/#commit-messages

I'll add to the summary. Is there a reason arc diff didn't pick up my git commit message? Does it only create the summary based on the first arc diff and not any subsequent calls?

I've noticed this too. It doesn't pick up subsequent commit message updates. I just manually update the commit message here when needed.

Mostly minor suggestions on readability

mlir/lib/Transforms/Utils/LoopUtils.cpp
1045	`OpBuilder innerBuilder(inner.getBody())` will be sufficient.
1109	Better to spell this out? PLoops -> ParallelLoops?
1130	Nit: period at the end.
mlir/test/Transforms/parallel-loop-collapsing.mlir
5–14 ↗	(On Diff #252767)	Can you drop the extra indent? Also VAL_0 -> C6, VAL_1 -> C7, ...?
31 ↗	(On Diff #252767)	Can you use more descriptive names? VAL_10 -> IV0, VAL_11 -> IV1, ...
34 ↗	(On Diff #252767)	Drop the additional indent between the CHECK and the string?
49 ↗	(On Diff #252767)	CHECK-NEXT
50–51 ↗	(On Diff #252767)	CHECK-NEXT
mlir/test/Transforms/single-parallel-loop-collapsing.mlir
26 ↗	(On Diff #252767)	VAL_14 isn't used; no need to capture it.
32–33 ↗	(On Diff #252767)	CHECK-NEXT
34 ↗	(On Diff #252767)	Not needed.
35 ↗	(On Diff #252767)	Drop trailing blank lines.

Diff 251306

mlir/include/mlir/InitAllPasses.h

@@ 102 lines not shown @@
 #endif
 createLinalgTilingToParallelLoopsPass();
 createLinalgPromotionPass(0);
 createConvertLinalgToLoopsPass();
 createConvertLinalgToParallelLoopsPass();
 createConvertLinalgToAffineLoopsPass();
 createConvertLinalgToLLVMPass();
 
 // LoopOps
+createParallelLoopCollapsingPass();
 createParallelLoopFusionPass();
 createParallelLoopSpecializationPass();
 createParallelLoopTilingPass();
 
    [rriddle, Done] Let's keep these sorted.
 // QuantOps
 quant::createConvertSimulatedQuantPass();
 quant::createConvertConstPass();
 quantizer::createAddDefaultStatsPass();
 quantizer::createRemoveInstrumentationPass();
 quantizer::registerInferQuantizedTypesPass();
 
 // SPIR-V
@@ 16 lines not shown @@

mlir/include/mlir/Transforms/LoopUtils.h

@@ 22 lines not shown @@
 class AffineForOp;
 class FuncOp;
 class OpBuilder;
 class Value;
 struct MemRefRegion;
 
 namespace loop {
 class ForOp;
+class ParallelOp;
 } // end namespace loop
 
 /// Unrolls this for operation completely if the trip count is known to be
 /// constant. Returns failure otherwise.
 LogicalResult loopUnrollFull(AffineForOp forOp);
 
 /// Unrolls this for operation by the specified unroll factor. Returns failure
 /// if the loop cannot be unrolled either due to restrictions or due to invalid
@@ 181 lines not shown @@
 TileLoops extractFixedOuterLoops(loop::ForOp rootFOrOp,
                                  ArrayRef<int64_t> sizes);
 
 /// Replace a perfect nest of "for" loops with a single linearized loop. Assumes
 /// `loops` contains a list of perfectly nested loops with bounds and steps
 /// independent of any loop induction variable involved in the nest.
 void coalesceLoops(MutableArrayRef<loop::ForOp> loops);
 
+/// Take the ParallelLoop and for each set of dimension indices, combine them
    [rriddle, Done] Can you please add a comment here?
+/// into a single dimension. combinedDimensions must contain each index into
+/// loops exactly once.
+void collapsePLoops(loop::ParallelOp loops,
+                    std::vector<std::vector<unsigned>> combinedDimensions);
+
 /// Maps `forOp` for execution on a parallel grid of virtual `processorIds` of
 /// size given by `numProcessors`. This is achieved by embedding the SSA values
 /// corresponding to `processorIds` and `numProcessors` into the bounds and step
 /// of the `forOp`. No check is performed on the legality of the rewrite, it is
 /// the caller's responsibility to ensure legality.
 ///
 /// Requires that `processorIds` and `numProcessors` have the same size and that
 /// for each idx, `processorIds`[idx] takes, at runtime, all values between 0
@@ 33 lines not shown @@
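The documented precondition on `combinedDimensions` (each loop index used exactly once) can be checked with a helper such as the hypothetical `isValidGrouping` below. `groupIntoThree` is likewise hypothetical: one possible policy, not taken from the patch, for the GPU use case in the summary, keeping dimensions 0 and 1 separate and collapsing all remaining dimensions into a third.

```cpp
#include <cassert>
#include <vector>

using Grouping = std::vector<std::vector<unsigned>>;

// Returns true if every index in [0, numLoops) appears in exactly one group.
bool isValidGrouping(const Grouping &combinedDimensions, unsigned numLoops) {
  std::vector<bool> used(numLoops, false);
  unsigned seen = 0;
  for (const std::vector<unsigned> &group : combinedDimensions) {
    for (unsigned idx : group) {
      if (idx >= numLoops || used[idx])
        return false; // out of range or used twice
      used[idx] = true;
      ++seen;
    }
  }
  return seen == numLoops; // no dimension left out
}

// Hypothetical policy: dimensions 0 and 1 stay separate and all remaining
// dimensions are collapsed into a third one, giving at most three groups.
Grouping groupIntoThree(unsigned numLoops) {
  Grouping groups;
  for (unsigned i = 0; i < numLoops && i < 2; ++i)
    groups.push_back({i});
  if (numLoops > 2) {
    std::vector<unsigned> rest;
    for (unsigned i = 2; i < numLoops; ++i)
      rest.push_back(i);
    groups.push_back(rest);
  }
  return groups;
}
```

For a five-dimensional loop, `groupIntoThree(5)` yields `{{0}, {1}, {2, 3, 4}}`, which passes `isValidGrouping`.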

mlir/include/mlir/Transforms/Passes.h

@@ 85 lines not shown @@
 /// Creates a pass to perform tiling on loop nests.
 std::unique_ptr<OpPassBase<FuncOp>>
 createLoopTilingPass(uint64_t cacheSizeBytes);
 
 /// Creates a pass that transforms perfectly nested loops with independent
 /// bounds into a single loop.
 std::unique_ptr<OpPassBase<FuncOp>> createLoopCoalescingPass();
 
+/// Creates a pass that transforms a single ParallelLoop over N induction
+/// variables into another ParallelLoop over less than N induction variables.
+std::unique_ptr<OpPassBase<FuncOp>> createParallelLoopCollapsingPass();
+
 /// Performs packing (or explicit copying) of accessed memref regions into
 /// buffers in the specified faster memory space through either pointwise copies
 /// or DMA operations.
 std::unique_ptr<OpPassBase<FuncOp>> createAffineDataCopyGenerationPass(
     unsigned slowMemorySpace, unsigned fastMemorySpace,
     unsigned tagMemorySpace = 0, int minDmaTransferSize = 1024,
     uint64_t fastMemCapacityBytes = std::numeric_limits<uint64_t>::max());
 
@@ 21 lines not shown @@

mlir/lib/Transforms/CMakeLists.txt

@@ 10 lines not shown @@ add_mlir_library(MLIRTransforms
   LoopCoalescing.cpp
   LoopFusion.cpp
   LoopInvariantCodeMotion.cpp
   LoopTiling.cpp
   LoopUnrollAndJam.cpp
   LoopUnroll.cpp
   MemRefDataFlowOpt.cpp
   OpStats.cpp
+  ParallelLoopCollapsing.cpp
   PipelineDataTransfer.cpp
   SimplifyAffineStructures.cpp
   StripDebugInfo.cpp
   SymbolDCE.cpp
   Vectorize.cpp
   ViewOpGraph.cpp
   ViewRegionGraph.cpp
 
@@ 17 lines not shown @@

mlir/lib/Transforms/Utils/LoopUtils.cpp

@@ 967 lines not shown @@
 replaceAllUsesExcept(Value orig, Value replacement,
                      const SmallPtrSetImpl<Operation *> &exceptions) {
   for (auto &use : llvm::make_early_inc_range(orig.getUses())) {
     if (exceptions.count(use.getOwner()) == 0)
       use.set(replacement);
   }
 }
 
-// Transform a loop with a strictly positive step
-//   for %i = %lb to %ub step %s
-// into a 0-based loop with step 1
-//   for %ii = 0 to ceildiv(%ub - %lb, %s) step 1 {
-//     %i = %ii * %s + %lb
-// Insert the induction variable remapping in the body of `inner`, which is
-// expected to be either `loop` or another loop perfectly nested under `loop`.
-// Insert the definition of new bounds immediate before `outer`, which is
-// expected to be either `loop` or its parent in the loop nest.
-static void normalizeLoop(loop::ForOp loop, loop::ForOp outer,
-                          loop::ForOp inner) {
-  OpBuilder builder(outer);
-  Location loc = loop.getLoc();
+/// Return the new lower bound, upper bound, and step in that order. Insert any
+/// additional bounds calculations before the given builder and any additional
+/// conversion back to the original loop induction value inside the given Block.
+static std::tuple<Value, Value, Value>
    [herhut, Done] Maybe have a little struct here instead of a tuple? Or use `std::tie` at use sites to improve readability.
+normalizeLoop(OpBuilder &boundsBuilder, OpBuilder &insideLoopBuilder,
+              Location loc, Value lowerBound, Value upperBound, Value step,
+              Value inductionVar) {
+  Value newLowerBound, newUpperBound, newStep;
    [rriddle, Done] nit: Use /// for top-level comments
    [herhut, Done] Move these closer to their first use.
 
   // Check if the loop is already known to have a constant zero lower bound or
   // a constant one step.
   bool isZeroBased = false;
   if (auto ubCst =
-          dyn_cast_or_null<ConstantIndexOp>(loop.lowerBound().getDefiningOp()))
+          dyn_cast_or_null<ConstantIndexOp>(lowerBound.getDefiningOp()))
     isZeroBased = ubCst.getValue() == 0;
 
   bool isStepOne = false;
-  if (auto stepCst =
-          dyn_cast_or_null<ConstantIndexOp>(loop.step().getDefiningOp()))
+  if (auto stepCst = dyn_cast_or_null<ConstantIndexOp>(step.getDefiningOp()))
     isStepOne = stepCst.getValue() == 1;
 
-  if (isZeroBased && isStepOne)
-    return;
-
   // Compute the number of iterations the loop executes: ceildiv(ub - lb, step)
   // assuming the step is strictly positive. Update the bounds and the step
   // of the loop to go from 0 to the number of iterations, if necessary.
   // TODO(zinenko): introduce support for negative steps or emit dynamic asserts
   // on step positivity, whatever gets implemented first.
-  Value diff =
-      builder.create<SubIOp>(loc, loop.upperBound(), loop.lowerBound());
-  Value numIterations = ceilDivPositive(builder, loc, diff, loop.step());
-  loop.setUpperBound(numIterations);
-
-  Value lb = loop.lowerBound();
-  if (!isZeroBased) {
-    Value cst0 = builder.create<ConstantIndexOp>(loc, 0);
-    loop.setLowerBound(cst0);
-  }
-
-  Value step = loop.step();
-  if (!isStepOne) {
-    Value cst1 = builder.create<ConstantIndexOp>(loc, 1);
-    loop.setStep(cst1);
-  }
+  if (isZeroBased && isStepOne)
+    return {lowerBound, upperBound, step};
    [rriddle, Done] nit: Please drop all trivial braces.
+
+  Value diff = boundsBuilder.create<SubIOp>(loc, upperBound, lowerBound);
+  newUpperBound = ceilDivPositive(boundsBuilder, loc, diff, step);
+
    [herhut, Done] Maybe `Value newLowerBound = isZeroBased ? lowerBound : boundsBuilder.create<ConstantIndexOp>(loc, 0)`?
+  if (isZeroBased)
+    newLowerBound = lowerBound;
+  else
+    newLowerBound = boundsBuilder.create<ConstantIndexOp>(loc, 0);
+
    [herhut, Done] Here, too?
+  if (isStepOne)
+    newStep = step;
+  else
+    newStep = boundsBuilder.create<ConstantIndexOp>(loc, 1);
 
   // Insert code computing the value of the original loop induction variable
   // from the "normalized" one.
-  builder.setInsertionPointToStart(inner.getBody());
   Value scaled =
-      isStepOne ? loop.getInductionVar()
-                : builder.create<MulIOp>(loc, loop.getInductionVar(), step);
+      isStepOne ? inductionVar
+                : insideLoopBuilder.create<MulIOp>(loc, inductionVar, step);
   Value shifted =
-      isZeroBased ? scaled : builder.create<AddIOp>(loc, scaled, lb);
+      isZeroBased ? scaled
+                  : insideLoopBuilder.create<AddIOp>(loc, scaled, lowerBound);
 
   SmallPtrSet<Operation *, 2> preserve{scaled.getDefiningOp(),
                                        shifted.getDefiningOp()};
-  replaceAllUsesExcept(loop.getInductionVar(), shifted, preserve);
+  replaceAllUsesExcept(inductionVar, shifted, preserve);
+  return {newLowerBound, newUpperBound, newStep};
+}
+
+/// Transform a loop with a strictly positive step
    [rriddle, Done] nit: Please use /// for top-level comments.
+///   for %i = %lb to %ub step %s
+/// into a 0-based loop with step 1
+///   for %ii = 0 to ceildiv(%ub - %lb, %s) step 1 {
+///     %i = %ii * %s + %lb
+/// Insert the induction variable remapping in the body of `inner`, which is
+/// expected to be either `loop` or another loop perfectly nested under `loop`.
+/// Insert the definition of new bounds immediate before `outer`, which is
+/// expected to be either `loop` or its parent in the loop nest.
+static void normalizeLoop(loop::ForOp loop, loop::ForOp outer,
+                          loop::ForOp inner) {
+  OpBuilder builder(outer);
+  OpBuilder innerBuilder(inner.getBody(), inner.getBody()->begin());
    [bondhugula, Not Done] `OpBuilder innerBuilder(inner.getBody())` will be sufficient.
+  auto loopPieces =
+      normalizeLoop(builder, innerBuilder, loop.getLoc(), loop.lowerBound(),
+                    loop.upperBound(), loop.step(), loop.getInductionVar());
+
+  loop.setLowerBound(std::get<0>(loopPieces));
+  loop.setStep(std::get<2>(loopPieces));
    [herhut, Done] Mega-nit: The order lower, step, upper is strange...
+  loop.setUpperBound(std::get<1>(loopPieces));
 }
 
 void mlir::coalesceLoops(MutableArrayRef<loop::ForOp> loops) {
   if (loops.size() < 2)
     return;
 
   loop::ForOp innermost = loops.back();
   loop::ForOp outermost = loops.front();
@@ 41 lines not shown @@ void mlir::coalesceLoops(MutableArrayRef<loop::ForOp> loops) {
   loop::ForOp second = loops[1];
   innermost.getBody()->back().erase();
   outermost.getBody()->getOperations().splice(
       Block::iterator(second.getOperation()),
       innermost.getBody()->getOperations());
   second.erase();
 }
 
+void mlir::collapsePLoops(
    [bondhugula, Not Done] Better to spell this out? PLoops -> ParallelLoops?
+    loop::ParallelOp loops,
+    std::vector<std::vector<unsigned>> combinedDimensions) {
+  OpBuilder outsideBuilder(loops);
+  Location loc = loops.getLoc();
+
+  // Normalize ParallelOp's iteration pattern.
+  SmallVector<Value, 3> normalizedLowerBounds;
+  SmallVector<Value, 3> normalizedSteps;
+  SmallVector<Value, 3> normalizedUpperBounds;
+  for (unsigned i = 0, e = loops.getNumLoops(); i < e; ++i) {
    [rriddle, Done] nit: Cache the end iterator of the loop, and prefer pre-increment.
+    OpBuilder insideLoopBuilder(loops.getBody(), loops.getBody()->begin());
+    auto resultBounds =
+        normalizeLoop(outsideBuilder, insideLoopBuilder, loc,
+                      loops.lowerBound()[i], loops.upperBound()[i],
+                      loops.step()[i], loops.getBody()->getArgument(i));
+
+    normalizedLowerBounds.push_back(std::get<0>(resultBounds));
+    normalizedUpperBounds.push_back(std::get<1>(resultBounds));
+    normalizedSteps.push_back(std::get<2>(resultBounds));
+  }
+
    [bondhugula, Not Done] Nit: period at the end.
+  // Combine iteration spaces
+  SmallVector<Value, 3> lowerBounds;
+  SmallVector<Value, 3> steps;
+  SmallVector<Value, 3> upperBounds;
+  auto cst0 = outsideBuilder.create<ConstantIndexOp>(loc, 0);
+  auto cst1 = outsideBuilder.create<ConstantIndexOp>(loc, 1);
+  for (unsigned i = 0, e = combinedDimensions.size(); i < e; ++i) {
+    Value newUpperBound = outsideBuilder.create<ConstantIndexOp>(loc, 1);
    [herhut, Done] Why not `newUpperBound = cst1` here?
    [tpopp, Done] No real reason. I thought it would be easier for debugging purposes if each string of calculations is fully unconnected from other calculations.
+    for (auto idx : combinedDimensions[i]) {
+      newUpperBound = outsideBuilder.create<MulIOp>(loc, newUpperBound,
+                                                    normalizedUpperBounds[idx]);
+    }
+    lowerBounds.push_back(cst0);
+    steps.push_back(cst1);
+    upperBounds.push_back(newUpperBound);
+  }
+
+  // Create new ParallelLoop with conversions to the original induction values.
+  auto newPloop = outsideBuilder.create<loop::ParallelOp>(loc, lowerBounds,
+                                                          upperBounds, steps);
+  OpBuilder insideBuilder(newPloop.getBody(), newPloop.getBody()->begin());
+  for (unsigned i = 0, e = combinedDimensions.size(); i < e; ++i) {
    [rriddle, Done] Same here and below.
+    Value previous = newPloop.getBody()->getArgument(i);
+    for (unsigned idx = 0, e = combinedDimensions[i].size(); idx < e; ++idx) {
    [herhut, Done] A comment what this computes would help readability.
+      unsigned ivar_idx = combinedDimensions[i][idx];
+      if (idx != 0)
+        previous = insideBuilder.create<SignedDivIOp>(
+            loc, previous, loops.upperBound()[ivar_idx]);
    [herhut, Done] Should this be the normalized upper bound?
    [tpopp, Done] Yes
+
+      Value iv = (idx == e - 1)
+                     ? previous
+                     : insideBuilder.create<SignedRemIOp>(
+                           loc, previous, loops.upperBound()[ivar_idx]);
    [herhut, Done] It would read easier for me if updating previous was also done here except for the last case. Would that make sense?
    [tpopp, Done] I think this trades one mess for a different one because then it's just a different bounds check and not all indexing is happening at ivar_idx anymore.
    [tpopp, Done] I tried to restructure it to be more readable.
    [herhut, Done] Normalized here, too?
    [tpopp, Done] Yes
+      replaceAllUsesInRegionWith(loops.getBody()->getArgument(ivar_idx), iv,
+                                 loops.region());
+    }
+  }
+
+  // Replace the old loop with the new loop.
+  loops.getBody()->back().erase();
+  newPloop.getBody()->getOperations().splice(
+      Block::iterator(newPloop.getBody()->back()),
+      loops.getBody()->getOperations());
+  loops.erase();
+}
+
 void mlir::mapLoopToProcessorIds(loop::ForOp forOp, ArrayRef<Value> processorId,
                                  ArrayRef<Value> numProcessors) {
   assert(processorId.size() == numProcessors.size());
   if (processorId.empty())
     return;
 
   OpBuilder b(forOp);
   Location loc(forOp.getLoc());
@@ 747 lines not shown @@

This is an archive of the discontinued LLVM Phabricator instance.

[MLIR] Add parallel loop coalescing.
Closed, Public
