This is an archive of the discontinued LLVM Phabricator instance.

Differential D79184

[MLIR][LoopOps] Adds the loop unroll transformation for loop::ForOp.
ClosedPublic

Authored by andydavis1 on Apr 30 2020, 10:40 AM.

Details

Reviewers

ftynse
nicolasvasilache

Commits

rG93d1108801dd: [MLIR][LoopOps] Adds the loop unroll transformation for loop::ForOp.

Summary

Adds the loop unroll transformation for loop::ForOp.
Adds support for promoting the body of single-iteration loop::ForOps into its containing block.
Adds check tests for loop::ForOps with dynamic and static lower/upper bounds and step.
Care was taken to share code (where possible) with the AffineForOp unroll transformation to ease maintenance and a potential future transition to a LoopLike construct on which loop transformations for different loop types can be implemented.
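
The unroll-by-factor arithmetic for constant bounds can be sketched in plain C++, outside of MLIR. This is a minimal sketch, not the code from the patch: `UnrolledBounds`, `computeUnrolledBounds`, `visitOriginal` and `visitUnrolled` are hypothetical names introduced for illustration, the sketch assumes `0 <= lb <= ub` and a strictly positive step, and it uses a signed unroll factor to keep the arithmetic in one type. It also leaves out the `upperBoundUnrolledCst <= ubCst` assertion from the diff, because the computed value can exceed `ub` when the step does not evenly divide the range (for example lb = 0, ub = 10, step = 3, factor 2 gives 12).

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper type (not part of the patch): bounds of the unrolled
// main loop plus the information needed to emit an epilogue loop.
struct UnrolledBounds {
  int64_t upperBoundUnrolled; // upper bound of the unrolled main loop
  int64_t stepUnrolled;       // step * unrollFactor
  int64_t epilogueLowerBound; // where the epilogue loop starts
  bool needsEpilogue;         // true if leftover iterations remain
};

// Constant-bounds arithmetic, assuming 0 <= lb <= ub and step > 0.
UnrolledBounds computeUnrolledBounds(int64_t lb, int64_t ub, int64_t step,
                                     int64_t unrollFactor) {
  assert(lb >= 0 && lb <= ub && step > 0 && unrollFactor > 0);
  // tripCount = ceilDiv(ub - lb, step) for a non-negative dividend.
  int64_t tripCount = (ub - lb + step - 1) / step;
  int64_t tripCountEvenMultiple = tripCount - (tripCount % unrollFactor);
  int64_t computed = lb + tripCountEvenMultiple * step;
  bool needsEpilogue = computed < ub;
  // Without an epilogue the original upper bound is kept.
  return {needsEpilogue ? computed : ub, step * unrollFactor, computed,
          needsEpilogue};
}

// Induction values visited by `for (iv = lb; iv < ub; iv += step)`.
std::vector<int64_t> visitOriginal(int64_t lb, int64_t ub, int64_t step) {
  std::vector<int64_t> ivs;
  for (int64_t iv = lb; iv < ub; iv += step)
    ivs.push_back(iv);
  return ivs;
}

// Induction values visited by the unrolled main loop and then the epilogue.
std::vector<int64_t> visitUnrolled(int64_t lb, int64_t ub, int64_t step,
                                   int64_t unrollFactor) {
  UnrolledBounds b = computeUnrolledBounds(lb, ub, step, unrollFactor);
  std::vector<int64_t> ivs;
  for (int64_t iv = lb; iv < b.upperBoundUnrolled; iv += b.stepUnrolled)
    for (int64_t i = 0; i < unrollFactor; ++i)
      ivs.push_back(iv + i * step); // iv' = iv + i * step
  if (b.needsEpilogue)
    for (int64_t iv = b.epilogueLowerBound; iv < ub; iv += step)
      ivs.push_back(iv);
  return ivs;
}
```

Comparing `visitUnrolled(lb, ub, step, factor)` against `visitOriginal(lb, ub, step)` is a quick way to check that the main loop plus epilogue covers exactly the original iterations.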

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

andydavis1 created this revision. Apr 30 2020, 10:40 AM

Herald added a reviewer: nicolasvasilache. Apr 30 2020, 10:40 AM

Herald added a project: Restricted Project.

Herald added subscribers: llvm-commits, Kayjukh, frgossen and 15 others.

bondhugula requested changes to this revision. May 1 2020, 12:57 AM

bondhugula added a subscriber: bondhugula.

bondhugula added inline comments.

mlir/lib/Transforms/Utils/LoopUtils.cpp
195	I don't think you need a separate method. Just change the promoteIfSingleIteration to work on LoopLikeOp. You'll just need to add a new getLowerBound interface method (pass the builder for it to create an affine.apply if necessary). The rest like getConstantTripCount / getConstantLowerBound are all already there or easy to add.

This revision now requires changes to proceed. May 1 2020, 12:57 AM

andydavis1 marked an inline comment as done. May 1 2020, 8:22 AM

andydavis1 added inline comments.

mlir/lib/Transforms/Utils/LoopUtils.cpp
195	Thanks. Yes, I agree that we can do more code sharing by moving these changes towards a LoopLike interface kind of a thing (I mentioned that a bit in the description). But if possible, I'd like to do that in a follow-up change, as it's not as simple as switching loop::ForOp to LoopLikeOp, and I'd like to minimize the changes in this patch. Thanks.

andydavis1 requested review of this revision. May 1 2020, 8:23 AM

bondhugula marked an inline comment as done. May 2 2020, 9:22 AM

bondhugula added inline comments.

mlir/lib/Transforms/Utils/LoopUtils.cpp
195	Sure, fine to do that in a follow up patch.

bondhugula removed a reviewer: bondhugula. May 2 2020, 9:23 AM

Thanks, Andy!

mlir/lib/Transforms/Utils/LoopUtils.cpp
140	Would it make sense to share this with affine expression lowering https://github.com/llvm/llvm-project/blob/d3588d0814c4cbc7fca677b4d9634f6e1428a331/mlir/lib/Conversion/AffineToStandard/AffineToStandard.cpp#L149 ?
211	Nit: it feels like you could just call `iv.replaceAllUsesWith` and let it do nothing if there are no uses
489	Since this is only used inside `std::next` below, how about taking `std::prev(..., 1)` and dropping `std::next` ?
504	Nit: drop trivial braces
603	Please document this precondition. I don't think loop::ForOp disallows negative bounds.
613	Nit: drop trivial braces
655	If you take the ceildiv implementations from AffineApplyExpander, you may be able to support negative dividends.
671	This comment looks confusing because it doesn't account for step. (Same issue with the affine version)
mlir/test/Dialect/Loops/loop-unroll.mlir
21	Why is it a DAG? Is there some non-determinism in operation order?
48	Hmm, could you just use the same input (i.e. `@dynamic_loop_unroll`) and match it with different prefixes? All input functions are transformed by all four RUNs and most of them are just ignored in the test.
mlir/test/lib/Transforms/TestLoopUnrolling.cpp
2	Nit: pad until 80 characters
47	Nit: since the required depth is known upfront, how about just storing the loops of this depth in a vector, instead of filtering the vector of all loops later?
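
The remarks on lines 603 and 655 concern the ceiling division used for the trip count. The sketch below, in plain C++ rather than MLIR builder calls, shows why the `(a + (b - 1)) / b` formula from `ceilDivPositive` needs a non-negative dividend when division truncates toward zero. `ceilDivPositiveFormula` and `ceilDivSigned` are hypothetical names for illustration; the signed variant is one common formulation and is not claimed to match AffineApplyExpander line for line.

```cpp
#include <cassert>
#include <cstdint>

// The formula used by ceilDivPositive in the diff, with C++ integer division
// standing in for the rounding-to-zero signed division op.
int64_t ceilDivPositiveFormula(int64_t a, int64_t b) {
  assert(b > 0 && "expected positive divisor");
  return (a + (b - 1)) / b;
}

// A variant that also handles negative dividends (assumption: illustrative
// formulation only; the divisor must still be positive).
int64_t ceilDivSigned(int64_t a, int64_t b) {
  assert(b > 0 && "expected positive divisor");
  return a > 0 ? (a - 1) / b + 1 : -(-a / b);
}
```

For a = 7, b = 2 both functions return 4. For a = -5, b = 4 the exact quotient is -1.25, so the ceiling is -1; `ceilDivSigned` returns -1, while `ceilDivPositiveFormula` computes (-5 + 3) / 4 = -2 / 4, which truncates to 0.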

This revision is now accepted and ready to land. May 5 2020, 3:01 AM

Thanks Alex! Will rebase this with changes in a bit...

mlir/lib/Transforms/Utils/LoopUtils.cpp
140	Thanks for pointing that out. I do think it makes sense to combine those at some point, as the implementation you reference appears more general. I'll give that some thought, perhaps it would make the unrolling implementation more general.
489	In this case, we need to keep the last non-terminator because the loop body is being cloned in place: std::next(srcBlockEnd) can change as unrolled loop bodies are cloned in-place.
655	Thanks. Captured in the TODO here.
mlir/test/Dialect/Loops/loop-unroll.mlir
21	I think that I did see some non-determinism, but these are also DAG because the ordering here for these particular ops is not critical to the transformation (these ops just need to be ordered correctly w.r.t dependences). I've used CHECK and CHECK-NEXT for ops that are critical for the test to show the transformation.
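
The reply on line 489 can be illustrated without MLIR. In the sketch below a `std::list<std::string>` stands in for the operation list of the loop body block, with the last entry playing the role of the terminator; this substitution and the function name `unrollInPlace` are assumptions made for illustration. The point is that `std::next(srcBlockEnd)` is re-evaluated on every loop check, so it refers to the terminator before the first clone is inserted and to the first clone afterwards, which keeps each pass limited to the original body.

```cpp
#include <cassert>
#include <iterator>
#include <list>
#include <string>

// Appends unrollFactor - 1 copies of the non-terminator entries of `block`
// just before its last entry (the "terminator"), cloning in place.
// Precondition: `block` holds at least one body entry plus the terminator.
void unrollInPlace(std::list<std::string> &block, unsigned unrollFactor) {
  assert(block.size() >= 2 && unrollFactor > 0);
  auto terminator = std::prev(block.end());
  // Last non-terminator entry of the original body.
  auto srcBlockEnd = std::prev(block.end(), 2);
  for (unsigned i = 1; i < unrollFactor; ++i) {
    // The end of the source range is recomputed on each check, so clones
    // inserted before the terminator are never copied again in this pass.
    for (auto it = block.begin(); it != std::next(srcBlockEnd); ++it)
      block.insert(terminator, *it);
  }
}
```

Starting from the entries `a`, `b`, `yield` and an unroll factor of 3, the list becomes `a`, `b`, `a`, `b`, `a`, `b`, `yield`. Using the terminator itself as a fixed end of the source range would instead walk into the freshly inserted clones.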

addressing review comments

Rebase

Closed by commit rG93d1108801dd: [MLIR][LoopOps] Adds the loop unroll transformation for loop::ForOp. (authored by Andy Davis <andydavis@google.com>). May 5 2020, 10:47 AM

This revision was automatically updated to reflect the committed changes.

Harbormaster failed remote builds in B55822: Diff 262162! May 5 2020, 10:48 AM

Harbormaster failed remote builds in B55814: Diff 262150! May 5 2020, 11:20 AM

@andydavis1: can you clean up commit messages from extra Phabricator stuff before pushing, please? See https://mlir.llvm.org/getting_started/Contributing/#using-arcanist for a helper function.

Herald added subscribers: jurahul, stephenneuendorffer. May 17 2020, 9:18 PM

rriddle added inline comments. May 27 2020, 1:07 PM

mlir/test/lib/Transforms/TestLoopUnrolling.cpp
24	nit: Static functions go in the top-level namespace.
52	nit: Drop trivial braces.

Revision Contents

Path                                              Size
mlir/include/mlir/Transforms/LoopUtils.h          8 lines
mlir/lib/Transforms/Utils/LoopUtils.cpp           243 lines
mlir/test/Dialect/Loops/loop-unroll.mlir          250 lines
mlir/test/lib/Transforms/CMakeLists.txt           1 line
mlir/test/lib/Transforms/TestLoopUnrolling.cpp    68 lines
mlir/tools/mlir-opt/mlir-opt.cpp                  2 lines

Diff 262150

mlir/include/mlir/Transforms/LoopUtils.h

	[32 lines not shown]
	} // end namespace loop			} // end namespace loop

	/// Unrolls this for operation completely if the trip count is known to be			/// Unrolls this for operation completely if the trip count is known to be
	/// constant. Returns failure otherwise.			/// constant. Returns failure otherwise.
	LogicalResult loopUnrollFull(AffineForOp forOp);			LogicalResult loopUnrollFull(AffineForOp forOp);

	/// Unrolls this for operation by the specified unroll factor. Returns failure			/// Unrolls this for operation by the specified unroll factor. Returns failure
	/// if the loop cannot be unrolled either due to restrictions or due to invalid			/// if the loop cannot be unrolled either due to restrictions or due to invalid
	/// unroll factors.			/// unroll factors. Requires positive loop bounds and step.
	LogicalResult loopUnrollByFactor(AffineForOp forOp, uint64_t unrollFactor);			LogicalResult loopUnrollByFactor(AffineForOp forOp, uint64_t unrollFactor);
				LogicalResult loopUnrollByFactor(loop::ForOp forOp, uint64_t unrollFactor);

	/// Unrolls this loop by the specified unroll factor or its trip count,			/// Unrolls this loop by the specified unroll factor or its trip count,
	/// whichever is lower.			/// whichever is lower.
	LogicalResult loopUnrollUpToFactor(AffineForOp forOp, uint64_t unrollFactor);			LogicalResult loopUnrollUpToFactor(AffineForOp forOp, uint64_t unrollFactor);

	/// Returns true if `loops` is a perfectly nested loop nest, where loops appear			/// Returns true if `loops` is a perfectly nested loop nest, where loops appear
	/// in it from outermost to innermost.			/// in it from outermost to innermost.
	bool LLVM_ATTRIBUTE_UNUSED isPerfectlyNested(ArrayRef<AffineForOp> loops);			bool LLVM_ATTRIBUTE_UNUSED isPerfectlyNested(ArrayRef<AffineForOp> loops);
	[12 lines not shown]
	LogicalResult loopUnrollJamByFactor(AffineForOp forOp,			LogicalResult loopUnrollJamByFactor(AffineForOp forOp,
	uint64_t unrollJamFactor);			uint64_t unrollJamFactor);

	/// Unrolls and jams this loop by the specified factor or by the trip count (if			/// Unrolls and jams this loop by the specified factor or by the trip count (if
	/// constant), whichever is lower.			/// constant), whichever is lower.
	LogicalResult loopUnrollJamUpToFactor(AffineForOp forOp,			LogicalResult loopUnrollJamUpToFactor(AffineForOp forOp,
	uint64_t unrollJamFactor);			uint64_t unrollJamFactor);

	/// Promotes the loop body of a AffineForOp to its containing block if the			/// Promotes the loop body of a AffineForOp/loop::ForOp to its containing block
	/// AffineForOp was known to have a single iteration.			/// if the loop was known to have a single iteration.
	LogicalResult promoteIfSingleIteration(AffineForOp forOp);			LogicalResult promoteIfSingleIteration(AffineForOp forOp);
				LogicalResult promoteIfSingleIteration(loop::ForOp forOp);

	/// Promotes all single iteration AffineForOp's in the Function, i.e., moves			/// Promotes all single iteration AffineForOp's in the Function, i.e., moves
	/// their body into the containing Block.			/// their body into the containing Block.
	void promoteSingleIterationLoops(FuncOp f);			void promoteSingleIterationLoops(FuncOp f);

	/// Skew the operations in an affine.for's body with the specified			/// Skew the operations in an affine.for's body with the specified
	/// operation-wise shifts. The shifts are with respect to the original execution			/// operation-wise shifts. The shifts are with respect to the original execution
	/// order, and are multiplied by the loop 'step' before being applied. If			/// order, and are multiplied by the loop 'step' before being applied. If
	[217 lines not shown]

mlir/lib/Transforms/Utils/LoopUtils.cpp

[18 lines not shown]
#include "mlir/Dialect/Affine/IR/AffineOps.h"		#include "mlir/Dialect/Affine/IR/AffineOps.h"
#include "mlir/Dialect/Affine/IR/AffineValueMap.h"		#include "mlir/Dialect/Affine/IR/AffineValueMap.h"
#include "mlir/Dialect/LoopOps/LoopOps.h"		#include "mlir/Dialect/LoopOps/LoopOps.h"
#include "mlir/IR/AffineMap.h"		#include "mlir/IR/AffineMap.h"
#include "mlir/IR/BlockAndValueMapping.h"		#include "mlir/IR/BlockAndValueMapping.h"
#include "mlir/IR/Function.h"		#include "mlir/IR/Function.h"
#include "mlir/IR/IntegerSet.h"		#include "mlir/IR/IntegerSet.h"
#include "mlir/IR/PatternMatch.h"		#include "mlir/IR/PatternMatch.h"
		#include "mlir/Support/MathExtras.h"
#include "mlir/Transforms/RegionUtils.h"		#include "mlir/Transforms/RegionUtils.h"
#include "mlir/Transforms/Utils.h"		#include "mlir/Transforms/Utils.h"
#include "llvm/ADT/DenseMap.h"		#include "llvm/ADT/DenseMap.h"
#include "llvm/ADT/MapVector.h"		#include "llvm/ADT/MapVector.h"
#include "llvm/ADT/SetVector.h"		#include "llvm/ADT/SetVector.h"
#include "llvm/ADT/SmallPtrSet.h"		#include "llvm/ADT/SmallPtrSet.h"
#include "llvm/Support/Debug.h"		#include "llvm/Support/Debug.h"
#include "llvm/Support/raw_ostream.h"		#include "llvm/Support/raw_ostream.h"
[78 lines not shown]	static void getCleanupLoopLowerBound(AffineForOp forOp, unsigned unrollFactor,
for (auto v : bumpValues)		for (auto v : bumpValues)
if (v.use_empty())		if (v.use_empty())
v.getDefiningOp()->erase();		v.getDefiningOp()->erase();

if (lb.use_empty())		if (lb.use_empty())
lb.erase();		lb.erase();
}		}

		// Build the IR that performs ceil division of a positive value by a constant:
		// ceildiv(a, B) = divis(a + (B-1), B)
		// where divis is rounding-to-zero division.
		static Value ceilDivPositive(OpBuilder &builder, Location loc, Value dividend,
		int64_t divisor) {
		assert(divisor > 0 && "expected positive divisor");
		assert(dividend.getType().isIndex() && "expected index-typed value");

		Value divisorMinusOneCst = builder.create<ConstantIndexOp>(loc, divisor - 1);
		Value divisorCst = builder.create<ConstantIndexOp>(loc, divisor);
		Value sum = builder.create<AddIOp>(loc, dividend, divisorMinusOneCst);
		return builder.create<SignedDivIOp>(loc, sum, divisorCst);
		}

		// Build the IR that performs ceil division of a positive value by another
		// positive value:
		// ceildiv(a, b) = divis(a + (b - 1), b)
		// where divis is rounding-to-zero division.
		static Value ceilDivPositive(OpBuilder &builder, Location loc, Value dividend,
		Value divisor) {
		assert(dividend.getType().isIndex() && "expected index-typed value");

		Value cstOne = builder.create<ConstantIndexOp>(loc, 1);
		Value divisorMinusOne = builder.create<SubIOp>(loc, divisor, cstOne);
		Value sum = builder.create<AddIOp>(loc, dividend, divisorMinusOne);
		return builder.create<SignedDivIOp>(loc, sum, divisor);
		}

/// Promotes the loop body of a forOp to its containing block if the forOp		/// Promotes the loop body of a forOp to its containing block if the forOp
/// was known to have a single iteration.		/// was known to have a single iteration.
// TODO(bondhugula): extend this for arbitrary affine bounds.		// TODO(bondhugula): extend this for arbitrary affine bounds.
LogicalResult mlir::promoteIfSingleIteration(AffineForOp forOp) {		LogicalResult mlir::promoteIfSingleIteration(AffineForOp forOp) {
Optional<uint64_t> tripCount = getConstantTripCount(forOp);		Optional<uint64_t> tripCount = getConstantTripCount(forOp);
if (!tripCount || tripCount.getValue() != 1)		if (!tripCount || tripCount.getValue() != 1)
return failure();		return failure();

[27 lines not shown]	LogicalResult mlir::promoteIfSingleIteration(AffineForOp forOp) {
// containing block.		// containing block.
forOp.getBody()->back().erase();		forOp.getBody()->back().erase();
parentBlock->getOperations().splice(Block::iterator(forOp),		parentBlock->getOperations().splice(Block::iterator(forOp),
forOp.getBody()->getOperations());		forOp.getBody()->getOperations());
forOp.erase();		forOp.erase();
return success();		return success();
}		}

		/// Promotes the loop body of a forOp to its containing block if it can be
		/// determined that the loop has a single iteration.
		LogicalResult mlir::promoteIfSingleIteration(loop::ForOp forOp) {
		auto lbCstOp =
		dyn_cast_or_null<ConstantIndexOp>(forOp.lowerBound().getDefiningOp());
		auto ubCstOp =
		dyn_cast_or_null<ConstantIndexOp>(forOp.upperBound().getDefiningOp());
		auto stepCstOp =
		dyn_cast_or_null<ConstantIndexOp>(forOp.step().getDefiningOp());
		if (!lbCstOp || !ubCstOp || !stepCstOp || lbCstOp.getValue() < 0 ||
		ubCstOp.getValue() < 0 || stepCstOp.getValue() < 0)
		return failure();
		int64_t tripCount = mlir::ceilDiv(ubCstOp.getValue() - lbCstOp.getValue(),
		stepCstOp.getValue());
		if (tripCount != 1)
		return failure();
		auto iv = forOp.getInductionVar();
		iv.replaceAllUsesWith(lbCstOp);

		// Move the loop body operations, except for its terminator, to the loop's
		// containing block.
		auto *parentBlock = forOp.getOperation()->getBlock();
		forOp.getBody()->back().erase();
		parentBlock->getOperations().splice(Block::iterator(forOp),
		forOp.getBody()->getOperations());
		forOp.erase();
		return success();
		}

/// Promotes all single iteration 'for' ops in `f`, i.e., moves		/// Promotes all single iteration 'for' ops in `f`, i.e., moves
/// their body into the containing Block.		/// their body into the containing Block.
void mlir::promoteSingleIterationLoops(FuncOp f) {		void mlir::promoteSingleIterationLoops(FuncOp f) {
// Gathers all innermost loops through a post order pruned walk.		// Gathers all innermost loops through a post order pruned walk.
f.walk([](AffineForOp forOp) { promoteIfSingleIteration(forOp); });		f.walk([](AffineForOp forOp) { promoteIfSingleIteration(forOp); });
}		}

/// Generates an affine.for op with the specified lower and upper bounds		/// Generates an affine.for op with the specified lower and upper bounds
[239 lines not shown]	LogicalResult mlir::loopUnrollUpToFactor(AffineForOp forOp,
Optional<uint64_t> mayBeConstantTripCount = getConstantTripCount(forOp);		Optional<uint64_t> mayBeConstantTripCount = getConstantTripCount(forOp);

if (mayBeConstantTripCount.hasValue() &&		if (mayBeConstantTripCount.hasValue() &&
mayBeConstantTripCount.getValue() < unrollFactor)		mayBeConstantTripCount.getValue() < unrollFactor)
return loopUnrollByFactor(forOp, mayBeConstantTripCount.getValue());		return loopUnrollByFactor(forOp, mayBeConstantTripCount.getValue());
return loopUnrollByFactor(forOp, unrollFactor);		return loopUnrollByFactor(forOp, unrollFactor);
}		}

		// Generates unrolled copies of AffineForOp or loop::ForOp 'loopBodyBlock', with
		// associated 'forOpIV' by 'unrollFactor', calling 'ivRemapFn' to remap
		// 'forOpIV' for each unrolled body.
		static void generateUnrolledLoop(
		Block *loopBodyBlock, Value forOpIV, uint64_t unrollFactor,
		function_ref<Value(unsigned, Value, OpBuilder)> ivRemapFn) {
		// Builder to insert unrolled bodies just before the terminator of the body of
		// 'forOp'.
		auto builder = OpBuilder::atBlockTerminator(loopBodyBlock);

		// Keep a pointer to the last non-terminator operation in the original block
		// so that we know what to clone (since we are doing this in-place).
		Block::iterator srcBlockEnd = std::prev(loopBodyBlock->end(), 2);

		// Unroll the contents of 'forOp' (append unrollFactor - 1 additional copies).
		for (unsigned i = 1; i < unrollFactor; i++) {
		BlockAndValueMapping operandMap;

		// If the induction variable is used, create a remapping to the value for
		// this unrolled instance.
		if (!forOpIV.use_empty()) {
		Value ivUnroll = ivRemapFn(i, forOpIV, builder);
		operandMap.map(forOpIV, ivUnroll);
		}

		// Clone the original body of 'forOp'.
		for (auto it = loopBodyBlock->begin(); it != std::next(srcBlockEnd); it++)
		builder.clone(*it, operandMap);
		}
		}

/// Unrolls this loop by the specified factor. Returns success if the loop		/// Unrolls this loop by the specified factor. Returns success if the loop
/// is successfully unrolled.		/// is successfully unrolled.
LogicalResult mlir::loopUnrollByFactor(AffineForOp forOp,		LogicalResult mlir::loopUnrollByFactor(AffineForOp forOp,
uint64_t unrollFactor) {		uint64_t unrollFactor) {
assert(unrollFactor > 0 && "unroll factor should be positive");		assert(unrollFactor > 0 && "unroll factor should be positive");

if (unrollFactor == 1)		if (unrollFactor == 1)
return promoteIfSingleIteration(forOp);		return promoteIfSingleIteration(forOp);
[35 lines not shown]	if (getLargestDivisorOfTripCount(forOp) % unrollFactor != 0) {
// Adjust upper bound of the original loop; this is the same as the lower		// Adjust upper bound of the original loop; this is the same as the lower
// bound of the cleanup loop.		// bound of the cleanup loop.
forOp.setUpperBound(cleanupOperands, cleanupMap);		forOp.setUpperBound(cleanupOperands, cleanupMap);
}		}

// Scale the step of loop being unrolled by unroll factor.		// Scale the step of loop being unrolled by unroll factor.
int64_t step = forOp.getStep();		int64_t step = forOp.getStep();
forOp.setStep(step * unrollFactor);		forOp.setStep(step * unrollFactor);
		generateUnrolledLoop(forOp.getBody(), forOp.getInductionVar(), unrollFactor,
		[&](unsigned i, Value iv, OpBuilder b) {
		// iv' = iv + i * step
		auto d0 = b.getAffineDimExpr(0);
		auto bumpMap = AffineMap::get(1, 0, d0 + i * step);
		return b.create<AffineApplyOp>(forOp.getLoc(), bumpMap,
		iv);
		});

// Builder to insert unrolled bodies just before the terminator of the body of		// Promote the loop body up if this has turned into a single iteration loop.
// 'forOp'.		promoteIfSingleIteration(forOp);
auto builder = OpBuilder::atBlockTerminator(forOp.getBody());		return success();
		}
// Keep a pointer to the last non-terminator operation in the original block
// so that we know what to clone (since we are doing this in-place).
Block::iterator srcBlockEnd = std::prev(forOp.getBody()->end(), 2);

// Unroll the contents of 'forOp' (append unrollFactor - 1 additional copies).		/// Unrolls 'forOp' by 'unrollFactor', returns success if the loop is unrolled.
auto forOpIV = forOp.getInductionVar();		LogicalResult mlir::loopUnrollByFactor(loop::ForOp forOp,
for (unsigned i = 1; i < unrollFactor; i++) {		uint64_t unrollFactor) {
BlockAndValueMapping operandMap;		assert(unrollFactor > 0 && "expected positive unroll factor");
		if (unrollFactor == 1)
		return promoteIfSingleIteration(forOp);

// If the induction variable is used, create a remapping to the value for		// Return if the loop body is empty.
// this unrolled instance.		if (llvm::hasSingleElement(forOp.getBody()->getOperations()))
if (!forOpIV.use_empty()) {		return success();
// iv' = iv + 1/2/3...unrollFactor-1;
auto d0 = builder.getAffineDimExpr(0);
auto bumpMap = AffineMap::get(1, 0, d0 + i * step);
auto ivUnroll =
builder.create<AffineApplyOp>(forOp.getLoc(), bumpMap, forOpIV);
operandMap.map(forOpIV, ivUnroll);
}

// Clone the original body of 'forOp'.		// Compute tripCount = ceilDiv((upperBound - lowerBound), step) and populate
for (auto it = forOp.getBody()->begin(); it != std::next(srcBlockEnd);		// 'upperBoundUnrolled' and 'stepUnrolled' for static and dynamic cases.
it++) {		OpBuilder boundsBuilder(forOp);
builder.clone(*it, operandMap);		auto loc = forOp.getLoc();
}		auto step = forOp.step();
		Value upperBoundUnrolled;
		Value stepUnrolled;
		bool generateEpilogueLoop = true;

		auto lbCstOp =
		dyn_cast_or_null<ConstantIndexOp>(forOp.lowerBound().getDefiningOp());
		auto ubCstOp =
		dyn_cast_or_null<ConstantIndexOp>(forOp.upperBound().getDefiningOp());
		auto stepCstOp =
		dyn_cast_or_null<ConstantIndexOp>(forOp.step().getDefiningOp());
		if (lbCstOp && ubCstOp && stepCstOp) {
		// Constant loop bounds computation.
		int64_t lbCst = lbCstOp.getValue();
		int64_t ubCst = ubCstOp.getValue();
		int64_t stepCst = stepCstOp.getValue();
		assert(lbCst >= 0 && ubCst >= 0 && stepCst >= 0 &&
		"expected positive loop bounds and step");
		int64_t tripCount = mlir::ceilDiv(ubCst - lbCst, stepCst);
		int64_t tripCountEvenMultiple = tripCount - (tripCount % unrollFactor);
		int64_t upperBoundUnrolledCst = lbCst + tripCountEvenMultiple * stepCst;
		assert(upperBoundUnrolledCst <= ubCst);
		int64_t stepUnrolledCst = stepCst * unrollFactor;

		// Create constant for 'upperBoundUnrolled' and set epilogue loop flag.
		generateEpilogueLoop = upperBoundUnrolledCst < ubCst;
		if (generateEpilogueLoop)
		upperBoundUnrolled =
		boundsBuilder.create<ConstantIndexOp>(loc, upperBoundUnrolledCst);
		else
		upperBoundUnrolled = ubCstOp;

		// Create constant for 'stepUnrolled'.
		stepUnrolled =
		stepCst == stepUnrolledCst
		? step
		: boundsBuilder.create<ConstantIndexOp>(loc, stepUnrolledCst);
		} else {
		// Dynamic loop bounds computation.
		// TODO(andydavis) Add dynamic asserts for negative lb/ub/step, or
		// consider using ceilDiv from AffineApplyExpander.
		auto lowerBound = forOp.lowerBound();
		auto upperBound = forOp.upperBound();
		Value diff = boundsBuilder.create<SubIOp>(loc, upperBound, lowerBound);
		Value tripCount = ceilDivPositive(boundsBuilder, loc, diff, step);
		Value unrollFactorCst =
		boundsBuilder.create<ConstantIndexOp>(loc, unrollFactor);
		Value tripCountRem =
		boundsBuilder.create<SignedRemIOp>(loc, tripCount, unrollFactorCst);
		// Compute tripCountEvenMultiple = tripCount - (tripCount % unrollFactor)
		Value tripCountEvenMultiple =
		boundsBuilder.create<SubIOp>(loc, tripCount, tripCountRem);
		// Compute upperBoundUnrolled = lowerBound + tripCountEvenMultiple * step
		upperBoundUnrolled = boundsBuilder.create<AddIOp>(
		loc, lowerBound,
		boundsBuilder.create<MulIOp>(loc, tripCountEvenMultiple, step));
		// Scale 'step' by 'unrollFactor'.
		stepUnrolled = boundsBuilder.create<MulIOp>(loc, step, unrollFactorCst);
		}

		// Create epilogue clean up loop starting at 'upperBoundUnrolled'.
		if (generateEpilogueLoop) {
		OpBuilder epilogueBuilder(forOp.getOperation()->getBlock(),
		std::next(Block::iterator(forOp)));
		auto epilogueForOp = cast<loop::ForOp>(epilogueBuilder.clone(*forOp));
		epilogueForOp.setLowerBound(upperBoundUnrolled);
		promoteIfSingleIteration(epilogueForOp);
}		}

		// Create unrolled loop.
		forOp.setUpperBound(upperBoundUnrolled);
		forOp.setStep(stepUnrolled);
		generateUnrolledLoop(forOp.getBody(), forOp.getInductionVar(), unrollFactor,
		[&](unsigned i, Value iv, OpBuilder b) {
		// iv' = iv + step * i;
		auto stride = b.create<MulIOp>(
		loc, step, b.create<ConstantIndexOp>(loc, i));
		return b.create<AddIOp>(loc, iv, stride);
		});
// Promote the loop body up if this has turned into a single iteration loop.		// Promote the loop body up if this has turned into a single iteration loop.
promoteIfSingleIteration(forOp);		promoteIfSingleIteration(forOp);
return success();		return success();
}		}

LogicalResult mlir::loopUnrollJamUpToFactor(AffineForOp forOp,		LogicalResult mlir::loopUnrollJamUpToFactor(AffineForOp forOp,
uint64_t unrollJamFactor) {		uint64_t unrollJamFactor) {
Optional<uint64_t> mayBeConstantTripCount = getConstantTripCount(forOp);		Optional<uint64_t> mayBeConstantTripCount = getConstantTripCount(forOp);
if (mayBeConstantTripCount.hasValue() &&		if (mayBeConstantTripCount.hasValue() &&
mayBeConstantTripCount.getValue() < unrollJamFactor)		mayBeConstantTripCount.getValue() < unrollJamFactor)
return loopUnrollJamByFactor(forOp, mayBeConstantTripCount.getValue());		return loopUnrollJamByFactor(forOp, mayBeConstantTripCount.getValue());
return loopUnrollJamByFactor(forOp, unrollJamFactor);		return loopUnrollJamByFactor(forOp, unrollJamFactor);
}		}
[512 lines not shown]	Loops mlir::tilePerfectlyNested(loop::ForOp rootForOp, ArrayRef<Value> sizes) {
forOps.reserve(sizes.size());		forOps.reserve(sizes.size());
getPerfectlyNestedLoopsImpl(forOps, rootForOp, sizes.size());		getPerfectlyNestedLoopsImpl(forOps, rootForOp, sizes.size());
if (forOps.size() < sizes.size())		if (forOps.size() < sizes.size())
sizes = sizes.take_front(forOps.size());		sizes = sizes.take_front(forOps.size());

return ::tile(forOps, sizes, forOps.back());		return ::tile(forOps, sizes, forOps.back());
}		}

// Build the IR that performs ceil division of a positive value by a constant:
// ceildiv(a, B) = divis(a + (B-1), B)
// where divis is rounding-to-zero division.
static Value ceilDivPositive(OpBuilder &builder, Location loc, Value dividend,
int64_t divisor) {
assert(divisor > 0 && "expected positive divisor");
assert(dividend.getType().isIndex() && "expected index-typed value");

Value divisorMinusOneCst = builder.create<ConstantIndexOp>(loc, divisor - 1);
Value divisorCst = builder.create<ConstantIndexOp>(loc, divisor);
Value sum = builder.create<AddIOp>(loc, dividend, divisorMinusOneCst);
return builder.create<SignedDivIOp>(loc, sum, divisorCst);
}

// Build the IR that performs ceil division of a positive value by another
// positive value:
// ceildiv(a, b) = divis(a + (b - 1), b)
// where divis is rounding-to-zero division.
static Value ceilDivPositive(OpBuilder &builder, Location loc, Value dividend,
Value divisor) {
assert(dividend.getType().isIndex() && "expected index-typed value");

Value cstOne = builder.create<ConstantIndexOp>(loc, 1);
Value divisorMinusOne = builder.create<SubIOp>(loc, divisor, cstOne);
Value sum = builder.create<AddIOp>(loc, dividend, divisorMinusOne);
return builder.create<SignedDivIOp>(loc, sum, divisor);
}

// Hoist the ops within `outer` that appear before `inner`.
// Such ops include the ops that have been introduced by parametric tiling.
// Ops that come from triangular loops (i.e. that belong to the program slice
// rooted at `outer`) and ops that have side effects cannot be hoisted.
// Return failure when any op fails to hoist.
static LogicalResult hoistOpsBetween(loop::ForOp outer, loop::ForOp inner) {
SetVector<Operation *> forwardSlice;
getForwardSlice(outer.getOperation(), &forwardSlice, [&inner](Operation *op) {
[...]

mlir/test/Dialect/Loops/loop-unroll.mlir

This file was added.

				// RUN: mlir-opt %s -test-loop-unrolling='unroll-factor=2' | FileCheck %s --check-prefix UNROLL-BY-2
				// RUN: mlir-opt %s -test-loop-unrolling='unroll-factor=3' | FileCheck %s --check-prefix UNROLL-BY-3
				// RUN: mlir-opt %s -test-loop-unrolling='unroll-factor=2 loop-depth=0' | FileCheck %s --check-prefix UNROLL-OUTER-BY-2
				// RUN: mlir-opt %s -test-loop-unrolling='unroll-factor=2 loop-depth=1' | FileCheck %s --check-prefix UNROLL-INNER-BY-2

				func @dynamic_loop_unroll(%arg0 : index, %arg1 : index, %arg2 : index,
				%arg3: memref<?xf32>) {
				%0 = constant 7.0 : f32
				loop.for %i0 = %arg0 to %arg1 step %arg2 {
				store %0, %arg3[%i0] : memref<?xf32>
				}
				return
				}
				// UNROLL-BY-2-LABEL: func @dynamic_loop_unroll
				// UNROLL-BY-2-SAME: %[[LB:.*0]]: index,
				// UNROLL-BY-2-SAME: %[[UB:.*1]]: index,
				// UNROLL-BY-2-SAME: %[[STEP:.*2]]: index,
				// UNROLL-BY-2-SAME: %[[MEM:.*3]]: memref<?xf32>
				//
				// UNROLL-BY-2-DAG: %[[V0:.*]] = subi %[[UB]], %[[LB]] : index
				// UNROLL-BY-2-DAG: %[[C1:.*]] = constant 1 : index
				ftynse (Not Done): Why is it a DAG? Is there some non-determinism in operation order?
				andydavis1 (Author, Done): I think that I did see some non-determinism, but these are also DAG because the ordering here for these particular ops is not critical to the transformation (these ops just need to be ordered correctly w.r.t. dependences). I've used CHECK and CHECK-NEXT for ops that are critical for the test to show the transformation.
				// UNROLL-BY-2-DAG: %[[V1:.*]] = subi %[[STEP]], %[[C1]] : index
				// UNROLL-BY-2-DAG: %[[V2:.*]] = addi %[[V0]], %[[V1]] : index
				// Compute trip count in V3.
				// UNROLL-BY-2-DAG: %[[V3:.*]] = divi_signed %[[V2]], %[[STEP]] : index
				// Store unroll factor in C2.
				// UNROLL-BY-2-DAG: %[[C2:.*]] = constant 2 : index
				// UNROLL-BY-2-DAG: %[[V4:.*]] = remi_signed %[[V3]], %[[C2]] : index
				// UNROLL-BY-2-DAG: %[[V5:.*]] = subi %[[V3]], %[[V4]] : index
				// UNROLL-BY-2-DAG: %[[V6:.*]] = muli %[[V5]], %[[STEP]] : index
				// Compute upper bound of unrolled loop in V7.
				// UNROLL-BY-2-DAG: %[[V7:.*]] = addi %[[LB]], %[[V6]] : index
				// Compute step of unrolled loop in V8.
				// UNROLL-BY-2-DAG: %[[V8:.*]] = muli %[[STEP]], %[[C2]] : index
				// UNROLL-BY-2: loop.for %[[IV:.*]] = %[[LB]] to %[[V7]] step %[[V8]] {
				// UNROLL-BY-2-NEXT: store %{{.*}}, %[[MEM]][%[[IV]]] : memref<?xf32>
				// UNROLL-BY-2-NEXT: %[[C1_IV:.*]] = constant 1 : index
				// UNROLL-BY-2-NEXT: %[[V9:.*]] = muli %[[STEP]], %[[C1_IV]] : index
				// UNROLL-BY-2-NEXT: %[[V10:.*]] = addi %[[IV]], %[[V9]] : index
				// UNROLL-BY-2-NEXT: store %{{.*}}, %[[MEM]][%[[V10]]] : memref<?xf32>
				// UNROLL-BY-2-NEXT: }
				// UNROLL-BY-2-NEXT: loop.for %[[IV:.*]] = %[[V7]] to %[[UB]] step %[[STEP]] {
				// UNROLL-BY-2-NEXT: store %{{.*}}, %[[MEM]][%[[IV]]] : memref<?xf32>
				// UNROLL-BY-2-NEXT: }
				// UNROLL-BY-2-NEXT: return
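
Note on the dynamic-bound checks above: the values captured as V3 (trip count), V7 (upper bound of the unrolled loop) and V8 (step of the unrolled loop) can be reproduced with ordinary integer arithmetic. The following is a minimal sketch, not the code from `LoopUtils.cpp`. The names `UnrolledBounds`, `computeUnrolledBounds`, `originalIterations` and `unrolledIterations` are hypothetical, and a positive step with `ub >= lb` is assumed.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical holder for the values the checks capture as V3, V7 and V8.
struct UnrolledBounds {
  int64_t tripCount;     // V3: ceildiv(ub - lb, step)
  int64_t unrolledUpper; // V7: lb + (tripCount - tripCount % factor) * step
  int64_t unrolledStep;  // V8: step * factor
};

UnrolledBounds computeUnrolledBounds(int64_t lb, int64_t ub, int64_t step,
                                     int64_t factor) {
  assert(step > 0 && factor > 0 && ub >= lb && "assumed by this sketch");
  int64_t tripCount = (ub - lb + (step - 1)) / step;
  int64_t evenTripCount = tripCount - tripCount % factor;
  return {tripCount, lb + evenTripCount * step, step * factor};
}

// Induction-variable values visited by the original loop.
std::vector<int64_t> originalIterations(int64_t lb, int64_t ub, int64_t step) {
  std::vector<int64_t> visited;
  for (int64_t iv = lb; iv < ub; iv += step)
    visited.push_back(iv);
  return visited;
}

// Induction-variable values visited by the unrolled loop followed by the
// epilogue loop, in the shape the UNROLL-BY-N checks describe.
std::vector<int64_t> unrolledIterations(int64_t lb, int64_t ub, int64_t step,
                                        int64_t factor) {
  UnrolledBounds b = computeUnrolledBounds(lb, ub, step, factor);
  std::vector<int64_t> visited;
  for (int64_t iv = lb; iv < b.unrolledUpper; iv += b.unrolledStep)
    for (int64_t k = 0; k < factor; ++k)
      visited.push_back(iv + step * k); // k-th copy of the body: iv + step * k
  for (int64_t iv = b.unrolledUpper; iv < ub; iv += step)
    visited.push_back(iv); // epilogue keeps the original step
  return visited;
}
```

Comparing `unrolledIterations(lb, ub, step, factor)` against `originalIterations(lb, ub, step)` for a few inputs is a quick way to convince oneself that the main loop plus epilogue covers exactly the original iteration space.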

				// UNROLL-BY-3-LABEL: func @dynamic_loop_unroll
				// UNROLL-BY-3-SAME: %[[LB:.*0]]: index,
				ftynse (Done): Hmm, could you just use the same input (i.e. `@dynamic_loop_unroll`) and match it with different prefixes? All input functions are transformed by all four RUNs and most of them are just ignored in the test.
				// UNROLL-BY-3-SAME: %[[UB:.*1]]: index,
				// UNROLL-BY-3-SAME: %[[STEP:.*2]]: index,
				// UNROLL-BY-3-SAME: %[[MEM:.*3]]: memref<?xf32>
				//
				// UNROLL-BY-3-DAG: %[[V0:.*]] = subi %[[UB]], %[[LB]] : index
				// UNROLL-BY-3-DAG: %[[C1:.*]] = constant 1 : index
				// UNROLL-BY-3-DAG: %[[V1:.*]] = subi %[[STEP]], %[[C1]] : index
				// UNROLL-BY-3-DAG: %[[V2:.*]] = addi %[[V0]], %[[V1]] : index
				// Compute trip count in V3.
				// UNROLL-BY-3-DAG: %[[V3:.*]] = divi_signed %[[V2]], %[[STEP]] : index
				// Store unroll factor in C3.
				// UNROLL-BY-3-DAG: %[[C3:.*]] = constant 3 : index
				// UNROLL-BY-3-DAG: %[[V4:.*]] = remi_signed %[[V3]], %[[C3]] : index
				// UNROLL-BY-3-DAG: %[[V5:.*]] = subi %[[V3]], %[[V4]] : index
				// UNROLL-BY-3-DAG: %[[V6:.*]] = muli %[[V5]], %[[STEP]] : index
				// Compute upper bound of unrolled loop in V7.
				// UNROLL-BY-3-DAG: %[[V7:.*]] = addi %[[LB]], %[[V6]] : index
				// Compute step of unrolled loop in V8.
				// UNROLL-BY-3-DAG: %[[V8:.*]] = muli %[[STEP]], %[[C3]] : index
				// UNROLL-BY-3: loop.for %[[IV:.*]] = %[[LB]] to %[[V7]] step %[[V8]] {
				// UNROLL-BY-3-NEXT: store %{{.*}}, %[[MEM]][%[[IV]]] : memref<?xf32>
				// UNROLL-BY-3-NEXT: %[[C1_IV:.*]] = constant 1 : index
				// UNROLL-BY-3-NEXT: %[[V9:.*]] = muli %[[STEP]], %[[C1_IV]] : index
				// UNROLL-BY-3-NEXT: %[[V10:.*]] = addi %[[IV]], %[[V9]] : index
				// UNROLL-BY-3-NEXT: store %{{.*}}, %[[MEM]][%[[V10]]] : memref<?xf32>
				// UNROLL-BY-3-NEXT: %[[C2_IV:.*]] = constant 2 : index
				// UNROLL-BY-3-NEXT: %[[V11:.*]] = muli %[[STEP]], %[[C2_IV]] : index
				// UNROLL-BY-3-NEXT: %[[V12:.*]] = addi %[[IV]], %[[V11]] : index
				// UNROLL-BY-3-NEXT: store %{{.*}}, %[[MEM]][%[[V12]]] : memref<?xf32>
				// UNROLL-BY-3-NEXT: }
				// UNROLL-BY-3-NEXT: loop.for %[[IV:.*]] = %[[V7]] to %[[UB]] step %[[STEP]] {
				// UNROLL-BY-3-NEXT: store %{{.*}}, %[[MEM]][%[[IV]]] : memref<?xf32>
				// UNROLL-BY-3-NEXT: }
				// UNROLL-BY-3-NEXT: return

				func @dynamic_loop_unroll_outer_by_2(
				%arg0 : index, %arg1 : index, %arg2 : index, %arg3 : index, %arg4 : index,
				%arg5 : index, %arg6: memref<?xf32>) {
				%0 = constant 7.0 : f32
				loop.for %i0 = %arg0 to %arg1 step %arg2 {
				loop.for %i1 = %arg3 to %arg4 step %arg5 {
				store %0, %arg6[%i1] : memref<?xf32>
				}
				}
				return
				}
				// UNROLL-OUTER-BY-2-LABEL: func @dynamic_loop_unroll_outer_by_2
				// UNROLL-OUTER-BY-2-SAME: %[[LB0:.*0]]: index,
				// UNROLL-OUTER-BY-2-SAME: %[[UB0:.*1]]: index,
				// UNROLL-OUTER-BY-2-SAME: %[[STEP0:.*2]]: index,
				// UNROLL-OUTER-BY-2-SAME: %[[LB1:.*3]]: index,
				// UNROLL-OUTER-BY-2-SAME: %[[UB1:.*4]]: index,
				// UNROLL-OUTER-BY-2-SAME: %[[STEP1:.*5]]: index,
				// UNROLL-OUTER-BY-2-SAME: %[[MEM:.*6]]: memref<?xf32>
				//
				// UNROLL-OUTER-BY-2: loop.for %[[IV0:.*]] = %[[LB0]] to %{{.*}} step %{{.*}} {
				// UNROLL-OUTER-BY-2-NEXT: loop.for %[[IV1:.*]] = %[[LB1]] to %[[UB1]] step %[[STEP1]] {
				// UNROLL-OUTER-BY-2-NEXT: store %{{.*}}, %[[MEM]][%[[IV1]]] : memref<?xf32>
				// UNROLL-OUTER-BY-2-NEXT: }
				// UNROLL-OUTER-BY-2-NEXT: loop.for %[[IV1:.*]] = %[[LB1]] to %[[UB1]] step %[[STEP1]] {
				// UNROLL-OUTER-BY-2-NEXT: store %{{.*}}, %[[MEM]][%[[IV1]]] : memref<?xf32>
				// UNROLL-OUTER-BY-2-NEXT: }
				// UNROLL-OUTER-BY-2-NEXT: }
				// UNROLL-OUTER-BY-2-NEXT: loop.for %[[IV0:.*]] = %{{.*}} to %[[UB0]] step %[[STEP0]] {
				// UNROLL-OUTER-BY-2-NEXT: loop.for %[[IV1:.*]] = %[[LB1]] to %[[UB1]] step %[[STEP1]] {
				// UNROLL-OUTER-BY-2-NEXT: store %{{.*}}, %[[MEM]][%[[IV1]]] : memref<?xf32>
				// UNROLL-OUTER-BY-2-NEXT: }
				// UNROLL-OUTER-BY-2-NEXT: }
				// UNROLL-OUTER-BY-2-NEXT: return

				func @dynamic_loop_unroll_inner_by_2(
				%arg0 : index, %arg1 : index, %arg2 : index, %arg3 : index, %arg4 : index,
				%arg5 : index, %arg6: memref<?xf32>) {
				%0 = constant 7.0 : f32
				loop.for %i0 = %arg0 to %arg1 step %arg2 {
				loop.for %i1 = %arg3 to %arg4 step %arg5 {
				store %0, %arg6[%i1] : memref<?xf32>
				}
				}
				return
				}
				// UNROLL-INNER-BY-2-LABEL: func @dynamic_loop_unroll_inner_by_2
				// UNROLL-INNER-BY-2-SAME: %[[LB0:.*0]]: index,
				// UNROLL-INNER-BY-2-SAME: %[[UB0:.*1]]: index,
				// UNROLL-INNER-BY-2-SAME: %[[STEP0:.*2]]: index,
				// UNROLL-INNER-BY-2-SAME: %[[LB1:.*3]]: index,
				// UNROLL-INNER-BY-2-SAME: %[[UB1:.*4]]: index,
				// UNROLL-INNER-BY-2-SAME: %[[STEP1:.*5]]: index,
				// UNROLL-INNER-BY-2-SAME: %[[MEM:.*6]]: memref<?xf32>
				//
				// UNROLL-INNER-BY-2: loop.for %[[IV0:.*]] = %[[LB0]] to %[[UB0]] step %[[STEP0]] {
				// UNROLL-INNER-BY-2: loop.for %[[IV1:.*]] = %[[LB1]] to %{{.*}} step %{{.*}} {
				// UNROLL-INNER-BY-2-NEXT: store %{{.*}}, %[[MEM]][%[[IV1]]] : memref<?xf32>
				// UNROLL-INNER-BY-2-NEXT: %[[C1_IV:.*]] = constant 1 : index
				// UNROLL-INNER-BY-2-NEXT: %[[V0:.*]] = muli %[[STEP1]], %[[C1_IV]] : index
				// UNROLL-INNER-BY-2-NEXT: %[[V1:.*]] = addi %[[IV1]], %[[V0]] : index
				// UNROLL-INNER-BY-2-NEXT: store %{{.*}}, %[[MEM]][%[[V1]]] : memref<?xf32>
				// UNROLL-INNER-BY-2-NEXT: }
				// UNROLL-INNER-BY-2-NEXT: loop.for %[[IV1:.*]] = %{{.*}} to %[[UB1]] step %[[STEP1]] {
				// UNROLL-INNER-BY-2-NEXT: store %{{.*}}, %[[MEM]][%[[IV1]]] : memref<?xf32>
				// UNROLL-INNER-BY-2-NEXT: }
				// UNROLL-INNER-BY-2-NEXT: }
				// UNROLL-INNER-BY-2-NEXT: return

				// Test that no epilogue clean-up loop is generated because the trip count is
				// a multiple of the unroll factor.
				func @static_loop_unroll_by_2(%arg0 : memref<?xf32>) {
				%0 = constant 7.0 : f32
				%lb = constant 0 : index
				%ub = constant 20 : index
				%step = constant 1 : index
				loop.for %i0 = %lb to %ub step %step {
				store %0, %arg0[%i0] : memref<?xf32>
				}
				return
				}
				// UNROLL-BY-2-LABEL: func @static_loop_unroll_by_2
				// UNROLL-BY-2-SAME: %[[MEM:.*0]]: memref<?xf32>
				//
				// UNROLL-BY-2-DAG: %[[C0:.*]] = constant 0 : index
				// UNROLL-BY-2-DAG: %[[C1:.*]] = constant 1 : index
				// UNROLL-BY-2-DAG: %[[C20:.*]] = constant 20 : index
				// UNROLL-BY-2-DAG: %[[C2:.*]] = constant 2 : index
				// UNROLL-BY-2: loop.for %[[IV:.*]] = %[[C0]] to %[[C20]] step %[[C2]] {
				// UNROLL-BY-2-NEXT: store %{{.*}}, %[[MEM]][%[[IV]]] : memref<?xf32>
				// UNROLL-BY-2-NEXT: %[[C1_IV:.*]] = constant 1 : index
				// UNROLL-BY-2-NEXT: %[[V0:.*]] = muli %[[C1]], %[[C1_IV]] : index
				// UNROLL-BY-2-NEXT: %[[V1:.*]] = addi %[[IV]], %[[V0]] : index
				// UNROLL-BY-2-NEXT: store %{{.*}}, %[[MEM]][%[[V1]]] : memref<?xf32>
				// UNROLL-BY-2-NEXT: }
				// UNROLL-BY-2-NEXT: return

				// Test that epilogue clean up loop is generated (trip count is not
				// a multiple of unroll factor).
				func @static_loop_unroll_by_3(%arg0 : memref<?xf32>) {
				%0 = constant 7.0 : f32
				%lb = constant 0 : index
				%ub = constant 20 : index
				%step = constant 1 : index
				loop.for %i0 = %lb to %ub step %step {
				store %0, %arg0[%i0] : memref<?xf32>
				}
				return
				}

				// UNROLL-BY-3-LABEL: func @static_loop_unroll_by_3
				// UNROLL-BY-3-SAME: %[[MEM:.*0]]: memref<?xf32>
				//
				// UNROLL-BY-3-DAG: %[[C0:.*]] = constant 0 : index
				// UNROLL-BY-3-DAG: %[[C1:.*]] = constant 1 : index
				// UNROLL-BY-3-DAG: %[[C20:.*]] = constant 20 : index
				// UNROLL-BY-3-DAG: %[[C18:.*]] = constant 18 : index
				// UNROLL-BY-3-DAG: %[[C3:.*]] = constant 3 : index
				// UNROLL-BY-3: loop.for %[[IV:.*]] = %[[C0]] to %[[C18]] step %[[C3]] {
				// UNROLL-BY-3-NEXT: store %{{.*}}, %[[MEM]][%[[IV]]] : memref<?xf32>
				// UNROLL-BY-3-NEXT: %[[C1_IV:.*]] = constant 1 : index
				// UNROLL-BY-3-NEXT: %[[V0:.*]] = muli %[[C1]], %[[C1_IV]] : index
				// UNROLL-BY-3-NEXT: %[[V1:.*]] = addi %[[IV]], %[[V0]] : index
				// UNROLL-BY-3-NEXT: store %{{.*}}, %[[MEM]][%[[V1]]] : memref<?xf32>
				// UNROLL-BY-3-NEXT: %[[C2_IV:.*]] = constant 2 : index
				// UNROLL-BY-3-NEXT: %[[V2:.*]] = muli %[[C1]], %[[C2_IV]] : index
				// UNROLL-BY-3-NEXT: %[[V3:.*]] = addi %[[IV]], %[[V2]] : index
				// UNROLL-BY-3-NEXT: store %{{.*}}, %[[MEM]][%[[V3]]] : memref<?xf32>
				// UNROLL-BY-3-NEXT: }
				// UNROLL-BY-3-NEXT: loop.for %[[IV:.*]] = %[[C18]] to %[[C20]] step %[[C1]] {
				// UNROLL-BY-3-NEXT: store %{{.*}}, %[[MEM]][%[[IV]]] : memref<?xf32>
				// UNROLL-BY-3-NEXT: }
				// UNROLL-BY-3-NEXT: return

				// Test that the single-iteration epilogue loop body is promoted to the loop's
				// containing block.
				func @static_loop_unroll_by_3_promote_epilogue(%arg0 : memref<?xf32>) {
				%0 = constant 7.0 : f32
				%lb = constant 0 : index
				%ub = constant 10 : index
				%step = constant 1 : index
				loop.for %i0 = %lb to %ub step %step {
				store %0, %arg0[%i0] : memref<?xf32>
				}
				return
				}
				// UNROLL-BY-3-LABEL: func @static_loop_unroll_by_3_promote_epilogue
				// UNROLL-BY-3-SAME: %[[MEM:.*0]]: memref<?xf32>
				//
				// UNROLL-BY-3-DAG: %[[C0:.*]] = constant 0 : index
				// UNROLL-BY-3-DAG: %[[C1:.*]] = constant 1 : index
				// UNROLL-BY-3-DAG: %[[C10:.*]] = constant 10 : index
				// UNROLL-BY-3-DAG: %[[C9:.*]] = constant 9 : index
				// UNROLL-BY-3-DAG: %[[C3:.*]] = constant 3 : index
				// UNROLL-BY-3: loop.for %[[IV:.*]] = %[[C0]] to %[[C9]] step %[[C3]] {
				// UNROLL-BY-3-NEXT: store %{{.*}}, %[[MEM]][%[[IV]]] : memref<?xf32>
				// UNROLL-BY-3-NEXT: %[[C1_IV:.*]] = constant 1 : index
				// UNROLL-BY-3-NEXT: %[[V0:.*]] = muli %[[C1]], %[[C1_IV]] : index
				// UNROLL-BY-3-NEXT: %[[V1:.*]] = addi %[[IV]], %[[V0]] : index
				// UNROLL-BY-3-NEXT: store %{{.*}}, %[[MEM]][%[[V1]]] : memref<?xf32>
				// UNROLL-BY-3-NEXT: %[[C2_IV:.*]] = constant 2 : index
				// UNROLL-BY-3-NEXT: %[[V2:.*]] = muli %[[C1]], %[[C2_IV]] : index
				// UNROLL-BY-3-NEXT: %[[V3:.*]] = addi %[[IV]], %[[V2]] : index
				// UNROLL-BY-3-NEXT: store %{{.*}}, %[[MEM]][%[[V3]]] : memref<?xf32>
				// UNROLL-BY-3-NEXT: }
				// UNROLL-BY-3-NEXT: store %{{.*}}, %[[MEM]][%[[C9]]] : memref<?xf32>
				// UNROLL-BY-3-NEXT: return
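
Note on the three static-bound tests above: they differ only in how many iterations are left over after unrolling. The sketch below classifies the expected epilogue from the trip count, matching what the checks show (no epilogue, a clean-up loop, or a single promoted iteration). It is an illustration on plain integers, not the patch's logic. `Epilogue` and `classifyEpilogue` are hypothetical names, and a positive step with `ub >= lb` is assumed.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical classification of what follows the unrolled loop.
enum class Epilogue { None, PromotedSingleIteration, CleanupLoop };

Epilogue classifyEpilogue(int64_t lb, int64_t ub, int64_t step,
                          int64_t factor) {
  assert(step > 0 && factor > 0 && ub >= lb && "assumed by this sketch");
  int64_t tripCount = (ub - lb + (step - 1)) / step;
  int64_t leftover = tripCount % factor; // iterations not covered by unrolling
  if (leftover == 0)
    return Epilogue::None; // trip count is a multiple of the unroll factor
  if (leftover == 1)
    return Epilogue::PromotedSingleIteration; // body inlined, no loop needed
  return Epilogue::CleanupLoop;
}
```

With the test inputs, bounds 0 to 20 with factor 2 leave nothing over, bounds 0 to 20 with factor 3 leave two iterations (a clean-up loop from 18 to 20), and bounds 0 to 10 with factor 3 leave one iteration (the promoted store at index 9).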

mlir/test/lib/Transforms/CMakeLists.txt

add_llvm_library(MLIRTestTransforms
TestAllReduceLowering.cpp
TestBufferPlacement.cpp
TestCallGraph.cpp
TestConstantFold.cpp
TestConvertGPUKernelToCubin.cpp
TestDominance.cpp
TestLoopFusion.cpp
TestGpuMemoryPromotion.cpp
TestGpuParallelLoopMapping.cpp
TestInlining.cpp
TestLinalgTransforms.cpp
TestLiveness.cpp
TestLoopMapping.cpp
TestLoopParametricTiling.cpp
				TestLoopUnrolling.cpp
TestOpaqueLoc.cpp
TestMemRefBoundCheck.cpp
TestMemRefDependenceCheck.cpp
TestMemRefStrideCalculation.cpp
TestVectorToLoopsConversion.cpp
TestVectorTransforms.cpp

ADDITIONAL_HEADER_DIRS
[...]

mlir/test/lib/Transforms/TestLoopUnrolling.cpp

This file was added.

				//===-------- TestLoopUnrolling.cpp --- loop unrolling test pass ----------===//
				//
				ftynse (Done): Nit: pad until 80 characters
				// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
				// See https://llvm.org/LICENSE.txt for license information.
				// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
				//
				//===----------------------------------------------------------------------===//
				//
				// This file implements a pass to unroll loops by a specified unroll factor.
				//
				//===----------------------------------------------------------------------===//

				#include "mlir/Dialect/LoopOps/LoopOps.h"
				#include "mlir/IR/Builders.h"
				#include "mlir/Pass/Pass.h"
				#include "mlir/Transforms/LoopUtils.h"
				#include "mlir/Transforms/Passes.h"

				using namespace mlir;

				namespace {

				static unsigned getNestingDepth(Operation *op) {
				Operation *currOp = op;
				rriddle (Not Done): nit: Static functions go in the top-level namespace.
				unsigned depth = 0;
				while ((currOp = currOp->getParentOp())) {
				if (isa<loop::ForOp>(currOp))
				depth++;
				}
				return depth;
				}

				class TestLoopUnrollingPass
				: public PassWrapper<TestLoopUnrollingPass, FunctionPass> {
				public:
				TestLoopUnrollingPass() = default;
				TestLoopUnrollingPass(const TestLoopUnrollingPass &) {}
				explicit TestLoopUnrollingPass(uint64_t unrollFactorParam,
				unsigned loopDepthParam) {
				unrollFactor = unrollFactorParam;
				loopDepth = loopDepthParam;
				}

				void runOnFunction() override {
				FuncOp func = getFunction();
				SmallVector<loop::ForOp, 4> loops;
				func.walk([&](loop::ForOp forOp) {
				ftynse (Done): Nit: since the required depth is known upfront, how about just storing the loops of this depth in a vector, instead of filtering the vector of all loops later?
				if (getNestingDepth(forOp) == loopDepth)
				loops.push_back(forOp);
				});
				for (auto loop : loops) {
				loopUnrollByFactor(loop, unrollFactor);
				rriddle (Not Done): nit: Drop trivial braces.
				}
				}
				Option<uint64_t> unrollFactor{*this, "unroll-factor",
				llvm::cl::desc("Loop unroll factor."),
				llvm::cl::init(1)};
				Option<unsigned> loopDepth{*this, "loop-depth", llvm::cl::desc("Loop depth."),
				llvm::cl::init(0)};
				};
				} // end namespace

				namespace mlir {
				void registerTestLoopUnrollingPass() {
				PassRegistration<TestLoopUnrollingPass>(
				"test-loop-unrolling", "Tests loop unrolling transformation");
				}
				} // namespace mlir
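
Note on `getNestingDepth` in the listing above: it walks parent links and counts only the enclosing `loop::ForOp`s, so an outermost loop has depth 0 and `loop-depth=1` selects loops nested inside exactly one other loop. The sketch below shows the same walk on a stand-in tree. `Node` is a hypothetical replacement for `mlir::Operation` with just a parent pointer and an is-loop flag, so this illustrates the idea rather than the MLIR API.

```cpp
#include <cassert>

// Hypothetical stand-in for mlir::Operation: a parent link plus a flag that
// plays the role of isa<loop::ForOp>(op).
struct Node {
  const Node *parent = nullptr;
  bool isLoop = false;
};

// Counts the loops that strictly enclose `op`; `op` itself is not counted.
unsigned getNestingDepth(const Node *op) {
  unsigned depth = 0;
  for (const Node *curr = op->parent; curr; curr = curr->parent)
    if (curr->isLoop)
      ++depth;
  return depth;
}
```

For a function containing an outer loop, an inner loop, and a store in the inner body, the outer loop has depth 0, the inner loop depth 1, and the store depth 2.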

mlir/tools/mlir-opt/mlir-opt.cpp

[...]
void registerTestConvertGPUKernelToCubinPass();
void registerTestDominancePass();
void registerTestFunc();
void registerTestGpuMemoryPromotionPass();
void registerTestLinalgTransforms();
void registerTestLivenessPass();
void registerTestLoopFusion();
void registerTestLoopMappingPass();
		void registerTestLoopUnrollingPass();
void registerTestMatchers();
void registerTestMemRefDependenceCheck();
void registerTestMemRefStrideCalculation();
void registerTestOpaqueLoc();
void registerTestParallelismDetection();
void registerTestGpuParallelLoopMappingPass();
void registerTestVectorConversions();
void registerTestVectorToLoopsPass();
[...]
#endif
registerTestBufferPlacementPreparationPass();
registerTestDominancePass();
registerTestFunc();
registerTestGpuMemoryPromotionPass();
registerTestLinalgTransforms();
registerTestLivenessPass();
registerTestLoopFusion();
registerTestLoopMappingPass();
		registerTestLoopUnrollingPass();
registerTestMatchers();
registerTestMemRefDependenceCheck();
registerTestMemRefStrideCalculation();
registerTestOpaqueLoc();
registerTestParallelismDetection();
registerTestGpuParallelLoopMappingPass();
registerTestVectorConversions();
registerTestVectorToLoopsPass();
[...]

Revision Contents

Diff 262150