This is an archive of the discontinued LLVM Phabricator instance.

mzolotukhin added inline comments.

lib/Transforms/Scalar/LoopUnrollPass.cpp
736–739	Why do we need to move this?
lib/Transforms/Utils/LoopUnrollRuntime.cpp
354–355	This change is unnecessary.
365–367	What is the final expression for `ModVal`? From the code it looks like `ModVal = ((BECount % Count) + 1) % Count`. Do we really need the double urem here? If so, could you please add a comment explaining why we need exactly this?
368	Nitpick: dot missing at the end.

evstupac added inline comments. Mar 16 2016, 12:57 PM

lib/Transforms/Scalar/LoopUnrollPass.cpp
736–739	This is a better place for the check. Previously the "nounroll" attribute was set even if we failed to unroll the loop. There are plenty of checks inside UnrollLoop that can stop unrolling. Basically, now we set "nounroll" only when we have actually unrolled a loop.
lib/Transforms/Utils/LoopUnrollRuntime.cpp
354–355	Now that the comment is inside the "if", it is shifted by 2 spaces. That makes its longest line 81 characters, so I just made it shorter than 81.
365–367	Potentially BECount + 1 can unsigned-overflow, while (BECount % Count) + 1 cannot. That is why the double URem is used. I can try to simplify this to something like `ModVal = ((Count - ModValAdd) >> (ModVal->getScalarType()->getPrimitiveSizeInBits() - 1)) & ModValAdd`, or to a select, either here or somewhere in later combiners. However, a non-power-of-2 count is not a frequent case: for runtime unrolling it comes only from a user-specified pragma.
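
The overflow argument above can be checked exhaustively on a small integer width. The following is only an illustrative sketch, not code from the patch: it models BECount and Count as 8-bit values, and the helper names (`extraItersDoubleURem`, `extraItersWide`, `extraItersNaive`, `doubleURemMatchesWide`) are hypothetical. It compares the double-urem form against the same remainder computed in a wider type, where BECount + 1 cannot wrap.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: the double-urem form on 8-bit values.
// (BECount % Count) is strictly less than Count, so adding 1 yields at most
// Count (<= 255) and cannot wrap around.
uint8_t extraItersDoubleURem(uint8_t BECount, uint8_t Count) {
  uint8_t Tmp = static_cast<uint8_t>(BECount % Count); // Tmp < Count
  uint8_t Add = static_cast<uint8_t>(Tmp + 1);         // Add <= Count
  return static_cast<uint8_t>(Add % Count);
}

// Reference: the same remainder computed in a wider type, where
// BECount + 1 cannot overflow.
unsigned extraItersWide(uint8_t BECount, uint8_t Count) {
  return (static_cast<unsigned>(BECount) + 1u) % Count;
}

// Naive narrow form: BECount + 1 wraps to 0 when BECount == 255.
uint8_t extraItersNaive(uint8_t BECount, uint8_t Count) {
  uint8_t TripCount = static_cast<uint8_t>(BECount + 1);
  return static_cast<uint8_t>(TripCount % Count);
}

// Exhaustively compare the double-urem form with the wide reference for
// every nonzero 8-bit Count and every 8-bit BECount.
bool doubleURemMatchesWide() {
  for (unsigned C = 1; C <= 255; ++C)
    for (unsigned BE = 0; BE <= 255; ++BE)
      if (extraItersDoubleURem(static_cast<uint8_t>(BE),
                               static_cast<uint8_t>(C)) !=
          extraItersWide(static_cast<uint8_t>(BE), static_cast<uint8_t>(C)))
        return false;
  return true;
}
```

With BECount = 255 and Count = 3, the real trip count is 256 and 256 % 3 is 1; the double-urem form returns 1, while the naive narrow form returns 0 because the addition wraps.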

sanjoy added a comment.

Minor drive-by comment.

lib/Transforms/Utils/LoopUnrollRuntime.cpp
365–367	Alternatively, you could have `ModVal` as `BECount == -1 ? <constant expr> : ((BECount +nuw 1) % Count)`.
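
A rough sketch of this select form on 8-bit values, for illustration only. The comment does not spell out `<constant expr>`; the sketch assumes it is 2^w % Count (the real trip count when BECount is all-ones), here 256 % Count. The function names are hypothetical and not from the patch.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical select form. Assumption: the constant for the all-ones case
// is 2^8 % Count, because the real trip count is then 256.
uint8_t extraItersSelect(uint8_t BECount, uint8_t Count) {
  if (BECount == UINT8_MAX)
    return static_cast<uint8_t>(256u % Count); // constant for a fixed Count
  // BECount + 1 is at most 255 here, so the add cannot wrap ("+nuw").
  return static_cast<uint8_t>((BECount + 1) % Count);
}

// Double-urem form discussed in the review, repeated here so that this
// example is self-contained.
uint8_t extraItersDoubleURem(uint8_t BECount, uint8_t Count) {
  uint8_t Tmp = static_cast<uint8_t>(BECount % Count);
  uint8_t Add = static_cast<uint8_t>(Tmp + 1);
  return static_cast<uint8_t>(Add % Count);
}

// Check that both forms agree for every nonzero 8-bit Count and BECount.
bool selectMatchesDoubleURem() {
  for (unsigned C = 1; C <= 255; ++C)
    for (unsigned BE = 0; BE <= 255; ++BE)
      if (extraItersSelect(static_cast<uint8_t>(BE),
                           static_cast<uint8_t>(C)) !=
          extraItersDoubleURem(static_cast<uint8_t>(BE),
                               static_cast<uint8_t>(C)))
        return false;
  return true;
}
```

Under that assumption the two forms are interchangeable; the select form trades one urem for a compare and a select.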

evstupac updated this revision to Diff 50886. Mar 16 2016, 3:41 PM

evstupac edited edge metadata.

evstupac removed rL LLVM as the repository for this revision.

evstupac added inline comments. Mar 16 2016, 3:48 PM

lib/Transforms/Utils/LoopUnrollRuntime.cpp
365–367	Yes. That is what I meant by select. I would prefer the combiner to decide the correct replacement (maybe based on some architecture properties). The simplification of TripCount % Count to TripCount & (Count - 1) when Count is a power of 2 is pretty obvious and architecture independent. This one is more complicated: ((BECount % Count) + 1) % Count. So I'd let the combiner do this if someone finds a performance opportunity.
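
The power-of-2 simplification mentioned above can be sanity-checked on 8-bit values. This is a sketch with hypothetical helper names, not code from the patch. It also checks the wrapped case: when TripCount wraps to 0, the mask still gives the right answer because 256 is a multiple of every power-of-2 Count that fits in 8 bits.

```cpp
#include <cassert>
#include <cstdint>

// For every power-of-2 Count representable in 8 bits, check that
// TripCount % Count equals TripCount & (Count - 1).
bool maskMatchesURem() {
  for (unsigned Shift = 0; Shift < 8; ++Shift) {
    const unsigned Count = 1u << Shift;
    for (unsigned TripCount = 0; TripCount <= 255; ++TripCount)
      if ((TripCount % Count) != (TripCount & (Count - 1)))
        return false;
  }
  return true;
}

// If BECount == 255, the 8-bit TripCount wraps to 0 while the real trip
// count is 256. The mask yields 0, which matches 256 % Count.
bool wrappedTripCountIsSafe() {
  const uint8_t Wrapped = static_cast<uint8_t>(255u + 1u); // wraps to 0
  for (unsigned Shift = 0; Shift < 8; ++Shift) {
    const unsigned Count = 1u << Shift;
    if ((Wrapped & (Count - 1)) != (256u % Count))
      return false;
  }
  return true;
}
```

This is why the power-of-2 path can use the wrapped TripCount directly, while a non-power-of-2 Count needs an overflow-safe form.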

Hi,

This patch LGTM, thanks for working on this!

Michael

lib/Transforms/Scalar/LoopUnrollPass.cpp
736–739	But maybe it's not a bad idea to have a 'nounroll' attribute on a loop that we cannot unroll :) Otherwise, next time we'll probably just spend some time trying again. But I can see arguments for additional attempts to unroll when we have a pragma, so if you feel strongly about it, please keep it.
lib/Transforms/Utils/LoopUnrollRuntime.cpp
354–355	Oh, makes sense. Phabricator doesn't show whitespace changes, so I missed the change in indentation.
365–367	Thanks for the explanation, it's clear now. I think both `BECount == -1 ? <constant expr> : ((BECount +nuw 1) % Count)` and `((BECount % Count) + 1) % Count` are good options here.

This revision is now accepted and ready to land. Mar 17 2016, 12:46 PM

Closed by commit rL264407: Enable non-power-of-2 #pragma unroll counts. (authored by dlkreitz). Mar 25 2016, 7:30 AM

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path	Size
lib/Transforms/Scalar/LoopUnrollPass.cpp	9 lines
lib/Transforms/Utils/LoopUnrollRuntime.cpp	48 lines
test/Transforms/LoopUnroll/unroll-pragmas.ll	37 lines

Diff 50782

lib/Transforms/Scalar/LoopUnrollPass.cpp

... 678 lines not shown ... (context: if (OrigCount > Count) {)
                    "count must divide the trip multiple, "
                    << TripMultiple << ". Reducing unroll count from "
                    << OrigCount << " to " << Count << ".\n");
     }
     DEBUG(dbgs() << " partially unrolling with count: " << Count << "\n");
   }

   if (HasPragma) {
-    if (PragmaCount != 0)
-      // If loop has an unroll count pragma mark loop as unrolled to prevent
-      // unrolling beyond that requested by the pragma.
-      SetLoopAlreadyUnrolled(L);
-
     // Emit optimization remarks if we are unable to unroll the loop
     // as directed by a pragma.
     DebugLoc LoopLoc = L->getStartLoc();
     Function *F = Header->getParent();
     LLVMContext &Ctx = F->getContext();
     if (PragmaCount > 0 && DecreasedCountDueToConvergence) {
       emitOptimizationRemarkMissed(
           Ctx, DEBUG_TYPE, *F, LoopLoc,
... 33 lines not shown ... (context: if (Unrolling != Full && Count < 2) {)
     return false;
   }

   // Unroll the loop.
   if (!UnrollLoop(L, Count, TripCount, AllowRuntime, UP.AllowExpensiveTripCount,
                   TripMultiple, LI, SE, &DT, &AC, PreserveLCSSA))
     return false;

+  // If loop has an unroll count pragma mark loop as unrolled to prevent
+  // unrolling beyond that requested by the pragma.
+  if (HasPragma && PragmaCount != 0)
+    SetLoopAlreadyUnrolled(L);
   return true;
 }

 namespace {
 class LoopUnroll : public LoopPass {
 public:
   static char ID; // Pass ID, replacement for typeid
   LoopUnroll(Optional<unsigned> Threshold = None,
... 66 lines not shown ...

lib/Transforms/Utils/LoopUnrollRuntime.cpp

... 111 lines not shown ... (context: static void ConnectProlog(Loop *L, Value *BECount, unsigned Count,)

   // Create a branch around the original loop, which is taken if there are no
   // iterations remaining to be executed after running the prologue.
   Instruction *InsertPt = PrologEnd->getTerminator();
   IRBuilder<> B(InsertPt);

   assert(Count != 0 && "nonsensical Count!");

-  // If BECount <u (Count - 1) then (BECount + 1) & (Count - 1) == (BECount + 1)
-  // (since Count is a power of 2). This means %xtraiter is (BECount + 1) and
-  // and all of the iterations of this loop were executed by the prologue. Note
-  // that if BECount <u (Count - 1) then (BECount + 1) cannot unsigned-overflow.
+  // If BECount <u (Count - 1) then (BECount + 1) % Count == (BECount + 1)
+  // This means %xtraiter is (BECount + 1) and all of the iterations of this
+  // loop were executed by the prologue. Note that if BECount <u (Count - 1)
+  // then (BECount + 1) cannot unsigned-overflow.
   Value *BrLoopExit =
       B.CreateICmpULT(BECount, ConstantInt::get(BECount->getType(), Count - 1));
   BasicBlock *Exit = L->getUniqueExitBlock();
   assert(Exit && "Loop must have a single exit block only");
   // Split the exit to maintain loop canonicalization guarantees
   SmallVector<BasicBlock*, 4> Preds(pred_begin(Exit), pred_end(Exit));
   SplitBlockPredecessors(Exit, Preds, ".unr-lcssa", DT, LI,
                          PreserveLCSSA);
... 182 lines not shown ... (context: bool llvm::UnrollRuntimeLoopProlog(Loop *L, unsigned Count,)
   BasicBlock *PH = L->getLoopPreheader();
   BranchInst *PreHeaderBR = cast<BranchInst>(PH->getTerminator());
   const DataLayout &DL = Header->getModule()->getDataLayout();
   SCEVExpander Expander(*SE, DL, "loop-unroll");
   if (!AllowExpensiveTripCount &&
       Expander.isHighCostExpansion(TripCountSC, L, PreHeaderBR))
     return false;

-  // We only handle cases when the unroll factor is a power of 2.
-  // Count is the loop unroll factor, the number of extra copies added + 1.
-  if (!isPowerOf2_32(Count))
-    return false;
-
   // This constraint lets us deal with an overflowing trip count easily; see the
   // comment on ModVal below.
   if (Log2_32(Count) > BEWidth)
     return false;

   // If this loop is nested, then the loop unroller changes the code in the
   // parent loop, so the Scalar Evolution pass needs to be run again.
   if (Loop *ParentLoop = L->getParentLoop())
... 9 lines not shown ... (context: bool llvm::UnrollRuntimeLoopProlog(Loop *L, unsigned Count,)
   // Compute the number of extra iterations required, which is:
   // extra iterations = run-time trip count % (loop unroll factor + 1)
   Value *TripCount = Expander.expandCodeFor(TripCountSC, TripCountSC->getType(),
                                             PreHeaderBR);
   Value *BECount = Expander.expandCodeFor(BECountSC, BECountSC->getType(),
                                           PreHeaderBR);

   IRBuilder<> B(PreHeaderBR);
-  Value *ModVal = B.CreateAnd(TripCount, Count - 1, "xtraiter");
-
-  // If ModVal is zero, we know that either
+  Value *ModVal;
+  if (isPowerOf2_32(Count)) {
+    ModVal = B.CreateAnd(TripCount, Count - 1, "xtraiter");
     // 1. There are no iterations to be run in the prologue loop.
     // OR
     // 2. The addition computing TripCount overflowed.
     //
-  // If (2) is true, we know that TripCount really is (1 << BEWidth) and so the
-  // number of iterations that remain to be run in the original loop is a
+    // If (2) is true, we know that TripCount really is (1 << BEWidth) and so
+    // the number of iterations that remain to be run in the original loop is a
     // multiple Count == (1 << Log2(Count)) because Log2(Count) <= BEWidth (we
     // explicitly check this above).
+  } else {
+    Value *ModValTmp = B.CreateURem(BECount,
+                                    ConstantInt::get(BECount->getType(),
+                                                     Count));
+    Value *ModValAdd = B.CreateAdd(ModValTmp,
+                                   ConstantInt::get(ModValTmp->getType(), 1));
+    // At that point ModValAdd could not overflow as ModValTmp < Count
+    ModVal = B.CreateURem(ModValAdd,
+                          ConstantInt::get(BECount->getType(), Count),
+                          "xtraiter");
+    // And finaly we get correct and overflow safe remainder counter
+  }
   Value *BranchVal = B.CreateIsNotNull(ModVal, "lcmp.mod");

   // Branch to either the extra iterations or the cloned/unrolled loop.
   // We will fix up the true branch label when adding loop body copies.
   B.CreateCondBr(BranchVal, PEnd, PEnd);
   assert(PreHeaderBR->isUnconditional() &&
          PreHeaderBR->getSuccessor(0) == PEnd &&
          "CFG edges in Preheader are not correct");
... 44 lines not shown ...

test/Transforms/LoopUnroll/unroll-pragmas.ll

... 316 lines not shown ... (context: for.body: ; preds = %entry, %for.body)
   %lftr.wideiv = trunc i64 %indvars.iv.next to i32
   %exitcond = icmp eq i32 %lftr.wideiv, %b
   br i1 %exitcond, label %for.end, label %for.body, !llvm.loop !15

 for.end: ; preds = %for.body, %entry
   ret void
 }
 !15 = !{!15, !14}
+
+; #pragma clang loop unroll_count(3)
+; Loop has a runtime trip count. Runtime unrolling should occur and loop
+; should be duplicated (original and 3x unrolled).
+;
+; CHECK-LABEL: @runtime_loop_with_count3(
+; CHECK: for.body.prol:
+; CHECK: store
+; CHECK-NOT: store
+; CHECK: br i1
+; CHECK: for.body
+; CHECK: store
+; CHECK: store
+; CHECK: store
+; CHECK-NOT: store
+; CHECK: br i1
+define void @runtime_loop_with_count3(i32* nocapture %a, i32 %b) {
+entry:
+  %cmp3 = icmp sgt i32 %b, 0
+  br i1 %cmp3, label %for.body, label %for.end, !llvm.loop !16
+
+for.body: ; preds = %entry, %for.body
+  %indvars.iv = phi i64 [ %indvars.iv.next, %for.body ], [ 0, %entry ]
+  %arrayidx = getelementptr inbounds i32, i32* %a, i64 %indvars.iv
+  %0 = load i32, i32* %arrayidx, align 4
+  %inc = add nsw i32 %0, 1
+  store i32 %inc, i32* %arrayidx, align 4
+  %indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
+  %lftr.wideiv = trunc i64 %indvars.iv.next to i32
+  %exitcond = icmp eq i32 %lftr.wideiv, %b
+  br i1 %exitcond, label %for.end, label %for.body, !llvm.loop !16
+
+for.end: ; preds = %for.body, %entry
+  ret void
+}
+!16 = !{!16, !17}
+!17 = !{!"llvm.loop.unroll.count", i32 3}

Enable non-power-of-2 pragma unroll counts (Closed, Public)