
Details

Reviewers

congzhe
mkazantsev
reames
nikic

Summary

This patch improves on https://reviews.llvm.org/D110587. To summarize
that patch: given a backedge-taken count BC, the trip count TC is BC + 1.
However, we don't know whether BC + 1 might overflow, so that patch
changed the TC computation to 1 + zext(BC).
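As a concrete illustration, the following standalone C++ sketch (not LLVM code; the function names are hypothetical) models an 8-bit BC with `uint8_t`. Computing BC + 1 in the same type wraps to 0 when BC is 255, while 1 + zext(BC), modelled here by widening to `uint16_t` before adding, yields 256.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: TC computed in BC's own 8-bit type. Wraps when BC == 255.
uint8_t tripCountNarrow(uint8_t BC) {
  return static_cast<uint8_t>(BC + 1);
}

// Hypothetical helper: TC = 1 + zext(BC). The 9-bit result is held in a
// 16-bit integer, so it can never wrap.
uint16_t tripCountWide(uint8_t BC) {
  return static_cast<uint16_t>(static_cast<uint16_t>(BC) + 1);
}
```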

This patch only adds the zext if necessary, by looking at the constant
range of BC. If we can determine that BC cannot be the max value for its
bitwidth, then we know adding 1 will not overflow, and the zext is not
needed. We apply loop guards before computing TC to get a more precise
range.
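The range-based decision can be modelled the same way. In this sketch, the hypothetical `UnsignedRange` stands in for the unsigned `ConstantRange` that SCEV computes for BC (assumed non-wrapping here, which a real `ConstantRange` need not be), and the hypothetical `tripCountFromExitCount` only widens when the range contains the type's maximum value. It is a simplified model of the idea, not the actual `getTripCountFromExitCount()` implementation.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical stand-in for an unsigned, non-wrapping range [Lo, Hi].
struct UnsignedRange {
  uint8_t Lo, Hi;
  bool contains(uint8_t V) const { return Lo <= V && V <= Hi; }
};

// Result of the sketch: the value plus the bit width it was computed in.
struct TripCount {
  uint16_t Value; // wide enough for both the 8-bit and the 9-bit result
  unsigned Bits;  // 8 if computed in BC's own type, 9 if zero-extended first
};

// Only widen when BC might be the maximum value of its type, since that is
// the only value for which BC + 1 wraps.
TripCount tripCountFromExitCount(uint8_t BC, UnsignedRange Range) {
  assert(Range.Lo <= Range.Hi && Range.contains(BC));
  if (!Range.contains(std::numeric_limits<uint8_t>::max()))
    return {static_cast<uint8_t>(BC + 1), 8}; // cannot wrap: no zext needed
  return {static_cast<uint16_t>(BC + 1), 9};  // zext to 9 bits, then add 1
}
```

With BC known to lie in [0, 200], the trip count stays 8 bits wide; with no information ([0, 255]) it is widened to 9 bits even for a small BC, because the decision depends only on the range.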

The primary motivation is to support my work on more precise trip
multiples in https://reviews.llvm.org/D141823. For example:

void foo(void);

void test(unsigned n) {
  __builtin_assume(n % 6 == 0);
  for (unsigned i = 0; i < n; ++i)
    foo();
}

Prior to this patch, we had
`TC = 1 + zext(-1 + 6 * ((6 umax %n) /u 6))<nuw>`. SCEV range computation
is able to determine that BC cannot be the max value, so the zext is not
needed. The result is `TC -> (6 * ((6 umax %n) /u 6))<nuw>`. From here, we
would be able to determine that TC is a multiple of 6.
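A quick arithmetic check of this example, assuming `unsigned` is 32 bits and modelling SCEV's unsigned operations with `uint32_t` (the helper names are hypothetical): the largest value `(6 umax %n) /u 6` can take is `4294967295 /u 6 = 715827882`, so BC is at most `6 * 715827882 - 1 = 4294967291`, which is below the 32-bit maximum. Hence BC + 1 cannot wrap, and TC is `6 * ((6 umax %n) /u 6)`.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <limits>

// BC = -1 + 6 * ((6 umax n) /u 6), evaluated in 32-bit unsigned arithmetic.
uint32_t backedgeTakenCount(uint32_t n) {
  uint32_t m = std::max<uint32_t>(6, n); // 6 umax n
  return 6 * (m / 6) - 1;
}

// Largest possible BC: reached when (6 umax n) /u 6 is as large as possible.
constexpr uint32_t kMaxBC = 6 * (std::numeric_limits<uint32_t>::max() / 6) - 1;
static_assert(kMaxBC == 4294967291u, "largest BC for this expression");
static_assert(kMaxBC < std::numeric_limits<uint32_t>::max(),
              "BC is never the max value, so BC + 1 cannot wrap");

// Because BC < UINT32_MAX, TC = BC + 1 is computed without any zext.
uint32_t tripCount(uint32_t n) { return backedgeTakenCount(n) + 1; }
```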

One change to LoopCacheAnalysis/LoopInterchange was required. Before
this patch, if a loop had BC = false (an i1 zero), we would compute
`TC -> 1 + zext(false) -> 1`, which was fine. After this patch, we compute
`TC -> 1 + false = true` in i1. CacheAnalysis would then sign extend the
true to -1, which was not the intended behavior. I modified CacheAnalysis
so that it only zero extends trip counts.
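The i1 case can be reproduced outside LLVM. In this standalone sketch (helper names hypothetical), a 1-bit value is modelled with `bool`: adding 1 to `false` in i1 gives `true`, and extending that single bit to 64 bits yields 1 with zero extension but -1 with sign extension, because the only bit is also the sign bit.

```cpp
#include <cassert>
#include <cstdint>

// i1 addition wraps modulo 2, i.e. it is XOR.
bool addI1(bool A, bool B) { return A != B; }

// Extending a 1-bit value to 64 bits.
int64_t zextI1(bool Bit) { return Bit ? 1 : 0; }
int64_t sextI1(bool Bit) { return Bit ? -1 : 0; } // the single bit is the sign bit
```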

This patch is not NFC, but it also does not change any SCEV outputs. I
would like to land this patch first to make the trip multiple work
easier.

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

caojoshua created this revision. (Mar 29 2023, 12:17 AM)

Herald added a project: Restricted Project. (Mar 29 2023, 12:17 AM)

Herald added subscribers: javed.absar, hiraditya.

caojoshua added reviewers: congzhe, mkazantsev, reames, nikic. (Mar 29 2023, 12:19 AM)

Herald added a subscriber: StephenFan. (Mar 29 2023, 12:19 AM)

caojoshua updated this revision to Diff 509232. (Mar 29 2023, 12:22 AM)

caojoshua edited the summary of this revision.

This comment was removed by caojoshua.

caojoshua published this revision for review. (Mar 29 2023, 12:25 AM)

Herald added a project: Restricted Project. (Mar 29 2023, 12:25 AM)

Herald added a subscriber: llvm-commits.

Harbormaster completed remote builds in B222422: Diff 509232. (Mar 29 2023, 6:12 AM)

Two levels of comments here.

First, you can probably use willNotOverflow(Opcode, LHS, RHS) here instead.

Second, I think you're solving the wrong problem with this patch. If "1 + zext((6 * %N) - 1)" does not simplify to "zext((6 * %N))" when truncating the constant and folding the constant produces the same answer, that seems like a SCEV folding bug, and is probably the more general fix. This would require the same basic proof, but inside getAddExpr. Oddly, we don't seem to have either the zext or sext case there already.
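The fold suggested here for getAddExpr can be checked exhaustively at a small bit width. In this standalone sketch (not SCEV code; all names are hypothetical), X is an 8-bit value and zext widens it to 16 bits: `1 + zext(X - 1)` equals `zext(X)` exactly when `X - 1` does not wrap, i.e. when X != 0. Establishing that no-wrap fact is the same basic proof the patch performs with the constant range.

```cpp
#include <cassert>
#include <cstdint>

// Zero-extend an 8-bit value to 16 bits.
uint16_t zext16(uint8_t V) { return V; }

// 1 + zext(X - 1), where X - 1 is computed (and may wrap) in 8 bits.
uint16_t onePlusZextOfDecrement(uint8_t X) {
  uint8_t Dec = static_cast<uint8_t>(X - 1); // wraps to 255 when X == 0
  return static_cast<uint16_t>(1 + zext16(Dec));
}

// Is the fold 1 + zext(X - 1) -> zext(X) correct for this particular X?
bool foldIsValid(uint8_t X) { return onePlusZextOfDecrement(X) == zext16(X); }

// Checks all 256 values: the fold is valid exactly when X - 1 does not wrap.
bool foldValidIffNoWrap() {
  for (unsigned V = 0; V <= 255; ++V) {
    uint8_t X = static_cast<uint8_t>(V);
    bool NoWrap = X != 0; // X - 1 wraps only for X == 0
    if (foldIsValid(X) != NoWrap)
      return false;
  }
  return true;
}
```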

> First, you can probably use willNotOverflow(Opcode, LHS, RHS) here instead.

Thanks. I'll try this.

> Second, I think you're solving the wrong problem with this patch. If "1 + zext((6 * %N) - 1)" does not simplify to "zext((6 * %N))" when truncating the constant and folding the constant produces the same answer, that seems like a SCEV folding bug, and is probably the more general fix. This would require the same basic proof, but inside getAddExpr. Oddly, we don't seem to have either the zext or sext case there already.

Yes, I agree this is a general fix that would be beneficial for SCEV. However, I don't think it's the same problem. Your recommendation would give us zext(6 * %N), while my current solution gives us 6 * %N. I would prefer my result, but in practice, it probably does not matter. If you feel this extra code is not worth it, I'll add the changes to getAddExpr() instead.

Use willNotOverflow().

Harbormaster completed remote builds in B222663: Diff 509561. (Mar 30 2023, 2:18 AM)

nikic added inline comments. (Mar 30 2023, 6:56 AM)

llvm/lib/Analysis/LoopCacheAnalysis.cpp
Line 337: The changes to LoopCacheAnalysis should be split out into a separate patch. I'm also wondering why this changes some but not all of the AnyExtends...
llvm/lib/Analysis/ScalarEvolution.cpp
Line 8048: I'd prefer this to go back to the previous version. willNotOverflow() is very expensive and not needed here.

caojoshua added inline comments. (Mar 30 2023, 10:07 PM)

llvm/lib/Analysis/LoopCacheAnalysis.cpp
Line 337: The places where I changed sext->zext are SCEVs that are computed from TripCount. In this case, RefCost is set to TripCount in the else block above. I thought it made sense to have a single patch for these two changes. It would be harder to motivate the changes if I only submitted a patch for LoopCacheAnalysis. And the whole patch is small overall, so it's not hard to read.

Revert back to checking overflow manually

caojoshua marked an inline comment as done. (Mar 30 2023, 10:51 PM)

caojoshua added inline comments.

llvm/lib/Analysis/ScalarEvolution.cpp
Line 8048: I reverted to the previous version. I can see that willNotOverflow() is a bit overkill here. @reames what do you think?

Harbormaster completed remote builds in B222917: Diff 509905. (Mar 30 2023, 11:58 PM)

nikic added inline comments. (Mar 31 2023, 1:30 AM)

llvm/lib/Analysis/LoopCacheAnalysis.cpp
Line 337: But aren't the two AnyExtends in the `else` block above also on trip counts?

caojoshua added inline comments. (Apr 1 2023, 1:07 AM)

llvm/lib/Analysis/LoopCacheAnalysis.cpp
Line 337: You're right. I think it's just a lack of test coverage in LoopCacheAnalysis / LoopInterchange, and we should be using `zext` in those cases. These cases don't come up very often: it's specifically when `BackedgeTakenCount = false`, or when there is a dead loop. The code we are talking about only applies when there are >=3 subscripts, and there probably just is not a test for it yet. When I get the chance, I'll see if I can create a breaking test.

caojoshua mentioned this in rGa400c6ac0337: [LoopInterchange] Add GEP with 3 indices test for pr57148. (Apr 2 2023, 12:23 AM)

caojoshua added inline comments. (Apr 2 2023, 12:25 AM)

llvm/lib/Analysis/LoopCacheAnalysis.cpp
Line 337: Recall that the issue only occurs when `BackedgeTakenCount = False`, resulting in `TripCount = True`. For `Type *WiderType = SE.getWiderType(RefCost->getType(), TripCount->getType());`, WiderType is always going to be i1, and the loop just multiplies `True x True = True`. We end up returning `zext(True) = 1`. If I hadn't made the change here, we would have returned `sext(True) = -1`. I'm going to push a change to make those zero extends anyway; the tests still pass. I think zero extends make sense here because the returned RefCost is always positive, or -1 if the cost is invalid. For reference, I pushed a test that helped me debug this issue.

Convert more sign extend to zero extend

Harbormaster completed remote builds in B223224: Diff 510313. (Apr 2 2023, 1:10 AM)

ping

LGTM

This revision is now accepted and ready to land. (Apr 10 2023, 12:41 AM)

Closed by https://github.com/llvm/llvm-project/commit/585742cbfccd734b19c75dff9709b20367506668 (bots not finding this for some reason)

nikic mentioned this in D147355: [LV] Optimize trip count SCEV. (Apr 12 2023, 12:02 PM)

reames mentioned this in D148661: [SCEV] Common code for computing trip count in a fixed type [NFC-ish]. (Apr 18 2023, 2:14 PM)

reames mentioned this in rG09d879d060ed: [SCEV] Common code for computing trip count in a fixed type [NFC-ish]. (Apr 25 2023, 12:05 PM)

Diff 510313

llvm/lib/Analysis/LoopCacheAnalysis.cpp

@@ ... @@ CacheCostTy IndexedReference::computeRefCost(const Loop &L,
   if (isConsecutive(L, Stride, CLS)) {
     // If the indexed reference is 'consecutive' the cost is
     // (TripCount*Stride)/CLS.
     assert(Stride != nullptr &&
            "Stride should not be null for consecutive access!");
     Type *WiderType = SE.getWiderType(Stride->getType(), TripCount->getType());
     const SCEV *CacheLineSize = SE.getConstant(WiderType, CLS);
     Stride = SE.getNoopOrAnyExtend(Stride, WiderType);
-    TripCount = SE.getNoopOrAnyExtend(TripCount, WiderType);
+    TripCount = SE.getNoopOrZeroExtend(TripCount, WiderType);
     const SCEV *Numerator = SE.getMulExpr(Stride, TripCount);
     RefCost = SE.getUDivExpr(Numerator, CacheLineSize);
 
     LLVM_DEBUG(dbgs().indent(4)
                << "Access is consecutive: RefCost=(TripCount*Stride)/CLS="
                << *RefCost << "\n");
   } else {
     // If the indexed reference is not 'consecutive' the cost is proportional to
@@ ... @@ if (isConsecutive(L, Stride, CLS)) {
     assert(Index >= 0 && "Cound not locate a valid Index");
 
     for (unsigned I = Index + 1; I < getNumSubscripts() - 1; ++I) {
       const SCEVAddRecExpr *AR = dyn_cast<SCEVAddRecExpr>(getSubscript(I));
       assert(AR && AR->getLoop() && "Expecting valid loop");
       const SCEV *TripCount =
           computeTripCount(AR->getLoop(), Sizes.back(), SE);
       Type *WiderType = SE.getWiderType(RefCost->getType(), TripCount->getType());
-      RefCost = SE.getMulExpr(SE.getNoopOrAnyExtend(RefCost, WiderType),
-                              SE.getNoopOrAnyExtend(TripCount, WiderType));
+      RefCost = SE.getMulExpr(SE.getNoopOrZeroExtend(RefCost, WiderType),
+                              SE.getNoopOrZeroExtend(TripCount, WiderType));
     }
 
     LLVM_DEBUG(dbgs().indent(4)
                << "Access is not consecutive: RefCost=" << *RefCost << "\n");
   }
   assert(RefCost && "Expecting a valid RefCost");
 
   // Attempt to fold RefCost into a constant.
   if (auto ConstantCost = dyn_cast<SCEVConstant>(RefCost))
-    return ConstantCost->getValue()->getSExtValue();
+    return ConstantCost->getValue()->getZExtValue();
 
   LLVM_DEBUG(dbgs().indent(4)
              << "RefCost is not a constant! Setting to RefCost=InvalidCost "
                 "(invalid value).\n");
 
   return CacheCost::InvalidCost;
 }

llvm/lib/Analysis/ScalarEvolution.cpp

@@ ... @@ const SCEV *ScalarEvolution::getTripCountFromExitCount(const SCEV *ExitCount,
   if (isa<SCEVCouldNotCompute>(ExitCount))
     return getCouldNotCompute();
 
   auto *ExitCountType = ExitCount->getType();
   assert(ExitCountType->isIntegerTy());
 
   if (!Extend)
     return getAddExpr(ExitCount, getOne(ExitCountType));
 
+  ConstantRange ExitCountRange =
+      getRangeRef(ExitCount, RangeSignHint::HINT_RANGE_UNSIGNED);
+  if (!ExitCountRange.contains(
+          APInt::getMaxValue(ExitCountRange.getBitWidth())))
+    return getAddExpr(ExitCount, getOne(ExitCountType));
+
   auto *WiderType = Type::getIntNTy(ExitCountType->getContext(),
                                     1 + ExitCountType->getScalarSizeInBits());
   return getAddExpr(getNoopOrZeroExtend(ExitCount, WiderType),
                     getOne(WiderType));
 }
 
 static unsigned getConstantTripCount(const SCEVConstant *ExitCount) {
   if (!ExitCount)
@@ ... @@
 }
 
 unsigned ScalarEvolution::getSmallConstantTripMultiple(const Loop *L,
                                                        const SCEV *ExitCount) {
   if (ExitCount == getCouldNotCompute())
     return 1;
 
   // Get the trip count
-  const SCEV *TCExpr = getTripCountFromExitCount(ExitCount);
+  const SCEV *TCExpr = getTripCountFromExitCount(applyLoopGuards(ExitCount, L));
 
   const SCEVConstant *TC = dyn_cast<SCEVConstant>(TCExpr);
   if (!TC)
     // Attempt to factor more general cases. Returns the greatest power of
     // two divisor. If overflow happens, the trip count expression is still
     // divisible by the greatest power of 2 divisor returned.
-    return 1U << std::min((uint32_t)31,
-                          GetMinTrailingZeros(applyLoopGuards(TCExpr, L)));
+    return 1U << std::min((uint32_t)31, GetMinTrailingZeros(TCExpr));
 
   ConstantInt *Result = TC->getValue();
 
   // Guard against huge trip counts (this requires checking
   // for zero to handle the case where the trip count == -1 and the
   // addition wraps).
   if (!Result || Result->getValue().getActiveBits() > 32 ||
       Result->getValue().getActiveBits() == 0)
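The fallback path in getSmallConstantTripMultiple() turns a lower bound on the trip count's trailing zero bits into a power-of-two multiple, capped at 2^31. A minimal sketch for known values follows; `trailingZeros` and `powerOfTwoTripMultiple` are hypothetical helpers, whereas `GetMinTrailingZeros()` itself works on symbolic SCEV expressions and returns a conservative count.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Number of trailing zero bits of a non-zero value (hypothetical helper).
uint32_t trailingZeros(uint64_t V) {
  assert(V != 0 && "trailing zeros of 0 are not meaningful here");
  uint32_t N = 0;
  while ((V & 1) == 0) {
    V >>= 1;
    ++N;
  }
  return N;
}

// Greatest power-of-two multiple implied by MinTrailingZeros, capped at 2^31
// so that the shift stays within a 32-bit unsigned.
unsigned powerOfTwoTripMultiple(uint32_t MinTrailingZeros) {
  return 1U << std::min((uint32_t)31, MinTrailingZeros);
}
```

For a trip count of the form `6 * k` with nothing known about `k`, only one trailing zero is guaranteed, so this path alone reports a multiple of 2 rather than 6; that is the kind of imprecision the trip-multiple work mentioned in the summary aims to improve.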

This is an archive of the discontinued LLVM Phabricator instance.

[SCEV] When computing trip count, only zext if necessary
Closed, Public
