Details

Reviewers

bmahjour
Meinersbur

Group Reviewers

Restricted Project

Commits

rG05ccde8023a6: [LoopCacheAnalysis] Fix a type mismatch problem in cost calculation

Summary

As reported in https://reviews.llvm.org/rGb941857b40edd7f3f3a9ec2ec85a26db24739774#1100674, there is a loop cache analysis bug exposed by the recent patch that added a new cost model for loop interchange. In isConsecutive() from LoopCacheAnalysis.cpp, the types of the SCEV variables Coeff and ElemSize may not match, e.g., when no target datalayout is provided in the IR. The mismatch causes SCEV failures when multiplying Coeff by ElemSize.

As discussed in the loopopt meeting, the fix in this patch is to extend both Coeff and ElemSize to the wider of their two types. The IR reported in https://reviews.llvm.org/rGb941857b40edd7f3f3a9ec2ec85a26db24739774#1100674 is added to the loop cache analysis tests.

As a clean-up, since Stride is already computed in isConsecutive(), I removed the duplicate calculation of Stride in computeRefCost(), which also makes this patch more compact.

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

congzhe created this revision. Jun 29 2022, 10:27 PM

Herald added a project: Restricted Project. Jun 29 2022, 10:27 PM

Herald added subscribers: javed.absar, hiraditya, nemanjai.

congzhe requested review of this revision. Jun 29 2022, 10:27 PM

Herald added a project: Restricted Project. Jun 29 2022, 10:27 PM

Herald added a subscriber: llvm-commits.

congzhe edited the summary of this revision. Jun 29 2022, 10:30 PM

congzhe added reviewers: bmahjour, Meinersbur, Restricted Project.

congzhe added a project: Restricted Project.

Regarding Michael's question about whether SE.getNoopOrAnyExtend() performs signed or unsigned extension: this is handled in ScalarEvolution::getAnyExtendExpr(), which can do either signed or unsigned extension depending on the actual SCEV type of the value we want to extend. Does that answer your question, @Meinersbur?

congzhe mentioned this in rGb941857b40ed: [LoopInterchange] New cost model for loop interchange. Jun 29 2022, 10:37 PM

congzhe updated this revision to Diff 441268. Jun 29 2022, 10:49 PM

Harbormaster completed remote builds in B172942: Diff 441268. Jun 29 2022, 11:35 PM

Whether to use sext or zext can make a semantic difference. Say we have an element size of i16 4 and a stride of -1 (going backwards) in 8-bit precision. That is represented as i8 255. Zero-extending it gives i16 255, and multiplying both gives i16 1020, whereas the result should be -4 (i16 0xFFFC).

getNoopOrAnyExtend() seems to prefer zero-extension over sign-extension.
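
The arithmetic in this example can be checked outside of SCEV. The following is a minimal standalone C++ sketch, not LLVM code; the names kCoeffBits, kElemSize, strideWithZext and strideWithSext are hypothetical and only mirror the i8 and i16 values used above.

```cpp
#include <cstdint>

// Stride coefficient -1 stored in 8 bits has the bit pattern 0xFF (255).
constexpr std::uint8_t kCoeffBits = 0xFF;
// Element size 4 held in a 16-bit type.
constexpr std::uint16_t kElemSize = 4;

// Zero-extend the 8-bit coefficient to 16 bits, then multiply modulo 2^16.
constexpr std::uint16_t strideWithZext() {
  return static_cast<std::uint16_t>(
      static_cast<std::uint32_t>(kCoeffBits) * kElemSize);
}

// Sign-extend the 8-bit coefficient to 16 bits, then multiply modulo 2^16.
constexpr std::uint16_t strideWithSext() {
  // Interpret the 8-bit pattern as a two's complement value (here: -1).
  return static_cast<std::uint16_t>(
      ((kCoeffBits & 0x80) ? kCoeffBits - 256 : kCoeffBits) * kElemSize);
}
```

Under these assumptions strideWithZext() yields 1020 (0x03FC), while strideWithSext() yields 0xFFFC, the 16-bit pattern of -4.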

It looks like the case you mentioned has been handled correctly in ScalarEvolution::getAnyExtendExpr() (which is called from SE.getNoopOrAnyExtend()) by the following piece of code?

// Sign-extend negative constants.
if (const SCEVConstant *SC = dyn_cast<SCEVConstant>(Op))
  if (SC->getAPInt().isNegative())
    return getSignExtendExpr(Op, Ty);

Afaik getNoopOrAnyExtend() is defined as "If the type must be extended, it is extended with unspecified bits.". So you can't rely on it doing zero extend or sign extend. The extended bits are undefined.

That is probably OK in use-cases such as how getNoopOrAnyExtend is used in ScalarEvolutionExpander, when it is truncating the result from the add/mul expr that is using the extended size (i.e. the upper bits are of no interest anyway).
But I do not really see how the usage of getNoopOrAnyExtend() in LoopCacheAnalysis is safe in a similar manner. Those "undefined" bits seem to be significant both in the new Stride calculation in this patch, as well as in the already present code that for example is doing

Stride = SE.getNoopOrAnyExtend(Stride, WiderType);
TripCount = SE.getNoopOrAnyExtend(TripCount, WiderType);
const SCEV *Numerator = SE.getMulExpr(Stride, TripCount);
RefCost = SE.getUDivExpr(Numerator, CacheLineSize);

Or am I missing something?
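
To make the concern concrete, here is a small standalone C++ sketch (not LLVM code) that models an any-extend as "the upper bits may be anything". The helper names and the values 4, 100 and 64 are assumptions chosen purely for illustration. It compares a consumer that truncates the product back to the narrow width with one that feeds the wide product into an unsigned division, as the RefCost computation does.

```cpp
#include <cstdint>

// Extend an 8-bit value to 16 bits, filling the upper byte with `Fill`.
// This models an "any-extend": callers may not assume a particular Fill.
constexpr std::uint16_t extendWithFill(std::uint8_t V, std::uint8_t Fill) {
  return static_cast<std::uint16_t>((Fill << 8) | V);
}

// Multiply in the wide type, then truncate the product back to 8 bits.
constexpr std::uint8_t mulThenTrunc(std::uint16_t A, std::uint16_t B) {
  return static_cast<std::uint8_t>(static_cast<std::uint32_t>(A) * B);
}

// Multiply modulo 2^16, then divide unsigned, like Numerator / CacheLineSize.
constexpr std::uint16_t mulThenUDiv(std::uint16_t A, std::uint16_t B,
                                    std::uint16_t CLS) {
  const auto Product16 =
      static_cast<std::uint16_t>(static_cast<std::uint32_t>(A) * B);
  return static_cast<std::uint16_t>(Product16 / CLS);
}
```

With a narrow stride of 4, a trip count of 100 and a cache line size of 64, the truncated product is the same for any fill byte, but the quotient is 6 when the upper byte is 0x00 and 630 when it is 0xFF. In this model the unspecified bits are harmless only when the result is truncated afterwards.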

@bjope I think you are correct. This is what the source code documentation says:

/// Return a SCEV corresponding to a conversion of the input value to the
/// specified type. If the type must be extended, it is extended with
/// unspecified bits. The conversion must not be narrowing.
const SCEV *getNoopOrAnyExtend(const SCEV *V, Type *Ty);

@congzhe getAnyExtendExpr has some heuristics for which kind of extension is better for some definition of "better", but that doesn't mean it will find the correct one every time. The test case you added contains zext. Shouldn't that somehow indicate that zext is to be used in this case?

Hi folks, my apologies for the delay, and thanks for the input. I agree that we need to treat this SCEV extension more carefully. That said, in some places where getNoopOrAnyExtend() is called (like the piece of code that Bjorn posted above), both Stride and TripCount are positive, so it might be straightforward to use SE.getNoopOrZeroExtend() there, or perhaps to keep SE.getNoopOrAnyExtend() as it is used now.

Motivated by Michael's (@Meinersbur) comment that we might just use zext for the test case in this patch, and after looking into isConsecutive(), I think we could change this patch to the following code to handle the extensions more carefully. What do you think?

const SCEV *Coeff = getLastCoefficient();
const SCEV *ElemSize = Sizes.back();
Type *WiderType = SE.getWiderType(Coeff->getType(), ElemSize->getType());
Stride = SE.getMulExpr(SE.isKnownNegative(Coeff)
                           ? SE.getNoopOrSignExtend(Coeff, WiderType)
                           : SE.getNoopOrZeroExtend(Coeff, WiderType),
                       SE.isKnownNegative(ElemSize)
                           ? SE.getNoopOrSignExtend(ElemSize, WiderType)
                           : SE.getNoopOrZeroExtend(ElemSize, WiderType));

As a side note, in order not to block Bjorn's (@bjope) work: adding a target datalayout line to the IR that crashes (https://reviews.llvm.org/rGb941857b40edd7f3f3a9ec2ec85a26db24739774#1100674) makes it work fine. Something like target datalayout = "e-m:e-p:32:32-i64:64-v128:64:128-a:0:32-n32-S64" is sufficient. I mentioned this in the summary of this patch, but I am repeating it here in case you missed it. I hope this provides a workaround for now.

I don't think we can rely on isKnownNegative:

The SCEV might be sometimes negative and sometimes positive, depending on some runtime value (e.g. the stride is multiplied by a function argument). That is, if isKnownPositive and isKnownNegative both return false, we'd need to bail out.
isKnownNegative already assumes that the integer is signed. With that assumption, sext would always be correct (it is equivalent to zext for non-negative integers).
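
The second point can be illustrated with a short standalone C++ sketch (hypothetical helper names, with 8-to-16-bit widths chosen only for illustration, not LLVM code): sign extension and zero extension produce the same wide value exactly when the sign bit of the narrow value is clear.

```cpp
#include <cstdint>

// Zero-extend an 8-bit pattern to 16 bits.
constexpr std::uint16_t zext8To16(std::uint8_t Bits) { return Bits; }

// Sign-extend an 8-bit pattern to 16 bits by replicating bit 7.
constexpr std::uint16_t sext8To16(std::uint8_t Bits) {
  return static_cast<std::uint16_t>((Bits & 0x80) ? (0xFF00 | Bits) : Bits);
}

// True if both extensions yield the same 16-bit pattern.
constexpr bool extensionsAgree(std::uint8_t Bits) {
  return sext8To16(Bits) == zext8To16(Bits);
}
```

For example, extensionsAgree(127) holds while extensionsAgree(128) does not, so choosing sext loses nothing for values that are non-negative under the signed interpretation.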

Updated the patch according to the discussion in the loopopt meeting. @Meinersbur I would appreciate it if you could take a second look, thanks a lot!

Harbormaster completed remote builds in B175208: Diff 444376. Jul 13 2022, 3:05 PM

uabelho added a subscriber: uabelho. Jul 18 2022, 1:57 AM

Meinersbur added inline comments. Jul 18 2022, 12:47 PM

llvm/lib/Analysis/LoopCacheAnalysis.cpp

479–480

Could you make it more explicit that this is known to be incorrect for some cases?
Suggestion:

// FIXME: This assumes that all values are signed integers which may be incorrect in unusual codes and incorrectly use sext instead of zext.
// for (uint32_t i = 256; i < 512; ++i) {
//   uint8_t trunc = i;
//   A[i] = 42;
// }
// This consecutively iterates twice over A. If `trunc` is sign-extended, we would conclude that this may iterate backwards over the array.
// However, LoopCacheAnalysis is heuristic anyway and transformations must not result in wrong optimizations if the heuristic was incorrect.

Thanks Michael for your suggestion, I've updated the patch accordingly.

I'm just wondering if you meant A[trunc]=42 in your example? I'm also not sure the loop actually iterates over A twice. It looks to me like the variable i runs from 0x0100 to 0x01FF, hence trunc runs from 0x00 to 0xFF, so it seems to iterate over A only once?

Harbormaster completed remote builds in B176107: Diff 445613. Jul 18 2022, 2:59 PM

Both your remarks are correct. You already fixed the first. I started with a loop for (uint32_t i = 0; i < 512; ++i), which would iterate twice over the array (once forward, once backward), but then found that it overcomplicates things and forgot to change it everywhere. Could you fix it either way? After that, LGTM.
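
The visit counts for the two loop ranges discussed here can be checked with a short standalone C++ sketch. The function name countTruncatedIndices is hypothetical, and the sketch only models the unsigned truncation uint8_t trunc = i, not what LoopCacheAnalysis computes.

```cpp
#include <array>
#include <cstdint>

// Count how often each index 0..255 is produced by `uint8_t trunc = i`
// when i runs over the half-open range [Begin, End).
inline std::array<int, 256> countTruncatedIndices(std::uint32_t Begin,
                                                  std::uint32_t End) {
  std::array<int, 256> Visits{};
  for (std::uint32_t I = Begin; I < End; ++I) {
    const std::uint8_t Trunc = static_cast<std::uint8_t>(I); // keeps low 8 bits
    ++Visits[Trunc];
  }
  return Visits;
}
```

For the range [256, 512) every index is produced exactly once, and for [0, 512) exactly twice, which matches the observation that only the second loop covers A twice.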

This revision is now accepted and ready to land. Jul 19 2022, 12:59 PM

This revision was landed with ongoing or failed builds. Jul 20 2022, 10:58 PM

Closed by commit rG05ccde8023a6: [LoopCacheAnalysis] Fix a type mismatch problem in cost calculation (authored by congzhe).

This revision was automatically updated to reflect the committed changes.

congzhe added a commit: rG05ccde8023a6: [LoopCacheAnalysis] Fix a type mismatch problem in cost calculation.

Diff 446360

llvm/include/llvm/Analysis/LoopCacheAnalysis.h

@@ ... @@ bool tryDelinearizeFixedSize(const SCEV *AccessFn,
                                SmallVectorImpl<const SCEV *> &Subscripts);

   /// Return true if the index reference is invariant with respect to loop \p L.
   bool isLoopInvariant(const Loop &L) const;

   /// Return true if the indexed reference is 'consecutive' in loop \p L.
   /// An indexed reference is 'consecutive' if the only coefficient that uses
   /// the loop induction variable is the rightmost one, and the access stride is
-  /// smaller than the cache line size \p CLS.
-  bool isConsecutive(const Loop &L, unsigned CLS) const;
+  /// smaller than the cache line size \p CLS. Provide a valid \p Stride value
+  /// if the indexed reference is 'consecutive'.
+  bool isConsecutive(const Loop &L, const SCEV *&Stride, unsigned CLS) const;

   /// Retrieve the index of the subscript corresponding to the given loop \p
   /// L. Return a zero-based positive index if the subscript index is
   /// succesfully located and a negative value otherwise. For example given the
   /// indexed reference 'A[i][2j+1][3k+2]', the call
   /// 'getSubscriptIndex(loop-k)' would return value 2.
   int getSubscriptIndex(const Loop &L) const;

llvm/lib/Analysis/LoopCacheAnalysis.cpp

@@ ... @@ if (isLoopInvariant(L)) {
     return 1;
   }

   const SCEV *TripCount = computeTripCount(L, *Sizes.back(), SE);
   assert(TripCount && "Expecting valid TripCount");
   LLVM_DEBUG(dbgs() << "TripCount=" << *TripCount << "\n");

   const SCEV *RefCost = nullptr;
-  if (isConsecutive(L, CLS)) {
+  const SCEV *Stride = nullptr;
+  if (isConsecutive(L, Stride, CLS)) {
     // If the indexed reference is 'consecutive' the cost is
     // (TripCount*Stride)/CLS.
-    const SCEV *Coeff = getLastCoefficient();
-    const SCEV *ElemSize = Sizes.back();
-    assert(Coeff->getType() == ElemSize->getType() &&
-           "Expecting the same type");
-    const SCEV *Stride = SE.getMulExpr(Coeff, ElemSize);
+    assert(Stride != nullptr &&
+           "Stride should not be null for consecutive access!");
     Type *WiderType = SE.getWiderType(Stride->getType(), TripCount->getType());
     const SCEV *CacheLineSize = SE.getConstant(WiderType, CLS);
-    if (SE.isKnownNegative(Stride))
-      Stride = SE.getNegativeSCEV(Stride);
     Stride = SE.getNoopOrAnyExtend(Stride, WiderType);
     TripCount = SE.getNoopOrAnyExtend(TripCount, WiderType);
     const SCEV *Numerator = SE.getMulExpr(Stride, TripCount);
     RefCost = SE.getUDivExpr(Numerator, CacheLineSize);

     LLVM_DEBUG(dbgs().indent(4)
                << "Access is consecutive: RefCost=(TripCount*Stride)/CLS="
                << *RefCost << "\n");
@@ ... @@ bool IndexedReference::isLoopInvariant(const Loop &L) const {
   // the loop induction variable.
   bool allCoeffForLoopAreZero = all_of(Subscripts, [&](const SCEV *Subscript) {
     return isCoeffForLoopZeroOrInvariant(*Subscript, L);
   });

   return allCoeffForLoopAreZero;
 }

-bool IndexedReference::isConsecutive(const Loop &L, unsigned CLS) const {
+bool IndexedReference::isConsecutive(const Loop &L, const SCEV *&Stride,
+                                     unsigned CLS) const {
   // The indexed reference is 'consecutive' if the only coefficient that uses
   // the loop induction variable is the last one...
   const SCEV *LastSubscript = Subscripts.back();
   for (const SCEV *Subscript : Subscripts) {
     if (Subscript == LastSubscript)
       continue;
     if (!isCoeffForLoopZeroOrInvariant(*Subscript, L))
       return false;
   }

   // ...and the access stride is less than the cache line size.
   const SCEV *Coeff = getLastCoefficient();
   const SCEV *ElemSize = Sizes.back();
-  const SCEV *Stride = SE.getMulExpr(Coeff, ElemSize);
+  Type *WiderType = SE.getWiderType(Coeff->getType(), ElemSize->getType());
+  // FIXME: This assumes that all values are signed integers which may
+  // be incorrect in unusual codes and incorrectly use sext instead of zext.
+  // for (uint32_t i = 0; i < 512; ++i) {
+  //   uint8_t trunc = i;
+  //   A[trunc] = 42;
+  // }
+  // This consecutively iterates twice over A. If `trunc` is sign-extended,
+  // we would conclude that this may iterate backwards over the array.
+  // However, LoopCacheAnalysis is heuristic anyway and transformations must
+  // not result in wrong optimizations if the heuristic was incorrect.
+  Stride = SE.getMulExpr(SE.getNoopOrSignExtend(Coeff, WiderType),
+                         SE.getNoopOrSignExtend(ElemSize, WiderType));
   const SCEV *CacheLineSize = SE.getConstant(Stride->getType(), CLS);

   Stride = SE.isKnownNegative(Stride) ? SE.getNegativeSCEV(Stride) : Stride;
   return SE.isKnownPredicate(ICmpInst::ICMP_ULT, Stride, CacheLineSize);
 }

 int IndexedReference::getSubscriptIndex(const Loop &L) const {
   for (auto Idx : seq<int>(0, getNumSubscripts())) {

llvm/test/Analysis/LoopCacheAnalysis/PowerPC/compute-cost.ll

-; RUN: opt < %s -passes='print<loop-cache-cost>' -disable-output 2>&1 | FileCheck %s
+; RUN: opt < %s -opaque-pointers -passes='print<loop-cache-cost>' -disable-output 2>&1 | FileCheck %s

 target datalayout = "e-m:e-i64:64-n32:64"
 target triple = "powerpc64le-unknown-linux-gnu"

 ; Check IndexedReference::computeRefCost can handle type differences between
 ; Stride and TripCount

 ; CHECK: Loop 'for.cond' has cost = 64
@@ ... @@ for.body: ; preds = %for.cond
   %inc = add nuw nsw i32 %i.0, 1
   br label %for.cond

 ; Exit blocks
 for.end: ; preds = %for.cond
   ret void
 }

+; Check IndexedReference::computeRefCost can handle type differences between
+; Coeff and ElemSize.
+
+; CHECK: Loop 'for.cond' has cost = 100000000
+; CHECK: Loop 'for.cond1' has cost = 1000000
+; CHECK: Loop 'for.cond5' has cost = 30000
+
+@data = external dso_local global [2 x [4 x [18 x i32]]], align 1
+
+define dso_local void @handle_to_ptr_2(i1 %b0, i1 %b1, i1 %b2) {
+entry:
+  br label %for.cond
+
+for.cond:
+  %i.0 = phi i16 [ 0, %entry ], [ %inc18, %for.inc17 ]
+  %idxprom = zext i16 %i.0 to i32
+  br i1 %b2, label %for.end19, label %for.cond1
+
+for.cond1:
+  %j.0 = phi i16 [ %inc15, %for.inc14 ], [ 0, %for.cond ]
+  br i1 %b1, label %for.inc17, label %for.cond5.preheader
+
+for.cond5.preheader:
+  %idxprom10 = zext i16 %j.0 to i32
+  br label %for.cond5
+
+for.cond5:
+  %k.0 = phi i16 [ %inc, %for.inc ], [ 0, %for.cond5.preheader ]
+  br i1 %b0, label %for.inc14, label %for.inc
+
+for.inc:
+  %idxprom12 = zext i16 %k.0 to i32
+  %arrayidx13 = getelementptr inbounds [2 x [4 x [18 x i32]]], ptr @data, i32 0, i32 %idxprom, i32 %idxprom10, i32 %idxprom12
+  store i32 7, ptr %arrayidx13, align 1
+  %inc = add nuw nsw i16 %k.0, 1
+  br label %for.cond5
+
+for.inc14:
+  %inc15 = add nuw nsw i16 %j.0, 1
+  br label %for.cond1
+
+for.inc17:
+  %inc18 = add nuw nsw i16 %i.0, 1
+  br label %for.cond
+
+for.end19:
+  ret void
+}
+
 ; Check IndexedReference::computeRefCost can handle negative stride

 ; CHECK: Loop 'for.neg.cond' has cost = 64

 define void @handle_to_ptr_neg_stride(%struct._Handleitem** %blocks) {
 ; Preheader:
 entry:

llvm/test/Analysis/LoopCacheAnalysis/compute-cost.ll

@@ ... @@ for.body: ; preds = %for.cond
   %inc = add nuw nsw i32 %i.0, 1
   br label %for.cond

 ; Exit blocks
 for.end: ; preds = %for.cond
   ret void
 }

+; Check IndexedReference::computeRefCost can handle type differences between
+; Coeff and ElemSize.
+
+; SMALLER-CACHELINE: Loop 'for.cond' has cost = 100000000
+; SMALLER-CACHELINE: Loop 'for.cond1' has cost = 1000000
+; SMALLER-CACHELINE: Loop 'for.cond5' has cost = 120000
+; LARGER-CACHELINE: Loop 'for.cond' has cost = 100000000
+; LARGER-CACHELINE: Loop 'for.cond1' has cost = 1000000
+; LARGER-CACHELINE: Loop 'for.cond5' has cost = 10000
+@data = external dso_local global [2 x [4 x [18 x i32]]], align 1
+
+define dso_local void @handle_to_ptr_2(i1 %b0, i1 %b1, i1 %b2) {
+entry:
+  br label %for.cond
+
+for.cond:
+  %i.0 = phi i16 [ 0, %entry ], [ %inc18, %for.inc17 ]
+  %idxprom = zext i16 %i.0 to i32
+  br i1 %b2, label %for.end19, label %for.cond1
+
+for.cond1:
+  %j.0 = phi i16 [ %inc15, %for.inc14 ], [ 0, %for.cond ]
+  br i1 %b1, label %for.inc17, label %for.cond5.preheader
+
+for.cond5.preheader:
+  %idxprom10 = zext i16 %j.0 to i32
+  br label %for.cond5
+
+for.cond5:
+  %k.0 = phi i16 [ %inc, %for.inc ], [ 0, %for.cond5.preheader ]
+  br i1 %b0, label %for.inc14, label %for.inc
+
+for.inc:
+  %idxprom12 = zext i16 %k.0 to i32
+  %arrayidx13 = getelementptr inbounds [2 x [4 x [18 x i32]]], ptr @data, i32 0, i32 %idxprom, i32 %idxprom10, i32 %idxprom12
+  store i32 7, ptr %arrayidx13, align 1
+  %inc = add nuw nsw i16 %k.0, 1
+  br label %for.cond5
+
+for.inc14:
+  %inc15 = add nuw nsw i16 %j.0, 1
+  br label %for.cond1
+
+for.inc17:
+  %inc18 = add nuw nsw i16 %i.0, 1
+  br label %for.cond
+
+for.end19:
+  ret void
+}
+
 ; Check IndexedReference::computeRefCost can handle negative stride

 ; SMALLER-CACHELINE: Loop 'for.neg.cond' has cost = 256
 ; LARGER-CACHELINE: Loop 'for.neg.cond' has cost = 32
 define void @handle_to_ptr_neg_stride(%struct._Handleitem** %blocks) {
 ; Preheader:
 entry:

This is an archive of the discontinued LLVM Phabricator instance.

[LoopCacheAnalysis] Fix a type mismatch bug in cost calculation
ClosedPublic
