This is an archive of the discontinued LLVM Phabricator instance.

Differential D54538

[LV] Avoid vectorizing unsafe dependencies in uniform address
ClosedPublic

Authored by anna on Nov 14 2018, 12:43 PM.

Details

Reviewers

Ayal
efriedma

Commits

rG5e9215f02bc5: [LV] Avoid vectorizing unsafe dependencies in uniform address
rL347220: [LV] Avoid vectorizing unsafe dependencies in uniform address

Summary

Currently, when vectorizing stores to uniform addresses, the only case in
which we prevent vectorization is when there are multiple stores to the
same uniform address.
This patch teaches LAA to avoid vectorizing loops that have an unsafe
cross-iteration dependency between a load and a store to the same uniform address.
We cannot vectorize a loop that loads from a uniform address when the
previous iteration stored to that same address.
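
To illustrate why such a dependency is unsafe, here is a minimal standalone sketch. It is not LLVM code: `runScalar`, `runNaiveVectorized`, the fixed `VF = 4`, and the loop body are all hypothetical, and the "vectorized" version simply models widening without any store-to-load forwarding between lanes.

```cpp
#include <array>
#include <cstddef>
#include <vector>

// Scalar reference loop: each iteration loads the value that the previous
// iteration stored to the same loop-invariant ("uniform") location.
std::vector<int> runScalar(int init, std::size_t n) {
  int slot = init;  // stands in for the uniform address
  std::vector<int> out(n);
  for (std::size_t i = 0; i < n; ++i) {
    int loaded = slot;  // load from the uniform address
    out[i] = loaded;
    slot = loaded + 1;  // store to the same uniform address
  }
  return out;
}

// Hypothetical model of naive widening by VF with no forwarding:
// all lanes load before any lane stores, and the last lane's store wins.
// For simplicity this assumes n is a multiple of VF.
std::vector<int> runNaiveVectorized(int init, std::size_t n) {
  constexpr std::size_t VF = 4;
  int slot = init;
  std::vector<int> out(n);
  for (std::size_t i = 0; i + VF <= n; i += VF) {
    std::array<int, VF> loaded{};
    for (std::size_t lane = 0; lane < VF; ++lane)
      loaded[lane] = slot;  // every lane sees the same stale value
    for (std::size_t lane = 0; lane < VF; ++lane) {
      out[i + lane] = loaded[lane];
      slot = loaded[lane] + 1;  // only the last of the VF stores survives
    }
  }
  return out;
}
```

With `init = 0` and `n = 8`, the scalar loop yields 0 through 7, while the widened model yields 0, 0, 0, 0, 1, 1, 1, 1, so the two versions disagree.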

Two test cases are added to show safe and unsafe dependencies for
vectorization. Fixes PR39653.

Note: I initially tried setting CanVecMem to false and returning from the
LAA analysis when we came across unsafe dependencies. However, this
broke some tests in LoopVersioningLICM. So, the patch instead renames
HasMultipleStoresToInvariantAddress to
HasNonVectorizableStoresToInvariantAddress. This allows ORE to
state the reason we could not vectorize the loop.

Diff Detail

Repository: rL LLVM

Event Timeline

anna created this revision. Nov 14 2018, 12:43 PM

Herald added a subscriber: rkruppe. Nov 14 2018, 12:43 PM

Harbormaster completed remote builds in B25006: Diff 174086. Nov 14 2018, 12:43 PM

efriedma added inline comments. Nov 14 2018, 1:14 PM

lib/Analysis/LoopAccessAnalysis.cpp
1927 ↗	(On Diff #174086)	I'm not sure the dominance check is sufficient. Yes, if the store dominates the load, it's theoretically possible to vectorize the loop, but only if the vectorizer has a special case to forward the stored values to the load. Otherwise, the load will get values from the wrong loop iteration.

Ayal added inline comments. Nov 14 2018, 1:41 PM

include/llvm/Analysis/LoopAccessAnalysis.h
567 ↗	(On Diff #174086)	Comment is pretty obvious; better explain when stores to an invariant address are considered non-vectorizable.
lib/Analysis/LoopAccessAnalysis.cpp
1878 ↗	(On Diff #174086)	Is `UniformStores` still needed? Can check instead `UniformStoreMap.count(Ptr)` here (and below)?
1926 ↗	(On Diff #174086)	Can check instead `UniformStoreMap.count(Ptr)`? Note that checking `!HasNonVectorizableStoresToLoopInvariantAddress` is redundant, but could save time.
1927 ↗	(On Diff #174086)	Ahh, right! No special forwarding provided. Only the last of the VF stores to the same address will survive, and it will feed all VF loads. As @anna originally intended: if an invariant address is both stored-to and loaded-from inside the loop, bail out. So can continue to use the `ValueSet UniformStores`.
test/Transforms/LoopVectorize/invariant-store-vectorization.ll
555 ↗	(On Diff #174086)	Good to mention this test is related to PR39653. With -force-vector-width=4 the test can be simplified; it probably took the additional instructions to convince LV's cost-model to vectorize the original loop on its own, w/o -force-vector-width.

anna added inline comments. Nov 15 2018, 6:32 AM

lib/Analysis/LoopAccessAnalysis.cpp
1927 ↗	(On Diff #174086)	oh yes, that's right. I'll resurrect the older one.
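
The check the thread converges on (bail out if an invariant address is stored to more than once, or is both stored to and loaded from) can be sketched outside LLVM as follows. This is only an illustration under stated assumptions: `Access` and `hasDependenceInvolvingInvariantAddress` are hypothetical stand-ins, and a `std::set` replaces LLVM's `ValueSet`.

```cpp
#include <set>
#include <vector>

// Hypothetical stand-in for a memory access in the loop; not an LLVM type.
struct Access {
  const void *Ptr;  // address operand of the load or store
  bool IsUniform;   // true if the address is loop-invariant
};

// Returns true if a uniform address is stored to more than once, or if a
// load reads an address that some uniform store writes.
bool hasDependenceInvolvingInvariantAddress(const std::vector<Access> &Stores,
                                            const std::vector<Access> &Loads) {
  std::set<const void *> UniformStores;
  bool Found = false;

  for (const Access &ST : Stores)
    if (ST.IsUniform)
      // insert().second is false when the pointer was already recorded,
      // i.e. this is a second store to the same uniform address.
      Found |= !UniformStores.insert(ST.Ptr).second;

  for (const Access &LD : Loads)
    if (UniformStores.count(LD.Ptr))
      Found = true;  // load and store touch the same uniform address

  return Found;
}
```

Keeping a plain set of pointers is enough here because the check only asks whether an address was seen, not which store wrote it.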

based on above comments.

addressed review comments

Harbormaster completed remote builds in B25050: Diff 174216. Nov 15 2018, 7:59 AM

anna marked 2 inline comments as done. Nov 15 2018, 8:00 AM

anna added inline comments.

test/Transforms/LoopVectorize/invariant-store-vectorization.ll
555 ↗	(On Diff #174086)	I'd tried simplifying the test with force-vector-width, but it still avoided vectorization stating "unsafe memory operations".

LGTM, with minor optional comments. Would be good to add a test where the load is preceded by a store in the loop, indicating that such dependence is also non-vectorizable.

include/llvm/Analysis/LoopAccessAnalysis.h
569 ↗	(On Diff #174216)	Can simply state that if there's a memory dependence involving an invariant address, i.e., two stores or a store and a load, return true, else return false. Could alternatively name it `hasDependenceInvolvingLoopInvariantAddress`. Note that a load from an invariant address that depends on store(s) to it in the loop should ideally be promoted to use the temporary being stored; the temporary value gets expanded, unlike the store to the invariant address. The missed ORE message could indicate such potential resolution.
lib/Analysis/LoopAccessAnalysis.cpp
1919 ↗	(On Diff #174216)	Can alternatively use `if (UniformStores.count(Ptr))`
lib/Transforms/Vectorize/LoopVectorizationLegality.cpp
819 ↗	(On Diff #174216)	No need for {}
test/Transforms/LoopVectorize/invariant-store-vectorization.ll
568 ↗	(On Diff #174216)	(Note: load could be replaced by phi(%arg4, %tmp12), a potentially vectorizable 1st-order-recurrence.)
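
The promotion suggested above, replacing the load with a loop-carried value, can be sketched in plain C++. This is a hypothetical illustration, not the test case itself: `withMemorySlot`, `withRecurrence`, and the input vector `x` are invented names, and the arithmetic only loosely mirrors the roles of %tmp10 and %tmp12.

```cpp
#include <cstddef>
#include <vector>

// Loop shape of the unsafe case: load from an invariant slot, compute,
// then store the new value back to the same slot.
std::vector<int> withMemorySlot(int init, const std::vector<int> &x) {
  int slot = init;
  int *p = &slot;  // loop-invariant address
  std::vector<int> out(x.size());
  for (std::size_t i = 0; i < x.size(); ++i) {
    int loaded = *p;                     // plays the role of %tmp10
    int next = (x[i] * loaded) % 65536;  // plays the role of %tmp12
    out[i] = next + x[i];
    *p = next;                           // store to the invariant address
  }
  return out;
}

// Same computation with the slot promoted to a loop-carried scalar,
// analogous to phi(%arg4, %tmp12): a first-order recurrence.
std::vector<int> withRecurrence(int init, const std::vector<int> &x) {
  int carried = init;
  std::vector<int> out(x.size());
  for (std::size_t i = 0; i < x.size(); ++i) {
    int next = (x[i] * carried) % 65536;
    out[i] = next + x[i];
    carried = next;  // value flows to the next iteration without memory
  }
  return out;
}
```

Both functions compute the same result; the second form exposes the dependence as a recurrence on a scalar rather than as a load/store pair on an invariant address.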

This revision is now accepted and ready to land. Nov 18 2018, 11:08 PM

anna marked 5 inline comments as done. Nov 19 2018, 7:17 AM

anna added inline comments.

include/llvm/Analysis/LoopAccessAnalysis.h
569 ↗	(On Diff #174216)	Addressed the first 2 comments. Regarding the third one, the entirety of stores to an invariant address, or store-followed-by-load to an invariant address, is a missed optimization not handled before vectorization, i.e. scalar promotion and LICM.
test/Transforms/LoopVectorize/invariant-store-vectorization.ll
568 ↗	(On Diff #174216)	good point! I'll add that as a comment.

Closed by commit rL347220: [LV] Avoid vectorizing unsafe dependencies in uniform address (authored by annat). Nov 19 2018, 7:42 AM

This revision was automatically updated to reflect the committed changes.

Ayal mentioned this in D67372: [LV] Support invariant addresses in speculation logic. Sep 11 2019, 12:44 PM

Revision Contents

Path

Size

llvm/

trunk/

include/

llvm/

Analysis/

LoopAccessAnalysis.h

12 lines

lib/

Analysis/

LoopAccessAnalysis.cpp

16 lines

Transforms/

Vectorize/

LoopVectorizationLegality.cpp

7 lines

test/

Analysis/

LoopAccessAnalysis/

memcheck-wrapping-pointers.ll

2 lines

store-to-invariant-check1.ll

8 lines

store-to-invariant-check2.ll

4 lines

store-to-invariant-check3.ll

2 lines

Transforms/

LoopVectorize/

invariant-store-vectorization.ll

42 lines

Diff 174616

llvm/trunk/include/llvm/Analysis/LoopAccessAnalysis.h

Show First 20 Lines • Show All 558 Lines • ▼ Show 20 Lines	public:
const ValueToValueMap &getSymbolicStrides() const { return SymbolicStrides; }		const ValueToValueMap &getSymbolicStrides() const { return SymbolicStrides; }

/// Pointer has a symbolic stride.		/// Pointer has a symbolic stride.
bool hasStride(Value *V) const { return StrideSet.count(V); }		bool hasStride(Value *V) const { return StrideSet.count(V); }

/// Print the information about the memory accesses in the loop.		/// Print the information about the memory accesses in the loop.
void print(raw_ostream &OS, unsigned Depth = 0) const;		void print(raw_ostream &OS, unsigned Depth = 0) const;

/// If the loop has multiple stores to an invariant address, then		/// If the loop has memory dependence involving an invariant address, i.e. two
/// return true, else return false.		/// stores or a store and a load, then return true, else return false.
bool hasMultipleStoresToLoopInvariantAddress() const {		bool hasDependenceInvolvingLoopInvariantAddress() const {
return HasMultipleStoresToLoopInvariantAddress;		return HasDependenceInvolvingLoopInvariantAddress;
}		}

/// Used to add runtime SCEV checks. Simplifies SCEV expressions and converts		/// Used to add runtime SCEV checks. Simplifies SCEV expressions and converts
/// them to a more usable form. All SCEV expressions during the analysis		/// them to a more usable form. All SCEV expressions during the analysis
/// should be re-written (and therefore simplified) according to PSE.		/// should be re-written (and therefore simplified) according to PSE.
/// A user of LoopAccessAnalysis will need to emit the runtime checks		/// A user of LoopAccessAnalysis will need to emit the runtime checks
/// associated with this predicate.		/// associated with this predicate.
const PredicatedScalarEvolution &getPSE() const { return *PSE; }		const PredicatedScalarEvolution &getPSE() const { return *PSE; }
Show All 36 Lines	private:
unsigned NumLoads;		unsigned NumLoads;
unsigned NumStores;		unsigned NumStores;

uint64_t MaxSafeDepDistBytes;		uint64_t MaxSafeDepDistBytes;

/// Cache the result of analyzeLoop.		/// Cache the result of analyzeLoop.
bool CanVecMem;		bool CanVecMem;

/// Indicator that there are multiple stores to a uniform address.		/// Indicator that there are non vectorizable stores to a uniform address.
bool HasMultipleStoresToLoopInvariantAddress;		bool HasDependenceInvolvingLoopInvariantAddress;

/// The diagnostics report generated for the analysis. E.g. why we		/// The diagnostics report generated for the analysis. E.g. why we
/// couldn't analyze the loop.		/// couldn't analyze the loop.
std::unique_ptr<OptimizationRemarkAnalysis> Report;		std::unique_ptr<OptimizationRemarkAnalysis> Report;

/// If an access has a symbolic strides, this maps the pointer value to		/// If an access has a symbolic strides, this maps the pointer value to
/// the stride symbol.		/// the stride symbol.
ValueToValueMap SymbolicStrides;		ValueToValueMap SymbolicStrides;
▲ Show 20 Lines • Show All 129 Lines • Show Last 20 Lines

llvm/trunk/lib/Analysis/LoopAccessAnalysis.cpp

Show First 20 Lines • Show All 1,864 Lines • ▼ Show 20 Lines	void LoopAccessInfo::analyzeLoop(AliasAnalysis *AA, LoopInfo *LI,
// Record uniform store addresses to identify if we have multiple stores		// Record uniform store addresses to identify if we have multiple stores
// to the same address.		// to the same address.
ValueSet UniformStores;		ValueSet UniformStores;

for (StoreInst *ST : Stores) {		for (StoreInst *ST : Stores) {
Value *Ptr = ST->getPointerOperand();		Value *Ptr = ST->getPointerOperand();

if (isUniform(Ptr))		if (isUniform(Ptr))
HasMultipleStoresToLoopInvariantAddress |=		HasDependenceInvolvingLoopInvariantAddress |=
!UniformStores.insert(Ptr).second;		!UniformStores.insert(Ptr).second;

// If we did not see this pointer before, insert it to the read-write		// If we did not see this pointer before, insert it to the read-write
// list. At this phase it is only a 'write' list.		// list. At this phase it is only a 'write' list.
if (Seen.insert(Ptr).second) {		if (Seen.insert(Ptr).second) {
++NumReadWrites;		++NumReadWrites;

MemoryLocation Loc = MemoryLocation::get(ST);		MemoryLocation Loc = MemoryLocation::get(ST);
Show All 27 Lines	for (LoadInst *LD : Loads) {
// words may be written to the same address.		// words may be written to the same address.
bool IsReadOnlyPtr = false;		bool IsReadOnlyPtr = false;
if (Seen.insert(Ptr).second ||		if (Seen.insert(Ptr).second ||
!getPtrStride(*PSE, Ptr, TheLoop, SymbolicStrides)) {		!getPtrStride(*PSE, Ptr, TheLoop, SymbolicStrides)) {
++NumReads;		++NumReads;
IsReadOnlyPtr = true;		IsReadOnlyPtr = true;
}		}

		// See if there is an unsafe dependency between a load to a uniform address and
		// store to the same uniform address.
		if (UniformStores.count(Ptr)) {
		LLVM_DEBUG(dbgs() << "LAA: Found an unsafe dependency between a uniform "
		"load and uniform store to the same address!\n");
		HasDependenceInvolvingLoopInvariantAddress = true;
		}

MemoryLocation Loc = MemoryLocation::get(LD);		MemoryLocation Loc = MemoryLocation::get(LD);
// The TBAA metadata could have a control dependency on the predication		// The TBAA metadata could have a control dependency on the predication
// condition, so we cannot rely on it when determining whether or not we		// condition, so we cannot rely on it when determining whether or not we
// need runtime pointer checks.		// need runtime pointer checks.
if (blockNeedsPredication(LD->getParent(), TheLoop, DT))		if (blockNeedsPredication(LD->getParent(), TheLoop, DT))
Loc.AATags.TBAA = nullptr;		Loc.AATags.TBAA = nullptr;

Accesses.addLoad(Loc, IsReadOnlyPtr);		Accesses.addLoad(Loc, IsReadOnlyPtr);
▲ Show 20 Lines • Show All 342 Lines • ▼ Show 20 Lines

LoopAccessInfo::LoopAccessInfo(Loop *L, ScalarEvolution *SE,		LoopAccessInfo::LoopAccessInfo(Loop *L, ScalarEvolution *SE,
const TargetLibraryInfo *TLI, AliasAnalysis *AA,		const TargetLibraryInfo *TLI, AliasAnalysis *AA,
DominatorTree *DT, LoopInfo *LI)		DominatorTree *DT, LoopInfo *LI)
: PSE(llvm::make_unique<PredicatedScalarEvolution>(SE, L)),		: PSE(llvm::make_unique<PredicatedScalarEvolution>(SE, L)),
PtrRtChecking(llvm::make_unique<RuntimePointerChecking>(SE)),		PtrRtChecking(llvm::make_unique<RuntimePointerChecking>(SE)),
DepChecker(llvm::make_unique<MemoryDepChecker>(*PSE, L)), TheLoop(L),		DepChecker(llvm::make_unique<MemoryDepChecker>(*PSE, L)), TheLoop(L),
NumLoads(0), NumStores(0), MaxSafeDepDistBytes(-1), CanVecMem(false),		NumLoads(0), NumStores(0), MaxSafeDepDistBytes(-1), CanVecMem(false),
HasMultipleStoresToLoopInvariantAddress(false) {		HasDependenceInvolvingLoopInvariantAddress(false) {
if (canAnalyzeLoop())		if (canAnalyzeLoop())
analyzeLoop(AA, LI, TLI, DT);		analyzeLoop(AA, LI, TLI, DT);
}		}

void LoopAccessInfo::print(raw_ostream &OS, unsigned Depth) const {		void LoopAccessInfo::print(raw_ostream &OS, unsigned Depth) const {
if (CanVecMem) {		if (CanVecMem) {
OS.indent(Depth) << "Memory dependences are safe";		OS.indent(Depth) << "Memory dependences are safe";
if (MaxSafeDepDistBytes != -1ULL)		if (MaxSafeDepDistBytes != -1ULL)
Show All 15 Lines	if (auto *Dependences = DepChecker->getDependences()) {
}		}
} else		} else
OS.indent(Depth) << "Too many dependences, not recorded\n";		OS.indent(Depth) << "Too many dependences, not recorded\n";

// List the pair of accesses need run-time checks to prove independence.		// List the pair of accesses need run-time checks to prove independence.
PtrRtChecking->print(OS, Depth);		PtrRtChecking->print(OS, Depth);
OS << "\n";		OS << "\n";

OS.indent(Depth) << "Multiple stores to invariant address were "		OS.indent(Depth) << "Non vectorizable stores to invariant address were "
<< (HasMultipleStoresToLoopInvariantAddress ? "" : "not ")		<< (HasDependenceInvolvingLoopInvariantAddress ? "" : "not ")
<< "found in loop.\n";		<< "found in loop.\n";

OS.indent(Depth) << "SCEV assumptions:\n";		OS.indent(Depth) << "SCEV assumptions:\n";
PSE->getUnionPredicate().print(OS, Depth);		PSE->getUnionPredicate().print(OS, Depth);

OS << "\n";		OS << "\n";

OS.indent(Depth) << "Expressions re-written:\n";		OS.indent(Depth) << "Expressions re-written:\n";
▲ Show 20 Lines • Show All 68 Lines • Show Last 20 Lines

llvm/trunk/lib/Transforms/Vectorize/LoopVectorizationLegality.cpp

Show First 20 Lines • Show All 811 Lines • ▼ Show 20 Lines	if (LAR) {
ORE->emit([&]() {		ORE->emit([&]() {
return OptimizationRemarkAnalysis(Hints->vectorizeAnalysisPassName(),		return OptimizationRemarkAnalysis(Hints->vectorizeAnalysisPassName(),
"loop not vectorized: ", *LAR);		"loop not vectorized: ", *LAR);
});		});
}		}
if (!LAI->canVectorizeMemory())		if (!LAI->canVectorizeMemory())
return false;		return false;

if (LAI->hasMultipleStoresToLoopInvariantAddress()) {		if (LAI->hasDependenceInvolvingLoopInvariantAddress()) {
ORE->emit(createMissedAnalysis("CantVectorizeStoreToLoopInvariantAddress")		ORE->emit(createMissedAnalysis("CantVectorizeStoreToLoopInvariantAddress")
<< "multiple writes to a loop invariant address could not "		<< "write to a loop invariant address could not "
"be vectorized");		"be vectorized");
LLVM_DEBUG(		LLVM_DEBUG(
dbgs() << "LV: We don't allow multiple stores to a uniform address\n");		dbgs() << "LV: Non vectorizable stores to a uniform address\n");
return false;		return false;
}		}

Requirements->addRuntimePointerChecks(LAI->getNumRuntimePointerChecks());		Requirements->addRuntimePointerChecks(LAI->getNumRuntimePointerChecks());
PSE.addPredicate(LAI->getPSE().getUnionPredicate());		PSE.addPredicate(LAI->getPSE().getUnionPredicate());

return true;		return true;
}		}

bool LoopVectorizationLegality::isInductionPhi(const Value *V) {		bool LoopVectorizationLegality::isInductionPhi(const Value *V) {
Value *In0 = const_cast<Value *>(V);		Value *In0 = const_cast<Value *>(V);
▲ Show 20 Lines • Show All 356 Lines • Show Last 20 Lines

llvm/trunk/test/Analysis/LoopAccessAnalysis/memcheck-wrapping-pointers.ll

	Show All 33 Lines
	; CHECK-NEXT: %arrayidx4 = getelementptr inbounds i32, i32* %b, i64 %conv11			; CHECK-NEXT: %arrayidx4 = getelementptr inbounds i32, i32* %b, i64 %conv11
	; CHECK-NEXT: Grouped accesses:			; CHECK-NEXT: Grouped accesses:
	; CHECK-NEXT: Group			; CHECK-NEXT: Group
	; CHECK-NEXT: (Low: (4 + %a) High: (4 + (4 * (1 umax %x)) + %a))			; CHECK-NEXT: (Low: (4 + %a) High: (4 + (4 * (1 umax %x)) + %a))
	; CHECK-NEXT: Member: {(4 + %a),+,4}<%for.body>			; CHECK-NEXT: Member: {(4 + %a),+,4}<%for.body>
	; CHECK-NEXT: Group			; CHECK-NEXT: Group
	; CHECK-NEXT: (Low: %b High: ((4 * (1 umax %x)) + %b))			; CHECK-NEXT: (Low: %b High: ((4 * (1 umax %x)) + %b))
	; CHECK-NEXT: Member: {%b,+,4}<%for.body>			; CHECK-NEXT: Member: {%b,+,4}<%for.body>
	; CHECK: Multiple stores to invariant address were not found in loop.			; CHECK: Non vectorizable stores to invariant address were not found in loop.
	; CHECK-NEXT: SCEV assumptions:			; CHECK-NEXT: SCEV assumptions:
	; CHECK-NEXT: {1,+,1}<%for.body> Added Flags: <nusw>			; CHECK-NEXT: {1,+,1}<%for.body> Added Flags: <nusw>
	; CHECK-NEXT: {0,+,1}<%for.body> Added Flags: <nusw>			; CHECK-NEXT: {0,+,1}<%for.body> Added Flags: <nusw>
	; CHECK: Expressions re-written:			; CHECK: Expressions re-written:
	; CHECK-NEXT: [PSE] %arrayidx = getelementptr inbounds i32, i32* %a, i64 %idxprom:			; CHECK-NEXT: [PSE] %arrayidx = getelementptr inbounds i32, i32* %a, i64 %idxprom:
	; CHECK-NEXT: ((4 * (zext i32 {1,+,1}<%for.body> to i64))<nuw><nsw> + %a)<nsw>			; CHECK-NEXT: ((4 * (zext i32 {1,+,1}<%for.body> to i64))<nuw><nsw> + %a)<nsw>
	; CHECK-NEXT: --> {(4 + %a),+,4}<%for.body>			; CHECK-NEXT: --> {(4 + %a),+,4}<%for.body>
	; CHECK-NEXT: [PSE] %arrayidx4 = getelementptr inbounds i32, i32* %b, i64 %conv11:			; CHECK-NEXT: [PSE] %arrayidx4 = getelementptr inbounds i32, i32* %b, i64 %conv11:
	▲ Show 20 Lines • Show All 57 Lines • Show Last 20 Lines

llvm/trunk/test/Analysis/LoopAccessAnalysis/store-to-invariant-check1.ll

	; RUN: opt < %s -loop-accesses -analyze | FileCheck -check-prefix=OLDPM %s			; RUN: opt < %s -loop-accesses -analyze | FileCheck -check-prefix=OLDPM %s
	; RUN: opt -passes='require<scalar-evolution>,require<aa>,loop(print-access-info)' -disable-output < %s 2>&1 | FileCheck -check-prefix=NEWPM %s			; RUN: opt -passes='require<scalar-evolution>,require<aa>,loop(print-access-info)' -disable-output < %s 2>&1 | FileCheck -check-prefix=NEWPM %s

	; Test to confirm LAA will find multiple stores to an invariant address in the			; Test to confirm LAA will find multiple stores to an invariant address in the
	; inner loop.			; inner loop.
	;			;
	; for(; i < itr; i++) {			; for(; i < itr; i++) {
	; for(; j < itr; j++) {			; for(; j < itr; j++) {
	; var1[i] = var2[j] + var1[i];			; var1[i] = var2[j] + var1[i];
	; var1[i]++;			; var1[i]++;
	; }			; }
	; }			; }

	; The LAA with the new PM is a loop pass so we go from inner to outer loops.			; The LAA with the new PM is a loop pass so we go from inner to outer loops.

	; OLDPM: for.cond1.preheader:			; OLDPM: for.cond1.preheader:
	; OLDPM: Multiple stores to invariant address were not found in loop.			; OLDPM: Non vectorizable stores to invariant address were not found in loop.
	; OLDPM: for.body3:			; OLDPM: for.body3:
	; OLDPM: Multiple stores to invariant address were found in loop.			; OLDPM: Non vectorizable stores to invariant address were found in loop.

	; NEWPM: for.body3:			; NEWPM: for.body3:
	; NEWPM: Multiple stores to invariant address were found in loop.			; NEWPM: Non vectorizable stores to invariant address were found in loop.
	; NEWPM: for.cond1.preheader:			; NEWPM: for.cond1.preheader:
	; NEWPM: Multiple stores to invariant address were not found in loop.			; NEWPM: Non vectorizable stores to invariant address were not found in loop.

	define i32 @foo(i32* nocapture %var1, i32* nocapture readonly %var2, i32 %itr) #0 {			define i32 @foo(i32* nocapture %var1, i32* nocapture readonly %var2, i32 %itr) #0 {
	entry:			entry:
	%cmp20 = icmp eq i32 %itr, 0			%cmp20 = icmp eq i32 %itr, 0
	br i1 %cmp20, label %for.end10, label %for.cond1.preheader			br i1 %cmp20, label %for.end10, label %for.cond1.preheader

	for.cond1.preheader: ; preds = %entry, %for.inc8			for.cond1.preheader: ; preds = %entry, %for.inc8
	%indvars.iv23 = phi i64 [ %indvars.iv.next24, %for.inc8 ], [ 0, %entry ]			%indvars.iv23 = phi i64 [ %indvars.iv.next24, %for.inc8 ], [ 0, %entry ]
	Show All 35 Lines

llvm/trunk/test/Analysis/LoopAccessAnalysis/store-to-invariant-check2.ll

	; RUN: opt < %s -loop-accesses -analyze | FileCheck %s			; RUN: opt < %s -loop-accesses -analyze | FileCheck %s
	; RUN: opt -passes='require<scalar-evolution>,require<aa>,loop(print-access-info)' -disable-output < %s 2>&1 | FileCheck %s			; RUN: opt -passes='require<scalar-evolution>,require<aa>,loop(print-access-info)' -disable-output < %s 2>&1 | FileCheck %s

	; Test to confirm LAA will not find store to invariant address.			; Test to confirm LAA will not find store to invariant address.
	; Inner loop has no store to invariant address.			; Inner loop has no store to invariant address.
	;			;
	; for(; i < itr; i++) {			; for(; i < itr; i++) {
	; for(; j < itr; j++) {			; for(; j < itr; j++) {
	; var2[j] = var2[j] + var1[i];			; var2[j] = var2[j] + var1[i];
	; }			; }
	; }			; }

	; CHECK: Multiple stores to invariant address were not found in loop.			; CHECK: Non vectorizable stores to invariant address were not found in loop.
	; CHECK-NOT: Multiple stores to invariant address were found in loop.			; CHECK-NOT: Non vectorizable stores to invariant address were found in loop.


	define i32 @foo(i32* nocapture readonly %var1, i32* nocapture %var2, i32 %itr) #0 {			define i32 @foo(i32* nocapture readonly %var1, i32* nocapture %var2, i32 %itr) #0 {
	entry:			entry:
	%cmp20 = icmp eq i32 %itr, 0			%cmp20 = icmp eq i32 %itr, 0
	br i1 %cmp20, label %for.end10, label %for.cond1.preheader			br i1 %cmp20, label %for.end10, label %for.cond1.preheader

	for.cond1.preheader: ; preds = %entry, %for.inc8			for.cond1.preheader: ; preds = %entry, %for.inc8
	Show All 33 Lines

llvm/trunk/test/Analysis/LoopAccessAnalysis/store-to-invariant-check3.ll

	; RUN: opt < %s -loop-accesses -analyze | FileCheck %s			; RUN: opt < %s -loop-accesses -analyze | FileCheck %s
	; RUN: opt -passes='require<scalar-evolution>,require<aa>,loop(print-access-info)' -disable-output < %s 2>&1 | FileCheck %s			; RUN: opt -passes='require<scalar-evolution>,require<aa>,loop(print-access-info)' -disable-output < %s 2>&1 | FileCheck %s

	; Inner loop has a store to invariant address, but LAA does not need to identify			; Inner loop has a store to invariant address, but LAA does not need to identify
	; the store to invariant address, since it is a single store.			; the store to invariant address, since it is a single store.
	;			;
	; for(; i < itr; i++) {			; for(; i < itr; i++) {
	; for(; j < itr; j++) {			; for(; j < itr; j++) {
	; var1[j] = ++var2[i] + var1[j];			; var1[j] = ++var2[i] + var1[j];
	; }			; }
	; }			; }

	; CHECK: Multiple stores to invariant address were not found in loop.			; CHECK: Non vectorizable stores to invariant address were not found in loop.

	define void @foo(i32* nocapture %var1, i32* nocapture %var2, i32 %itr) #0 {			define void @foo(i32* nocapture %var1, i32* nocapture %var2, i32 %itr) #0 {
	entry:			entry:
	%cmp20 = icmp sgt i32 %itr, 0			%cmp20 = icmp sgt i32 %itr, 0
	br i1 %cmp20, label %for.cond1.preheader, label %for.end11			br i1 %cmp20, label %for.cond1.preheader, label %for.end11

	for.cond1.preheader: ; preds = %entry, %for.inc9			for.cond1.preheader: ; preds = %entry, %for.inc9
	%indvars.iv23 = phi i64 [ %indvars.iv.next24, %for.inc9 ], [ 0, %entry ]			%indvars.iv23 = phi i64 [ %indvars.iv.next24, %for.inc9 ], [ 0, %entry ]
	Show All 33 Lines

llvm/trunk/test/Transforms/LoopVectorize/invariant-store-vectorization.ll

Show First 20 Lines • Show All 545 Lines • ▼ Show 20 Lines	for.inc8: ; preds = %for.body3, %for.cond1.preheader
%indvars.iv.next24 = add nuw nsw i64 %indvars.iv23, 1		%indvars.iv.next24 = add nuw nsw i64 %indvars.iv23, 1
%lftr.wideiv25 = trunc i64 %indvars.iv.next24 to i32		%lftr.wideiv25 = trunc i64 %indvars.iv.next24 to i32
%exitcond26 = icmp eq i32 %lftr.wideiv25, %itr		%exitcond26 = icmp eq i32 %lftr.wideiv25, %itr
br i1 %exitcond26, label %for.end10, label %for.cond1.preheader		br i1 %exitcond26, label %for.end10, label %for.cond1.preheader

for.end10: ; preds = %for.inc8, %entry		for.end10: ; preds = %for.inc8, %entry
ret i32 undef		ret i32 undef
}		}

		; cannot vectorize loop with unsafe dependency between uniform load (%tmp10) and store
		; (%tmp12) to the same address
		; PR39653
		; Note: %tmp10 could be replaced by phi(%arg4, %tmp12), a potentially vectorizable
		; 1st-order-recurrence
		define void @unsafe_dep_uniform_load_store(i32 %arg, i32 %arg1, i64 %arg2, i16* %arg3, i32 %arg4, i64 %arg5) {
		; CHECK-LABEL: unsafe_dep_uniform_load_store
		; CHECK-NOT: <4 x i32>
		bb:
		%tmp = alloca i32
		store i32 %arg4, i32* %tmp
		%tmp6 = getelementptr inbounds i16, i16* %arg3, i64 %arg5
		br label %bb7

		bb7:
		%tmp8 = phi i64 [ 0, %bb ], [ %tmp24, %bb7 ]
		%tmp9 = phi i32 [ %arg1, %bb ], [ %tmp23, %bb7 ]
		%tmp10 = load i32, i32* %tmp
		%tmp11 = mul nsw i32 %tmp9, %tmp10
		%tmp12 = srem i32 %tmp11, 65536
		%tmp13 = add nsw i32 %tmp12, %tmp9
		%tmp14 = trunc i32 %tmp13 to i16
		%tmp15 = trunc i64 %tmp8 to i32
		%tmp16 = add i32 %arg, %tmp15
		%tmp17 = zext i32 %tmp16 to i64
		%tmp18 = getelementptr inbounds i16, i16* %tmp6, i64 %tmp17
		store i16 %tmp14, i16* %tmp18, align 2
		%tmp19 = add i32 %tmp13, %tmp9
		%tmp20 = trunc i32 %tmp19 to i16
		%tmp21 = and i16 %tmp20, 255
		%tmp22 = getelementptr inbounds i16, i16* %arg3, i64 %tmp17
		store i16 %tmp21, i16* %tmp22, align 2
		%tmp23 = add nsw i32 %tmp9, 1
		%tmp24 = add nuw nsw i64 %tmp8, 1
		%tmp25 = icmp eq i64 %tmp24, %arg2
		store i32 %tmp12, i32* %tmp
		br i1 %tmp25, label %bb26, label %bb7

		bb26:
		ret void
		}
