Diff 242089

llvm/docs/LangRef.rst


	Show First 20 Lines • Show All 5,483 Lines • ▼ Show 20 Lines
	the bit operand value is 1 vectorization is enabled. A value of 0 disables			the bit operand value is 1 vectorization is enabled. A value of 0 disables
	vectorization:			vectorization:

	.. code-block:: llvm			.. code-block:: llvm

	!0 = !{!"llvm.loop.vectorize.predicate.enable", i1 0}			!0 = !{!"llvm.loop.vectorize.predicate.enable", i1 0}
	!1 = !{!"llvm.loop.vectorize.predicate.enable", i1 1}			!1 = !{!"llvm.loop.vectorize.predicate.enable", i1 1}

				'``llvm.loop.vectorize.ivdep.enable``' Metadata
				Meinersbur (not done): [serious] Please also describe the relationship to `llvm.mem.parallel_loop_access`.
				YashasAndaluri (author, done): `llvm.loop.parallel_accesses` metadata indicates that no dependencies exist between instructions marked with `llvm.access.group` metadata and can be executed in parallel, whereas `llvm.loop.vectorize.ivdep.enable` indicates that Unknown dependencies are safe for vectorization.
				^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

				This metadata tells the vectorizer to ignore dependencies between memory
				accesses that the dependence analysis could not prove to be either safe
				or unsafe for vectorization. This differs from
				``llvm.loop.parallel_accesses``, which states that no loop-carried
				dependencies exist between memory accesses belonging to the listed
				access groups. The first operand is the string
				``llvm.loop.vectorize.ivdep.enable`` and the second operand is a bit. A
				value of 1 enables this behavior for the loop.

				.. code-block:: llvm

				!0 = !{!"llvm.loop.vectorize.ivdep.enable", i1 1}

	'``llvm.loop.vectorize.width``' Metadata			'``llvm.loop.vectorize.width``' Metadata
	^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^			^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

	This metadata sets the target width of the vectorizer. The first			This metadata sets the target width of the vectorizer. The first
	operand is the string ``llvm.loop.vectorize.width`` and the second			operand is the string ``llvm.loop.vectorize.width`` and the second
	operand is an integer specifying the width. For example:			operand is an integer specifying the width. For example:

	.. code-block:: llvm			.. code-block:: llvm
	▲ Show 20 Lines • Show All 12,409 Lines • Show Last 20 Lines
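
The semantics described in the new LangRef text can be modelled with a small standalone sketch. It is not LLVM code: `SafetyStatus`, `mergeStatus` and `classifyLoop` are hypothetical names chosen to echo the identifiers in this patch, and the sketch assumes that a dependence the analysis cannot classify is reported as `PossiblySafeWithRtChecks`, as the condition added to `areDepsSafe` suggests.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical three-state lattice; the ordering Safe < PossiblySafeWithRtChecks
// < Unsafe is an assumption made for this sketch.
enum class SafetyStatus { Safe = 0, PossiblySafeWithRtChecks = 1, Unsafe = 2 };

// Merging keeps the worst status seen so far.
SafetyStatus mergeStatus(SafetyStatus Cur, SafetyStatus New) {
  return std::max(Cur, New);
}

// Fold the per-pair results into one loop status. With IgnoreUnknown set
// (the ivdep hint), pairs classified as PossiblySafeWithRtChecks are skipped,
// while pairs proven Unsafe still block vectorization.
SafetyStatus classifyLoop(const std::vector<SafetyStatus> &PairResults,
                          bool IgnoreUnknown) {
  SafetyStatus Status = SafetyStatus::Safe;
  for (SafetyStatus S : PairResults) {
    if (IgnoreUnknown && S == SafetyStatus::PossiblySafeWithRtChecks)
      continue;
    Status = mergeStatus(Status, S);
  }
  return Status;
}
```

Under this model the hint only changes the outcome for unknown dependencies; a dependence proven unsafe is still reported as unsafe.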

llvm/include/llvm/Analysis/LoopAccessAnalysis.h

Show First 20 Lines • Show All 195 Lines • ▼ Show 20 Lines	void addAccess(LoadInst *LI) {
InstMap.push_back(LI);		InstMap.push_back(LI);
++AccessIdx;		++AccessIdx;
}		}

/// Check whether the dependencies between the accesses are safe.		/// Check whether the dependencies between the accesses are safe.
///		///
/// Only checks sets with elements in \p CheckDeps.		/// Only checks sets with elements in \p CheckDeps.
bool areDepsSafe(DepCandidates &AccessSets, MemAccessInfoList &CheckDeps,		bool areDepsSafe(DepCandidates &AccessSets, MemAccessInfoList &CheckDeps,
const ValueToValueMap &Strides);		const ValueToValueMap &Strides, bool UnknownDepHint);
		Meinersbur (not done): [style] Start variables with a capital letter. Also, do not use two different styles on the same line. See https://www.llvm.org/docs/CodingStandards.html#name-types-functions-variables-and…

/// No memory dependence was encountered that would inhibit		/// No memory dependence was encountered that would inhibit
/// vectorization.		/// vectorization.
bool isSafeForVectorization() const {		bool isSafeForVectorization() const {
return Status == VectorizationSafetyStatus::Safe;		return Status == VectorizationSafetyStatus::Safe;
}		}

/// The maximum number of bytes of a vector register we can vectorize		/// The maximum number of bytes of a vector register we can vectorize
▲ Show 20 Lines • Show All 298 Lines • ▼ Show 20 Lines
/// ScalarEvolution, we will generate run-time checks by emitting a		/// ScalarEvolution, we will generate run-time checks by emitting a
/// SCEVUnionPredicate.		/// SCEVUnionPredicate.
///		///
/// Checks for both memory dependences and the SCEV predicates contained in the		/// Checks for both memory dependences and the SCEV predicates contained in the
/// PSE must be emitted in order for the results of this analysis to be valid.		/// PSE must be emitted in order for the results of this analysis to be valid.
class LoopAccessInfo {		class LoopAccessInfo {
public:		public:
LoopAccessInfo(Loop *L, ScalarEvolution *SE, const TargetLibraryInfo *TLI,		LoopAccessInfo(Loop *L, ScalarEvolution *SE, const TargetLibraryInfo *TLI,
AliasAnalysis *AA, DominatorTree *DT, LoopInfo *LI);		AliasAnalysis *AA, DominatorTree *DT, LoopInfo *LI,
		bool UnknownDepHint = false);

/// Return true we can analyze the memory accesses in the loop and there are		/// Return true we can analyze the memory accesses in the loop and there are
/// no memory dependence cycles.		/// no memory dependence cycles.
bool canVectorizeMemory() const { return CanVecMem; }		bool canVectorizeMemory() const { return CanVecMem; }

/// Return true if there is a convergent operation in the loop. There may		/// Return true if there is a convergent operation in the loop. There may
/// still be reported runtime pointer checks that would be required, but it is		/// still be reported runtime pointer checks that would be required, but it is
/// not legal to insert them.		/// not legal to insert them.
▲ Show 20 Lines • Show All 75 Lines • ▼ Show 20 Lines	public:
/// should be re-written (and therefore simplified) according to PSE.		/// should be re-written (and therefore simplified) according to PSE.
/// A user of LoopAccessAnalysis will need to emit the runtime checks		/// A user of LoopAccessAnalysis will need to emit the runtime checks
/// associated with this predicate.		/// associated with this predicate.
const PredicatedScalarEvolution &getPSE() const { return *PSE; }		const PredicatedScalarEvolution &getPSE() const { return *PSE; }

private:		private:
/// Analyze the loop.		/// Analyze the loop.
void analyzeLoop(AliasAnalysis *AA, LoopInfo *LI,		void analyzeLoop(AliasAnalysis *AA, LoopInfo *LI,
const TargetLibraryInfo *TLI, DominatorTree *DT);		const TargetLibraryInfo *TLI, DominatorTree *DT,
		bool UnknownDepHint);

/// Check if the structure of the loop allows it to be analyzed by this		/// Check if the structure of the loop allows it to be analyzed by this
/// pass.		/// pass.
bool canAnalyzeLoop();		bool canAnalyzeLoop();

/// Save the analysis remark.		/// Save the analysis remark.
///		///
/// LAA does not directly emits the remarks. Instead it stores it which the		/// LAA does not directly emits the remarks. Instead it stores it which the
▲ Show 20 Lines • Show All 110 Lines • ▼ Show 20 Lines	public:

bool runOnFunction(Function &F) override;		bool runOnFunction(Function &F) override;

void getAnalysisUsage(AnalysisUsage &AU) const override;		void getAnalysisUsage(AnalysisUsage &AU) const override;

/// Query the result of the loop access information for the loop \p L.		/// Query the result of the loop access information for the loop \p L.
///		///
/// If there is no cached result available run the analysis.		/// If there is no cached result available run the analysis.
const LoopAccessInfo &getInfo(Loop *L);		const LoopAccessInfo &getInfo(Loop *L, bool UnknownDepHint = false);

void releaseMemory() override {		void releaseMemory() override {
// Invalidate the cache when the pass is freed.		// Invalidate the cache when the pass is freed.
LoopAccessInfoMap.clear();		LoopAccessInfoMap.clear();
}		}

/// Print the result of the analysis when invoked with -analyze.		/// Print the result of the analysis when invoked with -analyze.
void print(raw_ostream &OS, const Module *M = nullptr) const override;		void print(raw_ostream &OS, const Module *M = nullptr) const override;
▲ Show 20 Lines • Show All 44 Lines • Show Last 20 Lines

llvm/include/llvm/Transforms/Vectorize/LoopVectorizationLegality.h

Show All 38 Lines
/// and can, upon request, write them back as metadata on the loop. It will		/// and can, upon request, write them back as metadata on the loop. It will
/// initially scan the loop for existing metadata, and will update the local		/// initially scan the loop for existing metadata, and will update the local
/// values based on information in the loop.		/// values based on information in the loop.
/// We cannot write all values to metadata, as the mere presence of some info,		/// We cannot write all values to metadata, as the mere presence of some info,
/// for example 'force', means a decision has been made. So, we need to be		/// for example 'force', means a decision has been made. So, we need to be
/// careful NOT to add them if the user hasn't specifically asked so.		/// careful NOT to add them if the user hasn't specifically asked so.
class LoopVectorizeHints {		class LoopVectorizeHints {
enum HintKind { HK_WIDTH, HK_UNROLL, HK_FORCE, HK_ISVECTORIZED,		enum HintKind { HK_WIDTH, HK_UNROLL, HK_FORCE, HK_ISVECTORIZED,
HK_PREDICATE };		HK_PREDICATE, HK_IVDEP };

/// Hint - associates name and validation with the hint value.		/// Hint - associates name and validation with the hint value.
struct Hint {		struct Hint {
const char *Name;		const char *Name;
unsigned Value; // This may have to change for non-numeric values.		unsigned Value; // This may have to change for non-numeric values.
HintKind Kind;		HintKind Kind;

Hint(const char *Name, unsigned Value, HintKind Kind)		Hint(const char *Name, unsigned Value, HintKind Kind)
Show All 12 Lines	class LoopVectorizeHints {
Hint Force;		Hint Force;

/// Already Vectorized		/// Already Vectorized
Hint IsVectorized;		Hint IsVectorized;

/// Vector Predicate		/// Vector Predicate
Hint Predicate;		Hint Predicate;

		/// Ignore Vector dependencies
		Hint Ivdep;

/// Return the loop metadata prefix.		/// Return the loop metadata prefix.
static StringRef Prefix() { return "llvm.loop."; }		static StringRef Prefix() { return "llvm.loop."; }

/// True if there is any unsafe math in the loop.		/// True if there is any unsafe math in the loop.
bool PotentiallyUnsafe = false;		bool PotentiallyUnsafe = false;

public:		public:
enum ForceKind {		enum ForceKind {
Show All 13 Lines	public:

/// Dumps all the hint information.		/// Dumps all the hint information.
void emitRemarkWithHints() const;		void emitRemarkWithHints() const;

unsigned getWidth() const { return Width.Value; }		unsigned getWidth() const { return Width.Value; }
unsigned getInterleave() const { return Interleave.Value; }		unsigned getInterleave() const { return Interleave.Value; }
unsigned getIsVectorized() const { return IsVectorized.Value; }		unsigned getIsVectorized() const { return IsVectorized.Value; }
unsigned getPredicate() const { return Predicate.Value; }		unsigned getPredicate() const { return Predicate.Value; }
		unsigned getIvdep() const { return Ivdep.Value; }
enum ForceKind getForce() const {		enum ForceKind getForce() const {
if ((ForceKind)Force.Value == FK_Undefined &&		if ((ForceKind)Force.Value == FK_Undefined &&
hasDisableAllTransformsHint(TheLoop))		hasDisableAllTransformsHint(TheLoop))
return FK_Disabled;		return FK_Disabled;
return (ForceKind)Force.Value;		return (ForceKind)Force.Value;
}		}

/// If hints are provided that force vectorization, use the AlwaysPrint		/// If hints are provided that force vectorization, use the AlwaysPrint
▲ Show 20 Lines • Show All 81 Lines • ▼ Show 20 Lines
/// etc. This code reflects the capabilities of InnerLoopVectorizer.		/// etc. This code reflects the capabilities of InnerLoopVectorizer.
/// This class is also used by InnerLoopVectorizer for identifying		/// This class is also used by InnerLoopVectorizer for identifying
/// induction variable and the different reduction variables.		/// induction variable and the different reduction variables.
class LoopVectorizationLegality {		class LoopVectorizationLegality {
public:		public:
LoopVectorizationLegality(		LoopVectorizationLegality(
Loop *L, PredicatedScalarEvolution &PSE, DominatorTree *DT,		Loop *L, PredicatedScalarEvolution &PSE, DominatorTree *DT,
TargetTransformInfo *TTI, TargetLibraryInfo *TLI, AliasAnalysis *AA,		TargetTransformInfo *TTI, TargetLibraryInfo *TLI, AliasAnalysis *AA,
Function *F, std::function<const LoopAccessInfo &(Loop &)> *GetLAA,		Function *F, std::function<const LoopAccessInfo &(Loop &, bool)> *GetLAA,
LoopInfo *LI, OptimizationRemarkEmitter *ORE,		LoopInfo *LI, OptimizationRemarkEmitter *ORE,
LoopVectorizationRequirements *R, LoopVectorizeHints *H, DemandedBits *DB,		LoopVectorizationRequirements *R, LoopVectorizeHints *H, DemandedBits *DB,
AssumptionCache *AC)		AssumptionCache *AC)
: TheLoop(L), LI(LI), PSE(PSE), TTI(TTI), TLI(TLI), DT(DT),		: TheLoop(L), LI(LI), PSE(PSE), TTI(TTI), TLI(TLI), DT(DT),
GetLAA(GetLAA), ORE(ORE), Requirements(R), Hints(H), DB(DB), AC(AC) {}		GetLAA(GetLAA), ORE(ORE), Requirements(R), Hints(H), DB(DB), AC(AC) {}

/// ReductionList contains the reduction descriptors for all		/// ReductionList contains the reduction descriptors for all
/// of the reductions that were found in the loop.		/// of the reductions that were found in the loop.
▲ Show 20 Lines • Show All 189 Lines • ▼ Show 20 Lines	private:

/// Target Library Info.		/// Target Library Info.
TargetLibraryInfo *TLI;		TargetLibraryInfo *TLI;

/// Dominator Tree.		/// Dominator Tree.
DominatorTree *DT;		DominatorTree *DT;

// LoopAccess analysis.		// LoopAccess analysis.
std::function<const LoopAccessInfo &(Loop &)> *GetLAA;		std::function<const LoopAccessInfo &(Loop &, bool)> *GetLAA;

// And the loop-accesses info corresponding to this loop. This pointer is		// And the loop-accesses info corresponding to this loop. This pointer is
// null until canVectorizeMemory sets it up.		// null until canVectorizeMemory sets it up.
const LoopAccessInfo *LAI = nullptr;		const LoopAccessInfo *LAI = nullptr;

/// Interface to emit optimization remarks.		/// Interface to emit optimization remarks.
OptimizationRemarkEmitter *ORE;		OptimizationRemarkEmitter *ORE;

▲ Show 20 Lines • Show All 59 Lines • Show Last 20 Lines

llvm/include/llvm/Transforms/Vectorize/LoopVectorize.h

Show First 20 Lines • Show All 132 Lines • ▼ Show 20 Lines	struct LoopVectorizePass : public PassInfoMixin<LoopVectorizePass> {
LoopInfo *LI;		LoopInfo *LI;
TargetTransformInfo *TTI;		TargetTransformInfo *TTI;
DominatorTree *DT;		DominatorTree *DT;
BlockFrequencyInfo *BFI;		BlockFrequencyInfo *BFI;
TargetLibraryInfo *TLI;		TargetLibraryInfo *TLI;
DemandedBits *DB;		DemandedBits *DB;
AliasAnalysis *AA;		AliasAnalysis *AA;
AssumptionCache *AC;		AssumptionCache *AC;
std::function<const LoopAccessInfo &(Loop &)> *GetLAA;		std::function<const LoopAccessInfo &(Loop &, bool)> *GetLAA;
OptimizationRemarkEmitter *ORE;		OptimizationRemarkEmitter *ORE;
ProfileSummaryInfo *PSI;		ProfileSummaryInfo *PSI;

PreservedAnalyses run(Function &F, FunctionAnalysisManager &AM);		PreservedAnalyses run(Function &F, FunctionAnalysisManager &AM);

// Shim for old PM.		// Shim for old PM.
bool runImpl(Function &F, ScalarEvolution &SE_, LoopInfo &LI_,		bool runImpl(Function &F, ScalarEvolution &SE_, LoopInfo &LI_,
TargetTransformInfo &TTI_, DominatorTree &DT_,		TargetTransformInfo &TTI_, DominatorTree &DT_,
BlockFrequencyInfo &BFI_, TargetLibraryInfo *TLI_,		BlockFrequencyInfo &BFI_, TargetLibraryInfo *TLI_,
DemandedBits &DB_, AliasAnalysis &AA_, AssumptionCache &AC_,		DemandedBits &DB_, AliasAnalysis &AA_, AssumptionCache &AC_,
std::function<const LoopAccessInfo &(Loop &)> &GetLAA_,		std::function<const LoopAccessInfo &(Loop &, bool)> &GetLAA_,
OptimizationRemarkEmitter &ORE_, ProfileSummaryInfo *PSI_);		OptimizationRemarkEmitter &ORE_, ProfileSummaryInfo *PSI_);

bool processLoop(Loop *L);		bool processLoop(Loop *L);
};		};

/// Reports a vectorization failure: print \p DebugMsg for debugging		/// Reports a vectorization failure: print \p DebugMsg for debugging
/// purposes along with the corresponding optimization remark \p RemarkName.		/// purposes along with the corresponding optimization remark \p RemarkName.
/// If \p I is passed, it is an instruction that prevents vectorization.		/// If \p I is passed, it is an instruction that prevents vectorization.
/// Otherwise, the loop \p TheLoop is used for the location of the remark.		/// Otherwise, the loop \p TheLoop is used for the location of the remark.
void reportVectorizationFailure(const StringRef DebugMsg,		void reportVectorizationFailure(const StringRef DebugMsg,
const StringRef OREMsg, const StringRef ORETag,		const StringRef OREMsg, const StringRef ORETag,
OptimizationRemarkEmitter *ORE, Loop *TheLoop, Instruction *I = nullptr);		OptimizationRemarkEmitter *ORE, Loop *TheLoop, Instruction *I = nullptr);

} // end namespace llvm		} // end namespace llvm

#endif // LLVM_TRANSFORMS_VECTORIZE_LOOPVECTORIZE_H		#endif // LLVM_TRANSFORMS_VECTORIZE_LOOPVECTORIZE_H

llvm/lib/Analysis/LoopAccessAnalysis.cpp

Show First 20 Lines • Show All 1,627 Lines • ▼ Show 20 Lines	LLVM_DEBUG(dbgs() << "LAA: Positive distance " << Val.getSExtValue()
<< " with max VF = " << MaxVF << '\n');		<< " with max VF = " << MaxVF << '\n');
uint64_t MaxVFInBits = MaxVF * TypeByteSize * 8;		uint64_t MaxVFInBits = MaxVF * TypeByteSize * 8;
MaxSafeRegisterWidth = std::min(MaxSafeRegisterWidth, MaxVFInBits);		MaxSafeRegisterWidth = std::min(MaxSafeRegisterWidth, MaxVFInBits);
return Dependence::BackwardVectorizable;		return Dependence::BackwardVectorizable;
}		}

bool MemoryDepChecker::areDepsSafe(DepCandidates &AccessSets,		bool MemoryDepChecker::areDepsSafe(DepCandidates &AccessSets,
MemAccessInfoList &CheckDeps,		MemAccessInfoList &CheckDeps,
const ValueToValueMap &Strides) {		const ValueToValueMap &Strides,
		bool UnknownDepHint) {

MaxSafeDepDistBytes = -1;		MaxSafeDepDistBytes = -1;
SmallPtrSet<MemAccessInfo, 8> Visited;		SmallPtrSet<MemAccessInfo, 8> Visited;
		Status = VectorizationSafetyStatus::Safe;
for (MemAccessInfo CurAccess : CheckDeps) {		for (MemAccessInfo CurAccess : CheckDeps) {
if (Visited.count(CurAccess))		if (Visited.count(CurAccess))
continue;		continue;

// Get the relevant memory access set.		// Get the relevant memory access set.
EquivalenceClasses<MemAccessInfo>::iterator I =		EquivalenceClasses<MemAccessInfo>::iterator I =
AccessSets.findValue(AccessSets.getLeaderValue(CurAccess));		AccessSets.findValue(AccessSets.getLeaderValue(CurAccess));

Show All 25 Lines	while (AI != AE) {
auto B = std::make_pair(&OI, I2);		auto B = std::make_pair(&OI, I2);

assert(I1 != I2);		assert(I1 != I2);
if (I1 > I2)		if (I1 > I2)
std::swap(A, B);		std::swap(A, B);

Dependence::DepType Type =		Dependence::DepType Type =
isDependent(A.first, A.second, B.first, B.second, Strides);		isDependent(A.first, A.second, B.first, B.second, Strides);
		// Update safety status depending on whether the Dependence type
		// is safe. If Unknown Dependence type is to be considered safe,
		// do not update safety status.
		if (!UnknownDepHint ||
		!(Dependence::isSafeForVectorization(Type) ==
		VectorizationSafetyStatus::PossiblySafeWithRtChecks))
mergeInStatus(Dependence::isSafeForVectorization(Type));		mergeInStatus(Dependence::isSafeForVectorization(Type));

// Gather dependences unless we accumulated MaxDependences		// Gather dependences unless we accumulated MaxDependences
// dependences. In that case return as soon as we find the first		// dependences. In that case return as soon as we find the first
// unsafe dependence. This puts a limit on this quadratic		// unsafe dependence. This puts a limit on this quadratic
// algorithm.		// algorithm.
if (RecordDependences) {		if (RecordDependences) {
if (Type != Dependence::NoDep)		if (Type != Dependence::NoDep)
Dependences.push_back(Dependence(A.second, B.second, Type));		Dependences.push_back(Dependence(A.second, B.second, Type));
▲ Show 20 Lines • Show All 93 Lines • ▼ Show 20 Lines	if (ExitCount == PSE->getSE()->getCouldNotCompute()) {
return false;		return false;
}		}

return true;		return true;
}		}

void LoopAccessInfo::analyzeLoop(AliasAnalysis *AA, LoopInfo *LI,		void LoopAccessInfo::analyzeLoop(AliasAnalysis *AA, LoopInfo *LI,
const TargetLibraryInfo *TLI,		const TargetLibraryInfo *TLI,
DominatorTree *DT) {		DominatorTree *DT,
		bool UnknownDepHint) {
typedef SmallPtrSet<Value*, 16> ValueSet;		typedef SmallPtrSet<Value*, 16> ValueSet;

// Holds the Load and Store instructions.		// Holds the Load and Store instructions.
SmallVector<LoadInst *, 16> Loads;		SmallVector<LoadInst *, 16> Loads;
SmallVector<StoreInst *, 16> Stores;		SmallVector<StoreInst *, 16> Stores;

// Holds all the different accesses in the loop.		// Holds all the different accesses in the loop.
unsigned NumReads = 0;		unsigned NumReads = 0;
▲ Show 20 Lines • Show All 217 Lines • ▼ Show 20 Lines	void LoopAccessInfo::analyzeLoop(AliasAnalysis AA, LoopInfo LI,

LLVM_DEBUG(		LLVM_DEBUG(
dbgs() << "LAA: May be able to perform a memory runtime check if needed.\n");		dbgs() << "LAA: May be able to perform a memory runtime check if needed.\n");

CanVecMem = true;		CanVecMem = true;
if (Accesses.isDependencyCheckNeeded()) {		if (Accesses.isDependencyCheckNeeded()) {
LLVM_DEBUG(dbgs() << "LAA: Checking memory dependencies\n");		LLVM_DEBUG(dbgs() << "LAA: Checking memory dependencies\n");
CanVecMem = DepChecker->areDepsSafe(		CanVecMem = DepChecker->areDepsSafe(
DependentAccesses, Accesses.getDependenciesToCheck(), SymbolicStrides);		DependentAccesses, Accesses.getDependenciesToCheck(), SymbolicStrides,
		UnknownDepHint);
MaxSafeDepDistBytes = DepChecker->getMaxSafeDepDistBytes();		MaxSafeDepDistBytes = DepChecker->getMaxSafeDepDistBytes();

		Meinersbur (not done): [serious] `areDepsSafe` is an expensive call which we should not do redundantly when only one of the results is actually used. [suggestion] Use only `CanVecMem` and update it according to ivdep.
if (!CanVecMem && DepChecker->shouldRetryWithRuntimeCheck()) {		if (!CanVecMem && DepChecker->shouldRetryWithRuntimeCheck()) {
LLVM_DEBUG(dbgs() << "LAA: Retrying with memory checks\n");		LLVM_DEBUG(dbgs() << "LAA: Retrying with memory checks\n");

// Clear the dependency checks. We assume they are not needed.		// Clear the dependency checks. We assume they are not needed.
Accesses.resetDepChecks(*DepChecker);		Accesses.resetDepChecks(*DepChecker);

PtrRtChecking->reset();		PtrRtChecking->reset();
PtrRtChecking->Need = true;		PtrRtChecking->Need = true;
▲ Show 20 Lines • Show All 302 Lines • ▼ Show 20 Lines	void LoopAccessInfo::collectStridedAccess(Value *MemAccess) {
LLVM_DEBUG(dbgs() << "LAA: Found a strided access that we can version.");		LLVM_DEBUG(dbgs() << "LAA: Found a strided access that we can version.");

SymbolicStrides[Ptr] = Stride;		SymbolicStrides[Ptr] = Stride;
StrideSet.insert(Stride);		StrideSet.insert(Stride);
}		}

LoopAccessInfo::LoopAccessInfo(Loop *L, ScalarEvolution *SE,		LoopAccessInfo::LoopAccessInfo(Loop *L, ScalarEvolution *SE,
const TargetLibraryInfo *TLI, AliasAnalysis *AA,		const TargetLibraryInfo *TLI, AliasAnalysis *AA,
DominatorTree *DT, LoopInfo *LI)		DominatorTree *DT, LoopInfo *LI,
		bool UnknownDepHint)
: PSE(std::make_unique<PredicatedScalarEvolution>(SE, L)),		: PSE(std::make_unique<PredicatedScalarEvolution>(SE, L)),
PtrRtChecking(std::make_unique<RuntimePointerChecking>(SE)),		PtrRtChecking(std::make_unique<RuntimePointerChecking>(SE)),
DepChecker(std::make_unique<MemoryDepChecker>(*PSE, L)), TheLoop(L),		DepChecker(std::make_unique<MemoryDepChecker>(*PSE, L)), TheLoop(L),
NumLoads(0), NumStores(0), MaxSafeDepDistBytes(-1), CanVecMem(false),		NumLoads(0), NumStores(0), MaxSafeDepDistBytes(-1), CanVecMem(false),
HasConvergentOp(false),		HasConvergentOp(false),
HasDependenceInvolvingLoopInvariantAddress(false) {		HasDependenceInvolvingLoopInvariantAddress(false) {
if (canAnalyzeLoop())		if (canAnalyzeLoop())
analyzeLoop(AA, LI, TLI, DT);		analyzeLoop(AA, LI, TLI, DT, UnknownDepHint);
}		}

void LoopAccessInfo::print(raw_ostream &OS, unsigned Depth) const {		void LoopAccessInfo::print(raw_ostream &OS, unsigned Depth) const {
if (CanVecMem) {		if (CanVecMem) {
OS.indent(Depth) << "Memory dependences are safe";		OS.indent(Depth) << "Memory dependences are safe";
if (MaxSafeDepDistBytes != -1ULL)		if (MaxSafeDepDistBytes != -1ULL)
OS << " with a maximum dependence distance of " << MaxSafeDepDistBytes		OS << " with a maximum dependence distance of " << MaxSafeDepDistBytes
<< " bytes";		<< " bytes";
Show All 29 Lines	void LoopAccessInfo::print(raw_ostream &OS, unsigned Depth) const {
PSE->getUnionPredicate().print(OS, Depth);		PSE->getUnionPredicate().print(OS, Depth);

OS << "\n";		OS << "\n";

OS.indent(Depth) << "Expressions re-written:\n";		OS.indent(Depth) << "Expressions re-written:\n";
PSE->print(OS, Depth);		PSE->print(OS, Depth);
}		}

const LoopAccessInfo &LoopAccessLegacyAnalysis::getInfo(Loop *L) {		const LoopAccessInfo &LoopAccessLegacyAnalysis::getInfo(Loop *L,
		bool UnknownDepHint) {
auto &LAI = LoopAccessInfoMap[L];		auto &LAI = LoopAccessInfoMap[L];

if (!LAI)		if (!LAI)
LAI = std::make_unique<LoopAccessInfo>(L, SE, TLI, AA, DT, LI);		LAI = std::make_unique<LoopAccessInfo>(L, SE, TLI, AA, DT, LI,
		UnknownDepHint);

return *LAI.get();		return *LAI.get();
}		}

void LoopAccessLegacyAnalysis::print(raw_ostream &OS, const Module *M) const {		void LoopAccessLegacyAnalysis::print(raw_ostream &OS, const Module *M) const {
LoopAccessLegacyAnalysis &LAA = *const_cast<LoopAccessLegacyAnalysis *>(this);		LoopAccessLegacyAnalysis &LAA = *const_cast<LoopAccessLegacyAnalysis *>(this);

for (Loop *TopLevelLoop : *LI)		for (Loop *TopLevelLoop : *LI)
▲ Show 20 Lines • Show All 52 Lines • Show Last 20 Lines
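
The changed `getInfo` above builds a result only when none is cached for the loop, so the hint takes effect only on the first query. The following toy model illustrates that behaviour. `ToyInfo` and `ToyCache` are hypothetical stand-ins, not LLVM classes, and an integer id replaces the `Loop *` key. Whether this matters in practice depends on the order of callers, which this diff does not show.

```cpp
#include <map>
#include <memory>

// Hypothetical stand-in for an analysis result that records the hint
// it was computed with.
struct ToyInfo {
  bool UnknownDepHint;
};

// Hypothetical cache keyed only by loop id, like a map keyed by Loop *.
class ToyCache {
  std::map<int, std::unique_ptr<ToyInfo>> Map;

public:
  const ToyInfo &getInfo(int LoopId, bool UnknownDepHint = false) {
    auto &Entry = Map[LoopId];
    // The hint is consulted only when no result is cached yet.
    if (!Entry)
      Entry = std::make_unique<ToyInfo>(ToyInfo{UnknownDepHint});
    return *Entry;
  }
};
```

In this model, a query without the hint followed by a query with the hint for the same loop returns the result computed without it. Keying the cache on the hint as well, or folding the hint into the consumer instead of the analysis, would avoid that.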

llvm/lib/Transforms/Vectorize/LoopVectorizationLegality.cpp

Show First 20 Lines • Show All 55 Lines • ▼ Show 20 Lines	case HK_WIDTH:
return isPowerOf2_32(Val) && Val <= VectorizerParams::MaxVectorWidth;		return isPowerOf2_32(Val) && Val <= VectorizerParams::MaxVectorWidth;
case HK_UNROLL:		case HK_UNROLL:
return isPowerOf2_32(Val) && Val <= MaxInterleaveFactor;		return isPowerOf2_32(Val) && Val <= MaxInterleaveFactor;
case HK_FORCE:		case HK_FORCE:
return (Val <= 1);		return (Val <= 1);
case HK_ISVECTORIZED:		case HK_ISVECTORIZED:
case HK_PREDICATE:		case HK_PREDICATE:
return (Val == 0 || Val == 1);		return (Val == 0 || Val == 1);
		case HK_IVDEP:
		return (Val == 1);
}		}
return false;		return false;
}		}

LoopVectorizeHints::LoopVectorizeHints(const Loop *L,		LoopVectorizeHints::LoopVectorizeHints(const Loop *L,
bool InterleaveOnlyWhenForced,		bool InterleaveOnlyWhenForced,
OptimizationRemarkEmitter &ORE)		OptimizationRemarkEmitter &ORE)
: Width("vectorize.width", VectorizerParams::VectorizationFactor, HK_WIDTH),		: Width("vectorize.width", VectorizerParams::VectorizationFactor, HK_WIDTH),
Interleave("interleave.count", InterleaveOnlyWhenForced, HK_UNROLL),		Interleave("interleave.count", InterleaveOnlyWhenForced, HK_UNROLL),
Force("vectorize.enable", FK_Undefined, HK_FORCE),		Force("vectorize.enable", FK_Undefined, HK_FORCE),
IsVectorized("isvectorized", 0, HK_ISVECTORIZED),		IsVectorized("isvectorized", 0, HK_ISVECTORIZED),
Predicate("vectorize.predicate.enable", 0, HK_PREDICATE), TheLoop(L),		Predicate("vectorize.predicate.enable", 0, HK_PREDICATE),
		Ivdep("vectorize.ivdep.enable", 0, HK_IVDEP), TheLoop(L),
ORE(ORE) {		ORE(ORE) {
// Populate values with existing loop metadata.		// Populate values with existing loop metadata.
getHintsFromMetadata();		getHintsFromMetadata();

// force-vector-interleave overrides DisableInterleaving.		// force-vector-interleave overrides DisableInterleaving.
if (VectorizerParams::isInterleaveForced())		if (VectorizerParams::isInterleaveForced())
Interleave.Value = VectorizerParams::VectorizationInterleave;		Interleave.Value = VectorizerParams::VectorizationInterleave;

▲ Show 20 Lines • Show All 135 Lines • ▼ Show 20 Lines	if (!Name.startswith(Prefix()))
return;		return;
Name = Name.substr(Prefix().size(), StringRef::npos);		Name = Name.substr(Prefix().size(), StringRef::npos);

const ConstantInt *C = mdconst::dyn_extract<ConstantInt>(Arg);		const ConstantInt *C = mdconst::dyn_extract<ConstantInt>(Arg);
if (!C)		if (!C)
return;		return;
unsigned Val = C->getZExtValue();		unsigned Val = C->getZExtValue();

Hint *Hints[] = {&Width, &Interleave, &Force, &IsVectorized, &Predicate};		Hint *Hints[] = {&Width, &Interleave, &Force, &IsVectorized, &Predicate,
		&Ivdep};
for (auto H : Hints) {		for (auto H : Hints) {
if (Name == H->Name) {		if (Name == H->Name) {
if (H->validate(Val))		if (H->validate(Val))
H->Value = Val;		H->Value = Val;
else		else
LLVM_DEBUG(dbgs() << "LV: ignoring invalid hint '" << Name << "'\n");		LLVM_DEBUG(dbgs() << "LV: ignoring invalid hint '" << Name << "'\n");
break;		break;
}		}
▲ Show 20 Lines • Show All 584 Lines • ▼ Show 20 Lines	bool LoopVectorizationLegality::canVectorizeInstrs() {
// will create another.		// will create another.
if (PrimaryInduction && WidestIndTy != PrimaryInduction->getType())		if (PrimaryInduction && WidestIndTy != PrimaryInduction->getType())
PrimaryInduction = nullptr;		PrimaryInduction = nullptr;

return true;		return true;
}		}

bool LoopVectorizationLegality::canVectorizeMemory() {		bool LoopVectorizationLegality::canVectorizeMemory() {
LAI = &(*GetLAA)(*TheLoop);		LAI = &(*GetLAA)(*TheLoop, Hints->getIvdep());
const OptimizationRemarkAnalysis *LAR = LAI->getReport();		const OptimizationRemarkAnalysis *LAR = LAI->getReport();
if (LAR) {		if (LAR) {
ORE->emit([&]() {		ORE->emit([&]() {
return OptimizationRemarkAnalysis(Hints->vectorizeAnalysisPassName(),		return OptimizationRemarkAnalysis(Hints->vectorizeAnalysisPassName(),
"loop not vectorized: ", *LAR);		"loop not vectorized: ", *LAR);
});		});
}		}
if (!LAI->canVectorizeMemory())		if (!LAI->canVectorizeMemory())
▲ Show 20 Lines • Show All 405 Lines • Show Last 20 Lines
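
The validation rule added for `HK_IVDEP` accepts only the value 1, unlike `HK_PREDICATE`, which accepts 0 or 1. A minimal sketch of the consequence follows. `ToyHint` and its `set` helper are hypothetical simplifications of the `Hint` struct and of the loop in the metadata parser, reduced to the two hint kinds relevant here.

```cpp
#include <string>

enum HintKind { HK_PREDICATE, HK_IVDEP };

// Hypothetical, reduced version of the Hint struct from the patch.
struct ToyHint {
  std::string Name;
  unsigned Value;
  HintKind Kind;

  // Mirrors the two validation cases shown in the diff.
  bool validate(unsigned Val) const {
    switch (Kind) {
    case HK_PREDICATE:
      return Val == 0 || Val == 1;
    case HK_IVDEP:
      return Val == 1;
    }
    return false;
  }

  // Stores the value only if it validates; returns whether it was accepted.
  bool set(unsigned Val) {
    if (!validate(Val))
      return false;
    Value = Val;
    return true;
  }
};
```

With this rule, metadata carrying `i1 0` for the ivdep hint is treated as an invalid hint and ignored, and the value stays at its default of 0. The observable result matches "disabled", but the parser would log it as invalid rather than accept it.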

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp


Show First 20 Lines • Show All 1,604 Lines • ▼ Show 20 Lines	bool runOnFunction(Function &F) override {
auto *TLI = TLIP ? &TLIP->getTLI(F) : nullptr;		auto *TLI = TLIP ? &TLIP->getTLI(F) : nullptr;
auto *AA = &getAnalysis<AAResultsWrapperPass>().getAAResults();		auto *AA = &getAnalysis<AAResultsWrapperPass>().getAAResults();
auto *AC = &getAnalysis<AssumptionCacheTracker>().getAssumptionCache(F);		auto *AC = &getAnalysis<AssumptionCacheTracker>().getAssumptionCache(F);
auto *LAA = &getAnalysis<LoopAccessLegacyAnalysis>();		auto *LAA = &getAnalysis<LoopAccessLegacyAnalysis>();
auto *DB = &getAnalysis<DemandedBitsWrapperPass>().getDemandedBits();		auto *DB = &getAnalysis<DemandedBitsWrapperPass>().getDemandedBits();
auto *ORE = &getAnalysis<OptimizationRemarkEmitterWrapperPass>().getORE();		auto *ORE = &getAnalysis<OptimizationRemarkEmitterWrapperPass>().getORE();
auto *PSI = &getAnalysis<ProfileSummaryInfoWrapperPass>().getPSI();		auto *PSI = &getAnalysis<ProfileSummaryInfoWrapperPass>().getPSI();

std::function<const LoopAccessInfo &(Loop &)> GetLAA =		std::function<const LoopAccessInfo &(Loop &, bool)> GetLAA =
[&](Loop &L) -> const LoopAccessInfo & { return LAA->getInfo(&L); };		[&](Loop &L, bool UnknownDepHint) -> const LoopAccessInfo &
		{ return LAA->
		getInfo(&L, UnknownDepHint); };

return Impl.runImpl(F, SE, LI, TTI, DT, BFI, TLI, DB, AA, AC,		return Impl.runImpl(F, SE, LI, TTI, DT, BFI, TLI, DB, AA, AC,
GetLAA, *ORE, PSI);		GetLAA, *ORE, PSI);
}		}

void getAnalysisUsage(AnalysisUsage &AU) const override {		void getAnalysisUsage(AnalysisUsage &AU) const override {
AU.addRequired<AssumptionCacheTracker>();		AU.addRequired<AssumptionCacheTracker>();
AU.addRequired<BlockFrequencyInfoWrapperPass>();		AU.addRequired<BlockFrequencyInfoWrapperPass>();
...
#endif /* NDEBUG */
LLVM_DEBUG(verifyFunction(*L->getHeader()->getParent()));		LLVM_DEBUG(verifyFunction(*L->getHeader()->getParent()));
return true;		return true;
}		}

bool LoopVectorizePass::runImpl(		bool LoopVectorizePass::runImpl(
Function &F, ScalarEvolution &SE_, LoopInfo &LI_, TargetTransformInfo &TTI_,		Function &F, ScalarEvolution &SE_, LoopInfo &LI_, TargetTransformInfo &TTI_,
DominatorTree &DT_, BlockFrequencyInfo &BFI_, TargetLibraryInfo *TLI_,		DominatorTree &DT_, BlockFrequencyInfo &BFI_, TargetLibraryInfo *TLI_,
DemandedBits &DB_, AliasAnalysis &AA_, AssumptionCache &AC_,		DemandedBits &DB_, AliasAnalysis &AA_, AssumptionCache &AC_,
std::function<const LoopAccessInfo &(Loop &)> &GetLAA_,		std::function<const LoopAccessInfo &(Loop &, bool)> &GetLAA_,
OptimizationRemarkEmitter &ORE_, ProfileSummaryInfo *PSI_) {		OptimizationRemarkEmitter &ORE_, ProfileSummaryInfo *PSI_) {
SE = &SE_;		SE = &SE_;
LI = &LI_;		LI = &LI_;
TTI = &TTI_;		TTI = &TTI_;
DT = &DT_;		DT = &DT_;
BFI = &BFI_;		BFI = &BFI_;
TLI = TLI_;		TLI = TLI_;
AA = &AA_;		AA = &AA_;
...
PreservedAnalyses LoopVectorizePass::run(Function &F,
auto &AC = AM.getResult<AssumptionAnalysis>(F);		auto &AC = AM.getResult<AssumptionAnalysis>(F);
auto &DB = AM.getResult<DemandedBitsAnalysis>(F);		auto &DB = AM.getResult<DemandedBitsAnalysis>(F);
auto &ORE = AM.getResult<OptimizationRemarkEmitterAnalysis>(F);		auto &ORE = AM.getResult<OptimizationRemarkEmitterAnalysis>(F);
MemorySSA *MSSA = EnableMSSALoopDependency		MemorySSA *MSSA = EnableMSSALoopDependency
? &AM.getResult<MemorySSAAnalysis>(F).getMSSA()		? &AM.getResult<MemorySSAAnalysis>(F).getMSSA()
: nullptr;		: nullptr;

auto &LAM = AM.getResult<LoopAnalysisManagerFunctionProxy>(F).getManager();		auto &LAM = AM.getResult<LoopAnalysisManagerFunctionProxy>(F).getManager();
std::function<const LoopAccessInfo &(Loop &)> GetLAA =		std::function<const LoopAccessInfo &(Loop &, bool)> GetLAA =
[&](Loop &L) -> const LoopAccessInfo & {		[&](Loop &L, bool UnknownDepHint) -> const LoopAccessInfo & {
LoopStandardAnalysisResults AR = {AA, AC, DT, LI, SE, TLI, TTI, MSSA};		LoopStandardAnalysisResults AR = {AA, AC, DT, LI, SE, TLI, TTI, MSSA};
return LAM.getResult<LoopAccessAnalysis>(L, AR);		return LAM.getResult<LoopAccessAnalysis>(L, AR);
};		};
const ModuleAnalysisManager &MAM =		const ModuleAnalysisManager &MAM =
AM.getResult<ModuleAnalysisManagerFunctionProxy>(F).getManager();		AM.getResult<ModuleAnalysisManagerFunctionProxy>(F).getManager();
ProfileSummaryInfo *PSI =		ProfileSummaryInfo *PSI =
MAM.getCachedResult<ProfileSummaryAnalysis>(*F.getParent());		MAM.getCachedResult<ProfileSummaryAnalysis>(*F.getParent());
bool Changed =		bool Changed =
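In the legacy pass manager hunk above, the widened `GetLAA` callback forwards `UnknownDepHint` to `getInfo`. In the new pass manager hunk, as shown in this excerpt, the lambda accepts `UnknownDepHint` but its body still calls `LAM.getResult<LoopAccessAnalysis>(L, AR)` without it. The sketch below only illustrates the forwarding pattern with the same `std::function` signature. `Loop`, `LoopAccessInfo`, `FakeLoopAccessAnalysis`, and `makeForwardingGetLAA` are hypothetical stand-ins, not the real LLVM classes.

```cpp
#include <cassert>
#include <functional>
#include <string>

// Hypothetical stand-ins; these are NOT llvm::Loop or llvm::LoopAccessInfo.
struct Loop {
  std::string Name;
};
struct LoopAccessInfo {
  bool IgnoredUnknownDeps;
};

// Hypothetical analysis whose getInfo() records the hint it was given.
class FakeLoopAccessAnalysis {
  LoopAccessInfo Cached{false};

public:
  const LoopAccessInfo &getInfo(Loop *L, bool UnknownDepHint) {
    assert(L && "loop must not be null");
    Cached.IgnoredUnknownDeps = UnknownDepHint;
    return Cached;
  }
};

// Same callback shape as the widened GetLAA in the diff.
using GetLAAFn = std::function<const LoopAccessInfo &(Loop &, bool)>;

// Builds a callback that passes the hint through instead of dropping it.
GetLAAFn makeForwardingGetLAA(FakeLoopAccessAnalysis &LAA) {
  return [&LAA](Loop &L, bool UnknownDepHint) -> const LoopAccessInfo & {
    return LAA.getInfo(&L, UnknownDepHint);
  };
}
```

A caller that holds a `GetLAAFn` can then request analysis with or without the hint, and the result reflects which one was asked for.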

llvm/test/Transforms/LoopVectorize/X86/ivdep-alias.ll

This file was added.

				; RUN: opt < %s -O3 -S | FileCheck %s
				; IR generated for a function containing the loop:
				; #pragma clang loop ivdep(enable)
				; for (int i=0; i<LEN_1D; i++)
				; a[b[i]]++
				; where LEN_1D is an integer constant.
				; The above is an unknown dependency as the vectorizer cannot determine if
				; vectorizing accesses to a[b[i]] will be safe or unsafe.
				; Check if loop has been vectorized when ivdep is present.
				target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
				target triple = "x86_64-unknown-linux-gnu"

				; Function Attrs: noinline nounwind uwtable
				define dso_local i32* @addLoops(i32* noalias %a, i32* noalias %b, i32 %LEN_1D) #0 {
				entry:
				%a.addr = alloca i32*, align 8
				%b.addr = alloca i32*, align 8
				%LEN_1D.addr = alloca i32, align 4
				%i = alloca i32, align 4
				store i32* %a, i32** %a.addr, align 8
				store i32* %b, i32** %b.addr, align 8
				store i32 %LEN_1D, i32* %LEN_1D.addr, align 4
				store i32 0, i32* %i, align 4
				br label %for.cond

				; CHECK: vector.ph:
				; CHECK-NEXT: %n.vec = and i64 %wide.trip.count, 4294967292
				for.cond: ; preds = %for.inc, %entry
				%0 = load i32, i32* %i, align 4
				%1 = load i32, i32* %LEN_1D.addr, align 4
				%cmp = icmp slt i32 %0, %1
				br i1 %cmp, label %for.body, label %for.end
				; CHECK: br label %vector.body

				; CHECK: vector.body:
				for.body: ; preds = %for.cond
				%2 = load i32, i32* %a.addr, align 8
				%3 = load i32, i32* %b.addr, align 8
				; CHECK: %wide.load = load <4 x i32>, <4 x i32>* %1, align 4
				; CHECK: %3 = extractelement <4 x i64> %2, i32 0
				%4 = load i32, i32* %i, align 4
				%idxprom = sext i32 %4 to i64
				%arrayidx = getelementptr inbounds i32, i32* %3, i64 %idxprom
				%5 = load i32, i32* %arrayidx, align 4
				; CHECK: %16 = insertelement <4 x i32> %15, i32 %12, i32 1
				; CHECK: %21 = extractelement <4 x i32> %19, i32 1
				%idxprom1 = sext i32 %5 to i64
				%arrayidx2 = getelementptr inbounds i32, i32* %2, i64 %idxprom1
				%6 = load i32, i32* %arrayidx2, align 4
				%inc = add nsw i32 %6, 1
				store i32 %inc, i32* %arrayidx2, align 4
				br label %for.inc
				; CHECK: %24 = icmp eq i64 %index.next, %n.vec
				; CHECK: br i1 %24, label %middle.block, label %vector.body, !llvm.loop !2

				for.inc: ; preds = %for.body
				%7 = load i32, i32* %i, align 4
				%inc3 = add nsw i32 %7, 1
				store i32 %inc3, i32* %i, align 4
				br label %for.cond, !llvm.loop !2

				for.end: ; preds = %for.cond
				%8 = load i32, i32* %a.addr, align 8
				ret i32* %8
				}

				attributes #0 = { noinline nounwind uwtable "correctly-rounded-divide-sqrt-fp-math"="false" "disable-tail-calls"="false" "frame-pointer"="all" "less-precise-fpmad"="false" "min-legal-vector-width"="0" "no-infs-fp-math"="false" "no-jump-tables"="false" "no-nans-fp-math"="false" "no-signed-zeros-fp-math"="false" "no-trapping-math"="false" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+cx8,+fxsr,+mmx,+sse,+sse2,+x87" "unsafe-fp-math"="false" "use-soft-float"="false" }

				!llvm.module.flags = !{!0}
				!llvm.ident = !{!1}

				!0 = !{i32 1, !"wchar_size", i32 4}
				!1 = !{!"clang version 10.0.0 (https://github.com/llvm/llvm-project 8a5bfbe6db2824642bf9a1d27a24c5b6132b244f)"}
				; CHECK: !2 = distinct !{!2, !3}
				; CHECK-NEXT: !3 = !{!"llvm.loop.isvectorized", i32 1}
				; CHECK-NEXT: !4 = distinct !{!4, !5, !3}
				; CHECK-NEXT: !5 = !{!"llvm.loop.unroll.runtime.disable"}
				!2 = distinct !{!2, !3, !4}
				!3 = !{!"llvm.loop.vectorize.ivdep.enable", i1 true}
				!4 = !{!"llvm.loop.vectorize.enable", i1 true}
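To see why `a[b[i]]++` counts as an unknown dependency, it may help to compare the scalar loop with a simplified model of a 4-wide vector iteration. The sketch below is an assumption-laden illustration: `modelVectorIncrement` gathers all lanes, adds one, and scatters, which is a simplification and not the IR LLVM emits. The function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Scalar reference for the loop in the test comment: a[b[i]]++.
std::vector<int> scalarIncrement(std::vector<int> a,
                                 const std::vector<int> &b) {
  for (std::size_t i = 0; i < b.size(); ++i)
    a[b[i]]++;
  return a;
}

// Hypothetical model of a VF-wide vector iteration: every lane is loaded
// and incremented before any lane is stored back.
std::vector<int> modelVectorIncrement(std::vector<int> a,
                                      const std::vector<int> &b,
                                      std::size_t VF = 4) {
  assert(VF > 0 && "vector width must be positive");
  std::size_t i = 0;
  for (; i + VF <= b.size(); i += VF) {
    std::vector<int> lanes(VF);
    for (std::size_t l = 0; l < VF; ++l)
      lanes[l] = a[b[i + l]] + 1; // gather and add
    for (std::size_t l = 0; l < VF; ++l)
      a[b[i + l]] = lanes[l]; // scatter
  }
  for (; i < b.size(); ++i) // scalar remainder
    a[b[i]]++;
  return a;
}
```

In this model the two versions agree when the indices in `b` are distinct within a vector iteration and disagree when an index repeats, so the hint amounts to the programmer asserting that such repeats do not matter for the loop.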

llvm/test/Transforms/LoopVectorize/X86/ivdep-novec.ll

This file was added.

				; RUN: opt < %s -O3 -S | FileCheck %s
				; IR generated for a function containing the loop:
				; #pragma clang loop ivdep(enable)
				; for (i = 1; i < n; i++)
				; A[i] = A[i] + A[i-1];
				; where n is an integer constant.
				; The above dependency can be determined by the vectorizer to be unsafe for
				; vectorization.
				; Should not vectorize even if ivdep is present.
				target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
				target triple = "x86_64-unknown-linux-gnu"

				; Function Attrs: noinline nounwind uwtable
				define dso_local i32* @calcDepArray(i32* %A, i32 %n) #0 {
				entry:
				%A.addr = alloca i32*, align 8
				%n.addr = alloca i32, align 4
				%i = alloca i32, align 4
				store i32* %A, i32** %A.addr, align 8
				store i32 %n, i32* %n.addr, align 4
				store i32 1, i32* %i, align 4
				br label %for.cond

				for.cond: ; preds = %for.inc, %entry
				%0 = load i32, i32* %i, align 4
				%1 = load i32, i32* %n.addr, align 4
				%cmp = icmp slt i32 %0, %1
				br i1 %cmp, label %for.body, label %for.end

				; CHECK: for.body:
				for.body: ; preds = %for.cond
				%2 = load i32, i32* %A.addr, align 8
				%3 = load i32, i32* %i, align 4
				%idxprom = sext i32 %3 to i64
				%arrayidx = getelementptr inbounds i32, i32* %2, i64 %idxprom
				%4 = load i32, i32* %arrayidx, align 4
				%5 = load i32, i32* %A.addr, align 8
				%6 = load i32, i32* %i, align 4
				%sub = sub nsw i32 %6, 1
				%idxprom1 = sext i32 %sub to i64
				%arrayidx2 = getelementptr inbounds i32, i32* %5, i64 %idxprom1
				%7 = load i32, i32* %arrayidx2, align 4
				%add = add nsw i32 %4, %7
				%8 = load i32, i32* %A.addr, align 8
				%9 = load i32, i32* %i, align 4
				%idxprom3 = sext i32 %9 to i64
				%arrayidx4 = getelementptr inbounds i32, i32* %8, i64 %idxprom3
				store i32 %add, i32* %arrayidx4, align 4
				br label %for.inc

				for.inc: ; preds = %for.body
				%10 = load i32, i32* %i, align 4
				%inc = add nsw i32 %10, 1
				store i32 %inc, i32* %i, align 4
				br label %for.cond, !llvm.loop !2

				for.end: ; preds = %for.cond
				%11 = load i32, i32* %A.addr, align 8
				ret i32* %11
				}

				attributes #0 = { noinline nounwind uwtable "correctly-rounded-divide-sqrt-fp-math"="false" "disable-tail-calls"="false" "frame-pointer"="all" "less-precise-fpmad"="false" "min-legal-vector-width"="0" "no-infs-fp-math"="false" "no-jump-tables"="false" "no-nans-fp-math"="false" "no-signed-zeros-fp-math"="false" "no-trapping-math"="false" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+cx8,+fxsr,+mmx,+sse,+sse2,+x87" "unsafe-fp-math"="false" "use-soft-float"="false" }

				!llvm.module.flags = !{!0}
				!llvm.ident = !{!1}

				!0 = !{i32 1, !"wchar_size", i32 4}
				!1 = !{!"clang version 10.0.0 (https://github.com/llvm/llvm-project 8a5bfbe6db2824642bf9a1d27a24c5b6132b244f)"}
				; CHECK: !2 = distinct !{!2, !3, !4}
				; CHECK-NEXT: !3 = !{!"llvm.loop.vectorize.ivdep.enable", i1 true}
				; CHECK-NEXT: !4 = !{!"llvm.loop.vectorize.enable", i1 true}
				!2 = distinct !{!2, !3, !4}
				!3 = !{!"llvm.loop.vectorize.ivdep.enable", i1 true}
				!4 = !{!"llvm.loop.vectorize.enable", i1 true}
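The loop in this test carries a dependence of distance 1: each iteration reads the value the previous iteration wrote. A small model, sketched below under the assumption that a vector iteration performs all of its loads before any of its stores, shows why the loop must stay scalar even when the hint is present. The function names are hypothetical and the model is not what LLVM would generate.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Scalar reference: A[i] = A[i] + A[i-1] for i in [1, n).
std::vector<int> scalarRecurrence(std::vector<int> A) {
  for (std::size_t i = 1; i < A.size(); ++i)
    A[i] = A[i] + A[i - 1];
  return A;
}

// Hypothetical model of a VF-wide vector iteration: all lanes read
// A[i] and A[i-1] first, then all lanes are written back.
std::vector<int> modelVectorRecurrence(std::vector<int> A,
                                       std::size_t VF = 4) {
  assert(VF > 0 && "vector width must be positive");
  std::size_t i = 1;
  for (; i + VF <= A.size(); i += VF) {
    std::vector<int> sum(VF);
    for (std::size_t l = 0; l < VF; ++l)
      sum[l] = A[i + l] + A[i + l - 1]; // reads stale A[i-1] for lanes > 0
    for (std::size_t l = 0; l < VF; ++l)
      A[i + l] = sum[l];
  }
  for (; i < A.size(); ++i) // scalar remainder
    A[i] = A[i] + A[i - 1];
  return A;
}
```

With five ones as input, the scalar loop yields a running sum while the model does not, which matches the test's expectation that the loop is left unvectorized.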

llvm/test/Transforms/LoopVectorize/X86/ivdep-unkbounds.ll

This file was added.

				; RUN: opt < %s -O3 -S | FileCheck %s
				; IR generated for a function containing the loop:
				; #pragma clang loop ivdep(enable)
				; for (i = 0; i < 64; i++)
				; A[i*i] *= 2;
				; In the above example, the vectorizer cannot determine if
				; array accesses are within array bounds and are safe for vectorization.
				; Vectorizer regards it as an unknown dependency.
				; Check if loop has been vectorized when ivdep is present.
				target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
				target triple = "x86_64-unknown-linux-gnu"

				; Function Attrs: noinline nounwind uwtable
				define dso_local i32* @doubleArrayElements(i32* %A) #0 {
				entry:
				%A.addr = alloca i32*, align 8
				%i = alloca i32, align 4
				store i32* %A, i32** %A.addr, align 8
				store i32 0, i32* %i, align 4
				br label %for.cond
				; CHECK: br label %vector.body

				for.cond: ; preds = %for.inc, %entry
				%0 = load i32, i32* %i, align 4
				%cmp = icmp slt i32 %0, 64
				br i1 %cmp, label %for.body, label %for.end

				; CHECK: vector.body:
				for.body: ; preds = %for.cond
				; CHECK: %vec.ind = phi <4 x i64> [ <i64 0, i64 1, i64 2, i64 3>, %entry ], [ %vec.ind.next, %vector.body ]
				; CHECK: %0 = mul <4 x i64> %vec.ind, %vec.ind
				; CHECK: %2 = extractelement <4 x i64> %1, i32 0
				%1 = load i32, i32* %A.addr, align 8
				%2 = load i32, i32* %i, align 4
				%3 = load i32, i32* %i, align 4
				%mul = mul nsw i32 %2, %3
				%idxprom = sext i32 %mul to i64
				%arrayidx = getelementptr inbounds i32, i32* %1, i64 %idxprom
				; CHECK: %15 = insertelement <4 x i32> %14, i32 %11, i32 1
				; CHECK: %21 = extractelement <4 x i32> %18, i32 2
				%4 = load i32, i32* %arrayidx, align 4
				%mul1 = mul nsw i32 %4, 2
				store i32 %mul1, i32* %arrayidx, align 4
				; CHECK: %vec.ind.next = add <4 x i64> %vec.ind, <i64 4, i64 4, i64 4, i64 4>
				br label %for.inc
				; CHECK: br i1 %23, label %for.end, label %vector.body, !llvm.loop !2

				for.inc: ; preds = %for.body
				%5 = load i32, i32* %i, align 4
				%inc = add nsw i32 %5, 1
				store i32 %inc, i32* %i, align 4
				br label %for.cond, !llvm.loop !2

				for.end: ; preds = %for.cond
				%6 = load i32, i32* %A.addr, align 8
				ret i32* %6
				}

				attributes #0 = { noinline nounwind uwtable "correctly-rounded-divide-sqrt-fp-math"="false" "disable-tail-calls"="false" "frame-pointer"="all" "less-precise-fpmad"="false" "min-legal-vector-width"="0" "no-infs-fp-math"="false" "no-jump-tables"="false" "no-nans-fp-math"="false" "no-signed-zeros-fp-math"="false" "no-trapping-math"="false" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+cx8,+fxsr,+mmx,+sse,+sse2,+x87" "unsafe-fp-math"="false" "use-soft-float"="false" }

				!llvm.module.flags = !{!0}
				!llvm.ident = !{!1}

				!0 = !{i32 1, !"wchar_size", i32 4}
				!1 = !{!"clang version 10.0.0 (https://github.com/llvm/llvm-project 8a5bfbe6db2824642bf9a1d27a24c5b6132b244f)"}
				; CHECK: !2 = distinct !{!2, !3}
				; CHECK-NEXT: !3 = !{!"llvm.loop.isvectorized", i32 1}
				!2 = distinct !{!2, !3, !4}
				!3 = !{!"llvm.loop.vectorize.ivdep.enable", i1 true}
				!4 = !{!"llvm.loop.vectorize.enable", i1 true}
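The IR in this test forms the subscript by multiplying `%i` by itself, so iteration `i` touches element `i*i`. That subscript is not an affine function of `i`, yet no two iterations in `[0, 64)` touch the same element. The sketch below simply checks that property by brute force. The function name is hypothetical, and the bound in the assertion assumes a 32-bit `int`.

```cpp
#include <cassert>
#include <set>

// Returns true if the subscripts i*i for 0 <= i < tripCount are pairwise
// distinct, i.e. no two iterations access the same array element.
bool squareSubscriptsAreDistinct(int tripCount) {
  // Assumes a 32-bit int: 46339 * 46339 still fits without overflow.
  assert(tripCount >= 0 && tripCount <= 46340 && "i*i must fit in int");
  std::set<int> seen;
  for (int i = 0; i < tripCount; ++i)
    if (!seen.insert(i * i).second)
      return false;
  return true;
}
```

For the trip count of 64 used in the test, the largest subscript is 63 * 63 = 3969, so the hint is only sound if the array has at least 3970 elements.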

llvm/test/Transforms/LoopVectorize/X86/ivdep-unkdep.ll

This file was added.

				; RUN: opt < %s -O3 -S | FileCheck %s
				; IR generated for a function containing the loop:
				; #pragma clang loop ivdep(enable)
				; for (i = 0; i < m; i++)
				; a[i] = a[i + k] * c;
				; where m, k, c are integer constants.
				; The above is an unknown dependency as the vectorizer cannot determine if
				; accesses are independent and a[i + k] is within
				; array bounds. It depends on value of k and dependence is not determined to
				; be safe or unsafe.
				; Check if the loop has been vectorized when ivdep is present.
				target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
				target triple = "x86_64-unknown-linux-gnu"

				; Function Attrs: noinline nounwind uwtable
				define dso_local i32 @calcArray(i32* %a, i32 %m, i32 %k, i32 %c) #0 {
				entry:
				%a.addr = alloca i32*, align 8
				%m.addr = alloca i32, align 4
				%k.addr = alloca i32, align 4
				%c.addr = alloca i32, align 4
				%i = alloca i32, align 4
				store i32* %a, i32** %a.addr, align 8
				store i32 %m, i32* %m.addr, align 4
				store i32 %k, i32* %k.addr, align 4
				store i32 %c, i32* %c.addr, align 4
				store i32 0, i32* %i, align 4
				br label %for.cond

				for.cond: ; preds = %for.inc, %entry
				%0 = load i32, i32* %i, align 4
				%1 = load i32, i32* %m.addr, align 4
				%cmp = icmp slt i32 %0, %1
				br i1 %cmp, label %for.body, label %for.end
				; CHECK: vector.ph
				; CHECK: %n.vec = and i64 %wide.trip.count, 4294967288
				; CHECK: %broadcast.splatinsert10 = insertelement <4 x i32> undef, i32 %c, i32 0
				; CHECK: %broadcast.splat11 = shufflevector <4 x i32> %broadcast.splatinsert10, <4 x i32> undef, <4 x i32> zeroinitializer
				; CHECK: br i1 %4, label %middle.block.unr-lcssa, label %vector.ph.new

				; CHECK: vector.ph.new:
				; CHECK: br label %vector.body

				for.body: ; preds = %for.cond
				%2 = load i32, i32* %a.addr, align 8
				%3 = load i32, i32* %i, align 4
				%4 = load i32, i32* %k.addr, align 4
				%add = add nsw i32 %3, %4
				%idxprom = sext i32 %add to i64
				%arrayidx = getelementptr inbounds i32, i32* %2, i64 %idxprom
				%5 = load i32, i32* %arrayidx, align 4
				%6 = load i32, i32* %c.addr, align 4
				; CHECK: %wide.load = load <4 x i32>, <4 x i32>* %7, align 4
				; CHECK: %10 = mul nsw <4 x i32> %wide.load, %broadcast.splat11
				; CHECK: store <4 x i32> %10, <4 x i32>* %13, align 4
				%mul = mul nsw i32 %5, %6
				%7 = load i32, i32* %a.addr, align 8
				%8 = load i32, i32* %i, align 4
				%idxprom1 = sext i32 %8 to i64
				%arrayidx2 = getelementptr inbounds i32, i32* %7, i64 %idxprom1
				store i32 %mul, i32* %arrayidx2, align 4
				br label %for.inc
				; CHECK: br i1 %niter.ncmp.1, label %middle.block.unr-lcssa, label %vector.body, !llvm.loop !2

				for.inc: ; preds = %for.body
				%9 = load i32, i32* %i, align 4
				%inc = add nsw i32 %9, 1
				store i32 %inc, i32* %i, align 4
				br label %for.cond, !llvm.loop !2

				for.end: ; preds = %for.cond
				ret i32 0
				}

				attributes #0 = { noinline nounwind uwtable "correctly-rounded-divide-sqrt-fp-math"="false" "disable-tail-calls"="false" "frame-pointer"="all" "less-precise-fpmad"="false" "min-legal-vector-width"="0" "no-infs-fp-math"="false" "no-jump-tables"="false" "no-nans-fp-math"="false" "no-signed-zeros-fp-math"="false" "no-trapping-math"="false" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+cx8,+fxsr,+mmx,+sse,+sse2,+x87" "unsafe-fp-math"="false" "use-soft-float"="false" }

				!llvm.module.flags = !{!0}
				!llvm.ident = !{!1}

				!0 = !{i32 1, !"wchar_size", i32 4}
				!1 = !{!"clang version 10.0.0 (https://github.com/llvm/llvm-project 8a5bfbe6db2824642bf9a1d27a24c5b6132b244f)"}
				; CHECK: !2 = distinct !{!2, !3}
				; CHECK-NEXT: !3 = !{!"llvm.loop.isvectorized", i32 1}
				; CHECK-NEXT: !4 = distinct !{!4, !5, !3}
				; CHECK-NEXT: !5 = !{!"llvm.loop.unroll.runtime.disable"}
				!2 = distinct !{!2, !3, !4}
				!3 = !{!"llvm.loop.vectorize.ivdep.enable", i1 true}
				!4 = !{!"llvm.loop.vectorize.enable", i1 true}
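Whether `a[i] = a[i + k] * c` can be vectorized safely depends on the run-time value of `k`, which is why the dependence is classified as unknown. The sketch below compares the scalar loop with a simplified 4-wide model (all loads before all stores) for several values of `k`. All function names are hypothetical, and the padding around the buffer exists only so that negative `k` stays in bounds in this illustration.

```cpp
#include <cassert>
#include <vector>

// Scalar reference: a[i] = a[i + k] * c for i in [0, m).
void scalarScale(int *a, int m, int k, int c) {
  for (int i = 0; i < m; ++i)
    a[i] = a[i + k] * c;
}

// Hypothetical model of a VF-wide vector iteration: load every lane,
// multiply, then store every lane.
void modelVectorScale(int *a, int m, int k, int c, int VF = 4) {
  assert(VF > 0 && "vector width must be positive");
  int i = 0;
  for (; i + VF <= m; i += VF) {
    std::vector<int> lanes(VF);
    for (int l = 0; l < VF; ++l)
      lanes[l] = a[i + l + k] * c;
    for (int l = 0; l < VF; ++l)
      a[i + l] = lanes[l];
  }
  for (; i < m; ++i) // scalar remainder
    a[i] = a[i + k] * c;
}

// Runs both versions on identical buffers of ones and compares the results.
bool sameResult(int m, int k, int c) {
  const int pad = (k < 0 ? -k : k); // keeps a[i + k] inside the buffer
  std::vector<int> s(m + 2 * pad, 1), v(m + 2 * pad, 1);
  scalarScale(s.data() + pad, m, k, c);
  modelVectorScale(v.data() + pad, m, k, c);
  return s == v;
}
```

In this model the results match for `k = 1` and `k = -4` but differ for `k = -1`, so the hint is only sound when the programmer knows `k` falls in a safe range.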

This is an archive of the discontinued LLVM Phabricator instance.

Ignore Unknown dependencies using vectorize.ivdep metadata
Needs ReviewPublic

Details

Diff Detail

Event Timeline

Revision Contents

Diff 242089

llvm/docs/LangRef.rst

llvm/include/llvm/Analysis/LoopAccessAnalysis.h

llvm/include/llvm/Transforms/Vectorize/LoopVectorizationLegality.h

llvm/include/llvm/Transforms/Vectorize/LoopVectorize.h

llvm/lib/Analysis/LoopAccessAnalysis.cpp

llvm/lib/Transforms/Vectorize/LoopVectorizationLegality.cpp

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp

llvm/test/Transforms/LoopVectorize/X86/ivdep-alias.ll

llvm/test/Transforms/LoopVectorize/X86/ivdep-novec.ll

llvm/test/Transforms/LoopVectorize/X86/ivdep-unkbounds.ll

llvm/test/Transforms/LoopVectorize/X86/ivdep-unkdep.ll
