Currently, the vectorizer uses a hard-coded scale (1/2) to adjust the cost of predicated instructions. Since the actual probability that a predicated instruction executes may vary anywhere from 0 to 1, the predicted cost may be very far from reality. This patch brings BFI into the cost calculation for predicated instructions.
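For illustration, here is a minimal C++ sketch of the idea, assuming a hypothetical helper (this is not the patch itself): the cost of an instruction in a predicated block is scaled by the block's frequency relative to the loop header, falling back to the existing hard-coded 1/2 when no BFI is available.

```cpp
#include "llvm/Analysis/BlockFrequencyInfo.h"
#include "llvm/IR/BasicBlock.h"
#include "llvm/Support/InstructionCost.h"

using namespace llvm;

// Hypothetical helper (not the actual patch): scale the cost of an
// instruction sitting in a predicated block by the probability that the
// block executes on a given iteration, derived from BFI.
static InstructionCost scalePredicatedCost(InstructionCost Cost,
                                           const BasicBlock *PredBB,
                                           const BasicBlock *HeaderBB,
                                           const BlockFrequencyInfo *BFI) {
  // Without block frequencies, keep the existing hard-coded 1/2 scale
  // (what getReciprocalPredBlockProb() encodes today).
  if (!BFI)
    return Cost / 2;
  uint64_t PredFreq = BFI->getBlockFreq(PredBB).getFrequency();
  uint64_t HeaderFreq = BFI->getBlockFreq(HeaderBB).getFrequency();
  if (HeaderFreq == 0 || PredFreq >= HeaderFreq)
    return Cost; // Degenerate or always-executed case: no discount.
  // PredFreq / HeaderFreq approximates the probability of executing the
  // predicated block on one loop iteration.
  return Cost * int64_t(PredFreq) / int64_t(HeaderFreq);
}
```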
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
Hello. This sounds like a nice idea. I sometimes worry about practice not matching theory, but I hope it should be an improvement for the most part. Do you have any benchmark data to back it up?
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp |
---|---
5596 | Could invalid cost handling be pushed into this function? It would help to not need it at all the call sites.
10586 | Is BFI intended to be required now? Some of the other code makes it look like it is still optional.
The other thing I thought of was about scalar costs rounding down towards 0, as they are all stored as integers. I have no data to suggest that any one scheme is better or worse than another, but it might make sense to round to nearest or round up. I'm just imagining costs rounding to 0 in too many cases when the block frequencies are quite different.
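To make the rounding concern concrete, here is a tiny standalone example with made-up numbers (a cost of 3 and a 1/5 block probability):

```cpp
#include <cstdint>
#include <cstdio>

// Made-up numbers: a scalar cost of 3 scaled by a 1/5 block probability.
// Plain integer division floors the result to 0, while round-to-nearest
// and round-up both keep a cost of 1.
int main() {
  uint64_t Cost = 3, PredFreq = 1, HeaderFreq = 5;
  uint64_t Floor = Cost * PredFreq / HeaderFreq;                      // 0
  uint64_t Nearest = (Cost * PredFreq + HeaderFreq / 2) / HeaderFreq; // 1
  uint64_t Ceil = (Cost * PredFreq + HeaderFreq - 1) / HeaderFreq;    // 1
  std::printf("floor=%llu nearest=%llu ceil=%llu\n",
              (unsigned long long)Floor, (unsigned long long)Nearest,
              (unsigned long long)Ceil);
  return 0;
}
```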
Do we have a test with profile info? It might be worth having dedicated tests for this feature with a range of branch probabilities, if possible.
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp |
---|---
4538 | This needs updating now, I think.
5596 | Yeah, I think that would help readability.
5602 | Could you explain the reasoning here? Maybe fall back to the original code using getReciprocalPredBlockProb here?
> Do you have any benchmark data to back it up?

I have a motivating example which shows an almost 2x degradation due to vectorization. I will add it as a lit test.
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp |
---|---
4538 | I just removed the second sentence. I think everything is clear without the extra details.
5596 | Fixed.
5602 | Good idea. Fixed as suggested.
10586 |
> Is BFI intended to be required now?

Yes. In fact, it should look like:

> Some of the other code makes it look like it is still optional.

Right. Before this change there was exactly one use of BFI, guarded by PSI. I would suggest not touching this place (at least for now), because if a compile-time regression shows up we may want to request BFI conditionally, for predicated loops only.
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp |
---|---
10586 | This will definitely regress compile time. You are only allowed to query BFI in PGO builds, which implies the presence of PSI.
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp |
---|---
10586 | But that depends on BPI availability from previous passes and on whether the following passes need BPI, right?
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp |
---|---
10586 | Not sure what you mean here. BPI is a dependency of BFI, so it will be calculated if not already available?
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp |
---|---
10586 | I meant BFI (not BPI). Sorry for the confusion.
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp |
---|---
10586 | I still don't really get what you mean here. If you use getResult (rather than getCachedResult), BFI will be fetched regardless of whether it is available from previous passes. The important part is that you shouldn't fetch BFI if PSI is not available. In test cases you need to request PSI explicitly; in the pass pipeline it will be available automatically for PGO builds.
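For context, this is roughly the PSI-guarded fetch pattern being described. The wrapper function below is hypothetical, but the individual analysis queries are the standard new-PM ones used in LoopVectorize.cpp:

```cpp
#include "llvm/Analysis/BlockFrequencyInfo.h"
#include "llvm/Analysis/ProfileSummaryInfo.h"
#include "llvm/IR/Function.h"
#include "llvm/IR/PassManager.h"

using namespace llvm;

// Hypothetical wrapper illustrating the guarded fetch: only compute BFI
// when a profile summary is present (i.e. PGO builds); otherwise return
// nullptr so callers fall back to the static heuristic.
static BlockFrequencyInfo *getBFIIfProfiled(Function &F,
                                            FunctionAnalysisManager &AM) {
  auto &MAMProxy = AM.getResult<ModuleAnalysisManagerFunctionProxy>(F);
  ProfileSummaryInfo *PSI =
      MAMProxy.getCachedResult<ProfileSummaryAnalysis>(*F.getParent());
  if (PSI && PSI->hasProfileSummary())
    return &AM.getResult<BlockFrequencyAnalysis>(F);
  return nullptr;
}
```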
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp |
---|---
10586 |
But if it is available from previous passes there will be no compile-time impact.

I'm trying to show that the statement that this will definitely regress compile time is not necessarily true and that it depends on several factors, including the particular pass ordering. Namely, getResult<BlockFrequencyAnalysis> is free (regardless of PGO/PSI) if BFI is available from previous passes. Moreover, if one of the passes following LV requests BFI and the loop is not vectorized (because vectorization invalidates BFI), then there is no extra overhead either. Thus there is a pretty high chance that there will be no impact at all in practice. If there is, it would be possible to significantly reduce it by requesting BFI lazily... but I would really prefer not to go this way without a strong need.

I'm not sure why you say "you shouldn't fetch BFI if PSI is not available". There are many cases where we do that. Why can't the vectorizer do the same (is it because of the potential compile-time increase)? I think using BFI is the best solution for this problem. Do you have any ideas on how to implement the functionality without BFI?
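To make the "free if already available" point concrete, here is an illustrative contrast of the two analysis-manager queries mentioned in this exchange (the wrapper function is made up):

```cpp
#include "llvm/Analysis/BlockFrequencyInfo.h"
#include "llvm/IR/PassManager.h"

using namespace llvm;

// Illustrative only:
//  - getResult computes BFI on demand if it is not already cached, so it is
//    "free" only when a previous pass has left BFI in the cache;
//  - getCachedResult never computes anything and returns nullptr otherwise.
static void queryBFI(Function &F, FunctionAnalysisManager &AM) {
  BlockFrequencyInfo &Computed = AM.getResult<BlockFrequencyAnalysis>(F);
  BlockFrequencyInfo *CachedOnly =
      AM.getCachedResult<BlockFrequencyAnalysis>(F);
  (void)Computed;
  (void)CachedOnly;
}
```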
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp |
---|---
10586 |
In non-PGO builds at least, BFI will not be available at this pipeline position, and the following passes will not request it. Anyway, I checked, and this does impact compile time: http://llvm-compile-time-tracker.com/compare.php?from=9d52f69afef64776b830bb9adc4e1737ff8fc426&to=778b1b4ab6cd7d0d56f5746fce61a6e63c9cf722&stat=instructions:u

> There are many cases where we do that.

No, we don't. To the best of my knowledge, the only place that makes use of BFI without PSI is the inliner. In all other passes BFI use is conditional on PSI. (Excluding passes that aren't part of the default pipeline, of course.)
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp |
---|---
10586 | Ok, I see the problem now. Since BFI is going to be used to drive cost model decisions, it is not an option to make it PSI-dependent. The only possible solution I see is to request BFI lazily, only when we get to cost modeling. What do you think?
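One possible shape for the lazy approach suggested above (purely a sketch; the callback type and struct are invented): hand the cost model a getter instead of a BFI pointer, so BFI is only computed if a predicated block actually needs to be costed.

```cpp
#include "llvm/Analysis/BlockFrequencyInfo.h"
#include "llvm/IR/BasicBlock.h"
#include <functional>

using namespace llvm;

// Invented sketch of lazily requesting BFI from the cost model: the getter
// is only invoked when a predicated block is actually costed, so loops
// without predication never trigger the BFI computation.
struct LazyPredBlockFreq {
  // For example, wraps a lambda that calls AM.getResult<BlockFrequencyAnalysis>(F).
  std::function<BlockFrequencyInfo &()> GetBFI;

  uint64_t getBlockFreq(const BasicBlock *BB) {
    return GetBFI().getBlockFreq(BB).getFrequency();
  }
};
```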