This is an archive of the discontinued LLVM Phabricator instance.

Differential D132892

[LV] Explicitly lower uniform load as single load
Abandoned · Public

Authored by reames on Aug 29 2022, 3:11 PM.

Details

Reviewers

fhahn
david-arm
Ayal
gilr

Summary

This replaces the "scalarize then clone-as-uniform" lowering for loads of uniform values with a dedicated mode of memory widening for this case. This avoids needing to rely on instcombine/GVN to clean up redundant loads afterwards.

Note: I plan to do stores too. This patch includes some of the API plumbing on the store side, but doesn't actually use it yet. I thought the consistency in the API was worth a bit of temporarily no-op code.
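
As a rough illustration of the difference the summary describes, the sketch below counts how many scalar loads each lowering would emit for one uniform load before any cleanup pass runs. It is a toy model and not LLVM code: `LoweringKind` and `emittedLoads` are hypothetical names, and the assumption that the old path emits one load per unrolled part is a simplification inferred from the discussion in this review.

```cpp
#include <cassert>

// Toy model, not LLVM code: every name here is hypothetical.
enum class LoweringKind {
  ScalarizeThenCloneAsUniform, // assumed: one scalar load per unrolled part
  SingleUniformLoad            // dedicated uniform path: one load for all parts
};

// Number of scalar loads emitted for one uniform load, before any cleanup
// by instcombine/GVN. UF is the unroll (interleave) factor.
unsigned emittedLoads(LoweringKind Kind, unsigned UF) {
  assert(UF >= 1 && "expected at least one unrolled part");
  return Kind == LoweringKind::SingleUniformLoad ? 1 : UF;
}
```

Under these assumptions the two lowerings only differ once the loop is interleaved (UF greater than 1), which is where the redundant loads would otherwise be left for later passes to remove.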

Event Timeline

reames created this revision. Aug 29 2022, 3:11 PM

reames requested review of this revision. Aug 29 2022, 3:11 PM

Harbormaster completed remote builds in B184023: Diff 456471. Aug 29 2022, 4:38 PM

I think this looks like a nice, logical improvement. Having a new uniform widening decision seems sensible. I have a few mostly minor comments, but I was a little surprised by a couple of test changes.

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
4262	nit: White space change.
4712	I'm not sure what the "if legal" bit refers to here? If we're treating this as uniform then it's because Legal->isUniformMemOp has returned true. I think you could just say: // Only try to scalarize the uniform memop itself if we're not using the direct lowering ...
9746	Not sure what this comment means?
9783	I think we normally write this as Value *Addr = State.get(getAddr(), VPIteration(0, 0)); which is what we've done elsewhere.
9785	Looks like this can fit on the end of the line above?
9786	Not sure I understand this comment? Why does this involve a reverse shuffle?
llvm/lib/Transforms/Vectorize/VPlan.h
1774	nit: I think this should have a simple comment like the functions above.
llvm/test/Transforms/LoopVectorize/X86/cost-model-assert.ll
26	This looks a bit unexpected. Do you know why the minimum iteration check has changed here from 8 to 4? It sounds like we've made a different choice of VF and/or interleave count (IC). I can only assume that the cost of some instructions has changed?
llvm/test/Transforms/LoopVectorize/first-order-recurrence-sink-replicate-region.ll
476	Do you know why this has changed? As far as I understand it, 'CLONE' literally means clone the scalar instruction once, whereas REPLICATE means insert copies into each lane of the vector. So in a sense we're now doing more loads than before I think?

fhahn mentioned this in D133019: [VPlan] Only generate single instr for loads uniform across all parts. Aug 31 2022, 7:08 AM

> This replaces the "scalarize then clone-as-uniform" lowering for loads of uniform values with a dedicated mode of memory widening for this case. This avoids needing to rely on instcombine/GVN to clean up redundant loads afterwards.

IIUC the main issue is that VPReplicateRecipe::isUniform means 'uniform-per-part/uniform-per-VF', not 'uniform-across-all-parts'. Unfortunately the terminology is not really used very consistently across different parts of LV. We should be able to catch some cases through improving lowering for uniform VPReplicateRecipes directly: D133019.

If we want to use isUniformMemOp to determine the uniform-across-all-parts property, I think it would make sense to extend VPReplicateRecipe to distinguish between our two versions of uniformity. AFAIK the VPWidenXXX naming scheme is meant to indicate that the recipe will be widened (i.e. a wide vector instruction will be generated), but for uniform loads we only generate a single scalar instruction.
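
To make the two notions of uniformity concrete, here is a minimal sketch over a plain table of per-lane values. It is not VPlan code: `PartLanes`, `isUniformPerPart`, and `isUniformAcrossAllParts` are hypothetical names, and the table `Values[Part][Lane]` simply records the value each lane of each unrolled part would observe.

```cpp
#include <cassert>
#include <vector>

// Toy model: Values[Part][Lane] is the value seen by one lane of one part.
using PartLanes = std::vector<std::vector<int>>;

// Uniform-per-part: within each part, every lane sees the same value, but
// different parts may still disagree with each other.
bool isUniformPerPart(const PartLanes &Values) {
  for (const std::vector<int> &Part : Values)
    for (int Lane : Part)
      if (Lane != Part.front())
        return false;
  return true;
}

// Uniform-across-all-parts: every lane of every part sees the same value.
bool isUniformAcrossAllParts(const PartLanes &Values) {
  const int *First = nullptr;
  for (const std::vector<int> &Part : Values)
    for (const int &Lane : Part) {
      if (!First)
        First = &Lane;
      else if (Lane != *First)
        return false;
    }
  return true;
}
```

A table such as `{{1, 1}, {2, 2}}` satisfies the first predicate but not the second, which is the gap between the two properties that the comment above points out.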

In D132892#3761149, @fhahn wrote:

> > This replaces the "scalarize then clone-as-uniform" lowering for loads of uniform values with a dedicated mode of memory widening for this case. This avoids needing to rely on instcombine/GVN to clean up redundant loads afterwards.
>
> IIUC the main issue is that VPReplicateRecipe::isUniform means 'uniform-per-part/uniform-per-VF', not 'uniform-across-all-parts'. Unfortunately the terminology is not really used very consistently across different parts of LV. We should be able to catch some cases through improving lowering for uniform VPReplicateRecipes directly: D133019.

I went ahead and accepted that one. I don't really care which approach we use here. This one felt slightly cleaner to me when considering store handling, but a) we can revisit if desired when adding stores, and b) incremental progress is good.

> If we want to use isUniformMemOp to determine the uniform-across-all-parts property, I think it would make sense to extend VPReplicateRecipe to distinguish between our two versions of uniformity. AFAIK the VPWidenXXX naming scheme is meant to indicate that the recipe will be widened (i.e. a wide vector instruction will be generated), but for uniform loads we only generate a single scalar instruction.

The result of the operation is still the widened vector type. It just happens to come from a splat of a single load. That would be my argument for this approach over yours, but as I said, I really don't care.
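
The "splat of a single load" idea can be sketched outside LLVM as follows. This is only an analogy in plain C++: `loadAndSplat` is a hypothetical helper, `std::vector<int>` stands in for a vector register, and `VF` is the number of lanes.

```cpp
#include <cassert>
#include <vector>

// Toy model of "one scalar load, then broadcast": the memory access happens
// exactly once, yet the result still has one element per vector lane.
std::vector<int> loadAndSplat(const int *Addr, unsigned VF) {
  assert(Addr && "expected a valid address");
  const int Scalar = *Addr;            // the single scalar load
  return std::vector<int>(VF, Scalar); // splat to all VF lanes
}
```

The returned value has the widened shape even though only one load was performed, which is the property the argument above relies on.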

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
4712	We can't scalarize a scalable uniform store if the value being stored is not loop invariant, because we don't know which lane to store and the general scalarization support does not handle predicated scalable vectors.
9746	Stray change. This is the token I use for easy local search, should have been removed before posting.
9786	I copied the comment and forgot to update it. Will fix.
llvm/test/Transforms/LoopVectorize/X86/cost-model-assert.ll
26	Exactly. If I remember correctly, this is because uniform instructions are never considered possible to scalarize, and I didn't add the same check for CM_Uniform. I'd decided it didn't really matter since this test wasn't about codegen (from the name), and was instead a no-crash test.
llvm/test/Transforms/LoopVectorize/first-order-recurrence-sink-replicate-region.ll
476	This one is definitely the profitable-to-scalarize point explained just above. I dug into this one in some depth.
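
The legality rule from the reply on line 4712 can be written as a small standalone predicate. This is a sketch that follows the shape of the pre-patch `isLegalToScalarize` lambda in the diff below, not the LLVM implementation itself: the `UniformMemOp` struct and the boolean parameters are hypothetical stand-ins for the real instruction and `ElementCount` queries.

```cpp
#include <cassert>

// Hypothetical, simplified description of a uniform memory operation.
struct UniformMemOp {
  bool IsLoad;
  bool StoredValueIsLoopInvariant; // ignored for loads
};

// Returns true if the operation may be scalarized under the stated rule.
bool isLegalToScalarize(const UniformMemOp &Op, bool VFIsScalable) {
  // Fixed-length vectors can always be scalarized.
  if (!VFIsScalable)
    return true;
  // A uniform load is uniform within each part, which can be scalarized.
  if (Op.IsLoad)
    return true;
  // For a scalable uniform store, the stored value must not vary per lane,
  // because it is not known which lane's value would have to be stored.
  return Op.StoredValueIsLoopInvariant;
}
```

Only the last case can fail: a scalable-vector store of a loop-varying value to a uniform address.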

fhahn mentioned this in rG422cf99161ed: [VPlan] Only generate single instr for loads uniform across all parts. Sep 8 2022, 6:28 AM

Revision Contents

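Path	Size
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp	55 lines
llvm/lib/Transforms/Vectorize/VPlan.h	26 lines
llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp	5 lines
llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp	6 lines
llvm/test/Transforms/LoopVectorize/RISCV/uniform-load-store.ll	65 lines
llvm/test/Transforms/LoopVectorize/X86/cost-model-assert.ll	82 lines
llvm/test/Transforms/LoopVectorize/X86/uniform_mem_op.ll	178 lines
llvm/test/Transforms/LoopVectorize/first-order-recurrence-sink-replicate-region.ll	2 lines
llvm/test/Transforms/LoopVectorize/induction.ll	41 lines
llvm/test/Transforms/LoopVectorize/vplan-sink-scalars-and-merge.ll	2 lines
llvm/unittests/Transforms/Vectorize/VPlanTest.cpp	6 lines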

Diff 456471

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp

[... 1,292 lines not shown ...]	public:

/// Decision that was taken during cost calculation for memory instruction.		/// Decision that was taken during cost calculation for memory instruction.
enum InstWidening {		enum InstWidening {
CM_Unknown,		CM_Unknown,
CM_Widen, // For consecutive accesses with stride +1.		CM_Widen, // For consecutive accesses with stride +1.
CM_Widen_Reverse, // For consecutive accesses with stride -1.		CM_Widen_Reverse, // For consecutive accesses with stride -1.
CM_Interleave,		CM_Interleave,
CM_GatherScatter,		CM_GatherScatter,
CM_Scalarize		CM_Scalarize,
		CM_Uniform
};		};

/// Save vectorization decision \p W and \p Cost taken by the cost model for		/// Save vectorization decision \p W and \p Cost taken by the cost model for
/// instruction \p I and vector width \p VF.		/// instruction \p I and vector width \p VF.
void setWideningDecision(Instruction *I, ElementCount VF, InstWidening W,		void setWideningDecision(Instruction *I, ElementCount VF, InstWidening W,
InstructionCost Cost) {		InstructionCost Cost) {
assert(VF.isVector() && "Expected VF >=2");		assert(VF.isVector() && "Expected VF >=2");
WideningDecisions[std::make_pair(I, VF)] = std::make_pair(W, Cost);		WideningDecisions[std::make_pair(I, VF)] = std::make_pair(W, Cost);
[... 2,943 lines not shown ...]	void LoopVectorizationCostModel::collectLoopScalars(ElementCount VF) {
// store will remain scalar if the store is scalarized.		// store will remain scalar if the store is scalarized.
auto isScalarUse = [&](Instruction *MemAccess, Value *Ptr) {		auto isScalarUse = [&](Instruction *MemAccess, Value *Ptr) {
InstWidening WideningDecision = getWideningDecision(MemAccess, VF);		InstWidening WideningDecision = getWideningDecision(MemAccess, VF);
assert(WideningDecision != CM_Unknown &&		assert(WideningDecision != CM_Unknown &&
"Widening decision should be ready at this moment");		"Widening decision should be ready at this moment");
if (auto *Store = dyn_cast<StoreInst>(MemAccess))		if (auto *Store = dyn_cast<StoreInst>(MemAccess))
if (Ptr == Store->getValueOperand())		if (Ptr == Store->getValueOperand())
return WideningDecision == CM_Scalarize;		return WideningDecision == CM_Scalarize;

		david-arm: nit: White space change.
assert(Ptr == getLoadStorePointerOperand(MemAccess) &&		assert(Ptr == getLoadStorePointerOperand(MemAccess) &&
"Ptr is neither a value or pointer operand");		"Ptr is neither a value or pointer operand");
return WideningDecision != CM_GatherScatter;		return WideningDecision != CM_GatherScatter;
};		};

// A helper that returns true if the given value is a bitcast or		// A helper that returns true if the given value is a bitcast or
// getelementptr instruction contained in the loop.		// getelementptr instruction contained in the loop.
auto isLoopVaryingBitCastOrGEP = [&](Value *V) {		auto isLoopVaryingBitCastOrGEP = [&](Value *V) {
[... 366 lines not shown ...]	if (Cmp && TheLoop->contains(Cmp) && Cmp->hasOneUse())
addToWorklistIfAllowed(Cmp);		addToWorklistIfAllowed(Cmp);

// Return true if all lanes perform the same memory operation, and we can		// Return true if all lanes perform the same memory operation, and we can
// thus chose to execute only one.		// thus chose to execute only one.
auto isUniformMemOpUse = [&](Instruction *I) {		auto isUniformMemOpUse = [&](Instruction *I) {
if (!Legal->isUniformMemOp(*I))		if (!Legal->isUniformMemOp(*I))
return false;		return false;
if (isa<LoadInst>(I))		if (isa<LoadInst>(I))
// Loading the same address always produces the same result - at least		// Handled via CM_Uniform
// assuming aliasing and ordering which have already been checked.		return false;
return true;
// Storing the same value on every iteration.		// Storing the same value on every iteration.
return TheLoop->isLoopInvariant(cast<StoreInst>(I)->getValueOperand());		return TheLoop->isLoopInvariant(cast<StoreInst>(I)->getValueOperand());
};		};

auto isUniformDecision = [&](Instruction *I, ElementCount VF) {		auto isUniformDecision = [&](Instruction *I, ElementCount VF) {
InstWidening WideningDecision = getWideningDecision(I, VF);		InstWidening WideningDecision = getWideningDecision(I, VF);
assert(WideningDecision != CM_Unknown &&		assert(WideningDecision != CM_Unknown &&
"Widening decision should be ready at this moment");		"Widening decision should be ready at this moment");

if (isUniformMemOpUse(I))		if (isUniformMemOpUse(I))
return true;		return true;

return (WideningDecision == CM_Widen ||		return (WideningDecision == CM_Widen ||
WideningDecision == CM_Widen_Reverse ||		WideningDecision == CM_Widen_Reverse ||
WideningDecision == CM_Interleave);		WideningDecision == CM_Interleave ||
		WideningDecision == CM_Uniform);
};		};


// Returns true if Ptr is the pointer operand of a memory access instruction		// Returns true if Ptr is the pointer operand of a memory access instruction
// I, and I is known to not require scalarization.		// I, and I is known to not require scalarization.
auto isVectorizedMemAccessUse = [&](Instruction *I, Value *Ptr) -> bool {		auto isVectorizedMemAccessUse = [&](Instruction *I, Value *Ptr) -> bool {
return getLoadStorePointerOperand(I) == Ptr && isUniformDecision(I, VF);		return getLoadStorePointerOperand(I) == Ptr && isUniformDecision(I, VF);
};		};
[... 33 lines not shown ...]	for (auto &I : *BB) {
continue;		continue;
}		}

// If there's no pointer operand, there's nothing to do.		// If there's no pointer operand, there's nothing to do.
auto *Ptr = getLoadStorePointerOperand(&I);		auto *Ptr = getLoadStorePointerOperand(&I);
if (!Ptr)		if (!Ptr)
continue;		continue;

		// Only try to scalarize the uniform memop itself if legal and we're not
		david-arm: I'm not sure what the "if legal" bit refers to here? If we're treating this as uniform then it's because Legal->isUniformMemOp has returned true. I think you could just say: // Only try to scalarize the uniform memop itself if we're not using the direct lowering ...
		reames: We can't scalarize a scalable uniform store if the value being stored is not loop invariant, because we don't know which lane to store and the general scalarization support does not handle predicated scalable vectors.
		// using the direct lowering strategy (which is strictly better).
if (isUniformMemOpUse(&I))		if (isUniformMemOpUse(&I))
addToWorklistIfAllowed(&I);		addToWorklistIfAllowed(&I);

if (isUniformDecision(&I, VF)) {		if (isUniformDecision(&I, VF)) {
assert(isVectorizedMemAccessUse(&I, Ptr) && "consistency check");		assert(isVectorizedMemAccessUse(&I, Ptr) && "consistency check");
HasUniformUse.insert(Ptr);		HasUniformUse.insert(Ptr);
}		}
}		}
[... 2,074 lines not shown ...]	for (Instruction &I : *BB) {
// TODO: We should generate better code and update the cost model for		// TODO: We should generate better code and update the cost model for
// predicated uniform stores. Today they are treated as any other		// predicated uniform stores. Today they are treated as any other
// predicated store (see added test cases in		// predicated store (see added test cases in
// invariant-store-vectorization.ll).		// invariant-store-vectorization.ll).
if (isa<StoreInst>(&I) && isScalarWithPredication(&I, VF))		if (isa<StoreInst>(&I) && isScalarWithPredication(&I, VF))
NumPredStores++;		NumPredStores++;

if (Legal->isUniformMemOp(I)) {		if (Legal->isUniformMemOp(I)) {
		if (isa<LoadInst>(I)) {
		setWideningDecision(&I, VF, CM_Uniform,
		getUniformMemOpCost(&I, VF));
		continue;
		}

auto isLegalToScalarize = [&]() {		auto isLegalToScalarize = [&]() {
if (!VF.isScalable())		if (!VF.isScalable())
// Scalarization of fixed length vectors "just works".		// Scalarization of fixed length vectors "just works".
return true;		return true;

// For scalable vectors, a uniform memop load is always
// uniform-by-parts and we know how to scalarize that.
if (isa<LoadInst>(I))
return true;

// A uniform store isn't neccessarily uniform-by-part		// A uniform store isn't neccessarily uniform-by-part
// and we can't assume scalarization.		// and we can't assume scalarization.
auto &SI = cast<StoreInst>(I);		auto &SI = cast<StoreInst>(I);
return TheLoop->isLoopInvariant(SI.getValueOperand());		return TheLoop->isLoopInvariant(SI.getValueOperand());
};		};

const InstructionCost GatherScatterCost =		const InstructionCost GatherScatterCost =
isLegalGatherOrScatter(&I, VF) ?		isLegalGatherOrScatter(&I, VF) ?
getGatherScatterCost(&I, VF) : InstructionCost::getInvalid();		getGatherScatterCost(&I, VF) : InstructionCost::getInvalid();

// Load: Scalar load + broadcast		// Load: Scalar load + broadcast
// Store: Scalar store + isLoopInvariantStoreValue ? 0 : extract		// Store: Scalar store + isLoopInvariantStoreValue ? 0 : extract
// TODO: Avoid replicating loads and stores instead of relying on		// TODO: Avoid replicating loads and stores instead of relying on
// instcombine to remove them.		// instcombine to remove them.
		// FIXME: Scalarization for predicated fixed vectors is way more
		// expensive than the cost we're using here.
const InstructionCost ScalarizationCost = isLegalToScalarize() ?		const InstructionCost ScalarizationCost = isLegalToScalarize() ?
getUniformMemOpCost(&I, VF) : InstructionCost::getInvalid();		getUniformMemOpCost(&I, VF) : InstructionCost::getInvalid();


// Choose better solution for the current VF, Note that Invalid		// Choose better solution for the current VF, Note that Invalid
// costs compare as maximumal large. If both are invalid, we get		// costs compare as maximumal large. If both are invalid, we get
// scalable invalid which signals a failure and a vectorization abort.		// scalable invalid which signals a failure and a vectorization abort.
if (GatherScatterCost < ScalarizationCost)		if (GatherScatterCost < ScalarizationCost)
setWideningDecision(&I, VF, CM_GatherScatter, GatherScatterCost);		setWideningDecision(&I, VF, CM_GatherScatter, GatherScatterCost);
else		else
setWideningDecision(&I, VF, CM_Scalarize, ScalarizationCost);		setWideningDecision(&I, VF, CM_Scalarize, ScalarizationCost);
continue;		continue;
[... 373 lines not shown ...]	LoopVectorizationCostModel::getInstructionCost(Instruction *I, ElementCount VF,
case Instruction::Load: {		case Instruction::Load: {
ElementCount Width = VF;		ElementCount Width = VF;
if (Width.isVector()) {		if (Width.isVector()) {
InstWidening Decision = getWideningDecision(I, Width);		InstWidening Decision = getWideningDecision(I, Width);
assert(Decision != CM_Unknown &&		assert(Decision != CM_Unknown &&
"CM decision should be taken at this point");		"CM decision should be taken at this point");
if (getWideningCost(I, VF) == InstructionCost::getInvalid())		if (getWideningCost(I, VF) == InstructionCost::getInvalid())
return InstructionCost::getInvalid();		return InstructionCost::getInvalid();
if (Decision == CM_Scalarize)		if (Decision == CM_Scalarize || Decision == CM_Uniform)
Width = ElementCount::getFixed(1);		Width = ElementCount::getFixed(1);
}		}
VectorTy = ToVectorTy(getLoadStoreType(I), Width);		VectorTy = ToVectorTy(getLoadStoreType(I), Width);
return getMemoryInstructionCost(I, VF);		return getMemoryInstructionCost(I, VF);
}		}
case Instruction::BitCast:		case Instruction::BitCast:
if (I->getType()->isPointerTy())		if (I->getType()->isPointerTy())
return 0;		return 0;
[... 17 lines not shown ...]	auto ComputeCCH = [&](Instruction *I) -> TTI::CastContextHint {
if (VF.isScalar() || !TheLoop->contains(I))		if (VF.isScalar() || !TheLoop->contains(I))
return TTI::CastContextHint::Normal;		return TTI::CastContextHint::Normal;

switch (getWideningDecision(I, VF)) {		switch (getWideningDecision(I, VF)) {
case LoopVectorizationCostModel::CM_GatherScatter:		case LoopVectorizationCostModel::CM_GatherScatter:
return TTI::CastContextHint::GatherScatter;		return TTI::CastContextHint::GatherScatter;
case LoopVectorizationCostModel::CM_Interleave:		case LoopVectorizationCostModel::CM_Interleave:
return TTI::CastContextHint::Interleave;		return TTI::CastContextHint::Interleave;
		case LoopVectorizationCostModel::CM_Uniform:
case LoopVectorizationCostModel::CM_Scalarize:		case LoopVectorizationCostModel::CM_Scalarize:
case LoopVectorizationCostModel::CM_Widen:		case LoopVectorizationCostModel::CM_Widen:
return Legal->isMaskRequired(I) ? TTI::CastContextHint::Masked		return Legal->isMaskRequired(I) ? TTI::CastContextHint::Masked
: TTI::CastContextHint::Normal;		: TTI::CastContextHint::Normal;
case LoopVectorizationCostModel::CM_Widen_Reverse:		case LoopVectorizationCostModel::CM_Widen_Reverse:
return TTI::CastContextHint::Reversed;		return TTI::CastContextHint::Reversed;
case LoopVectorizationCostModel::CM_Unknown:		case LoopVectorizationCostModel::CM_Unknown:
llvm_unreachable("Instr did not go through cost modelling?");		llvm_unreachable("Instr did not go through cost modelling?");
[... 902 lines not shown ...]	VPRecipeBase *VPRecipeBuilder::tryToWidenMemory(Instruction *I,
// Determine if the pointer operand of the access is either consecutive or		// Determine if the pointer operand of the access is either consecutive or
// reverse consecutive.		// reverse consecutive.
LoopVectorizationCostModel::InstWidening Decision =		LoopVectorizationCostModel::InstWidening Decision =
CM.getWideningDecision(I, Range.Start);		CM.getWideningDecision(I, Range.Start);
bool Reverse = Decision == LoopVectorizationCostModel::CM_Widen_Reverse;		bool Reverse = Decision == LoopVectorizationCostModel::CM_Widen_Reverse;
bool Consecutive =		bool Consecutive =
Reverse || Decision == LoopVectorizationCostModel::CM_Widen;		Reverse || Decision == LoopVectorizationCostModel::CM_Widen;

		bool IsUniformMemOp = Decision == LoopVectorizationCostModel::CM_Uniform;

if (LoadInst *Load = dyn_cast<LoadInst>(I))		if (LoadInst *Load = dyn_cast<LoadInst>(I))
return new VPWidenMemoryInstructionRecipe(*Load, Operands[0], Mask,		return new VPWidenMemoryInstructionRecipe(*Load, Operands[0], Mask,
Consecutive, Reverse);		Consecutive, Reverse,
		IsUniformMemOp);

StoreInst *Store = cast<StoreInst>(I);		StoreInst *Store = cast<StoreInst>(I);
return new VPWidenMemoryInstructionRecipe(*Store, Operands[1], Operands[0],		return new VPWidenMemoryInstructionRecipe(*Store, Operands[1], Operands[0],
Mask, Consecutive, Reverse);		Mask, Consecutive, Reverse,
		IsUniformMemOp);
}		}

/// Creates a VPWidenIntOrFpInductionRecpipe for \p Phi. If needed, it will also		/// Creates a VPWidenIntOrFpInductionRecpipe for \p Phi. If needed, it will also
/// insert a recipe to expand the step for the induction recipe.		/// insert a recipe to expand the step for the induction recipe.
static VPWidenIntOrFpInductionRecipe *createWidenInductionRecipes(		static VPWidenIntOrFpInductionRecipe *createWidenInductionRecipes(
PHINode *Phi, Instruction *PhiOrTrunc, VPValue *Start,		PHINode *Phi, Instruction *PhiOrTrunc, VPValue *Start,
const InductionDescriptor &IndDesc, LoopVectorizationCostModel &CM,		const InductionDescriptor &IndDesc, LoopVectorizationCostModel &CM,
VPlan &Plan, ScalarEvolution &SE, Loop &OrigLoop, VFRange &Range) {		VPlan &Plan, ScalarEvolution &SE, Loop &OrigLoop, VFRange &Range) {
[... 1,543 lines not shown ...]	const auto CreateVecPtr = [&](unsigned Part, Value *Ptr) -> Value * {
unsigned AddressSpace = Ptr->getType()->getPointerAddressSpace();		unsigned AddressSpace = Ptr->getType()->getPointerAddressSpace();
return Builder.CreateBitCast(PartPtr, DataTy->getPointerTo(AddressSpace));		return Builder.CreateBitCast(PartPtr, DataTy->getPointerTo(AddressSpace));
};		};

// Handle Stores:		// Handle Stores:
if (SI) {		if (SI) {
State.setDebugLocFromInst(SI);		State.setDebugLocFromInst(SI);

		// FLAGIT
		david-arm: Not sure what this comment means?
		reames: Stray change. This is the token I use for easy local search, should have been removed before posting.
		assert(!isUniformMemOp() &&
		"lowering for uniform stores not yet implemented");

for (unsigned Part = 0; Part < State.UF; ++Part) {		for (unsigned Part = 0; Part < State.UF; ++Part) {
Instruction *NewSI = nullptr;		Instruction *NewSI = nullptr;
Value *StoredVal = State.get(StoredValue, Part);		Value *StoredVal = State.get(StoredValue, Part);
if (CreateGatherScatter) {		if (CreateGatherScatter) {
Value *MaskPart = isMaskRequired ? BlockInMaskParts[Part] : nullptr;		Value *MaskPart = isMaskRequired ? BlockInMaskParts[Part] : nullptr;
Value *VectorGep = State.get(getAddr(), Part);		Value *VectorGep = State.get(getAddr(), Part);
NewSI = Builder.CreateMaskedScatter(StoredVal, VectorGep, Alignment,		NewSI = Builder.CreateMaskedScatter(StoredVal, VectorGep, Alignment,
MaskPart);		MaskPart);
[... 16 lines not shown ...]	for (unsigned Part = 0; Part < State.UF; ++Part) {
State.addMetadata(NewSI, SI);		State.addMetadata(NewSI, SI);
}		}
return;		return;
}		}

// Handle loads.		// Handle loads.
assert(LI && "Must have a load instruction");		assert(LI && "Must have a load instruction");
State.setDebugLocFromInst(LI);		State.setDebugLocFromInst(LI);
		if (isUniformMemOp()) {
		Value *Addr = State.get(getAddr(), {0, 0});
		david-arm: I think we normally write this as Value *Addr = State.get(getAddr(), VPIteration(0, 0)); which is what we've done elsewhere.
		auto *NewLI = Builder.CreateAlignedLoad(ScalarDataTy, Addr,
		Alignment);
		david-arm: Looks like this can fit on the end of the line above?
		// Add metadata to the load, but setVectorValue to the reverse shuffle.
		david-arm: Not sure I understand this comment? Why does this involve a reverse shuffle?
		reames: I copied the comment and forgot to update it. Will fix.
		State.addMetadata(NewLI, LI);

		for (unsigned Part = 0; Part < State.UF; ++Part)
		State.set(getVPSingleValue(), NewLI, {Part, 0});
		return;
		}

for (unsigned Part = 0; Part < State.UF; ++Part) {		for (unsigned Part = 0; Part < State.UF; ++Part) {
Value *NewLI;		Value *NewLI;
if (CreateGatherScatter) {		if (CreateGatherScatter) {
Value *MaskPart = isMaskRequired ? BlockInMaskParts[Part] : nullptr;		Value *MaskPart = isMaskRequired ? BlockInMaskParts[Part] : nullptr;
Value *VectorGep = State.get(getAddr(), Part);		Value *VectorGep = State.get(getAddr(), Part);
NewLI = Builder.CreateMaskedGather(DataTy, VectorGep, Alignment, MaskPart,		NewLI = Builder.CreateMaskedGather(DataTy, VectorGep, Alignment, MaskPart,
nullptr, "wide.masked.gather");		nullptr, "wide.masked.gather");
State.addMetadata(NewLI, LI);		State.addMetadata(NewLI, LI);
[... 884 lines not shown ...]

llvm/lib/Transforms/Vectorize/VPlan.h

[... 1,694 lines not shown ...]	class VPWidenMemoryInstructionRecipe : public VPRecipeBase {
Instruction &Ingredient;		Instruction &Ingredient;

// Whether the loaded-from / stored-to addresses are consecutive.		// Whether the loaded-from / stored-to addresses are consecutive.
bool Consecutive;		bool Consecutive;

// Whether the consecutive loaded/stored addresses are in reverse order.		// Whether the consecutive loaded/stored addresses are in reverse order.
bool Reverse;		bool Reverse;

		// Whether this is a uniform mem op that we can lower with a single
		// copy of the original instruction for all lanes.
		bool UniformMemOp;

void setMask(VPValue *Mask) {		void setMask(VPValue *Mask) {
if (!Mask)		if (!Mask)
return;		return;
addOperand(Mask);		addOperand(Mask);
}		}

bool isMasked() const {		bool isMasked() const {
return isStore() ? getNumOperands() == 3 : getNumOperands() == 2;		return isStore() ? getNumOperands() == 3 : getNumOperands() == 2;
}		}

public:		public:
VPWidenMemoryInstructionRecipe(LoadInst &Load, VPValue *Addr, VPValue *Mask,		VPWidenMemoryInstructionRecipe(LoadInst &Load, VPValue *Addr, VPValue *Mask,
bool Consecutive, bool Reverse)		bool Consecutive, bool Reverse,
		bool UniformMemOp)
: VPRecipeBase(VPWidenMemoryInstructionSC, {Addr}), Ingredient(Load),		: VPRecipeBase(VPWidenMemoryInstructionSC, {Addr}), Ingredient(Load),
Consecutive(Consecutive), Reverse(Reverse) {		Consecutive(Consecutive), Reverse(Reverse), UniformMemOp(UniformMemOp) {
assert((Consecutive || !Reverse) && "Reverse implies consecutive");		assert((Consecutive || !Reverse) && "Reverse implies consecutive");
		assert(!(Consecutive && UniformMemOp) && "Uniform can't be consecutive");
new VPValue(VPValue::VPVMemoryInstructionSC, &Load, this);		new VPValue(VPValue::VPVMemoryInstructionSC, &Load, this);
setMask(Mask);		setMask(Mask);
}		}

VPWidenMemoryInstructionRecipe(StoreInst &Store, VPValue *Addr,		VPWidenMemoryInstructionRecipe(StoreInst &Store, VPValue *Addr,
VPValue *StoredValue, VPValue *Mask,		VPValue *StoredValue, VPValue *Mask,
bool Consecutive, bool Reverse)		bool Consecutive, bool Reverse,
		bool UniformMemOp)
: VPRecipeBase(VPWidenMemoryInstructionSC, {Addr, StoredValue}),		: VPRecipeBase(VPWidenMemoryInstructionSC, {Addr, StoredValue}),
Ingredient(Store), Consecutive(Consecutive), Reverse(Reverse) {		Ingredient(Store), Consecutive(Consecutive), Reverse(Reverse),
		UniformMemOp(UniformMemOp){
assert((Consecutive || !Reverse) && "Reverse implies consecutive");		assert((Consecutive || !Reverse) && "Reverse implies consecutive");
		assert(!(Consecutive && UniformMemOp) && "Uniform can't be consecutive");
setMask(Mask);		setMask(Mask);
}		}

/// Method to support type inquiry through isa, cast, and dyn_cast.		/// Method to support type inquiry through isa, cast, and dyn_cast.
static inline bool classof(const VPDef *D) {		static inline bool classof(const VPDef *D) {
return D->getVPDefID() == VPRecipeBase::VPWidenMemoryInstructionSC;		return D->getVPDefID() == VPRecipeBase::VPWidenMemoryInstructionSC;
}		}

[... 20 lines not shown ...]	public:

// Return whether the loaded-from / stored-to addresses are consecutive.		// Return whether the loaded-from / stored-to addresses are consecutive.
bool isConsecutive() const { return Consecutive; }		bool isConsecutive() const { return Consecutive; }

// Return whether the consecutive loaded/stored addresses are in reverse		// Return whether the consecutive loaded/stored addresses are in reverse
// order.		// order.
bool isReverse() const { return Reverse; }		bool isReverse() const { return Reverse; }

		bool isUniformMemOp() const { return UniformMemOp; }
		david-arm: nit: I think this should have a simple comment like the functions above.

/// Generate the wide load/store.		/// Generate the wide load/store.
void execute(VPTransformState &State) override;		void execute(VPTransformState &State) override;

#if !defined(NDEBUG) || defined(LLVM_ENABLE_DUMP)		#if !defined(NDEBUG) || defined(LLVM_ENABLE_DUMP)
/// Print the recipe.		/// Print the recipe.
void print(raw_ostream &O, const Twine &Indent,		void print(raw_ostream &O, const Twine &Indent,
VPSlotTracker &SlotTracker) const override;		VPSlotTracker &SlotTracker) const override;
#endif		#endif

/// Returns true if the recipe only uses the first lane of operand \p Op.		/// Returns true if the recipe only uses the first lane of operand \p Op.
bool onlyFirstLaneUsed(const VPValue *Op) const override {		bool onlyFirstLaneUsed(const VPValue *Op) const override {
assert(is_contained(operands(), Op) &&		assert(is_contained(operands(), Op) &&
"Op must be an operand of the recipe");		"Op must be an operand of the recipe");

		// The definition used for uniform mem op implies only the first lane
		// is needed (even for both ops on a store).
		if (UniformMemOp)
		return true;

// Widened, consecutive memory operations only demand the first lane of		// Widened, consecutive memory operations only demand the first lane of
// their address, unless the same operand is also stored. That latter can		// their address, unless the same operand is also stored. That latter can
// happen with opaque pointers.		// happen with opaque pointers.
return Op == getAddr() && isConsecutive() &&		return Op == getAddr() && isConsecutive() &&
(!isStore() || Op != getStoredValue());		(!isStore() || Op != getStoredValue());
}		}

Instruction &getIngredient() const { return Ingredient; }		Instruction &getIngredient() const { return Ingredient; }
▲ Show 20 Lines • Show All 1,259 Lines • ▼ Show 20 Lines
VPValue *getOrCreateVPValueForSCEVExpr(VPlan &Plan, const SCEV *Expr,		VPValue *getOrCreateVPValueForSCEVExpr(VPlan &Plan, const SCEV *Expr,
ScalarEvolution &SE);		ScalarEvolution &SE);

/// Returns true if \p VPV is uniform after vectorization.		/// Returns true if \p VPV is uniform after vectorization.
inline bool isUniformAfterVectorization(VPValue *VPV) {		inline bool isUniformAfterVectorization(VPValue *VPV) {
if (auto *Def = VPV->getDef()) {		if (auto *Def = VPV->getDef()) {
if (auto Rep = dyn_cast<VPReplicateRecipe>(Def))		if (auto Rep = dyn_cast<VPReplicateRecipe>(Def))
return Rep->isUniform();		return Rep->isUniform();
		if (auto Rep = dyn_cast<VPWidenMemoryInstructionRecipe>(Def))
		return Rep->isUniformMemOp();
return false;		return false;
}		}
// A value without a def is external to vplan and thus uniform.		// A value without a def is external to vplan and thus uniform.
return true;		return true;
}		}
} // end namespace vputils		} // end namespace vputils

} // end namespace llvm		} // end namespace llvm

#endif // LLVM_TRANSFORMS_VECTORIZE_VPLAN_H		#endif // LLVM_TRANSFORMS_VECTORIZE_VPLAN_H
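
The `onlyFirstLaneUsed` change in this file can be modelled in isolation. The following is a minimal sketch and not LLVM code: `MemRecipeModel` and its fields are hypothetical stand-ins for `VPWidenMemoryInstructionRecipe`, and operands are compared as plain integer ids instead of `VPValue` pointers.

```cpp
#include <cassert>

// Hypothetical stand-in for VPWidenMemoryInstructionRecipe; operands are
// modelled as integer ids rather than VPValue pointers.
struct MemRecipeModel {
  int Addr;          // id of the address operand
  int StoredValue;   // id of the stored value; ignored for loads
  bool IsStore;
  bool Consecutive;
  bool UniformMemOp;

  // Follows the decision order of onlyFirstLaneUsed in the patch.
  bool onlyFirstLaneUsed(int Op) const {
    assert((Op == Addr || (IsStore && Op == StoredValue)) &&
           "Op must be an operand of the recipe");
    // A uniform memory op only needs lane 0 of every operand.
    if (UniformMemOp)
      return true;
    // Otherwise only the address of a consecutive access qualifies, unless
    // the same value is also the one being stored.
    return Op == Addr && Consecutive && (!IsStore || Op != StoredValue);
  }
};
```

In this model a consecutive store answers true for its address and false for its stored value, while every operand of a uniform memory op answers true.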

llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp

Show First 20 Lines • Show All 945 Lines • ▼ Show 20 Lines	void VPPredInstPHIRecipe::print(raw_ostream &O, const Twine &Indent,
O << Indent << "PHI-PREDICATED-INSTRUCTION ";		O << Indent << "PHI-PREDICATED-INSTRUCTION ";
printAsOperand(O, SlotTracker);		printAsOperand(O, SlotTracker);
O << " = ";		O << " = ";
printOperands(O, SlotTracker);		printOperands(O, SlotTracker);
}		}

void VPWidenMemoryInstructionRecipe::print(raw_ostream &O, const Twine &Indent,		void VPWidenMemoryInstructionRecipe::print(raw_ostream &O, const Twine &Indent,
VPSlotTracker &SlotTracker) const {		VPSlotTracker &SlotTracker) const {
		if (UniformMemOp)
		O << Indent << "UNIFORM-MEM ";
		else
O << Indent << "WIDEN ";		O << Indent << "WIDEN ";

if (!isStore()) {		if (!isStore()) {
getVPSingleValue()->printAsOperand(O, SlotTracker);		getVPSingleValue()->printAsOperand(O, SlotTracker);
O << " = ";		O << " = ";
}		}
O << Instruction::getOpcodeName(Ingredient.getOpcode()) << " ";		O << Instruction::getOpcodeName(Ingredient.getOpcode()) << " ";

printOperands(O, SlotTracker);		printOperands(O, SlotTracker);
▲ Show 20 Lines • Show All 277 Lines • Show Last 20 Lines

llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp

Show First 20 Lines • Show All 57 Lines • ▼ Show 20 Lines	for (VPRecipeBase &Ingredient :
} else {		} else {
assert(isa<VPInstruction>(&Ingredient) &&		assert(isa<VPInstruction>(&Ingredient) &&
"only VPInstructions expected here");		"only VPInstructions expected here");
assert(!isa<PHINode>(Inst) && "phis should be handled above");		assert(!isa<PHINode>(Inst) && "phis should be handled above");
// Create VPWidenMemoryInstructionRecipe for loads and stores.		// Create VPWidenMemoryInstructionRecipe for loads and stores.
if (LoadInst *Load = dyn_cast<LoadInst>(Inst)) {		if (LoadInst *Load = dyn_cast<LoadInst>(Inst)) {
NewRecipe = new VPWidenMemoryInstructionRecipe(		NewRecipe = new VPWidenMemoryInstructionRecipe(
*Load, Plan->getOrAddVPValue(getLoadStorePointerOperand(Inst)),		*Load, Plan->getOrAddVPValue(getLoadStorePointerOperand(Inst)),
nullptr /*Mask*/, false /*Consecutive*/, false /*Reverse*/);		nullptr /*Mask*/, false /*Consecutive*/, false /*Reverse*/,
		false /*IsUniformMemOp*/);
} else if (StoreInst *Store = dyn_cast<StoreInst>(Inst)) {		} else if (StoreInst *Store = dyn_cast<StoreInst>(Inst)) {
NewRecipe = new VPWidenMemoryInstructionRecipe(		NewRecipe = new VPWidenMemoryInstructionRecipe(
*Store, Plan->getOrAddVPValue(getLoadStorePointerOperand(Inst)),		*Store, Plan->getOrAddVPValue(getLoadStorePointerOperand(Inst)),
Plan->getOrAddVPValue(Store->getValueOperand()), nullptr /*Mask*/,		Plan->getOrAddVPValue(Store->getValueOperand()), nullptr /*Mask*/,
false /*Consecutive*/, false /*Reverse*/);		false /*Consecutive*/, false /*Reverse*/,
		false /*IsUniformMemOp*/);
} else if (GetElementPtrInst *GEP = dyn_cast<GetElementPtrInst>(Inst)) {		} else if (GetElementPtrInst *GEP = dyn_cast<GetElementPtrInst>(Inst)) {
NewRecipe = new VPWidenGEPRecipe(		NewRecipe = new VPWidenGEPRecipe(
GEP, Plan->mapToVPValues(GEP->operands()), OrigLoop);		GEP, Plan->mapToVPValues(GEP->operands()), OrigLoop);
} else if (CallInst *CI = dyn_cast<CallInst>(Inst)) {		} else if (CallInst *CI = dyn_cast<CallInst>(Inst)) {
NewRecipe =		NewRecipe =
new VPWidenCallRecipe(*CI, Plan->mapToVPValues(CI->args()));		new VPWidenCallRecipe(*CI, Plan->mapToVPValues(CI->args()));
} else if (SelectInst *SI = dyn_cast<SelectInst>(Inst)) {		} else if (SelectInst *SI = dyn_cast<SelectInst>(Inst)) {
bool InvariantCond =		bool InvariantCond =
▲ Show 20 Lines • Show All 353 Lines • Show Last 20 Lines

llvm/test/Transforms/LoopVectorize/RISCV/uniform-load-store.ll

	Show First 20 Lines • Show All 54 Lines • ▼ Show 20 Lines
	; FIXEDLEN-NEXT: br i1 false, label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]			; FIXEDLEN-NEXT: br i1 false, label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
	; FIXEDLEN: vector.ph:			; FIXEDLEN: vector.ph:
	; FIXEDLEN-NEXT: br label [[VECTOR_BODY:%.*]]			; FIXEDLEN-NEXT: br label [[VECTOR_BODY:%.*]]
	; FIXEDLEN: vector.body:			; FIXEDLEN: vector.body:
	; FIXEDLEN-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]			; FIXEDLEN-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
	; FIXEDLEN-NEXT: [[TMP0:%.*]] = add i64 [[INDEX]], 0			; FIXEDLEN-NEXT: [[TMP0:%.*]] = add i64 [[INDEX]], 0
	; FIXEDLEN-NEXT: [[TMP1:%.*]] = add i64 [[INDEX]], 2			; FIXEDLEN-NEXT: [[TMP1:%.*]] = add i64 [[INDEX]], 2
	; FIXEDLEN-NEXT: [[TMP2:%.*]] = load i64, ptr [[B:%.*]], align 8			; FIXEDLEN-NEXT: [[TMP2:%.*]] = load i64, ptr [[B:%.*]], align 8
	; FIXEDLEN-NEXT: [[BROADCAST_SPLATINSERT:%.*]] = insertelement <2 x i64> poison, i64 [[TMP2]], i32 0			; FIXEDLEN-NEXT: [[BROADCAST_SPLATINSERT1:%.*]] = insertelement <2 x i64> poison, i64 [[TMP2]], i32 0
	; FIXEDLEN-NEXT: [[BROADCAST_SPLAT:%.*]] = shufflevector <2 x i64> [[BROADCAST_SPLATINSERT]], <2 x i64> poison, <2 x i32> zeroinitializer
	; FIXEDLEN-NEXT: [[TMP3:%.*]] = load i64, ptr [[B]], align 8
	; FIXEDLEN-NEXT: [[BROADCAST_SPLATINSERT1:%.*]] = insertelement <2 x i64> poison, i64 [[TMP3]], i32 0
	; FIXEDLEN-NEXT: [[BROADCAST_SPLAT2:%.*]] = shufflevector <2 x i64> [[BROADCAST_SPLATINSERT1]], <2 x i64> poison, <2 x i32> zeroinitializer			; FIXEDLEN-NEXT: [[BROADCAST_SPLAT2:%.*]] = shufflevector <2 x i64> [[BROADCAST_SPLATINSERT1]], <2 x i64> poison, <2 x i32> zeroinitializer
	; FIXEDLEN-NEXT: [[TMP4:%.*]] = getelementptr inbounds i64, ptr [[A:%.*]], i64 [[TMP0]]			; FIXEDLEN-NEXT: [[TMP3:%.*]] = getelementptr inbounds i64, ptr [[A:%.*]], i64 [[TMP0]]
	; FIXEDLEN-NEXT: [[TMP5:%.*]] = getelementptr inbounds i64, ptr [[A]], i64 [[TMP1]]			; FIXEDLEN-NEXT: [[TMP4:%.*]] = getelementptr inbounds i64, ptr [[A]], i64 [[TMP1]]
	; FIXEDLEN-NEXT: [[TMP6:%.*]] = getelementptr inbounds i64, ptr [[TMP4]], i32 0			; FIXEDLEN-NEXT: [[TMP5:%.*]] = getelementptr inbounds i64, ptr [[TMP3]], i32 0
	; FIXEDLEN-NEXT: store <2 x i64> [[BROADCAST_SPLAT]], ptr [[TMP6]], align 8			; FIXEDLEN-NEXT: store <2 x i64> [[BROADCAST_SPLAT2]], ptr [[TMP5]], align 8
	; FIXEDLEN-NEXT: [[TMP7:%.*]] = getelementptr inbounds i64, ptr [[TMP4]], i32 2			; FIXEDLEN-NEXT: [[TMP6:%.*]] = getelementptr inbounds i64, ptr [[TMP3]], i32 2
	; FIXEDLEN-NEXT: store <2 x i64> [[BROADCAST_SPLAT2]], ptr [[TMP7]], align 8			; FIXEDLEN-NEXT: store <2 x i64> [[BROADCAST_SPLAT2]], ptr [[TMP6]], align 8
	; FIXEDLEN-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 4			; FIXEDLEN-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 4
	; FIXEDLEN-NEXT: [[TMP8:%.*]] = icmp eq i64 [[INDEX_NEXT]], 1024			; FIXEDLEN-NEXT: [[TMP7:%.*]] = icmp eq i64 [[INDEX_NEXT]], 1024
	; FIXEDLEN-NEXT: br i1 [[TMP8]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP0:![0-9]+]]			; FIXEDLEN-NEXT: br i1 [[TMP7]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP0:![0-9]+]]
	; FIXEDLEN: middle.block:			; FIXEDLEN: middle.block:
	; FIXEDLEN-NEXT: [[CMP_N:%.*]] = icmp eq i64 1024, 1024			; FIXEDLEN-NEXT: [[CMP_N:%.*]] = icmp eq i64 1024, 1024
	; FIXEDLEN-NEXT: br i1 [[CMP_N]], label [[FOR_END:%.*]], label [[SCALAR_PH]]			; FIXEDLEN-NEXT: br i1 [[CMP_N]], label [[FOR_END:%.*]], label [[SCALAR_PH]]
	; FIXEDLEN: scalar.ph:			; FIXEDLEN: scalar.ph:
	; FIXEDLEN-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ 1024, [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]			; FIXEDLEN-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ 1024, [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]
	; FIXEDLEN-NEXT: br label [[FOR_BODY:%.*]]			; FIXEDLEN-NEXT: br label [[FOR_BODY:%.*]]
	; FIXEDLEN: for.body:			; FIXEDLEN: for.body:
	; FIXEDLEN-NEXT: [[IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[IV_NEXT:%.*]], [[FOR_BODY]] ]			; FIXEDLEN-NEXT: [[IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[IV_NEXT:%.*]], [[FOR_BODY]] ]
	▲ Show 20 Lines • Show All 146 Lines • ▼ Show 20 Lines
	; FIXEDLEN-NEXT: br i1 false, label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]			; FIXEDLEN-NEXT: br i1 false, label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
	; FIXEDLEN: vector.ph:			; FIXEDLEN: vector.ph:
	; FIXEDLEN-NEXT: br label [[VECTOR_BODY:%.*]]			; FIXEDLEN-NEXT: br label [[VECTOR_BODY:%.*]]
	; FIXEDLEN: vector.body:			; FIXEDLEN: vector.body:
	; FIXEDLEN-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]			; FIXEDLEN-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
	; FIXEDLEN-NEXT: [[TMP0:%.*]] = add i64 [[INDEX]], 0			; FIXEDLEN-NEXT: [[TMP0:%.*]] = add i64 [[INDEX]], 0
	; FIXEDLEN-NEXT: [[TMP1:%.*]] = add i64 [[INDEX]], 2			; FIXEDLEN-NEXT: [[TMP1:%.*]] = add i64 [[INDEX]], 2
	; FIXEDLEN-NEXT: [[TMP2:%.*]] = load i64, ptr [[B:%.*]], align 8			; FIXEDLEN-NEXT: [[TMP2:%.*]] = load i64, ptr [[B:%.*]], align 8
	; FIXEDLEN-NEXT: [[BROADCAST_SPLATINSERT:%.*]] = insertelement <2 x i64> poison, i64 [[TMP2]], i32 0			; FIXEDLEN-NEXT: [[BROADCAST_SPLATINSERT1:%.*]] = insertelement <2 x i64> poison, i64 [[TMP2]], i32 0
	; FIXEDLEN-NEXT: [[BROADCAST_SPLAT:%.*]] = shufflevector <2 x i64> [[BROADCAST_SPLATINSERT]], <2 x i64> poison, <2 x i32> zeroinitializer
	; FIXEDLEN-NEXT: [[TMP3:%.*]] = load i64, ptr [[B]], align 8
	; FIXEDLEN-NEXT: [[BROADCAST_SPLATINSERT1:%.*]] = insertelement <2 x i64> poison, i64 [[TMP3]], i32 0
	; FIXEDLEN-NEXT: [[BROADCAST_SPLAT2:%.*]] = shufflevector <2 x i64> [[BROADCAST_SPLATINSERT1]], <2 x i64> poison, <2 x i32> zeroinitializer			; FIXEDLEN-NEXT: [[BROADCAST_SPLAT2:%.*]] = shufflevector <2 x i64> [[BROADCAST_SPLATINSERT1]], <2 x i64> poison, <2 x i32> zeroinitializer
	; FIXEDLEN-NEXT: [[TMP4:%.*]] = getelementptr inbounds i64, ptr [[A:%.*]], i64 [[TMP0]]			; FIXEDLEN-NEXT: [[TMP3:%.*]] = getelementptr inbounds i64, ptr [[A:%.*]], i64 [[TMP0]]
	; FIXEDLEN-NEXT: [[TMP5:%.*]] = getelementptr inbounds i64, ptr [[A]], i64 [[TMP1]]			; FIXEDLEN-NEXT: [[TMP4:%.*]] = getelementptr inbounds i64, ptr [[A]], i64 [[TMP1]]
	; FIXEDLEN-NEXT: [[TMP6:%.*]] = getelementptr inbounds i64, ptr [[TMP4]], i32 0			; FIXEDLEN-NEXT: [[TMP5:%.*]] = getelementptr inbounds i64, ptr [[TMP3]], i32 0
	; FIXEDLEN-NEXT: store <2 x i64> [[BROADCAST_SPLAT]], ptr [[TMP6]], align 8			; FIXEDLEN-NEXT: store <2 x i64> [[BROADCAST_SPLAT2]], ptr [[TMP5]], align 8
	; FIXEDLEN-NEXT: [[TMP7:%.*]] = getelementptr inbounds i64, ptr [[TMP4]], i32 2			; FIXEDLEN-NEXT: [[TMP6:%.*]] = getelementptr inbounds i64, ptr [[TMP3]], i32 2
	; FIXEDLEN-NEXT: store <2 x i64> [[BROADCAST_SPLAT2]], ptr [[TMP7]], align 8			; FIXEDLEN-NEXT: store <2 x i64> [[BROADCAST_SPLAT2]], ptr [[TMP6]], align 8
	; FIXEDLEN-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 4			; FIXEDLEN-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 4
	; FIXEDLEN-NEXT: [[TMP8:%.*]] = icmp eq i64 [[INDEX_NEXT]], 1024			; FIXEDLEN-NEXT: [[TMP7:%.*]] = icmp eq i64 [[INDEX_NEXT]], 1024
	; FIXEDLEN-NEXT: br i1 [[TMP8]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP4:![0-9]+]]			; FIXEDLEN-NEXT: br i1 [[TMP7]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP4:![0-9]+]]
	; FIXEDLEN: middle.block:			; FIXEDLEN: middle.block:
	; FIXEDLEN-NEXT: [[CMP_N:%.*]] = icmp eq i64 1024, 1024			; FIXEDLEN-NEXT: [[CMP_N:%.*]] = icmp eq i64 1024, 1024
	; FIXEDLEN-NEXT: br i1 [[CMP_N]], label [[FOR_END:%.*]], label [[SCALAR_PH]]			; FIXEDLEN-NEXT: br i1 [[CMP_N]], label [[FOR_END:%.*]], label [[SCALAR_PH]]
	; FIXEDLEN: scalar.ph:			; FIXEDLEN: scalar.ph:
	; FIXEDLEN-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ 1024, [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]			; FIXEDLEN-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ 1024, [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]
	; FIXEDLEN-NEXT: br label [[FOR_BODY:%.*]]			; FIXEDLEN-NEXT: br label [[FOR_BODY:%.*]]
	; FIXEDLEN: for.body:			; FIXEDLEN: for.body:
	; FIXEDLEN-NEXT: [[IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[IV_NEXT:%.*]], [[FOR_BODY]] ]			; FIXEDLEN-NEXT: [[IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[IV_NEXT:%.*]], [[FOR_BODY]] ]
	; FIXEDLEN-NEXT: [[V:%.*]] = load i64, ptr [[B]], align 8			; FIXEDLEN-NEXT: [[V:%.*]] = load i64, ptr [[B]], align 8
	; FIXEDLEN-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds i64, ptr [[A]], i64 [[IV]]			; FIXEDLEN-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds i64, ptr [[A]], i64 [[IV]]
	; FIXEDLEN-NEXT: store i64 [[V]], ptr [[ARRAYIDX]], align 8			; FIXEDLEN-NEXT: store i64 [[V]], ptr [[ARRAYIDX]], align 8
	; FIXEDLEN-NEXT: [[IV_NEXT]] = add nuw nsw i64 [[IV]], 1			; FIXEDLEN-NEXT: [[IV_NEXT]] = add nuw nsw i64 [[IV]], 1
	; FIXEDLEN-NEXT: [[EXITCOND_NOT:%.*]] = icmp eq i64 [[IV_NEXT]], 1024			; FIXEDLEN-NEXT: [[EXITCOND_NOT:%.*]] = icmp eq i64 [[IV_NEXT]], 1024
	; FIXEDLEN-NEXT: br i1 [[EXITCOND_NOT]], label [[FOR_END]], label [[FOR_BODY]], !llvm.loop [[LOOP5:![0-9]+]]			; FIXEDLEN-NEXT: br i1 [[EXITCOND_NOT]], label [[FOR_END]], label [[FOR_BODY]], !llvm.loop [[LOOP5:![0-9]+]]
	; FIXEDLEN: for.end:			; FIXEDLEN: for.end:
	; FIXEDLEN-NEXT: [[V_LCSSA:%.*]] = phi i64 [ [[V]], [[FOR_BODY]] ], [ [[TMP3]], [[MIDDLE_BLOCK]] ]			; FIXEDLEN-NEXT: [[V_LCSSA:%.*]] = phi i64 [ [[V]], [[FOR_BODY]] ], [ [[TMP2]], [[MIDDLE_BLOCK]] ]
	; FIXEDLEN-NEXT: ret i64 [[V_LCSSA]]			; FIXEDLEN-NEXT: ret i64 [[V_LCSSA]]
	;			;
	; TF-SCALABLE-LABEL: @uniform_load_outside_use(			; TF-SCALABLE-LABEL: @uniform_load_outside_use(
	; TF-SCALABLE-NEXT: entry:			; TF-SCALABLE-NEXT: entry:
	; TF-SCALABLE-NEXT: br label [[FOR_BODY:%.*]]			; TF-SCALABLE-NEXT: br label [[FOR_BODY:%.*]]
	; TF-SCALABLE: for.body:			; TF-SCALABLE: for.body:
	; TF-SCALABLE-NEXT: [[IV:%.*]] = phi i64 [ 0, [[ENTRY:%.*]] ], [ [[IV_NEXT:%.*]], [[FOR_BODY]] ]			; TF-SCALABLE-NEXT: [[IV:%.*]] = phi i64 [ 0, [[ENTRY:%.*]] ], [ [[IV_NEXT:%.*]], [[FOR_BODY]] ]
	; TF-SCALABLE-NEXT: [[V:%.*]] = load i64, ptr [[B:%.*]], align 8			; TF-SCALABLE-NEXT: [[V:%.*]] = load i64, ptr [[B:%.*]], align 8
	▲ Show 20 Lines • Show All 353 Lines • ▼ Show 20 Lines
	; FIXEDLEN-NEXT: br i1 false, label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]			; FIXEDLEN-NEXT: br i1 false, label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
	; FIXEDLEN: vector.ph:			; FIXEDLEN: vector.ph:
	; FIXEDLEN-NEXT: br label [[VECTOR_BODY:%.*]]			; FIXEDLEN-NEXT: br label [[VECTOR_BODY:%.*]]
	; FIXEDLEN: vector.body:			; FIXEDLEN: vector.body:
	; FIXEDLEN-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]			; FIXEDLEN-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
	; FIXEDLEN-NEXT: [[TMP0:%.*]] = add i64 [[INDEX]], 0			; FIXEDLEN-NEXT: [[TMP0:%.*]] = add i64 [[INDEX]], 0
	; FIXEDLEN-NEXT: [[TMP1:%.*]] = add i64 [[INDEX]], 2			; FIXEDLEN-NEXT: [[TMP1:%.*]] = add i64 [[INDEX]], 2
	; FIXEDLEN-NEXT: [[TMP2:%.*]] = load i64, ptr [[B:%.*]], align 1			; FIXEDLEN-NEXT: [[TMP2:%.*]] = load i64, ptr [[B:%.*]], align 1
	; FIXEDLEN-NEXT: [[BROADCAST_SPLATINSERT:%.*]] = insertelement <2 x i64> poison, i64 [[TMP2]], i32 0			; FIXEDLEN-NEXT: [[BROADCAST_SPLATINSERT1:%.*]] = insertelement <2 x i64> poison, i64 [[TMP2]], i32 0
	; FIXEDLEN-NEXT: [[BROADCAST_SPLAT:%.*]] = shufflevector <2 x i64> [[BROADCAST_SPLATINSERT]], <2 x i64> poison, <2 x i32> zeroinitializer
	; FIXEDLEN-NEXT: [[TMP3:%.*]] = load i64, ptr [[B]], align 1
	; FIXEDLEN-NEXT: [[BROADCAST_SPLATINSERT1:%.*]] = insertelement <2 x i64> poison, i64 [[TMP3]], i32 0
	; FIXEDLEN-NEXT: [[BROADCAST_SPLAT2:%.*]] = shufflevector <2 x i64> [[BROADCAST_SPLATINSERT1]], <2 x i64> poison, <2 x i32> zeroinitializer			; FIXEDLEN-NEXT: [[BROADCAST_SPLAT2:%.*]] = shufflevector <2 x i64> [[BROADCAST_SPLATINSERT1]], <2 x i64> poison, <2 x i32> zeroinitializer
	; FIXEDLEN-NEXT: [[TMP4:%.*]] = getelementptr inbounds i64, ptr [[A:%.*]], i64 [[TMP0]]			; FIXEDLEN-NEXT: [[TMP3:%.*]] = getelementptr inbounds i64, ptr [[A:%.*]], i64 [[TMP0]]
	; FIXEDLEN-NEXT: [[TMP5:%.*]] = getelementptr inbounds i64, ptr [[A]], i64 [[TMP1]]			; FIXEDLEN-NEXT: [[TMP4:%.*]] = getelementptr inbounds i64, ptr [[A]], i64 [[TMP1]]
	; FIXEDLEN-NEXT: [[TMP6:%.*]] = getelementptr inbounds i64, ptr [[TMP4]], i32 0			; FIXEDLEN-NEXT: [[TMP5:%.*]] = getelementptr inbounds i64, ptr [[TMP3]], i32 0
	; FIXEDLEN-NEXT: store <2 x i64> [[BROADCAST_SPLAT]], ptr [[TMP6]], align 8			; FIXEDLEN-NEXT: store <2 x i64> [[BROADCAST_SPLAT2]], ptr [[TMP5]], align 8
	; FIXEDLEN-NEXT: [[TMP7:%.*]] = getelementptr inbounds i64, ptr [[TMP4]], i32 2			; FIXEDLEN-NEXT: [[TMP6:%.*]] = getelementptr inbounds i64, ptr [[TMP3]], i32 2
	; FIXEDLEN-NEXT: store <2 x i64> [[BROADCAST_SPLAT2]], ptr [[TMP7]], align 8			; FIXEDLEN-NEXT: store <2 x i64> [[BROADCAST_SPLAT2]], ptr [[TMP6]], align 8
	; FIXEDLEN-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 4			; FIXEDLEN-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 4
	; FIXEDLEN-NEXT: [[TMP8:%.*]] = icmp eq i64 [[INDEX_NEXT]], 1024			; FIXEDLEN-NEXT: [[TMP7:%.*]] = icmp eq i64 [[INDEX_NEXT]], 1024
	; FIXEDLEN-NEXT: br i1 [[TMP8]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP8:![0-9]+]]			; FIXEDLEN-NEXT: br i1 [[TMP7]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP8:![0-9]+]]
	; FIXEDLEN: middle.block:			; FIXEDLEN: middle.block:
	; FIXEDLEN-NEXT: [[CMP_N:%.*]] = icmp eq i64 1024, 1024			; FIXEDLEN-NEXT: [[CMP_N:%.*]] = icmp eq i64 1024, 1024
	; FIXEDLEN-NEXT: br i1 [[CMP_N]], label [[FOR_END:%.*]], label [[SCALAR_PH]]			; FIXEDLEN-NEXT: br i1 [[CMP_N]], label [[FOR_END:%.*]], label [[SCALAR_PH]]
	; FIXEDLEN: scalar.ph:			; FIXEDLEN: scalar.ph:
	; FIXEDLEN-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ 1024, [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]			; FIXEDLEN-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ 1024, [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]
	; FIXEDLEN-NEXT: br label [[FOR_BODY:%.*]]			; FIXEDLEN-NEXT: br label [[FOR_BODY:%.*]]
	; FIXEDLEN: for.body:			; FIXEDLEN: for.body:
	; FIXEDLEN-NEXT: [[IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[IV_NEXT:%.*]], [[FOR_BODY]] ]			; FIXEDLEN-NEXT: [[IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[IV_NEXT:%.*]], [[FOR_BODY]] ]
	▲ Show 20 Lines • Show All 901 Lines • Show Last 20 Lines

llvm/test/Transforms/LoopVectorize/X86/cost-model-assert.ll

	Show All 14 Lines
	; CHECK-LABEL: @cff_index_load_offsets(			; CHECK-LABEL: @cff_index_load_offsets(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: br i1 [[COND:%.*]], label [[IF_THEN:%.*]], label [[EXIT:%.*]]			; CHECK-NEXT: br i1 [[COND:%.*]], label [[IF_THEN:%.*]], label [[EXIT:%.*]]
	; CHECK: if.then:			; CHECK: if.then:
	; CHECK-NEXT: [[UMAX:%.*]] = call i64 @llvm.umax.i64(i64 undef, i64 4)			; CHECK-NEXT: [[UMAX:%.*]] = call i64 @llvm.umax.i64(i64 undef, i64 4)
	; CHECK-NEXT: [[TMP0:%.*]] = add i64 [[UMAX]], -1			; CHECK-NEXT: [[TMP0:%.*]] = add i64 [[UMAX]], -1
	; CHECK-NEXT: [[TMP1:%.*]] = lshr i64 [[TMP0]], 2			; CHECK-NEXT: [[TMP1:%.*]] = lshr i64 [[TMP0]], 2
	; CHECK-NEXT: [[TMP2:%.*]] = add nuw nsw i64 [[TMP1]], 1			; CHECK-NEXT: [[TMP2:%.*]] = add nuw nsw i64 [[TMP1]], 1
	; CHECK-NEXT: [[MIN_ITERS_CHECK:%.*]] = icmp ult i64 [[TMP2]], 8			; CHECK-NEXT: [[MIN_ITERS_CHECK:%.*]] = icmp ult i64 [[TMP2]], 4
	; CHECK-NEXT: br i1 [[MIN_ITERS_CHECK]], label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]			; CHECK-NEXT: br i1 [[MIN_ITERS_CHECK]], label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
	; CHECK: vector.ph:			; CHECK: vector.ph:
	; CHECK-NEXT: [[N_MOD_VF:%.*]] = urem i64 [[TMP2]], 8			; CHECK-NEXT: [[N_MOD_VF:%.*]] = urem i64 [[TMP2]], 4
				david-arm (inline comment, not done): This looks a bit unexpected. Do you know why the minimum iteration check has changed here from 8 to 4? It sounds like we've made a different choice of VF and/or interleave count (IC). I can only assume that the cost of some instructions has changed?
				reames (author, inline reply, done): Exactly. If I remember correctly, this is because uniform instructions are never considered possible to scalarize, and I didn't add the same check for CM_Uniform. I decided it didn't really matter, since this test wasn't about codegen (judging from its name) and was instead a no-crash test.
	; CHECK-NEXT: [[N_VEC:%.*]] = sub i64 [[TMP2]], [[N_MOD_VF]]			; CHECK-NEXT: [[N_VEC:%.*]] = sub i64 [[TMP2]], [[N_MOD_VF]]
	; CHECK-NEXT: [[TMP3:%.*]] = mul i64 [[N_VEC]], 4			; CHECK-NEXT: [[TMP3:%.*]] = mul i64 [[N_VEC]], 4
	; CHECK-NEXT: [[IND_END:%.*]] = getelementptr i8, i8* null, i64 [[TMP3]]			; CHECK-NEXT: [[IND_END:%.*]] = getelementptr i8, i8* null, i64 [[TMP3]]
	; CHECK-NEXT: [[BROADCAST_SPLATINSERT:%.*]] = insertelement <4 x i8> poison, i8 [[X:%.*]], i32 0			; CHECK-NEXT: [[BROADCAST_SPLATINSERT:%.*]] = insertelement <4 x i8> poison, i8 [[X:%.*]], i32 0
	; CHECK-NEXT: [[BROADCAST_SPLAT:%.*]] = shufflevector <4 x i8> [[BROADCAST_SPLATINSERT]], <4 x i8> poison, <4 x i32> zeroinitializer			; CHECK-NEXT: [[BROADCAST_SPLAT:%.*]] = shufflevector <4 x i8> [[BROADCAST_SPLATINSERT]], <4 x i8> poison, <4 x i32> zeroinitializer
	; CHECK-NEXT: [[BROADCAST_SPLATINSERT1:%.*]] = insertelement <4 x i8> poison, i8 [[X]], i32 0
	; CHECK-NEXT: [[BROADCAST_SPLAT2:%.*]] = shufflevector <4 x i8> [[BROADCAST_SPLATINSERT1]], <4 x i8> poison, <4 x i32> zeroinitializer
	; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]			; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]
	; CHECK: vector.body:			; CHECK: vector.body:
	; CHECK-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]			; CHECK-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
	; CHECK-NEXT: [[TMP4:%.*]] = zext <4 x i8> [[BROADCAST_SPLAT]] to <4 x i32>			; CHECK-NEXT: [[TMP4:%.*]] = zext <4 x i8> [[BROADCAST_SPLAT]] to <4 x i32>
	; CHECK-NEXT: [[TMP5:%.*]] = zext <4 x i8> [[BROADCAST_SPLAT2]] to <4 x i32>			; CHECK-NEXT: [[TMP5:%.*]] = shl nuw <4 x i32> [[TMP4]], <i32 24, i32 24, i32 24, i32 24>
	; CHECK-NEXT: [[TMP6:%.*]] = shl nuw <4 x i32> [[TMP4]], <i32 24, i32 24, i32 24, i32 24>			; CHECK-NEXT: [[TMP6:%.*]] = load i8, i8* [[P:%.*]], align 1, !tbaa [[TBAA1:![0-9]+]]
	; CHECK-NEXT: [[TMP7:%.*]] = shl nuw <4 x i32> [[TMP5]], <i32 24, i32 24, i32 24, i32 24>			; CHECK-NEXT: [[BROADCAST_SPLATINSERT1:%.*]] = insertelement <4 x i8> poison, i8 [[TMP6]], i32 0
	; CHECK-NEXT: [[TMP8:%.*]] = load i8, i8* [[P:%.*]], align 1, !tbaa [[TBAA1:![0-9]+]]			; CHECK-NEXT: [[BROADCAST_SPLAT2:%.*]] = shufflevector <4 x i8> [[BROADCAST_SPLATINSERT1]], <4 x i8> poison, <4 x i32> zeroinitializer
	; CHECK-NEXT: [[BROADCAST_SPLATINSERT3:%.*]] = insertelement <4 x i8> poison, i8 [[TMP8]], i32 0			; CHECK-NEXT: [[TMP7:%.*]] = zext <4 x i8> [[BROADCAST_SPLAT2]] to <4 x i32>
				; CHECK-NEXT: [[TMP8:%.*]] = shl nuw nsw <4 x i32> [[TMP7]], <i32 16, i32 16, i32 16, i32 16>
				; CHECK-NEXT: [[TMP9:%.*]] = or <4 x i32> [[TMP8]], [[TMP5]]
				; CHECK-NEXT: [[TMP10:%.*]] = load i8, i8* undef, align 1, !tbaa [[TBAA1]]
				; CHECK-NEXT: [[BROADCAST_SPLATINSERT3:%.*]] = insertelement <4 x i8> poison, i8 [[TMP10]], i32 0
	; CHECK-NEXT: [[BROADCAST_SPLAT4:%.*]] = shufflevector <4 x i8> [[BROADCAST_SPLATINSERT3]], <4 x i8> poison, <4 x i32> zeroinitializer			; CHECK-NEXT: [[BROADCAST_SPLAT4:%.*]] = shufflevector <4 x i8> [[BROADCAST_SPLATINSERT3]], <4 x i8> poison, <4 x i32> zeroinitializer
	; CHECK-NEXT: [[TMP9:%.*]] = load i8, i8* [[P]], align 1, !tbaa [[TBAA1]]			; CHECK-NEXT: [[TMP11:%.*]] = or <4 x i32> [[TMP9]], zeroinitializer
	; CHECK-NEXT: [[BROADCAST_SPLATINSERT5:%.*]] = insertelement <4 x i8> poison, i8 [[TMP9]], i32 0			; CHECK-NEXT: [[TMP12:%.*]] = zext <4 x i8> [[BROADCAST_SPLAT4]] to <4 x i32>
	; CHECK-NEXT: [[BROADCAST_SPLAT6:%.*]] = shufflevector <4 x i8> [[BROADCAST_SPLATINSERT5]], <4 x i8> poison, <4 x i32> zeroinitializer			; CHECK-NEXT: [[TMP13:%.*]] = or <4 x i32> [[TMP11]], [[TMP12]]
	; CHECK-NEXT: [[TMP10:%.*]] = zext <4 x i8> [[BROADCAST_SPLAT4]] to <4 x i32>			; CHECK-NEXT: [[TMP14:%.*]] = extractelement <4 x i32> [[TMP13]], i32 0
	; CHECK-NEXT: [[TMP11:%.*]] = zext <4 x i8> [[BROADCAST_SPLAT6]] to <4 x i32>			; CHECK-NEXT: store i32 [[TMP14]], i32* undef, align 4, !tbaa [[TBAA4:![0-9]+]]
	; CHECK-NEXT: [[TMP12:%.*]] = shl nuw nsw <4 x i32> [[TMP10]], <i32 16, i32 16, i32 16, i32 16>			; CHECK-NEXT: [[TMP15:%.*]] = extractelement <4 x i32> [[TMP13]], i32 1
	; CHECK-NEXT: [[TMP13:%.*]] = shl nuw nsw <4 x i32> [[TMP11]], <i32 16, i32 16, i32 16, i32 16>			; CHECK-NEXT: store i32 [[TMP15]], i32* undef, align 4, !tbaa [[TBAA4]]
	; CHECK-NEXT: [[TMP14:%.*]] = or <4 x i32> [[TMP12]], [[TMP6]]			; CHECK-NEXT: [[TMP16:%.*]] = extractelement <4 x i32> [[TMP13]], i32 2
	; CHECK-NEXT: [[TMP15:%.*]] = or <4 x i32> [[TMP13]], [[TMP7]]			; CHECK-NEXT: store i32 [[TMP16]], i32* undef, align 4, !tbaa [[TBAA4]]
	; CHECK-NEXT: [[TMP16:%.*]] = load i8, i8* undef, align 1, !tbaa [[TBAA1]]			; CHECK-NEXT: [[TMP17:%.*]] = extractelement <4 x i32> [[TMP13]], i32 3
	; CHECK-NEXT: [[BROADCAST_SPLATINSERT7:%.*]] = insertelement <4 x i8> poison, i8 [[TMP16]], i32 0			; CHECK-NEXT: store i32 [[TMP17]], i32* undef, align 4, !tbaa [[TBAA4]]
	; CHECK-NEXT: [[BROADCAST_SPLAT8:%.*]] = shufflevector <4 x i8> [[BROADCAST_SPLATINSERT7]], <4 x i8> poison, <4 x i32> zeroinitializer			; CHECK-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 4
	; CHECK-NEXT: [[TMP17:%.*]] = load i8, i8* undef, align 1, !tbaa [[TBAA1]]			; CHECK-NEXT: [[TMP18:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]
	; CHECK-NEXT: [[BROADCAST_SPLATINSERT9:%.*]] = insertelement <4 x i8> poison, i8 [[TMP17]], i32 0			; CHECK-NEXT: br i1 [[TMP18]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP6:![0-9]+]]
	; CHECK-NEXT: [[BROADCAST_SPLAT10:%.*]] = shufflevector <4 x i8> [[BROADCAST_SPLATINSERT9]], <4 x i8> poison, <4 x i32> zeroinitializer
	; CHECK-NEXT: [[TMP18:%.*]] = or <4 x i32> [[TMP14]], zeroinitializer
	; CHECK-NEXT: [[TMP19:%.*]] = or <4 x i32> [[TMP15]], zeroinitializer
	; CHECK-NEXT: [[TMP20:%.*]] = zext <4 x i8> [[BROADCAST_SPLAT8]] to <4 x i32>
	; CHECK-NEXT: [[TMP21:%.*]] = zext <4 x i8> [[BROADCAST_SPLAT10]] to <4 x i32>
	; CHECK-NEXT: [[TMP22:%.*]] = or <4 x i32> [[TMP18]], [[TMP20]]
	; CHECK-NEXT: [[TMP23:%.*]] = or <4 x i32> [[TMP19]], [[TMP21]]
	; CHECK-NEXT: [[TMP24:%.*]] = extractelement <4 x i32> [[TMP22]], i32 0
	; CHECK-NEXT: store i32 [[TMP24]], i32* undef, align 4, !tbaa [[TBAA4:![0-9]+]]
	; CHECK-NEXT: [[TMP25:%.*]] = extractelement <4 x i32> [[TMP22]], i32 1
	; CHECK-NEXT: store i32 [[TMP25]], i32* undef, align 4, !tbaa [[TBAA4]]
	; CHECK-NEXT: [[TMP26:%.*]] = extractelement <4 x i32> [[TMP22]], i32 2
	; CHECK-NEXT: store i32 [[TMP26]], i32* undef, align 4, !tbaa [[TBAA4]]
	; CHECK-NEXT: [[TMP27:%.*]] = extractelement <4 x i32> [[TMP22]], i32 3
	; CHECK-NEXT: store i32 [[TMP27]], i32* undef, align 4, !tbaa [[TBAA4]]
	; CHECK-NEXT: [[TMP28:%.*]] = extractelement <4 x i32> [[TMP23]], i32 0
	; CHECK-NEXT: store i32 [[TMP28]], i32* undef, align 4, !tbaa [[TBAA4]]
	; CHECK-NEXT: [[TMP29:%.*]] = extractelement <4 x i32> [[TMP23]], i32 1
	; CHECK-NEXT: store i32 [[TMP29]], i32* undef, align 4, !tbaa [[TBAA4]]
	; CHECK-NEXT: [[TMP30:%.*]] = extractelement <4 x i32> [[TMP23]], i32 2
	; CHECK-NEXT: store i32 [[TMP30]], i32* undef, align 4, !tbaa [[TBAA4]]
	; CHECK-NEXT: [[TMP31:%.*]] = extractelement <4 x i32> [[TMP23]], i32 3
	; CHECK-NEXT: store i32 [[TMP31]], i32* undef, align 4, !tbaa [[TBAA4]]
	; CHECK-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 8
	; CHECK-NEXT: [[TMP32:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]
	; CHECK-NEXT: br i1 [[TMP32]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP6:![0-9]+]]
	; CHECK: middle.block:			; CHECK: middle.block:
	; CHECK-NEXT: [[CMP_N:%.*]] = icmp eq i64 [[TMP2]], [[N_VEC]]			; CHECK-NEXT: [[CMP_N:%.*]] = icmp eq i64 [[TMP2]], [[N_VEC]]
	; CHECK-NEXT: br i1 [[CMP_N]], label [[SW_EPILOG:%.*]], label [[SCALAR_PH]]			; CHECK-NEXT: br i1 [[CMP_N]], label [[SW_EPILOG:%.*]], label [[SCALAR_PH]]
	; CHECK: scalar.ph:			; CHECK: scalar.ph:
	; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i8* [ [[IND_END]], [[MIDDLE_BLOCK]] ], [ null, [[IF_THEN]] ]			; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i8* [ [[IND_END]], [[MIDDLE_BLOCK]] ], [ null, [[IF_THEN]] ]
	; CHECK-NEXT: br label [[FOR_BODY68:%.*]]			; CHECK-NEXT: br label [[FOR_BODY68:%.*]]
	; CHECK: for.body68:			; CHECK: for.body68:
	; CHECK-NEXT: [[P_359:%.*]] = phi i8* [ [[ADD_PTR86:%.*]], [[FOR_BODY68]] ], [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ]			; CHECK-NEXT: [[P_359:%.*]] = phi i8* [ [[ADD_PTR86:%.*]], [[FOR_BODY68]] ], [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ]
	; CHECK-NEXT: [[CONV70:%.*]] = zext i8 [[X]] to i32			; CHECK-NEXT: [[CONV70:%.*]] = zext i8 [[X]] to i32
	; CHECK-NEXT: [[SHL71:%.*]] = shl nuw i32 [[CONV70]], 24			; CHECK-NEXT: [[SHL71:%.*]] = shl nuw i32 [[CONV70]], 24
	; CHECK-NEXT: [[TMP33:%.*]] = load i8, i8* [[P]], align 1, !tbaa [[TBAA1]]			; CHECK-NEXT: [[TMP19:%.*]] = load i8, i8* [[P]], align 1, !tbaa [[TBAA1]]
	; CHECK-NEXT: [[CONV73:%.*]] = zext i8 [[TMP33]] to i32			; CHECK-NEXT: [[CONV73:%.*]] = zext i8 [[TMP19]] to i32
	; CHECK-NEXT: [[SHL74:%.*]] = shl nuw nsw i32 [[CONV73]], 16			; CHECK-NEXT: [[SHL74:%.*]] = shl nuw nsw i32 [[CONV73]], 16
	; CHECK-NEXT: [[OR75:%.*]] = or i32 [[SHL74]], [[SHL71]]			; CHECK-NEXT: [[OR75:%.*]] = or i32 [[SHL74]], [[SHL71]]
	; CHECK-NEXT: [[TMP34:%.*]] = load i8, i8* undef, align 1, !tbaa [[TBAA1]]			; CHECK-NEXT: [[TMP20:%.*]] = load i8, i8* undef, align 1, !tbaa [[TBAA1]]
	; CHECK-NEXT: [[SHL78:%.*]] = shl nuw nsw i32 undef, 8			; CHECK-NEXT: [[SHL78:%.*]] = shl nuw nsw i32 undef, 8
	; CHECK-NEXT: [[OR79:%.*]] = or i32 [[OR75]], [[SHL78]]			; CHECK-NEXT: [[OR79:%.*]] = or i32 [[OR75]], [[SHL78]]
	; CHECK-NEXT: [[CONV81:%.*]] = zext i8 [[TMP34]] to i32			; CHECK-NEXT: [[CONV81:%.*]] = zext i8 [[TMP20]] to i32
	; CHECK-NEXT: [[OR83:%.*]] = or i32 [[OR79]], [[CONV81]]			; CHECK-NEXT: [[OR83:%.*]] = or i32 [[OR79]], [[CONV81]]
	; CHECK-NEXT: store i32 [[OR83]], i32* undef, align 4, !tbaa [[TBAA4]]			; CHECK-NEXT: store i32 [[OR83]], i32* undef, align 4, !tbaa [[TBAA4]]
	; CHECK-NEXT: [[ADD_PTR86]] = getelementptr inbounds i8, i8* [[P_359]], i64 4			; CHECK-NEXT: [[ADD_PTR86]] = getelementptr inbounds i8, i8* [[P_359]], i64 4
	; CHECK-NEXT: [[CMP66:%.*]] = icmp ult i8* [[ADD_PTR86]], undef			; CHECK-NEXT: [[CMP66:%.*]] = icmp ult i8* [[ADD_PTR86]], undef
	; CHECK-NEXT: br i1 [[CMP66]], label [[FOR_BODY68]], label [[SW_EPILOG]], !llvm.loop [[LOOP8:![0-9]+]]			; CHECK-NEXT: br i1 [[CMP66]], label [[FOR_BODY68]], label [[SW_EPILOG]], !llvm.loop [[LOOP8:![0-9]+]]
	; CHECK: sw.epilog:			; CHECK: sw.epilog:
	; CHECK-NEXT: unreachable			; CHECK-NEXT: unreachable
	; CHECK: Exit:			; CHECK: Exit:

llvm/test/Transforms/LoopVectorize/X86/uniform_mem_op.ll

	; CHECK-LABEL: @uniform_load(			; CHECK-LABEL: @uniform_load(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: br i1 false, label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]			; CHECK-NEXT: br i1 false, label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
	; CHECK: vector.ph:			; CHECK: vector.ph:
	; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]			; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]
	; CHECK: vector.body:			; CHECK: vector.body:
	; CHECK-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]			; CHECK-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
	; CHECK-NEXT: [[TMP0:%.*]] = load i32, i32* [[ADDR:%.*]], align 4			; CHECK-NEXT: [[TMP0:%.*]] = load i32, i32* [[ADDR:%.*]], align 4
	; CHECK-NEXT: [[TMP1:%.*]] = load i32, i32* [[ADDR]], align 4
	; CHECK-NEXT: [[TMP2:%.*]] = load i32, i32* [[ADDR]], align 4
	; CHECK-NEXT: [[TMP3:%.*]] = load i32, i32* [[ADDR]], align 4
	; CHECK-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 16			; CHECK-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 16
	; CHECK-NEXT: [[TMP4:%.*]] = icmp eq i64 [[INDEX_NEXT]], 4096			; CHECK-NEXT: [[TMP1:%.*]] = icmp eq i64 [[INDEX_NEXT]], 4096
	; CHECK-NEXT: br i1 [[TMP4]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP0:![0-9]+]]			; CHECK-NEXT: br i1 [[TMP1]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP0:![0-9]+]]
	; CHECK: middle.block:			; CHECK: middle.block:
	; CHECK-NEXT: [[CMP_N:%.*]] = icmp eq i64 4097, 4096			; CHECK-NEXT: [[CMP_N:%.*]] = icmp eq i64 4097, 4096
	; CHECK-NEXT: br i1 [[CMP_N]], label [[LOOPEXIT:%.*]], label [[SCALAR_PH]]			; CHECK-NEXT: br i1 [[CMP_N]], label [[LOOPEXIT:%.*]], label [[SCALAR_PH]]
	; CHECK: scalar.ph:			; CHECK: scalar.ph:
	; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ 4096, [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]			; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ 4096, [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]
	; CHECK-NEXT: br label [[FOR_BODY:%.*]]			; CHECK-NEXT: br label [[FOR_BODY:%.*]]
	; CHECK: for.body:			; CHECK: for.body:
	; CHECK-NEXT: [[IV:%.*]] = phi i64 [ [[IV_NEXT:%.*]], [[FOR_BODY]] ], [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ]			; CHECK-NEXT: [[IV:%.*]] = phi i64 [ [[IV_NEXT:%.*]], [[FOR_BODY]] ], [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ]
	; CHECK-NEXT: [[LOAD:%.*]] = load i32, i32* [[ADDR]], align 4			; CHECK-NEXT: [[LOAD:%.*]] = load i32, i32* [[ADDR]], align 4
	; CHECK-NEXT: [[IV_NEXT]] = add nuw nsw i64 [[IV]], 1			; CHECK-NEXT: [[IV_NEXT]] = add nuw nsw i64 [[IV]], 1
	; CHECK-NEXT: [[EXITCOND:%.*]] = icmp eq i64 [[IV]], 4096			; CHECK-NEXT: [[EXITCOND:%.*]] = icmp eq i64 [[IV]], 4096
	; CHECK-NEXT: br i1 [[EXITCOND]], label [[LOOPEXIT]], label [[FOR_BODY]], !llvm.loop [[LOOP2:![0-9]+]]			; CHECK-NEXT: br i1 [[EXITCOND]], label [[LOOPEXIT]], label [[FOR_BODY]], !llvm.loop [[LOOP2:![0-9]+]]
	; CHECK: loopexit:			; CHECK: loopexit:
	; CHECK-NEXT: [[LOAD_LCSSA:%.*]] = phi i32 [ [[LOAD]], [[FOR_BODY]] ], [ [[TMP3]], [[MIDDLE_BLOCK]] ]			; CHECK-NEXT: [[LOAD_LCSSA:%.*]] = phi i32 [ [[LOAD]], [[FOR_BODY]] ], [ [[TMP0]], [[MIDDLE_BLOCK]] ]
	; CHECK-NEXT: ret i32 [[LOAD_LCSSA]]			; CHECK-NEXT: ret i32 [[LOAD_LCSSA]]
	;			;
	entry:			entry:
	br label %for.body			br label %for.body

	for.body:			for.body:
	%iv = phi i64 [ %iv.next, %for.body ], [ 0, %entry ]			%iv = phi i64 [ %iv.next, %for.body ], [ 0, %entry ]
	%load = load i32, i32* %addr			%load = load i32, i32* %addr
	%iv.next = add nuw nsw i64 %iv, 1			%iv.next = add nuw nsw i64 %iv, 1
	%exitcond = icmp eq i64 %iv, 4096			%exitcond = icmp eq i64 %iv, 4096
	br i1 %exitcond, label %loopexit, label %for.body			br i1 %exitcond, label %loopexit, label %for.body

	loopexit:			loopexit:
	ret i32 %load			ret i32 %load
	}			}

	define i32 @uniform_load2(i32* align(4) %addr) {			define i32 @uniform_load2(i32* align(4) %addr) {
	; CHECK-LABEL: @uniform_load2(			; CHECK-LABEL: @uniform_load2(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: br i1 false, label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]			; CHECK-NEXT: br i1 false, label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
	; CHECK: vector.ph:			; CHECK: vector.ph:
	; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]			; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]
	; CHECK: vector.body:			; CHECK: vector.body:
	; CHECK-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]			; CHECK-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
	; CHECK-NEXT: [[VEC_PHI:%.*]] = phi <4 x i32> [ zeroinitializer, [[VECTOR_PH]] ], [ [[TMP4:%.*]], [[VECTOR_BODY]] ]			; CHECK-NEXT: [[VEC_PHI:%.*]] = phi <4 x i32> [ zeroinitializer, [[VECTOR_PH]] ], [ [[TMP1:%.*]], [[VECTOR_BODY]] ]
	; CHECK-NEXT: [[VEC_PHI1:%.*]] = phi <4 x i32> [ zeroinitializer, [[VECTOR_PH]] ], [ [[TMP5:%.*]], [[VECTOR_BODY]] ]			; CHECK-NEXT: [[VEC_PHI1:%.*]] = phi <4 x i32> [ zeroinitializer, [[VECTOR_PH]] ], [ [[TMP2:%.*]], [[VECTOR_BODY]] ]
	; CHECK-NEXT: [[VEC_PHI2:%.*]] = phi <4 x i32> [ zeroinitializer, [[VECTOR_PH]] ], [ [[TMP6:%.*]], [[VECTOR_BODY]] ]			; CHECK-NEXT: [[VEC_PHI2:%.*]] = phi <4 x i32> [ zeroinitializer, [[VECTOR_PH]] ], [ [[TMP3:%.*]], [[VECTOR_BODY]] ]
	; CHECK-NEXT: [[VEC_PHI3:%.*]] = phi <4 x i32> [ zeroinitializer, [[VECTOR_PH]] ], [ [[TMP7:%.*]], [[VECTOR_BODY]] ]			; CHECK-NEXT: [[VEC_PHI3:%.*]] = phi <4 x i32> [ zeroinitializer, [[VECTOR_PH]] ], [ [[TMP4:%.*]], [[VECTOR_BODY]] ]
	; CHECK-NEXT: [[TMP0:%.*]] = load i32, i32* [[ADDR:%.*]], align 4			; CHECK-NEXT: [[TMP0:%.*]] = load i32, i32* [[ADDR:%.*]], align 4
	; CHECK-NEXT: [[BROADCAST_SPLATINSERT:%.*]] = insertelement <4 x i32> poison, i32 [[TMP0]], i32 0			; CHECK-NEXT: [[BROADCAST_SPLATINSERT8:%.*]] = insertelement <4 x i32> poison, i32 [[TMP0]], i32 0
	; CHECK-NEXT: [[BROADCAST_SPLAT:%.*]] = shufflevector <4 x i32> [[BROADCAST_SPLATINSERT]], <4 x i32> poison, <4 x i32> zeroinitializer
	; CHECK-NEXT: [[TMP1:%.*]] = load i32, i32* [[ADDR]], align 4
	; CHECK-NEXT: [[BROADCAST_SPLATINSERT4:%.*]] = insertelement <4 x i32> poison, i32 [[TMP1]], i32 0
	; CHECK-NEXT: [[BROADCAST_SPLAT5:%.*]] = shufflevector <4 x i32> [[BROADCAST_SPLATINSERT4]], <4 x i32> poison, <4 x i32> zeroinitializer
	; CHECK-NEXT: [[TMP2:%.*]] = load i32, i32* [[ADDR]], align 4
	; CHECK-NEXT: [[BROADCAST_SPLATINSERT6:%.*]] = insertelement <4 x i32> poison, i32 [[TMP2]], i32 0
	; CHECK-NEXT: [[BROADCAST_SPLAT7:%.*]] = shufflevector <4 x i32> [[BROADCAST_SPLATINSERT6]], <4 x i32> poison, <4 x i32> zeroinitializer
	; CHECK-NEXT: [[TMP3:%.*]] = load i32, i32* [[ADDR]], align 4
	; CHECK-NEXT: [[BROADCAST_SPLATINSERT8:%.*]] = insertelement <4 x i32> poison, i32 [[TMP3]], i32 0
	; CHECK-NEXT: [[BROADCAST_SPLAT9:%.*]] = shufflevector <4 x i32> [[BROADCAST_SPLATINSERT8]], <4 x i32> poison, <4 x i32> zeroinitializer			; CHECK-NEXT: [[BROADCAST_SPLAT9:%.*]] = shufflevector <4 x i32> [[BROADCAST_SPLATINSERT8]], <4 x i32> poison, <4 x i32> zeroinitializer
	; CHECK-NEXT: [[TMP4]] = add <4 x i32> [[VEC_PHI]], [[BROADCAST_SPLAT]]			; CHECK-NEXT: [[TMP1]] = add <4 x i32> [[VEC_PHI]], [[BROADCAST_SPLAT9]]
	; CHECK-NEXT: [[TMP5]] = add <4 x i32> [[VEC_PHI1]], [[BROADCAST_SPLAT5]]			; CHECK-NEXT: [[TMP2]] = add <4 x i32> [[VEC_PHI1]], [[BROADCAST_SPLAT9]]
	; CHECK-NEXT: [[TMP6]] = add <4 x i32> [[VEC_PHI2]], [[BROADCAST_SPLAT7]]			; CHECK-NEXT: [[TMP3]] = add <4 x i32> [[VEC_PHI2]], [[BROADCAST_SPLAT9]]
	; CHECK-NEXT: [[TMP7]] = add <4 x i32> [[VEC_PHI3]], [[BROADCAST_SPLAT9]]			; CHECK-NEXT: [[TMP4]] = add <4 x i32> [[VEC_PHI3]], [[BROADCAST_SPLAT9]]
	; CHECK-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 16			; CHECK-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 16
	; CHECK-NEXT: [[TMP8:%.*]] = icmp eq i64 [[INDEX_NEXT]], 4096			; CHECK-NEXT: [[TMP5:%.*]] = icmp eq i64 [[INDEX_NEXT]], 4096
	; CHECK-NEXT: br i1 [[TMP8]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP4:![0-9]+]]			; CHECK-NEXT: br i1 [[TMP5]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP4:![0-9]+]]
	; CHECK: middle.block:			; CHECK: middle.block:
	; CHECK-NEXT: [[BIN_RDX:%.*]] = add <4 x i32> [[TMP5]], [[TMP4]]			; CHECK-NEXT: [[BIN_RDX:%.*]] = add <4 x i32> [[TMP2]], [[TMP1]]
	; CHECK-NEXT: [[BIN_RDX10:%.*]] = add <4 x i32> [[TMP6]], [[BIN_RDX]]			; CHECK-NEXT: [[BIN_RDX10:%.*]] = add <4 x i32> [[TMP3]], [[BIN_RDX]]
	; CHECK-NEXT: [[BIN_RDX11:%.*]] = add <4 x i32> [[TMP7]], [[BIN_RDX10]]			; CHECK-NEXT: [[BIN_RDX11:%.*]] = add <4 x i32> [[TMP4]], [[BIN_RDX10]]
	; CHECK-NEXT: [[TMP9:%.*]] = call i32 @llvm.vector.reduce.add.v4i32(<4 x i32> [[BIN_RDX11]])			; CHECK-NEXT: [[TMP6:%.*]] = call i32 @llvm.vector.reduce.add.v4i32(<4 x i32> [[BIN_RDX11]])
	; CHECK-NEXT: [[CMP_N:%.*]] = icmp eq i64 4097, 4096			; CHECK-NEXT: [[CMP_N:%.*]] = icmp eq i64 4097, 4096
	; CHECK-NEXT: br i1 [[CMP_N]], label [[LOOPEXIT:%.*]], label [[SCALAR_PH]]			; CHECK-NEXT: br i1 [[CMP_N]], label [[LOOPEXIT:%.*]], label [[SCALAR_PH]]
	; CHECK: scalar.ph:			; CHECK: scalar.ph:
	; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ 4096, [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]			; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ 4096, [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]
	; CHECK-NEXT: [[BC_MERGE_RDX:%.*]] = phi i32 [ 0, [[ENTRY]] ], [ [[TMP9]], [[MIDDLE_BLOCK]] ]			; CHECK-NEXT: [[BC_MERGE_RDX:%.*]] = phi i32 [ 0, [[ENTRY]] ], [ [[TMP6]], [[MIDDLE_BLOCK]] ]
	; CHECK-NEXT: br label [[FOR_BODY:%.*]]			; CHECK-NEXT: br label [[FOR_BODY:%.*]]
	; CHECK: for.body:			; CHECK: for.body:
	; CHECK-NEXT: [[IV:%.*]] = phi i64 [ [[IV_NEXT:%.*]], [[FOR_BODY]] ], [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ]			; CHECK-NEXT: [[IV:%.*]] = phi i64 [ [[IV_NEXT:%.*]], [[FOR_BODY]] ], [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ]
	; CHECK-NEXT: [[ACCUM:%.*]] = phi i32 [ [[ACCUM_NEXT:%.*]], [[FOR_BODY]] ], [ [[BC_MERGE_RDX]], [[SCALAR_PH]] ]			; CHECK-NEXT: [[ACCUM:%.*]] = phi i32 [ [[ACCUM_NEXT:%.*]], [[FOR_BODY]] ], [ [[BC_MERGE_RDX]], [[SCALAR_PH]] ]
	; CHECK-NEXT: [[LOAD:%.*]] = load i32, i32* [[ADDR]], align 4			; CHECK-NEXT: [[LOAD:%.*]] = load i32, i32* [[ADDR]], align 4
	; CHECK-NEXT: [[ACCUM_NEXT]] = add i32 [[ACCUM]], [[LOAD]]			; CHECK-NEXT: [[ACCUM_NEXT]] = add i32 [[ACCUM]], [[LOAD]]
	; CHECK-NEXT: [[IV_NEXT]] = add nuw nsw i64 [[IV]], 1			; CHECK-NEXT: [[IV_NEXT]] = add nuw nsw i64 [[IV]], 1
	; CHECK-NEXT: [[EXITCOND:%.*]] = icmp eq i64 [[IV]], 4096			; CHECK-NEXT: [[EXITCOND:%.*]] = icmp eq i64 [[IV]], 4096
	; CHECK-NEXT: br i1 [[EXITCOND]], label [[LOOPEXIT]], label [[FOR_BODY]], !llvm.loop [[LOOP5:![0-9]+]]			; CHECK-NEXT: br i1 [[EXITCOND]], label [[LOOPEXIT]], label [[FOR_BODY]], !llvm.loop [[LOOP5:![0-9]+]]
	; CHECK: loopexit:			; CHECK: loopexit:
	; CHECK-NEXT: [[ACCUM_NEXT_LCSSA:%.*]] = phi i32 [ [[ACCUM_NEXT]], [[FOR_BODY]] ], [ [[TMP9]], [[MIDDLE_BLOCK]] ]			; CHECK-NEXT: [[ACCUM_NEXT_LCSSA:%.*]] = phi i32 [ [[ACCUM_NEXT]], [[FOR_BODY]] ], [ [[TMP6]], [[MIDDLE_BLOCK]] ]
	; CHECK-NEXT: ret i32 [[ACCUM_NEXT_LCSSA]]			; CHECK-NEXT: ret i32 [[ACCUM_NEXT_LCSSA]]
	;			;
	entry:			entry:
	br label %for.body			br label %for.body

	for.body:			for.body:
	%iv = phi i64 [ %iv.next, %for.body ], [ 0, %entry ]			%iv = phi i64 [ %iv.next, %for.body ], [ 0, %entry ]
	%accum = phi i32 [%accum.next, %for.body], [0, %entry]			%accum = phi i32 [%accum.next, %for.body], [0, %entry]
	; CHECK-NEXT: [[TMP1:%.*]] = udiv i32 [[BYTE_OFFSET]], 4			; CHECK-NEXT: [[TMP1:%.*]] = udiv i32 [[BYTE_OFFSET]], 4
	; CHECK-NEXT: [[TMP2:%.*]] = udiv i32 [[BYTE_OFFSET]], 4			; CHECK-NEXT: [[TMP2:%.*]] = udiv i32 [[BYTE_OFFSET]], 4
	; CHECK-NEXT: [[TMP3:%.*]] = udiv i32 [[BYTE_OFFSET]], 4			; CHECK-NEXT: [[TMP3:%.*]] = udiv i32 [[BYTE_OFFSET]], 4
	; CHECK-NEXT: [[TMP4:%.*]] = getelementptr i32, i32* [[ADDR:%.*]], i32 [[TMP0]]			; CHECK-NEXT: [[TMP4:%.*]] = getelementptr i32, i32* [[ADDR:%.*]], i32 [[TMP0]]
	; CHECK-NEXT: [[TMP5:%.*]] = getelementptr i32, i32* [[ADDR]], i32 [[TMP1]]			; CHECK-NEXT: [[TMP5:%.*]] = getelementptr i32, i32* [[ADDR]], i32 [[TMP1]]
	; CHECK-NEXT: [[TMP6:%.*]] = getelementptr i32, i32* [[ADDR]], i32 [[TMP2]]			; CHECK-NEXT: [[TMP6:%.*]] = getelementptr i32, i32* [[ADDR]], i32 [[TMP2]]
	; CHECK-NEXT: [[TMP7:%.*]] = getelementptr i32, i32* [[ADDR]], i32 [[TMP3]]			; CHECK-NEXT: [[TMP7:%.*]] = getelementptr i32, i32* [[ADDR]], i32 [[TMP3]]
	; CHECK-NEXT: [[TMP8:%.*]] = load i32, i32* [[TMP4]], align 4			; CHECK-NEXT: [[TMP8:%.*]] = load i32, i32* [[TMP4]], align 4
	; CHECK-NEXT: [[TMP9:%.*]] = load i32, i32* [[TMP5]], align 4
	; CHECK-NEXT: [[TMP10:%.*]] = load i32, i32* [[TMP6]], align 4
	; CHECK-NEXT: [[TMP11:%.*]] = load i32, i32* [[TMP7]], align 4
	; CHECK-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 16			; CHECK-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 16
	; CHECK-NEXT: [[TMP12:%.*]] = icmp eq i64 [[INDEX_NEXT]], 4096			; CHECK-NEXT: [[TMP9:%.*]] = icmp eq i64 [[INDEX_NEXT]], 4096
	; CHECK-NEXT: br i1 [[TMP12]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP6:![0-9]+]]			; CHECK-NEXT: br i1 [[TMP9]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP6:![0-9]+]]
	; CHECK: middle.block:			; CHECK: middle.block:
	; CHECK-NEXT: [[CMP_N:%.*]] = icmp eq i64 4097, 4096			; CHECK-NEXT: [[CMP_N:%.*]] = icmp eq i64 4097, 4096
	; CHECK-NEXT: br i1 [[CMP_N]], label [[LOOPEXIT:%.*]], label [[SCALAR_PH]]			; CHECK-NEXT: br i1 [[CMP_N]], label [[LOOPEXIT:%.*]], label [[SCALAR_PH]]
	; CHECK: scalar.ph:			; CHECK: scalar.ph:
	; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ 4096, [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]			; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ 4096, [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]
	; CHECK-NEXT: br label [[FOR_BODY:%.*]]			; CHECK-NEXT: br label [[FOR_BODY:%.*]]
	; CHECK: for.body:			; CHECK: for.body:
	; CHECK-NEXT: [[IV:%.*]] = phi i64 [ [[IV_NEXT:%.*]], [[FOR_BODY]] ], [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ]			; CHECK-NEXT: [[IV:%.*]] = phi i64 [ [[IV_NEXT:%.*]], [[FOR_BODY]] ], [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ]
	; CHECK-NEXT: [[OFFSET:%.*]] = udiv i32 [[BYTE_OFFSET]], 4			; CHECK-NEXT: [[OFFSET:%.*]] = udiv i32 [[BYTE_OFFSET]], 4
	; CHECK-NEXT: [[GEP:%.*]] = getelementptr i32, i32* [[ADDR]], i32 [[OFFSET]]			; CHECK-NEXT: [[GEP:%.*]] = getelementptr i32, i32* [[ADDR]], i32 [[OFFSET]]
	; CHECK-NEXT: [[LOAD:%.*]] = load i32, i32* [[GEP]], align 4			; CHECK-NEXT: [[LOAD:%.*]] = load i32, i32* [[GEP]], align 4
	; CHECK-NEXT: [[IV_NEXT]] = add nuw nsw i64 [[IV]], 1			; CHECK-NEXT: [[IV_NEXT]] = add nuw nsw i64 [[IV]], 1
	; CHECK-NEXT: [[EXITCOND:%.*]] = icmp eq i64 [[IV]], 4096			; CHECK-NEXT: [[EXITCOND:%.*]] = icmp eq i64 [[IV]], 4096
	; CHECK-NEXT: br i1 [[EXITCOND]], label [[LOOPEXIT]], label [[FOR_BODY]], !llvm.loop [[LOOP7:![0-9]+]]			; CHECK-NEXT: br i1 [[EXITCOND]], label [[LOOPEXIT]], label [[FOR_BODY]], !llvm.loop [[LOOP7:![0-9]+]]
	; CHECK: loopexit:			; CHECK: loopexit:
	; CHECK-NEXT: [[LOAD_LCSSA:%.*]] = phi i32 [ [[LOAD]], [[FOR_BODY]] ], [ [[TMP11]], [[MIDDLE_BLOCK]] ]			; CHECK-NEXT: [[LOAD_LCSSA:%.*]] = phi i32 [ [[LOAD]], [[FOR_BODY]] ], [ [[TMP8]], [[MIDDLE_BLOCK]] ]
	; CHECK-NEXT: ret i32 [[LOAD_LCSSA]]			; CHECK-NEXT: ret i32 [[LOAD_LCSSA]]
	;			;
	entry:			entry:
	br label %for.body			br label %for.body

	for.body:			for.body:
	%iv = phi i64 [ %iv.next, %for.body ], [ 0, %entry ]			%iv = phi i64 [ %iv.next, %for.body ], [ 0, %entry ]
	%offset = udiv i32 %byte_offset, 4			%offset = udiv i32 %byte_offset, 4
	; CHECK-NEXT: [[BOUND1:%.*]] = icmp ult i8* [[A3]], [[SCEVGEP2]]			; CHECK-NEXT: [[BOUND1:%.*]] = icmp ult i8* [[A3]], [[SCEVGEP2]]
	; CHECK-NEXT: [[FOUND_CONFLICT:%.*]] = and i1 [[BOUND0]], [[BOUND1]]			; CHECK-NEXT: [[FOUND_CONFLICT:%.*]] = and i1 [[BOUND0]], [[BOUND1]]
	; CHECK-NEXT: br i1 [[FOUND_CONFLICT]], label [[SCALAR_PH]], label [[VECTOR_PH:%.*]]			; CHECK-NEXT: br i1 [[FOUND_CONFLICT]], label [[SCALAR_PH]], label [[VECTOR_PH:%.*]]
	; CHECK: vector.ph:			; CHECK: vector.ph:
	; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]			; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]
	; CHECK: vector.body:			; CHECK: vector.body:
	; CHECK-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]			; CHECK-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
	; CHECK-NEXT: [[TMP0:%.*]] = load i32, i32* [[A]], align 4, !alias.scope !12			; CHECK-NEXT: [[TMP0:%.*]] = load i32, i32* [[A]], align 4, !alias.scope !12
	; CHECK-NEXT: [[TMP1:%.*]] = load i32, i32* [[A]], align 4, !alias.scope !12
	; CHECK-NEXT: [[TMP2:%.*]] = load i32, i32* [[A]], align 4, !alias.scope !12
	; CHECK-NEXT: [[TMP3:%.*]] = load i32, i32* [[A]], align 4, !alias.scope !12
	; CHECK-NEXT: store i32 [[TMP0]], i32* [[B]], align 4, !alias.scope !15, !noalias !12			; CHECK-NEXT: store i32 [[TMP0]], i32* [[B]], align 4, !alias.scope !15, !noalias !12
	; CHECK-NEXT: store i32 [[TMP0]], i32* [[B]], align 4, !alias.scope !15, !noalias !12			; CHECK-NEXT: store i32 [[TMP0]], i32* [[B]], align 4, !alias.scope !15, !noalias !12
	; CHECK-NEXT: store i32 [[TMP0]], i32* [[B]], align 4, !alias.scope !15, !noalias !12			; CHECK-NEXT: store i32 [[TMP0]], i32* [[B]], align 4, !alias.scope !15, !noalias !12
	; CHECK-NEXT: store i32 [[TMP0]], i32* [[B]], align 4, !alias.scope !15, !noalias !12			; CHECK-NEXT: store i32 [[TMP0]], i32* [[B]], align 4, !alias.scope !15, !noalias !12
	; CHECK-NEXT: store i32 [[TMP1]], i32* [[B]], align 4, !alias.scope !15, !noalias !12			; CHECK-NEXT: store i32 [[TMP0]], i32* [[B]], align 4, !alias.scope !15, !noalias !12
	; CHECK-NEXT: store i32 [[TMP1]], i32* [[B]], align 4, !alias.scope !15, !noalias !12			; CHECK-NEXT: store i32 [[TMP0]], i32* [[B]], align 4, !alias.scope !15, !noalias !12
	; CHECK-NEXT: store i32 [[TMP1]], i32* [[B]], align 4, !alias.scope !15, !noalias !12			; CHECK-NEXT: store i32 [[TMP0]], i32* [[B]], align 4, !alias.scope !15, !noalias !12
	; CHECK-NEXT: store i32 [[TMP1]], i32* [[B]], align 4, !alias.scope !15, !noalias !12			; CHECK-NEXT: store i32 [[TMP0]], i32* [[B]], align 4, !alias.scope !15, !noalias !12
	; CHECK-NEXT: store i32 [[TMP2]], i32* [[B]], align 4, !alias.scope !15, !noalias !12			; CHECK-NEXT: store i32 [[TMP0]], i32* [[B]], align 4, !alias.scope !15, !noalias !12
	; CHECK-NEXT: store i32 [[TMP2]], i32* [[B]], align 4, !alias.scope !15, !noalias !12			; CHECK-NEXT: store i32 [[TMP0]], i32* [[B]], align 4, !alias.scope !15, !noalias !12
	; CHECK-NEXT: store i32 [[TMP2]], i32* [[B]], align 4, !alias.scope !15, !noalias !12			; CHECK-NEXT: store i32 [[TMP0]], i32* [[B]], align 4, !alias.scope !15, !noalias !12
	; CHECK-NEXT: store i32 [[TMP2]], i32* [[B]], align 4, !alias.scope !15, !noalias !12			; CHECK-NEXT: store i32 [[TMP0]], i32* [[B]], align 4, !alias.scope !15, !noalias !12
	; CHECK-NEXT: store i32 [[TMP3]], i32* [[B]], align 4, !alias.scope !15, !noalias !12			; CHECK-NEXT: store i32 [[TMP0]], i32* [[B]], align 4, !alias.scope !15, !noalias !12
	; CHECK-NEXT: store i32 [[TMP3]], i32* [[B]], align 4, !alias.scope !15, !noalias !12			; CHECK-NEXT: store i32 [[TMP0]], i32* [[B]], align 4, !alias.scope !15, !noalias !12
	; CHECK-NEXT: store i32 [[TMP3]], i32* [[B]], align 4, !alias.scope !15, !noalias !12			; CHECK-NEXT: store i32 [[TMP0]], i32* [[B]], align 4, !alias.scope !15, !noalias !12
	; CHECK-NEXT: store i32 [[TMP3]], i32* [[B]], align 4, !alias.scope !15, !noalias !12			; CHECK-NEXT: store i32 [[TMP0]], i32* [[B]], align 4, !alias.scope !15, !noalias !12
	; CHECK-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 16			; CHECK-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 16
	; CHECK-NEXT: [[TMP4:%.*]] = icmp eq i64 [[INDEX_NEXT]], 4096			; CHECK-NEXT: [[TMP1:%.*]] = icmp eq i64 [[INDEX_NEXT]], 4096
	; CHECK-NEXT: br i1 [[TMP4]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP17:![0-9]+]]			; CHECK-NEXT: br i1 [[TMP1]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP17:![0-9]+]]
	; CHECK: middle.block:			; CHECK: middle.block:
	; CHECK-NEXT: [[CMP_N:%.*]] = icmp eq i64 4097, 4096			; CHECK-NEXT: [[CMP_N:%.*]] = icmp eq i64 4097, 4096
	; CHECK-NEXT: br i1 [[CMP_N]], label [[LOOPEXIT:%.*]], label [[SCALAR_PH]]			; CHECK-NEXT: br i1 [[CMP_N]], label [[LOOPEXIT:%.*]], label [[SCALAR_PH]]
	; CHECK: scalar.ph:			; CHECK: scalar.ph:
	; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ 4096, [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ], [ 0, [[VECTOR_MEMCHECK]] ]			; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ 4096, [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ], [ 0, [[VECTOR_MEMCHECK]] ]
	; CHECK-NEXT: br label [[FOR_BODY:%.*]]			; CHECK-NEXT: br label [[FOR_BODY:%.*]]
	; CHECK: for.body:			; CHECK: for.body:
	; CHECK-NEXT: [[IV:%.*]] = phi i64 [ [[IV_NEXT:%.*]], [[FOR_BODY]] ], [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ]			; CHECK-NEXT: [[IV:%.*]] = phi i64 [ [[IV_NEXT:%.*]], [[FOR_BODY]] ], [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ]
	define i32 @uniform_load_global() {			define i32 @uniform_load_global() {
	; CHECK-LABEL: @uniform_load_global(			; CHECK-LABEL: @uniform_load_global(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: br i1 false, label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]			; CHECK-NEXT: br i1 false, label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
	; CHECK: vector.ph:			; CHECK: vector.ph:
	; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]			; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]
	; CHECK: vector.body:			; CHECK: vector.body:
	; CHECK-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]			; CHECK-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
	; CHECK-NEXT: [[VEC_PHI:%.*]] = phi <4 x i32> [ zeroinitializer, [[VECTOR_PH]] ], [ [[TMP4:%.*]], [[VECTOR_BODY]] ]			; CHECK-NEXT: [[VEC_PHI:%.*]] = phi <4 x i32> [ zeroinitializer, [[VECTOR_PH]] ], [ [[TMP1:%.*]], [[VECTOR_BODY]] ]
	; CHECK-NEXT: [[VEC_PHI1:%.*]] = phi <4 x i32> [ zeroinitializer, [[VECTOR_PH]] ], [ [[TMP5:%.*]], [[VECTOR_BODY]] ]			; CHECK-NEXT: [[VEC_PHI1:%.*]] = phi <4 x i32> [ zeroinitializer, [[VECTOR_PH]] ], [ [[TMP2:%.*]], [[VECTOR_BODY]] ]
	; CHECK-NEXT: [[VEC_PHI2:%.*]] = phi <4 x i32> [ zeroinitializer, [[VECTOR_PH]] ], [ [[TMP6:%.*]], [[VECTOR_BODY]] ]			; CHECK-NEXT: [[VEC_PHI2:%.*]] = phi <4 x i32> [ zeroinitializer, [[VECTOR_PH]] ], [ [[TMP3:%.*]], [[VECTOR_BODY]] ]
	; CHECK-NEXT: [[VEC_PHI3:%.*]] = phi <4 x i32> [ zeroinitializer, [[VECTOR_PH]] ], [ [[TMP7:%.*]], [[VECTOR_BODY]] ]			; CHECK-NEXT: [[VEC_PHI3:%.*]] = phi <4 x i32> [ zeroinitializer, [[VECTOR_PH]] ], [ [[TMP4:%.*]], [[VECTOR_BODY]] ]
	; CHECK-NEXT: [[TMP0:%.*]] = load i32, i32* @GAddr, align 4			; CHECK-NEXT: [[TMP0:%.*]] = load i32, i32* @GAddr, align 4
	; CHECK-NEXT: [[BROADCAST_SPLATINSERT:%.*]] = insertelement <4 x i32> poison, i32 [[TMP0]], i32 0			; CHECK-NEXT: [[BROADCAST_SPLATINSERT8:%.*]] = insertelement <4 x i32> poison, i32 [[TMP0]], i32 0
	; CHECK-NEXT: [[BROADCAST_SPLAT:%.*]] = shufflevector <4 x i32> [[BROADCAST_SPLATINSERT]], <4 x i32> poison, <4 x i32> zeroinitializer
	; CHECK-NEXT: [[TMP1:%.*]] = load i32, i32* @GAddr, align 4
	; CHECK-NEXT: [[BROADCAST_SPLATINSERT4:%.*]] = insertelement <4 x i32> poison, i32 [[TMP1]], i32 0
	; CHECK-NEXT: [[BROADCAST_SPLAT5:%.*]] = shufflevector <4 x i32> [[BROADCAST_SPLATINSERT4]], <4 x i32> poison, <4 x i32> zeroinitializer
	; CHECK-NEXT: [[TMP2:%.*]] = load i32, i32* @GAddr, align 4
	; CHECK-NEXT: [[BROADCAST_SPLATINSERT6:%.*]] = insertelement <4 x i32> poison, i32 [[TMP2]], i32 0
	; CHECK-NEXT: [[BROADCAST_SPLAT7:%.*]] = shufflevector <4 x i32> [[BROADCAST_SPLATINSERT6]], <4 x i32> poison, <4 x i32> zeroinitializer
	; CHECK-NEXT: [[TMP3:%.*]] = load i32, i32* @GAddr, align 4
	; CHECK-NEXT: [[BROADCAST_SPLATINSERT8:%.*]] = insertelement <4 x i32> poison, i32 [[TMP3]], i32 0
	; CHECK-NEXT: [[BROADCAST_SPLAT9:%.*]] = shufflevector <4 x i32> [[BROADCAST_SPLATINSERT8]], <4 x i32> poison, <4 x i32> zeroinitializer			; CHECK-NEXT: [[BROADCAST_SPLAT9:%.*]] = shufflevector <4 x i32> [[BROADCAST_SPLATINSERT8]], <4 x i32> poison, <4 x i32> zeroinitializer
	; CHECK-NEXT: [[TMP4]] = add <4 x i32> [[VEC_PHI]], [[BROADCAST_SPLAT]]			; CHECK-NEXT: [[TMP1]] = add <4 x i32> [[VEC_PHI]], [[BROADCAST_SPLAT9]]
	; CHECK-NEXT: [[TMP5]] = add <4 x i32> [[VEC_PHI1]], [[BROADCAST_SPLAT5]]			; CHECK-NEXT: [[TMP2]] = add <4 x i32> [[VEC_PHI1]], [[BROADCAST_SPLAT9]]
	; CHECK-NEXT: [[TMP6]] = add <4 x i32> [[VEC_PHI2]], [[BROADCAST_SPLAT7]]			; CHECK-NEXT: [[TMP3]] = add <4 x i32> [[VEC_PHI2]], [[BROADCAST_SPLAT9]]
	; CHECK-NEXT: [[TMP7]] = add <4 x i32> [[VEC_PHI3]], [[BROADCAST_SPLAT9]]			; CHECK-NEXT: [[TMP4]] = add <4 x i32> [[VEC_PHI3]], [[BROADCAST_SPLAT9]]
	; CHECK-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 16			; CHECK-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 16
	; CHECK-NEXT: [[TMP8:%.*]] = icmp eq i64 [[INDEX_NEXT]], 4096			; CHECK-NEXT: [[TMP5:%.*]] = icmp eq i64 [[INDEX_NEXT]], 4096
	; CHECK-NEXT: br i1 [[TMP8]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP21:![0-9]+]]			; CHECK-NEXT: br i1 [[TMP5]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP21:![0-9]+]]
	; CHECK: middle.block:			; CHECK: middle.block:
	; CHECK-NEXT: [[BIN_RDX:%.*]] = add <4 x i32> [[TMP5]], [[TMP4]]			; CHECK-NEXT: [[BIN_RDX:%.*]] = add <4 x i32> [[TMP2]], [[TMP1]]
	; CHECK-NEXT: [[BIN_RDX10:%.*]] = add <4 x i32> [[TMP6]], [[BIN_RDX]]			; CHECK-NEXT: [[BIN_RDX10:%.*]] = add <4 x i32> [[TMP3]], [[BIN_RDX]]
	; CHECK-NEXT: [[BIN_RDX11:%.*]] = add <4 x i32> [[TMP7]], [[BIN_RDX10]]			; CHECK-NEXT: [[BIN_RDX11:%.*]] = add <4 x i32> [[TMP4]], [[BIN_RDX10]]
	; CHECK-NEXT: [[TMP9:%.*]] = call i32 @llvm.vector.reduce.add.v4i32(<4 x i32> [[BIN_RDX11]])			; CHECK-NEXT: [[TMP6:%.*]] = call i32 @llvm.vector.reduce.add.v4i32(<4 x i32> [[BIN_RDX11]])
	; CHECK-NEXT: [[CMP_N:%.*]] = icmp eq i64 4097, 4096			; CHECK-NEXT: [[CMP_N:%.*]] = icmp eq i64 4097, 4096
	; CHECK-NEXT: br i1 [[CMP_N]], label [[LOOPEXIT:%.*]], label [[SCALAR_PH]]			; CHECK-NEXT: br i1 [[CMP_N]], label [[LOOPEXIT:%.*]], label [[SCALAR_PH]]
	; CHECK: scalar.ph:			; CHECK: scalar.ph:
	; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ 4096, [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]			; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ 4096, [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]
	; CHECK-NEXT: [[BC_MERGE_RDX:%.*]] = phi i32 [ 0, [[ENTRY]] ], [ [[TMP9]], [[MIDDLE_BLOCK]] ]			; CHECK-NEXT: [[BC_MERGE_RDX:%.*]] = phi i32 [ 0, [[ENTRY]] ], [ [[TMP6]], [[MIDDLE_BLOCK]] ]
	; CHECK-NEXT: br label [[FOR_BODY:%.*]]			; CHECK-NEXT: br label [[FOR_BODY:%.*]]
	; CHECK: for.body:			; CHECK: for.body:
	; CHECK-NEXT: [[IV:%.*]] = phi i64 [ [[IV_NEXT:%.*]], [[FOR_BODY]] ], [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ]			; CHECK-NEXT: [[IV:%.*]] = phi i64 [ [[IV_NEXT:%.*]], [[FOR_BODY]] ], [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ]
	; CHECK-NEXT: [[ACCUM:%.*]] = phi i32 [ [[ACCUM_NEXT:%.*]], [[FOR_BODY]] ], [ [[BC_MERGE_RDX]], [[SCALAR_PH]] ]			; CHECK-NEXT: [[ACCUM:%.*]] = phi i32 [ [[ACCUM_NEXT:%.*]], [[FOR_BODY]] ], [ [[BC_MERGE_RDX]], [[SCALAR_PH]] ]
	; CHECK-NEXT: [[LOAD:%.*]] = load i32, i32* @GAddr, align 4			; CHECK-NEXT: [[LOAD:%.*]] = load i32, i32* @GAddr, align 4
	; CHECK-NEXT: [[ACCUM_NEXT]] = add i32 [[ACCUM]], [[LOAD]]			; CHECK-NEXT: [[ACCUM_NEXT]] = add i32 [[ACCUM]], [[LOAD]]
	; CHECK-NEXT: [[IV_NEXT]] = add nuw nsw i64 [[IV]], 1			; CHECK-NEXT: [[IV_NEXT]] = add nuw nsw i64 [[IV]], 1
	; CHECK-NEXT: [[EXITCOND:%.*]] = icmp eq i64 [[IV]], 4096			; CHECK-NEXT: [[EXITCOND:%.*]] = icmp eq i64 [[IV]], 4096
	; CHECK-NEXT: br i1 [[EXITCOND]], label [[LOOPEXIT]], label [[FOR_BODY]], !llvm.loop [[LOOP22:![0-9]+]]			; CHECK-NEXT: br i1 [[EXITCOND]], label [[LOOPEXIT]], label [[FOR_BODY]], !llvm.loop [[LOOP22:![0-9]+]]
	; CHECK: loopexit:			; CHECK: loopexit:
	; CHECK-NEXT: [[ACCUM_NEXT_LCSSA:%.*]] = phi i32 [ [[ACCUM_NEXT]], [[FOR_BODY]] ], [ [[TMP9]], [[MIDDLE_BLOCK]] ]			; CHECK-NEXT: [[ACCUM_NEXT_LCSSA:%.*]] = phi i32 [ [[ACCUM_NEXT]], [[FOR_BODY]] ], [ [[TMP6]], [[MIDDLE_BLOCK]] ]
	; CHECK-NEXT: ret i32 [[ACCUM_NEXT_LCSSA]]			; CHECK-NEXT: ret i32 [[ACCUM_NEXT_LCSSA]]
	;			;
	entry:			entry:
	br label %for.body			br label %for.body

	for.body:			for.body:
	%iv = phi i64 [ %iv.next, %for.body ], [ 0, %entry ]			%iv = phi i64 [ %iv.next, %for.body ], [ 0, %entry ]
	%accum = phi i32 [%accum.next, %for.body], [0, %entry]			%accum = phi i32 [%accum.next, %for.body], [0, %entry]
	define i32 @uniform_load_constexpr() {			define i32 @uniform_load_constexpr() {
	; CHECK-LABEL: @uniform_load_constexpr(			; CHECK-LABEL: @uniform_load_constexpr(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: br i1 false, label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]			; CHECK-NEXT: br i1 false, label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
	; CHECK: vector.ph:			; CHECK: vector.ph:
	; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]			; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]
	; CHECK: vector.body:			; CHECK: vector.body:
	; CHECK-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]			; CHECK-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
	; CHECK-NEXT: [[VEC_PHI:%.*]] = phi <4 x i32> [ zeroinitializer, [[VECTOR_PH]] ], [ [[TMP4:%.*]], [[VECTOR_BODY]] ]			; CHECK-NEXT: [[VEC_PHI:%.*]] = phi <4 x i32> [ zeroinitializer, [[VECTOR_PH]] ], [ [[TMP1:%.*]], [[VECTOR_BODY]] ]
	; CHECK-NEXT: [[VEC_PHI1:%.*]] = phi <4 x i32> [ zeroinitializer, [[VECTOR_PH]] ], [ [[TMP5:%.*]], [[VECTOR_BODY]] ]			; CHECK-NEXT: [[VEC_PHI1:%.*]] = phi <4 x i32> [ zeroinitializer, [[VECTOR_PH]] ], [ [[TMP2:%.*]], [[VECTOR_BODY]] ]
	; CHECK-NEXT: [[VEC_PHI2:%.*]] = phi <4 x i32> [ zeroinitializer, [[VECTOR_PH]] ], [ [[TMP6:%.*]], [[VECTOR_BODY]] ]			; CHECK-NEXT: [[VEC_PHI2:%.*]] = phi <4 x i32> [ zeroinitializer, [[VECTOR_PH]] ], [ [[TMP3:%.*]], [[VECTOR_BODY]] ]
	; CHECK-NEXT: [[VEC_PHI3:%.*]] = phi <4 x i32> [ zeroinitializer, [[VECTOR_PH]] ], [ [[TMP7:%.*]], [[VECTOR_BODY]] ]			; CHECK-NEXT: [[VEC_PHI3:%.*]] = phi <4 x i32> [ zeroinitializer, [[VECTOR_PH]] ], [ [[TMP4:%.*]], [[VECTOR_BODY]] ]
	; CHECK-NEXT: [[TMP0:%.*]] = load i32, i32* getelementptr (i32, i32* @GAddr, i64 5), align 4			; CHECK-NEXT: [[TMP0:%.*]] = load i32, i32* getelementptr (i32, i32* @GAddr, i64 5), align 4
	; CHECK-NEXT: [[BROADCAST_SPLATINSERT:%.*]] = insertelement <4 x i32> poison, i32 [[TMP0]], i32 0			; CHECK-NEXT: [[BROADCAST_SPLATINSERT8:%.*]] = insertelement <4 x i32> poison, i32 [[TMP0]], i32 0
	; CHECK-NEXT: [[BROADCAST_SPLAT:%.*]] = shufflevector <4 x i32> [[BROADCAST_SPLATINSERT]], <4 x i32> poison, <4 x i32> zeroinitializer
	; CHECK-NEXT: [[TMP1:%.*]] = load i32, i32* getelementptr (i32, i32* @GAddr, i64 5), align 4
	; CHECK-NEXT: [[BROADCAST_SPLATINSERT4:%.*]] = insertelement <4 x i32> poison, i32 [[TMP1]], i32 0
	; CHECK-NEXT: [[BROADCAST_SPLAT5:%.*]] = shufflevector <4 x i32> [[BROADCAST_SPLATINSERT4]], <4 x i32> poison, <4 x i32> zeroinitializer
	; CHECK-NEXT: [[TMP2:%.*]] = load i32, i32* getelementptr (i32, i32* @GAddr, i64 5), align 4
	; CHECK-NEXT: [[BROADCAST_SPLATINSERT6:%.*]] = insertelement <4 x i32> poison, i32 [[TMP2]], i32 0
	; CHECK-NEXT: [[BROADCAST_SPLAT7:%.*]] = shufflevector <4 x i32> [[BROADCAST_SPLATINSERT6]], <4 x i32> poison, <4 x i32> zeroinitializer
	; CHECK-NEXT: [[TMP3:%.*]] = load i32, i32* getelementptr (i32, i32* @GAddr, i64 5), align 4
	; CHECK-NEXT: [[BROADCAST_SPLATINSERT8:%.*]] = insertelement <4 x i32> poison, i32 [[TMP3]], i32 0
	; CHECK-NEXT: [[BROADCAST_SPLAT9:%.*]] = shufflevector <4 x i32> [[BROADCAST_SPLATINSERT8]], <4 x i32> poison, <4 x i32> zeroinitializer			; CHECK-NEXT: [[BROADCAST_SPLAT9:%.*]] = shufflevector <4 x i32> [[BROADCAST_SPLATINSERT8]], <4 x i32> poison, <4 x i32> zeroinitializer
	; CHECK-NEXT: [[TMP4]] = add <4 x i32> [[VEC_PHI]], [[BROADCAST_SPLAT]]			; CHECK-NEXT: [[TMP1]] = add <4 x i32> [[VEC_PHI]], [[BROADCAST_SPLAT9]]
	; CHECK-NEXT: [[TMP5]] = add <4 x i32> [[VEC_PHI1]], [[BROADCAST_SPLAT5]]			; CHECK-NEXT: [[TMP2]] = add <4 x i32> [[VEC_PHI1]], [[BROADCAST_SPLAT9]]
	; CHECK-NEXT: [[TMP6]] = add <4 x i32> [[VEC_PHI2]], [[BROADCAST_SPLAT7]]			; CHECK-NEXT: [[TMP3]] = add <4 x i32> [[VEC_PHI2]], [[BROADCAST_SPLAT9]]
	; CHECK-NEXT: [[TMP7]] = add <4 x i32> [[VEC_PHI3]], [[BROADCAST_SPLAT9]]			; CHECK-NEXT: [[TMP4]] = add <4 x i32> [[VEC_PHI3]], [[BROADCAST_SPLAT9]]
	; CHECK-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 16			; CHECK-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 16
	; CHECK-NEXT: [[TMP8:%.*]] = icmp eq i64 [[INDEX_NEXT]], 4096			; CHECK-NEXT: [[TMP5:%.*]] = icmp eq i64 [[INDEX_NEXT]], 4096
	; CHECK-NEXT: br i1 [[TMP8]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP23:![0-9]+]]			; CHECK-NEXT: br i1 [[TMP5]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP23:![0-9]+]]
	; CHECK: middle.block:			; CHECK: middle.block:
	; CHECK-NEXT: [[BIN_RDX:%.*]] = add <4 x i32> [[TMP5]], [[TMP4]]			; CHECK-NEXT: [[BIN_RDX:%.*]] = add <4 x i32> [[TMP2]], [[TMP1]]
	; CHECK-NEXT: [[BIN_RDX10:%.*]] = add <4 x i32> [[TMP6]], [[BIN_RDX]]			; CHECK-NEXT: [[BIN_RDX10:%.*]] = add <4 x i32> [[TMP3]], [[BIN_RDX]]
	; CHECK-NEXT: [[BIN_RDX11:%.*]] = add <4 x i32> [[TMP7]], [[BIN_RDX10]]			; CHECK-NEXT: [[BIN_RDX11:%.*]] = add <4 x i32> [[TMP4]], [[BIN_RDX10]]
	; CHECK-NEXT: [[TMP9:%.*]] = call i32 @llvm.vector.reduce.add.v4i32(<4 x i32> [[BIN_RDX11]])			; CHECK-NEXT: [[TMP6:%.*]] = call i32 @llvm.vector.reduce.add.v4i32(<4 x i32> [[BIN_RDX11]])
	; CHECK-NEXT: [[CMP_N:%.*]] = icmp eq i64 4097, 4096			; CHECK-NEXT: [[CMP_N:%.*]] = icmp eq i64 4097, 4096
	; CHECK-NEXT: br i1 [[CMP_N]], label [[LOOPEXIT:%.*]], label [[SCALAR_PH]]			; CHECK-NEXT: br i1 [[CMP_N]], label [[LOOPEXIT:%.*]], label [[SCALAR_PH]]
	; CHECK: scalar.ph:			; CHECK: scalar.ph:
	; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ 4096, [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]			; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ 4096, [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]
	; CHECK-NEXT: [[BC_MERGE_RDX:%.*]] = phi i32 [ 0, [[ENTRY]] ], [ [[TMP9]], [[MIDDLE_BLOCK]] ]			; CHECK-NEXT: [[BC_MERGE_RDX:%.*]] = phi i32 [ 0, [[ENTRY]] ], [ [[TMP6]], [[MIDDLE_BLOCK]] ]
	; CHECK-NEXT: br label [[FOR_BODY:%.*]]			; CHECK-NEXT: br label [[FOR_BODY:%.*]]
	; CHECK: for.body:			; CHECK: for.body:
	; CHECK-NEXT: [[IV:%.*]] = phi i64 [ [[IV_NEXT:%.*]], [[FOR_BODY]] ], [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ]			; CHECK-NEXT: [[IV:%.*]] = phi i64 [ [[IV_NEXT:%.*]], [[FOR_BODY]] ], [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ]
	; CHECK-NEXT: [[ACCUM:%.*]] = phi i32 [ [[ACCUM_NEXT:%.*]], [[FOR_BODY]] ], [ [[BC_MERGE_RDX]], [[SCALAR_PH]] ]			; CHECK-NEXT: [[ACCUM:%.*]] = phi i32 [ [[ACCUM_NEXT:%.*]], [[FOR_BODY]] ], [ [[BC_MERGE_RDX]], [[SCALAR_PH]] ]
	; CHECK-NEXT: [[LOAD:%.*]] = load i32, i32* getelementptr (i32, i32* @GAddr, i64 5), align 4			; CHECK-NEXT: [[LOAD:%.*]] = load i32, i32* getelementptr (i32, i32* @GAddr, i64 5), align 4
	; CHECK-NEXT: [[ACCUM_NEXT]] = add i32 [[ACCUM]], [[LOAD]]			; CHECK-NEXT: [[ACCUM_NEXT]] = add i32 [[ACCUM]], [[LOAD]]
	; CHECK-NEXT: [[IV_NEXT]] = add nuw nsw i64 [[IV]], 1			; CHECK-NEXT: [[IV_NEXT]] = add nuw nsw i64 [[IV]], 1
	; CHECK-NEXT: [[EXITCOND:%.*]] = icmp eq i64 [[IV]], 4096			; CHECK-NEXT: [[EXITCOND:%.*]] = icmp eq i64 [[IV]], 4096
	; CHECK-NEXT: br i1 [[EXITCOND]], label [[LOOPEXIT]], label [[FOR_BODY]], !llvm.loop [[LOOP24:![0-9]+]]			; CHECK-NEXT: br i1 [[EXITCOND]], label [[LOOPEXIT]], label [[FOR_BODY]], !llvm.loop [[LOOP24:![0-9]+]]
	; CHECK: loopexit:			; CHECK: loopexit:
	; CHECK-NEXT: [[ACCUM_NEXT_LCSSA:%.*]] = phi i32 [ [[ACCUM_NEXT]], [[FOR_BODY]] ], [ [[TMP9]], [[MIDDLE_BLOCK]] ]			; CHECK-NEXT: [[ACCUM_NEXT_LCSSA:%.*]] = phi i32 [ [[ACCUM_NEXT]], [[FOR_BODY]] ], [ [[TMP6]], [[MIDDLE_BLOCK]] ]
	; CHECK-NEXT: ret i32 [[ACCUM_NEXT_LCSSA]]			; CHECK-NEXT: ret i32 [[ACCUM_NEXT_LCSSA]]
	;			;
	entry:			entry:
	br label %for.body			br label %for.body

	for.body: ; preds = %for.body, %entry			for.body: ; preds = %for.body, %entry
	%iv = phi i64 [ %iv.next, %for.body ], [ 0, %entry ]			%iv = phi i64 [ %iv.next, %for.body ], [ 0, %entry ]
	%accum = phi i32 [ %accum.next, %for.body ], [ 0, %entry ]			%accum = phi i32 [ %accum.next, %for.body ], [ 0, %entry ]

llvm/test/Transforms/LoopVectorize/first-order-recurrence-sink-replicate-region.ll

	; CHECK-NEXT: EMIT vp<[[CAN_IV:%.+]]> = CANONICAL-INDUCTION			; CHECK-NEXT: EMIT vp<[[CAN_IV:%.+]]> = CANONICAL-INDUCTION
	; CHECK-NEXT: FIRST-ORDER-RECURRENCE-PHI ir<%.pn> = phi ir<0>, ir<[[L:%.+]]>			; CHECK-NEXT: FIRST-ORDER-RECURRENCE-PHI ir<%.pn> = phi ir<0>, ir<[[L:%.+]]>
	; CHECK-NEXT: vp<[[SCALAR_STEPS:%.+]]> = SCALAR-STEPS vp<[[CAN_IV]]>, ir<2>, ir<1>			; CHECK-NEXT: vp<[[SCALAR_STEPS:%.+]]> = SCALAR-STEPS vp<[[CAN_IV]]>, ir<2>, ir<1>
	; CHECK-NEXT: EMIT vp<[[WIDE_IV:%.+]]> = WIDEN-CANONICAL-INDUCTION vp<[[CAN_IV]]>			; CHECK-NEXT: EMIT vp<[[WIDE_IV:%.+]]> = WIDEN-CANONICAL-INDUCTION vp<[[CAN_IV]]>
	; CHECK-NEXT: EMIT vp<[[CMP:%.+]]> = icmp ule vp<[[WIDE_IV]]> vp<[[BTC]]>			; CHECK-NEXT: EMIT vp<[[CMP:%.+]]> = icmp ule vp<[[WIDE_IV]]> vp<[[BTC]]>
	; CHECK-NEXT: Successor(s): loop.0			; CHECK-NEXT: Successor(s): loop.0
	; CHECK-EMPTY:			; CHECK-EMPTY:
	; CHECK-NEXT: loop.0:			; CHECK-NEXT: loop.0:
	; CHECK-NEXT: CLONE ir<[[L]]> = load ir<%src>			; CHECK-NEXT: REPLICATE ir<[[L]]> = load ir<%src>
				david-arm: Do you know why this has changed? As far as I understand it, 'CLONE' literally means clone the scalar instruction once, whereas REPLICATE means insert copies into each lane of the vector. So in a sense we're now doing more loads than before, I think?
				reames (author): This one is definitely the profitable-to-scalarize point explained just above. I dug into this one in some depth.
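To make the reviewer's concern concrete, here is a minimal sketch of how many scalar loads each lowering would emit per vector-loop iteration. It is a hypothetical model, not LLVM code: the names `Lowering` and `scalarLoadsPerIteration` are invented for illustration. It assumes that a CLONE produces one scalar copy per unrolled part (consistent with the four loads in the left-hand column of uniform_mem_op.ll above) and that a REPLICATE produces one copy per lane of every part, as david-arm describes.

```cpp
#include <cassert>

// Hypothetical model (not an LLVM API): the two lowerings discussed in the
// inline comments for a load from a loop-invariant address.
enum class Lowering { Clone, Replicate };

// vf: vectorization factor (lanes per vector).
// uf: unroll/interleave factor (vector parts per loop iteration).
// Returns the number of scalar loads emitted per vector-loop iteration
// under the assumptions stated in the lead-in.
unsigned scalarLoadsPerIteration(Lowering kind, unsigned vf, unsigned uf) {
  switch (kind) {
  case Lowering::Clone:
    return uf;      // assumed: one scalar copy per unrolled part
  case Lowering::Replicate:
    return vf * uf; // assumed: one scalar copy per lane of every part
  }
  return 0; // unreachable for valid enumerators
}
```

Under these assumptions, with a vectorization factor of 4 and an unroll factor of 2, a CLONE costs 2 scalar loads per iteration and a REPLICATE costs 8, which is the kind of increase the question above is asking about. Predication and later cleanup passes are deliberately left out of the model.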
	; CHECK-NEXT: EMIT vp<[[SPLICE:%.+]]> = first-order splice ir<%.pn> ir<[[L]]>			; CHECK-NEXT: EMIT vp<[[SPLICE:%.+]]> = first-order splice ir<%.pn> ir<[[L]]>
	; CHECK-NEXT: Successor(s): loop.0.split			; CHECK-NEXT: Successor(s): loop.0.split
	; CHECK-EMPTY:			; CHECK-EMPTY:
	; CHECK-NEXT: loop.0.split:			; CHECK-NEXT: loop.0.split:
	; CHECK-NEXT: Successor(s): pred.store			; CHECK-NEXT: Successor(s): pred.store
	; CHECK-EMPTY:			; CHECK-EMPTY:
	; CHECK-NEXT: <xVFxUF> pred.store: {			; CHECK-NEXT: <xVFxUF> pred.store: {
	; CHECK-NEXT: pred.store.entry:			; CHECK-NEXT: pred.store.entry:

llvm/test/Transforms/LoopVectorize/induction.ll

	; UNROLL: vector.body:			; UNROLL: vector.body:
	; UNROLL-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]			; UNROLL-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
	; UNROLL-NEXT: [[VECTOR_RECUR:%.*]] = phi <2 x i32> [ <i32 poison, i32 0>, [[VECTOR_PH]] ], [ [[STEP_ADD:%.*]], [[VECTOR_BODY]] ]			; UNROLL-NEXT: [[VECTOR_RECUR:%.*]] = phi <2 x i32> [ <i32 poison, i32 0>, [[VECTOR_PH]] ], [ [[STEP_ADD:%.*]], [[VECTOR_BODY]] ]
	; UNROLL-NEXT: [[VEC_IND:%.*]] = phi <2 x i32> [ <i32 0, i32 1>, [[VECTOR_PH]] ], [ [[VEC_IND_NEXT:%.*]], [[VECTOR_BODY]] ]			; UNROLL-NEXT: [[VEC_IND:%.*]] = phi <2 x i32> [ <i32 0, i32 1>, [[VECTOR_PH]] ], [ [[VEC_IND_NEXT:%.*]], [[VECTOR_BODY]] ]
	; UNROLL-NEXT: [[STEP_ADD]] = add <2 x i32> [[VEC_IND]], <i32 2, i32 2>			; UNROLL-NEXT: [[STEP_ADD]] = add <2 x i32> [[VEC_IND]], <i32 2, i32 2>
	; UNROLL-NEXT: [[TMP0:%.*]] = shufflevector <2 x i32> [[VECTOR_RECUR]], <2 x i32> [[VEC_IND]], <2 x i32> <i32 1, i32 2>			; UNROLL-NEXT: [[TMP0:%.*]] = shufflevector <2 x i32> [[VECTOR_RECUR]], <2 x i32> [[VEC_IND]], <2 x i32> <i32 1, i32 2>
	; UNROLL-NEXT: [[TMP1:%.*]] = shufflevector <2 x i32> [[VEC_IND]], <2 x i32> [[STEP_ADD]], <2 x i32> <i32 1, i32 2>			; UNROLL-NEXT: [[TMP1:%.*]] = shufflevector <2 x i32> [[VEC_IND]], <2 x i32> [[STEP_ADD]], <2 x i32> <i32 1, i32 2>
	; UNROLL-NEXT: [[TMP2:%.*]] = load i32, i32* [[SRC:%.*]], align 4			; UNROLL-NEXT: [[TMP2:%.*]] = load i32, i32* [[SRC:%.*]], align 4
	; UNROLL-NEXT: [[BROADCAST_SPLATINSERT:%.*]] = insertelement <2 x i32> poison, i32 [[TMP2]], i64 0
	; UNROLL-NEXT: [[BROADCAST_SPLAT:%.*]] = shufflevector <2 x i32> [[BROADCAST_SPLATINSERT]], <2 x i32> poison, <2 x i32> zeroinitializer
	; UNROLL-NEXT: [[BROADCAST_SPLATINSERT3:%.*]] = insertelement <2 x i32> poison, i32 [[TMP2]], i64 0			; UNROLL-NEXT: [[BROADCAST_SPLATINSERT3:%.*]] = insertelement <2 x i32> poison, i32 [[TMP2]], i64 0
	; UNROLL-NEXT: [[BROADCAST_SPLAT4:%.*]] = shufflevector <2 x i32> [[BROADCAST_SPLATINSERT3]], <2 x i32> poison, <2 x i32> zeroinitializer			; UNROLL-NEXT: [[BROADCAST_SPLAT4:%.*]] = shufflevector <2 x i32> [[BROADCAST_SPLATINSERT3]], <2 x i32> poison, <2 x i32> zeroinitializer
	; UNROLL-NEXT: [[TMP3:%.*]] = mul nsw <2 x i32> [[BROADCAST_SPLAT]], [[TMP0]]			; UNROLL-NEXT: [[TMP3:%.*]] = mul nsw <2 x i32> [[BROADCAST_SPLAT4]], [[TMP0]]
	; UNROLL-NEXT: [[TMP4:%.*]] = mul nsw <2 x i32> [[BROADCAST_SPLAT4]], [[TMP1]]			; UNROLL-NEXT: [[TMP4:%.*]] = mul nsw <2 x i32> [[BROADCAST_SPLAT4]], [[TMP1]]
	; UNROLL-NEXT: [[SEXT:%.*]] = shl i64 [[INDEX]], 32			; UNROLL-NEXT: [[SEXT:%.*]] = shl i64 [[INDEX]], 32
	; UNROLL-NEXT: [[TMP5:%.*]] = ashr exact i64 [[SEXT]], 32			; UNROLL-NEXT: [[TMP5:%.*]] = ashr exact i64 [[SEXT]], 32
	; UNROLL-NEXT: [[TMP6:%.*]] = getelementptr i32, i32* [[DST:%.*]], i64 [[TMP5]]			; UNROLL-NEXT: [[TMP6:%.*]] = getelementptr i32, i32* [[DST:%.*]], i64 [[TMP5]]
	; UNROLL-NEXT: [[TMP7:%.*]] = add <2 x i32> [[VEC_IND]], [[TMP3]]			; UNROLL-NEXT: [[TMP7:%.*]] = add <2 x i32> [[VEC_IND]], [[TMP3]]
	; UNROLL-NEXT: [[TMP8:%.*]] = add <2 x i32> [[STEP_ADD]], [[TMP4]]			; UNROLL-NEXT: [[TMP8:%.*]] = add <2 x i32> [[STEP_ADD]], [[TMP4]]
	; UNROLL-NEXT: [[TMP9:%.*]] = bitcast i32* [[TMP6]] to <2 x i32>*			; UNROLL-NEXT: [[TMP9:%.*]] = bitcast i32* [[TMP6]] to <2 x i32>*
	; UNROLL-NEXT: store <2 x i32> [[TMP7]], <2 x i32>* [[TMP9]], align 4			; UNROLL-NEXT: store <2 x i32> [[TMP7]], <2 x i32>* [[TMP9]], align 4
	[...]
	; UNROLL-NO-IC-NEXT: [[VEC_IND:%.*]] = phi <2 x i32> [ <i32 0, i32 1>, [[VECTOR_PH]] ], [ [[VEC_IND_NEXT:%.*]], [[VECTOR_BODY]] ]			; UNROLL-NO-IC-NEXT: [[VEC_IND:%.*]] = phi <2 x i32> [ <i32 0, i32 1>, [[VECTOR_PH]] ], [ [[VEC_IND_NEXT:%.*]], [[VECTOR_BODY]] ]
	; UNROLL-NO-IC-NEXT: [[STEP_ADD]] = add <2 x i32> [[VEC_IND]], <i32 2, i32 2>			; UNROLL-NO-IC-NEXT: [[STEP_ADD]] = add <2 x i32> [[VEC_IND]], <i32 2, i32 2>
	; UNROLL-NO-IC-NEXT: [[TMP0:%.*]] = trunc i64 [[INDEX]] to i32			; UNROLL-NO-IC-NEXT: [[TMP0:%.*]] = trunc i64 [[INDEX]] to i32
	; UNROLL-NO-IC-NEXT: [[TMP1:%.*]] = add i32 [[TMP0]], 0			; UNROLL-NO-IC-NEXT: [[TMP1:%.*]] = add i32 [[TMP0]], 0
	; UNROLL-NO-IC-NEXT: [[TMP2:%.*]] = add i32 [[TMP0]], 2			; UNROLL-NO-IC-NEXT: [[TMP2:%.*]] = add i32 [[TMP0]], 2
	; UNROLL-NO-IC-NEXT: [[TMP3:%.*]] = shufflevector <2 x i32> [[VECTOR_RECUR]], <2 x i32> [[VEC_IND]], <2 x i32> <i32 1, i32 2>			; UNROLL-NO-IC-NEXT: [[TMP3:%.*]] = shufflevector <2 x i32> [[VECTOR_RECUR]], <2 x i32> [[VEC_IND]], <2 x i32> <i32 1, i32 2>
	; UNROLL-NO-IC-NEXT: [[TMP4:%.*]] = shufflevector <2 x i32> [[VEC_IND]], <2 x i32> [[STEP_ADD]], <2 x i32> <i32 1, i32 2>			; UNROLL-NO-IC-NEXT: [[TMP4:%.*]] = shufflevector <2 x i32> [[VEC_IND]], <2 x i32> [[STEP_ADD]], <2 x i32> <i32 1, i32 2>
	; UNROLL-NO-IC-NEXT: [[TMP5:%.*]] = load i32, i32* [[SRC:%.*]], align 4			; UNROLL-NO-IC-NEXT: [[TMP5:%.*]] = load i32, i32* [[SRC:%.*]], align 4
	; UNROLL-NO-IC-NEXT: [[BROADCAST_SPLATINSERT:%.*]] = insertelement <2 x i32> poison, i32 [[TMP5]], i32 0			; UNROLL-NO-IC-NEXT: [[BROADCAST_SPLATINSERT3:%.*]] = insertelement <2 x i32> poison, i32 [[TMP5]], i32 0
	; UNROLL-NO-IC-NEXT: [[BROADCAST_SPLAT:%.*]] = shufflevector <2 x i32> [[BROADCAST_SPLATINSERT]], <2 x i32> poison, <2 x i32> zeroinitializer
	; UNROLL-NO-IC-NEXT: [[TMP6:%.*]] = load i32, i32* [[SRC]], align 4
	; UNROLL-NO-IC-NEXT: [[BROADCAST_SPLATINSERT3:%.*]] = insertelement <2 x i32> poison, i32 [[TMP6]], i32 0
	; UNROLL-NO-IC-NEXT: [[BROADCAST_SPLAT4:%.*]] = shufflevector <2 x i32> [[BROADCAST_SPLATINSERT3]], <2 x i32> poison, <2 x i32> zeroinitializer			; UNROLL-NO-IC-NEXT: [[BROADCAST_SPLAT4:%.*]] = shufflevector <2 x i32> [[BROADCAST_SPLATINSERT3]], <2 x i32> poison, <2 x i32> zeroinitializer
	; UNROLL-NO-IC-NEXT: [[TMP7:%.*]] = mul nsw <2 x i32> [[BROADCAST_SPLAT]], [[TMP3]]			; UNROLL-NO-IC-NEXT: [[TMP6:%.*]] = mul nsw <2 x i32> [[BROADCAST_SPLAT4]], [[TMP3]]
	; UNROLL-NO-IC-NEXT: [[TMP8:%.*]] = mul nsw <2 x i32> [[BROADCAST_SPLAT4]], [[TMP4]]			; UNROLL-NO-IC-NEXT: [[TMP7:%.*]] = mul nsw <2 x i32> [[BROADCAST_SPLAT4]], [[TMP4]]
	; UNROLL-NO-IC-NEXT: [[TMP9:%.*]] = getelementptr i32, i32* [[DST:%.*]], i32 [[TMP1]]			; UNROLL-NO-IC-NEXT: [[TMP8:%.*]] = getelementptr i32, i32* [[DST:%.*]], i32 [[TMP1]]
	; UNROLL-NO-IC-NEXT: [[TMP10:%.*]] = getelementptr i32, i32* [[DST]], i32 [[TMP2]]			; UNROLL-NO-IC-NEXT: [[TMP9:%.*]] = getelementptr i32, i32* [[DST]], i32 [[TMP2]]
	; UNROLL-NO-IC-NEXT: [[TMP11:%.*]] = add <2 x i32> [[VEC_IND]], [[TMP7]]			; UNROLL-NO-IC-NEXT: [[TMP10:%.*]] = add <2 x i32> [[VEC_IND]], [[TMP6]]
	; UNROLL-NO-IC-NEXT: [[TMP12:%.*]] = add <2 x i32> [[STEP_ADD]], [[TMP8]]			; UNROLL-NO-IC-NEXT: [[TMP11:%.*]] = add <2 x i32> [[STEP_ADD]], [[TMP7]]
	; UNROLL-NO-IC-NEXT: [[TMP13:%.*]] = getelementptr i32, i32* [[TMP9]], i32 0			; UNROLL-NO-IC-NEXT: [[TMP12:%.*]] = getelementptr i32, i32* [[TMP8]], i32 0
	; UNROLL-NO-IC-NEXT: [[TMP14:%.*]] = bitcast i32* [[TMP13]] to <2 x i32>*			; UNROLL-NO-IC-NEXT: [[TMP13:%.*]] = bitcast i32* [[TMP12]] to <2 x i32>*
	; UNROLL-NO-IC-NEXT: store <2 x i32> [[TMP11]], <2 x i32>* [[TMP14]], align 4			; UNROLL-NO-IC-NEXT: store <2 x i32> [[TMP10]], <2 x i32>* [[TMP13]], align 4
	; UNROLL-NO-IC-NEXT: [[TMP15:%.*]] = getelementptr i32, i32* [[TMP9]], i32 2			; UNROLL-NO-IC-NEXT: [[TMP14:%.*]] = getelementptr i32, i32* [[TMP8]], i32 2
	; UNROLL-NO-IC-NEXT: [[TMP16:%.*]] = bitcast i32* [[TMP15]] to <2 x i32>*			; UNROLL-NO-IC-NEXT: [[TMP15:%.*]] = bitcast i32* [[TMP14]] to <2 x i32>*
	; UNROLL-NO-IC-NEXT: store <2 x i32> [[TMP12]], <2 x i32>* [[TMP16]], align 4			; UNROLL-NO-IC-NEXT: store <2 x i32> [[TMP11]], <2 x i32>* [[TMP15]], align 4
	; UNROLL-NO-IC-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 4			; UNROLL-NO-IC-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 4
	; UNROLL-NO-IC-NEXT: [[VEC_IND_NEXT]] = add <2 x i32> [[STEP_ADD]], <i32 2, i32 2>			; UNROLL-NO-IC-NEXT: [[VEC_IND_NEXT]] = add <2 x i32> [[STEP_ADD]], <i32 2, i32 2>
	; UNROLL-NO-IC-NEXT: [[TMP17:%.*]] = icmp eq i64 [[INDEX_NEXT]], 100			; UNROLL-NO-IC-NEXT: [[TMP16:%.*]] = icmp eq i64 [[INDEX_NEXT]], 100
	; UNROLL-NO-IC-NEXT: br i1 [[TMP17]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP52:![0-9]+]]			; UNROLL-NO-IC-NEXT: br i1 [[TMP16]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP52:![0-9]+]]
	; UNROLL-NO-IC: middle.block:			; UNROLL-NO-IC: middle.block:
	; UNROLL-NO-IC-NEXT: [[CMP_N:%.*]] = icmp eq i64 100, 100			; UNROLL-NO-IC-NEXT: [[CMP_N:%.*]] = icmp eq i64 100, 100
	; UNROLL-NO-IC-NEXT: [[VECTOR_RECUR_EXTRACT:%.*]] = extractelement <2 x i32> [[STEP_ADD]], i32 1			; UNROLL-NO-IC-NEXT: [[VECTOR_RECUR_EXTRACT:%.*]] = extractelement <2 x i32> [[STEP_ADD]], i32 1
	; UNROLL-NO-IC-NEXT: [[VECTOR_RECUR_EXTRACT_FOR_PHI:%.*]] = extractelement <2 x i32> [[STEP_ADD]], i32 0			; UNROLL-NO-IC-NEXT: [[VECTOR_RECUR_EXTRACT_FOR_PHI:%.*]] = extractelement <2 x i32> [[STEP_ADD]], i32 0
	; UNROLL-NO-IC-NEXT: br i1 [[CMP_N]], label [[EXIT:%.*]], label [[SCALAR_PH]]			; UNROLL-NO-IC-NEXT: br i1 [[CMP_N]], label [[EXIT:%.*]], label [[SCALAR_PH]]
	; UNROLL-NO-IC: scalar.ph:			; UNROLL-NO-IC: scalar.ph:
	; UNROLL-NO-IC-NEXT: [[SCALAR_RECUR_INIT:%.*]] = phi i32 [ 0, [[ENTRY:%.*]] ], [ [[VECTOR_RECUR_EXTRACT]], [[MIDDLE_BLOCK]] ]			; UNROLL-NO-IC-NEXT: [[SCALAR_RECUR_INIT:%.*]] = phi i32 [ 0, [[ENTRY:%.*]] ], [ [[VECTOR_RECUR_EXTRACT]], [[MIDDLE_BLOCK]] ]
	; UNROLL-NO-IC-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ 100, [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY]] ]			; UNROLL-NO-IC-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ 100, [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY]] ]
	[...]
	; INTERLEAVE: vector.body:			; INTERLEAVE: vector.body:
	; INTERLEAVE-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]			; INTERLEAVE-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
	; INTERLEAVE-NEXT: [[VECTOR_RECUR:%.*]] = phi <4 x i32> [ <i32 poison, i32 poison, i32 poison, i32 0>, [[VECTOR_PH]] ], [ [[STEP_ADD:%.*]], [[VECTOR_BODY]] ]			; INTERLEAVE-NEXT: [[VECTOR_RECUR:%.*]] = phi <4 x i32> [ <i32 poison, i32 poison, i32 poison, i32 0>, [[VECTOR_PH]] ], [ [[STEP_ADD:%.*]], [[VECTOR_BODY]] ]
	; INTERLEAVE-NEXT: [[VEC_IND:%.*]] = phi <4 x i32> [ <i32 0, i32 1, i32 2, i32 3>, [[VECTOR_PH]] ], [ [[VEC_IND_NEXT:%.*]], [[VECTOR_BODY]] ]			; INTERLEAVE-NEXT: [[VEC_IND:%.*]] = phi <4 x i32> [ <i32 0, i32 1, i32 2, i32 3>, [[VECTOR_PH]] ], [ [[VEC_IND_NEXT:%.*]], [[VECTOR_BODY]] ]
	; INTERLEAVE-NEXT: [[STEP_ADD]] = add <4 x i32> [[VEC_IND]], <i32 4, i32 4, i32 4, i32 4>			; INTERLEAVE-NEXT: [[STEP_ADD]] = add <4 x i32> [[VEC_IND]], <i32 4, i32 4, i32 4, i32 4>
	; INTERLEAVE-NEXT: [[TMP0:%.*]] = shufflevector <4 x i32> [[VECTOR_RECUR]], <4 x i32> [[VEC_IND]], <4 x i32> <i32 3, i32 4, i32 5, i32 6>			; INTERLEAVE-NEXT: [[TMP0:%.*]] = shufflevector <4 x i32> [[VECTOR_RECUR]], <4 x i32> [[VEC_IND]], <4 x i32> <i32 3, i32 4, i32 5, i32 6>
	; INTERLEAVE-NEXT: [[TMP1:%.*]] = shufflevector <4 x i32> [[VEC_IND]], <4 x i32> [[STEP_ADD]], <4 x i32> <i32 3, i32 4, i32 5, i32 6>			; INTERLEAVE-NEXT: [[TMP1:%.*]] = shufflevector <4 x i32> [[VEC_IND]], <4 x i32> [[STEP_ADD]], <4 x i32> <i32 3, i32 4, i32 5, i32 6>
	; INTERLEAVE-NEXT: [[TMP2:%.*]] = load i32, i32* [[SRC:%.*]], align 4			; INTERLEAVE-NEXT: [[TMP2:%.*]] = load i32, i32* [[SRC:%.*]], align 4
	; INTERLEAVE-NEXT: [[BROADCAST_SPLATINSERT:%.*]] = insertelement <4 x i32> poison, i32 [[TMP2]], i64 0
	; INTERLEAVE-NEXT: [[BROADCAST_SPLAT:%.*]] = shufflevector <4 x i32> [[BROADCAST_SPLATINSERT]], <4 x i32> poison, <4 x i32> zeroinitializer
	; INTERLEAVE-NEXT: [[BROADCAST_SPLATINSERT3:%.*]] = insertelement <4 x i32> poison, i32 [[TMP2]], i64 0			; INTERLEAVE-NEXT: [[BROADCAST_SPLATINSERT3:%.*]] = insertelement <4 x i32> poison, i32 [[TMP2]], i64 0
	; INTERLEAVE-NEXT: [[BROADCAST_SPLAT4:%.*]] = shufflevector <4 x i32> [[BROADCAST_SPLATINSERT3]], <4 x i32> poison, <4 x i32> zeroinitializer			; INTERLEAVE-NEXT: [[BROADCAST_SPLAT4:%.*]] = shufflevector <4 x i32> [[BROADCAST_SPLATINSERT3]], <4 x i32> poison, <4 x i32> zeroinitializer
	; INTERLEAVE-NEXT: [[TMP3:%.*]] = mul nsw <4 x i32> [[BROADCAST_SPLAT]], [[TMP0]]			; INTERLEAVE-NEXT: [[TMP3:%.*]] = mul nsw <4 x i32> [[BROADCAST_SPLAT4]], [[TMP0]]
	; INTERLEAVE-NEXT: [[TMP4:%.*]] = mul nsw <4 x i32> [[BROADCAST_SPLAT4]], [[TMP1]]			; INTERLEAVE-NEXT: [[TMP4:%.*]] = mul nsw <4 x i32> [[BROADCAST_SPLAT4]], [[TMP1]]
	; INTERLEAVE-NEXT: [[SEXT:%.*]] = shl i64 [[INDEX]], 32			; INTERLEAVE-NEXT: [[SEXT:%.*]] = shl i64 [[INDEX]], 32
	; INTERLEAVE-NEXT: [[TMP5:%.*]] = ashr exact i64 [[SEXT]], 32			; INTERLEAVE-NEXT: [[TMP5:%.*]] = ashr exact i64 [[SEXT]], 32
	; INTERLEAVE-NEXT: [[TMP6:%.*]] = getelementptr i32, i32* [[DST:%.*]], i64 [[TMP5]]			; INTERLEAVE-NEXT: [[TMP6:%.*]] = getelementptr i32, i32* [[DST:%.*]], i64 [[TMP5]]
	; INTERLEAVE-NEXT: [[TMP7:%.*]] = add <4 x i32> [[VEC_IND]], [[TMP3]]			; INTERLEAVE-NEXT: [[TMP7:%.*]] = add <4 x i32> [[VEC_IND]], [[TMP3]]
	; INTERLEAVE-NEXT: [[TMP8:%.*]] = add <4 x i32> [[STEP_ADD]], [[TMP4]]			; INTERLEAVE-NEXT: [[TMP8:%.*]] = add <4 x i32> [[STEP_ADD]], [[TMP4]]
	; INTERLEAVE-NEXT: [[TMP9:%.*]] = bitcast i32* [[TMP6]] to <4 x i32>*			; INTERLEAVE-NEXT: [[TMP9:%.*]] = bitcast i32* [[TMP6]] to <4 x i32>*
	; INTERLEAVE-NEXT: store <4 x i32> [[TMP7]], <4 x i32>* [[TMP9]], align 4			; INTERLEAVE-NEXT: store <4 x i32> [[TMP7]], <4 x i32>* [[TMP9]], align 4

llvm/test/Transforms/LoopVectorize/vplan-sink-scalars-and-merge.ll

	; CHECK-NEXT: <x1> vector loop: {			; CHECK-NEXT: <x1> vector loop: {
	; CHECK-NEXT: vector.body:			; CHECK-NEXT: vector.body:
	; CHECK-NEXT: EMIT vp<[[CAN_IV:%.+]]> = CANONICAL-INDUCTION			; CHECK-NEXT: EMIT vp<[[CAN_IV:%.+]]> = CANONICAL-INDUCTION
	; CHECK-NEXT: WIDEN-INDUCTION %iv = phi 21, %iv.next, ir<1>			; CHECK-NEXT: WIDEN-INDUCTION %iv = phi 21, %iv.next, ir<1>
	; CHECK-NEXT: vp<[[STEPS:%.+]]> = SCALAR-STEPS vp<[[CAN_IV]]>, ir<21>, ir<1>			; CHECK-NEXT: vp<[[STEPS:%.+]]> = SCALAR-STEPS vp<[[CAN_IV]]>, ir<21>, ir<1>
	; CHECK-NEXT: EMIT vp<[[WIDE_CAN_IV:%.+]]> = WIDEN-CANONICAL-INDUCTION vp<[[CAN_IV]]>			; CHECK-NEXT: EMIT vp<[[WIDE_CAN_IV:%.+]]> = WIDEN-CANONICAL-INDUCTION vp<[[CAN_IV]]>
	; CHECK-NEXT: EMIT vp<[[MASK:%.+]]> = icmp ule vp<[[WIDE_CAN_IV]]> vp<[[BTC]]>			; CHECK-NEXT: EMIT vp<[[MASK:%.+]]> = icmp ule vp<[[WIDE_CAN_IV]]> vp<[[BTC]]>
	; CHECK-NEXT: CLONE ir<%gep.A.uniform> = getelementptr ir<%A>, ir<0>			; CHECK-NEXT: CLONE ir<%gep.A.uniform> = getelementptr ir<%A>, ir<0>
	; CHECK-NEXT: CLONE ir<%lv> = load ir<%gep.A.uniform>			; CHECK-NEXT: UNIFORM-MEM ir<%lv> = load ir<%gep.A.uniform>
	; CHECK-NEXT: WIDEN ir<%cmp> = icmp ir<%iv>, ir<%k>			; CHECK-NEXT: WIDEN ir<%cmp> = icmp ir<%iv>, ir<%k>
	; CHECK-NEXT: Successor(s): loop.then			; CHECK-NEXT: Successor(s): loop.then
	; CHECK-EMPTY:			; CHECK-EMPTY:
	; CHECK-NEXT: loop.then:			; CHECK-NEXT: loop.then:
	; CHECK-NEXT: EMIT vp<[[NOT2:%.+]]> = not ir<%cmp>			; CHECK-NEXT: EMIT vp<[[NOT2:%.+]]> = not ir<%cmp>
	; CHECK-NEXT: EMIT vp<[[MASK2:%.+]]> = select vp<[[MASK]]> vp<[[NOT2]]> ir<false>			; CHECK-NEXT: EMIT vp<[[MASK2:%.+]]> = select vp<[[MASK]]> vp<[[NOT2]]> ir<false>
	; CHECK-NEXT: Successor(s): pred.store			; CHECK-NEXT: Successor(s): pred.store
	; CHECK-EMPTY:			; CHECK-EMPTY:

llvm/unittests/Transforms/Vectorize/VPlanTest.cpp

TEST(VPRecipeTest, CastVPWidenMemoryInstructionRecipeToVPUserAndVPDef) {
LLVMContext C;		LLVMContext C;

IntegerType *Int32 = IntegerType::get(C, 32);		IntegerType *Int32 = IntegerType::get(C, 32);
PointerType *Int32Ptr = PointerType::get(Int32, 0);		PointerType *Int32Ptr = PointerType::get(Int32, 0);
auto *Load =		auto *Load =
new LoadInst(Int32, UndefValue::get(Int32Ptr), "", false, Align(1));		new LoadInst(Int32, UndefValue::get(Int32Ptr), "", false, Align(1));
VPValue Addr;		VPValue Addr;
VPValue Mask;		VPValue Mask;
VPWidenMemoryInstructionRecipe Recipe(*Load, &Addr, &Mask, true, false);		VPWidenMemoryInstructionRecipe Recipe(*Load, &Addr, &Mask, true, false, false);
EXPECT_TRUE(isa<VPUser>(&Recipe));		EXPECT_TRUE(isa<VPUser>(&Recipe));
VPRecipeBase *BaseR = &Recipe;		VPRecipeBase *BaseR = &Recipe;
EXPECT_TRUE(isa<VPUser>(BaseR));		EXPECT_TRUE(isa<VPUser>(BaseR));
EXPECT_EQ(&Recipe, BaseR);		EXPECT_EQ(&Recipe, BaseR);

VPValue *VPV = Recipe.getVPSingleValue();		VPValue *VPV = Recipe.getVPSingleValue();
EXPECT_TRUE(isa<VPRecipeBase>(VPV->getDef()));		EXPECT_TRUE(isa<VPRecipeBase>(VPV->getDef()));
EXPECT_EQ(&Recipe, dyn_cast<VPRecipeBase>(VPV->getDef()));		EXPECT_EQ(&Recipe, dyn_cast<VPRecipeBase>(VPV->getDef()));
[...]
EXPECT_FALSE(Recipe.mayReadOrWriteMemory());		EXPECT_FALSE(Recipe.mayReadOrWriteMemory());
}		}

{		{
auto *Load =		auto *Load =
new LoadInst(Int32, UndefValue::get(Int32Ptr), "", false, Align(1));		new LoadInst(Int32, UndefValue::get(Int32Ptr), "", false, Align(1));
VPValue Addr;		VPValue Addr;
VPValue Mask;		VPValue Mask;
VPWidenMemoryInstructionRecipe Recipe(*Load, &Addr, &Mask, true, false);		VPWidenMemoryInstructionRecipe Recipe(*Load, &Addr, &Mask, true, false, false);
EXPECT_TRUE(Recipe.mayHaveSideEffects());		EXPECT_TRUE(Recipe.mayHaveSideEffects());
EXPECT_TRUE(Recipe.mayReadFromMemory());		EXPECT_TRUE(Recipe.mayReadFromMemory());
EXPECT_FALSE(Recipe.mayWriteToMemory());		EXPECT_FALSE(Recipe.mayWriteToMemory());
EXPECT_TRUE(Recipe.mayReadOrWriteMemory());		EXPECT_TRUE(Recipe.mayReadOrWriteMemory());
delete Load;		delete Load;
}		}

{		{
auto *Store = new StoreInst(UndefValue::get(Int32),		auto *Store = new StoreInst(UndefValue::get(Int32),
UndefValue::get(Int32Ptr), false, Align(1));		UndefValue::get(Int32Ptr), false, Align(1));
VPValue Addr;		VPValue Addr;
VPValue Mask;		VPValue Mask;
VPValue StoredV;		VPValue StoredV;
VPWidenMemoryInstructionRecipe Recipe(*Store, &Addr, &StoredV, &Mask, false,		VPWidenMemoryInstructionRecipe Recipe(*Store, &Addr, &StoredV, &Mask, false,
false);		false, false);
EXPECT_TRUE(Recipe.mayHaveSideEffects());		EXPECT_TRUE(Recipe.mayHaveSideEffects());
EXPECT_FALSE(Recipe.mayReadFromMemory());		EXPECT_FALSE(Recipe.mayReadFromMemory());
EXPECT_TRUE(Recipe.mayWriteToMemory());		EXPECT_TRUE(Recipe.mayWriteToMemory());
EXPECT_TRUE(Recipe.mayReadOrWriteMemory());		EXPECT_TRUE(Recipe.mayReadOrWriteMemory());
delete Store;		delete Store;
}		}

{		{