This is an archive of the discontinued LLVM Phabricator instance.

Differential D131093

[LV] Restructure isPredicatedInst and isScalarWithPredication (w/a fix for uniform mem ops)
Closed · Public

Authored by reames on Aug 3 2022, 12:36 PM.

Details

Reviewers

fhahn
david-arm

Commits

rG531dd3634dd1: [LV] Restructure isPredicatedInst and isScalarWithPredication (w/a fix for…

Summary

This change reorganizes the code and comments to make the expected semantics of these routines clearer. However, this is *not* an NFC change. The functional change is that isScalarWithPredication now returns false if the instruction does not need predication. Specifically, for a uniform memory operation we previously considered it *not* to be a predicated instruction, but *did* consider it to be scalar with predication.

As can be seen in the test changes, this caused uniform memory ops which should have been lowered as uniform-per-parts values to instead be lowered via naive scalarization or, if scalarization is infeasible (i.e. scalable vectors), to have vectorization aborted entirely.

I also don't trust the code to bail out correctly 100% of the time, so it's possible we had a crash or miscompile from trying to scalarize something which isn't scalarizable. I haven't found a concrete example here, but I am suspicious.
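
The inconsistency described in the summary can be sketched with a toy model. This is a minimal illustration and not the LLVM implementation: `ToyMemOp` and its fields are hypothetical stand-ins for the queries the cost model makes, and only the load/store path is modeled.

```cpp
#include <cassert>

// Hypothetical, simplified facts about one load or store.
struct ToyMemOp {
  bool BlockNeedsPredicationForAnyReason; // e.g. tail folding is active
  bool MaskRequired;                      // stands in for Legal->isMaskRequired(I)
  bool UniformAndSafeToExecute;           // the uniform mem op exemption applies
  bool HasLegalMaskedLowering;            // masked load/store or gather/scatter
};

// Does the operation need predication at all? For loads and stores the
// answer is the same before and after the patch.
bool isPredicatedInst(const ToyMemOp &Op) {
  if (!Op.BlockNeedsPredicationForAnyReason)
    return false;
  if (!Op.MaskRequired)
    return false;
  if (Op.UniformAndSafeToExecute)
    return false;
  return true;
}

// Before the patch: the uniform exemption was never consulted here.
bool isScalarWithPredicationOld(const ToyMemOp &Op) {
  if (!Op.BlockNeedsPredicationForAnyReason)
    return false;
  if (!Op.MaskRequired)
    return false;
  return !Op.HasLegalMaskedLowering;
}

// After the patch: anything that is not predicated cannot be
// "scalar with predication".
bool isScalarWithPredicationNew(const ToyMemOp &Op) {
  if (!isPredicatedInst(Op))
    return false;
  return !Op.HasLegalMaskedLowering;
}
```

For a uniform op in a tail-folded loop with no legal masked lowering (`ToyMemOp{true, true, true, false}`), the old routine answers true while `isPredicatedInst` answers false. The new routine restores the superset relationship between the two predicates.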

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

reames created this revision. Aug 3 2022, 12:36 PM

Herald added a project: Restricted Project. Aug 3 2022, 12:36 PM

Herald added subscribers: frasercrmck, luismarques, apazos and 22 others.

reames requested review of this revision. Aug 3 2022, 12:36 PM

Herald added a project: Restricted Project. Aug 3 2022, 12:36 PM

Herald added subscribers: alextsao1999, pcwang-thead, MaskRay.

Harbormaster completed remote builds in B179089: Diff 449738. Aug 3 2022, 1:50 PM

reames added a child revision: D131118: [LV] Add generic scalarization support for unpredicated scalable vectors. Aug 3 2022, 2:54 PM

david-arm added inline comments. Aug 11 2022, 12:51 AM

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
1438–1441	nit: I think it should be 'instruction'
llvm/test/Transforms/LoopVectorize/RISCV/uniform-load-store.ll
585	I tried vectorising these tests with SVE (`opt -loop-vectorize -mattr=+sve -mtriple=aarch64-linux-gnu < ../llvm/test/Transforms/LoopVectorize/RISCV/uniform-load-store.ll -S`) and it already works today without this patch. In fact it works both with and without tail-folding enabled. Do you know why RISC-V is different to SVE and why this patch is needed to vectorise?

Matt added a subscriber: Matt. Aug 12 2022, 12:36 PM

reames added inline comments. Aug 15 2022, 9:00 AM

llvm/test/Transforms/LoopVectorize/RISCV/uniform-load-store.ll
585	This is just a guess, but... aarch64 and riscv have different sets of legal scatters and gathers. The widening cost currently bails for illegal scatters we don't know we can scalarize. If aarch64 was picking scatter, but then optimizing to a scalarization, we'd see something like this. This doesn't explain why aarch64 is able to handle unaligned uniform loads and riscv isn't though. I didn't dig into that in detail. This type of difference isn't particularly surprising though. This code is delicate and the same result can be reached through multiple event chains. That doesn't mean we shouldn't fix issues when we find them.

fix typo

Harbormaster completed remote builds in B181303: Diff 452699. Aug 15 2022, 10:49 AM

This patch looks sensible to me! I just had one question about one of the tests.

llvm/test/Transforms/LoopVectorize/RISCV/uniform-load-store.ll
585	That seems like a good explanation. I imagined it probably had something to do with gathers/scatters.
llvm/test/Transforms/SLPVectorizer/RISCV/load-store.ll
65 ↗	(On Diff #452699)	I'm not sure why this test is part of this patch, since it's related to SLP?

reames added inline comments. Aug 16 2022, 7:55 AM

llvm/test/Transforms/SLPVectorizer/RISCV/load-store.ll
65 ↗	(On Diff #452699)	This was a bad rebase. Will remove.

Drop a spurious test change accidentally added in a rebase.

Harbormaster completed remote builds in B181556: Diff 453045. Aug 16 2022, 10:03 AM

LGTM!

This revision is now accepted and ready to land. Aug 18 2022, 5:41 AM

LGTM, thanks!

This revision was landed with ongoing or failed builds. Aug 18 2022, 7:14 AM

Closed by commit rG531dd3634dd1: [LV] Restructure isPredicatedInst and isScalarWithPredication (w/a fix for… (authored by reames).

This revision was automatically updated to reflect the committed changes.

reames added a commit: rG531dd3634dd1: [LV] Restructure isPredicatedInst and isScalarWithPredication (w/a fix for….

Revision Contents

Path	Size
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp	88 lines
llvm/test/Transforms/LoopVectorize/RISCV/uniform-load-store.ll	78 lines
llvm/test/Transforms/LoopVectorize/first-order-recurrence-sink-replicate-region.ll	2 lines
llvm/test/Transforms/LoopVectorize/pr46525-expander-insertpoint.ll	1 line
llvm/test/Transforms/LoopVectorize/vplan-sink-scalars-and-merge.ll	2 lines

Diff 453654

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp

[... 1,429 lines not shown ...] public:
   /// variables found for the given VF.
   bool canVectorizeReductions(ElementCount VF) const {
     return (all_of(Legal->getReductionVars(), [&](auto &Reduction) -> bool {
       const RecurrenceDescriptor &RdxDesc = Reduction.second;
       return TTI.isLegalToVectorizeReduction(RdxDesc, VF);
     }));
   }

-  /// Returns true if \p I is an instruction that will be scalarized with
-  /// predication when vectorizing \p I with vectorization factor \p VF. Such
-  /// instructions include conditional stores and instructions that may divide
-  /// by zero.
+  /// Returns true if \p I is an instruction which requires predication and
+  /// for which our chosen predication strategy is scalarization (i.e. we
+  /// don't have an alternate strategy such as masking available).
+  /// \p VF is the vectorization factor that will be used to vectorize \p I.
   bool isScalarWithPredication(Instruction *I, ElementCount VF) const;

-  // Returns true if \p I is an instruction that will be predicated either
-  // through scalar predication or masked load/store or masked gather/scatter.
-  // \p VF is the vectorization factor that will be used to vectorize \p I.
-  // Superset of instructions that return true for isScalarWithPredication.
+  /// Returns true if \p I is an instruction that needs to be predicated
+  /// at runtime. The result is independent of the predication mechanism.
+  /// \p VF is the vectorization factor that will be used to vectorize \p I.
+  /// Superset of instructions that return true for isScalarWithPredication.
   bool isPredicatedInst(Instruction *I, ElementCount VF) const;

   /// Returns true if \p I is a memory instruction with consecutive memory
   /// access that can be widened.
   bool
   memoryInstructionCanBeWidened(Instruction *I,
                                 ElementCount VF = ElementCount::getFixed(1));

[... 2,951 lines not shown ...] LLVM_DEBUG(dbgs() << "LV: Found scalar instruction: " << *IndUpdate
                       << "\n");
   }

   Scalars[VF].insert(Worklist.begin(), Worklist.end());
 }

 bool LoopVectorizationCostModel::isScalarWithPredication(
     Instruction *I, ElementCount VF) const {
-  if (!blockNeedsPredicationForAnyReason(I->getParent()))
+  if (!isPredicatedInst(I, VF))
     return false;

+  // Do we have a non-scalar lowering for this predicated
+  // instruction? No - it is scalar with predication.
   switch(I->getOpcode()) {
   default:
-    break;
+    return true;
   case Instruction::Load:
   case Instruction::Store: {
-    if (!Legal->isMaskRequired(I))
-      return false;
     auto *Ptr = getLoadStorePointerOperand(I);
     auto *Ty = getLoadStoreType(I);
     Type *VTy = Ty;
     if (VF.isVector())
       VTy = VectorType::get(Ty, VF);
     const Align Alignment = getLoadStoreAlignment(I);
     return isa<LoadInst>(I) ? !(isLegalMaskedLoad(Ty, Ptr, Alignment) ||
                                 TTI.isLegalMaskedGather(VTy, Alignment))
                             : !(isLegalMaskedStore(Ty, Ptr, Alignment) ||
                                 TTI.isLegalMaskedScatter(VTy, Alignment));
   }
-  case Instruction::UDiv:
-  case Instruction::SDiv:
-  case Instruction::SRem:
-  case Instruction::URem:
-    // TODO: We can use the loop-preheader as context point here and get
-    // context sensitive reasoning
-    return !isSafeToSpeculativelyExecute(I);
   }
-  return false;
 }

 bool LoopVectorizationCostModel::isPredicatedInst(Instruction *I,
                                                   ElementCount VF) const {
+  if (!blockNeedsPredicationForAnyReason(I->getParent()))
+    return false;
+
+  // Can we prove this instruction is safe to unconditionally execute?
+  // If not, we must use some form of predication.
+  switch(I->getOpcode()) {
+  default:
+    return false;
+  case Instruction::Load:
+  case Instruction::Store: {
+    if (!Legal->isMaskRequired(I))
+      return false;
     // When we know the load's address is loop invariant and the instruction
     // in the original scalar loop was unconditionally executed then we
     // don't need to mark it as a predicated instruction. Tail folding may
     // introduce additional predication, but we're guaranteed to always have
     // at least one active lane. We call Legal->blockNeedsPredication here
     // because it doesn't query tail-folding. For stores, we need to prove
     // both speculation safety (which follows from the same argument as loads),
     // but also must prove the value being stored is correct. The easiest
     // form of the later is to require that all values stored are the same.
     if (Legal->isUniformMemOp(*I) &&
         (isa<LoadInst>(I) ||
          (isa<StoreInst>(I) &&
           TheLoop->isLoopInvariant(cast<StoreInst>(I)->getValueOperand()))) &&
         !Legal->blockNeedsPredication(I->getParent()))
       return false;
-  if (!blockNeedsPredicationForAnyReason(I->getParent()))
-    return false;
-  // Loads and stores that need some form of masked operation are predicated
-  // instructions.
-  if (isa<LoadInst>(I) || isa<StoreInst>(I))
-    return Legal->isMaskRequired(I);
-  return isScalarWithPredication(I, VF);
+    return true;
+  }
+  case Instruction::UDiv:
+  case Instruction::SDiv:
+  case Instruction::SRem:
+  case Instruction::URem:
+    // TODO: We can use the loop-preheader as context point here and get
+    // context sensitive reasoning
+    return !isSafeToSpeculativelyExecute(I);
+  }
 }

 bool LoopVectorizationCostModel::interleavedAccessCanBeWidened(
     Instruction *I, ElementCount VF) {
   assert(isAccessInterleaved(I) && "Expecting interleaved access.");
   assert(getWideningDecision(I, VF) == CM_Unknown &&
          "Decision should not be set yet.");
   auto *Group = getInterleavedAccessGroup(I);
[... 1,587 lines not shown ...] bool LoopVectorizationCostModel::useEmulatedMaskMemRefHack(Instruction *I,
   // TODO: Cost model for emulated masked load/store is completely
   // broken. This hack guides the cost model to use an artificially
   // high enough value to practically disable vectorization with such
   // operations, except where previously deployed legality hack allowed
   // using very low cost values. This is to avoid regressions coming simply
   // from moving "masked load/store" check from legality to cost model.
   // Masked Load/Gather emulation was previously never allowed.
   // Limited number of Masked Store/Scatter emulation was allowed.
-  assert((isPredicatedInst(I, VF) || Legal->isUniformMemOp(*I)) &&
+  assert((isPredicatedInst(I, VF)) &&
          "Expecting a scalar emulated instruction");
   return isa<LoadInst>(I) ||
          (isa<StoreInst>(I) &&
           NumPredStores > NumberOfStoresToPredicate);
 }

 void LoopVectorizationCostModel::collectInstsToScalarize(ElementCount VF) {
   // If we aren't vectorizing the loop, or if we've already collected the
[... 4,511 lines not shown ...]
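
The two-step structure of the restructured routines (first decide whether predication is needed, then decide whether a non-scalar lowering exists) can be sketched as a small classifier. This is a simplified illustration under stated assumptions: `ToyOpcode`, `ToyLowering`, `ToyTarget` and `classify` are hypothetical names, and the legality flags stand in for the `isLegalMasked*` queries in the diff. It also shows how two targets with different masked gather/scatter legality can classify the same predicated load differently, which is consistent with (but does not confirm) the guess made in the review thread.

```cpp
#include <cassert>

enum class ToyOpcode { Load, Store, UDiv, Add };
enum class ToyLowering { Unpredicated, Masked, ScalarWithPredication };

// Hypothetical per-target legality answers.
struct ToyTarget {
  bool LegalMaskedLoad;
  bool LegalMaskedStore;
  bool LegalMaskedGather;
  bool LegalMaskedScatter;
};

// NeedsPredication plays the role of isPredicatedInst(I, VF).
ToyLowering classify(ToyOpcode Opc, bool NeedsPredication, const ToyTarget &T) {
  if (!NeedsPredication)
    return ToyLowering::Unpredicated;
  switch (Opc) {
  case ToyOpcode::Load:
    return (T.LegalMaskedLoad || T.LegalMaskedGather)
               ? ToyLowering::Masked
               : ToyLowering::ScalarWithPredication;
  case ToyOpcode::Store:
    return (T.LegalMaskedStore || T.LegalMaskedScatter)
               ? ToyLowering::Masked
               : ToyLowering::ScalarWithPredication;
  default:
    // No masked form is modeled for other opcodes, so a predicated
    // instruction of any other kind is scalarized.
    return ToyLowering::ScalarWithPredication;
  }
}
```

With a target where only gathers and scatters are legal, a predicated load is classified as `Masked`. With a target where no masked form is legal, the same load is `ScalarWithPredication`.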

llvm/test/Transforms/LoopVectorize/RISCV/uniform-load-store.ll

[... 576 lines not shown ...]
 ; FIXEDLEN-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds i64, ptr [[A]], i64 [[IV]]
 ; FIXEDLEN-NEXT: store i64 [[V]], ptr [[ARRAYIDX]], align 8
 ; FIXEDLEN-NEXT: [[IV_NEXT]] = add nuw nsw i64 [[IV]], 1
 ; FIXEDLEN-NEXT: [[EXITCOND_NOT:%.*]] = icmp eq i64 [[IV_NEXT]], 1024
 ; FIXEDLEN-NEXT: br i1 [[EXITCOND_NOT]], label [[FOR_END]], label [[FOR_BODY]], !llvm.loop [[LOOP9:![0-9]+]]
 ; FIXEDLEN: for.end:
 ; FIXEDLEN-NEXT: ret void
 ;
 ; TF-SCALABLE-LABEL: @uniform_load_unaligned(
 ; TF-SCALABLE-NEXT: entry:
+; TF-SCALABLE-NEXT: [[TMP0:%.*]] = call i64 @llvm.vscale.i64()
+; TF-SCALABLE-NEXT: [[TMP1:%.*]] = icmp ult i64 -1025, [[TMP0]]
+; TF-SCALABLE-NEXT: br i1 [[TMP1]], label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
+; TF-SCALABLE: vector.ph:
+; TF-SCALABLE-NEXT: [[TMP2:%.*]] = call i64 @llvm.vscale.i64()
+; TF-SCALABLE-NEXT: [[TMP3:%.*]] = call i64 @llvm.vscale.i64()
+; TF-SCALABLE-NEXT: [[TMP4:%.*]] = sub i64 [[TMP3]], 1
+; TF-SCALABLE-NEXT: [[N_RND_UP:%.*]] = add i64 1024, [[TMP4]]
+; TF-SCALABLE-NEXT: [[N_MOD_VF:%.*]] = urem i64 [[N_RND_UP]], [[TMP2]]
+; TF-SCALABLE-NEXT: [[N_VEC:%.*]] = sub i64 [[N_RND_UP]], [[N_MOD_VF]]
+; TF-SCALABLE-NEXT: br label [[VECTOR_BODY:%.*]]
+; TF-SCALABLE: vector.body:
+; TF-SCALABLE-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
+; TF-SCALABLE-NEXT: [[TMP5:%.*]] = add i64 [[INDEX]], 0
+; TF-SCALABLE-NEXT: [[ACTIVE_LANE_MASK:%.*]] = call <vscale x 1 x i1> @llvm.get.active.lane.mask.nxv1i1.i64(i64 [[TMP5]], i64 1024)
+; TF-SCALABLE-NEXT: [[TMP6:%.*]] = load i64, ptr [[B:%.*]], align 1
+; TF-SCALABLE-NEXT: [[BROADCAST_SPLATINSERT:%.*]] = insertelement <vscale x 1 x i64> poison, i64 [[TMP6]], i32 0
+; TF-SCALABLE-NEXT: [[BROADCAST_SPLAT:%.*]] = shufflevector <vscale x 1 x i64> [[BROADCAST_SPLATINSERT]], <vscale x 1 x i64> poison, <vscale x 1 x i32> zeroinitializer
+; TF-SCALABLE-NEXT: [[TMP7:%.*]] = getelementptr inbounds i64, ptr [[A:%.*]], i64 [[TMP5]]
+; TF-SCALABLE-NEXT: [[TMP8:%.*]] = getelementptr inbounds i64, ptr [[TMP7]], i32 0
+; TF-SCALABLE-NEXT: call void @llvm.masked.store.nxv1i64.p0(<vscale x 1 x i64> [[BROADCAST_SPLAT]], ptr [[TMP8]], i32 8, <vscale x 1 x i1> [[ACTIVE_LANE_MASK]])
+; TF-SCALABLE-NEXT: [[TMP9:%.*]] = call i64 @llvm.vscale.i64()
+; TF-SCALABLE-NEXT: [[INDEX_NEXT]] = add i64 [[INDEX]], [[TMP9]]
+; TF-SCALABLE-NEXT: [[TMP10:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]
+; TF-SCALABLE-NEXT: br i1 [[TMP10]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP4:![0-9]+]]
+; TF-SCALABLE: middle.block:
+; TF-SCALABLE-NEXT: br i1 true, label [[FOR_END:%.*]], label [[SCALAR_PH]]
+; TF-SCALABLE: scalar.ph:
+; TF-SCALABLE-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ [[N_VEC]], [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]
 ; TF-SCALABLE-NEXT: br label [[FOR_BODY:%.*]]
 ; TF-SCALABLE: for.body:
-; TF-SCALABLE-NEXT: [[IV:%.*]] = phi i64 [ 0, [[ENTRY:%.*]] ], [ [[IV_NEXT:%.*]], [[FOR_BODY]] ]
-; TF-SCALABLE-NEXT: [[V:%.*]] = load i64, ptr [[B:%.*]], align 1
-; TF-SCALABLE-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds i64, ptr [[A:%.*]], i64 [[IV]]
+; TF-SCALABLE-NEXT: [[IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[IV_NEXT:%.*]], [[FOR_BODY]] ]
+; TF-SCALABLE-NEXT: [[V:%.*]] = load i64, ptr [[B]], align 1
+; TF-SCALABLE-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds i64, ptr [[A]], i64 [[IV]]
 ; TF-SCALABLE-NEXT: store i64 [[V]], ptr [[ARRAYIDX]], align 8
 ; TF-SCALABLE-NEXT: [[IV_NEXT]] = add nuw nsw i64 [[IV]], 1
 ; TF-SCALABLE-NEXT: [[EXITCOND_NOT:%.*]] = icmp eq i64 [[IV_NEXT]], 1024
-; TF-SCALABLE-NEXT: br i1 [[EXITCOND_NOT]], label [[FOR_END:%.*]], label [[FOR_BODY]]
+; TF-SCALABLE-NEXT: br i1 [[EXITCOND_NOT]], label [[FOR_END]], label [[FOR_BODY]], !llvm.loop [[LOOP5:![0-9]+]]
 ; TF-SCALABLE: for.end:
 ; TF-SCALABLE-NEXT: ret void
 ;
 ; TF-FIXEDLEN-LABEL: @uniform_load_unaligned(
 ; TF-FIXEDLEN-NEXT: entry:
 ; TF-FIXEDLEN-NEXT: br i1 false, label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
 ; TF-FIXEDLEN: vector.ph:
 ; TF-FIXEDLEN-NEXT: br label [[VECTOR_BODY:%.*]]
[... 145 lines not shown ...]
 ; TF-SCALABLE-NEXT: [[ACTIVE_LANE_MASK:%.*]] = call <vscale x 1 x i1> @llvm.get.active.lane.mask.nxv1i1.i64(i64 [[TMP5]], i64 1024)
 ; TF-SCALABLE-NEXT: store i64 [[V]], ptr [[B:%.*]], align 8
 ; TF-SCALABLE-NEXT: [[TMP6:%.*]] = getelementptr inbounds i64, ptr [[A:%.*]], i64 [[TMP5]]
 ; TF-SCALABLE-NEXT: [[TMP7:%.*]] = getelementptr inbounds i64, ptr [[TMP6]], i32 0
 ; TF-SCALABLE-NEXT: call void @llvm.masked.store.nxv1i64.p0(<vscale x 1 x i64> [[BROADCAST_SPLAT]], ptr [[TMP7]], i32 8, <vscale x 1 x i1> [[ACTIVE_LANE_MASK]])
 ; TF-SCALABLE-NEXT: [[TMP8:%.*]] = call i64 @llvm.vscale.i64()
 ; TF-SCALABLE-NEXT: [[INDEX_NEXT]] = add i64 [[INDEX]], [[TMP8]]
 ; TF-SCALABLE-NEXT: [[TMP9:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]
-; TF-SCALABLE-NEXT: br i1 [[TMP9]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP4:![0-9]+]]
+; TF-SCALABLE-NEXT: br i1 [[TMP9]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP6:![0-9]+]]
 ; TF-SCALABLE: middle.block:
 ; TF-SCALABLE-NEXT: br i1 true, label [[FOR_END:%.*]], label [[SCALAR_PH]]
 ; TF-SCALABLE: scalar.ph:
 ; TF-SCALABLE-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ [[N_VEC]], [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]
 ; TF-SCALABLE-NEXT: br label [[FOR_BODY:%.*]]
 ; TF-SCALABLE: for.body:
 ; TF-SCALABLE-NEXT: [[IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[IV_NEXT:%.*]], [[FOR_BODY]] ]
 ; TF-SCALABLE-NEXT: store i64 [[V]], ptr [[B]], align 8
 ; TF-SCALABLE-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds i64, ptr [[A]], i64 [[IV]]
 ; TF-SCALABLE-NEXT: store i64 [[V]], ptr [[ARRAYIDX]], align 8
 ; TF-SCALABLE-NEXT: [[IV_NEXT]] = add nuw nsw i64 [[IV]], 1
 ; TF-SCALABLE-NEXT: [[EXITCOND_NOT:%.*]] = icmp eq i64 [[IV_NEXT]], 1024
-; TF-SCALABLE-NEXT: br i1 [[EXITCOND_NOT]], label [[FOR_END]], label [[FOR_BODY]], !llvm.loop [[LOOP5:![0-9]+]]
+; TF-SCALABLE-NEXT: br i1 [[EXITCOND_NOT]], label [[FOR_END]], label [[FOR_BODY]], !llvm.loop [[LOOP7:![0-9]+]]
 ; TF-SCALABLE: for.end:
 ; TF-SCALABLE-NEXT: ret void
 ;
 ; TF-FIXEDLEN-LABEL: @uniform_store(
 ; TF-FIXEDLEN-NEXT: entry:
 ; TF-FIXEDLEN-NEXT: br i1 false, label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
 ; TF-FIXEDLEN: vector.ph:
 ; TF-FIXEDLEN-NEXT: [[BROADCAST_SPLATINSERT:%.*]] = insertelement <2 x i64> poison, i64 [[V:%.*]], i32 0
[... 407 lines not shown ...]
 ; FIXEDLEN-NEXT: [[IV_NEXT]] = add nuw nsw i64 [[IV]], 1
 ; FIXEDLEN-NEXT: [[EXITCOND_NOT:%.*]] = icmp eq i64 [[IV_NEXT]], 1024
 ; FIXEDLEN-NEXT: br i1 [[EXITCOND_NOT]], label [[FOR_END]], label [[FOR_BODY]], !llvm.loop [[LOOP17:![0-9]+]]
 ; FIXEDLEN: for.end:
 ; FIXEDLEN-NEXT: ret void
 ;
 ; TF-SCALABLE-LABEL: @uniform_store_unaligned(
 ; TF-SCALABLE-NEXT: entry:
+; TF-SCALABLE-NEXT: [[TMP0:%.*]] = call i64 @llvm.vscale.i64()
+; TF-SCALABLE-NEXT: [[TMP1:%.*]] = icmp ult i64 -1025, [[TMP0]]
+; TF-SCALABLE-NEXT: br i1 [[TMP1]], label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
+; TF-SCALABLE: vector.ph:
+; TF-SCALABLE-NEXT: [[TMP2:%.*]] = call i64 @llvm.vscale.i64()
+; TF-SCALABLE-NEXT: [[TMP3:%.*]] = call i64 @llvm.vscale.i64()
+; TF-SCALABLE-NEXT: [[TMP4:%.*]] = sub i64 [[TMP3]], 1
+; TF-SCALABLE-NEXT: [[N_RND_UP:%.*]] = add i64 1024, [[TMP4]]
+; TF-SCALABLE-NEXT: [[N_MOD_VF:%.*]] = urem i64 [[N_RND_UP]], [[TMP2]]
+; TF-SCALABLE-NEXT: [[N_VEC:%.*]] = sub i64 [[N_RND_UP]], [[N_MOD_VF]]
+; TF-SCALABLE-NEXT: [[BROADCAST_SPLATINSERT:%.*]] = insertelement <vscale x 1 x i64> poison, i64 [[V:%.*]], i32 0
+; TF-SCALABLE-NEXT: [[BROADCAST_SPLAT:%.*]] = shufflevector <vscale x 1 x i64> [[BROADCAST_SPLATINSERT]], <vscale x 1 x i64> poison, <vscale x 1 x i32> zeroinitializer
+; TF-SCALABLE-NEXT: br label [[VECTOR_BODY:%.*]]
+; TF-SCALABLE: vector.body:
+; TF-SCALABLE-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
+; TF-SCALABLE-NEXT: [[TMP5:%.*]] = add i64 [[INDEX]], 0
+; TF-SCALABLE-NEXT: [[ACTIVE_LANE_MASK:%.*]] = call <vscale x 1 x i1> @llvm.get.active.lane.mask.nxv1i1.i64(i64 [[TMP5]], i64 1024)
+; TF-SCALABLE-NEXT: store i64 [[V]], ptr [[B:%.*]], align 1
+; TF-SCALABLE-NEXT: [[TMP6:%.*]] = getelementptr inbounds i64, ptr [[A:%.*]], i64 [[TMP5]]
+; TF-SCALABLE-NEXT: [[TMP7:%.*]] = getelementptr inbounds i64, ptr [[TMP6]], i32 0
+; TF-SCALABLE-NEXT: call void @llvm.masked.store.nxv1i64.p0(<vscale x 1 x i64> [[BROADCAST_SPLAT]], ptr [[TMP7]], i32 8, <vscale x 1 x i1> [[ACTIVE_LANE_MASK]])
+; TF-SCALABLE-NEXT: [[TMP8:%.*]] = call i64 @llvm.vscale.i64()
+; TF-SCALABLE-NEXT: [[INDEX_NEXT]] = add i64 [[INDEX]], [[TMP8]]
+; TF-SCALABLE-NEXT: [[TMP9:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]
+; TF-SCALABLE-NEXT: br i1 [[TMP9]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP8:![0-9]+]]
+; TF-SCALABLE: middle.block:
+; TF-SCALABLE-NEXT: br i1 true, label [[FOR_END:%.*]], label [[SCALAR_PH]]
+; TF-SCALABLE: scalar.ph:
+; TF-SCALABLE-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ [[N_VEC]], [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]
 ; TF-SCALABLE-NEXT: br label [[FOR_BODY:%.*]]
 ; TF-SCALABLE: for.body:
-; TF-SCALABLE-NEXT: [[IV:%.*]] = phi i64 [ 0, [[ENTRY:%.*]] ], [ [[IV_NEXT:%.*]], [[FOR_BODY]] ]
-; TF-SCALABLE-NEXT: store i64 [[V:%.*]], ptr [[B:%.*]], align 1
-; TF-SCALABLE-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds i64, ptr [[A:%.*]], i64 [[IV]]
+; TF-SCALABLE-NEXT: [[IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[IV_NEXT:%.*]], [[FOR_BODY]] ]
+; TF-SCALABLE-NEXT: store i64 [[V]], ptr [[B]], align 1
+; TF-SCALABLE-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds i64, ptr [[A]], i64 [[IV]]
 ; TF-SCALABLE-NEXT: store i64 [[V]], ptr [[ARRAYIDX]], align 8
 ; TF-SCALABLE-NEXT: [[IV_NEXT]] = add nuw nsw i64 [[IV]], 1
 ; TF-SCALABLE-NEXT: [[EXITCOND_NOT:%.*]] = icmp eq i64 [[IV_NEXT]], 1024
-; TF-SCALABLE-NEXT: br i1 [[EXITCOND_NOT]], label [[FOR_END:%.*]], label [[FOR_BODY]]
+; TF-SCALABLE-NEXT: br i1 [[EXITCOND_NOT]], label [[FOR_END]], label [[FOR_BODY]], !llvm.loop [[LOOP9:![0-9]+]]
 ; TF-SCALABLE: for.end:
 ; TF-SCALABLE-NEXT: ret void
 ;
 ; TF-FIXEDLEN-LABEL: @uniform_store_unaligned(
 ; TF-FIXEDLEN-NEXT: entry:
 ; TF-FIXEDLEN-NEXT: br i1 false, label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
 ; TF-FIXEDLEN: vector.ph:
 ; TF-FIXEDLEN-NEXT: [[BROADCAST_SPLATINSERT:%.*]] = insertelement <2 x i64> poison, i64 [[V:%.*]], i32 0
[... 44 lines not shown ...]
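
The `N_RND_UP`, `N_MOD_VF` and `N_VEC` values in the TF-SCALABLE checks round the trip count (1024) up to a multiple of the vector step, so that the tail-folded vector loop covers every iteration. A minimal sketch of that arithmetic follows. The function name `roundedVectorTripCount` is hypothetical, the step is assumed to be non-zero, and the addition is assumed not to overflow. In these checks the step appears to be the runtime value returned by `@llvm.vscale.i64()`.

```cpp
#include <cassert>
#include <cstdint>

// Round TripCount up to the next multiple of Step, mirroring the three
// values named in the CHECK lines.
uint64_t roundedVectorTripCount(uint64_t TripCount, uint64_t Step) {
  uint64_t NRndUp = TripCount + (Step - 1); // N_RND_UP
  uint64_t NModVF = NRndUp % Step;          // N_MOD_VF
  return NRndUp - NModVF;                   // N_VEC
}
```

For example, a trip count of 10 with a step of 4 rounds up to 12. The extra lanes in the final iteration are the ones the active lane mask is expected to disable.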

llvm/test/Transforms/LoopVectorize/first-order-recurrence-sink-replicate-region.ll

	Show First 20 Lines • Show All 467 Lines • ▼ Show 20 Lines
	; CHECK-NEXT: EMIT vp<[[CAN_IV:%.+]]> = CANONICAL-INDUCTION			; CHECK-NEXT: EMIT vp<[[CAN_IV:%.+]]> = CANONICAL-INDUCTION
	; CHECK-NEXT: FIRST-ORDER-RECURRENCE-PHI ir<%.pn> = phi ir<0>, ir<[[L:%.+]]>			; CHECK-NEXT: FIRST-ORDER-RECURRENCE-PHI ir<%.pn> = phi ir<0>, ir<[[L:%.+]]>
	; CHECK-NEXT: vp<[[SCALAR_STEPS:%.+]]> = SCALAR-STEPS vp<[[CAN_IV]]>, ir<2>, ir<1>			; CHECK-NEXT: vp<[[SCALAR_STEPS:%.+]]> = SCALAR-STEPS vp<[[CAN_IV]]>, ir<2>, ir<1>
	; CHECK-NEXT: EMIT vp<[[WIDE_IV:%.+]]> = WIDEN-CANONICAL-INDUCTION vp<[[CAN_IV]]>			; CHECK-NEXT: EMIT vp<[[WIDE_IV:%.+]]> = WIDEN-CANONICAL-INDUCTION vp<[[CAN_IV]]>
	; CHECK-NEXT: EMIT vp<[[CMP:%.+]]> = icmp ule vp<[[WIDE_IV]]> vp<[[BTC]]>			; CHECK-NEXT: EMIT vp<[[CMP:%.+]]> = icmp ule vp<[[WIDE_IV]]> vp<[[BTC]]>
	; CHECK-NEXT: Successor(s): loop.0			; CHECK-NEXT: Successor(s): loop.0
	; CHECK-EMPTY:			; CHECK-EMPTY:
	; CHECK-NEXT: loop.0:			; CHECK-NEXT: loop.0:
	; CHECK-NEXT: REPLICATE ir<[[L]]> = load ir<%src>			; CHECK-NEXT: CLONE ir<[[L]]> = load ir<%src>
	; CHECK-NEXT: EMIT vp<[[SPLICE:%.+]]> = first-order splice ir<%.pn> ir<[[L]]>			; CHECK-NEXT: EMIT vp<[[SPLICE:%.+]]> = first-order splice ir<%.pn> ir<[[L]]>
	; CHECK-NEXT: Successor(s): loop.0.split			; CHECK-NEXT: Successor(s): loop.0.split
	; CHECK-EMPTY:			; CHECK-EMPTY:
	; CHECK-NEXT: loop.0.split:			; CHECK-NEXT: loop.0.split:
	; CHECK-NEXT: Successor(s): pred.store			; CHECK-NEXT: Successor(s): pred.store
	; CHECK-EMPTY:			; CHECK-EMPTY:
	; CHECK-NEXT: <xVFxUF> pred.store: {			; CHECK-NEXT: <xVFxUF> pred.store: {
	; CHECK-NEXT: pred.store.entry:			; CHECK-NEXT: pred.store.entry:

llvm/test/Transforms/LoopVectorize/pr46525-expander-insertpoint.ll

	; CHECK-NEXT: [[N_RND_UP:%.*]] = add i64 [[TMP2]], 1			; CHECK-NEXT: [[N_RND_UP:%.*]] = add i64 [[TMP2]], 1
	; CHECK-NEXT: [[N_MOD_VF:%.*]] = urem i64 [[N_RND_UP]], 2			; CHECK-NEXT: [[N_MOD_VF:%.*]] = urem i64 [[N_RND_UP]], 2
	; CHECK-NEXT: [[N_VEC:%.*]] = sub i64 [[N_RND_UP]], [[N_MOD_VF]]			; CHECK-NEXT: [[N_VEC:%.*]] = sub i64 [[N_RND_UP]], [[N_MOD_VF]]
	; CHECK-NEXT: [[IND_END:%.*]] = mul i64 [[N_VEC]], [[INC]]			; CHECK-NEXT: [[IND_END:%.*]] = mul i64 [[N_VEC]], [[INC]]
	; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]			; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]
	; CHECK: vector.body:			; CHECK: vector.body:
	; CHECK-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]			; CHECK-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
	; CHECK-NEXT: store i32 0, i32* [[PTR:%.*]], align 4			; CHECK-NEXT: store i32 0, i32* [[PTR:%.*]], align 4
	; CHECK-NEXT: store i32 0, i32* [[PTR]], align 4
	; CHECK-NEXT: [[INDEX_NEXT]] = add i64 [[INDEX]], 2			; CHECK-NEXT: [[INDEX_NEXT]] = add i64 [[INDEX]], 2
	; CHECK-NEXT: [[TMP3:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]			; CHECK-NEXT: [[TMP3:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]
	; CHECK-NEXT: br i1 [[TMP3]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP0:![0-9]+]]			; CHECK-NEXT: br i1 [[TMP3]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP0:![0-9]+]]
	; CHECK: middle.block:			; CHECK: middle.block:
	; CHECK-NEXT: br i1 true, label [[LOOP_EXIT:%.*]], label [[SCALAR_PH]]			; CHECK-NEXT: br i1 true, label [[LOOP_EXIT:%.*]], label [[SCALAR_PH]]
	; CHECK: scalar.ph:			; CHECK: scalar.ph:
	; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ [[IND_END]], [[MIDDLE_BLOCK]] ], [ 0, [[LOOP_PREHEADER]] ]			; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ [[IND_END]], [[MIDDLE_BLOCK]] ], [ 0, [[LOOP_PREHEADER]] ]
	; CHECK-NEXT: br label [[LOOP:%.*]]			; CHECK-NEXT: br label [[LOOP:%.*]]

llvm/test/Transforms/LoopVectorize/vplan-sink-scalars-and-merge.ll

	; CHECK-NEXT: <x1> vector loop: {			; CHECK-NEXT: <x1> vector loop: {
	; CHECK-NEXT: vector.body:			; CHECK-NEXT: vector.body:
	; CHECK-NEXT: EMIT vp<[[CAN_IV:%.+]]> = CANONICAL-INDUCTION			; CHECK-NEXT: EMIT vp<[[CAN_IV:%.+]]> = CANONICAL-INDUCTION
	; CHECK-NEXT: WIDEN-INDUCTION %iv = phi 21, %iv.next, ir<1>			; CHECK-NEXT: WIDEN-INDUCTION %iv = phi 21, %iv.next, ir<1>
	; CHECK-NEXT: vp<[[STEPS:%.+]]> = SCALAR-STEPS vp<[[CAN_IV]]>, ir<21>, ir<1>			; CHECK-NEXT: vp<[[STEPS:%.+]]> = SCALAR-STEPS vp<[[CAN_IV]]>, ir<21>, ir<1>
	; CHECK-NEXT: EMIT vp<[[WIDE_CAN_IV:%.+]]> = WIDEN-CANONICAL-INDUCTION vp<[[CAN_IV]]>			; CHECK-NEXT: EMIT vp<[[WIDE_CAN_IV:%.+]]> = WIDEN-CANONICAL-INDUCTION vp<[[CAN_IV]]>
	; CHECK-NEXT: EMIT vp<[[MASK:%.+]]> = icmp ule vp<[[WIDE_CAN_IV]]> vp<[[BTC]]>			; CHECK-NEXT: EMIT vp<[[MASK:%.+]]> = icmp ule vp<[[WIDE_CAN_IV]]> vp<[[BTC]]>
	; CHECK-NEXT: CLONE ir<%gep.A.uniform> = getelementptr ir<%A>, ir<0>			; CHECK-NEXT: CLONE ir<%gep.A.uniform> = getelementptr ir<%A>, ir<0>
	; CHECK-NEXT: REPLICATE ir<%lv> = load ir<%gep.A.uniform>			; CHECK-NEXT: CLONE ir<%lv> = load ir<%gep.A.uniform>
	; CHECK-NEXT: WIDEN ir<%cmp> = icmp ir<%iv>, ir<%k>			; CHECK-NEXT: WIDEN ir<%cmp> = icmp ir<%iv>, ir<%k>
	; CHECK-NEXT: Successor(s): loop.then			; CHECK-NEXT: Successor(s): loop.then
	; CHECK-EMPTY:			; CHECK-EMPTY:
	; CHECK-NEXT: loop.then:			; CHECK-NEXT: loop.then:
	; CHECK-NEXT: EMIT vp<[[NOT2:%.+]]> = not ir<%cmp>			; CHECK-NEXT: EMIT vp<[[NOT2:%.+]]> = not ir<%cmp>
	; CHECK-NEXT: EMIT vp<[[MASK2:%.+]]> = select vp<[[MASK]]> vp<[[NOT2]]> ir<false>			; CHECK-NEXT: EMIT vp<[[MASK2:%.+]]> = select vp<[[MASK]]> vp<[[NOT2]]> ir<false>
	; CHECK-NEXT: Successor(s): pred.store			; CHECK-NEXT: Successor(s): pred.store
	; CHECK-EMPTY:			; CHECK-EMPTY: