In D146199 we check whether the loop trip count is known to be a power of 2 to determine whether the tail loop can be eliminated.
However, the remainder loop of a masked scalable loop can also be removed
if we know the mask is always going to be true for every vector iteration.
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:5180
This looks like a regression to me. If we know the mask is always going to be true for every vector iteration, then we should be using unpredicated loops instead, even if the target supports tail-folding. That's because creating and maintaining the loop predicate is always going to be a bit more expensive than maintaining a simple integer induction variable. I believe gcc's code in https://github.com/llvm/llvm-project/issues/63616 is worse than clang's, although perhaps it's worth testing this on AArch64 hardware to confirm?
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:5180
Thanks for your idea. I tested their performance, and it doesn't look significantly different on an AArch64 target (with a scalable vector length of 256).
llvm/test/Transforms/LoopVectorize/AArch64/eliminate-tail-predication.ll:34
I'm guessing that InstCombine does not determine this is guaranteed to always be true? However, I thought that someone did work in the DAGCombiner that will replace this with br i1 true, label [[EXIT:%.*]], label [[SCALAR_PH]] when vscale is known to be a power of 2? Are you hoping to benefit from eliminating the scalar tail in IR because it helps us to make better decisions later in the pipeline? I can imagine it's beneficial for LTO, where the scalar tail could prevent inlining. If I remember correctly, one of the problems with folding away the icmp in InstCombine is that it doesn't have access to the TTI interface, so we cannot query the target.
llvm/test/Transforms/LoopVectorize/X86/constant-fold.ll:30
For all the fixed-length vector tests this icmp will get replaced with "i1 true" by InstCombine, so the scalar tail should get automatically deleted.
llvm/test/Transforms/LoopVectorize/dont-fold-tail-for-divisible-TC.ll:4
Hmm, I know this is not introduced by your patch, but I don't think we should have tests with target specifics in the top-level LoopVectorize directory.
llvm/test/Transforms/LoopVectorize/AArch64/eliminate-tail-predication.ll:34
I may not have caught your idea. Are you saying that the current optimization needs to be handled in InstCombine?
llvm/test/Transforms/LoopVectorize/X86/constant-fold.ll:30
Yes, the remainder loop body will be deleted because the trip count is a multiple of VF, so it will never be executed.
llvm/test/Transforms/LoopVectorize/dont-fold-tail-for-divisible-TC.ll:4
Thanks.
llvm/test/Transforms/LoopVectorize/AArch64/eliminate-tail-predication.ll:34
Well, I'm just trying to understand what this patch is trying to achieve, that's all. I'm not against it, because it does clean up the IR generated by the vectoriser. However, I'm not sure I expect you to see many real performance gains from doing this, because we should delete the scalar tail during codegen. It might also be worth investigating whether or not InstCombine already optimises the urem calculation, similar to what this patch (https://reviews.llvm.org/D129609) did in codegen.
llvm/test/Transforms/LoopVectorize/AArch64/eliminate-tail-predication.ll:34
Yes, I think it's going to have benefits beyond code size in the following scenario.
BTW, the AArch64 backend already supports isVScaleKnownToBeAPowerOfTwo as of https://reviews.llvm.org/D146199.
llvm/test/Transforms/LoopVectorize/AArch64/eliminate-tail-predication.ll:34
Sure, you're absolutely right about the benefit of deleting the tail! I'm just trying to say that we should be deleting the tail already as part of codegen. If you look at the final assembly output for the loop, I hope that there is no tail, because the DAGCombiner + MachineDCE should have killed it off. So this patch is then about trying to delete the tail *even earlier* in IR. It's a nice cleanup, but I was thinking it would be nice if InstCombine could do this for us, because that would be even more powerful.
There are already function attributes for specifying the minimum and maximum vscale vector lengths through vscale_range. I think this should be expanded to include whether vscale is known to be a power of two. That would allow InstCombine and value tracking to properly reason about it in other situations too, as well as to remove the dead tails in cases like this.
That doesn't mean this patch isn't a good idea too; we already know the tail is unnecessary in the vectorizer, and removing it earlier will have benefits.
llvm/test/Transforms/LoopVectorize/AArch64/eliminate-tail-predication.ll:34
Do you know where to start in order to understand how the two passes mentioned above could possibly simplify the comparison result?
They would need to propagate the fact that vscale is a power of two within a given interval in order to simplify urem operations, similar to what is done by interval partition or scalar evolution. Do you think those two passes are powerful enough to do this kind of analysis?
llvm/test/Transforms/LoopVectorize/AArch64/eliminate-tail-predication.ll:34
*edit: I meant range analysis, not interval partition.
llvm/test/Transforms/LoopVectorize/AArch64/eliminate-tail-predication.ll:34
The compare is inserted in LoopVectorize.cpp:3255.