This is an archive of the discontinued LLVM Phabricator instance.

Paths

Table of Contents

llvm/
    lib/Transforms/Vectorize/
        LoopVectorize.cpp	60/61
        VPlanTransforms.cpp	21/23
    test/Transforms/LoopVectorize/
        AArch64/
            sve-tail-folding-forced.ll
            sve-tail-folding.ll
            tail-fold-uniform-memops.ll
        ARM/
            mve-gather-scatter-tailpred.ll
            mve-reduction-types.ll
        X86/
            constant-fold.ll
            optsize.ll
            pr34438.ll
            small-size.ll
            tail_loop_folding.ll
            vect.omp.force.small-tc.ll
        dont-fold-tail-for-const-TC.ll
        dont-fold-tail-for-divisible-TC.ll
        first-order-recurrence-sink-replicate-region.ll	2/2
        first-order-recurrence.ll
        pr44488-predication.ll
        pr45679-fold-tail-by-masking.ll	2/2
        pr46525-expander-insertpoint.ll
        pr51614-fold-tail-by-masking.ll
        select-reduction.ll
        tail-folding-vectorization-factor-1.ll

Differential D116123

[VPlan] Handle IV vector splat using VPWidenCanonicalIV.
ClosedPublic

Authored by fhahn on Dec 21 2021, 11:55 AM.

Details

Reviewers

Ayal
gilr
rengolin

Commits

rGefd4938723ef: [VPlan] Handle IV vector splat using VPWidenCanonicalIV.

Summary

This patch tries to use an existing VPWidenCanonicalIVRecipe
instead of creating another step-vector for canonical
induction recipes in widenIntOrFpInduction.

This has the following benefits:

First step to avoid setting both vector and scalar values for the same induction def.
Reducing complexity of widenIntOrFpInduction through making things more explicit in VPlan
Only need to splat the vector IV for block in masks.
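
As a rough illustration of what a widened canonical IV provides, the sketch below models in plain C++ (no LLVM types) how the lanes of one unrolled part can be derived from the scalar canonical IV: splat the scalar value, then add a step vector offset by the part. The function name `widenCanonicalIV` and its parameters are illustrative assumptions, not the LLVM API, and a fixed-width VF is assumed.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Toy model (not LLVM code): lanes of the widened canonical IV for one
// unrolled part. CanonicalIV is the scalar IV value at the start of the
// current vector iteration.
std::vector<uint64_t> widenCanonicalIV(uint64_t CanonicalIV, unsigned VF,
                                       unsigned Part) {
  assert(VF > 0 && "VF must be non-zero");
  std::vector<uint64_t> Lanes(VF);
  for (unsigned Lane = 0; Lane < VF; ++Lane) {
    // splat(CanonicalIV) + (Part * VF + <0, 1, ..., VF-1>)
    Lanes[Lane] = CanonicalIV + uint64_t(Part) * VF + Lane;
  }
  return Lanes;
}
```

With `VF = 1` each part degenerates to a single scalar step, which is the unroll-only case that comes up later in the review.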

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

There are a very large number of changes, so older changes are hidden.

Ayal added inline comments.Dec 26 2021, 9:51 AM

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
2576–2577	This last case which only emits a scalar IV can now be folded with the end of the previous case which emits both?
8403–8404	Could we simplify by always creating a VPWidenCanonicalIVRecipe, w/o trying to use Legal->getPrimaryInduction() even when it exists? At first FoldTail relied solely on PrimaryInduction, until D77635 extended it to work w/o PrimaryInduction; but it should probably have removed the dependence on PrimaryInduction altogether. This may also simplify/revert D92017. A subsequent optimization could try to fold multiple IV recipes, in case both the PrimaryInduction was widened and WidenCanonicalIV was created (or leave it to LSR).
8413–8414	TailFolded should always be true? (While we're here, irrespective of this patch)
8417	No need for {} So a StepVector is needed only to feed ICmpULE, not ActiveLaneMask which depends on first lane only? Seems like an optimization best applied separately? BTC is no longer used by ActiveLaneMask, so setting it should sink below to where ICmpULE needs it (While we're here, irrespective of this patch).
8428	StepVector needs to hold One as its second operand, mainly to convey the desired type?
8849	NeedsScalarInduction(VF) will always early-return true given that ShouldScalarizeInstruction(Instr, VF) is true?
9159	As a redundancy elimination optimization? This buildVPlanWithVPRecipes() method deserves outlining anything from it.
llvm/lib/Transforms/Vectorize/VPlan.h
856 ↗	(On Diff #395732)	Where are these new classof's needed?

fhahn mentioned this in rG511726c64d3b: [LV] Move getStepVector out of ILV (NFC)..Dec 26 2021, 12:18 PM

fhahn mentioned this in rG2e630eabd329: [LV] Sink BTC creation to actual use (NFC)..Dec 27 2021, 2:35 AM

Address latest comments, thanks!

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
2529–2530	I inlined the lambda, as it is a single use now. I also inlined the scalar path from `getStepVector`, which seems like it should have never been there in the first place.
2568–2569	removed
2576–2577	Done!
2576–2577	removed
8403–8404	Could we simplify by always creating a VPWidenCanonicalIVRecipe, w/o trying to use Legal->getPrimaryInduction() even when it exists? We could do, but it would probably mean a bigger test diff and a bigger functional change. I'd prefer to do it separately, to avoid any unexpected knock-on effects.
8413–8414	Yes, I noticed that as well. I'll push a commit, replacing it with an assert.
8417	No need for {} I think if the body spans multiple lines due to a comment, braces should not be omitted https://llvm.org/docs/CodingStandards.html#don-t-use-braces-on-simple-single-statement-bodies-of-if-else-loop-statements So a StepVector is needed only to feed ICmpULE, not ActiveLaneMask which depends on first lane only? Seems like an optimization best applied separately? Yes, let me try to split that off. `ActiveLaneMask` only needs the first scalar step of each `Part`. BTC is no longer used by ActiveLaneMask, so setting it should sink below to where ICmpULE needs it (While we're here, irrespective of this patch). done separately in 2e630eabd329
8428	At the moment only a step of one is used, but it will be used with different steps in the future (build scalar steps for inductions with arbitrary steps).
8849	I think it is required for the case where `Instr` is a truncate.
9159	Sounds good, moved to `VPlanTransforms::removeRedundantStepVector`
llvm/lib/Transforms/Vectorize/VPlan.cpp
741 ↗	(On Diff #395732)	Changed to get `VPIteration(Part, 0)`. I kept accessing `Val` outside the loop, to keep a single broadcast. But there's now an assert.
746 ↗	(On Diff #395732)	Move out of the loop and will push a fix separately.
llvm/lib/Transforms/Vectorize/VPlan.h
856 ↗	(On Diff #395732)	Unfortunately this is needed to `dyn_cast` directly from VPUser -> VPInstruction

fhahn added inline comments.Dec 27 2021, 8:37 AM

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
8417	So a StepVector is needed only to feed ICmpULE, not ActiveLaneMask which depends on first lane only? Seems like an optimization best applied separately? Yes, let me try to split that off. ActiveLaneMask only needs the first scalar step of each Part. I just double checked and we may still need to go the route with the widened recipe if there's no primary induction. But there's no test case without primary inductions that use active.lane.mask. I'll push one and update the patch.

Harbormaster completed remote builds in B140725: Diff 396313.Dec 27 2021, 8:40 AM

fhahn added inline comments.Dec 27 2021, 1:26 PM

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
8417	I missed something here and IIUC your comment was regarding creating a step vector in both cases at first, right? The issue with that is that in some cases `ActiveLaneMask` will use the step-vector instead of the scalar steps, so I left it as is for now. I just double checked and we may still need to go the route with the widened recipe if there's no primary induction. The current patch still creates the vector phi when necessary and the existing tests cover that already.

Ayal added inline comments.Dec 27 2021, 2:10 PM

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
8417	Yes, my comment refers to the fact that currently both ActiveLaneMask and ICmpULE are treated the same when Legal has a PrimaryInduction() - both use the same IV = Plan->getOrAddVPValue(Legal->getPrimaryInduction()); This patch continues to feed ActiveLaneMask with this IV, but feeds ICmpULE with a StepVector.
8849	Doesn't `ShouldScalarizeInstruction(Instr, VF) && NeedsScalarInduction(VF)` always equals `ShouldScalarizeInstruction(Instr, VF)` given that `NeedsScalarInduction(VF)` returns true if `ShouldScalarizeInstruction(Instr, VF)` returns true for the same Instr and VF?
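
The argument in the 8849 comment above can be checked with a small stand-alone model. This is only a sketch: the two predicates are reduced to booleans, and the fallback `AnyUserScalarized` is a hypothetical stand-in for whatever `NeedsScalarInduction` checks after its early return.

```cpp
#include <cassert>

// Toy model (not LLVM code). ShouldScalarize stands for the result of
// ShouldScalarizeInstruction(Instr, VF) for a fixed Instr and VF.
bool needsScalarInduction(bool ShouldScalarize, bool AnyUserScalarized) {
  if (ShouldScalarize)
    return true; // the early return described in the review comment
  return AnyUserScalarized; // hypothetical remaining check
}

// The combined condition questioned in the review.
bool combinedCheck(bool ShouldScalarize, bool AnyUserScalarized) {
  return ShouldScalarize &&
         needsScalarInduction(ShouldScalarize, AnyUserScalarized);
}
```

For every input, `combinedCheck` returns the same value as `ShouldScalarize` alone, so under these assumptions the second operand of the conjunction is redundant.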

Simplify onlyScalarStepsNeeded a bit.

Harbormaster completed remote builds in B140811: Diff 396422.Dec 28 2021, 12:41 PM

fhahn added inline comments.Dec 28 2021, 12:46 PM

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
8417	Yes, my comment refers to the fact that currently both ActiveLaneMask and ICmpULE are treated the same when Legal has a PrimaryInduction() - both use the same IV = Plan->getOrAddVPValue(Legal->getPrimaryInduction()); Ah I see. Yes, currently they both use the same VPValue, because the single recipe creates both the scalar steps and the step vector. But ActiveLaneMask only needs the steps and ICmpULE only needs the step vector. This is explicit now in the code as a result of breaking this part of the phi handling into distinct pieces.
8849	I think to preserve the existing behaviour the PHI needs to be scalarized and any of its users. I updated the code to make the check a bit more direct.
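
The distinction drawn in the 8417 comment above, that ICmpULE consumes the whole step vector while ActiveLaneMask only needs the first lane of each part, can be sketched as follows. This is a toy model in plain C++, not LLVM code; the function names are made up for illustration, and it assumes a non-zero trip count and no unsigned overflow.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Header mask from the full vector of IV values, as with ICmpULE:
// lane i is active iff WideIV[i] <= BTC, where BTC = TripCount - 1.
std::vector<bool> maskFromICmpULE(const std::vector<uint64_t> &WideIV,
                                  uint64_t BTC) {
  std::vector<bool> Mask;
  for (uint64_t IV : WideIV)
    Mask.push_back(IV <= BTC);
  return Mask;
}

// Header mask from the first lane only, modeling the semantics of
// llvm.get.active.lane.mask(Base, TripCount): lane i is active iff
// Base + i < TripCount.
std::vector<bool> maskFromActiveLaneMask(uint64_t FirstLane,
                                         uint64_t TripCount, unsigned VF) {
  assert(VF > 0 && "VF must be non-zero");
  std::vector<bool> Mask;
  for (unsigned Lane = 0; Lane < VF; ++Lane)
    Mask.push_back(FirstLane + Lane < TripCount);
  return Mask;
}
```

Both functions produce the same mask for a given part, but only the first one needs a vector input, which is why the two users place different demands on the induction recipe.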

Ayal mentioned this in D113223: [VPlan] Add VPCanonicalIVRecipe, partly retire createInductionVariable..Dec 29 2021, 12:50 AM

Ayal added inline comments.Dec 29 2021, 1:57 PM

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
2537	(assert !State.VF.isZero()? Indep. of this patch.)
2540	assert outside loop. This is handling the unrolling-only case, does it matter whether vectors are scalable or not?
2544	State.VF == 1? Calling getRuntimeVF* seems redundant, StartIdx == (Value*...)Part?
2551	ditto
2564	Clarifying the three options for VF>1: (1) Create a vector IV (createVectorIntOrFpInductionPHI); (2) Create a scalar IV (CreateScalarIV) with per-lane steps (buildScalarSteps); (3) both 1 & 2; which complements the above option for VF=1: (0) Create a scalar IV (CreateScalarIV) with per-part steps (CreateSplatIV updated and inlined). It may be clearer to have `auto NeedsVectorIV = !shouldScalarizeInstruction(EntryVal); if (NeedsVectorIV) createVectorIntOrFpInductionPHI(ID, Step, Start, EntryVal, Def, State); auto NeedsScalarIV = needsScalarInduction(EntryVal); if (NeedsScalarIV) { Value *ScalarIV = CreateScalarIV(Step); buildScalarSteps(ScalarIV, Step, EntryVal, ID, Def, State); }` Perhaps VPWidenIntOrFpInductionRecipe should know of NeedsScalarIV and NeedsVectorIV (names to be improved), pass them to ILV->widenIntOrFpInduction() and to others - see removeRedundantStepVector below?
8403–8404	Could we simplify by always creating a VPWidenCanonicalIVRecipe, w/o trying to use Legal->getPrimaryInduction() even when it exists? We could do, but it would probably mean a bigger test diff and a bigger functional change. I'd prefer to do it separately, to avoid any unexpected knock-on effects. sure, can also do so separately first. It may help simplify this patch, among other things.
8417	Ah I see. Yes, currently they both use the same VPValue, because the single recipe creates both the scalar steps and the step vector. But ActiveLaneMask only needs the steps and ICmpULE only needs the step vector. This is explicit now in the code as a result of breaking this part of the phi handling into distinct pieces. ok, that confirms 1st Q above, what about the 2nd: Seems like an optimization best applied separately?
8420	(If a VPWidenCanonicalIVRecipe is always constructed then IV will always be set to it here and this construction is not needed?)
8423	Is anything but CanIV expected to feed getCanonicalStepVector? Does it aim to cope with multiple attempts to obtain CanIV's StepVector? It only feeds a single ICmpULE. Sounds like Plan->getOrCreateCanonicalStepVector()? Can alternatively have VPCanonicalIVPhiRecipe define two VPValues: one being the scalar-per-part providing lane 0, and the other (optionally optional) providing its vector steps?
8428	At the moment only a step of one is used, but it will be used with different steps in the future (build scalar steps for inductions with arbitrary steps). Have that future patch extend the support from the initial unit step needed here?
8836	If applied as a VPlan2VPlan optimization after VPlans were built according to ranges that have been clamped, suffice to check properties of any VF in Range; can assert they are the same across Range; best prevent clamping Range.
9159	place comment next to call.
9228	Have removeRedundantStepVector() take care of whatever it needs to check internally, rather than passing in this lambda?
llvm/lib/Transforms/Vectorize/VPRecipeBuilder.h
178 ↗	(On Diff #396422)	document
llvm/lib/Transforms/Vectorize/VPlan.cpp
741 ↗	(On Diff #395732)	Val >> a more informative name: CanonicalIV? ScalarFirstLanesOfIV? The single scalar value of the scalar canonical IV chain can be recorded per-part (0 or all) throughout: `Value *Val = State.get(getOperand(0), 0)`?
746 ↗	(On Diff #395732)	`assert(Val == State.get(getOperand(0), Part) && "Val changed for Part");` ?
741 ↗	(On Diff #396422)	`Value *Step = State.get(getOperand(1), Part);` ?
751 ↗	(On Diff #396422)	Does Instruction::Add also work for FP inductions?
llvm/lib/Transforms/Vectorize/VPlan.h
856 ↗	(On Diff #395732)	document with said need?
2746 ↗	(On Diff #396422)	There's only a single VPCanonicalIVPHIRecipe per Plan to be passed in, may as well have getCanonicalStepVector() obtain it from Plan?
llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
345	Can early-exit before the loop if OnlyScalarStepsNeeded(OriginalCanonicalIV). Then here check if (IV && IV->getUnderlyingValue() == OriginalCanonicalIV). But perhaps better to check if (IV && IV->getUnderlyingValue() == OriginalCanonicalIV && !IV->NeedsVectorIV()) as mentioned above.
llvm/lib/Transforms/Vectorize/VPlanValue.h
333 ↗	(On Diff #396422)	Lex order

Address latest comments, thanks! The most notable change is to always generate VPWidenCanonicalIVRecipes and then try to replace them with either a vector induction or a step-vector in a separate VPlan-to-VPlan transform, if possible.

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
2537	Done in e2f1c4c7066b
2540	Moved! This is handling the unrolling-only case, does it matter whether vectors are scalable or not? Not sure why the assert is here, but I think it should be retained in this patch.
2544	Replaced call.
2551	Replaced call
2564	Clarifying the three options for VF>1: Exactly! I restructured the code as suggested, thanks.
8403–8404	Update to always generate the widened canonical induction.
8417	I updated the patch to always generate widened induction here, and added a transform to replace it if possible later.
8423	Is anything but CanIV expected to feed getCanonicalStepVector? Does it aim to cope with multiple attempts to obtain CanIV's StepVector? It only feeds a single ICmpULE. Yes that was the original goal. The code is now gone, the stepvector may replace the widened canonical IV created unconditionally here. Can alternatively have VPCanonicalIVPhiRecipe define two VPValues: one being the scalar-per-part providing lane 0, and the other (optionally optional) providing its vector steps? I think modeling the stepvector separately allows us to re-use it for vector induction increments and optionally defining an additional value would make it a bit harder to first add the stepvector and then optimize it away; with a separate recipe for the stepvector we can simply replace the uses and remove the step recipe. With multi-defs we would need to either undef one of the multi-defs or replace the multi-def recipe with a single def version.
8428	A stepvector is also needed for incrementing vector inductions and I plan to use the recipe in a later patch, but unfortunately there's nothing to share yet.
8836	The issue here is that for VF == 1 there's no need for a step-vector. We could define step-vector to just produce a scalar for VF = 1 though? Later patches with a separate recipe for scalar-steps would handle this as well.
9159	the comment is stale now, removed
9228	All information to take the decision is already available in the VPlan. We only need to know whether the tail will be folded and check the users of the induction recipe in the plan.
llvm/lib/Transforms/Vectorize/VPRecipeBuilder.h
178 ↗	(On Diff #396422)	Moved definition out of VPRecipeBuilder and removed this here.
llvm/lib/Transforms/Vectorize/VPlan.cpp
746 ↗	(On Diff #395732)	Unfortunately this won't work as expected in this case, because the operand is a constant wrapped in a VPValue and State is not explicitly set. Currently State.get will broadcast the constant in a vector when using State.get(getOperand(1), Part); and return a scalar when using VPIteration(Part, 0).
753 ↗	(On Diff #395732)	At the moment there's no need for float support here. I added an assert.
741 ↗	(On Diff #396422)	see above.
751 ↗	(On Diff #396422)	stepvector is not used for floats yet, added an assert
llvm/lib/Transforms/Vectorize/VPlan.h
2746 ↗	(On Diff #396422)	This is not needed any longer in the current version of the patch. Removed.
llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
345	This has been completely removed and replaced by a different transform.
llvm/lib/Transforms/Vectorize/VPlanValue.h
333 ↗	(On Diff #396422)	This is a leftover from an earlier iteration and has been removed.

Harbormaster completed remote builds in B141339: Diff 397079.Jan 3 2022, 9:03 AM

Rebase & ping :)

I also removed the explicit step argument from the patch, as suggested. I think now all outstanding comments should be addressed.

Harbormaster completed remote builds in B142444: Diff 398643.Jan 10 2022, 7:49 AM

Ayal added inline comments.Jan 12 2022, 5:09 AM

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
2570	Set StartIdx to int or fp from its beginning? Indep. of this patch.
2577	Would be good to fold int and fp handlings, indep. of this patch: Set and use `MulOp` here as well? Use CreateBinOp here as well? Name the instruction "induction" above for fp as well? OTOH, handling Trunc should apply to int only?
8417	Better have the transform in a separate patch, to retain existing behavior here?
llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
337	`Phi`: VPWidenIntOrFpInductionRecipe represents a phi (assumed to exist, corresponding to OriginalCanonicalIV) but VPWidenCanonicalIVRecipe represents a non-phi and thus appears after the former. Ordering their handling accordingly is more logical? Break as soon as (if) the latter is found? WidenCan >> WidenNewIV? WidenInduction >> WidenOriginalIV? (both are canonical...)
357	Is this optimization expected to clamp Range.End for the current VPlan during VPlans-for-ranges construction? Is there a need to (only) know if VPlan's range is scalar? Specifically, if Start is Scalar then End can be asserted to be Start*2, if needed? Suffice the check if all recipes provide scalars only, although faster to check if range is scalar?
359	vputils::onlyScalarsDemanded()?
366	vputils::onlyFirstLaneDemanded()?
374	Place next to above early-exit if !WidenCan?
397	Scalar Start handled above (w/o retaining WidenCan)?

Ayal added inline comments.Jan 12 2022, 5:09 AM

llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
391	WidenCan is going to be replaced, question is by what; check instead if StepVector is needed (one recipe producing only scalars the other vectors (deserves more review thoughts...)) and if so create it, followed by a single replacing and erasing of WidenCan?

fhahn mentioned this in D117140: [LV] Always create VPWidenCanonicalIVRecipe, optimize away later..Jan 12 2022, 12:17 PM

fhahn mentioned this in rG7ce48be0fd83: [LV] Inline CreateSplatIV call for scalar VFs (NFC)..Jan 13 2022, 1:34 AM

Address latest comments. Pulled in onlyFirstLaneDemanded from D116554. Unfortunately D116554 itself cannot be applied before this patch, because the stepvector transition is required for the use in buildScalarSteps.

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
2570	Landed the change separately in 7ce48be0fd83 . StartIdx depends on `Part` though, so I left it in the loop for now.
2577	Thanks, I applied the folding and added an assert in 7ce48be0fd83.
8417	Good idea, split off to D117140
llvm/lib/Transforms/Vectorize/VPlan.h
856 ↗	(On Diff #395732)	Added documentation and also added it for other recipes where needed.
llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
337	Updated in the split-off review.
357	It's sufficient to check whether the start is scalar. There's no need to clamp the end.
374	Moved!
391	Updated to check if `StepVector` is needed and sink replace & erase code.
397	This is not needed in the latest version, removed.

Harbormaster completed remote builds in B143105: Diff 399590.Jan 13 2022, 1:44 AM

fhahn added a parent revision: D117140: [LV] Always create VPWidenCanonicalIVRecipe, optimize away later..Jan 16 2022, 5:34 AM

Introduce VPRecipeBase::onlyFirstLaneUsed & VPRecipeBase::onlyScalarsUsed as suggested by @Ayal in D116554.

Harbormaster completed remote builds in B143675: Diff 400380.Jan 16 2022, 7:35 AM

fhahn mentioned this in D116554: [VPlan] Use VPlan to check if only the first lane is used..Jan 16 2022, 7:55 AM

Rebased after latest changes to D117140.

Harbormaster completed remote builds in B144089: Diff 400951.Jan 18 2022, 12:33 PM

Ayal added inline comments.Jan 19 2022, 8:35 AM

llvm/lib/Transforms/Vectorize/VPlan.cpp
1674 ↗	(On Diff #400951)	Demanded == Used?
llvm/lib/Transforms/Vectorize/VPlan.h
75 ↗	(On Diff #400951)	Place above next to getRuntimeVF()?
80 ↗	(On Diff #400951)	(Indep. of this patch, but while we're here:) "The sequence starts at StartIndex." - redundant? \p Opcode >> \p BinOp
773 ↗	(On Diff #400951)	Default is to return false conservatively.
780 ↗	(On Diff #400951)	ditto
926 ↗	(On Diff #400951)	The recipes along the canonical IV chain also use first lane only, namely: CanonicalIVIncrement, CanonicalIVIncrementNUW, BranchOnCount and VPCanonicalIVPHIRecipe? They are also "onlyFirstPartUsed()", but are unique in that.
1364 ↗	(On Diff #400951)	Comment that "// Recursing through Blend recipes only, must terminate at header phi's the latest." ?
1744 ↗	(On Diff #400951)	Already confirmed above that Op is the address?
1747 ↗	(On Diff #400951)	... of their address. `return Op == getAddr() && isConsecutive();` ?
llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
342	Ahh, removeRedundantInductionCasts() is called before hoisting VPWidenIntOrFpInductionRecipes-built-from-Truncs, so the latter will be excluded if we only visit phis() here? OTOH, are such truncated IV's candidates for folding with WidenCanonicalIV in terms of matching types?
362	The VPWidenCanonicalIVRecipe WidenNewIV is created only to feed the ICmpULE or ActiveLane, i.e., when vectorizing with fold tail, so can assert(!Range.Start.isScalar && FoldTail) if needed below? Scalar values are demanded from WidenNewIV only if it feeds ActiveLane, which demands only the first lane, so broadcasting and adding building scalar steps for each lane seems redundant here? Suffice to feed ActiveLane with UF parts each holding first lane only, either constructed via a simplified StepVector from canonicalIV or perhaps have the canonicalIV supply them directly instead of broadcasting its Part 0 value across all parts (which now seems potentially misleading)?

Address latest comments, thanks!

Harbormaster completed remote builds in B144337: Diff 401290.Jan 19 2022, 9:39 AM

fhahn mentioned this in rG5f2854f1daa7: [LV] Always create VPWidenCanonicalIVRecipe, optimize away later..Jan 22 2022, 7:34 AM

Rebase & ping :)

llvm/lib/Transforms/Vectorize/VPlan.cpp
1674 ↗	(On Diff #400951)	Updated, thanks!
llvm/lib/Transforms/Vectorize/VPlan.h
75 ↗	(On Diff #400951)	Moved up, thanks!
80 ↗	(On Diff #400951)	Adjusted, thanks!
773 ↗	(On Diff #400951)	extended comment.
780 ↗	(On Diff #400951)	extended comment.
926 ↗	(On Diff #400951)	Updated to include CanonicalIVIncrement, CanonicalIVIncrementNUW, BranchOnCount opcodes.
1364 ↗	(On Diff #400951)	Added the comment, thanks!
1747 ↗	(On Diff #400951)	Simplified as suggested, thanks!
llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
342	OTOH, are such truncated IV's candidates for folding with WidenCanonicalIV in terms of matching types? I don't think so, as we should widen using the canonical IVs type, which would be the wider type.
362	The VPWidenCanonicalIVRecipe WidenNewIV is created only to feed the ICmpULE or ActiveLane, i.e., when vectorizing with fold tail, so can assert(!Range.Start.isScalar && FoldTail) if needed below? I might be missing something, but I think it is still possible to fold the tail with VF=1 and UF>1, so I left the check (e.g. in `llvm/test/Transforms/LoopVectorize/pr45679-fold-tail-by-masking.ll`). Scalar values are demanded from WidenNewIV only if it feeds ActiveLane, which demands only the first lane, so broadcasting and adding building scalar steps for each lane seems redundant here? Suffice to feed ActiveLane with UF parts each holding first lane only, either constructed via a simplified StepVector from canonicalIV or perhaps have the canonicalIV supply them directly instead of broadcasting its Part 0 value across all parts (which now seems potentially misleading)? Yes, for ActiveLane we only need the first scalar part of each lane. I think it might be better to have a special version of build-scalar-steps that only generates the first lane as follow-up to the build-scalar-steps patches.

Harbormaster completed remote builds in B145093: Diff 402314.Jan 23 2022, 5:06 AM

Ayal added inline comments.Jan 25 2022, 12:00 AM

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
4801	Wonder if this is still the case now that NewIV is always created to feed the vector compare, until replaced?
llvm/lib/Transforms/Vectorize/VPlan.cpp
769 ↗	(On Diff #402314)	Assert is redundant given that Step was created above as a ConstantInt - assert instead that Val is integer, before the loop?
llvm/lib/Transforms/Vectorize/VPlan.h
926 ↗	(On Diff #400951)	Also have VPCanonicalIVPHIRecipe indicate it uses only first lane?
1383 ↗	(On Diff #402314)	Could a user be live-out / not a recipe?
llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
362	The VPWidenCanonicalIVRecipe WidenNewIV is created only to feed the ICmpULE or ActiveLane, i.e., when vectorizing with fold tail, so can assert(!Range.Start.isScalar && FoldTail) if needed below? I might be missing something, but I think It is still possible to fold the tail with VF=1 and UF>1, so I left the check (e.g. in llvm/test/Transforms/LoopVectorize/pr45679-fold-tail-by-masking.ll) Ah, you're right, I forgot foldTail may operate when unrolling only; it should probably stop doing that, but in a separate patch. Scalar values are demanded from WidenNewIV only if it feeds ActiveLane, which demands only the first lane, so broadcasting and adding building scalar steps for each lane seems redundant here? Suffice to feed ActiveLane with UF parts each holding first lane only, either constructed via a simplified StepVector from canonicalIV or perhaps have the canonicalIV supply them directly instead of broadcasting its Part 0 value across all parts (which now seems potentially misleading)? Yes, for ActiveLane we only need the first scalar part of each lane. I think it might be better to have a special version of build-scalar-steps that only generates the first lane as follow-up to the build-scalar-steps patches. Follow-up TODO would be fine. Still would be good to try and simplify the logic here that checks if OriginalIV cannot replace NewIV (but instead a StepVector'd CanonicalIV replaces NewIV): FoldTail must be true given that NewIV exists, so can be asserted or not passed-in; which leaves Range includes only positive VF's && onlyFirstLaneUsed(WidenOriginalIV) && onlyScalarsUsed(WidenOriginalIV) && NewIV feeds some VPInstruction that isn't ActiveLaneMask ?

fhahn mentioned this in D118167: [VPlan] Record whether scalar IVs are need in induction recipe. (NFC).Jan 25 2022, 11:06 AM

fhahn added a parent revision: D118167: [VPlan] Record whether scalar IVs are need in induction recipe. (NFC).Jan 25 2022, 11:11 AM

A major update & simplification: the latest version doesn't need a separate step-vector recipe; if a 'step-vector' is needed, the VPWidenCanonicalInduction recipe is retained. It also removes the VPlan-based analysis of the uses in favor of looking up the recorded information added in D118167.

Note that it still checks if all users of the induction are active-lane-masks. The legacy cost-model doesn't know about them, but if all users are active-lane-masks, no widened canonical IV is needed, only the scalar steps.

fhahn marked 4 inline comments as done.Jan 25 2022, 11:21 AM

fhahn added inline comments.

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
4801	I think for compatibility with existing behavior we still need to retain it, as it impacts the decision whether to scalarize the induction or not. Could be investigated in a follow-up change, once all related patches are through.
llvm/lib/Transforms/Vectorize/VPlan.cpp
769 ↗	(On Diff #402314)	This code is now gone in the latest version.
llvm/lib/Transforms/Vectorize/VPlan.h
926 ↗	(On Diff #400951)	Will update, but the code has moved to a separate patch again.
1383 ↗	(On Diff #402314)	At the moment that's not possible; all users must be recipes (also the code is gone from this patch).
llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
362	The logic here has been simplified with help of D118167

fhahn retitled this revision from [VPlan] Handle IV vector splat as VPInstruction. to [VPlan] Handle IV vector splat using VPWidenCanonicalIV..Jan 26 2022, 4:30 AM

fhahn edited the summary of this revision.

Ayal added inline comments.Jan 26 2022, 11:19 AM

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
2566	OK, this is synonymous with NeedsVectorIV(), capturing it more accurately. Follows the above comment: Perhaps VPWidenIntOrFpInductionRecipe should know of NeedsScalarIV and NeedsVectorIV (names to be improved), pass them to ILV->widenIntOrFpInduction() and to others - see removeRedundantStepVector below?
llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
357	Does the following summarize the logic here: Original and New IV's can each provide either scalar only, vector only, or both. Original can replace New iff it provides whatever New needs to provide. Have VPWidenCanonicalIVRecipe also support needsScalarIVOnly() - by checking if all its users are ActiveLaneMask VPInstructions, and needsScalarIV() - by checking if any user is ActiveLaneMask? Then check `if ((WidenNewIV->needsScalarIV() && WidenOriginalIV->needsScalarIV()) || (!WidenNewIV->needsScalarIVOnly() && !WidenOriginalIV->needsScalarIVOnly())) { WidenNewIV->replaceAllUsesWith(WidenOriginalIV); WidenNewIV->eraseFromParent(); }` ?
llvm/test/Transforms/LoopVectorize/first-order-recurrence-sink-replicate-region.ll
175–176	Should vp<%2> be vp<[[CAN_IV]]> ?
llvm/test/Transforms/LoopVectorize/pr45679-fold-tail-by-masking.ll
153–154	Hmm, this enables sink scalar operands to handle scalar steps.

Harbormaster completed remote builds in B145562: Diff 402978.Jan 26 2022, 6:52 PM

Addressed latest comments, thanks!

Clarified comment for replacement check for WidenNewIV and WidenOriginalIV, update test to use CAN_IV variable.

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
2566	Updated D118167 to use `needsVectorIV`
llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
357	Does the following summarize the logic here: Original and New IV's can each provide either scalar only, vector only, or both. Original can replace New iff it provides whatever New needs to provide. Yes, that's a better summary! I updated the comment. I am not sure about adding `needsScalarIVOnly`/`needsScalarIV` to `VPWidenCanonicalIVRecipe`, as this distinction seems only relevant here and might muddy the waters a bit, as the recipe will always generate a vector value (and I don't think we should change that). I left the check here as is for now, but once D116554 lands this check can be replaced with `onlyFirstLaneUsed(WidenNewIV)`. I think to wrap things up, it is worth keeping the current order of patches and having the temporary check of `WidenNewIV`'s users here, to simplify the patch ordering.
llvm/test/Transforms/LoopVectorize/first-order-recurrence-sink-replicate-region.ll
175–176	Yep, updated, thanks!
llvm/test/Transforms/LoopVectorize/pr45679-fold-tail-by-masking.ll
153–154	I think this is caused by the drop of the check for the scalar VF; we add the VPWidenCanonicalInduction, but the cost model knows that the induction recipe will be scalar only, so we cannot replace the VPWidenCanonicalInduction with the other induction recipe and we have some extra steps. Given that the VF=1,UF>1 is somewhat of an edge case, it seems cleaner to not check for the scalar VF, as suggested earlier.

This is fine, with a last nit

llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
357	Very well. How about just setting here
	  bool WidenNewIVOnlyFirstLaneUsed = all_of(...);
	  if (WidenOriginalIV->needsVectorIV() || WidenNewIVOnlyFirstLaneUsed) {
	    WidenNewIV->replaceAllUsesWith(WidenOriginalIV);
	    WidenNewIV->eraseFromParent();
	  }
	?
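
The replacement condition discussed in this thread can be illustrated outside of LLVM. The following is only a sketch under stated assumptions: `IVNeeds` and `canReplaceNewIV` are hypothetical names that model the two properties mentioned in the review (`needsVectorIV` and "only the first lane of WidenNewIV is used"); they are not the actual VPlan API.

```cpp
#include <cassert>

// Hypothetical stand-in for what the original widened induction provides.
// In the patch these properties belong to VPWidenIntOrFpInductionRecipe.
struct IVNeeds {
  bool Vector; // a vector IV is generated for some user
  bool Scalar; // scalar steps are generated for some user
};

// Returns true if the original IV could stand in for the newly created
// widened canonical IV. This mirrors the condition quoted above: either the
// original already produces a vector, or every user of the new IV reads only
// its first lane, so a scalar value is sufficient.
bool canReplaceNewIV(const IVNeeds &Original, bool NewIVOnlyFirstLaneUsed) {
  // An induction is assumed to provide at least one kind of value.
  assert((Original.Vector || Original.Scalar) && "IV provides nothing");
  return Original.Vector || NewIVOnlyFirstLaneUsed;
}
```

For example, a scalar-only original IV cannot replace a new IV that feeds a vector compare, but it can once all users of the new IV need only lane 0.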

This revision is now accepted and ready to land. Jan 27 2022, 2:31 AM

Thanks Ayal! Moved the all_of check outside the if and assigned to variable.

This change now depends on D118167, which is NFC and adds the needsVectorIV/needsScalarIV required.

Harbormaster completed remote builds in B145964: Diff 403567. Jan 27 2022, 9:50 AM

fhahn mentioned this in rG96400f179ff6: [VPlan] Record whether scalar IVs are need in induction recipe. (NFC). Jan 28 2022, 1:34 AM

Rebased on current main, I am planning on landing this soon.

This revision was landed with ongoing or failed builds. Jan 29 2022, 8:25 AM

Closed by commit rGefd4938723ef: [VPlan] Handle IV vector splat using VPWidenCanonicalIV. (authored by fhahn).

This revision was automatically updated to reflect the committed changes.

fhahn added a commit: rGefd4938723ef: [VPlan] Handle IV vector splat using VPWidenCanonicalIV..

Harbormaster completed remote builds in B146461: Diff 404274. Jan 29 2022, 9:13 AM

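Revision Contents

Path	Size
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp	44 lines
llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp	19 lines
llvm/test/Transforms/LoopVectorize/AArch64/sve-tail-folding-forced.ll	6 lines
llvm/test/Transforms/LoopVectorize/AArch64/sve-tail-folding.ll	60 lines
llvm/test/Transforms/LoopVectorize/AArch64/tail-fold-uniform-memops.ll	6 lines
llvm/test/Transforms/LoopVectorize/ARM/mve-gather-scatter-tailpred.ll	6 lines
llvm/test/Transforms/LoopVectorize/ARM/mve-reduction-types.ll	27 lines
llvm/test/Transforms/LoopVectorize/X86/constant-fold.ll	3 lines
llvm/test/Transforms/LoopVectorize/X86/optsize.ll	14 lines
llvm/test/Transforms/LoopVectorize/X86/pr34438.ll	3 lines
llvm/test/Transforms/LoopVectorize/X86/small-size.ll	28 lines
llvm/test/Transforms/LoopVectorize/X86/tail_loop_folding.ll	6 lines
llvm/test/Transforms/LoopVectorize/X86/vect.omp.force.small-tc.ll	5 lines
llvm/test/Transforms/LoopVectorize/dont-fold-tail-for-const-TC.ll	5 lines
llvm/test/Transforms/LoopVectorize/dont-fold-tail-for-divisible-TC.ll	6 lines
llvm/test/Transforms/LoopVectorize/first-order-recurrence-sink-replicate-region.ll	3 lines
llvm/test/Transforms/LoopVectorize/first-order-recurrence.ll	14 lines
llvm/test/Transforms/LoopVectorize/pr44488-predication.ll	7 lines
llvm/test/Transforms/LoopVectorize/pr45679-fold-tail-by-masking.ll	24 lines
llvm/test/Transforms/LoopVectorize/pr46525-expander-insertpoint.ll	16 lines
llvm/test/Transforms/LoopVectorize/pr51614-fold-tail-by-masking.ll	7 lines
llvm/test/Transforms/LoopVectorize/select-reduction.ll	3 lines
llvm/test/Transforms/LoopVectorize/tail-folding-vectorization-factor-1.ll	24 lines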

Diff 404277

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp


(2,520 lines not shown)	if (Trunc) {
auto *TruncType = cast<IntegerType>(Trunc->getType());		auto *TruncType = cast<IntegerType>(Trunc->getType());
assert(Step->getType()->isIntegerTy() &&		assert(Step->getType()->isIntegerTy() &&
"Truncation requires an integer step");		"Truncation requires an integer step");
ScalarIV = Builder.CreateTrunc(ScalarIV, TruncType);		ScalarIV = Builder.CreateTrunc(ScalarIV, TruncType);
Step = Builder.CreateTrunc(Step, TruncType);		Step = Builder.CreateTrunc(Step, TruncType);
}		}
return ScalarIV;		return ScalarIV;
};		};

// Create the vector values from the scalar IV, in the absence of creating a
// vector IV.
auto CreateSplatIV = [&](Value *ScalarIV, Value *Step) {
Value *Broadcasted = getBroadcastInstrs(ScalarIV);
for (unsigned Part = 0; Part < UF; ++Part) {
Value *StartIdx;
if (Step->getType()->isFloatingPointTy())
StartIdx =
getRuntimeVFAsFloat(Builder, Step->getType(), State.VF * Part);
else
StartIdx = getRuntimeVF(Builder, Step->getType(), State.VF * Part);

Value *EntryPart =
getStepVector(Broadcasted, StartIdx, Step, ID.getInductionOpcode(),
State.VF, State.Builder);
State.set(Def, EntryPart, Part);
if (Trunc)
addMetadata(EntryPart, Trunc);
}
};

// Fast-math-flags propagate from the original induction instruction.		// Fast-math-flags propagate from the original induction instruction.
		Ayal: CreateSplatIV() will now be called when unrolling only, i.e., when VF==1, so it creates and stores scalar values w/o actually Splatting into vectors. Update comment and name of lambda?
		fhahn: I inlined the lambda, as it is a single use now. I also inlined the scalar path from `getStepVector`, which seems like it should have never been there in the first place.
IRBuilder<>::FastMathFlagGuard FMFG(Builder);		IRBuilder<>::FastMathFlagGuard FMFG(Builder);
if (ID.getInductionBinOp() && isa<FPMathOperator>(ID.getInductionBinOp()))		if (ID.getInductionBinOp() && isa<FPMathOperator>(ID.getInductionBinOp()))
Builder.setFastMathFlags(ID.getInductionBinOp()->getFastMathFlags());		Builder.setFastMathFlags(ID.getInductionBinOp()->getFastMathFlags());

// Now do the actual transformations, and start with creating the step value.		// Now do the actual transformations, and start with creating the step value.
Value *Step = CreateStepValue(ID.getStep());		Value *Step = CreateStepValue(ID.getStep());
if (State.VF.isScalar()) {		if (State.VF.isScalar()) {
		Ayal: (assert !State.VF.isZero()? Indep. of this patch.)
		fhahn: Done in e2f1c4c7066b
Value *ScalarIV = CreateScalarIV(Step);		Value *ScalarIV = CreateScalarIV(Step);
Type *ScalarTy = IntegerType::get(ScalarIV->getContext(),		Type *ScalarTy = IntegerType::get(ScalarIV->getContext(),
Step->getType()->getScalarSizeInBits());		Step->getType()->getScalarSizeInBits());
		Ayal: assert outside loop. This is handling the unrolling-only case, does it matter whether vectors are scalable or not?
		fhahn: Moved!
		> This is handling the unrolling-only case, does it matter whether vectors are scalable or not?
		Not sure why the assert is here, but I think it should be retained in this patch.

Instruction::BinaryOps IncOp = ID.getInductionOpcode();		Instruction::BinaryOps IncOp = ID.getInductionOpcode();
if (IncOp == Instruction::BinaryOpsEnd)		if (IncOp == Instruction::BinaryOpsEnd)
IncOp = Instruction::Add;		IncOp = Instruction::Add;
		Ayal: State.VF == 1? Calling getRuntimeVF* seems redundant, StartIdx == (Value*...)Part?
		fhahn: Replaced call.
for (unsigned Part = 0; Part < UF; ++Part) {		for (unsigned Part = 0; Part < UF; ++Part) {
Value *StartIdx = ConstantInt::get(ScalarTy, Part);		Value *StartIdx = ConstantInt::get(ScalarTy, Part);
Instruction::BinaryOps MulOp = Instruction::Mul;		Instruction::BinaryOps MulOp = Instruction::Mul;
if (Step->getType()->isFloatingPointTy()) {		if (Step->getType()->isFloatingPointTy()) {
StartIdx = Builder.CreateUIToFP(StartIdx, Step->getType());		StartIdx = Builder.CreateUIToFP(StartIdx, Step->getType());
MulOp = Instruction::FMul;		MulOp = Instruction::FMul;
}		}
		Ayal: ditto
		fhahn: Replaced call

Value *Mul = Builder.CreateBinOp(MulOp, StartIdx, Step);		Value *Mul = Builder.CreateBinOp(MulOp, StartIdx, Step);
Value *EntryPart = Builder.CreateBinOp(IncOp, ScalarIV, Mul, "induction");		Value *EntryPart = Builder.CreateBinOp(IncOp, ScalarIV, Mul, "induction");
State.set(Def, EntryPart, Part);		State.set(Def, EntryPart, Part);
if (Trunc) {		if (Trunc) {
assert(!Step->getType()->isFloatingPointTy() &&		assert(!Step->getType()->isFloatingPointTy() &&
"fp inductions shouldn't be truncated");		"fp inductions shouldn't be truncated");
addMetadata(EntryPart, Trunc);		addMetadata(EntryPart, Trunc);
}		}
}		}
return;		return;
}		}
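
The unroll-only path above (VF=1, UF>1) produces one scalar per unrolled part, computed as ScalarIV + Part * Step. A minimal sketch of that arithmetic in plain C++, outside of LLVM, assuming an integer induction with an add opcode; `perPartInduction` is a hypothetical helper name, not something from the patch:

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Computes the scalar induction value for each unrolled part, following the
// Mul/BinOp pair in the loop above: EntryPart = ScalarIV + Part * Step.
std::vector<int64_t> perPartInduction(int64_t ScalarIV, int64_t Step,
                                      unsigned UF) {
  assert(UF > 0 && "need at least one unrolled part");
  std::vector<int64_t> Parts;
  Parts.reserve(UF);
  for (unsigned Part = 0; Part < UF; ++Part)
    Parts.push_back(ScalarIV + static_cast<int64_t>(Part) * Step);
  return Parts;
}
```

With ScalarIV = 10, Step = 3 and UF = 4, the parts are 10, 13, 16 and 19; no vector is ever built, which is why the review asked for the "splat" name to go.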

		Ayal: Clarifying the three options for VF>1:
		1. Create a vector IV (createVectorIntOrFpInductionPHI)
		2. Create a scalar IV (CreateScalarIV) with per-lane steps (buildScalarSteps)
		3. both 1&2
		which complements the above option for VF=1:
		0. Create a scalar IV (CreateScalarIV) with per-part steps (CreateSplatIV updated and inlined)
		It may be clearer to have
		  auto NeedsVectorIV = !shouldScalarizeInstruction(EntryVal);
		  if (NeedsVectorIV)
		    createVectorIntOrFpInductionPHI(ID, Step, Start, EntryVal, Def, State);
		  auto NeedsScalarIV = needsScalarInduction(EntryVal);
		  if (NeedsScalarIV) {
		    Value *ScalarIV = CreateScalarIV(Step);
		    buildScalarSteps(ScalarIV, Step, EntryVal, ID, Def, State);
		  }
		Perhaps VPWidenIntOrFpInductionRecipe should know of NeedsScalarIV and NeedsVectorIV (names to be improved), pass them to ILV->widenIntOrFpInduction() and to others - see removeRedundantStepVector below?
		fhahn:
		> Clarifying the three options for VF>1:
		Exactly! I restructured the code as suggested, thanks.
// If only a vector induction is needed, create it and return.		// Create a new independent vector induction variable, if one is needed.
if (!Def->needsScalarIV()) {		if (Def->needsVectorIV())
		Ayal: OK, this is synonymous with NeedsVectorIV(), capturing it more accurately. Follows the above comment: Perhaps VPWidenIntOrFpInductionRecipe should know of NeedsScalarIV and NeedsVectorIV (names to be improved), pass them to ILV->widenIntOrFpInduction() and to others - see removeRedundantStepVector below?
		fhahn: Updated D118167 to use `needsVectorIV`
createVectorIntOrFpInductionPHI(ID, Step, Start, EntryVal, Def, State);		createVectorIntOrFpInductionPHI(ID, Step, Start, EntryVal, Def, State);
return;
}

// Try to create a new independent vector induction variable. If we can't		if (Def->needsScalarIV()) {
		Ayal: 2nd "If we can't ..." sentence is obsolete?
		fhahn: removed
// create the phi node, we will splat the scalar induction variable in each
// loop iteration.
if (Def->needsVectorIV()) {
createVectorIntOrFpInductionPHI(ID, Step, Start, EntryVal, Def, State);
Value *ScalarIV = CreateScalarIV(Step);
// Create scalar steps that can be used by instructions we will later		// Create scalar steps that can be used by instructions we will later
		Ayal: Set StartIdx to int or fp from its beginning? Indep. of this patch.
		fhahn: Landed the change separately in 7ce48be0fd83. StartIdx depends on `Part` though, so I left it in the loop for now.
// scalarize. Note that the addition of the scalar steps will not increase		// scalarize. Note that the addition of the scalar steps will not increase
// the number of instructions in the loop in the common case prior to		// the number of instructions in the loop in the common case prior to
// InstCombine. We will be trading one vector extract for each scalar step.		// InstCombine. We will be trading one vector extract for each scalar step.
buildScalarSteps(ScalarIV, Step, EntryVal, ID, Def, State);
return;
}

// All IV users are scalar instructions, so only emit a scalar IV, not a
// vectorised IV. Except when we tail-fold, then the splat IV feeds the
// predicate used by the masked loads/stores.
Value *ScalarIV = CreateScalarIV(Step);		Value *ScalarIV = CreateScalarIV(Step);
if (!Cost->isScalarEpilogueAllowed())
CreateSplatIV(ScalarIV, Step);
buildScalarSteps(ScalarIV, Step, EntryVal, ID, Def, State);		buildScalarSteps(ScalarIV, Step, EntryVal, ID, Def, State);
}		}
		}
		Ayal: 2nd "Except ..." sentence is obsolete?
		fhahn: removed
		Ayal: This last case which only emits a scalar IV can now be folded with the end of the previous case which emits both?
		fhahn: Done!
		Ayal: Would be good to fold int and fp handlings, indep. of this patch: Set and use `MulOp` here as well? Use CreateBinOp here as well? Name the instruction "induction" above for fp as well? OTOH, handling Trunc should apply to int only?
		fhahn: Thanks, I applied the folding and added an assert in 7ce48be0fd83.
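
The restructured control flow for VF>1 discussed above reduces to two independent checks. The sketch below only models which helpers would run for each combination; `inductionSteps` is a hypothetical function written for illustration, and the returned strings are just the helper names that appear in the diff, not calls into LLVM.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Lists, in order, the helpers that would be invoked for a widened induction
// when VF > 1, given the two properties recorded on the recipe.
std::vector<std::string> inductionSteps(bool NeedsVectorIV,
                                        bool NeedsScalarIV) {
  // Assumption: an induction with users needs at least one of the two.
  assert((NeedsVectorIV || NeedsScalarIV) && "induction has no users");
  std::vector<std::string> Steps;
  if (NeedsVectorIV) // options 1 and 3
    Steps.push_back("createVectorIntOrFpInductionPHI");
  if (NeedsScalarIV) { // options 2 and 3
    Steps.push_back("CreateScalarIV");
    Steps.push_back("buildScalarSteps");
  }
  return Steps;
}
```

Treating the two checks as independent, rather than as three mutually exclusive cases with early returns, is what allowed the scalar-only case to be folded into the end of the "both" case.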

void InnerLoopVectorizer::buildScalarSteps(Value *ScalarIV, Value *Step,		void InnerLoopVectorizer::buildScalarSteps(Value *ScalarIV, Value *Step,
Instruction *EntryVal,		Instruction *EntryVal,
const InductionDescriptor &ID,		const InductionDescriptor &ID,
VPValue *Def,		VPValue *Def,
VPTransformState &State) {		VPTransformState &State) {
IRBuilder<> &Builder = State.Builder;		IRBuilder<> &Builder = State.Builder;
// We shouldn't have to build scalar steps if we aren't vectorizing.		// We shouldn't have to build scalar steps if we aren't vectorizing.
(2,207 lines not shown)	void LoopVectorizationCostModel::collectLoopScalars(ElementCount VF) {

// An induction variable will remain scalar if all users of the induction		// An induction variable will remain scalar if all users of the induction
// variable and induction variable update remain scalar.		// variable and induction variable update remain scalar.
for (auto &Induction : Legal->getInductionVars()) {		for (auto &Induction : Legal->getInductionVars()) {
auto *Ind = Induction.first;		auto *Ind = Induction.first;
auto *IndUpdate = cast<Instruction>(Ind->getIncomingValueForBlock(Latch));		auto *IndUpdate = cast<Instruction>(Ind->getIncomingValueForBlock(Latch));

// If tail-folding is applied, the primary induction variable will be used		// If tail-folding is applied, the primary induction variable will be used
// to feed a vector compare.		// to feed a vector compare.
		Ayal: Wonder if this is still the case now that NewIV is always created to feed the vector compare, until replaced?
		fhahn: I think for compatibility with existing behavior we still need to retain it, as it impacts the decision whether to scalarize the induction or not. Could be investigated in a follow-up change, once all related patches are through.
if (Ind == Legal->getPrimaryInduction() && foldTailByMasking())		if (Ind == Legal->getPrimaryInduction() && foldTailByMasking())
continue;		continue;

// Returns true if \p Indvar is a pointer induction that is used directly by		// Returns true if \p Indvar is a pointer induction that is used directly by
// load/store instruction \p I.		// load/store instruction \p I.
auto IsDirectLoadStoreFromPtrIndvar = [&](Instruction *Indvar,		auto IsDirectLoadStoreFromPtrIndvar = [&](Instruction *Indvar,
Instruction *I) {		Instruction *I) {
return Induction.second.getKind() ==		return Induction.second.getKind() ==
(3,585 lines not shown)	VPValue *VPRecipeBuilder::createBlockInMask(BasicBlock *BB, VPlanPtr &Plan) {
// All-one mask is modelled as no-mask following the convention for masked		// All-one mask is modelled as no-mask following the convention for masked
// load/store/gather/scatter. Initialize BlockMask to no-mask.		// load/store/gather/scatter. Initialize BlockMask to no-mask.
VPValue *BlockMask = nullptr;		VPValue *BlockMask = nullptr;

if (OrigLoop->getHeader() == BB) {		if (OrigLoop->getHeader() == BB) {
if (!CM.blockNeedsPredicationForAnyReason(BB))		if (!CM.blockNeedsPredicationForAnyReason(BB))
return BlockMaskCache[BB] = BlockMask; // Loop incoming mask is all-one.		return BlockMaskCache[BB] = BlockMask; // Loop incoming mask is all-one.

// Introduce the early-exit compare IV <= BTC to form header block mask.		// Introduce the early-exit compare IV <= BTC to form header block mask.
// This is used instead of IV < TC because TC may wrap, unlike BTC. Start by		// This is used instead of IV < TC because TC may wrap, unlike BTC. Start by
		Ayal: Could we simplify by always creating a VPWidenCanonicalIVRecipe, w/o trying to use Legal->getPrimaryInduction() even when it exists? At first FoldTail relied solely on PrimaryInduction, until D77635 extended it to work w/o PrimaryInduction; but it should probably have removed the dependence on PrimaryInduction altogether. This may also simplify/revert D92017. A subsequent optimization could try to fold multiple IV recipes, in case both the PrimaryInduction was widened and WidenCanonicalIV was created (or leave it to LSR).
		fhahn:
		> Could we simplify by always creating a VPWidenCanonicalIVRecipe, w/o trying to use Legal->getPrimaryInduction() even when it exists?
		We could do, but it would probably mean a bigger test diff and a bigger functional change. I'd prefer to do it separately, to avoid any unexpected knock-on effects.
		Ayal:
		>> Could we simplify by always creating a VPWidenCanonicalIVRecipe, w/o trying to use Legal->getPrimaryInduction() even when it exists?
		> We could do, but it would probably mean a bigger test diff and a bigger functional change. I'd prefer to do it separately, to avoid any unexpected knock-on effects.
		sure, can also do so separately first. It may help simplify this patch, among other things.
		fhahn: Updated to always generate the widened canonical induction.
// constructing the desired canonical IV in the header block as its first		// constructing the desired canonical IV in the header block as its first
// non-phi instructions.		// non-phi instructions.
assert(CM.foldTailByMasking() && "must fold the tail");		assert(CM.foldTailByMasking() && "must fold the tail");
VPBasicBlock *HeaderVPBB = Plan->getEntry()->getEntryBasicBlock();		VPBasicBlock *HeaderVPBB = Plan->getEntry()->getEntryBasicBlock();
auto NewInsertionPoint = HeaderVPBB->getFirstNonPhi();		auto NewInsertionPoint = HeaderVPBB->getFirstNonPhi();
auto *IV = new VPWidenCanonicalIVRecipe(Plan->getCanonicalIV());		auto *IV = new VPWidenCanonicalIVRecipe(Plan->getCanonicalIV());
HeaderVPBB->insert(IV, HeaderVPBB->getFirstNonPhi());		HeaderVPBB->insert(IV, HeaderVPBB->getFirstNonPhi());

VPBuilder::InsertPointGuard Guard(Builder);		VPBuilder::InsertPointGuard Guard(Builder);
Builder.setInsertPoint(HeaderVPBB, NewInsertionPoint);		Builder.setInsertPoint(HeaderVPBB, NewInsertionPoint);
		Ayal: TailFolded should always be true? (While we're here, irrespective of this patch)
		fhahn: Yes, I noticed that as well. I'll push a commit, replacing it with an assert.
if (CM.TTI.emitGetActiveLaneMask()) {		if (CM.TTI.emitGetActiveLaneMask()) {
VPValue *TC = Plan->getOrCreateTripCount();		VPValue *TC = Plan->getOrCreateTripCount();
BlockMask = Builder.createNaryOp(VPInstruction::ActiveLaneMask, {IV, TC});		BlockMask = Builder.createNaryOp(VPInstruction::ActiveLaneMask, {IV, TC});
		Ayal: No need for {}
		So a StepVector is needed only to feed ICmpULE, not ActiveLaneMask which depends on first lane only? Seems like an optimization best applied separately?
		BTC is no longer used by ActiveLaneMask, so setting it should sink below to where ICmpULE needs it (While we're here, irrespective of this patch).
		fhahn:
		> No need for {}
		I think if the body spans multiple lines due to a comment, braces should not be omitted https://llvm.org/docs/CodingStandards.html#don-t-use-braces-on-simple-single-statement-bodies-of-if-else-loop-statements
		> So a StepVector is needed only to feed ICmpULE, not ActiveLaneMask which depends on first lane only? Seems like an optimization best applied separately?
		Yes, let me try to split that off. `ActiveLaneMask` only needs the first scalar step of each `Part`.
		> BTC is no longer used by ActiveLaneMask, so setting it should sink below to where ICmpULE needs it (While we're here, irrespective of this patch).
		done separately in 2e630eabd329
		fhahn:
		>> So a StepVector is needed only to feed ICmpULE, not ActiveLaneMask which depends on first lane only? Seems like an optimization best applied separately?
		> Yes, let me try to split that off. ActiveLaneMask only needs the first scalar step of each Part.
		I just double checked and we may still need to go the route with the widened recipe if there's no primary induction. But there's no test case without primary inductions that use active.lane.mask. I'll push one and update the patch.
		fhahn: I missed something here and IIUC your comment was regarding creating a step vector in both cases at first, right? The issue with that is that in some cases `ActiveLaneMask` will use the step-vector instead of the scalar steps, so I left it as is for now.
		> I just double checked and we may still need to go the route with the widened recipe if there's no primary induction.
		The current patch still creates the vector phi when necessary and the existing tests cover that already.
		Ayal: Yes, my comment refers to the fact that currently both ActiveLaneMask and ICmpULE are treated the same when Legal has a PrimaryInduction() - both use the same IV = Plan->getOrAddVPValue(Legal->getPrimaryInduction()); This patch continues to feed ActiveLaneMask with this IV, but feeds ICmpULE with a StepVector.
		fhahn:
		> Yes, my comment refers to the fact that currently both ActiveLaneMask and ICmpULE are treated the same when Legal has a PrimaryInduction() - both use the same IV = Plan->getOrAddVPValue(Legal->getPrimaryInduction());
		Ah I see. Yes, currently they both use the same VPValue, because the single recipe creates both the scalar steps and the step vector. But ActiveLaneMask only needs the steps and ICmpULE only needs the step vector. This is explicit now in the code as a result of breaking this part of the phi handling into distinct pieces.
		Ayal:
		> Ah I see. Yes, currently they both use the same VPValue, because the single recipe creates both the scalar steps and the step vector. But ActiveLaneMask only needs the steps and ICmpULE only needs the step vector. This is explicit now in the code as a result of breaking this part of the phi handling into distinct pieces.
		ok, that confirms 1st Q above, what about the 2nd: Seems like an optimization best applied separately?
		fhahn: I updated the patch to always generate widened induction here, and added a transform to replace it if possible later.
		Ayal: Better have the transform in a separate patch, to retain existing behavior here?
		fhahn: Good idea, split off to D117140
} else {		} else {
VPValue *BTC = Plan->getOrCreateBackedgeTakenCount();		VPValue *BTC = Plan->getOrCreateBackedgeTakenCount();
BlockMask = Builder.createNaryOp(VPInstruction::ICmpULE, {IV, BTC});		BlockMask = Builder.createNaryOp(VPInstruction::ICmpULE, {IV, BTC});
		Ayal: (If a VPWidenCanonicalIVRecipe is always constructed then IV will always be set to it here and this construction is not needed?)
}		}
return BlockMaskCache[BB] = BlockMask;		return BlockMaskCache[BB] = BlockMask;
}		}
		Ayal: Is anything but CanIV expected to feed getCanonicalStepVector? Does it aim to cope with multiple attempts to obtain CanIV's StepVector? It only feeds a single ICmpULE. Sounds like Plan->getOrCreateCanonicalStepVector()?
		Can alternatively have VPCanonicalIVPhiRecipe define two VPValues: one being the scalar-per-part providing lane 0, and the other (optionally optional) providing its vector steps?
		fhahn:
		> Is anything but CanIV expected to feed getCanonicalStepVector? Does it aim to cope with multiple attempts to obtain CanIV's StepVector? It only feeds a single ICmpULE.
		Yes that was the original goal. The code is now gone; the stepvector may replace the widened canonical IV created unconditionally here.
		> Can alternatively have VPCanonicalIVPhiRecipe define two VPValues: one being the scalar-per-part providing lane 0, and the other (optionally optional) providing its vector steps?
		I think modeling the stepvector separately allows us to re-use it for vector induction increments and optionally defining an additional value would make it a bit harder to first add the stepvector and then optimize it away; with a separate recipe for the stepvector we can simply replace the uses and remove the step recipe. With multi-defs we would need to either undef one of the multi-defs or replace the multi-def recipe with a single def version.
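
The source comment above says the header mask compares IV <= BTC rather than IV < TC because TC may wrap. A small worked example may help; this is a sketch in plain C++ that assumes an 8-bit induction type to make the wrap visible, and `headerMaskULE` / `headerMaskULT` are hypothetical names for illustration only.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Lane i is active iff (IV + i) <= BTC, all in 8-bit unsigned arithmetic.
std::vector<bool> headerMaskULE(uint8_t IV, uint8_t BTC, unsigned VF) {
  assert(VF > 0 && "need at least one lane");
  std::vector<bool> Mask(VF);
  for (unsigned Lane = 0; Lane < VF; ++Lane)
    Mask[Lane] = static_cast<uint8_t>(IV + Lane) <= BTC;
  return Mask;
}

// The alternative compare, (IV + i) < TC, where TC = BTC + 1 is computed in
// the same 8-bit type and therefore wraps to 0 when BTC == 255.
std::vector<bool> headerMaskULT(uint8_t IV, uint8_t BTC, unsigned VF) {
  assert(VF > 0 && "need at least one lane");
  uint8_t TC = static_cast<uint8_t>(BTC + 1);
  std::vector<bool> Mask(VF);
  for (unsigned Lane = 0; Lane < VF; ++Lane)
    Mask[Lane] = static_cast<uint8_t>(IV + Lane) < TC;
  return Mask;
}
```

For a loop with 6 iterations (BTC = 5) both forms agree: at IV = 4 with VF = 4 the first two lanes are active. For a loop with 256 iterations (BTC = 255) the trip count wraps to 0 in 8 bits, so the `<` form disables every lane while the `<=` form keeps all lanes of the last vector iteration active.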

// This is the block mask. We OR all incoming edges.		// This is the block mask. We OR all incoming edges.
for (auto *Predecessor : predecessors(BB)) {		for (auto *Predecessor : predecessors(BB)) {
VPValue *EdgeMask = createEdgeMask(Predecessor, BB, Plan);		VPValue *EdgeMask = createEdgeMask(Predecessor, BB, Plan);
if (!EdgeMask) // Mask of predecessor is all-one so mask of block is too.		if (!EdgeMask) // Mask of predecessor is all-one so mask of block is too.
		Ayal: StepVector needs to hold One as its second operand, mainly to convey the desired type?
		fhahn: At the moment only a step of one is used, but it will be used with different steps in the future (build scalar steps for inductions with arbitrary steps).
		Ayal:
		> At the moment only a step of one is used, but it will be used with different steps in the future (build scalar steps for inductions with arbitrary steps).
		Have that future patch extend the support from the initial unit step needed here?
		fhahn: A stepvector is also needed for incrementing vector inductions and I plan to use the recipe in a later patch, but unfortunately there's nothing to share yet.
return BlockMaskCache[BB] = EdgeMask;		return BlockMaskCache[BB] = EdgeMask;

if (!BlockMask) { // BlockMask has its initialized nullptr value.		if (!BlockMask) { // BlockMask has its initialized nullptr value.
BlockMask = EdgeMask;		BlockMask = EdgeMask;
continue;		continue;
}		}

BlockMask = Builder.createOr(BlockMask, EdgeMask, {});		BlockMask = Builder.createOr(BlockMask, EdgeMask, {});
(391 lines not shown)

VPRecipeOrVPValueTy		VPRecipeOrVPValueTy
VPRecipeBuilder::tryToCreateWidenRecipe(Instruction *Instr,		VPRecipeBuilder::tryToCreateWidenRecipe(Instruction *Instr,
ArrayRef<VPValue *> Operands,		ArrayRef<VPValue *> Operands,
VFRange &Range, VPlanPtr &Plan) {		VFRange &Range, VPlanPtr &Plan) {
// First, check for specific widening recipes that deal with calls, memory		// First, check for specific widening recipes that deal with calls, memory
// operations, inductions and Phi nodes.		// operations, inductions and Phi nodes.
if (auto *CI = dyn_cast<CallInst>(Instr))		if (auto *CI = dyn_cast<CallInst>(Instr))
return toVPRecipeResult(tryToWidenCall(CI, Operands, Range));		return toVPRecipeResult(tryToWidenCall(CI, Operands, Range));
		Ayal: If applied as a VPlan2VPlan optimization after VPlans were built according to ranges that have been clamped, suffice to check properties of any VF in Range; can assert they are the same across Range; best prevent clamping Range.
		fhahn: The issue here is that for VF == 1 there's no need for a step-vector. We could define step-vector to just produce a scalar for VF = 1 though? Later patches with a separate recipe for scalar-steps would handle this as well.

if (isa<LoadInst>(Instr) || isa<StoreInst>(Instr))		if (isa<LoadInst>(Instr) || isa<StoreInst>(Instr))
return toVPRecipeResult(tryToWidenMemory(Instr, Operands, Range, Plan));		return toVPRecipeResult(tryToWidenMemory(Instr, Operands, Range, Plan));

VPRecipeBase *Recipe;		VPRecipeBase *Recipe;
if (auto Phi = dyn_cast<PHINode>(Instr)) {		if (auto Phi = dyn_cast<PHINode>(Instr)) {
if (Phi->getParent() != OrigLoop->getHeader())		if (Phi->getParent() != OrigLoop->getHeader())
return tryToBlend(Phi, Operands, Plan);		return tryToBlend(Phi, Operands, Plan);
if ((Recipe = tryToOptimizeInductionPHI(Phi, Operands, Range)))		if ((Recipe = tryToOptimizeInductionPHI(Phi, Operands, Range)))
return toVPRecipeResult(Recipe);		return toVPRecipeResult(Recipe);

VPHeaderPHIRecipe *PhiRecipe = nullptr;		VPHeaderPHIRecipe *PhiRecipe = nullptr;
if (Legal->isReductionVariable(Phi) || Legal->isFirstOrderRecurrence(Phi)) {		if (Legal->isReductionVariable(Phi) || Legal->isFirstOrderRecurrence(Phi)) {
		Ayal (Done): NeedsScalarInduction(VF) will always early-return true given that ShouldScalarizeInstruction(Instr, VF) is true?
		fhahn (author, Done): I think it is required for the case where `Instr` is a truncate.
		Ayal (Not Done): Doesn't `ShouldScalarizeInstruction(Instr, VF) && NeedsScalarInduction(VF)` always equal `ShouldScalarizeInstruction(Instr, VF)`, given that `NeedsScalarInduction(VF)` returns true if `ShouldScalarizeInstruction(Instr, VF)` returns true for the same Instr and VF?
		fhahn (author, Done): I think to preserve the existing behaviour the PHI needs to be scalarized and any of its users. I updated the code to make the check a bit more direct.
VPValue *StartV = Operands[0];		VPValue *StartV = Operands[0];
if (Legal->isReductionVariable(Phi)) {		if (Legal->isReductionVariable(Phi)) {
const RecurrenceDescriptor &RdxDesc =		const RecurrenceDescriptor &RdxDesc =
Legal->getReductionVars().find(Phi)->second;		Legal->getReductionVars().find(Phi)->second;
assert(RdxDesc.getRecurrenceStartValue() ==		assert(RdxDesc.getRecurrenceStartValue() ==
Phi->getIncomingValueForBlock(OrigLoop->getLoopPreheader()));		Phi->getIncomingValueForBlock(OrigLoop->getLoopPreheader()));
PhiRecipe = new VPReductionPHIRecipe(Phi, RdxDesc, *StartV,		PhiRecipe = new VPReductionPHIRecipe(Phi, RdxDesc, *StartV,
CM.isInLoopReduction(Phi),		CM.isInLoopReduction(Phi),
▲ Show 20 Lines • Show All 293 Lines • ▼ Show 20 Lines	VPlanPtr LoopVectorizationPlanner::buildVPlanWithVPRecipes(

assert(isa<VPRegionBlock>(Plan->getEntry()) &&		assert(isa<VPRegionBlock>(Plan->getEntry()) &&
!Plan->getEntry()->getEntryBasicBlock()->empty() &&		!Plan->getEntry()->getEntryBasicBlock()->empty() &&
"entry block must be set to a VPRegionBlock having a non-empty entry "		"entry block must be set to a VPRegionBlock having a non-empty entry "
"VPBasicBlock");		"VPBasicBlock");
RecipeBuilder.fixHeaderPhis();		RecipeBuilder.fixHeaderPhis();

// ---------------------------------------------------------------------------		// ---------------------------------------------------------------------------
// Transform initial VPlan: Apply previously taken decisions, in order, to		// Transform initial VPlan: Apply previously taken decisions, in order, to
		Ayal (Done): As a redundancy elimination optimization? This buildVPlanWithVPRecipes() method deserves outlining anything from it.
		fhahn (author, Done): Sounds good, moved to `VPlanTransforms::removeRedundantStepVector`.
		Ayal (Done): Place comment next to call.
		fhahn (author, Done): The comment is stale now, removed.
// bring the VPlan to its final state.		// bring the VPlan to its final state.
// ---------------------------------------------------------------------------		// ---------------------------------------------------------------------------

// Apply Sink-After legal constraints.		// Apply Sink-After legal constraints.
auto GetReplicateRegion = [](VPRecipeBase *R) -> VPRegionBlock * {		auto GetReplicateRegion = [](VPRecipeBase *R) -> VPRegionBlock * {
auto *Region = dyn_cast_or_null<VPRegionBlock>(R->getParent()->getParent());		auto *Region = dyn_cast_or_null<VPRegionBlock>(R->getParent()->getParent());
if (Region && Region->isReplicator()) {		if (Region && Region->isReplicator()) {
assert(Region->getNumSuccessors() == 1 &&		assert(Region->getNumSuccessors() == 1 &&
▲ Show 20 Lines • Show All 52 Lines • ▼ Show 20 Lines	if (TargetRegion) {
VPBlockUtils::connectBlocks(SinkRegion, SplitBlock);		VPBlockUtils::connectBlocks(SinkRegion, SplitBlock);
}		}
}		}

VPlanTransforms::removeRedundantCanonicalIVs(*Plan);		VPlanTransforms::removeRedundantCanonicalIVs(*Plan);
VPlanTransforms::removeRedundantInductionCasts(*Plan);		VPlanTransforms::removeRedundantInductionCasts(*Plan);

// Now that sink-after is done, move induction recipes for optimized truncates		// Now that sink-after is done, move induction recipes for optimized truncates
// to the phi section of the header block.		// to the phi section of the header block.
		Ayal (Done): Have removeRedundantStepVector() take care of whatever it needs to check internally, rather than passing in this lambda?
		fhahn (author, Done): All information to take the decision is already available in the VPlan. We only need to know whether the tail will be folded and check the users of the induction recipe in the plan.
for (VPWidenIntOrFpInductionRecipe *Ind : InductionsToMove)		for (VPWidenIntOrFpInductionRecipe *Ind : InductionsToMove)
Ind->moveBefore(*HeaderVPBB, HeaderVPBB->getFirstNonPhi());		Ind->moveBefore(*HeaderVPBB, HeaderVPBB->getFirstNonPhi());

// Adjust the recipes for any inloop reductions.		// Adjust the recipes for any inloop reductions.
adjustRecipesForReductions(cast<VPBasicBlock>(TopRegion->getExit()), Plan,		adjustRecipesForReductions(cast<VPBasicBlock>(TopRegion->getExit()), Plan,
RecipeBuilder, Range.Start);		RecipeBuilder, Range.Start);

// Introduce a recipe to combine the incoming and previous values of a		// Introduce a recipe to combine the incoming and previous values of a
▲ Show 20 Lines • Show All 1,553 Lines • Show Last 20 Lines

llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp

	Show First 20 Lines • Show All 328 Lines • ▼ Show 20 Lines
	void VPlanTransforms::removeRedundantCanonicalIVs(VPlan &Plan) {			void VPlanTransforms::removeRedundantCanonicalIVs(VPlan &Plan) {
	VPCanonicalIVPHIRecipe *CanonicalIV = Plan.getCanonicalIV();			VPCanonicalIVPHIRecipe *CanonicalIV = Plan.getCanonicalIV();
	VPWidenCanonicalIVRecipe *WidenNewIV = nullptr;			VPWidenCanonicalIVRecipe *WidenNewIV = nullptr;
	for (VPUser *U : CanonicalIV->users()) {			for (VPUser *U : CanonicalIV->users()) {
	WidenNewIV = dyn_cast<VPWidenCanonicalIVRecipe>(U);			WidenNewIV = dyn_cast<VPWidenCanonicalIVRecipe>(U);
	if (WidenNewIV)			if (WidenNewIV)
	break;			break;
	}			}

				Ayal (Done): `Phi`: VPWidenIntOrFpInductionRecipe represents a phi (assumed to exist, corresponding to OriginalCanonicalIV) but VPWidenCanonicalIVRecipe represents a non-phi and thus appears after the former. Ordering their handling accordingly is more logical? Break as soon as (if) the latter is found? WidenCan >> WidenNewIV? WidenInduction >> WidenOriginalIV? (both are canonical...)
				fhahn (author, Done): Updated in the split-off review.
	if (!WidenNewIV)			if (!WidenNewIV)
	return;			return;

	VPBasicBlock *HeaderVPBB = Plan.getVectorLoopRegion()->getEntryBasicBlock();			VPBasicBlock *HeaderVPBB = Plan.getVectorLoopRegion()->getEntryBasicBlock();
	for (VPRecipeBase &Phi : HeaderVPBB->phis()) {			for (VPRecipeBase &Phi : HeaderVPBB->phis()) {
				Ayal (Done): Ahh, removeRedundantInductionCasts() is called before hoisting VPWidenIntOrFpInductionRecipes-built-from-Truncs, so the latter will be excluded if we only visit phis() here? OTOH, are such truncated IV's candidates for folding with WidenCanonicalIV in terms of matching types?
				fhahn (author, Done):
				> OTOH, are such truncated IV's candidates for folding with WidenCanonicalIV in terms of matching types?
				I don't think so, as we should widen using the canonical IV's type, which would be the wider type.
	auto *WidenOriginalIV = dyn_cast<VPWidenIntOrFpInductionRecipe>(&Phi);			auto *WidenOriginalIV = dyn_cast<VPWidenIntOrFpInductionRecipe>(&Phi);

	// If the induction recipe is canonical and the types match, use it			if (!WidenOriginalIV || !WidenOriginalIV->isCanonical() ||
				Ayal (Done): Can early-exit before the loop if OnlyScalarStepsNeeded(OriginalCanonicalIV). Then here check if (IV && IV->getUnderlyingValue() == OriginalCanonicalIV). But perhaps better to check if (IV && IV->getUnderlyingValue() == OriginalCanonicalIV && !IV->NeedsVectorIV()) as mentioned above.
				fhahn (author, Done): This has been completely removed and replaced by a different transform.
	// directly.			WidenOriginalIV->getScalarType() != WidenNewIV->getScalarType())
	if (WidenOriginalIV && WidenOriginalIV->isCanonical() &&			continue;
	WidenOriginalIV->getScalarType() == WidenNewIV->getScalarType()) {
				// Replace WidenNewIV with WidenOriginalIV if WidenOriginalIV provides
				// everything WidenNewIV's users need. That is, WidenOriginalIV will
				// generate a vector phi or all users of WidenNewIV demand the first lane
				// only.
				bool WidenNewIVOnlyFirstLaneUsed =
				all_of(WidenNewIV->users(), [](VPUser *U) {
				auto *R = dyn_cast<VPRecipeBase>(U);
				auto *VPI = dyn_cast_or_null<VPInstruction>(R);
				return VPI && VPI->getOpcode() == VPInstruction::ActiveLaneMask;
				Ayal (Done): Is this optimization expected to clamp Range.End for the current VPlan during VPlans-for-ranges construction? Is there a need to (only) know if VPlan's range is scalar? Specifically, if Start is Scalar then End can be asserted to be Start*2, if needed? Suffice to check if all recipes provide scalars only, although faster to check if range is scalar?
				fhahn (author, Done): It's sufficient to check whether the start is scalar. There's no need to clamp the end.
				Ayal (Done): Does the following summarize the logic here: Original and New IV's can each provide either scalar only, vector only, or both. Original can replace New iff it provides whatever New needs to provide. Have VPWidenCanonicalIVRecipe also support needsScalarIVOnly() - by checking if all its users are ActiveLaneMask VPInstructions, and needsScalarIV() - by checking if any user is ActiveLaneMask? Then check `if ((WidenNewIV->needsScalarIV() && WidenOriginalIV->needsScalarIV()) || (!WidenNewIV->needsScalarIVOnly() && !WidenOriginalIV->needsScalarIVOnly())) { WidenNewIV->replaceAllUsesWith(WidenOriginalIV); WidenNewIV->eraseFromParent(); }`?
				fhahn (author, Done):
				> Does the following summarize the logic here: Original and New IV's can each provide either scalar only, vector only, or both. Original can replace New iff it provides whatever New needs to provide.
				Yes, that's a better summary! I updated the comment. I am not sure about adding `needsScalarIVOnly`/`needsScalarIV` to `VPWidenCanonicalIVRecipe`, as this distinction seems only relevant here and might muddy the waters a bit, as the recipe will always generate a vector value (and I don't think we should change that). I left the check here as is for now, but once D116554 lands this check can be replaced with `onlyFirstLaneUsed(WidenNewIV)`. I think to wrap things up, it is worth keeping the current order of patches and having the temporary check of `WidenNewIV`'s users here, to simplify the patch ordering.
				Ayal (Not Done): Very well. How about just setting here `bool WidenNewIVOnlyFirstLaneUsed = all_of(...); if (WidenOriginalIV->needsVectorIV() || WidenNewIVOnlyFirstLaneUsed) { WidenNewIV->replaceAllUsesWith(WidenOriginalIV); WidenNewIV->eraseFromParent(); }`?
				});
				if (WidenOriginalIV->needsVectorIV() || WidenNewIVOnlyFirstLaneUsed) {
				Ayal (Done): vputils::onlyScalarsDemanded()?
	WidenNewIV->replaceAllUsesWith(WidenOriginalIV);			WidenNewIV->replaceAllUsesWith(WidenOriginalIV);
	WidenNewIV->eraseFromParent();			WidenNewIV->eraseFromParent();
	return;			return;
				Ayal (Done): The VPWidenCanonicalIVRecipe WidenNewIV is created only to feed the ICmpULE or ActiveLane, i.e., when vectorizing with fold tail, so can assert(!Range.Start.isScalar && FoldTail) if needed below? Scalar values are demanded from WidenNewIV only if it feeds ActiveLane, which demands only the first lane, so broadcasting and adding building scalar steps for each lane seems redundant here? Suffice to feed ActiveLane with UF parts each holding first lane only, either constructed via a simplified StepVector from canonicalIV or perhaps have the canonicalIV supply them directly instead of broadcasting its Part 0 value across all parts (which now seems potentially misleading)?
				fhahn (author, Done):
				> The VPWidenCanonicalIVRecipe WidenNewIV is created only to feed the ICmpULE or ActiveLane, i.e., when vectorizing with fold tail, so can assert(!Range.Start.isScalar && FoldTail) if needed below?
				I might be missing something, but I think it is still possible to fold the tail with VF=1 and UF>1, so I left the check (e.g. in `llvm/test/Transforms/LoopVectorize/pr45679-fold-tail-by-masking.ll`).
				> Scalar values are demanded from WidenNewIV only if it feeds ActiveLane, which demands only the first lane, so broadcasting and adding building scalar steps for each lane seems redundant here? Suffice to feed ActiveLane with UF parts each holding first lane only, either constructed via a simplified StepVector from canonicalIV or perhaps have the canonicalIV supply them directly instead of broadcasting its Part 0 value across all parts (which now seems potentially misleading)?
				Yes, for ActiveLane we only need the first scalar part of each lane. I think it might be better to have a special version of build-scalar-steps that only generates the first lane as follow-up to the build-scalar-steps patches.
				Ayal (Not Done):
				>> The VPWidenCanonicalIVRecipe WidenNewIV is created only to feed the ICmpULE or ActiveLane, i.e., when vectorizing with fold tail, so can assert(!Range.Start.isScalar && FoldTail) if needed below?
				> I might be missing something, but I think it is still possible to fold the tail with VF=1 and UF>1, so I left the check (e.g. in llvm/test/Transforms/LoopVectorize/pr45679-fold-tail-by-masking.ll)
				Ah, you're right, I forgot foldTail may operate when unrolling only; it should probably stop doing that, but in a separate patch.
				>> Scalar values are demanded from WidenNewIV only if it feeds ActiveLane, which demands only the first lane, so broadcasting and adding building scalar steps for each lane seems redundant here? Suffice to feed ActiveLane with UF parts each holding first lane only, either constructed via a simplified StepVector from canonicalIV or perhaps have the canonicalIV supply them directly instead of broadcasting its Part 0 value across all parts (which now seems potentially misleading)?
				> Yes, for ActiveLane we only need the first scalar part of each lane. I think it might be better to have a special version of build-scalar-steps that only generates the first lane as follow-up to the build-scalar-steps patches.
				Follow-up TODO would be fine. Still would be good to try and simplify the logic here that checks if OriginalIV cannot replace NewIV (but instead a StepVector'd CanonicalIV replaces NewIV): FoldTail must be true given that NewIV exists, so can be asserted or not passed-in; which leaves Range includes only positive VF's && onlyFirstLaneUsed(WidenOriginalIV) && onlyScalarsUsed(WidenOriginalIV) && NewIV feeds some VPInstruction that isn't ActiveLaneMask?
				fhahn (author, Done): The logic here has been simplified with help of D118167.
	}			}
	}			}
	}			}
				Ayal (Done): vputils::onlyFirstLaneDemanded()?
				Ayal (Done): Place next to above early-exit if !WidenCan?
				fhahn (author, Done): Moved!
				Ayal (Done): Scalar Start handled above (w/o retaining WidenCan)?
				fhahn (author, Done): This is not needed in the latest version, removed.
				Ayal (Done): WidenCan is going to be replaced, question is by what; check instead if StepVector is needed (one recipe producing only scalars the other vectors (deserves more review thoughts...)) and if so create it, followed by a single replacing and erasing of WidenCan?
				fhahn (author, Done): Updated to check if `StepVector` is needed and sink replace & erase code.
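
The replacement rule discussed in the thread above ("Original can replace New iff it provides whatever New needs to provide") can be sketched outside of LLVM as a small standalone model. This is only an illustration: `OriginalIVModel`, `NewIVModel`, `UserKind` and `canReplaceNewIV` are hypothetical names invented here, not VPlan classes, and the model keeps just the three properties the final condition looks at (canonical, matching scalar type, vector IV generated or only first lane used).

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical stand-ins for the users of the widened canonical IV.
enum class UserKind { ActiveLaneMask, ICmpULE, Other };

// Hypothetical model of the original widened induction (the header phi).
struct OriginalIVModel {
  bool IsCanonical;    // starts at 0 and steps by 1
  unsigned ScalarBits; // bit width of the scalar type
  bool NeedsVectorIV;  // a vector phi will be generated for it
};

// Hypothetical model of the newly created widened canonical IV.
struct NewIVModel {
  unsigned ScalarBits;
  std::vector<UserKind> Users;
};

// True if every user only demands the first lane. In this model, as in the
// temporary check in the patch, only ActiveLaneMask users qualify.
bool onlyFirstLaneUsed(const NewIVModel &NewIV) {
  return std::all_of(NewIV.Users.begin(), NewIV.Users.end(),
                     [](UserKind K) { return K == UserKind::ActiveLaneMask; });
}

// The original IV may replace the new one if it is canonical, the scalar
// types match, and it provides everything the new IV's users need.
bool canReplaceNewIV(const OriginalIVModel &Orig, const NewIVModel &NewIV) {
  assert(NewIV.ScalarBits != 0 && "model expects a sized scalar type");
  if (!Orig.IsCanonical || Orig.ScalarBits != NewIV.ScalarBits)
    return false;
  return Orig.NeedsVectorIV || onlyFirstLaneUsed(NewIV);
}
```

For example, an original IV that only produces scalars can still stand in for a new IV feeding nothing but an active-lane-mask user, while a new IV feeding an `ICmpULE` compare needs the original to generate a vector phi.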

llvm/test/Transforms/LoopVectorize/AArch64/sve-tail-folding-forced.ll

	Show All 19 Lines
	; CHECK-NEXT: [[N_RND_UP:%.*]] = add i64 [[UMAX]], [[TMP4]]			; CHECK-NEXT: [[N_RND_UP:%.*]] = add i64 [[UMAX]], [[TMP4]]
	; CHECK-NEXT: [[N_MOD_VF:%.*]] = urem i64 [[N_RND_UP]], [[TMP1]]			; CHECK-NEXT: [[N_MOD_VF:%.*]] = urem i64 [[N_RND_UP]], [[TMP1]]
	; CHECK-NEXT: [[N_VEC:%.*]] = sub i64 [[N_RND_UP]], [[N_MOD_VF]]			; CHECK-NEXT: [[N_VEC:%.*]] = sub i64 [[N_RND_UP]], [[N_MOD_VF]]
	; CHECK-NEXT: [[BROADCAST_SPLATINSERT5:%.*]] = insertelement <vscale x 4 x i32> poison, i32 [[VAL:%.*]], i32 0			; CHECK-NEXT: [[BROADCAST_SPLATINSERT5:%.*]] = insertelement <vscale x 4 x i32> poison, i32 [[VAL:%.*]], i32 0
	; CHECK-NEXT: [[BROADCAST_SPLAT6:%.*]] = shufflevector <vscale x 4 x i32> [[BROADCAST_SPLATINSERT5]], <vscale x 4 x i32> poison, <vscale x 4 x i32> zeroinitializer			; CHECK-NEXT: [[BROADCAST_SPLAT6:%.*]] = shufflevector <vscale x 4 x i32> [[BROADCAST_SPLATINSERT5]], <vscale x 4 x i32> poison, <vscale x 4 x i32> zeroinitializer
	; CHECK-NEXT: br label %vector.body			; CHECK-NEXT: br label %vector.body
	; CHECK: vector.body:			; CHECK: vector.body:
	; CHECK-NEXT: [[INDEX1:%.*]] = phi i64 [ 0, %vector.ph ], [ [[INDEX_NEXT2:%.*]], %vector.body ]			; CHECK-NEXT: [[INDEX1:%.*]] = phi i64 [ 0, %vector.ph ], [ [[INDEX_NEXT2:%.*]], %vector.body ]
	; CHECK-NEXT: [[BROADCAST_SPLATINSERT3:%.*]] = insertelement <vscale x 4 x i64> poison, i64 [[INDEX1]], i32 0
	; CHECK-NEXT: [[BROADCAST_SPLAT4:%.*]] = shufflevector <vscale x 4 x i64> [[BROADCAST_SPLATINSERT3]], <vscale x 4 x i64> poison, <vscale x 4 x i32> zeroinitializer
	; CHECK-NEXT: [[TMP5:%.*]] = call <vscale x 4 x i64> @llvm.experimental.stepvector.nxv4i64()
	; CHECK-NEXT: [[TMP6:%.*]] = add <vscale x 4 x i64> [[TMP5]], zeroinitializer
	; CHECK-NEXT: [[TMP7:%.*]] = mul <vscale x 4 x i64> [[TMP6]], shufflevector (<vscale x 4 x i64> insertelement (<vscale x 4 x i64> poison, i64 1, i32 0), <vscale x 4 x i64> poison, <vscale x 4 x i32> zeroinitializer)
	; CHECK-NEXT: [[INDUCTION:%.*]] = add <vscale x 4 x i64> [[BROADCAST_SPLAT4]], [[TMP7]]
	; CHECK-NEXT: [[TMP8:%.*]] = add i64 [[INDEX1]], 0			; CHECK-NEXT: [[TMP8:%.*]] = add i64 [[INDEX1]], 0
	; CHECK-NEXT: [[ACTIVE_LANE_MASK:%.*]] = call <vscale x 4 x i1> @llvm.get.active.lane.mask.nxv4i1.i64(i64 [[TMP8]], i64 [[UMAX]])			; CHECK-NEXT: [[ACTIVE_LANE_MASK:%.*]] = call <vscale x 4 x i1> @llvm.get.active.lane.mask.nxv4i1.i64(i64 [[TMP8]], i64 [[UMAX]])
	; CHECK-NEXT: [[TMP10:%.*]] = getelementptr i32, i32* [[PTR:%.*]], i64 [[TMP8]]			; CHECK-NEXT: [[TMP10:%.*]] = getelementptr i32, i32* [[PTR:%.*]], i64 [[TMP8]]
	; CHECK-NEXT: [[TMP11:%.*]] = getelementptr i32, i32* [[TMP10]], i32 0			; CHECK-NEXT: [[TMP11:%.*]] = getelementptr i32, i32* [[TMP10]], i32 0
	; CHECK-NEXT: [[TMP12:%.*]] = bitcast i32* [[TMP11]] to <vscale x 4 x i32>*			; CHECK-NEXT: [[TMP12:%.*]] = bitcast i32* [[TMP11]] to <vscale x 4 x i32>*
	; CHECK-NEXT: call void @llvm.masked.store.nxv4i32.p0nxv4i32(<vscale x 4 x i32> [[BROADCAST_SPLAT6]], <vscale x 4 x i32>* [[TMP12]], i32 4, <vscale x 4 x i1> [[ACTIVE_LANE_MASK]])			; CHECK-NEXT: call void @llvm.masked.store.nxv4i32.p0nxv4i32(<vscale x 4 x i32> [[BROADCAST_SPLAT6]], <vscale x 4 x i32>* [[TMP12]], i32 4, <vscale x 4 x i1> [[ACTIVE_LANE_MASK]])
	; CHECK-NEXT: [[TMP13:%.*]] = call i64 @llvm.vscale.i64()			; CHECK-NEXT: [[TMP13:%.*]] = call i64 @llvm.vscale.i64()
	; CHECK-NEXT: [[TMP14:%.*]] = mul i64 [[TMP13]], 4			; CHECK-NEXT: [[TMP14:%.*]] = mul i64 [[TMP13]], 4
	Show All 26 Lines
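
The arithmetic these CHECK lines match can be sketched in plain C++. This is a hedged illustration, not code from the patch: `vectorTripCount` and `activeLaneMask` are hypothetical helper names, a fixed lane count stands in for the scalable `vscale x 4` width, and overflow of `Base + I` is ignored. The first helper mirrors the `n.rnd.up` / `n.mod.vf` / `n.vec` sequence emitted for tail folding; the second models the documented behaviour of `llvm.get.active.lane.mask`, where lane `i` is active iff `base + i < n` (unsigned).

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Round the trip count N up to a multiple of Step (VF * UF), as done by
//   n.rnd.up = N + (Step - 1); n.mod.vf = n.rnd.up urem Step;
//   n.vec    = n.rnd.up - n.mod.vf
uint64_t vectorTripCount(uint64_t N, uint64_t Step) {
  assert(Step > 0 && "step must be non-zero");
  uint64_t RndUp = N + (Step - 1);
  return RndUp - RndUp % Step;
}

// Model of the lane mask for one vector iteration starting at scalar index
// Base: lane I is enabled iff Base + I < N. Only the scalar Base is needed,
// which is why the mask can be fed from the first lane alone.
std::vector<bool> activeLaneMask(uint64_t Base, uint64_t N, unsigned Lanes) {
  std::vector<bool> Mask(Lanes);
  for (unsigned I = 0; I < Lanes; ++I)
    Mask[I] = Base + I < N;
  return Mask;
}
```

With `N = 10` and four lanes, the vector loop runs for 12 scalar iterations' worth of steps, and the last iteration (base 8) enables only the first two lanes.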

llvm/test/Transforms/LoopVectorize/AArch64/sve-tail-folding.ll

	Show All 19 Lines
	; CHECK-NEXT: [[N_RND_UP:%.*]] = add i64 [[UMAX]], [[TMP4]]			; CHECK-NEXT: [[N_RND_UP:%.*]] = add i64 [[UMAX]], [[TMP4]]
	; CHECK-NEXT: [[N_MOD_VF:%.*]] = urem i64 [[N_RND_UP]], [[TMP1]]			; CHECK-NEXT: [[N_MOD_VF:%.*]] = urem i64 [[N_RND_UP]], [[TMP1]]
	; CHECK-NEXT: [[N_VEC:%.*]] = sub i64 [[N_RND_UP]], [[N_MOD_VF]]			; CHECK-NEXT: [[N_VEC:%.*]] = sub i64 [[N_RND_UP]], [[N_MOD_VF]]
	; CHECK-NEXT: [[BROADCAST_SPLATINSERT5:%.*]] = insertelement <vscale x 4 x i32> poison, i32 [[VAL:%.*]], i32 0			; CHECK-NEXT: [[BROADCAST_SPLATINSERT5:%.*]] = insertelement <vscale x 4 x i32> poison, i32 [[VAL:%.*]], i32 0
	; CHECK-NEXT: [[BROADCAST_SPLAT6:%.*]] = shufflevector <vscale x 4 x i32> [[BROADCAST_SPLATINSERT5]], <vscale x 4 x i32> poison, <vscale x 4 x i32> zeroinitializer			; CHECK-NEXT: [[BROADCAST_SPLAT6:%.*]] = shufflevector <vscale x 4 x i32> [[BROADCAST_SPLATINSERT5]], <vscale x 4 x i32> poison, <vscale x 4 x i32> zeroinitializer
	; CHECK-NEXT: br label %vector.body			; CHECK-NEXT: br label %vector.body
	; CHECK: vector.body:			; CHECK: vector.body:
	; CHECK-NEXT: [[INDEX1:%.*]] = phi i64 [ 0, %vector.ph ], [ [[INDEX_NEXT2:%.*]], %vector.body ]			; CHECK-NEXT: [[INDEX1:%.*]] = phi i64 [ 0, %vector.ph ], [ [[INDEX_NEXT2:%.*]], %vector.body ]
	; CHECK-NEXT: [[BROADCAST_SPLATINSERT3:%.*]] = insertelement <vscale x 4 x i64> poison, i64 [[INDEX1]], i32 0
	; CHECK-NEXT: [[BROADCAST_SPLAT4:%.*]] = shufflevector <vscale x 4 x i64> [[BROADCAST_SPLATINSERT3]], <vscale x 4 x i64> poison, <vscale x 4 x i32> zeroinitializer
	; CHECK-NEXT: [[TMP5:%.*]] = call <vscale x 4 x i64> @llvm.experimental.stepvector.nxv4i64()
	; CHECK-NEXT: [[TMP6:%.*]] = add <vscale x 4 x i64> [[TMP5]], zeroinitializer
	; CHECK-NEXT: [[TMP7:%.*]] = mul <vscale x 4 x i64> [[TMP6]], shufflevector (<vscale x 4 x i64> insertelement (<vscale x 4 x i64> poison, i64 1, i32 0), <vscale x 4 x i64> poison, <vscale x 4 x i32> zeroinitializer)
	; CHECK-NEXT: [[INDUCTION:%.*]] = add <vscale x 4 x i64> [[BROADCAST_SPLAT4]], [[TMP7]]
	; CHECK-NEXT: [[TMP8:%.*]] = add i64 [[INDEX1]], 0			; CHECK-NEXT: [[TMP8:%.*]] = add i64 [[INDEX1]], 0
	; CHECK-NEXT: [[ACTIVE_LANE_MASK:%.*]] = call <vscale x 4 x i1> @llvm.get.active.lane.mask.nxv4i1.i64(i64 [[TMP8]], i64 [[UMAX]])			; CHECK-NEXT: [[ACTIVE_LANE_MASK:%.*]] = call <vscale x 4 x i1> @llvm.get.active.lane.mask.nxv4i1.i64(i64 [[TMP8]], i64 [[UMAX]])
	; CHECK-NEXT: [[TMP10:%.*]] = getelementptr i32, i32* [[PTR:%.*]], i64 [[TMP8]]			; CHECK-NEXT: [[TMP10:%.*]] = getelementptr i32, i32* [[PTR:%.*]], i64 [[TMP8]]
	; CHECK-NEXT: [[TMP11:%.*]] = getelementptr i32, i32* [[TMP10]], i32 0			; CHECK-NEXT: [[TMP11:%.*]] = getelementptr i32, i32* [[TMP10]], i32 0
	; CHECK-NEXT: [[TMP12:%.*]] = bitcast i32* [[TMP11]] to <vscale x 4 x i32>*			; CHECK-NEXT: [[TMP12:%.*]] = bitcast i32* [[TMP11]] to <vscale x 4 x i32>*
	; CHECK-NEXT: call void @llvm.masked.store.nxv4i32.p0nxv4i32(<vscale x 4 x i32> [[BROADCAST_SPLAT6]], <vscale x 4 x i32>* [[TMP12]], i32 4, <vscale x 4 x i1> [[ACTIVE_LANE_MASK]])			; CHECK-NEXT: call void @llvm.masked.store.nxv4i32.p0nxv4i32(<vscale x 4 x i32> [[BROADCAST_SPLAT6]], <vscale x 4 x i32>* [[TMP12]], i32 4, <vscale x 4 x i1> [[ACTIVE_LANE_MASK]])
	; CHECK-NEXT: [[TMP13:%.*]] = call i64 @llvm.vscale.i64()			; CHECK-NEXT: [[TMP13:%.*]] = call i64 @llvm.vscale.i64()
	; CHECK-NEXT: [[TMP14:%.*]] = mul i64 [[TMP13]], 4			; CHECK-NEXT: [[TMP14:%.*]] = mul i64 [[TMP13]], 4
	Show All 31 Lines
	; CHECK-NEXT: [[TMP3:%.*]] = mul i64 [[TMP2]], 4			; CHECK-NEXT: [[TMP3:%.*]] = mul i64 [[TMP2]], 4
	; CHECK-NEXT: [[TMP4:%.*]] = sub i64 [[TMP3]], 1			; CHECK-NEXT: [[TMP4:%.*]] = sub i64 [[TMP3]], 1
	; CHECK-NEXT: [[N_RND_UP:%.*]] = add i64 [[UMAX]], [[TMP4]]			; CHECK-NEXT: [[N_RND_UP:%.*]] = add i64 [[UMAX]], [[TMP4]]
	; CHECK-NEXT: [[N_MOD_VF:%.*]] = urem i64 [[N_RND_UP]], [[TMP1]]			; CHECK-NEXT: [[N_MOD_VF:%.*]] = urem i64 [[N_RND_UP]], [[TMP1]]
	; CHECK-NEXT: [[N_VEC:%.*]] = sub i64 [[N_RND_UP]], [[N_MOD_VF]]			; CHECK-NEXT: [[N_VEC:%.*]] = sub i64 [[N_RND_UP]], [[N_MOD_VF]]
	; CHECK-NEXT: br label %vector.body			; CHECK-NEXT: br label %vector.body
	; CHECK: vector.body:			; CHECK: vector.body:
	; CHECK-NEXT: [[INDEX1:%.*]] = phi i64 [ 0, %vector.ph ], [ [[INDEX_NEXT2:%.*]], %vector.body ]			; CHECK-NEXT: [[INDEX1:%.*]] = phi i64 [ 0, %vector.ph ], [ [[INDEX_NEXT2:%.*]], %vector.body ]
	; CHECK-NEXT: [[BROADCAST_SPLATINSERT3:%.*]] = insertelement <vscale x 4 x i64> poison, i64 [[INDEX1]], i32 0
	; CHECK-NEXT: [[BROADCAST_SPLAT4:%.*]] = shufflevector <vscale x 4 x i64> [[BROADCAST_SPLATINSERT3]], <vscale x 4 x i64> poison, <vscale x 4 x i32> zeroinitializer
	; CHECK-NEXT: [[TMP5:%.*]] = call <vscale x 4 x i64> @llvm.experimental.stepvector.nxv4i64()
	; CHECK-NEXT: [[TMP6:%.*]] = add <vscale x 4 x i64> [[TMP5]], zeroinitializer
	; CHECK-NEXT: [[TMP7:%.*]] = mul <vscale x 4 x i64> [[TMP6]], shufflevector (<vscale x 4 x i64> insertelement (<vscale x 4 x i64> poison, i64 1, i32 0), <vscale x 4 x i64> poison, <vscale x 4 x i32> zeroinitializer)
	; CHECK-NEXT: [[INDUCTION:%.*]] = add <vscale x 4 x i64> [[BROADCAST_SPLAT4]], [[TMP7]]
	; CHECK-NEXT: [[TMP8:%.*]] = add i64 [[INDEX1]], 0			; CHECK-NEXT: [[TMP8:%.*]] = add i64 [[INDEX1]], 0
	; CHECK-NEXT: [[ACTIVE_LANE_MASK:%.*]] = call <vscale x 4 x i1> @llvm.get.active.lane.mask.nxv4i1.i64(i64 [[TMP8]], i64 [[UMAX]])			; CHECK-NEXT: [[ACTIVE_LANE_MASK:%.*]] = call <vscale x 4 x i1> @llvm.get.active.lane.mask.nxv4i1.i64(i64 [[TMP8]], i64 [[UMAX]])
	; CHECK-NEXT: [[TMP10:%.*]] = getelementptr i32, i32* [[SRC:%.*]], i64 [[TMP8]]			; CHECK-NEXT: [[TMP10:%.*]] = getelementptr i32, i32* [[SRC:%.*]], i64 [[TMP8]]
	; CHECK-NEXT: [[TMP11:%.*]] = getelementptr i32, i32* [[TMP10]], i32 0			; CHECK-NEXT: [[TMP11:%.*]] = getelementptr i32, i32* [[TMP10]], i32 0
	; CHECK-NEXT: [[TMP12:%.*]] = bitcast i32* [[TMP11]] to <vscale x 4 x i32>*			; CHECK-NEXT: [[TMP12:%.*]] = bitcast i32* [[TMP11]] to <vscale x 4 x i32>*
	; CHECK-NEXT: [[WIDE_MASKED_LOAD:%.*]] = call <vscale x 4 x i32> @llvm.masked.load.nxv4i32.p0nxv4i32(<vscale x 4 x i32>* [[TMP12]], i32 4, <vscale x 4 x i1> [[ACTIVE_LANE_MASK]], <vscale x 4 x i32> poison)			; CHECK-NEXT: [[WIDE_MASKED_LOAD:%.*]] = call <vscale x 4 x i32> @llvm.masked.load.nxv4i32.p0nxv4i32(<vscale x 4 x i32>* [[TMP12]], i32 4, <vscale x 4 x i1> [[ACTIVE_LANE_MASK]], <vscale x 4 x i32> poison)
	; CHECK-NEXT: [[TMP13:%.*]] = getelementptr i32, i32* [[DST:%.*]], i64 [[TMP8]]			; CHECK-NEXT: [[TMP13:%.*]] = getelementptr i32, i32* [[DST:%.*]], i64 [[TMP8]]
	; CHECK-NEXT: [[TMP14:%.*]] = getelementptr i32, i32* [[TMP13]], i32 0			; CHECK-NEXT: [[TMP14:%.*]] = getelementptr i32, i32* [[TMP13]], i32 0
	▲ Show 20 Lines • Show All 106 Lines • ▼ Show 20 Lines
	; CHECK-NEXT: [[TMP3:%.*]] = mul i64 [[TMP2]], 4			; CHECK-NEXT: [[TMP3:%.*]] = mul i64 [[TMP2]], 4
	; CHECK-NEXT: [[TMP4:%.*]] = sub i64 [[TMP3]], 1			; CHECK-NEXT: [[TMP4:%.*]] = sub i64 [[TMP3]], 1
	; CHECK-NEXT: [[N_RND_UP:%.*]] = add i64 [[UMAX]], [[TMP4]]			; CHECK-NEXT: [[N_RND_UP:%.*]] = add i64 [[UMAX]], [[TMP4]]
	; CHECK-NEXT: [[N_MOD_VF:%.*]] = urem i64 [[N_RND_UP]], [[TMP1]]			; CHECK-NEXT: [[N_MOD_VF:%.*]] = urem i64 [[N_RND_UP]], [[TMP1]]
	; CHECK-NEXT: [[N_VEC:%.*]] = sub i64 [[N_RND_UP]], [[N_MOD_VF]]			; CHECK-NEXT: [[N_VEC:%.*]] = sub i64 [[N_RND_UP]], [[N_MOD_VF]]
	; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]			; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]
	; CHECK: vector.body:			; CHECK: vector.body:
	; CHECK-NEXT: [[INDEX1:%.*]] = phi i64 [ 0, %vector.ph ], [ [[INDEX_NEXT2:%.*]], %vector.body ]			; CHECK-NEXT: [[INDEX1:%.*]] = phi i64 [ 0, %vector.ph ], [ [[INDEX_NEXT2:%.*]], %vector.body ]
	; CHECK-NEXT: [[BROADCAST_SPLATINSERT3:%.*]] = insertelement <vscale x 4 x i64> poison, i64 [[INDEX1]], i32 0
	; CHECK-NEXT: [[BROADCAST_SPLAT4:%.*]] = shufflevector <vscale x 4 x i64> [[BROADCAST_SPLATINSERT3]], <vscale x 4 x i64> poison, <vscale x 4 x i32> zeroinitializer
	; CHECK-NEXT: [[TMP5:%.*]] = call <vscale x 4 x i64> @llvm.experimental.stepvector.nxv4i64()
	; CHECK-NEXT: [[TMP6:%.*]] = add <vscale x 4 x i64> [[TMP5]], zeroinitializer
	; CHECK-NEXT: [[TMP7:%.*]] = mul <vscale x 4 x i64> [[TMP6]], shufflevector (<vscale x 4 x i64> insertelement (<vscale x 4 x i64> poison, i64 1, i32 0), <vscale x 4 x i64> poison, <vscale x 4 x i32> zeroinitializer)
	; CHECK-NEXT: [[INDUCTION:%.*]] = add <vscale x 4 x i64> [[BROADCAST_SPLAT4]], [[TMP7]]
	; CHECK-NEXT: [[TMP8:%.*]] = add i64 [[INDEX1]], 0			; CHECK-NEXT: [[TMP8:%.*]] = add i64 [[INDEX1]], 0
	; CHECK-NEXT: [[ACTIVE_LANE_MASK:%.*]] = call <vscale x 4 x i1> @llvm.get.active.lane.mask.nxv4i1.i64(i64 [[TMP8]], i64 [[UMAX]])			; CHECK-NEXT: [[ACTIVE_LANE_MASK:%.*]] = call <vscale x 4 x i1> @llvm.get.active.lane.mask.nxv4i1.i64(i64 [[TMP8]], i64 [[UMAX]])
	; CHECK-NEXT: [[TMP10:%.*]] = getelementptr i32, i32* [[IND:%.*]], i64 [[TMP8]]			; CHECK-NEXT: [[TMP10:%.*]] = getelementptr i32, i32* [[IND:%.*]], i64 [[TMP8]]
	; CHECK-NEXT: [[TMP11:%.*]] = getelementptr i32, i32* [[TMP10]], i32 0			; CHECK-NEXT: [[TMP11:%.*]] = getelementptr i32, i32* [[TMP10]], i32 0
	; CHECK-NEXT: [[TMP12:%.*]] = bitcast i32* [[TMP11]] to <vscale x 4 x i32>*			; CHECK-NEXT: [[TMP12:%.*]] = bitcast i32* [[TMP11]] to <vscale x 4 x i32>*
	; CHECK-NEXT: [[WIDE_MASKED_LOAD:%.*]] = call <vscale x 4 x i32> @llvm.masked.load.nxv4i32.p0nxv4i32(<vscale x 4 x i32>* [[TMP12]], i32 4, <vscale x 4 x i1> [[ACTIVE_LANE_MASK]], <vscale x 4 x i32> poison)			; CHECK-NEXT: [[WIDE_MASKED_LOAD:%.*]] = call <vscale x 4 x i32> @llvm.masked.load.nxv4i32.p0nxv4i32(<vscale x 4 x i32>* [[TMP12]], i32 4, <vscale x 4 x i1> [[ACTIVE_LANE_MASK]], <vscale x 4 x i32> poison)
	; CHECK-NEXT: [[TMP13:%.*]] = getelementptr i32, i32* [[SRC:%.*]], <vscale x 4 x i32> [[WIDE_MASKED_LOAD]]			; CHECK-NEXT: [[TMP13:%.*]] = getelementptr i32, i32* [[SRC:%.*]], <vscale x 4 x i32> [[WIDE_MASKED_LOAD]]
	; CHECK-NEXT: [[WIDE_MASKED_GATHER:%.*]] = call <vscale x 4 x i32> @llvm.masked.gather.nxv4i32.nxv4p0i32(<vscale x 4 x i32*> [[TMP13]], i32 4, <vscale x 4 x i1> [[ACTIVE_LANE_MASK]], <vscale x 4 x i32> undef)			; CHECK-NEXT: [[WIDE_MASKED_GATHER:%.*]] = call <vscale x 4 x i32> @llvm.masked.gather.nxv4i32.nxv4p0i32(<vscale x 4 x i32*> [[TMP13]], i32 4, <vscale x 4 x i1> [[ACTIVE_LANE_MASK]], <vscale x 4 x i32> undef)
	[... 40 lines not shown ...]
	; CHECK-NEXT: [[TMP3:%.*]] = mul i64 [[TMP2]], 4			; CHECK-NEXT: [[TMP3:%.*]] = mul i64 [[TMP2]], 4
	; CHECK-NEXT: [[TMP4:%.*]] = sub i64 [[TMP3]], 1			; CHECK-NEXT: [[TMP4:%.*]] = sub i64 [[TMP3]], 1
	; CHECK-NEXT: [[N_RND_UP:%.*]] = add i64 [[N:%.*]], [[TMP4]]			; CHECK-NEXT: [[N_RND_UP:%.*]] = add i64 [[N:%.*]], [[TMP4]]
	; CHECK-NEXT: [[N_MOD_VF:%.*]] = urem i64 [[N_RND_UP]], [[TMP1]]			; CHECK-NEXT: [[N_MOD_VF:%.*]] = urem i64 [[N_RND_UP]], [[TMP1]]
	; CHECK-NEXT: [[N_VEC:%.*]] = sub i64 [[N_RND_UP]], [[N_MOD_VF]]			; CHECK-NEXT: [[N_VEC:%.*]] = sub i64 [[N_RND_UP]], [[N_MOD_VF]]
	; CHECK-NEXT: br label %vector.body			; CHECK-NEXT: br label %vector.body
	; CHECK: vector.body:			; CHECK: vector.body:
	; CHECK-NEXT: [[INDEX:%.*]] = phi i64 [ 0, %vector.ph ], [ [[INDEX_NEXT:%.*]], %vector.body ]			; CHECK-NEXT: [[INDEX:%.*]] = phi i64 [ 0, %vector.ph ], [ [[INDEX_NEXT:%.*]], %vector.body ]
	; CHECK-NEXT: [[BROADCAST_SPLATINSERT1:%.*]] = insertelement <vscale x 4 x i64> poison, i64 [[INDEX]], i32 0
	; CHECK-NEXT: [[BROADCAST_SPLAT2:%.*]] = shufflevector <vscale x 4 x i64> [[BROADCAST_SPLATINSERT1]], <vscale x 4 x i64> poison, <vscale x 4 x i32> zeroinitializer
	; CHECK-NEXT: [[TMP5:%.*]] = call <vscale x 4 x i64> @llvm.experimental.stepvector.nxv4i64()
	; CHECK-NEXT: [[TMP6:%.*]] = add <vscale x 4 x i64> [[TMP5]], zeroinitializer
	; CHECK-NEXT: [[TMP7:%.*]] = mul <vscale x 4 x i64> [[TMP6]], shufflevector (<vscale x 4 x i64> insertelement (<vscale x 4 x i64> poison, i64 1, i32 0), <vscale x 4 x i64> poison, <vscale x 4 x i32> zeroinitializer)
	; CHECK-NEXT: [[INDUCTION:%.*]] = add <vscale x 4 x i64> [[BROADCAST_SPLAT2]], [[TMP7]]
	; CHECK-NEXT: [[TMP8:%.*]] = add i64 [[INDEX]], 0			; CHECK-NEXT: [[TMP8:%.*]] = add i64 [[INDEX]], 0
	; CHECK-NEXT: [[ACTIVE_LANE_MASK:%.*]] = call <vscale x 4 x i1> @llvm.get.active.lane.mask.nxv4i1.i64(i64 [[TMP8]], i64 %n)			; CHECK-NEXT: [[ACTIVE_LANE_MASK:%.*]] = call <vscale x 4 x i1> @llvm.get.active.lane.mask.nxv4i1.i64(i64 [[TMP8]], i64 %n)
	; CHECK-NEXT: [[TMP10:%.*]] = load i32, i32* [[SRC:%.*]], align 4			; CHECK-NEXT: [[TMP10:%.*]] = load i32, i32* [[SRC:%.*]], align 4
	; CHECK-NEXT: [[BROADCAST_SPLATINSERT3:%.*]] = insertelement <vscale x 4 x i32> poison, i32 [[TMP10]], i32 0			; CHECK-NEXT: [[BROADCAST_SPLATINSERT3:%.*]] = insertelement <vscale x 4 x i32> poison, i32 [[TMP10]], i32 0
	; CHECK-NEXT: [[BROADCAST_SPLAT4:%.*]] = shufflevector <vscale x 4 x i32> [[BROADCAST_SPLATINSERT3]], <vscale x 4 x i32> poison, <vscale x 4 x i32> zeroinitializer			; CHECK-NEXT: [[BROADCAST_SPLAT4:%.*]] = shufflevector <vscale x 4 x i32> [[BROADCAST_SPLATINSERT3]], <vscale x 4 x i32> poison, <vscale x 4 x i32> zeroinitializer
	; CHECK-NEXT: [[TMP11:%.*]] = getelementptr inbounds i32, i32* [[DST:%.*]], i64 [[TMP8]]			; CHECK-NEXT: [[TMP11:%.*]] = getelementptr inbounds i32, i32* [[DST:%.*]], i64 [[TMP8]]
	; CHECK-NEXT: [[TMP12:%.*]] = getelementptr inbounds i32, i32* [[TMP11]], i32 0			; CHECK-NEXT: [[TMP12:%.*]] = getelementptr inbounds i32, i32* [[TMP11]], i32 0
	; CHECK-NEXT: [[TMP13:%.*]] = bitcast i32* [[TMP12]] to <vscale x 4 x i32>*			; CHECK-NEXT: [[TMP13:%.*]] = bitcast i32* [[TMP12]] to <vscale x 4 x i32>*
	[... 41 lines not shown ...]
	; CHECK-NEXT: [[N_RND_UP:%.*]] = add i64 [[N:%.*]], [[TMP4]]			; CHECK-NEXT: [[N_RND_UP:%.*]] = add i64 [[N:%.*]], [[TMP4]]
	; CHECK-NEXT: [[N_MOD_VF:%.*]] = urem i64 [[N_RND_UP]], [[TMP1]]			; CHECK-NEXT: [[N_MOD_VF:%.*]] = urem i64 [[N_RND_UP]], [[TMP1]]
	; CHECK-NEXT: [[N_VEC:%.*]] = sub i64 [[N_RND_UP]], [[N_MOD_VF]]			; CHECK-NEXT: [[N_VEC:%.*]] = sub i64 [[N_RND_UP]], [[N_MOD_VF]]
	; CHECK-NEXT: [[BROADCAST_SPLATINSERT5:%.*]] = insertelement <vscale x 4 x i32*> poison, i32* [[SRC:%.*]], i32 0			; CHECK-NEXT: [[BROADCAST_SPLATINSERT5:%.*]] = insertelement <vscale x 4 x i32*> poison, i32* [[SRC:%.*]], i32 0
	; CHECK-NEXT: [[BROADCAST_SPLAT6:%.*]] = shufflevector <vscale x 4 x i32*> [[BROADCAST_SPLATINSERT5]], <vscale x 4 x i32*> poison, <vscale x 4 x i32> zeroinitializer			; CHECK-NEXT: [[BROADCAST_SPLAT6:%.*]] = shufflevector <vscale x 4 x i32*> [[BROADCAST_SPLATINSERT5]], <vscale x 4 x i32*> poison, <vscale x 4 x i32> zeroinitializer
	; CHECK-NEXT: br label %vector.body			; CHECK-NEXT: br label %vector.body
	; CHECK: vector.body:			; CHECK: vector.body:
	; CHECK-NEXT: [[INDEX1:%.*]] = phi i64 [ 0, %vector.ph ], [ [[INDEX_NEXT2:%.*]], %vector.body ]			; CHECK-NEXT: [[INDEX1:%.*]] = phi i64 [ 0, %vector.ph ], [ [[INDEX_NEXT2:%.*]], %vector.body ]
	; CHECK-NEXT: [[BROADCAST_SPLATINSERT3:%.*]] = insertelement <vscale x 4 x i64> poison, i64 [[INDEX1]], i32 0
	; CHECK-NEXT: [[BROADCAST_SPLAT4:%.*]] = shufflevector <vscale x 4 x i64> [[BROADCAST_SPLATINSERT3]], <vscale x 4 x i64> poison, <vscale x 4 x i32> zeroinitializer
	; CHECK-NEXT: [[TMP5:%.*]] = call <vscale x 4 x i64> @llvm.experimental.stepvector.nxv4i64()
	; CHECK-NEXT: [[TMP6:%.*]] = add <vscale x 4 x i64> [[TMP5]], zeroinitializer
	; CHECK-NEXT: [[TMP7:%.*]] = mul <vscale x 4 x i64> [[TMP6]], shufflevector (<vscale x 4 x i64> insertelement (<vscale x 4 x i64> poison, i64 1, i32 0), <vscale x 4 x i64> poison, <vscale x 4 x i32> zeroinitializer)
	; CHECK-NEXT: [[INDUCTION:%.*]] = add <vscale x 4 x i64> [[BROADCAST_SPLAT4]], [[TMP7]]
	; CHECK-NEXT: [[TMP8:%.*]] = add i64 [[INDEX1]], 0			; CHECK-NEXT: [[TMP8:%.*]] = add i64 [[INDEX1]], 0
	; CHECK-NEXT: [[ACTIVE_LANE_MASK:%.*]] = call <vscale x 4 x i1> @llvm.get.active.lane.mask.nxv4i1.i64(i64 [[TMP8]], i64 %n)			; CHECK-NEXT: [[ACTIVE_LANE_MASK:%.*]] = call <vscale x 4 x i1> @llvm.get.active.lane.mask.nxv4i1.i64(i64 [[TMP8]], i64 %n)
	; CHECK-NEXT: [[TMP10:%.*]] = getelementptr inbounds i32, i32* [[COND:%.*]], i64 [[TMP8]]			; CHECK-NEXT: [[TMP10:%.*]] = getelementptr inbounds i32, i32* [[COND:%.*]], i64 [[TMP8]]
	; CHECK-NEXT: [[TMP11:%.*]] = getelementptr inbounds i32, i32* [[TMP10]], i32 0			; CHECK-NEXT: [[TMP11:%.*]] = getelementptr inbounds i32, i32* [[TMP10]], i32 0
	; CHECK-NEXT: [[TMP12:%.*]] = bitcast i32* [[TMP11]] to <vscale x 4 x i32>*			; CHECK-NEXT: [[TMP12:%.*]] = bitcast i32* [[TMP11]] to <vscale x 4 x i32>*
	; CHECK-NEXT: [[WIDE_MASKED_LOAD:%.*]] = call <vscale x 4 x i32> @llvm.masked.load.nxv4i32.p0nxv4i32(<vscale x 4 x i32>* [[TMP12]], i32 4, <vscale x 4 x i1> [[ACTIVE_LANE_MASK]], <vscale x 4 x i32> poison)			; CHECK-NEXT: [[WIDE_MASKED_LOAD:%.*]] = call <vscale x 4 x i32> @llvm.masked.load.nxv4i32.p0nxv4i32(<vscale x 4 x i32>* [[TMP12]], i32 4, <vscale x 4 x i1> [[ACTIVE_LANE_MASK]], <vscale x 4 x i32> poison)
	; CHECK-NEXT: [[TMP13:%.*]] = icmp eq <vscale x 4 x i32> [[WIDE_MASKED_LOAD]], zeroinitializer			; CHECK-NEXT: [[TMP13:%.*]] = icmp eq <vscale x 4 x i32> [[WIDE_MASKED_LOAD]], zeroinitializer
	; CHECK-NEXT: [[TMP14:%.*]] = xor <vscale x 4 x i1> [[TMP13]], shufflevector (<vscale x 4 x i1> insertelement (<vscale x 4 x i1> poison, i1 true, i32 0), <vscale x 4 x i1> poison, <vscale x 4 x i32> zeroinitializer)			; CHECK-NEXT: [[TMP14:%.*]] = xor <vscale x 4 x i1> [[TMP13]], shufflevector (<vscale x 4 x i1> insertelement (<vscale x 4 x i1> poison, i1 true, i32 0), <vscale x 4 x i1> poison, <vscale x 4 x i32> zeroinitializer)
	[... 57 lines not shown ...]
	; CHECK-NEXT: [[N_RND_UP:%.*]] = add i64 [[N:%.*]], [[TMP4]]			; CHECK-NEXT: [[N_RND_UP:%.*]] = add i64 [[N:%.*]], [[TMP4]]
	; CHECK-NEXT: [[N_MOD_VF:%.*]] = urem i64 [[N_RND_UP]], [[TMP1]]			; CHECK-NEXT: [[N_MOD_VF:%.*]] = urem i64 [[N_RND_UP]], [[TMP1]]
	; CHECK-NEXT: [[N_VEC:%.*]] = sub i64 [[N_RND_UP]], [[N_MOD_VF]]			; CHECK-NEXT: [[N_VEC:%.*]] = sub i64 [[N_RND_UP]], [[N_MOD_VF]]
	; CHECK-NEXT: [[BROADCAST_SPLATINSERT3:%.*]] = insertelement <vscale x 4 x i32*> poison, i32* [[DST:%.*]], i32 0			; CHECK-NEXT: [[BROADCAST_SPLATINSERT3:%.*]] = insertelement <vscale x 4 x i32*> poison, i32* [[DST:%.*]], i32 0
	; CHECK-NEXT: [[BROADCAST_SPLAT4:%.*]] = shufflevector <vscale x 4 x i32*> [[BROADCAST_SPLATINSERT3]], <vscale x 4 x i32*> poison, <vscale x 4 x i32> zeroinitializer			; CHECK-NEXT: [[BROADCAST_SPLAT4:%.*]] = shufflevector <vscale x 4 x i32*> [[BROADCAST_SPLATINSERT3]], <vscale x 4 x i32*> poison, <vscale x 4 x i32> zeroinitializer
	; CHECK-NEXT: br label %vector.body			; CHECK-NEXT: br label %vector.body
	; CHECK: vector.body:			; CHECK: vector.body:
	; CHECK-NEXT: [[INDEX:%.*]] = phi i64 [ 0, %vector.ph ], [ [[INDEX_NEXT:%.*]], %vector.body ]			; CHECK-NEXT: [[INDEX:%.*]] = phi i64 [ 0, %vector.ph ], [ [[INDEX_NEXT:%.*]], %vector.body ]
	; CHECK-NEXT: [[BROADCAST_SPLATINSERT1:%.*]] = insertelement <vscale x 4 x i64> poison, i64 [[INDEX]], i32 0
	; CHECK-NEXT: [[BROADCAST_SPLAT2:%.*]] = shufflevector <vscale x 4 x i64> [[BROADCAST_SPLATINSERT1]], <vscale x 4 x i64> poison, <vscale x 4 x i32> zeroinitializer
	; CHECK-NEXT: [[TMP5:%.*]] = call <vscale x 4 x i64> @llvm.experimental.stepvector.nxv4i64()
	; CHECK-NEXT: [[TMP6:%.*]] = add <vscale x 4 x i64> [[TMP5]], zeroinitializer
	; CHECK-NEXT: [[TMP7:%.*]] = mul <vscale x 4 x i64> [[TMP6]], shufflevector (<vscale x 4 x i64> insertelement (<vscale x 4 x i64> poison, i64 1, i32 0), <vscale x 4 x i64> poison, <vscale x 4 x i32> zeroinitializer)
	; CHECK-NEXT: [[INDUCTION:%.*]] = add <vscale x 4 x i64> [[BROADCAST_SPLAT2]], [[TMP7]]
	; CHECK-NEXT: [[TMP8:%.*]] = add i64 [[INDEX]], 0			; CHECK-NEXT: [[TMP8:%.*]] = add i64 [[INDEX]], 0
	; CHECK-NEXT: [[ACTIVE_LANE_MASK:%.*]] = call <vscale x 4 x i1> @llvm.get.active.lane.mask.nxv4i1.i64(i64 [[TMP8]], i64 %n)			; CHECK-NEXT: [[ACTIVE_LANE_MASK:%.*]] = call <vscale x 4 x i1> @llvm.get.active.lane.mask.nxv4i1.i64(i64 [[TMP8]], i64 %n)
	; CHECK-NEXT: [[TMP10:%.*]] = getelementptr inbounds i32, i32* [[SRC:%.*]], i64 [[TMP8]]			; CHECK-NEXT: [[TMP10:%.*]] = getelementptr inbounds i32, i32* [[SRC:%.*]], i64 [[TMP8]]
	; CHECK-NEXT: [[TMP11:%.*]] = getelementptr inbounds i32, i32* [[TMP10]], i32 0			; CHECK-NEXT: [[TMP11:%.*]] = getelementptr inbounds i32, i32* [[TMP10]], i32 0
	; CHECK-NEXT: [[TMP12:%.*]] = bitcast i32* [[TMP11]] to <vscale x 4 x i32>*			; CHECK-NEXT: [[TMP12:%.*]] = bitcast i32* [[TMP11]] to <vscale x 4 x i32>*
	; CHECK-NEXT: [[WIDE_MASKED_LOAD:%.*]] = call <vscale x 4 x i32> @llvm.masked.load.nxv4i32.p0nxv4i32(<vscale x 4 x i32>* [[TMP12]], i32 4, <vscale x 4 x i1> [[ACTIVE_LANE_MASK]], <vscale x 4 x i32> poison)			; CHECK-NEXT: [[WIDE_MASKED_LOAD:%.*]] = call <vscale x 4 x i32> @llvm.masked.load.nxv4i32.p0nxv4i32(<vscale x 4 x i32>* [[TMP12]], i32 4, <vscale x 4 x i1> [[ACTIVE_LANE_MASK]], <vscale x 4 x i32> poison)
	; CHECK-NEXT: call void @llvm.masked.scatter.nxv4i32.nxv4p0i32(<vscale x 4 x i32> [[WIDE_MASKED_LOAD]], <vscale x 4 x i32*> [[BROADCAST_SPLAT4]], i32 4, <vscale x 4 x i1> [[ACTIVE_LANE_MASK]])			; CHECK-NEXT: call void @llvm.masked.scatter.nxv4i32.nxv4p0i32(<vscale x 4 x i32> [[WIDE_MASKED_LOAD]], <vscale x 4 x i32*> [[BROADCAST_SPLAT4]], i32 4, <vscale x 4 x i1> [[ACTIVE_LANE_MASK]])
	; CHECK-NEXT: [[TMP13:%.*]] = call i64 @llvm.vscale.i64()			; CHECK-NEXT: [[TMP13:%.*]] = call i64 @llvm.vscale.i64()
	[... 34 lines not shown ...]
	; CHECK-NEXT: [[TMP3:%.*]] = mul i64 [[TMP2]], 4			; CHECK-NEXT: [[TMP3:%.*]] = mul i64 [[TMP2]], 4
	; CHECK-NEXT: [[TMP4:%.*]] = sub i64 [[TMP3]], 1			; CHECK-NEXT: [[TMP4:%.*]] = sub i64 [[TMP3]], 1
	; CHECK-NEXT: [[N_RND_UP:%.*]] = add i64 [[UMAX]], [[TMP4]]			; CHECK-NEXT: [[N_RND_UP:%.*]] = add i64 [[UMAX]], [[TMP4]]
	; CHECK-NEXT: [[N_MOD_VF:%.*]] = urem i64 [[N_RND_UP]], [[TMP1]]			; CHECK-NEXT: [[N_MOD_VF:%.*]] = urem i64 [[N_RND_UP]], [[TMP1]]
	; CHECK-NEXT: [[N_VEC:%.*]] = sub i64 [[N_RND_UP]], [[N_MOD_VF]]			; CHECK-NEXT: [[N_VEC:%.*]] = sub i64 [[N_RND_UP]], [[N_MOD_VF]]
	; CHECK-NEXT: br label %vector.body			; CHECK-NEXT: br label %vector.body
	; CHECK: vector.body:			; CHECK: vector.body:
	; CHECK-NEXT: [[INDEX1:%.*]] = phi i64 [ 0, %vector.ph ], [ [[INDEX_NEXT2:%.*]], %vector.body ]			; CHECK-NEXT: [[INDEX1:%.*]] = phi i64 [ 0, %vector.ph ], [ [[INDEX_NEXT2:%.*]], %vector.body ]
	; CHECK-NEXT: [[BROADCAST_SPLATINSERT3:%.*]] = insertelement <vscale x 4 x i64> poison, i64 [[INDEX1]], i32 0
	; CHECK-NEXT: [[BROADCAST_SPLAT4:%.*]] = shufflevector <vscale x 4 x i64> [[BROADCAST_SPLATINSERT3]], <vscale x 4 x i64> poison, <vscale x 4 x i32> zeroinitializer
	; CHECK-NEXT: [[TMP5:%.*]] = call <vscale x 4 x i64> @llvm.experimental.stepvector.nxv4i64()
	; CHECK-NEXT: [[TMP6:%.*]] = add <vscale x 4 x i64> [[TMP5]], zeroinitializer
	; CHECK-NEXT: [[TMP7:%.*]] = mul <vscale x 4 x i64> [[TMP6]], shufflevector (<vscale x 4 x i64> insertelement (<vscale x 4 x i64> poison, i64 1, i32 0), <vscale x 4 x i64> poison, <vscale x 4 x i32> zeroinitializer)
	; CHECK-NEXT: [[INDUCTION:%.*]] = add <vscale x 4 x i64> [[BROADCAST_SPLAT4]], [[TMP7]]
	; CHECK-NEXT: [[TMP8:%.*]] = add i64 [[INDEX1]], 0			; CHECK-NEXT: [[TMP8:%.*]] = add i64 [[INDEX1]], 0
	; CHECK-NEXT: [[ACTIVE_LANE_MASK:%.*]] = call <vscale x 4 x i1> @llvm.get.active.lane.mask.nxv4i1.i64(i64 [[TMP8]], i64 [[UMAX]])			; CHECK-NEXT: [[ACTIVE_LANE_MASK:%.*]] = call <vscale x 4 x i1> @llvm.get.active.lane.mask.nxv4i1.i64(i64 [[TMP8]], i64 [[UMAX]])
	; CHECK-NEXT: [[TMP10:%.*]] = getelementptr float, float* [[SRC:%.*]], i64 [[TMP8]]			; CHECK-NEXT: [[TMP10:%.*]] = getelementptr float, float* [[SRC:%.*]], i64 [[TMP8]]
	; CHECK-NEXT: [[TMP11:%.*]] = getelementptr float, float* [[DST:%.*]], i64 [[TMP8]]			; CHECK-NEXT: [[TMP11:%.*]] = getelementptr float, float* [[DST:%.*]], i64 [[TMP8]]
	; CHECK-NEXT: [[TMP12:%.*]] = getelementptr float, float* [[TMP10]], i32 0			; CHECK-NEXT: [[TMP12:%.*]] = getelementptr float, float* [[TMP10]], i32 0
	; CHECK-NEXT: [[TMP13:%.*]] = bitcast float* [[TMP12]] to <vscale x 4 x float>*			; CHECK-NEXT: [[TMP13:%.*]] = bitcast float* [[TMP12]] to <vscale x 4 x float>*
	; CHECK-NEXT: [[WIDE_MASKED_LOAD:%.*]] = call <vscale x 4 x float> @llvm.masked.load.nxv4f32.p0nxv4f32(<vscale x 4 x float>* [[TMP13]], i32 4, <vscale x 4 x i1> [[ACTIVE_LANE_MASK]], <vscale x 4 x float> poison)			; CHECK-NEXT: [[WIDE_MASKED_LOAD:%.*]] = call <vscale x 4 x float> @llvm.masked.load.nxv4f32.p0nxv4f32(<vscale x 4 x float>* [[TMP13]], i32 4, <vscale x 4 x i1> [[ACTIVE_LANE_MASK]], <vscale x 4 x float> poison)
	; CHECK-NEXT: [[TMP14:%.*]] = getelementptr float, float* [[TMP11]], i32 0			; CHECK-NEXT: [[TMP14:%.*]] = getelementptr float, float* [[TMP11]], i32 0
	[... 43 lines not shown ...]
	; CHECK-NEXT: [[TMP4:%.*]] = sub i64 [[TMP3]], 1			; CHECK-NEXT: [[TMP4:%.*]] = sub i64 [[TMP3]], 1
	; CHECK-NEXT: [[N_RND_UP:%.*]] = add i64 [[UMAX]], [[TMP4]]			; CHECK-NEXT: [[N_RND_UP:%.*]] = add i64 [[UMAX]], [[TMP4]]
	; CHECK-NEXT: [[N_MOD_VF:%.*]] = urem i64 [[N_RND_UP]], [[TMP1]]			; CHECK-NEXT: [[N_MOD_VF:%.*]] = urem i64 [[N_RND_UP]], [[TMP1]]
	; CHECK-NEXT: [[N_VEC:%.*]] = sub i64 [[N_RND_UP]], [[N_MOD_VF]]			; CHECK-NEXT: [[N_VEC:%.*]] = sub i64 [[N_RND_UP]], [[N_MOD_VF]]
	; CHECK-NEXT: br label %vector.body			; CHECK-NEXT: br label %vector.body
	; CHECK: vector.body:			; CHECK: vector.body:
	; CHECK-NEXT: [[INDEX1:%.*]] = phi i64 [ 0, %vector.ph ], [ [[INDEX_NEXT2:%.*]], %vector.body ]			; CHECK-NEXT: [[INDEX1:%.*]] = phi i64 [ 0, %vector.ph ], [ [[INDEX_NEXT2:%.*]], %vector.body ]
	; CHECK-NEXT: [[VEC_PHI:%.*]] = phi i32 [ 0, %vector.ph ], [ [[TMP14:%.*]], %vector.body ]			; CHECK-NEXT: [[VEC_PHI:%.*]] = phi i32 [ 0, %vector.ph ], [ [[TMP14:%.*]], %vector.body ]
	; CHECK-NEXT: [[BROADCAST_SPLATINSERT:%.*]] = insertelement <vscale x 4 x i64> poison, i64 [[INDEX1]], i32 0
	; CHECK-NEXT: [[BROADCAST_SPLAT:%.*]] = shufflevector <vscale x 4 x i64> [[BROADCAST_SPLATINSERT]], <vscale x 4 x i64> poison, <vscale x 4 x i32> zeroinitializer
	; CHECK-NEXT: [[TMP5:%.*]] = call <vscale x 4 x i64> @llvm.experimental.stepvector.nxv4i64()
	; CHECK-NEXT: [[TMP6:%.*]] = add <vscale x 4 x i64> [[TMP5]], zeroinitializer
	; CHECK-NEXT: [[TMP7:%.*]] = mul <vscale x 4 x i64> [[TMP6]], shufflevector (<vscale x 4 x i64> insertelement (<vscale x 4 x i64> poison, i64 1, i32 0), <vscale x 4 x i64> poison, <vscale x 4 x i32> zeroinitializer)
	; CHECK-NEXT: [[INDUCTION:%.*]] = add <vscale x 4 x i64> [[BROADCAST_SPLAT]], [[TMP7]]
	; CHECK-NEXT: [[TMP8:%.*]] = add i64 [[INDEX1]], 0			; CHECK-NEXT: [[TMP8:%.*]] = add i64 [[INDEX1]], 0
	; CHECK-NEXT: [[ACTIVE_LANE_MASK:%.*]] = call <vscale x 4 x i1> @llvm.get.active.lane.mask.nxv4i1.i64(i64 [[TMP8]], i64 [[UMAX]])			; CHECK-NEXT: [[ACTIVE_LANE_MASK:%.*]] = call <vscale x 4 x i1> @llvm.get.active.lane.mask.nxv4i1.i64(i64 [[TMP8]], i64 [[UMAX]])
	; CHECK-NEXT: [[TMP9:%.*]] = getelementptr i32, i32* [[PTR:%.*]], i64 [[TMP8]]			; CHECK-NEXT: [[TMP9:%.*]] = getelementptr i32, i32* [[PTR:%.*]], i64 [[TMP8]]
	; CHECK-NEXT: [[TMP10:%.*]] = getelementptr i32, i32* [[TMP9]], i32 0			; CHECK-NEXT: [[TMP10:%.*]] = getelementptr i32, i32* [[TMP9]], i32 0
	; CHECK-NEXT: [[TMP11:%.*]] = bitcast i32* [[TMP10]] to <vscale x 4 x i32>*			; CHECK-NEXT: [[TMP11:%.*]] = bitcast i32* [[TMP10]] to <vscale x 4 x i32>*
	; CHECK-NEXT: [[WIDE_MASKED_LOAD:%.*]] = call <vscale x 4 x i32> @llvm.masked.load.nxv4i32.p0nxv4i32(<vscale x 4 x i32>* [[TMP11]], i32 4, <vscale x 4 x i1> [[ACTIVE_LANE_MASK]], <vscale x 4 x i32> poison)			; CHECK-NEXT: [[WIDE_MASKED_LOAD:%.*]] = call <vscale x 4 x i32> @llvm.masked.load.nxv4i32.p0nxv4i32(<vscale x 4 x i32>* [[TMP11]], i32 4, <vscale x 4 x i1> [[ACTIVE_LANE_MASK]], <vscale x 4 x i32> poison)
	; CHECK-NEXT: [[TMP12:%.*]] = select <vscale x 4 x i1> [[ACTIVE_LANE_MASK]], <vscale x 4 x i32> [[WIDE_MASKED_LOAD]], <vscale x 4 x i32> zeroinitializer			; CHECK-NEXT: [[TMP12:%.*]] = select <vscale x 4 x i1> [[ACTIVE_LANE_MASK]], <vscale x 4 x i32> [[WIDE_MASKED_LOAD]], <vscale x 4 x i32> zeroinitializer
	; CHECK-NEXT: [[TMP13:%.*]] = call i32 @llvm.vector.reduce.add.nxv4i32(<vscale x 4 x i32> [[TMP12]])			; CHECK-NEXT: [[TMP13:%.*]] = call i32 @llvm.vector.reduce.add.nxv4i32(<vscale x 4 x i32> [[TMP12]])
	[... 36 lines not shown ...]
	; CHECK-NEXT: [[TMP4:%.*]] = sub i64 [[TMP3]], 1			; CHECK-NEXT: [[TMP4:%.*]] = sub i64 [[TMP3]], 1
	; CHECK-NEXT: [[N_RND_UP:%.*]] = add i64 [[UMAX]], [[TMP4]]			; CHECK-NEXT: [[N_RND_UP:%.*]] = add i64 [[UMAX]], [[TMP4]]
	; CHECK-NEXT: [[N_MOD_VF:%.*]] = urem i64 [[N_RND_UP]], [[TMP1]]			; CHECK-NEXT: [[N_MOD_VF:%.*]] = urem i64 [[N_RND_UP]], [[TMP1]]
	; CHECK-NEXT: [[N_VEC:%.*]] = sub i64 [[N_RND_UP]], [[N_MOD_VF]]			; CHECK-NEXT: [[N_VEC:%.*]] = sub i64 [[N_RND_UP]], [[N_MOD_VF]]
	; CHECK-NEXT: br label %vector.body			; CHECK-NEXT: br label %vector.body
	; CHECK: vector.body:			; CHECK: vector.body:
	; CHECK-NEXT: [[INDEX1:%.*]] = phi i64 [ 0, %vector.ph ], [ [[INDEX_NEXT2:%.*]], %vector.body ]			; CHECK-NEXT: [[INDEX1:%.*]] = phi i64 [ 0, %vector.ph ], [ [[INDEX_NEXT2:%.*]], %vector.body ]
	; CHECK-NEXT: [[VEC_PHI:%.*]] = phi float [ 0.000000e+00, %vector.ph ], [ [[TMP14:%.*]], %vector.body ]			; CHECK-NEXT: [[VEC_PHI:%.*]] = phi float [ 0.000000e+00, %vector.ph ], [ [[TMP14:%.*]], %vector.body ]
	; CHECK-NEXT: [[BROADCAST_SPLATINSERT3:%.*]] = insertelement <vscale x 4 x i64> poison, i64 [[INDEX1]], i32 0
	; CHECK-NEXT: [[BROADCAST_SPLAT4:%.*]] = shufflevector <vscale x 4 x i64> [[BROADCAST_SPLATINSERT3]], <vscale x 4 x i64> poison, <vscale x 4 x i32> zeroinitializer
	; CHECK-NEXT: [[TMP5:%.*]] = call <vscale x 4 x i64> @llvm.experimental.stepvector.nxv4i64()
	; CHECK-NEXT: [[TMP6:%.*]] = add <vscale x 4 x i64> [[TMP5]], zeroinitializer
	; CHECK-NEXT: [[TMP7:%.*]] = mul <vscale x 4 x i64> [[TMP6]], shufflevector (<vscale x 4 x i64> insertelement (<vscale x 4 x i64> poison, i64 1, i32 0), <vscale x 4 x i64> poison, <vscale x 4 x i32> zeroinitializer)
	; CHECK-NEXT: [[INDUCTION:%.*]] = add <vscale x 4 x i64> [[BROADCAST_SPLAT4]], [[TMP7]]
	; CHECK-NEXT: [[TMP8:%.*]] = add i64 [[INDEX1]], 0			; CHECK-NEXT: [[TMP8:%.*]] = add i64 [[INDEX1]], 0
	; CHECK-NEXT: [[ACTIVE_LANE_MASK:%.*]] = call <vscale x 4 x i1> @llvm.get.active.lane.mask.nxv4i1.i64(i64 [[TMP8]], i64 [[UMAX]])			; CHECK-NEXT: [[ACTIVE_LANE_MASK:%.*]] = call <vscale x 4 x i1> @llvm.get.active.lane.mask.nxv4i1.i64(i64 [[TMP8]], i64 [[UMAX]])
	; CHECK-NEXT: [[TMP10:%.*]] = getelementptr float, float* [[PTR:%.*]], i64 [[TMP8]]			; CHECK-NEXT: [[TMP10:%.*]] = getelementptr float, float* [[PTR:%.*]], i64 [[TMP8]]
	; CHECK-NEXT: [[TMP11:%.*]] = getelementptr float, float* [[TMP10]], i32 0			; CHECK-NEXT: [[TMP11:%.*]] = getelementptr float, float* [[TMP10]], i32 0
	; CHECK-NEXT: [[TMP12:%.*]] = bitcast float* [[TMP11]] to <vscale x 4 x float>*			; CHECK-NEXT: [[TMP12:%.*]] = bitcast float* [[TMP11]] to <vscale x 4 x float>*
	; CHECK-NEXT: [[WIDE_MASKED_LOAD:%.*]] = call <vscale x 4 x float> @llvm.masked.load.nxv4f32.p0nxv4f32(<vscale x 4 x float>* [[TMP12]], i32 4, <vscale x 4 x i1> [[ACTIVE_LANE_MASK]], <vscale x 4 x float> poison)			; CHECK-NEXT: [[WIDE_MASKED_LOAD:%.*]] = call <vscale x 4 x float> @llvm.masked.load.nxv4f32.p0nxv4f32(<vscale x 4 x float>* [[TMP12]], i32 4, <vscale x 4 x i1> [[ACTIVE_LANE_MASK]], <vscale x 4 x float> poison)
	; CHECK-NEXT: [[TMP13:%.*]] = select <vscale x 4 x i1> [[ACTIVE_LANE_MASK]], <vscale x 4 x float> [[WIDE_MASKED_LOAD]], <vscale x 4 x float> shufflevector (<vscale x 4 x float> insertelement (<vscale x 4 x float> poison, float -0.000000e+00, i32 0), <vscale x 4 x float> poison, <vscale x 4 x i32> zeroinitializer)			; CHECK-NEXT: [[TMP13:%.*]] = select <vscale x 4 x i1> [[ACTIVE_LANE_MASK]], <vscale x 4 x float> [[WIDE_MASKED_LOAD]], <vscale x 4 x float> shufflevector (<vscale x 4 x float> insertelement (<vscale x 4 x float> poison, float -0.000000e+00, i32 0), <vscale x 4 x float> poison, <vscale x 4 x i32> zeroinitializer)
	; CHECK-NEXT: [[TMP14]] = call float @llvm.vector.reduce.fadd.nxv4f32(float [[VEC_PHI]], <vscale x 4 x float> [[TMP13]])			; CHECK-NEXT: [[TMP14]] = call float @llvm.vector.reduce.fadd.nxv4f32(float [[VEC_PHI]], <vscale x 4 x float> [[TMP13]])
	[... 34 lines not shown ...]
	; CHECK-NEXT: [[TMP4:%.*]] = sub i64 [[TMP3]], 1			; CHECK-NEXT: [[TMP4:%.*]] = sub i64 [[TMP3]], 1
	; CHECK-NEXT: [[N_RND_UP:%.*]] = add i64 [[N:%.*]], [[TMP4]]			; CHECK-NEXT: [[N_RND_UP:%.*]] = add i64 [[N:%.*]], [[TMP4]]
	; CHECK-NEXT: [[N_MOD_VF:%.*]] = urem i64 [[N_RND_UP]], [[TMP1]]			; CHECK-NEXT: [[N_MOD_VF:%.*]] = urem i64 [[N_RND_UP]], [[TMP1]]
	; CHECK-NEXT: [[N_VEC:%.*]] = sub i64 [[N_RND_UP]], [[N_MOD_VF]]			; CHECK-NEXT: [[N_VEC:%.*]] = sub i64 [[N_RND_UP]], [[N_MOD_VF]]
	; CHECK-NEXT: br label %vector.body			; CHECK-NEXT: br label %vector.body
	; CHECK: vector.body:			; CHECK: vector.body:
	; CHECK-NEXT: [[INDEX:%.*]] = phi i64 [ 0, %vector.ph ], [ [[INDEX_NEXT:%.*]], %vector.body ]			; CHECK-NEXT: [[INDEX:%.*]] = phi i64 [ 0, %vector.ph ], [ [[INDEX_NEXT:%.*]], %vector.body ]
	; CHECK-NEXT: [[VEC_PHI:%.*]] = phi <vscale x 4 x i32> [ insertelement (<vscale x 4 x i32> zeroinitializer, i32 7, i32 0), %vector.ph ], [ [[PREDPHI:%.*]], %vector.body ]			; CHECK-NEXT: [[VEC_PHI:%.*]] = phi <vscale x 4 x i32> [ insertelement (<vscale x 4 x i32> zeroinitializer, i32 7, i32 0), %vector.ph ], [ [[PREDPHI:%.*]], %vector.body ]
	; CHECK-NEXT: [[BROADCAST_SPLATINSERT:%.*]] = insertelement <vscale x 4 x i64> poison, i64 [[INDEX]], i32 0
	; CHECK-NEXT: [[BROADCAST_SPLAT:%.*]] = shufflevector <vscale x 4 x i64> [[BROADCAST_SPLATINSERT]], <vscale x 4 x i64> poison, <vscale x 4 x i32> zeroinitializer
	; CHECK-NEXT: [[TMP5:%.*]] = call <vscale x 4 x i64> @llvm.experimental.stepvector.nxv4i64()
	; CHECK-NEXT: [[TMP6:%.*]] = add <vscale x 4 x i64> [[TMP5]], zeroinitializer
	; CHECK-NEXT: [[TMP7:%.*]] = mul <vscale x 4 x i64> [[TMP6]], shufflevector (<vscale x 4 x i64> insertelement (<vscale x 4 x i64> poison, i64 1, i32 0), <vscale x 4 x i64> poison, <vscale x 4 x i32> zeroinitializer)
	; CHECK-NEXT: [[INDUCTION:%.*]] = add <vscale x 4 x i64> [[BROADCAST_SPLAT]], [[TMP7]]
	; CHECK-NEXT: [[TMP8:%.*]] = add i64 [[INDEX]], 0			; CHECK-NEXT: [[TMP8:%.*]] = add i64 [[INDEX]], 0
	; CHECK-NEXT: [[ACTIVE_LANE_MASK:%.*]] = call <vscale x 4 x i1> @llvm.get.active.lane.mask.nxv4i1.i64(i64 [[TMP8]], i64 [[N]])			; CHECK-NEXT: [[ACTIVE_LANE_MASK:%.*]] = call <vscale x 4 x i1> @llvm.get.active.lane.mask.nxv4i1.i64(i64 [[TMP8]], i64 [[N]])
	; CHECK-NEXT: [[TMP9:%.*]] = getelementptr inbounds i32, i32* [[COND:%.*]], i64 [[TMP8]]			; CHECK-NEXT: [[TMP9:%.*]] = getelementptr inbounds i32, i32* [[COND:%.*]], i64 [[TMP8]]
	; CHECK-NEXT: [[TMP10:%.*]] = getelementptr inbounds i32, i32* [[TMP9]], i32 0			; CHECK-NEXT: [[TMP10:%.*]] = getelementptr inbounds i32, i32* [[TMP9]], i32 0
	; CHECK-NEXT: [[TMP11:%.*]] = bitcast i32* [[TMP10]] to <vscale x 4 x i32>*			; CHECK-NEXT: [[TMP11:%.*]] = bitcast i32* [[TMP10]] to <vscale x 4 x i32>*
	; CHECK-NEXT: [[WIDE_MASKED_LOAD:%.*]] = call <vscale x 4 x i32> @llvm.masked.load.nxv4i32.p0nxv4i32(<vscale x 4 x i32>* [[TMP11]], i32 4, <vscale x 4 x i1> [[ACTIVE_LANE_MASK]], <vscale x 4 x i32> poison)			; CHECK-NEXT: [[WIDE_MASKED_LOAD:%.*]] = call <vscale x 4 x i32> @llvm.masked.load.nxv4i32.p0nxv4i32(<vscale x 4 x i32>* [[TMP11]], i32 4, <vscale x 4 x i1> [[ACTIVE_LANE_MASK]], <vscale x 4 x i32> poison)
	; CHECK-NEXT: [[TMP12:%.*]] = icmp eq <vscale x 4 x i32> [[WIDE_MASKED_LOAD]], shufflevector (<vscale x 4 x i32> insertelement (<vscale x 4 x i32> poison, i32 5, i32 0), <vscale x 4 x i32> poison, <vscale x 4 x i32> zeroinitializer)			; CHECK-NEXT: [[TMP12:%.*]] = icmp eq <vscale x 4 x i32> [[WIDE_MASKED_LOAD]], shufflevector (<vscale x 4 x i32> insertelement (<vscale x 4 x i32> poison, i32 5, i32 0), <vscale x 4 x i32> poison, <vscale x 4 x i32> zeroinitializer)
	; CHECK-NEXT: [[TMP13:%.*]] = getelementptr i32, i32* [[A:%.*]], i64 [[TMP8]]			; CHECK-NEXT: [[TMP13:%.*]] = getelementptr i32, i32* [[A:%.*]], i64 [[TMP8]]
	[... 77 lines not shown ...]

llvm/test/Transforms/LoopVectorize/AArch64/tail-fold-uniform-memops.ll

	; RUN: opt -loop-vectorize -scalable-vectorization=off -force-vector-width=4 -prefer-predicate-over-epilogue=predicate-dont-vectorize -S < %s | FileCheck %s			; RUN: opt -loop-vectorize -scalable-vectorization=off -force-vector-width=4 -prefer-predicate-over-epilogue=predicate-dont-vectorize -S < %s | FileCheck %s

	; NOTE: These tests aren't really target-specific, but it's convenient to target AArch64			; NOTE: These tests aren't really target-specific, but it's convenient to target AArch64
	; so that TTI.isLegalMaskedLoad can return true.			; so that TTI.isLegalMaskedLoad can return true.

	target triple = "aarch64-linux-gnu"			target triple = "aarch64-linux-gnu"

	; The original loop had an unconditional uniform load. Let's make sure			; The original loop had an unconditional uniform load. Let's make sure
	; we don't artificially create new predicated blocks for the load.			; we don't artificially create new predicated blocks for the load.
	define void @uniform_load(i32* noalias %dst, i32* noalias readonly %src, i64 %n) #0 {			define void @uniform_load(i32* noalias %dst, i32* noalias readonly %src, i64 %n) #0 {
	; CHECK-LABEL: @uniform_load(			; CHECK-LABEL: @uniform_load(
	; CHECK: vector.body:			; CHECK: vector.body:
	; CHECK-NEXT: [[IDX:%.*]] = phi i64 [ 0, %vector.ph ], [ [[IDX_NEXT:%.*]], %vector.body ]			; CHECK-NEXT: [[IDX:%.*]] = phi i64 [ 0, %vector.ph ], [ [[IDX_NEXT:%.*]], %vector.body ]
	; CHECK-NEXT: [[TMP1:%.*]] = insertelement <4 x i64> poison, i64 [[IDX]], i32 0
	; CHECK-NEXT: [[TMP2:%.*]] = shufflevector <4 x i64> [[TMP1]], <4 x i64> poison, <4 x i32> zeroinitializer
	; CHECK-NEXT: [[INDUCTION:%.*]] = add <4 x i64> [[TMP2]], <i64 0, i64 1, i64 2, i64 3>
	; CHECK-NEXT: [[TMP3:%.*]] = add i64 [[IDX]], 0			; CHECK-NEXT: [[TMP3:%.*]] = add i64 [[IDX]], 0
	; CHECK-NEXT: [[LOOP_PRED:%.*]] = call <4 x i1> @llvm.get.active.lane.mask.v4i1.i64(i64 [[TMP3]], i64 %n)			; CHECK-NEXT: [[LOOP_PRED:%.*]] = call <4 x i1> @llvm.get.active.lane.mask.v4i1.i64(i64 [[TMP3]], i64 %n)
	; CHECK-NEXT: [[LOAD_VAL:%.*]] = load i32, i32* %src, align 4			; CHECK-NEXT: [[LOAD_VAL:%.*]] = load i32, i32* %src, align 4
	; CHECK-NOT: load i32, i32* %src, align 4			; CHECK-NOT: load i32, i32* %src, align 4
	; CHECK-NEXT: [[TMP4:%.*]] = insertelement <4 x i32> poison, i32 [[LOAD_VAL]], i32 0			; CHECK-NEXT: [[TMP4:%.*]] = insertelement <4 x i32> poison, i32 [[LOAD_VAL]], i32 0
	; CHECK-NEXT: [[TMP5:%.*]] = shufflevector <4 x i32> [[TMP4]], <4 x i32> poison, <4 x i32> zeroinitializer			; CHECK-NEXT: [[TMP5:%.*]] = shufflevector <4 x i32> [[TMP4]], <4 x i32> poison, <4 x i32> zeroinitializer
	; CHECK-NEXT: [[TMP6:%.*]] = getelementptr inbounds i32, i32* %dst, i64 [[TMP3]]			; CHECK-NEXT: [[TMP6:%.*]] = getelementptr inbounds i32, i32* %dst, i64 [[TMP3]]
	; CHECK-NEXT: [[TMP7:%.*]] = getelementptr inbounds i32, i32* [[TMP6]], i32 0			; CHECK-NEXT: [[TMP7:%.*]] = getelementptr inbounds i32, i32* [[TMP6]], i32 0
	[... 25 lines not shown ...]
	; and the original condition.			; and the original condition.
	define void @cond_uniform_load(i32* nocapture %dst, i32* nocapture readonly %src, i32* nocapture readonly %cond, i64 %n) #0 {			define void @cond_uniform_load(i32* nocapture %dst, i32* nocapture readonly %src, i32* nocapture readonly %cond, i64 %n) #0 {
	; CHECK-LABEL: @cond_uniform_load(			; CHECK-LABEL: @cond_uniform_load(
	; CHECK: vector.ph:			; CHECK: vector.ph:
	; CHECK: [[TMP1:%.*]] = insertelement <4 x i32*> poison, i32* %src, i32 0			; CHECK: [[TMP1:%.*]] = insertelement <4 x i32*> poison, i32* %src, i32 0
	; CHECK-NEXT: [[SRC_SPLAT:%.*]] = shufflevector <4 x i32*> [[TMP1]], <4 x i32*> poison, <4 x i32> zeroinitializer			; CHECK-NEXT: [[SRC_SPLAT:%.*]] = shufflevector <4 x i32*> [[TMP1]], <4 x i32*> poison, <4 x i32> zeroinitializer
	; CHECK: vector.body:			; CHECK: vector.body:
	; CHECK-NEXT: [[IDX:%.*]] = phi i64 [ 0, %vector.ph ], [ [[IDX_NEXT:%.*]], %vector.body ]			; CHECK-NEXT: [[IDX:%.*]] = phi i64 [ 0, %vector.ph ], [ [[IDX_NEXT:%.*]], %vector.body ]
	; CHECK: [[TMP1:%.*]] = insertelement <4 x i64> poison, i64 [[IDX]], i32 0
	; CHECK-NEXT: [[TMP2:%.*]] = shufflevector <4 x i64> [[TMP1]], <4 x i64> poison, <4 x i32> zeroinitializer
	; CHECK-NEXT: [[INDUCTION:%.*]] = add <4 x i64> [[TMP2]], <i64 0, i64 1, i64 2, i64 3>
	; CHECK-NEXT: [[TMP3:%.*]] = add i64 [[IDX]], 0			; CHECK-NEXT: [[TMP3:%.*]] = add i64 [[IDX]], 0
	; CHECK-NEXT: [[LOOP_PRED:%.*]] = call <4 x i1> @llvm.get.active.lane.mask.v4i1.i64(i64 [[TMP3]], i64 %n)			; CHECK-NEXT: [[LOOP_PRED:%.*]] = call <4 x i1> @llvm.get.active.lane.mask.v4i1.i64(i64 [[TMP3]], i64 %n)
	; CHECK: [[COND_LOAD:%.*]] = call <4 x i32> @llvm.masked.load.v4i32.p0v4i32(<4 x i32>* {{%.*}}, i32 4, <4 x i1> [[LOOP_PRED]], <4 x i32> poison)			; CHECK: [[COND_LOAD:%.*]] = call <4 x i32> @llvm.masked.load.v4i32.p0v4i32(<4 x i32>* {{%.*}}, i32 4, <4 x i1> [[LOOP_PRED]], <4 x i32> poison)
	; CHECK-NEXT: [[TMP4:%.*]] = icmp eq <4 x i32> [[COND_LOAD]], zeroinitializer			; CHECK-NEXT: [[TMP4:%.*]] = icmp eq <4 x i32> [[COND_LOAD]], zeroinitializer
	; CHECK-NEXT: [[TMP5:%.*]] = xor <4 x i1> [[TMP4]], <i1 true, i1 true, i1 true, i1 true>			; CHECK-NEXT: [[TMP5:%.*]] = xor <4 x i1> [[TMP4]], <i1 true, i1 true, i1 true, i1 true>
	; CHECK-NEXT: [[MASK:%.*]] = select <4 x i1> [[LOOP_PRED]], <4 x i1> [[TMP5]], <4 x i1> zeroinitializer			; CHECK-NEXT: [[MASK:%.*]] = select <4 x i1> [[LOOP_PRED]], <4 x i1> [[TMP5]], <4 x i1> zeroinitializer
	; CHECK-NEXT: call <4 x i32> @llvm.masked.gather.v4i32.v4p0i32(<4 x i32*> [[SRC_SPLAT]], i32 4, <4 x i1> [[MASK]], <4 x i32> undef)			; CHECK-NEXT: call <4 x i32> @llvm.masked.gather.v4i32.v4p0i32(<4 x i32*> [[SRC_SPLAT]], i32 4, <4 x i1> [[MASK]], <4 x i32> undef)
	entry:			entry:
	[26 lines not shown]

llvm/test/Transforms/LoopVectorize/ARM/mve-gather-scatter-tailpred.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt -loop-vectorize -force-vector-width=4 -mtriple=thumbv8.1m.main-none-none-eabi -mattr=+mve.fp -tail-predication=force-enabled -S %s -o - | FileCheck %s			; RUN: opt -loop-vectorize -force-vector-width=4 -mtriple=thumbv8.1m.main-none-none-eabi -mattr=+mve.fp -tail-predication=force-enabled -S %s -o - | FileCheck %s

	target datalayout = "e-m:e-p:32:32-Fi8-i64:64-v128:64:128-a:0:32-n32-S64"			target datalayout = "e-m:e-p:32:32-Fi8-i64:64-v128:64:128-a:0:32-n32-S64"

	define void @test_stride1_4i32(i32* readonly %data, i32* noalias nocapture %dst, i32 %n) {			define void @test_stride1_4i32(i32* readonly %data, i32* noalias nocapture %dst, i32 %n) {
	; CHECK-LABEL: @test_stride1_4i32(			; CHECK-LABEL: @test_stride1_4i32(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: br i1 false, label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]			; CHECK-NEXT: br i1 false, label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
	; CHECK: vector.ph:			; CHECK: vector.ph:
	; CHECK-NEXT: [[N_RND_UP:%.*]] = add i32 [[N:%.*]], 3			; CHECK-NEXT: [[N_RND_UP:%.*]] = add i32 [[N:%.*]], 3
	; CHECK-NEXT: [[N_MOD_VF:%.*]] = urem i32 [[N_RND_UP]], 4			; CHECK-NEXT: [[N_MOD_VF:%.*]] = urem i32 [[N_RND_UP]], 4
	; CHECK-NEXT: [[N_VEC:%.*]] = sub i32 [[N_RND_UP]], [[N_MOD_VF]]			; CHECK-NEXT: [[N_VEC:%.*]] = sub i32 [[N_RND_UP]], [[N_MOD_VF]]
	; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]			; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]
	; CHECK: vector.body:			; CHECK: vector.body:
	; CHECK-NEXT: [[INDEX:%.*]] = phi i32 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]			; CHECK-NEXT: [[INDEX:%.*]] = phi i32 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
	; CHECK-NEXT: [[BROADCAST_SPLATINSERT:%.*]] = insertelement <4 x i32> poison, i32 [[INDEX]], i32 0
	; CHECK-NEXT: [[BROADCAST_SPLAT:%.*]] = shufflevector <4 x i32> [[BROADCAST_SPLATINSERT]], <4 x i32> poison, <4 x i32> zeroinitializer
	; CHECK-NEXT: [[INDUCTION:%.*]] = add <4 x i32> [[BROADCAST_SPLAT]], <i32 0, i32 1, i32 2, i32 3>
	; CHECK-NEXT: [[TMP0:%.*]] = add i32 [[INDEX]], 0			; CHECK-NEXT: [[TMP0:%.*]] = add i32 [[INDEX]], 0
	; CHECK-NEXT: [[ACTIVE_LANE_MASK:%.*]] = call <4 x i1> @llvm.get.active.lane.mask.v4i1.i32(i32 [[TMP0]], i32 [[N]])			; CHECK-NEXT: [[ACTIVE_LANE_MASK:%.*]] = call <4 x i1> @llvm.get.active.lane.mask.v4i1.i32(i32 [[TMP0]], i32 [[N]])
	; CHECK-NEXT: [[TMP1:%.*]] = mul nuw nsw i32 [[TMP0]], 1			; CHECK-NEXT: [[TMP1:%.*]] = mul nuw nsw i32 [[TMP0]], 1
	; CHECK-NEXT: [[TMP2:%.*]] = add nuw nsw i32 [[TMP1]], 2			; CHECK-NEXT: [[TMP2:%.*]] = add nuw nsw i32 [[TMP1]], 2
	; CHECK-NEXT: [[TMP3:%.*]] = getelementptr inbounds i32, i32* [[DATA:%.*]], i32 [[TMP2]]			; CHECK-NEXT: [[TMP3:%.*]] = getelementptr inbounds i32, i32* [[DATA:%.*]], i32 [[TMP2]]
	; CHECK-NEXT: [[TMP4:%.*]] = getelementptr inbounds i32, i32* [[TMP3]], i32 0			; CHECK-NEXT: [[TMP4:%.*]] = getelementptr inbounds i32, i32* [[TMP3]], i32 0
	; CHECK-NEXT: [[TMP5:%.*]] = bitcast i32* [[TMP4]] to <4 x i32>*			; CHECK-NEXT: [[TMP5:%.*]] = bitcast i32* [[TMP4]] to <4 x i32>*
	; CHECK-NEXT: [[WIDE_MASKED_LOAD:%.*]] = call <4 x i32> @llvm.masked.load.v4i32.p0v4i32(<4 x i32>* [[TMP5]], i32 4, <4 x i1> [[ACTIVE_LANE_MASK]], <4 x i32> poison)			; CHECK-NEXT: [[WIDE_MASKED_LOAD:%.*]] = call <4 x i32> @llvm.masked.load.v4i32.p0v4i32(<4 x i32>* [[TMP5]], i32 4, <4 x i1> [[ACTIVE_LANE_MASK]], <4 x i32> poison)
	[323 lines not shown]
	; CHECK-NEXT: br i1 [[IDENT_CHECK]], label [[SCALAR_PH]], label [[VECTOR_PH:%.*]]			; CHECK-NEXT: br i1 [[IDENT_CHECK]], label [[SCALAR_PH]], label [[VECTOR_PH:%.*]]
	; CHECK: vector.ph:			; CHECK: vector.ph:
	; CHECK-NEXT: [[N_RND_UP:%.*]] = add i32 [[N:%.*]], 3			; CHECK-NEXT: [[N_RND_UP:%.*]] = add i32 [[N:%.*]], 3
	; CHECK-NEXT: [[N_MOD_VF:%.*]] = urem i32 [[N_RND_UP]], 4			; CHECK-NEXT: [[N_MOD_VF:%.*]] = urem i32 [[N_RND_UP]], 4
	; CHECK-NEXT: [[N_VEC:%.*]] = sub i32 [[N_RND_UP]], [[N_MOD_VF]]			; CHECK-NEXT: [[N_VEC:%.*]] = sub i32 [[N_RND_UP]], [[N_MOD_VF]]
	; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]			; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]
	; CHECK: vector.body:			; CHECK: vector.body:
	; CHECK-NEXT: [[INDEX:%.*]] = phi i32 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]			; CHECK-NEXT: [[INDEX:%.*]] = phi i32 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
	; CHECK-NEXT: [[BROADCAST_SPLATINSERT:%.*]] = insertelement <4 x i32> poison, i32 [[INDEX]], i32 0
	; CHECK-NEXT: [[BROADCAST_SPLAT:%.*]] = shufflevector <4 x i32> [[BROADCAST_SPLATINSERT]], <4 x i32> poison, <4 x i32> zeroinitializer
	; CHECK-NEXT: [[INDUCTION:%.*]] = add <4 x i32> [[BROADCAST_SPLAT]], <i32 0, i32 1, i32 2, i32 3>
	; CHECK-NEXT: [[TMP1:%.*]] = add i32 [[INDEX]], 0			; CHECK-NEXT: [[TMP1:%.*]] = add i32 [[INDEX]], 0
	; CHECK-NEXT: [[ACTIVE_LANE_MASK:%.*]] = call <4 x i1> @llvm.get.active.lane.mask.v4i1.i32(i32 [[TMP1]], i32 [[N]])			; CHECK-NEXT: [[ACTIVE_LANE_MASK:%.*]] = call <4 x i1> @llvm.get.active.lane.mask.v4i1.i32(i32 [[TMP1]], i32 [[N]])
	; CHECK-NEXT: [[TMP2:%.*]] = mul nuw nsw i32 [[TMP1]], [[STRIDE]]			; CHECK-NEXT: [[TMP2:%.*]] = mul nuw nsw i32 [[TMP1]], [[STRIDE]]
	; CHECK-NEXT: [[TMP3:%.*]] = add nuw nsw i32 [[TMP2]], 2			; CHECK-NEXT: [[TMP3:%.*]] = add nuw nsw i32 [[TMP2]], 2
	; CHECK-NEXT: [[TMP4:%.*]] = getelementptr inbounds i32, i32* [[DATA:%.*]], i32 [[TMP3]]			; CHECK-NEXT: [[TMP4:%.*]] = getelementptr inbounds i32, i32* [[DATA:%.*]], i32 [[TMP3]]
	; CHECK-NEXT: [[TMP5:%.*]] = getelementptr inbounds i32, i32* [[TMP4]], i32 0			; CHECK-NEXT: [[TMP5:%.*]] = getelementptr inbounds i32, i32* [[TMP4]], i32 0
	; CHECK-NEXT: [[TMP6:%.*]] = bitcast i32* [[TMP5]] to <4 x i32>*			; CHECK-NEXT: [[TMP6:%.*]] = bitcast i32* [[TMP5]] to <4 x i32>*
	; CHECK-NEXT: [[WIDE_MASKED_LOAD:%.*]] = call <4 x i32> @llvm.masked.load.v4i32.p0v4i32(<4 x i32>* [[TMP6]], i32 4, <4 x i1> [[ACTIVE_LANE_MASK]], <4 x i32> poison)			; CHECK-NEXT: [[WIDE_MASKED_LOAD:%.*]] = call <4 x i32> @llvm.masked.load.v4i32.p0v4i32(<4 x i32>* [[TMP6]], i32 4, <4 x i1> [[ACTIVE_LANE_MASK]], <4 x i32> poison)
	[253 lines not shown]

llvm/test/Transforms/LoopVectorize/ARM/mve-reduction-types.ll

	[13 lines not shown]
	; CHECK: vector.ph:			; CHECK: vector.ph:
	; CHECK-NEXT: [[N_RND_UP:%.*]] = add i32 [[N]], 15			; CHECK-NEXT: [[N_RND_UP:%.*]] = add i32 [[N]], 15
	; CHECK-NEXT: [[N_MOD_VF:%.*]] = urem i32 [[N_RND_UP]], 16			; CHECK-NEXT: [[N_MOD_VF:%.*]] = urem i32 [[N_RND_UP]], 16
	; CHECK-NEXT: [[N_VEC:%.*]] = sub i32 [[N_RND_UP]], [[N_MOD_VF]]			; CHECK-NEXT: [[N_VEC:%.*]] = sub i32 [[N_RND_UP]], [[N_MOD_VF]]
	; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]			; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]
	; CHECK: vector.body:			; CHECK: vector.body:
	; CHECK-NEXT: [[INDEX:%.*]] = phi i32 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]			; CHECK-NEXT: [[INDEX:%.*]] = phi i32 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
	; CHECK-NEXT: [[VEC_PHI:%.*]] = phi i32 [ 0, [[VECTOR_PH]] ], [ [[TMP12:%.*]], [[VECTOR_BODY]] ]			; CHECK-NEXT: [[VEC_PHI:%.*]] = phi i32 [ 0, [[VECTOR_PH]] ], [ [[TMP12:%.*]], [[VECTOR_BODY]] ]
	; CHECK-NEXT: [[BROADCAST_SPLATINSERT:%.*]] = insertelement <16 x i32> poison, i32 [[INDEX]], i32 0
	; CHECK-NEXT: [[BROADCAST_SPLAT:%.*]] = shufflevector <16 x i32> [[BROADCAST_SPLATINSERT]], <16 x i32> poison, <16 x i32> zeroinitializer
	; CHECK-NEXT: [[INDUCTION:%.*]] = add <16 x i32> [[BROADCAST_SPLAT]], <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>
	; CHECK-NEXT: [[TMP0:%.*]] = add i32 [[INDEX]], 0			; CHECK-NEXT: [[TMP0:%.*]] = add i32 [[INDEX]], 0
	; CHECK-NEXT: [[ACTIVE_LANE_MASK:%.*]] = call <16 x i1> @llvm.get.active.lane.mask.v16i1.i32(i32 [[TMP0]], i32 [[N]])			; CHECK-NEXT: [[ACTIVE_LANE_MASK:%.*]] = call <16 x i1> @llvm.get.active.lane.mask.v16i1.i32(i32 [[TMP0]], i32 [[N]])
	; CHECK-NEXT: [[TMP1:%.*]] = getelementptr inbounds i8, i8* [[A:%.*]], i32 [[TMP0]]			; CHECK-NEXT: [[TMP1:%.*]] = getelementptr inbounds i8, i8* [[A:%.*]], i32 [[TMP0]]
	; CHECK-NEXT: [[TMP2:%.*]] = getelementptr inbounds i8, i8* [[TMP1]], i32 0			; CHECK-NEXT: [[TMP2:%.*]] = getelementptr inbounds i8, i8* [[TMP1]], i32 0
	; CHECK-NEXT: [[TMP3:%.*]] = bitcast i8* [[TMP2]] to <16 x i8>*			; CHECK-NEXT: [[TMP3:%.*]] = bitcast i8* [[TMP2]] to <16 x i8>*
	; CHECK-NEXT: [[WIDE_MASKED_LOAD:%.*]] = call <16 x i8> @llvm.masked.load.v16i8.p0v16i8(<16 x i8>* [[TMP3]], i32 1, <16 x i1> [[ACTIVE_LANE_MASK]], <16 x i8> poison)			; CHECK-NEXT: [[WIDE_MASKED_LOAD:%.*]] = call <16 x i8> @llvm.masked.load.v16i8.p0v16i8(<16 x i8>* [[TMP3]], i32 1, <16 x i1> [[ACTIVE_LANE_MASK]], <16 x i8> poison)
	; CHECK-NEXT: [[TMP4:%.*]] = sext <16 x i8> [[WIDE_MASKED_LOAD]] to <16 x i32>			; CHECK-NEXT: [[TMP4:%.*]] = sext <16 x i8> [[WIDE_MASKED_LOAD]] to <16 x i32>
	; CHECK-NEXT: [[TMP5:%.*]] = getelementptr inbounds i8, i8* [[B:%.*]], i32 [[TMP0]]			; CHECK-NEXT: [[TMP5:%.*]] = getelementptr inbounds i8, i8* [[B:%.*]], i32 [[TMP0]]
	[69 lines not shown]
	; CHECK: vector.ph:			; CHECK: vector.ph:
	; CHECK-NEXT: [[N_RND_UP:%.*]] = add i32 [[N]], 15			; CHECK-NEXT: [[N_RND_UP:%.*]] = add i32 [[N]], 15
	; CHECK-NEXT: [[N_MOD_VF:%.*]] = urem i32 [[N_RND_UP]], 16			; CHECK-NEXT: [[N_MOD_VF:%.*]] = urem i32 [[N_RND_UP]], 16
	; CHECK-NEXT: [[N_VEC:%.*]] = sub i32 [[N_RND_UP]], [[N_MOD_VF]]			; CHECK-NEXT: [[N_VEC:%.*]] = sub i32 [[N_RND_UP]], [[N_MOD_VF]]
	; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]			; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]
	; CHECK: vector.body:			; CHECK: vector.body:
	; CHECK-NEXT: [[INDEX:%.*]] = phi i32 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]			; CHECK-NEXT: [[INDEX:%.*]] = phi i32 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
	; CHECK-NEXT: [[VEC_PHI:%.*]] = phi i32 [ 0, [[VECTOR_PH]] ], [ [[TMP12:%.*]], [[VECTOR_BODY]] ]			; CHECK-NEXT: [[VEC_PHI:%.*]] = phi i32 [ 0, [[VECTOR_PH]] ], [ [[TMP12:%.*]], [[VECTOR_BODY]] ]
	; CHECK-NEXT: [[BROADCAST_SPLATINSERT:%.*]] = insertelement <16 x i32> poison, i32 [[INDEX]], i32 0
	; CHECK-NEXT: [[BROADCAST_SPLAT:%.*]] = shufflevector <16 x i32> [[BROADCAST_SPLATINSERT]], <16 x i32> poison, <16 x i32> zeroinitializer
	; CHECK-NEXT: [[INDUCTION:%.*]] = add <16 x i32> [[BROADCAST_SPLAT]], <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>
	; CHECK-NEXT: [[TMP0:%.*]] = add i32 [[INDEX]], 0			; CHECK-NEXT: [[TMP0:%.*]] = add i32 [[INDEX]], 0
	; CHECK-NEXT: [[ACTIVE_LANE_MASK:%.*]] = call <16 x i1> @llvm.get.active.lane.mask.v16i1.i32(i32 [[TMP0]], i32 [[N]])			; CHECK-NEXT: [[ACTIVE_LANE_MASK:%.*]] = call <16 x i1> @llvm.get.active.lane.mask.v16i1.i32(i32 [[TMP0]], i32 [[N]])
	; CHECK-NEXT: [[TMP1:%.*]] = getelementptr inbounds i8, i8* [[A:%.*]], i32 [[TMP0]]			; CHECK-NEXT: [[TMP1:%.*]] = getelementptr inbounds i8, i8* [[A:%.*]], i32 [[TMP0]]
	; CHECK-NEXT: [[TMP2:%.*]] = getelementptr inbounds i8, i8* [[TMP1]], i32 0			; CHECK-NEXT: [[TMP2:%.*]] = getelementptr inbounds i8, i8* [[TMP1]], i32 0
	; CHECK-NEXT: [[TMP3:%.*]] = bitcast i8* [[TMP2]] to <16 x i8>*			; CHECK-NEXT: [[TMP3:%.*]] = bitcast i8* [[TMP2]] to <16 x i8>*
	; CHECK-NEXT: [[WIDE_MASKED_LOAD:%.*]] = call <16 x i8> @llvm.masked.load.v16i8.p0v16i8(<16 x i8>* [[TMP3]], i32 1, <16 x i1> [[ACTIVE_LANE_MASK]], <16 x i8> poison)			; CHECK-NEXT: [[WIDE_MASKED_LOAD:%.*]] = call <16 x i8> @llvm.masked.load.v16i8.p0v16i8(<16 x i8>* [[TMP3]], i32 1, <16 x i1> [[ACTIVE_LANE_MASK]], <16 x i8> poison)
	; CHECK-NEXT: [[TMP4:%.*]] = sext <16 x i8> [[WIDE_MASKED_LOAD]] to <16 x i32>			; CHECK-NEXT: [[TMP4:%.*]] = sext <16 x i8> [[WIDE_MASKED_LOAD]] to <16 x i32>
	; CHECK-NEXT: [[TMP5:%.*]] = getelementptr inbounds i8, i8* [[B:%.*]], i32 [[TMP0]]			; CHECK-NEXT: [[TMP5:%.*]] = getelementptr inbounds i8, i8* [[B:%.*]], i32 [[TMP0]]
	[69 lines not shown]
	; CHECK: vector.ph:			; CHECK: vector.ph:
	; CHECK-NEXT: [[N_RND_UP:%.*]] = add i32 [[N]], 3			; CHECK-NEXT: [[N_RND_UP:%.*]] = add i32 [[N]], 3
	; CHECK-NEXT: [[N_MOD_VF:%.*]] = urem i32 [[N_RND_UP]], 4			; CHECK-NEXT: [[N_MOD_VF:%.*]] = urem i32 [[N_RND_UP]], 4
	; CHECK-NEXT: [[N_VEC:%.*]] = sub i32 [[N_RND_UP]], [[N_MOD_VF]]			; CHECK-NEXT: [[N_VEC:%.*]] = sub i32 [[N_RND_UP]], [[N_MOD_VF]]
	; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]			; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]
	; CHECK: vector.body:			; CHECK: vector.body:
	; CHECK-NEXT: [[INDEX:%.*]] = phi i32 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]			; CHECK-NEXT: [[INDEX:%.*]] = phi i32 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
	; CHECK-NEXT: [[VEC_PHI:%.*]] = phi i32 [ 0, [[VECTOR_PH]] ], [ [[TMP6:%.*]], [[VECTOR_BODY]] ]			; CHECK-NEXT: [[VEC_PHI:%.*]] = phi i32 [ 0, [[VECTOR_PH]] ], [ [[TMP6:%.*]], [[VECTOR_BODY]] ]
	; CHECK-NEXT: [[BROADCAST_SPLATINSERT:%.*]] = insertelement <4 x i32> poison, i32 [[INDEX]], i32 0
	; CHECK-NEXT: [[BROADCAST_SPLAT:%.*]] = shufflevector <4 x i32> [[BROADCAST_SPLATINSERT]], <4 x i32> poison, <4 x i32> zeroinitializer
	; CHECK-NEXT: [[INDUCTION:%.*]] = add <4 x i32> [[BROADCAST_SPLAT]], <i32 0, i32 1, i32 2, i32 3>
	; CHECK-NEXT: [[TMP0:%.*]] = add i32 [[INDEX]], 0			; CHECK-NEXT: [[TMP0:%.*]] = add i32 [[INDEX]], 0
	; CHECK-NEXT: [[ACTIVE_LANE_MASK:%.*]] = call <4 x i1> @llvm.get.active.lane.mask.v4i1.i32(i32 [[TMP0]], i32 [[N]])			; CHECK-NEXT: [[ACTIVE_LANE_MASK:%.*]] = call <4 x i1> @llvm.get.active.lane.mask.v4i1.i32(i32 [[TMP0]], i32 [[N]])
	; CHECK-NEXT: [[TMP1:%.*]] = getelementptr inbounds i32, i32* [[X:%.*]], i32 [[TMP0]]			; CHECK-NEXT: [[TMP1:%.*]] = getelementptr inbounds i32, i32* [[X:%.*]], i32 [[TMP0]]
	; CHECK-NEXT: [[TMP2:%.*]] = getelementptr inbounds i32, i32* [[TMP1]], i32 0			; CHECK-NEXT: [[TMP2:%.*]] = getelementptr inbounds i32, i32* [[TMP1]], i32 0
	; CHECK-NEXT: [[TMP3:%.*]] = bitcast i32* [[TMP2]] to <4 x i32>*			; CHECK-NEXT: [[TMP3:%.*]] = bitcast i32* [[TMP2]] to <4 x i32>*
	; CHECK-NEXT: [[WIDE_MASKED_LOAD:%.*]] = call <4 x i32> @llvm.masked.load.v4i32.p0v4i32(<4 x i32>* [[TMP3]], i32 4, <4 x i1> [[ACTIVE_LANE_MASK]], <4 x i32> poison)			; CHECK-NEXT: [[WIDE_MASKED_LOAD:%.*]] = call <4 x i32> @llvm.masked.load.v4i32.p0v4i32(<4 x i32>* [[TMP3]], i32 4, <4 x i1> [[ACTIVE_LANE_MASK]], <4 x i32> poison)
	; CHECK-NEXT: [[TMP4:%.*]] = select <4 x i1> [[ACTIVE_LANE_MASK]], <4 x i32> [[WIDE_MASKED_LOAD]], <4 x i32> zeroinitializer			; CHECK-NEXT: [[TMP4:%.*]] = select <4 x i1> [[ACTIVE_LANE_MASK]], <4 x i32> [[WIDE_MASKED_LOAD]], <4 x i32> zeroinitializer
	; CHECK-NEXT: [[TMP5:%.*]] = call i32 @llvm.vector.reduce.add.v4i32(<4 x i32> [[TMP4]])			; CHECK-NEXT: [[TMP5:%.*]] = call i32 @llvm.vector.reduce.add.v4i32(<4 x i32> [[TMP4]])
	[52 lines not shown]
	; CHECK: vector.ph:			; CHECK: vector.ph:
	; CHECK-NEXT: [[N_RND_UP:%.*]] = add i32 [[N]], 3			; CHECK-NEXT: [[N_RND_UP:%.*]] = add i32 [[N]], 3
	; CHECK-NEXT: [[N_MOD_VF:%.*]] = urem i32 [[N_RND_UP]], 4			; CHECK-NEXT: [[N_MOD_VF:%.*]] = urem i32 [[N_RND_UP]], 4
	; CHECK-NEXT: [[N_VEC:%.*]] = sub i32 [[N_RND_UP]], [[N_MOD_VF]]			; CHECK-NEXT: [[N_VEC:%.*]] = sub i32 [[N_RND_UP]], [[N_MOD_VF]]
	; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]			; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]
	; CHECK: vector.body:			; CHECK: vector.body:
	; CHECK-NEXT: [[INDEX:%.*]] = phi i32 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]			; CHECK-NEXT: [[INDEX:%.*]] = phi i32 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
	; CHECK-NEXT: [[VEC_PHI:%.*]] = phi <4 x i32> [ <i32 1, i32 1, i32 1, i32 1>, [[VECTOR_PH]] ], [ [[TMP5:%.*]], [[VECTOR_BODY]] ]			; CHECK-NEXT: [[VEC_PHI:%.*]] = phi <4 x i32> [ <i32 1, i32 1, i32 1, i32 1>, [[VECTOR_PH]] ], [ [[TMP5:%.*]], [[VECTOR_BODY]] ]
	; CHECK-NEXT: [[BROADCAST_SPLATINSERT:%.*]] = insertelement <4 x i32> poison, i32 [[INDEX]], i32 0
	; CHECK-NEXT: [[BROADCAST_SPLAT:%.*]] = shufflevector <4 x i32> [[BROADCAST_SPLATINSERT]], <4 x i32> poison, <4 x i32> zeroinitializer
	; CHECK-NEXT: [[INDUCTION:%.*]] = add <4 x i32> [[BROADCAST_SPLAT]], <i32 0, i32 1, i32 2, i32 3>
	; CHECK-NEXT: [[TMP0:%.*]] = add i32 [[INDEX]], 0			; CHECK-NEXT: [[TMP0:%.*]] = add i32 [[INDEX]], 0
	; CHECK-NEXT: [[ACTIVE_LANE_MASK:%.*]] = call <4 x i1> @llvm.get.active.lane.mask.v4i1.i32(i32 [[TMP0]], i32 [[N]])			; CHECK-NEXT: [[ACTIVE_LANE_MASK:%.*]] = call <4 x i1> @llvm.get.active.lane.mask.v4i1.i32(i32 [[TMP0]], i32 [[N]])
	; CHECK-NEXT: [[TMP1:%.*]] = getelementptr inbounds i32, i32* [[X:%.*]], i32 [[TMP0]]			; CHECK-NEXT: [[TMP1:%.*]] = getelementptr inbounds i32, i32* [[X:%.*]], i32 [[TMP0]]
	; CHECK-NEXT: [[TMP2:%.*]] = getelementptr inbounds i32, i32* [[TMP1]], i32 0			; CHECK-NEXT: [[TMP2:%.*]] = getelementptr inbounds i32, i32* [[TMP1]], i32 0
	; CHECK-NEXT: [[TMP3:%.*]] = bitcast i32* [[TMP2]] to <4 x i32>*			; CHECK-NEXT: [[TMP3:%.*]] = bitcast i32* [[TMP2]] to <4 x i32>*
	; CHECK-NEXT: [[WIDE_MASKED_LOAD:%.*]] = call <4 x i32> @llvm.masked.load.v4i32.p0v4i32(<4 x i32>* [[TMP3]], i32 4, <4 x i1> [[ACTIVE_LANE_MASK]], <4 x i32> poison)			; CHECK-NEXT: [[WIDE_MASKED_LOAD:%.*]] = call <4 x i32> @llvm.masked.load.v4i32.p0v4i32(<4 x i32>* [[TMP3]], i32 4, <4 x i1> [[ACTIVE_LANE_MASK]], <4 x i32> poison)
	; CHECK-NEXT: [[TMP4:%.*]] = mul <4 x i32> [[WIDE_MASKED_LOAD]], [[VEC_PHI]]			; CHECK-NEXT: [[TMP4:%.*]] = mul <4 x i32> [[WIDE_MASKED_LOAD]], [[VEC_PHI]]
	; CHECK-NEXT: [[TMP5]] = select <4 x i1> [[ACTIVE_LANE_MASK]], <4 x i32> [[TMP4]], <4 x i32> [[VEC_PHI]]			; CHECK-NEXT: [[TMP5]] = select <4 x i1> [[ACTIVE_LANE_MASK]], <4 x i32> [[TMP4]], <4 x i32> [[VEC_PHI]]
	[52 lines not shown]
	; CHECK: vector.ph:			; CHECK: vector.ph:
	; CHECK-NEXT: [[N_RND_UP:%.*]] = add i32 [[N]], 3			; CHECK-NEXT: [[N_RND_UP:%.*]] = add i32 [[N]], 3
	; CHECK-NEXT: [[N_MOD_VF:%.*]] = urem i32 [[N_RND_UP]], 4			; CHECK-NEXT: [[N_MOD_VF:%.*]] = urem i32 [[N_RND_UP]], 4
	; CHECK-NEXT: [[N_VEC:%.*]] = sub i32 [[N_RND_UP]], [[N_MOD_VF]]			; CHECK-NEXT: [[N_VEC:%.*]] = sub i32 [[N_RND_UP]], [[N_MOD_VF]]
	; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]			; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]
	; CHECK: vector.body:			; CHECK: vector.body:
	; CHECK-NEXT: [[INDEX:%.*]] = phi i32 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]			; CHECK-NEXT: [[INDEX:%.*]] = phi i32 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
	; CHECK-NEXT: [[VEC_PHI:%.*]] = phi <4 x i32> [ <i32 -1, i32 -1, i32 -1, i32 -1>, [[VECTOR_PH]] ], [ [[TMP5:%.*]], [[VECTOR_BODY]] ]			; CHECK-NEXT: [[VEC_PHI:%.*]] = phi <4 x i32> [ <i32 -1, i32 -1, i32 -1, i32 -1>, [[VECTOR_PH]] ], [ [[TMP5:%.*]], [[VECTOR_BODY]] ]
	; CHECK-NEXT: [[BROADCAST_SPLATINSERT:%.*]] = insertelement <4 x i32> poison, i32 [[INDEX]], i32 0
	; CHECK-NEXT: [[BROADCAST_SPLAT:%.*]] = shufflevector <4 x i32> [[BROADCAST_SPLATINSERT]], <4 x i32> poison, <4 x i32> zeroinitializer
	; CHECK-NEXT: [[INDUCTION:%.*]] = add <4 x i32> [[BROADCAST_SPLAT]], <i32 0, i32 1, i32 2, i32 3>
	; CHECK-NEXT: [[TMP0:%.*]] = add i32 [[INDEX]], 0			; CHECK-NEXT: [[TMP0:%.*]] = add i32 [[INDEX]], 0
	; CHECK-NEXT: [[ACTIVE_LANE_MASK:%.*]] = call <4 x i1> @llvm.get.active.lane.mask.v4i1.i32(i32 [[TMP0]], i32 [[N]])			; CHECK-NEXT: [[ACTIVE_LANE_MASK:%.*]] = call <4 x i1> @llvm.get.active.lane.mask.v4i1.i32(i32 [[TMP0]], i32 [[N]])
	; CHECK-NEXT: [[TMP1:%.*]] = getelementptr inbounds i32, i32* [[X:%.*]], i32 [[TMP0]]			; CHECK-NEXT: [[TMP1:%.*]] = getelementptr inbounds i32, i32* [[X:%.*]], i32 [[TMP0]]
	; CHECK-NEXT: [[TMP2:%.*]] = getelementptr inbounds i32, i32* [[TMP1]], i32 0			; CHECK-NEXT: [[TMP2:%.*]] = getelementptr inbounds i32, i32* [[TMP1]], i32 0
	; CHECK-NEXT: [[TMP3:%.*]] = bitcast i32* [[TMP2]] to <4 x i32>*			; CHECK-NEXT: [[TMP3:%.*]] = bitcast i32* [[TMP2]] to <4 x i32>*
	; CHECK-NEXT: [[WIDE_MASKED_LOAD:%.*]] = call <4 x i32> @llvm.masked.load.v4i32.p0v4i32(<4 x i32>* [[TMP3]], i32 4, <4 x i1> [[ACTIVE_LANE_MASK]], <4 x i32> poison)			; CHECK-NEXT: [[WIDE_MASKED_LOAD:%.*]] = call <4 x i32> @llvm.masked.load.v4i32.p0v4i32(<4 x i32>* [[TMP3]], i32 4, <4 x i1> [[ACTIVE_LANE_MASK]], <4 x i32> poison)
	; CHECK-NEXT: [[TMP4:%.*]] = and <4 x i32> [[WIDE_MASKED_LOAD]], [[VEC_PHI]]			; CHECK-NEXT: [[TMP4:%.*]] = and <4 x i32> [[WIDE_MASKED_LOAD]], [[VEC_PHI]]
	; CHECK-NEXT: [[TMP5]] = select <4 x i1> [[ACTIVE_LANE_MASK]], <4 x i32> [[TMP4]], <4 x i32> [[VEC_PHI]]			; CHECK-NEXT: [[TMP5]] = select <4 x i1> [[ACTIVE_LANE_MASK]], <4 x i32> [[TMP4]], <4 x i32> [[VEC_PHI]]
	[52 lines not shown]
	; CHECK: vector.ph:			; CHECK: vector.ph:
	; CHECK-NEXT: [[N_RND_UP:%.*]] = add i32 [[N]], 3			; CHECK-NEXT: [[N_RND_UP:%.*]] = add i32 [[N]], 3
	; CHECK-NEXT: [[N_MOD_VF:%.*]] = urem i32 [[N_RND_UP]], 4			; CHECK-NEXT: [[N_MOD_VF:%.*]] = urem i32 [[N_RND_UP]], 4
	; CHECK-NEXT: [[N_VEC:%.*]] = sub i32 [[N_RND_UP]], [[N_MOD_VF]]			; CHECK-NEXT: [[N_VEC:%.*]] = sub i32 [[N_RND_UP]], [[N_MOD_VF]]
	; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]			; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]
	; CHECK: vector.body:			; CHECK: vector.body:
	; CHECK-NEXT: [[INDEX:%.*]] = phi i32 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]			; CHECK-NEXT: [[INDEX:%.*]] = phi i32 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
	; CHECK-NEXT: [[VEC_PHI:%.*]] = phi <4 x i32> [ zeroinitializer, [[VECTOR_PH]] ], [ [[TMP5:%.*]], [[VECTOR_BODY]] ]			; CHECK-NEXT: [[VEC_PHI:%.*]] = phi <4 x i32> [ zeroinitializer, [[VECTOR_PH]] ], [ [[TMP5:%.*]], [[VECTOR_BODY]] ]
	; CHECK-NEXT: [[BROADCAST_SPLATINSERT:%.*]] = insertelement <4 x i32> poison, i32 [[INDEX]], i32 0
	; CHECK-NEXT: [[BROADCAST_SPLAT:%.*]] = shufflevector <4 x i32> [[BROADCAST_SPLATINSERT]], <4 x i32> poison, <4 x i32> zeroinitializer
	; CHECK-NEXT: [[INDUCTION:%.*]] = add <4 x i32> [[BROADCAST_SPLAT]], <i32 0, i32 1, i32 2, i32 3>
	; CHECK-NEXT: [[TMP0:%.*]] = add i32 [[INDEX]], 0			; CHECK-NEXT: [[TMP0:%.*]] = add i32 [[INDEX]], 0
	; CHECK-NEXT: [[ACTIVE_LANE_MASK:%.*]] = call <4 x i1> @llvm.get.active.lane.mask.v4i1.i32(i32 [[TMP0]], i32 [[N]])			; CHECK-NEXT: [[ACTIVE_LANE_MASK:%.*]] = call <4 x i1> @llvm.get.active.lane.mask.v4i1.i32(i32 [[TMP0]], i32 [[N]])
	; CHECK-NEXT: [[TMP1:%.*]] = getelementptr inbounds i32, i32* [[X:%.*]], i32 [[TMP0]]			; CHECK-NEXT: [[TMP1:%.*]] = getelementptr inbounds i32, i32* [[X:%.*]], i32 [[TMP0]]
	; CHECK-NEXT: [[TMP2:%.*]] = getelementptr inbounds i32, i32* [[TMP1]], i32 0			; CHECK-NEXT: [[TMP2:%.*]] = getelementptr inbounds i32, i32* [[TMP1]], i32 0
	; CHECK-NEXT: [[TMP3:%.*]] = bitcast i32* [[TMP2]] to <4 x i32>*			; CHECK-NEXT: [[TMP3:%.*]] = bitcast i32* [[TMP2]] to <4 x i32>*
	; CHECK-NEXT: [[WIDE_MASKED_LOAD:%.*]] = call <4 x i32> @llvm.masked.load.v4i32.p0v4i32(<4 x i32>* [[TMP3]], i32 4, <4 x i1> [[ACTIVE_LANE_MASK]], <4 x i32> poison)			; CHECK-NEXT: [[WIDE_MASKED_LOAD:%.*]] = call <4 x i32> @llvm.masked.load.v4i32.p0v4i32(<4 x i32>* [[TMP3]], i32 4, <4 x i1> [[ACTIVE_LANE_MASK]], <4 x i32> poison)
	; CHECK-NEXT: [[TMP4:%.*]] = or <4 x i32> [[WIDE_MASKED_LOAD]], [[VEC_PHI]]			; CHECK-NEXT: [[TMP4:%.*]] = or <4 x i32> [[WIDE_MASKED_LOAD]], [[VEC_PHI]]
	; CHECK-NEXT: [[TMP5]] = select <4 x i1> [[ACTIVE_LANE_MASK]], <4 x i32> [[TMP4]], <4 x i32> [[VEC_PHI]]			; CHECK-NEXT: [[TMP5]] = select <4 x i1> [[ACTIVE_LANE_MASK]], <4 x i32> [[TMP4]], <4 x i32> [[VEC_PHI]]
	[52 lines not shown]
	; CHECK: vector.ph:			; CHECK: vector.ph:
	; CHECK-NEXT: [[N_RND_UP:%.*]] = add i32 [[N]], 3			; CHECK-NEXT: [[N_RND_UP:%.*]] = add i32 [[N]], 3
	; CHECK-NEXT: [[N_MOD_VF:%.*]] = urem i32 [[N_RND_UP]], 4			; CHECK-NEXT: [[N_MOD_VF:%.*]] = urem i32 [[N_RND_UP]], 4
	; CHECK-NEXT: [[N_VEC:%.*]] = sub i32 [[N_RND_UP]], [[N_MOD_VF]]			; CHECK-NEXT: [[N_VEC:%.*]] = sub i32 [[N_RND_UP]], [[N_MOD_VF]]
	; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]			; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]
	; CHECK: vector.body:			; CHECK: vector.body:
	; CHECK-NEXT: [[INDEX:%.*]] = phi i32 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]			; CHECK-NEXT: [[INDEX:%.*]] = phi i32 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
	; CHECK-NEXT: [[VEC_PHI:%.*]] = phi <4 x i32> [ zeroinitializer, [[VECTOR_PH]] ], [ [[TMP5:%.*]], [[VECTOR_BODY]] ]			; CHECK-NEXT: [[VEC_PHI:%.*]] = phi <4 x i32> [ zeroinitializer, [[VECTOR_PH]] ], [ [[TMP5:%.*]], [[VECTOR_BODY]] ]
	; CHECK-NEXT: [[BROADCAST_SPLATINSERT:%.*]] = insertelement <4 x i32> poison, i32 [[INDEX]], i32 0
	; CHECK-NEXT: [[BROADCAST_SPLAT:%.*]] = shufflevector <4 x i32> [[BROADCAST_SPLATINSERT]], <4 x i32> poison, <4 x i32> zeroinitializer
	; CHECK-NEXT: [[INDUCTION:%.*]] = add <4 x i32> [[BROADCAST_SPLAT]], <i32 0, i32 1, i32 2, i32 3>
	; CHECK-NEXT: [[TMP0:%.*]] = add i32 [[INDEX]], 0			; CHECK-NEXT: [[TMP0:%.*]] = add i32 [[INDEX]], 0
	; CHECK-NEXT: [[ACTIVE_LANE_MASK:%.*]] = call <4 x i1> @llvm.get.active.lane.mask.v4i1.i32(i32 [[TMP0]], i32 [[N]])			; CHECK-NEXT: [[ACTIVE_LANE_MASK:%.*]] = call <4 x i1> @llvm.get.active.lane.mask.v4i1.i32(i32 [[TMP0]], i32 [[N]])
	; CHECK-NEXT: [[TMP1:%.*]] = getelementptr inbounds i32, i32* [[X:%.*]], i32 [[TMP0]]			; CHECK-NEXT: [[TMP1:%.*]] = getelementptr inbounds i32, i32* [[X:%.*]], i32 [[TMP0]]
	; CHECK-NEXT: [[TMP2:%.*]] = getelementptr inbounds i32, i32* [[TMP1]], i32 0			; CHECK-NEXT: [[TMP2:%.*]] = getelementptr inbounds i32, i32* [[TMP1]], i32 0
	; CHECK-NEXT: [[TMP3:%.*]] = bitcast i32* [[TMP2]] to <4 x i32>*			; CHECK-NEXT: [[TMP3:%.*]] = bitcast i32* [[TMP2]] to <4 x i32>*
	; CHECK-NEXT: [[WIDE_MASKED_LOAD:%.*]] = call <4 x i32> @llvm.masked.load.v4i32.p0v4i32(<4 x i32>* [[TMP3]], i32 4, <4 x i1> [[ACTIVE_LANE_MASK]], <4 x i32> poison)			; CHECK-NEXT: [[WIDE_MASKED_LOAD:%.*]] = call <4 x i32> @llvm.masked.load.v4i32.p0v4i32(<4 x i32>* [[TMP3]], i32 4, <4 x i1> [[ACTIVE_LANE_MASK]], <4 x i32> poison)
	; CHECK-NEXT: [[TMP4:%.*]] = xor <4 x i32> [[WIDE_MASKED_LOAD]], [[VEC_PHI]]			; CHECK-NEXT: [[TMP4:%.*]] = xor <4 x i32> [[WIDE_MASKED_LOAD]], [[VEC_PHI]]
	; CHECK-NEXT: [[TMP5]] = select <4 x i1> [[ACTIVE_LANE_MASK]], <4 x i32> [[TMP4]], <4 x i32> [[VEC_PHI]]			; CHECK-NEXT: [[TMP5]] = select <4 x i1> [[ACTIVE_LANE_MASK]], <4 x i32> [[TMP4]], <4 x i32> [[VEC_PHI]]
	[52 lines not shown]
	; CHECK: vector.ph:			; CHECK: vector.ph:
	; CHECK-NEXT: [[N_RND_UP:%.*]] = add i32 [[N]], 3			; CHECK-NEXT: [[N_RND_UP:%.*]] = add i32 [[N]], 3
	; CHECK-NEXT: [[N_MOD_VF:%.*]] = urem i32 [[N_RND_UP]], 4			; CHECK-NEXT: [[N_MOD_VF:%.*]] = urem i32 [[N_RND_UP]], 4
	; CHECK-NEXT: [[N_VEC:%.*]] = sub i32 [[N_RND_UP]], [[N_MOD_VF]]			; CHECK-NEXT: [[N_VEC:%.*]] = sub i32 [[N_RND_UP]], [[N_MOD_VF]]
	; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]			; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]
	; CHECK: vector.body:			; CHECK: vector.body:
	; CHECK-NEXT: [[INDEX:%.*]] = phi i32 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]			; CHECK-NEXT: [[INDEX:%.*]] = phi i32 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
	; CHECK-NEXT: [[VEC_PHI:%.*]] = phi <4 x float> [ zeroinitializer, [[VECTOR_PH]] ], [ [[TMP5:%.*]], [[VECTOR_BODY]] ]			; CHECK-NEXT: [[VEC_PHI:%.*]] = phi <4 x float> [ zeroinitializer, [[VECTOR_PH]] ], [ [[TMP5:%.*]], [[VECTOR_BODY]] ]
	; CHECK-NEXT: [[BROADCAST_SPLATINSERT:%.*]] = insertelement <4 x i32> poison, i32 [[INDEX]], i32 0
	; CHECK-NEXT: [[BROADCAST_SPLAT:%.*]] = shufflevector <4 x i32> [[BROADCAST_SPLATINSERT]], <4 x i32> poison, <4 x i32> zeroinitializer
	; CHECK-NEXT: [[INDUCTION:%.*]] = add <4 x i32> [[BROADCAST_SPLAT]], <i32 0, i32 1, i32 2, i32 3>
	; CHECK-NEXT: [[TMP0:%.*]] = add i32 [[INDEX]], 0			; CHECK-NEXT: [[TMP0:%.*]] = add i32 [[INDEX]], 0
	; CHECK-NEXT: [[ACTIVE_LANE_MASK:%.*]] = call <4 x i1> @llvm.get.active.lane.mask.v4i1.i32(i32 [[TMP0]], i32 [[N]])			; CHECK-NEXT: [[ACTIVE_LANE_MASK:%.*]] = call <4 x i1> @llvm.get.active.lane.mask.v4i1.i32(i32 [[TMP0]], i32 [[N]])
	; CHECK-NEXT: [[TMP1:%.*]] = getelementptr inbounds float, float* [[X:%.*]], i32 [[TMP0]]			; CHECK-NEXT: [[TMP1:%.*]] = getelementptr inbounds float, float* [[X:%.*]], i32 [[TMP0]]
	; CHECK-NEXT: [[TMP2:%.*]] = getelementptr inbounds float, float* [[TMP1]], i32 0			; CHECK-NEXT: [[TMP2:%.*]] = getelementptr inbounds float, float* [[TMP1]], i32 0
	; CHECK-NEXT: [[TMP3:%.*]] = bitcast float* [[TMP2]] to <4 x float>*			; CHECK-NEXT: [[TMP3:%.*]] = bitcast float* [[TMP2]] to <4 x float>*
	; CHECK-NEXT: [[WIDE_MASKED_LOAD:%.*]] = call <4 x float> @llvm.masked.load.v4f32.p0v4f32(<4 x float>* [[TMP3]], i32 4, <4 x i1> [[ACTIVE_LANE_MASK]], <4 x float> poison)			; CHECK-NEXT: [[WIDE_MASKED_LOAD:%.*]] = call <4 x float> @llvm.masked.load.v4f32.p0v4f32(<4 x float>* [[TMP3]], i32 4, <4 x i1> [[ACTIVE_LANE_MASK]], <4 x float> poison)
	; CHECK-NEXT: [[TMP4:%.*]] = fadd fast <4 x float> [[WIDE_MASKED_LOAD]], [[VEC_PHI]]			; CHECK-NEXT: [[TMP4:%.*]] = fadd fast <4 x float> [[WIDE_MASKED_LOAD]], [[VEC_PHI]]
	; CHECK-NEXT: [[TMP5]] = select <4 x i1> [[ACTIVE_LANE_MASK]], <4 x float> [[TMP4]], <4 x float> [[VEC_PHI]]			; CHECK-NEXT: [[TMP5]] = select <4 x i1> [[ACTIVE_LANE_MASK]], <4 x float> [[TMP4]], <4 x float> [[VEC_PHI]]
	[52 lines not shown]
	; CHECK: vector.ph:			; CHECK: vector.ph:
	; CHECK-NEXT: [[N_RND_UP:%.*]] = add i32 [[N]], 3			; CHECK-NEXT: [[N_RND_UP:%.*]] = add i32 [[N]], 3
	; CHECK-NEXT: [[N_MOD_VF:%.*]] = urem i32 [[N_RND_UP]], 4			; CHECK-NEXT: [[N_MOD_VF:%.*]] = urem i32 [[N_RND_UP]], 4
	; CHECK-NEXT: [[N_VEC:%.*]] = sub i32 [[N_RND_UP]], [[N_MOD_VF]]			; CHECK-NEXT: [[N_VEC:%.*]] = sub i32 [[N_RND_UP]], [[N_MOD_VF]]
	; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]			; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]
	; CHECK: vector.body:			; CHECK: vector.body:
	; CHECK-NEXT: [[INDEX:%.*]] = phi i32 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]			; CHECK-NEXT: [[INDEX:%.*]] = phi i32 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
	; CHECK-NEXT: [[VEC_PHI:%.*]] = phi <4 x float> [ <float 1.000000e+00, float 1.000000e+00, float 1.000000e+00, float 1.000000e+00>, [[VECTOR_PH]] ], [ [[TMP5:%.*]], [[VECTOR_BODY]] ]			; CHECK-NEXT: [[VEC_PHI:%.*]] = phi <4 x float> [ <float 1.000000e+00, float 1.000000e+00, float 1.000000e+00, float 1.000000e+00>, [[VECTOR_PH]] ], [ [[TMP5:%.*]], [[VECTOR_BODY]] ]
	; CHECK-NEXT: [[BROADCAST_SPLATINSERT:%.*]] = insertelement <4 x i32> poison, i32 [[INDEX]], i32 0
	; CHECK-NEXT: [[BROADCAST_SPLAT:%.*]] = shufflevector <4 x i32> [[BROADCAST_SPLATINSERT]], <4 x i32> poison, <4 x i32> zeroinitializer
	; CHECK-NEXT: [[INDUCTION:%.*]] = add <4 x i32> [[BROADCAST_SPLAT]], <i32 0, i32 1, i32 2, i32 3>
	; CHECK-NEXT: [[TMP0:%.*]] = add i32 [[INDEX]], 0			; CHECK-NEXT: [[TMP0:%.*]] = add i32 [[INDEX]], 0
	; CHECK-NEXT: [[ACTIVE_LANE_MASK:%.*]] = call <4 x i1> @llvm.get.active.lane.mask.v4i1.i32(i32 [[TMP0]], i32 [[N]])			; CHECK-NEXT: [[ACTIVE_LANE_MASK:%.*]] = call <4 x i1> @llvm.get.active.lane.mask.v4i1.i32(i32 [[TMP0]], i32 [[N]])
	; CHECK-NEXT: [[TMP1:%.*]] = getelementptr inbounds float, float* [[X:%.*]], i32 [[TMP0]]			; CHECK-NEXT: [[TMP1:%.*]] = getelementptr inbounds float, float* [[X:%.*]], i32 [[TMP0]]
	; CHECK-NEXT: [[TMP2:%.*]] = getelementptr inbounds float, float* [[TMP1]], i32 0			; CHECK-NEXT: [[TMP2:%.*]] = getelementptr inbounds float, float* [[TMP1]], i32 0
	; CHECK-NEXT: [[TMP3:%.*]] = bitcast float* [[TMP2]] to <4 x float>*			; CHECK-NEXT: [[TMP3:%.*]] = bitcast float* [[TMP2]] to <4 x float>*
	; CHECK-NEXT: [[WIDE_MASKED_LOAD:%.*]] = call <4 x float> @llvm.masked.load.v4f32.p0v4f32(<4 x float>* [[TMP3]], i32 4, <4 x i1> [[ACTIVE_LANE_MASK]], <4 x float> poison)			; CHECK-NEXT: [[WIDE_MASKED_LOAD:%.*]] = call <4 x float> @llvm.masked.load.v4f32.p0v4f32(<4 x float>* [[TMP3]], i32 4, <4 x i1> [[ACTIVE_LANE_MASK]], <4 x float> poison)
	; CHECK-NEXT: [[TMP4:%.*]] = fmul fast <4 x float> [[WIDE_MASKED_LOAD]], [[VEC_PHI]]			; CHECK-NEXT: [[TMP4:%.*]] = fmul fast <4 x float> [[WIDE_MASKED_LOAD]], [[VEC_PHI]]
	; CHECK-NEXT: [[TMP5]] = select <4 x i1> [[ACTIVE_LANE_MASK]], <4 x float> [[TMP4]], <4 x float> [[VEC_PHI]]			; CHECK-NEXT: [[TMP5]] = select <4 x i1> [[ACTIVE_LANE_MASK]], <4 x float> [[TMP4]], <4 x float> [[VEC_PHI]]

llvm/test/Transforms/LoopVectorize/X86/constant-fold.ll

	; CHECK-LABEL: @f1(			; CHECK-LABEL: @f1(
	; CHECK-NEXT: bb1:			; CHECK-NEXT: bb1:
	; CHECK-NEXT: br i1 false, label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]			; CHECK-NEXT: br i1 false, label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
	; CHECK: vector.ph:			; CHECK: vector.ph:
	; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]			; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]
	; CHECK: vector.body:			; CHECK: vector.body:
	; CHECK-NEXT: [[INDEX:%.*]] = phi i32 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]			; CHECK-NEXT: [[INDEX:%.*]] = phi i32 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
	; CHECK-NEXT: [[OFFSET_IDX:%.*]] = trunc i32 [[INDEX]] to i16			; CHECK-NEXT: [[OFFSET_IDX:%.*]] = trunc i32 [[INDEX]] to i16
	; CHECK-NEXT: [[BROADCAST_SPLATINSERT:%.*]] = insertelement <2 x i16> poison, i16 [[OFFSET_IDX]], i32 0
	; CHECK-NEXT: [[BROADCAST_SPLAT:%.*]] = shufflevector <2 x i16> [[BROADCAST_SPLATINSERT]], <2 x i16> poison, <2 x i32> zeroinitializer
	; CHECK-NEXT: [[INDUCTION:%.*]] = add <2 x i16> [[BROADCAST_SPLAT]], <i16 0, i16 1>
	; CHECK-NEXT: [[TMP0:%.*]] = add i16 [[OFFSET_IDX]], 0			; CHECK-NEXT: [[TMP0:%.*]] = add i16 [[OFFSET_IDX]], 0
	; CHECK-NEXT: [[TMP1:%.*]] = sext i16 [[TMP0]] to i64			; CHECK-NEXT: [[TMP1:%.*]] = sext i16 [[TMP0]] to i64
	; CHECK-NEXT: [[TMP2:%.*]] = getelementptr [2 x i16*], [2 x i16*]* @b, i16 0, i64 [[TMP1]]			; CHECK-NEXT: [[TMP2:%.*]] = getelementptr [2 x i16*], [2 x i16*]* @b, i16 0, i64 [[TMP1]]
	; CHECK-NEXT: [[TMP3:%.*]] = getelementptr i16*, i16** [[TMP2]], i32 0			; CHECK-NEXT: [[TMP3:%.*]] = getelementptr i16*, i16** [[TMP2]], i32 0
	; CHECK-NEXT: [[TMP4:%.*]] = bitcast i16** [[TMP3]] to <2 x i16*>*			; CHECK-NEXT: [[TMP4:%.*]] = bitcast i16** [[TMP3]] to <2 x i16*>*
	; CHECK-NEXT: store <2 x i16*> <i16* getelementptr inbounds ([1 x %rec8], [1 x %rec8]* @a, i32 0, i32 0, i32 0), i16* getelementptr inbounds ([1 x %rec8], [1 x %rec8]* @a, i32 0, i32 0, i32 0)>, <2 x i16*>* [[TMP4]], align 8			; CHECK-NEXT: store <2 x i16*> <i16* getelementptr inbounds ([1 x %rec8], [1 x %rec8]* @a, i32 0, i32 0, i32 0), i16* getelementptr inbounds ([1 x %rec8], [1 x %rec8]* @a, i32 0, i32 0, i32 0)>, <2 x i16*>* [[TMP4]], align 8
	; CHECK-NEXT: [[INDEX_NEXT]] = add nuw i32 [[INDEX]], 2			; CHECK-NEXT: [[INDEX_NEXT]] = add nuw i32 [[INDEX]], 2
	; CHECK-NEXT: [[TMP5:%.*]] = icmp eq i32 [[INDEX_NEXT]], 2			; CHECK-NEXT: [[TMP5:%.*]] = icmp eq i32 [[INDEX_NEXT]], 2

llvm/test/Transforms/LoopVectorize/X86/optsize.ll

	define i32 @foo_optsize() #0 {			define i32 @foo_optsize() #0 {
	; CHECK-LABEL: @foo_optsize(			; CHECK-LABEL: @foo_optsize(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: br i1 false, label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]			; CHECK-NEXT: br i1 false, label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
	; CHECK: vector.ph:			; CHECK: vector.ph:
	; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]			; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]
	; CHECK: vector.body:			; CHECK: vector.body:
	; CHECK-NEXT: [[INDEX:%.*]] = phi i32 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]			; CHECK-NEXT: [[INDEX:%.*]] = phi i32 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
				; CHECK-NEXT: [[TMP0:%.*]] = add i32 [[INDEX]], 0
	; CHECK-NEXT: [[BROADCAST_SPLATINSERT:%.*]] = insertelement <64 x i32> poison, i32 [[INDEX]], i32 0			; CHECK-NEXT: [[BROADCAST_SPLATINSERT:%.*]] = insertelement <64 x i32> poison, i32 [[INDEX]], i32 0
	; CHECK-NEXT: [[BROADCAST_SPLAT:%.*]] = shufflevector <64 x i32> [[BROADCAST_SPLATINSERT]], <64 x i32> poison, <64 x i32> zeroinitializer			; CHECK-NEXT: [[BROADCAST_SPLAT:%.*]] = shufflevector <64 x i32> [[BROADCAST_SPLATINSERT]], <64 x i32> poison, <64 x i32> zeroinitializer
	; CHECK-NEXT: [[INDUCTION:%.*]] = add <64 x i32> [[BROADCAST_SPLAT]], <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15, i32 16, i32 17, i32 18, i32 19, i32 20, i32 21, i32 22, i32 23, i32 24, i32 25, i32 26, i32 27, i32 28, i32 29, i32 30, i32 31, i32 32, i32 33, i32 34, i32 35, i32 36, i32 37, i32 38, i32 39, i32 40, i32 41, i32 42, i32 43, i32 44, i32 45, i32 46, i32 47, i32 48, i32 49, i32 50, i32 51, i32 52, i32 53, i32 54, i32 55, i32 56, i32 57, i32 58, i32 59, i32 60, i32 61, i32 62, i32 63>			; CHECK-NEXT: [[INDUCTION:%.*]] = add <64 x i32> [[BROADCAST_SPLAT]], <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15, i32 16, i32 17, i32 18, i32 19, i32 20, i32 21, i32 22, i32 23, i32 24, i32 25, i32 26, i32 27, i32 28, i32 29, i32 30, i32 31, i32 32, i32 33, i32 34, i32 35, i32 36, i32 37, i32 38, i32 39, i32 40, i32 41, i32 42, i32 43, i32 44, i32 45, i32 46, i32 47, i32 48, i32 49, i32 50, i32 51, i32 52, i32 53, i32 54, i32 55, i32 56, i32 57, i32 58, i32 59, i32 60, i32 61, i32 62, i32 63>
	; CHECK-NEXT: [[TMP0:%.*]] = add i32 [[INDEX]], 0
	; CHECK-NEXT: [[TMP1:%.*]] = icmp ule <64 x i32> [[INDUCTION]], <i32 202, i32 202, i32 202, i32 202, i32 202, i32 202, i32 202, i32 202, i32 202, i32 202, i32 202, i32 202, i32 202, i32 202, i32 202, i32 202, i32 202, i32 202, i32 202, i32 202, i32 202, i32 202, i32 202, i32 202, i32 202, i32 202, i32 202, i32 202, i32 202, i32 202, i32 202, i32 202, i32 202, i32 202, i32 202, i32 202, i32 202, i32 202, i32 202, i32 202, i32 202, i32 202, i32 202, i32 202, i32 202, i32 202, i32 202, i32 202, i32 202, i32 202, i32 202, i32 202, i32 202, i32 202, i32 202, i32 202, i32 202, i32 202, i32 202, i32 202, i32 202, i32 202, i32 202, i32 202>			; CHECK-NEXT: [[TMP1:%.*]] = icmp ule <64 x i32> [[INDUCTION]], <i32 202, i32 202, i32 202, i32 202, i32 202, i32 202, i32 202, i32 202, i32 202, i32 202, i32 202, i32 202, i32 202, i32 202, i32 202, i32 202, i32 202, i32 202, i32 202, i32 202, i32 202, i32 202, i32 202, i32 202, i32 202, i32 202, i32 202, i32 202, i32 202, i32 202, i32 202, i32 202, i32 202, i32 202, i32 202, i32 202, i32 202, i32 202, i32 202, i32 202, i32 202, i32 202, i32 202, i32 202, i32 202, i32 202, i32 202, i32 202, i32 202, i32 202, i32 202, i32 202, i32 202, i32 202, i32 202, i32 202, i32 202, i32 202, i32 202, i32 202, i32 202, i32 202, i32 202, i32 202>
	; CHECK-NEXT: [[TMP2:%.*]] = getelementptr inbounds [32 x i8], [32 x i8]* @tab, i32 0, i32 [[TMP0]]			; CHECK-NEXT: [[TMP2:%.*]] = getelementptr inbounds [32 x i8], [32 x i8]* @tab, i32 0, i32 [[TMP0]]
	; CHECK-NEXT: [[TMP3:%.*]] = getelementptr inbounds i8, i8* [[TMP2]], i32 0			; CHECK-NEXT: [[TMP3:%.*]] = getelementptr inbounds i8, i8* [[TMP2]], i32 0
	; CHECK-NEXT: [[TMP4:%.*]] = bitcast i8* [[TMP3]] to <64 x i8>*			; CHECK-NEXT: [[TMP4:%.*]] = bitcast i8* [[TMP3]] to <64 x i8>*
	; CHECK-NEXT: [[WIDE_MASKED_LOAD:%.*]] = call <64 x i8> @llvm.masked.load.v64i8.p0v64i8(<64 x i8>* [[TMP4]], i32 1, <64 x i1> [[TMP1]], <64 x i8> poison)			; CHECK-NEXT: [[WIDE_MASKED_LOAD:%.*]] = call <64 x i8> @llvm.masked.load.v64i8.p0v64i8(<64 x i8>* [[TMP4]], i32 1, <64 x i1> [[TMP1]], <64 x i8> poison)
	; CHECK-NEXT: [[TMP5:%.*]] = icmp eq <64 x i8> [[WIDE_MASKED_LOAD]], zeroinitializer			; CHECK-NEXT: [[TMP5:%.*]] = icmp eq <64 x i8> [[WIDE_MASKED_LOAD]], zeroinitializer
	; CHECK-NEXT: [[TMP6:%.*]] = select <64 x i1> [[TMP5]], <64 x i8> <i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2>, <64 x i8> <i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1>			; CHECK-NEXT: [[TMP6:%.*]] = select <64 x i1> [[TMP5]], <64 x i8> <i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2>, <64 x i8> <i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1>
	; CHECK-NEXT: [[TMP7:%.*]] = bitcast i8* [[TMP3]] to <64 x i8>*			; CHECK-NEXT: [[TMP7:%.*]] = bitcast i8* [[TMP3]] to <64 x i8>*
	; CHECK-NEXT: call void @llvm.masked.store.v64i8.p0v64i8(<64 x i8> [[TMP6]], <64 x i8>* [[TMP7]], i32 1, <64 x i1> [[TMP1]])			; CHECK-NEXT: call void @llvm.masked.store.v64i8.p0v64i8(<64 x i8> [[TMP6]], <64 x i8>* [[TMP7]], i32 1, <64 x i1> [[TMP1]])
	; CHECK-NEXT: [[INDEX_NEXT]] = add i32 [[INDEX]], 64			; CHECK-NEXT: [[INDEX_NEXT]] = add i32 [[INDEX]], 64
	; CHECK-NEXT: [[TMP8:%.*]] = icmp eq i32 [[INDEX_NEXT]], 256			; CHECK-NEXT: [[TMP8:%.*]] = icmp eq i32 [[INDEX_NEXT]], 256
	; CHECK-NEXT: br i1 [[TMP8]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], [[LOOP0:!llvm.loop !.*]]			; CHECK-NEXT: br i1 [[TMP8]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP0:![0-9]+]]
	; CHECK: middle.block:			; CHECK: middle.block:
	; CHECK-NEXT: br i1 true, label [[FOR_END:%.*]], label [[SCALAR_PH]]			; CHECK-NEXT: br i1 true, label [[FOR_END:%.*]], label [[SCALAR_PH]]
	; CHECK: scalar.ph:			; CHECK: scalar.ph:
	; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i32 [ 256, [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]			; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i32 [ 256, [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]
	; CHECK-NEXT: br label [[FOR_BODY:%.*]]			; CHECK-NEXT: br label [[FOR_BODY:%.*]]
	; CHECK: for.body:			; CHECK: for.body:
	; CHECK-NEXT: [[I_08:%.*]] = phi i32 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[INC:%.*]], [[FOR_BODY]] ]			; CHECK-NEXT: [[I_08:%.*]] = phi i32 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[INC:%.*]], [[FOR_BODY]] ]
	; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds [32 x i8], [32 x i8]* @tab, i32 0, i32 [[I_08]]			; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds [32 x i8], [32 x i8]* @tab, i32 0, i32 [[I_08]]
	;			;
	; AUTOVF-LABEL: @foo_optsize(			; AUTOVF-LABEL: @foo_optsize(
	; AUTOVF-NEXT: entry:			; AUTOVF-NEXT: entry:
	; AUTOVF-NEXT: br i1 false, label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]			; AUTOVF-NEXT: br i1 false, label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
	; AUTOVF: vector.ph:			; AUTOVF: vector.ph:
	; AUTOVF-NEXT: br label [[VECTOR_BODY:%.*]]			; AUTOVF-NEXT: br label [[VECTOR_BODY:%.*]]
	; AUTOVF: vector.body:			; AUTOVF: vector.body:
	; AUTOVF-NEXT: [[INDEX:%.*]] = phi i32 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]			; AUTOVF-NEXT: [[INDEX:%.*]] = phi i32 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
				; AUTOVF-NEXT: [[TMP0:%.*]] = add i32 [[INDEX]], 0
	; AUTOVF-NEXT: [[BROADCAST_SPLATINSERT:%.*]] = insertelement <32 x i32> poison, i32 [[INDEX]], i32 0			; AUTOVF-NEXT: [[BROADCAST_SPLATINSERT:%.*]] = insertelement <32 x i32> poison, i32 [[INDEX]], i32 0
	; AUTOVF-NEXT: [[BROADCAST_SPLAT:%.*]] = shufflevector <32 x i32> [[BROADCAST_SPLATINSERT]], <32 x i32> poison, <32 x i32> zeroinitializer			; AUTOVF-NEXT: [[BROADCAST_SPLAT:%.*]] = shufflevector <32 x i32> [[BROADCAST_SPLATINSERT]], <32 x i32> poison, <32 x i32> zeroinitializer
	; AUTOVF-NEXT: [[INDUCTION:%.*]] = add <32 x i32> [[BROADCAST_SPLAT]], <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15, i32 16, i32 17, i32 18, i32 19, i32 20, i32 21, i32 22, i32 23, i32 24, i32 25, i32 26, i32 27, i32 28, i32 29, i32 30, i32 31>			; AUTOVF-NEXT: [[INDUCTION:%.*]] = add <32 x i32> [[BROADCAST_SPLAT]], <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15, i32 16, i32 17, i32 18, i32 19, i32 20, i32 21, i32 22, i32 23, i32 24, i32 25, i32 26, i32 27, i32 28, i32 29, i32 30, i32 31>
	; AUTOVF-NEXT: [[TMP0:%.*]] = add i32 [[INDEX]], 0
	; AUTOVF-NEXT: [[TMP1:%.*]] = icmp ule <32 x i32> [[INDUCTION]], <i32 202, i32 202, i32 202, i32 202, i32 202, i32 202, i32 202, i32 202, i32 202, i32 202, i32 202, i32 202, i32 202, i32 202, i32 202, i32 202, i32 202, i32 202, i32 202, i32 202, i32 202, i32 202, i32 202, i32 202, i32 202, i32 202, i32 202, i32 202, i32 202, i32 202, i32 202, i32 202>			; AUTOVF-NEXT: [[TMP1:%.*]] = icmp ule <32 x i32> [[INDUCTION]], <i32 202, i32 202, i32 202, i32 202, i32 202, i32 202, i32 202, i32 202, i32 202, i32 202, i32 202, i32 202, i32 202, i32 202, i32 202, i32 202, i32 202, i32 202, i32 202, i32 202, i32 202, i32 202, i32 202, i32 202, i32 202, i32 202, i32 202, i32 202, i32 202, i32 202, i32 202, i32 202>
	; AUTOVF-NEXT: [[TMP2:%.*]] = getelementptr inbounds [32 x i8], [32 x i8]* @tab, i32 0, i32 [[TMP0]]			; AUTOVF-NEXT: [[TMP2:%.*]] = getelementptr inbounds [32 x i8], [32 x i8]* @tab, i32 0, i32 [[TMP0]]
	; AUTOVF-NEXT: [[TMP3:%.*]] = getelementptr inbounds i8, i8* [[TMP2]], i32 0			; AUTOVF-NEXT: [[TMP3:%.*]] = getelementptr inbounds i8, i8* [[TMP2]], i32 0
	; AUTOVF-NEXT: [[TMP4:%.*]] = bitcast i8* [[TMP3]] to <32 x i8>*			; AUTOVF-NEXT: [[TMP4:%.*]] = bitcast i8* [[TMP3]] to <32 x i8>*
	; AUTOVF-NEXT: [[WIDE_MASKED_LOAD:%.*]] = call <32 x i8> @llvm.masked.load.v32i8.p0v32i8(<32 x i8>* [[TMP4]], i32 1, <32 x i1> [[TMP1]], <32 x i8> poison)			; AUTOVF-NEXT: [[WIDE_MASKED_LOAD:%.*]] = call <32 x i8> @llvm.masked.load.v32i8.p0v32i8(<32 x i8>* [[TMP4]], i32 1, <32 x i1> [[TMP1]], <32 x i8> poison)
	; AUTOVF-NEXT: [[TMP5:%.*]] = icmp eq <32 x i8> [[WIDE_MASKED_LOAD]], zeroinitializer			; AUTOVF-NEXT: [[TMP5:%.*]] = icmp eq <32 x i8> [[WIDE_MASKED_LOAD]], zeroinitializer
	; AUTOVF-NEXT: [[TMP6:%.*]] = select <32 x i1> [[TMP5]], <32 x i8> <i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2>, <32 x i8> <i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1>			; AUTOVF-NEXT: [[TMP6:%.*]] = select <32 x i1> [[TMP5]], <32 x i8> <i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2>, <32 x i8> <i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1>
	; AUTOVF-NEXT: [[TMP7:%.*]] = bitcast i8* [[TMP3]] to <32 x i8>*			; AUTOVF-NEXT: [[TMP7:%.*]] = bitcast i8* [[TMP3]] to <32 x i8>*
	; AUTOVF-NEXT: call void @llvm.masked.store.v32i8.p0v32i8(<32 x i8> [[TMP6]], <32 x i8>* [[TMP7]], i32 1, <32 x i1> [[TMP1]])			; AUTOVF-NEXT: call void @llvm.masked.store.v32i8.p0v32i8(<32 x i8> [[TMP6]], <32 x i8>* [[TMP7]], i32 1, <32 x i1> [[TMP1]])
	; AUTOVF-NEXT: [[INDEX_NEXT]] = add i32 [[INDEX]], 32			; AUTOVF-NEXT: [[INDEX_NEXT]] = add i32 [[INDEX]], 32
	; AUTOVF-NEXT: [[TMP8:%.*]] = icmp eq i32 [[INDEX_NEXT]], 224			; AUTOVF-NEXT: [[TMP8:%.*]] = icmp eq i32 [[INDEX_NEXT]], 224
	; AUTOVF-NEXT: br i1 [[TMP8]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], [[LOOP0:!llvm.loop !.*]]			; AUTOVF-NEXT: br i1 [[TMP8]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP0:![0-9]+]]
	; AUTOVF: middle.block:			; AUTOVF: middle.block:
	; AUTOVF-NEXT: br i1 true, label [[FOR_END:%.*]], label [[SCALAR_PH]]			; AUTOVF-NEXT: br i1 true, label [[FOR_END:%.*]], label [[SCALAR_PH]]
	; AUTOVF: scalar.ph:			; AUTOVF: scalar.ph:
	; AUTOVF-NEXT: [[BC_RESUME_VAL:%.*]] = phi i32 [ 224, [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]			; AUTOVF-NEXT: [[BC_RESUME_VAL:%.*]] = phi i32 [ 224, [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]
	; AUTOVF-NEXT: br label [[FOR_BODY:%.*]]			; AUTOVF-NEXT: br label [[FOR_BODY:%.*]]
	; AUTOVF: for.body:			; AUTOVF: for.body:
	; AUTOVF-NEXT: [[I_08:%.*]] = phi i32 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[INC:%.*]], [[FOR_BODY]] ]			; AUTOVF-NEXT: [[I_08:%.*]] = phi i32 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[INC:%.*]], [[FOR_BODY]] ]
	; AUTOVF-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds [32 x i8], [32 x i8]* @tab, i32 0, i32 [[I_08]]			; AUTOVF-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds [32 x i8], [32 x i8]* @tab, i32 0, i32 [[I_08]]
	define i32 @foo_minsize() #1 {			define i32 @foo_minsize() #1 {
	; CHECK-LABEL: @foo_minsize(			; CHECK-LABEL: @foo_minsize(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: br i1 false, label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]			; CHECK-NEXT: br i1 false, label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
	; CHECK: vector.ph:			; CHECK: vector.ph:
	; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]			; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]
	; CHECK: vector.body:			; CHECK: vector.body:
	; CHECK-NEXT: [[INDEX:%.*]] = phi i32 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]			; CHECK-NEXT: [[INDEX:%.*]] = phi i32 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
				; CHECK-NEXT: [[TMP0:%.*]] = add i32 [[INDEX]], 0
	; CHECK-NEXT: [[BROADCAST_SPLATINSERT:%.*]] = insertelement <64 x i32> poison, i32 [[INDEX]], i32 0			; CHECK-NEXT: [[BROADCAST_SPLATINSERT:%.*]] = insertelement <64 x i32> poison, i32 [[INDEX]], i32 0
	; CHECK-NEXT: [[BROADCAST_SPLAT:%.*]] = shufflevector <64 x i32> [[BROADCAST_SPLATINSERT]], <64 x i32> poison, <64 x i32> zeroinitializer			; CHECK-NEXT: [[BROADCAST_SPLAT:%.*]] = shufflevector <64 x i32> [[BROADCAST_SPLATINSERT]], <64 x i32> poison, <64 x i32> zeroinitializer
	; CHECK-NEXT: [[INDUCTION:%.*]] = add <64 x i32> [[BROADCAST_SPLAT]], <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15, i32 16, i32 17, i32 18, i32 19, i32 20, i32 21, i32 22, i32 23, i32 24, i32 25, i32 26, i32 27, i32 28, i32 29, i32 30, i32 31, i32 32, i32 33, i32 34, i32 35, i32 36, i32 37, i32 38, i32 39, i32 40, i32 41, i32 42, i32 43, i32 44, i32 45, i32 46, i32 47, i32 48, i32 49, i32 50, i32 51, i32 52, i32 53, i32 54, i32 55, i32 56, i32 57, i32 58, i32 59, i32 60, i32 61, i32 62, i32 63>			; CHECK-NEXT: [[INDUCTION:%.*]] = add <64 x i32> [[BROADCAST_SPLAT]], <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15, i32 16, i32 17, i32 18, i32 19, i32 20, i32 21, i32 22, i32 23, i32 24, i32 25, i32 26, i32 27, i32 28, i32 29, i32 30, i32 31, i32 32, i32 33, i32 34, i32 35, i32 36, i32 37, i32 38, i32 39, i32 40, i32 41, i32 42, i32 43, i32 44, i32 45, i32 46, i32 47, i32 48, i32 49, i32 50, i32 51, i32 52, i32 53, i32 54, i32 55, i32 56, i32 57, i32 58, i32 59, i32 60, i32 61, i32 62, i32 63>
	; CHECK-NEXT: [[TMP0:%.*]] = add i32 [[INDEX]], 0
	; CHECK-NEXT: [[TMP1:%.*]] = icmp ule <64 x i32> [[INDUCTION]], <i32 202, i32 202, i32 202, i32 202, i32 202, i32 202, i32 202, i32 202, i32 202, i32 202, i32 202, i32 202, i32 202, i32 202, i32 202, i32 202, i32 202, i32 202, i32 202, i32 202, i32 202, i32 202, i32 202, i32 202, i32 202, i32 202, i32 202, i32 202, i32 202, i32 202, i32 202, i32 202, i32 202, i32 202, i32 202, i32 202, i32 202, i32 202, i32 202, i32 202, i32 202, i32 202, i32 202, i32 202, i32 202, i32 202, i32 202, i32 202, i32 202, i32 202, i32 202, i32 202, i32 202, i32 202, i32 202, i32 202, i32 202, i32 202, i32 202, i32 202, i32 202, i32 202, i32 202, i32 202>			; CHECK-NEXT: [[TMP1:%.*]] = icmp ule <64 x i32> [[INDUCTION]], <i32 202, i32 202, i32 202, i32 202, i32 202, i32 202, i32 202, i32 202, i32 202, i32 202, i32 202, i32 202, i32 202, i32 202, i32 202, i32 202, i32 202, i32 202, i32 202, i32 202, i32 202, i32 202, i32 202, i32 202, i32 202, i32 202, i32 202, i32 202, i32 202, i32 202, i32 202, i32 202, i32 202, i32 202, i32 202, i32 202, i32 202, i32 202, i32 202, i32 202, i32 202, i32 202, i32 202, i32 202, i32 202, i32 202, i32 202, i32 202, i32 202, i32 202, i32 202, i32 202, i32 202, i32 202, i32 202, i32 202, i32 202, i32 202, i32 202, i32 202, i32 202, i32 202, i32 202, i32 202>
	; CHECK-NEXT: [[TMP2:%.*]] = getelementptr inbounds [32 x i8], [32 x i8]* @tab, i32 0, i32 [[TMP0]]			; CHECK-NEXT: [[TMP2:%.*]] = getelementptr inbounds [32 x i8], [32 x i8]* @tab, i32 0, i32 [[TMP0]]
	; CHECK-NEXT: [[TMP3:%.*]] = getelementptr inbounds i8, i8* [[TMP2]], i32 0			; CHECK-NEXT: [[TMP3:%.*]] = getelementptr inbounds i8, i8* [[TMP2]], i32 0
	; CHECK-NEXT: [[TMP4:%.*]] = bitcast i8* [[TMP3]] to <64 x i8>*			; CHECK-NEXT: [[TMP4:%.*]] = bitcast i8* [[TMP3]] to <64 x i8>*
	; CHECK-NEXT: [[WIDE_MASKED_LOAD:%.*]] = call <64 x i8> @llvm.masked.load.v64i8.p0v64i8(<64 x i8>* [[TMP4]], i32 1, <64 x i1> [[TMP1]], <64 x i8> poison)			; CHECK-NEXT: [[WIDE_MASKED_LOAD:%.*]] = call <64 x i8> @llvm.masked.load.v64i8.p0v64i8(<64 x i8>* [[TMP4]], i32 1, <64 x i1> [[TMP1]], <64 x i8> poison)
	; CHECK-NEXT: [[TMP5:%.*]] = icmp eq <64 x i8> [[WIDE_MASKED_LOAD]], zeroinitializer			; CHECK-NEXT: [[TMP5:%.*]] = icmp eq <64 x i8> [[WIDE_MASKED_LOAD]], zeroinitializer
	; CHECK-NEXT: [[TMP6:%.*]] = select <64 x i1> [[TMP5]], <64 x i8> <i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2>, <64 x i8> <i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1>			; CHECK-NEXT: [[TMP6:%.*]] = select <64 x i1> [[TMP5]], <64 x i8> <i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2>, <64 x i8> <i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1>
	; CHECK-NEXT: [[TMP7:%.*]] = bitcast i8* [[TMP3]] to <64 x i8>*			; CHECK-NEXT: [[TMP7:%.*]] = bitcast i8* [[TMP3]] to <64 x i8>*
	;			;
	; AUTOVF-LABEL: @foo_minsize(			; AUTOVF-LABEL: @foo_minsize(
	; AUTOVF-NEXT: entry:			; AUTOVF-NEXT: entry:
	; AUTOVF-NEXT: br i1 false, label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]			; AUTOVF-NEXT: br i1 false, label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
	; AUTOVF: vector.ph:			; AUTOVF: vector.ph:
	; AUTOVF-NEXT: br label [[VECTOR_BODY:%.*]]			; AUTOVF-NEXT: br label [[VECTOR_BODY:%.*]]
	; AUTOVF: vector.body:			; AUTOVF: vector.body:
	; AUTOVF-NEXT: [[INDEX:%.*]] = phi i32 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]			; AUTOVF-NEXT: [[INDEX:%.*]] = phi i32 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
				; AUTOVF-NEXT: [[TMP0:%.*]] = add i32 [[INDEX]], 0
	; AUTOVF-NEXT: [[BROADCAST_SPLATINSERT:%.*]] = insertelement <32 x i32> poison, i32 [[INDEX]], i32 0			; AUTOVF-NEXT: [[BROADCAST_SPLATINSERT:%.*]] = insertelement <32 x i32> poison, i32 [[INDEX]], i32 0
	; AUTOVF-NEXT: [[BROADCAST_SPLAT:%.*]] = shufflevector <32 x i32> [[BROADCAST_SPLATINSERT]], <32 x i32> poison, <32 x i32> zeroinitializer			; AUTOVF-NEXT: [[BROADCAST_SPLAT:%.*]] = shufflevector <32 x i32> [[BROADCAST_SPLATINSERT]], <32 x i32> poison, <32 x i32> zeroinitializer
	; AUTOVF-NEXT: [[INDUCTION:%.*]] = add <32 x i32> [[BROADCAST_SPLAT]], <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15, i32 16, i32 17, i32 18, i32 19, i32 20, i32 21, i32 22, i32 23, i32 24, i32 25, i32 26, i32 27, i32 28, i32 29, i32 30, i32 31>			; AUTOVF-NEXT: [[INDUCTION:%.*]] = add <32 x i32> [[BROADCAST_SPLAT]], <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15, i32 16, i32 17, i32 18, i32 19, i32 20, i32 21, i32 22, i32 23, i32 24, i32 25, i32 26, i32 27, i32 28, i32 29, i32 30, i32 31>
	; AUTOVF-NEXT: [[TMP0:%.*]] = add i32 [[INDEX]], 0
	; AUTOVF-NEXT: [[TMP1:%.*]] = icmp ule <32 x i32> [[INDUCTION]], <i32 202, i32 202, i32 202, i32 202, i32 202, i32 202, i32 202, i32 202, i32 202, i32 202, i32 202, i32 202, i32 202, i32 202, i32 202, i32 202, i32 202, i32 202, i32 202, i32 202, i32 202, i32 202, i32 202, i32 202, i32 202, i32 202, i32 202, i32 202, i32 202, i32 202, i32 202, i32 202>			; AUTOVF-NEXT: [[TMP1:%.*]] = icmp ule <32 x i32> [[INDUCTION]], <i32 202, i32 202, i32 202, i32 202, i32 202, i32 202, i32 202, i32 202, i32 202, i32 202, i32 202, i32 202, i32 202, i32 202, i32 202, i32 202, i32 202, i32 202, i32 202, i32 202, i32 202, i32 202, i32 202, i32 202, i32 202, i32 202, i32 202, i32 202, i32 202, i32 202, i32 202, i32 202>
	; AUTOVF-NEXT: [[TMP2:%.*]] = getelementptr inbounds [32 x i8], [32 x i8]* @tab, i32 0, i32 [[TMP0]]			; AUTOVF-NEXT: [[TMP2:%.*]] = getelementptr inbounds [32 x i8], [32 x i8]* @tab, i32 0, i32 [[TMP0]]
	; AUTOVF-NEXT: [[TMP3:%.*]] = getelementptr inbounds i8, i8* [[TMP2]], i32 0			; AUTOVF-NEXT: [[TMP3:%.*]] = getelementptr inbounds i8, i8* [[TMP2]], i32 0
	; AUTOVF-NEXT: [[TMP4:%.*]] = bitcast i8* [[TMP3]] to <32 x i8>*			; AUTOVF-NEXT: [[TMP4:%.*]] = bitcast i8* [[TMP3]] to <32 x i8>*
	; AUTOVF-NEXT: [[WIDE_MASKED_LOAD:%.*]] = call <32 x i8> @llvm.masked.load.v32i8.p0v32i8(<32 x i8>* [[TMP4]], i32 1, <32 x i1> [[TMP1]], <32 x i8> poison)			; AUTOVF-NEXT: [[WIDE_MASKED_LOAD:%.*]] = call <32 x i8> @llvm.masked.load.v32i8.p0v32i8(<32 x i8>* [[TMP4]], i32 1, <32 x i1> [[TMP1]], <32 x i8> poison)
	; AUTOVF-NEXT: [[TMP5:%.*]] = icmp eq <32 x i8> [[WIDE_MASKED_LOAD]], zeroinitializer			; AUTOVF-NEXT: [[TMP5:%.*]] = icmp eq <32 x i8> [[WIDE_MASKED_LOAD]], zeroinitializer
	; AUTOVF-NEXT: [[TMP6:%.*]] = select <32 x i1> [[TMP5]], <32 x i8> <i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2>, <32 x i8> <i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1>			; AUTOVF-NEXT: [[TMP6:%.*]] = select <32 x i1> [[TMP5]], <32 x i8> <i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2>, <32 x i8> <i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1>
	; AUTOVF-NEXT: [[TMP7:%.*]] = bitcast i8* [[TMP3]] to <32 x i8>*			; AUTOVF-NEXT: [[TMP7:%.*]] = bitcast i8* [[TMP3]] to <32 x i8>*
	; AUTOVF-NEXT: call void @llvm.masked.store.v32i8.p0v32i8(<32 x i8> [[TMP6]], <32 x i8>* [[TMP7]], i32 1, <32 x i1> [[TMP1]])			; AUTOVF-NEXT: call void @llvm.masked.store.v32i8.p0v32i8(<32 x i8> [[TMP6]], <32 x i8>* [[TMP7]], i32 1, <32 x i1> [[TMP1]])
	; AUTOVF-NEXT: [[INDEX_NEXT]] = add i32 [[INDEX]], 32			; AUTOVF-NEXT: [[INDEX_NEXT]] = add i32 [[INDEX]], 32
	; AUTOVF-NEXT: [[TMP8:%.*]] = icmp eq i32 [[INDEX_NEXT]], 224			; AUTOVF-NEXT: [[TMP8:%.*]] = icmp eq i32 [[INDEX_NEXT]], 224
	; AUTOVF-NEXT: br i1 [[TMP8]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], [[LOOP4:!llvm.loop !.*]]			; AUTOVF-NEXT: br i1 [[TMP8]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP4:![0-9]+]]
	; AUTOVF: middle.block:			; AUTOVF: middle.block:
	; AUTOVF-NEXT: br i1 true, label [[FOR_END:%.*]], label [[SCALAR_PH]]			; AUTOVF-NEXT: br i1 true, label [[FOR_END:%.*]], label [[SCALAR_PH]]
	; AUTOVF: scalar.ph:			; AUTOVF: scalar.ph:
	; AUTOVF-NEXT: [[BC_RESUME_VAL:%.*]] = phi i32 [ 224, [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]			; AUTOVF-NEXT: [[BC_RESUME_VAL:%.*]] = phi i32 [ 224, [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]
	; AUTOVF-NEXT: br label [[FOR_BODY:%.*]]			; AUTOVF-NEXT: br label [[FOR_BODY:%.*]]
	; AUTOVF: for.body:			; AUTOVF: for.body:
	; AUTOVF-NEXT: [[I_08:%.*]] = phi i32 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[INC:%.*]], [[FOR_BODY]] ]			; AUTOVF-NEXT: [[I_08:%.*]] = phi i32 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[INC:%.*]], [[FOR_BODY]] ]
	; AUTOVF-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds [32 x i8], [32 x i8]* @tab, i32 0, i32 [[I_08]]			; AUTOVF-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds [32 x i8], [32 x i8]* @tab, i32 0, i32 [[I_08]]

llvm/test/Transforms/LoopVectorize/X86/pr34438.ll

	define void @small_tc(float* noalias nocapture %A, float* noalias nocapture readonly %B) {			define void @small_tc(float* noalias nocapture %A, float* noalias nocapture readonly %B) {
	; CHECK-LABEL: @small_tc(			; CHECK-LABEL: @small_tc(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: br i1 false, label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]			; CHECK-NEXT: br i1 false, label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
	; CHECK: vector.ph:			; CHECK: vector.ph:
	; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]			; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]
	; CHECK: vector.body:			; CHECK: vector.body:
	; CHECK-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]			; CHECK-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
	; CHECK-NEXT: [[BROADCAST_SPLATINSERT:%.*]] = insertelement <8 x i64> poison, i64 [[INDEX]], i32 0
	; CHECK-NEXT: [[BROADCAST_SPLAT:%.*]] = shufflevector <8 x i64> [[BROADCAST_SPLATINSERT]], <8 x i64> poison, <8 x i32> zeroinitializer
	; CHECK-NEXT: [[INDUCTION:%.*]] = add <8 x i64> [[BROADCAST_SPLAT]], <i64 0, i64 1, i64 2, i64 3, i64 4, i64 5, i64 6, i64 7>
	; CHECK-NEXT: [[TMP0:%.*]] = add i64 [[INDEX]], 0			; CHECK-NEXT: [[TMP0:%.*]] = add i64 [[INDEX]], 0
	; CHECK-NEXT: [[TMP1:%.*]] = getelementptr inbounds float, float* [[B:%.*]], i64 [[TMP0]]			; CHECK-NEXT: [[TMP1:%.*]] = getelementptr inbounds float, float* [[B:%.*]], i64 [[TMP0]]
	; CHECK-NEXT: [[TMP2:%.*]] = getelementptr inbounds float, float* [[TMP1]], i32 0			; CHECK-NEXT: [[TMP2:%.*]] = getelementptr inbounds float, float* [[TMP1]], i32 0
	; CHECK-NEXT: [[TMP3:%.*]] = bitcast float* [[TMP2]] to <8 x float>*			; CHECK-NEXT: [[TMP3:%.*]] = bitcast float* [[TMP2]] to <8 x float>*
	; CHECK-NEXT: [[WIDE_LOAD:%.*]] = load <8 x float>, <8 x float>* [[TMP3]], align 4, !llvm.access.group !0			; CHECK-NEXT: [[WIDE_LOAD:%.*]] = load <8 x float>, <8 x float>* [[TMP3]], align 4, !llvm.access.group !0
	; CHECK-NEXT: [[TMP4:%.*]] = getelementptr inbounds float, float* [[A:%.*]], i64 [[TMP0]]			; CHECK-NEXT: [[TMP4:%.*]] = getelementptr inbounds float, float* [[A:%.*]], i64 [[TMP0]]
	; CHECK-NEXT: [[TMP5:%.*]] = getelementptr inbounds float, float* [[TMP4]], i32 0			; CHECK-NEXT: [[TMP5:%.*]] = getelementptr inbounds float, float* [[TMP4]], i32 0
	; CHECK-NEXT: [[TMP6:%.*]] = bitcast float* [[TMP5]] to <8 x float>*			; CHECK-NEXT: [[TMP6:%.*]] = bitcast float* [[TMP5]] to <8 x float>*
	[... 49 lines not shown ...]

llvm/test/Transforms/LoopVectorize/X86/small-size.ll

	[... 148 lines not shown ...]
	; CHECK-NEXT: [[INDEX38:%.*]] = phi i64 [ 0, [[VECTOR_PH10]] ], [ [[INDEX_NEXT37:%.*]], [[PRED_STORE_CONTINUE36:%.*]] ]			; CHECK-NEXT: [[INDEX38:%.*]] = phi i64 [ 0, [[VECTOR_PH10]] ], [ [[INDEX_NEXT37:%.*]], [[PRED_STORE_CONTINUE36:%.*]] ]
	; CHECK-NEXT: [[OFFSET_IDX:%.*]] = add i64 [[I_0_LCSSA]], [[INDEX38]]			; CHECK-NEXT: [[OFFSET_IDX:%.*]] = add i64 [[I_0_LCSSA]], [[INDEX38]]
	; CHECK-NEXT: [[BROADCAST_SPLATINSERT27:%.*]] = insertelement <4 x i64> poison, i64 [[INDEX38]], i64 0			; CHECK-NEXT: [[BROADCAST_SPLATINSERT27:%.*]] = insertelement <4 x i64> poison, i64 [[INDEX38]], i64 0
	; CHECK-NEXT: [[BROADCAST_SPLAT28:%.*]] = shufflevector <4 x i64> [[BROADCAST_SPLATINSERT27]], <4 x i64> poison, <4 x i32> zeroinitializer			; CHECK-NEXT: [[BROADCAST_SPLAT28:%.*]] = shufflevector <4 x i64> [[BROADCAST_SPLATINSERT27]], <4 x i64> poison, <4 x i32> zeroinitializer
	; CHECK-NEXT: [[VEC_IV:%.*]] = or <4 x i64> [[BROADCAST_SPLAT28]], <i64 0, i64 1, i64 2, i64 3>			; CHECK-NEXT: [[VEC_IV:%.*]] = or <4 x i64> [[BROADCAST_SPLAT28]], <i64 0, i64 1, i64 2, i64 3>
	; CHECK-NEXT: [[TMP20:%.*]] = icmp ule <4 x i64> [[VEC_IV]], [[BROADCAST_SPLAT19]]			; CHECK-NEXT: [[TMP20:%.*]] = icmp ule <4 x i64> [[VEC_IV]], [[BROADCAST_SPLAT19]]
	; CHECK-NEXT: [[TMP21:%.*]] = extractelement <4 x i1> [[TMP20]], i64 0			; CHECK-NEXT: [[TMP21:%.*]] = extractelement <4 x i1> [[TMP20]], i64 0
	; CHECK-NEXT: br i1 [[TMP21]], label [[PRED_STORE_IF29:%.*]], label [[PRED_STORE_CONTINUE30:%.*]]			; CHECK-NEXT: br i1 [[TMP21]], label [[PRED_STORE_IF29:%.*]], label [[PRED_STORE_CONTINUE30:%.*]]
	; CHECK: pred.store.if29:			; CHECK: pred.store.if24:
	; CHECK-NEXT: [[TMP22:%.*]] = getelementptr inbounds [2048 x i32], [2048 x i32]* @b, i64 0, i64 [[OFFSET_IDX]]			; CHECK-NEXT: [[TMP22:%.*]] = getelementptr inbounds [2048 x i32], [2048 x i32]* @b, i64 0, i64 [[OFFSET_IDX]]
	; CHECK-NEXT: [[TMP23:%.*]] = load i32, i32* [[TMP22]], align 4			; CHECK-NEXT: [[TMP23:%.*]] = load i32, i32* [[TMP22]], align 4
	; CHECK-NEXT: [[TMP24:%.*]] = getelementptr inbounds [2048 x i32], [2048 x i32]* @c, i64 0, i64 [[OFFSET_IDX]]			; CHECK-NEXT: [[TMP24:%.*]] = getelementptr inbounds [2048 x i32], [2048 x i32]* @c, i64 0, i64 [[OFFSET_IDX]]
	; CHECK-NEXT: [[TMP25:%.*]] = load i32, i32* [[TMP24]], align 4			; CHECK-NEXT: [[TMP25:%.*]] = load i32, i32* [[TMP24]], align 4
	; CHECK-NEXT: [[TMP26:%.*]] = and i32 [[TMP25]], [[TMP23]]			; CHECK-NEXT: [[TMP26:%.*]] = and i32 [[TMP25]], [[TMP23]]
	; CHECK-NEXT: [[TMP27:%.*]] = getelementptr inbounds [2048 x i32], [2048 x i32]* @a, i64 0, i64 [[OFFSET_IDX]]			; CHECK-NEXT: [[TMP27:%.*]] = getelementptr inbounds [2048 x i32], [2048 x i32]* @a, i64 0, i64 [[OFFSET_IDX]]
	; CHECK-NEXT: store i32 [[TMP26]], i32* [[TMP27]], align 4			; CHECK-NEXT: store i32 [[TMP26]], i32* [[TMP27]], align 4
	; CHECK-NEXT: br label [[PRED_STORE_CONTINUE30]]			; CHECK-NEXT: br label [[PRED_STORE_CONTINUE30]]
	; CHECK: pred.store.continue30:			; CHECK: pred.store.continue25:
	; CHECK-NEXT: [[TMP28:%.*]] = extractelement <4 x i1> [[TMP20]], i64 1			; CHECK-NEXT: [[TMP28:%.*]] = extractelement <4 x i1> [[TMP20]], i64 1
	; CHECK-NEXT: br i1 [[TMP28]], label [[PRED_STORE_IF31:%.*]], label [[PRED_STORE_CONTINUE32:%.*]]			; CHECK-NEXT: br i1 [[TMP28]], label [[PRED_STORE_IF31:%.*]], label [[PRED_STORE_CONTINUE32:%.*]]
	; CHECK: pred.store.if31:			; CHECK: pred.store.if26:
	; CHECK-NEXT: [[TMP29:%.*]] = add i64 [[OFFSET_IDX]], 1			; CHECK-NEXT: [[TMP29:%.*]] = add i64 [[OFFSET_IDX]], 1
	; CHECK-NEXT: [[TMP30:%.*]] = getelementptr inbounds [2048 x i32], [2048 x i32]* @b, i64 0, i64 [[TMP29]]			; CHECK-NEXT: [[TMP30:%.*]] = getelementptr inbounds [2048 x i32], [2048 x i32]* @b, i64 0, i64 [[TMP29]]
	; CHECK-NEXT: [[TMP31:%.*]] = load i32, i32* [[TMP30]], align 4			; CHECK-NEXT: [[TMP31:%.*]] = load i32, i32* [[TMP30]], align 4
	; CHECK-NEXT: [[TMP32:%.*]] = getelementptr inbounds [2048 x i32], [2048 x i32]* @c, i64 0, i64 [[TMP29]]			; CHECK-NEXT: [[TMP32:%.*]] = getelementptr inbounds [2048 x i32], [2048 x i32]* @c, i64 0, i64 [[TMP29]]
	; CHECK-NEXT: [[TMP33:%.*]] = load i32, i32* [[TMP32]], align 4			; CHECK-NEXT: [[TMP33:%.*]] = load i32, i32* [[TMP32]], align 4
	; CHECK-NEXT: [[TMP34:%.*]] = and i32 [[TMP33]], [[TMP31]]			; CHECK-NEXT: [[TMP34:%.*]] = and i32 [[TMP33]], [[TMP31]]
	; CHECK-NEXT: [[TMP35:%.*]] = getelementptr inbounds [2048 x i32], [2048 x i32]* @a, i64 0, i64 [[TMP29]]			; CHECK-NEXT: [[TMP35:%.*]] = getelementptr inbounds [2048 x i32], [2048 x i32]* @a, i64 0, i64 [[TMP29]]
	; CHECK-NEXT: store i32 [[TMP34]], i32* [[TMP35]], align 4			; CHECK-NEXT: store i32 [[TMP34]], i32* [[TMP35]], align 4
	; CHECK-NEXT: br label [[PRED_STORE_CONTINUE32]]			; CHECK-NEXT: br label [[PRED_STORE_CONTINUE32]]
	; CHECK: pred.store.continue32:			; CHECK: pred.store.continue27:
	; CHECK-NEXT: [[TMP36:%.*]] = extractelement <4 x i1> [[TMP20]], i64 2			; CHECK-NEXT: [[TMP36:%.*]] = extractelement <4 x i1> [[TMP20]], i64 2
	; CHECK-NEXT: br i1 [[TMP36]], label [[PRED_STORE_IF33:%.*]], label [[PRED_STORE_CONTINUE34:%.*]]			; CHECK-NEXT: br i1 [[TMP36]], label [[PRED_STORE_IF33:%.*]], label [[PRED_STORE_CONTINUE34:%.*]]
	; CHECK: pred.store.if33:			; CHECK: pred.store.if28:
	; CHECK-NEXT: [[TMP37:%.*]] = add i64 [[OFFSET_IDX]], 2			; CHECK-NEXT: [[TMP37:%.*]] = add i64 [[OFFSET_IDX]], 2
	; CHECK-NEXT: [[TMP38:%.*]] = getelementptr inbounds [2048 x i32], [2048 x i32]* @b, i64 0, i64 [[TMP37]]			; CHECK-NEXT: [[TMP38:%.*]] = getelementptr inbounds [2048 x i32], [2048 x i32]* @b, i64 0, i64 [[TMP37]]
	; CHECK-NEXT: [[TMP39:%.*]] = load i32, i32* [[TMP38]], align 4			; CHECK-NEXT: [[TMP39:%.*]] = load i32, i32* [[TMP38]], align 4
	; CHECK-NEXT: [[TMP40:%.*]] = getelementptr inbounds [2048 x i32], [2048 x i32]* @c, i64 0, i64 [[TMP37]]			; CHECK-NEXT: [[TMP40:%.*]] = getelementptr inbounds [2048 x i32], [2048 x i32]* @c, i64 0, i64 [[TMP37]]
	; CHECK-NEXT: [[TMP41:%.*]] = load i32, i32* [[TMP40]], align 4			; CHECK-NEXT: [[TMP41:%.*]] = load i32, i32* [[TMP40]], align 4
	; CHECK-NEXT: [[TMP42:%.*]] = and i32 [[TMP41]], [[TMP39]]			; CHECK-NEXT: [[TMP42:%.*]] = and i32 [[TMP41]], [[TMP39]]
	; CHECK-NEXT: [[TMP43:%.*]] = getelementptr inbounds [2048 x i32], [2048 x i32]* @a, i64 0, i64 [[TMP37]]			; CHECK-NEXT: [[TMP43:%.*]] = getelementptr inbounds [2048 x i32], [2048 x i32]* @a, i64 0, i64 [[TMP37]]
	; CHECK-NEXT: store i32 [[TMP42]], i32* [[TMP43]], align 4			; CHECK-NEXT: store i32 [[TMP42]], i32* [[TMP43]], align 4
	; CHECK-NEXT: br label [[PRED_STORE_CONTINUE34]]			; CHECK-NEXT: br label [[PRED_STORE_CONTINUE34]]
	; CHECK: pred.store.continue34:			; CHECK: pred.store.continue29:
	; CHECK-NEXT: [[TMP44:%.*]] = extractelement <4 x i1> [[TMP20]], i64 3			; CHECK-NEXT: [[TMP44:%.*]] = extractelement <4 x i1> [[TMP20]], i64 3
	; CHECK-NEXT: br i1 [[TMP44]], label [[PRED_STORE_IF35:%.*]], label [[PRED_STORE_CONTINUE36]]			; CHECK-NEXT: br i1 [[TMP44]], label [[PRED_STORE_IF35:%.*]], label [[PRED_STORE_CONTINUE36]]
	; CHECK: pred.store.if35:			; CHECK: pred.store.if30:
	; CHECK-NEXT: [[TMP45:%.*]] = add i64 [[OFFSET_IDX]], 3			; CHECK-NEXT: [[TMP45:%.*]] = add i64 [[OFFSET_IDX]], 3
	; CHECK-NEXT: [[TMP46:%.*]] = getelementptr inbounds [2048 x i32], [2048 x i32]* @b, i64 0, i64 [[TMP45]]			; CHECK-NEXT: [[TMP46:%.*]] = getelementptr inbounds [2048 x i32], [2048 x i32]* @b, i64 0, i64 [[TMP45]]
	; CHECK-NEXT: [[TMP47:%.*]] = load i32, i32* [[TMP46]], align 4			; CHECK-NEXT: [[TMP47:%.*]] = load i32, i32* [[TMP46]], align 4
	; CHECK-NEXT: [[TMP48:%.*]] = getelementptr inbounds [2048 x i32], [2048 x i32]* @c, i64 0, i64 [[TMP45]]			; CHECK-NEXT: [[TMP48:%.*]] = getelementptr inbounds [2048 x i32], [2048 x i32]* @c, i64 0, i64 [[TMP45]]
	; CHECK-NEXT: [[TMP49:%.*]] = load i32, i32* [[TMP48]], align 4			; CHECK-NEXT: [[TMP49:%.*]] = load i32, i32* [[TMP48]], align 4
	; CHECK-NEXT: [[TMP50:%.*]] = and i32 [[TMP49]], [[TMP47]]			; CHECK-NEXT: [[TMP50:%.*]] = and i32 [[TMP49]], [[TMP47]]
	; CHECK-NEXT: [[TMP51:%.*]] = getelementptr inbounds [2048 x i32], [2048 x i32]* @a, i64 0, i64 [[TMP45]]			; CHECK-NEXT: [[TMP51:%.*]] = getelementptr inbounds [2048 x i32], [2048 x i32]* @a, i64 0, i64 [[TMP45]]
	; CHECK-NEXT: store i32 [[TMP50]], i32* [[TMP51]], align 4			; CHECK-NEXT: store i32 [[TMP50]], i32* [[TMP51]], align 4
	; CHECK-NEXT: br label [[PRED_STORE_CONTINUE36]]			; CHECK-NEXT: br label [[PRED_STORE_CONTINUE36]]
	; CHECK: pred.store.continue36:			; CHECK: pred.store.continue31:
	; CHECK-NEXT: [[INDEX_NEXT37]] = add i64 [[INDEX38]], 4			; CHECK-NEXT: [[INDEX_NEXT37]] = add i64 [[INDEX38]], 4
	; CHECK-NEXT: [[TMP52:%.*]] = icmp eq i64 [[INDEX_NEXT37]], [[N_VEC13]]			; CHECK-NEXT: [[TMP52:%.*]] = icmp eq i64 [[INDEX_NEXT37]], [[N_VEC13]]
	; CHECK-NEXT: br i1 [[TMP52]], label [[MIDDLE_BLOCK7:%.*]], label [[VECTOR_BODY9]], !llvm.loop [[LOOP5:![0-9]+]]			; CHECK-NEXT: br i1 [[TMP52]], label [[MIDDLE_BLOCK7:%.*]], label [[VECTOR_BODY9]], !llvm.loop [[LOOP5:![0-9]+]]
	; CHECK: middle.block7:			; CHECK: middle.block7:
	; CHECK-NEXT: br i1 true, label [[DOT_CRIT_EDGE_LOOPEXIT:%.*]], label [[SCALAR_PH8]]			; CHECK-NEXT: br i1 true, label [[DOT_CRIT_EDGE_LOOPEXIT:%.*]], label [[SCALAR_PH8]]
	; CHECK: scalar.ph8:			; CHECK: scalar.ph8:
	; CHECK-NEXT: br label [[DOTLR_PH:%.*]]			; CHECK-NEXT: br label [[DOTLR_PH:%.*]]
	; CHECK: .lr.ph5:			; CHECK: .lr.ph5:
	[... 73 lines not shown ...]
	; CHECK-NEXT: [[NEXT_GEP:%.*]] = getelementptr i32, i32* [[P:%.*]], i64 [[INDEX]]			; CHECK-NEXT: [[NEXT_GEP:%.*]] = getelementptr i32, i32* [[P:%.*]], i64 [[INDEX]]
	; CHECK-NEXT: [[NEXT_GEP10:%.*]] = getelementptr i32, i32* [[Q:%.*]], i64 [[INDEX]]			; CHECK-NEXT: [[NEXT_GEP10:%.*]] = getelementptr i32, i32* [[Q:%.*]], i64 [[INDEX]]
	; CHECK-NEXT: [[TMP6:%.*]] = load i32, i32* [[NEXT_GEP10]], align 16			; CHECK-NEXT: [[TMP6:%.*]] = load i32, i32* [[NEXT_GEP10]], align 16
	; CHECK-NEXT: store i32 [[TMP6]], i32* [[NEXT_GEP]], align 16			; CHECK-NEXT: store i32 [[TMP6]], i32* [[NEXT_GEP]], align 16
	; CHECK-NEXT: br label [[PRED_STORE_CONTINUE]]			; CHECK-NEXT: br label [[PRED_STORE_CONTINUE]]
	; CHECK: pred.store.continue:			; CHECK: pred.store.continue:
	; CHECK-NEXT: [[TMP7:%.*]] = extractelement <4 x i1> [[TMP4]], i64 1			; CHECK-NEXT: [[TMP7:%.*]] = extractelement <4 x i1> [[TMP4]], i64 1
	; CHECK-NEXT: br i1 [[TMP7]], label [[PRED_STORE_IF16:%.*]], label [[PRED_STORE_CONTINUE17:%.*]]			; CHECK-NEXT: br i1 [[TMP7]], label [[PRED_STORE_IF16:%.*]], label [[PRED_STORE_CONTINUE17:%.*]]
	; CHECK: pred.store.if16:			; CHECK: pred.store.if14:
	; CHECK-NEXT: [[TMP8:%.*]] = or i64 [[INDEX]], 1			; CHECK-NEXT: [[TMP8:%.*]] = or i64 [[INDEX]], 1
	; CHECK-NEXT: [[NEXT_GEP7:%.*]] = getelementptr i32, i32* [[P]], i64 [[TMP8]]			; CHECK-NEXT: [[NEXT_GEP7:%.*]] = getelementptr i32, i32* [[P]], i64 [[TMP8]]
	; CHECK-NEXT: [[TMP9:%.*]] = or i64 [[INDEX]], 1			; CHECK-NEXT: [[TMP9:%.*]] = or i64 [[INDEX]], 1
	; CHECK-NEXT: [[NEXT_GEP11:%.*]] = getelementptr i32, i32* [[Q]], i64 [[TMP9]]			; CHECK-NEXT: [[NEXT_GEP11:%.*]] = getelementptr i32, i32* [[Q]], i64 [[TMP9]]
	; CHECK-NEXT: [[TMP10:%.*]] = load i32, i32* [[NEXT_GEP11]], align 16			; CHECK-NEXT: [[TMP10:%.*]] = load i32, i32* [[NEXT_GEP11]], align 16
	; CHECK-NEXT: store i32 [[TMP10]], i32* [[NEXT_GEP7]], align 16			; CHECK-NEXT: store i32 [[TMP10]], i32* [[NEXT_GEP7]], align 16
	; CHECK-NEXT: br label [[PRED_STORE_CONTINUE17]]			; CHECK-NEXT: br label [[PRED_STORE_CONTINUE17]]
	; CHECK: pred.store.continue17:			; CHECK: pred.store.continue15:
	; CHECK-NEXT: [[TMP11:%.*]] = extractelement <4 x i1> [[TMP4]], i64 2			; CHECK-NEXT: [[TMP11:%.*]] = extractelement <4 x i1> [[TMP4]], i64 2
	; CHECK-NEXT: br i1 [[TMP11]], label [[PRED_STORE_IF18:%.*]], label [[PRED_STORE_CONTINUE19:%.*]]			; CHECK-NEXT: br i1 [[TMP11]], label [[PRED_STORE_IF18:%.*]], label [[PRED_STORE_CONTINUE19:%.*]]
	; CHECK: pred.store.if18:			; CHECK: pred.store.if16:
	; CHECK-NEXT: [[TMP12:%.*]] = or i64 [[INDEX]], 2			; CHECK-NEXT: [[TMP12:%.*]] = or i64 [[INDEX]], 2
	; CHECK-NEXT: [[NEXT_GEP8:%.*]] = getelementptr i32, i32* [[P]], i64 [[TMP12]]			; CHECK-NEXT: [[NEXT_GEP8:%.*]] = getelementptr i32, i32* [[P]], i64 [[TMP12]]
	; CHECK-NEXT: [[TMP13:%.*]] = or i64 [[INDEX]], 2			; CHECK-NEXT: [[TMP13:%.*]] = or i64 [[INDEX]], 2
	; CHECK-NEXT: [[NEXT_GEP12:%.*]] = getelementptr i32, i32* [[Q]], i64 [[TMP13]]			; CHECK-NEXT: [[NEXT_GEP12:%.*]] = getelementptr i32, i32* [[Q]], i64 [[TMP13]]
	; CHECK-NEXT: [[TMP14:%.*]] = load i32, i32* [[NEXT_GEP12]], align 16			; CHECK-NEXT: [[TMP14:%.*]] = load i32, i32* [[NEXT_GEP12]], align 16
	; CHECK-NEXT: store i32 [[TMP14]], i32* [[NEXT_GEP8]], align 16			; CHECK-NEXT: store i32 [[TMP14]], i32* [[NEXT_GEP8]], align 16
	; CHECK-NEXT: br label [[PRED_STORE_CONTINUE19]]			; CHECK-NEXT: br label [[PRED_STORE_CONTINUE19]]
	; CHECK: pred.store.continue19:			; CHECK: pred.store.continue17:
	; CHECK-NEXT: [[TMP15:%.*]] = extractelement <4 x i1> [[TMP4]], i64 3			; CHECK-NEXT: [[TMP15:%.*]] = extractelement <4 x i1> [[TMP4]], i64 3
	; CHECK-NEXT: br i1 [[TMP15]], label [[PRED_STORE_IF20:%.*]], label [[PRED_STORE_CONTINUE21]]			; CHECK-NEXT: br i1 [[TMP15]], label [[PRED_STORE_IF20:%.*]], label [[PRED_STORE_CONTINUE21]]
	; CHECK: pred.store.if20:			; CHECK: pred.store.if18:
	; CHECK-NEXT: [[TMP16:%.*]] = or i64 [[INDEX]], 3			; CHECK-NEXT: [[TMP16:%.*]] = or i64 [[INDEX]], 3
	; CHECK-NEXT: [[NEXT_GEP9:%.*]] = getelementptr i32, i32* [[P]], i64 [[TMP16]]			; CHECK-NEXT: [[NEXT_GEP9:%.*]] = getelementptr i32, i32* [[P]], i64 [[TMP16]]
	; CHECK-NEXT: [[TMP17:%.*]] = or i64 [[INDEX]], 3			; CHECK-NEXT: [[TMP17:%.*]] = or i64 [[INDEX]], 3
	; CHECK-NEXT: [[NEXT_GEP13:%.*]] = getelementptr i32, i32* [[Q]], i64 [[TMP17]]			; CHECK-NEXT: [[NEXT_GEP13:%.*]] = getelementptr i32, i32* [[Q]], i64 [[TMP17]]
	; CHECK-NEXT: [[TMP18:%.*]] = load i32, i32* [[NEXT_GEP13]], align 16			; CHECK-NEXT: [[TMP18:%.*]] = load i32, i32* [[NEXT_GEP13]], align 16
	; CHECK-NEXT: store i32 [[TMP18]], i32* [[NEXT_GEP9]], align 16			; CHECK-NEXT: store i32 [[TMP18]], i32* [[NEXT_GEP9]], align 16
	; CHECK-NEXT: br label [[PRED_STORE_CONTINUE21]]			; CHECK-NEXT: br label [[PRED_STORE_CONTINUE21]]
	; CHECK: pred.store.continue21:			; CHECK: pred.store.continue19:
	; CHECK-NEXT: [[INDEX_NEXT]] = add i64 [[INDEX]], 4			; CHECK-NEXT: [[INDEX_NEXT]] = add i64 [[INDEX]], 4
	; CHECK-NEXT: [[TMP19:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]			; CHECK-NEXT: [[TMP19:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]
	; CHECK-NEXT: br i1 [[TMP19]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP8:![0-9]+]]			; CHECK-NEXT: br i1 [[TMP19]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP8:![0-9]+]]
	; CHECK: middle.block:			; CHECK: middle.block:
	; CHECK-NEXT: br i1 true, label [[DOT_CRIT_EDGE_LOOPEXIT:%.*]], label [[SCALAR_PH]]			; CHECK-NEXT: br i1 true, label [[DOT_CRIT_EDGE_LOOPEXIT:%.*]], label [[SCALAR_PH]]
	; CHECK: scalar.ph:			; CHECK: scalar.ph:
	; CHECK-NEXT: br label [[DOTLR_PH:%.*]]			; CHECK-NEXT: br label [[DOTLR_PH:%.*]]
	; CHECK: .lr.ph:			; CHECK: .lr.ph:
	[... 248 lines not shown ...]
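
For readers following the `pred.store.if` / `pred.store.continue` chains in small-size.ll, the following is a minimal C++ sketch of what one `vector.body` iteration of the first loop appears to compute at VF = 4: each lane whose mask bit is set performs its own scalar loads, `and`, and store. The names `predicatedAndStore` and `orMatchesAdd` and the array parameters are hypothetical and used only for illustration; they are not LLVM APIs. The second helper illustrates why `or` can stand in for `add` when forming `VEC_IV`, under the assumption that the index is a multiple of 4.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical model of one vector.body iteration with VF = 4.
// Each active lane does a scalar load/and/store; inactive lanes are
// skipped, like the branch straight to pred.store.continue.
void predicatedAndStore(std::vector<std::int32_t>& a,
                        const std::vector<std::int32_t>& b,
                        const std::vector<std::int32_t>& c,
                        std::uint64_t offsetIdx,
                        const std::array<bool, 4>& mask) {
  for (std::size_t lane = 0; lane < mask.size(); ++lane) {
    if (!mask[lane])
      continue;                       // lane masked off: no memory access
    const std::uint64_t idx = offsetIdx + lane;
    assert(idx < a.size() && idx < b.size() && idx < c.size());
    a[idx] = c[idx] & b[idx];         // mirrors: and i32 [[TMP25]], [[TMP23]]
  }
}

// For an index that is a multiple of 4 and a lane in [0, 4), the two
// operands share no set bits, so bitwise or gives the same value as add.
bool orMatchesAdd(std::uint64_t index, std::uint64_t lane) {
  return (index | lane) == (index + lane);
}
```

The model deliberately ignores how the mask itself is produced; it only shows that a masked-off lane performs no memory access.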

llvm/test/Transforms/LoopVectorize/X86/tail_loop_folding.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt < %s -loop-vectorize -S | FileCheck %s			; RUN: opt < %s -loop-vectorize -S | FileCheck %s
	; RUN: opt < %s -loop-vectorize -prefer-predicate-over-epilogue=predicate-dont-vectorize -S | FileCheck %s			; RUN: opt < %s -loop-vectorize -prefer-predicate-over-epilogue=predicate-dont-vectorize -S | FileCheck %s

	target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"			target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
	target triple = "x86_64-unknown-linux-gnu"			target triple = "x86_64-unknown-linux-gnu"

	define dso_local void @tail_folding_enabled(i32* noalias nocapture %A, i32* noalias nocapture readonly %B, i32* noalias nocapture readonly %C) local_unnamed_addr #0 {			define dso_local void @tail_folding_enabled(i32* noalias nocapture %A, i32* noalias nocapture readonly %B, i32* noalias nocapture readonly %C) local_unnamed_addr #0 {
	; CHECK-LABEL: @tail_folding_enabled(			; CHECK-LABEL: @tail_folding_enabled(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: br i1 false, label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]			; CHECK-NEXT: br i1 false, label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
	; CHECK: vector.ph:			; CHECK: vector.ph:
	; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]			; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]
	; CHECK: vector.body:			; CHECK: vector.body:
	; CHECK-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]			; CHECK-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
				; CHECK-NEXT: [[TMP0:%.*]] = add i64 [[INDEX]], 0
	; CHECK-NEXT: [[BROADCAST_SPLATINSERT:%.*]] = insertelement <8 x i64> poison, i64 [[INDEX]], i32 0			; CHECK-NEXT: [[BROADCAST_SPLATINSERT:%.*]] = insertelement <8 x i64> poison, i64 [[INDEX]], i32 0
	; CHECK-NEXT: [[BROADCAST_SPLAT:%.*]] = shufflevector <8 x i64> [[BROADCAST_SPLATINSERT]], <8 x i64> poison, <8 x i32> zeroinitializer			; CHECK-NEXT: [[BROADCAST_SPLAT:%.*]] = shufflevector <8 x i64> [[BROADCAST_SPLATINSERT]], <8 x i64> poison, <8 x i32> zeroinitializer
	; CHECK-NEXT: [[INDUCTION:%.*]] = add <8 x i64> [[BROADCAST_SPLAT]], <i64 0, i64 1, i64 2, i64 3, i64 4, i64 5, i64 6, i64 7>			; CHECK-NEXT: [[INDUCTION:%.*]] = add <8 x i64> [[BROADCAST_SPLAT]], <i64 0, i64 1, i64 2, i64 3, i64 4, i64 5, i64 6, i64 7>
	; CHECK-NEXT: [[TMP0:%.*]] = add i64 [[INDEX]], 0
	; CHECK-NEXT: [[TMP1:%.*]] = icmp ule <8 x i64> [[INDUCTION]], <i64 429, i64 429, i64 429, i64 429, i64 429, i64 429, i64 429, i64 429>			; CHECK-NEXT: [[TMP1:%.*]] = icmp ule <8 x i64> [[INDUCTION]], <i64 429, i64 429, i64 429, i64 429, i64 429, i64 429, i64 429, i64 429>
	; CHECK-NEXT: [[TMP2:%.*]] = getelementptr inbounds i32, i32* [[B:%.*]], i64 [[TMP0]]			; CHECK-NEXT: [[TMP2:%.*]] = getelementptr inbounds i32, i32* [[B:%.*]], i64 [[TMP0]]
	; CHECK-NEXT: [[TMP3:%.*]] = getelementptr inbounds i32, i32* [[TMP2]], i32 0			; CHECK-NEXT: [[TMP3:%.*]] = getelementptr inbounds i32, i32* [[TMP2]], i32 0
	; CHECK-NEXT: [[TMP4:%.*]] = bitcast i32* [[TMP3]] to <8 x i32>*			; CHECK-NEXT: [[TMP4:%.*]] = bitcast i32* [[TMP3]] to <8 x i32>*
	; CHECK-NEXT: [[WIDE_MASKED_LOAD:%.*]] = call <8 x i32> @llvm.masked.load.v8i32.p0v8i32(<8 x i32>* [[TMP4]], i32 4, <8 x i1> [[TMP1]], <8 x i32> poison)			; CHECK-NEXT: [[WIDE_MASKED_LOAD:%.*]] = call <8 x i32> @llvm.masked.load.v8i32.p0v8i32(<8 x i32>* [[TMP4]], i32 4, <8 x i1> [[TMP1]], <8 x i32> poison)
	; CHECK-NEXT: [[TMP5:%.*]] = getelementptr inbounds i32, i32* [[C:%.*]], i64 [[TMP0]]			; CHECK-NEXT: [[TMP5:%.*]] = getelementptr inbounds i32, i32* [[C:%.*]], i64 [[TMP0]]
	; CHECK-NEXT: [[TMP6:%.*]] = getelementptr inbounds i32, i32* [[TMP5]], i32 0			; CHECK-NEXT: [[TMP6:%.*]] = getelementptr inbounds i32, i32* [[TMP5]], i32 0
	; CHECK-NEXT: [[TMP7:%.*]] = bitcast i32* [[TMP6]] to <8 x i32>*			; CHECK-NEXT: [[TMP7:%.*]] = bitcast i32* [[TMP6]] to <8 x i32>*
	[... 51 lines not shown ...]
	define dso_local void @tail_folding_disabled(i32* noalias nocapture %A, i32* noalias nocapture readonly %B, i32* noalias nocapture readonly %C) local_unnamed_addr #0 {			define dso_local void @tail_folding_disabled(i32* noalias nocapture %A, i32* noalias nocapture readonly %B, i32* noalias nocapture readonly %C) local_unnamed_addr #0 {
	; CHECK-LABEL: @tail_folding_disabled(			; CHECK-LABEL: @tail_folding_disabled(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: br i1 false, label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]			; CHECK-NEXT: br i1 false, label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
	; CHECK: vector.ph:			; CHECK: vector.ph:
	; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]			; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]
	; CHECK: vector.body:			; CHECK: vector.body:
	; CHECK-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]			; CHECK-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
				; CHECK-NEXT: [[TMP0:%.*]] = add i64 [[INDEX]], 0
	; CHECK-NEXT: [[BROADCAST_SPLATINSERT:%.*]] = insertelement <8 x i64> poison, i64 [[INDEX]], i32 0			; CHECK-NEXT: [[BROADCAST_SPLATINSERT:%.*]] = insertelement <8 x i64> poison, i64 [[INDEX]], i32 0
	; CHECK-NEXT: [[BROADCAST_SPLAT:%.*]] = shufflevector <8 x i64> [[BROADCAST_SPLATINSERT]], <8 x i64> poison, <8 x i32> zeroinitializer			; CHECK-NEXT: [[BROADCAST_SPLAT:%.*]] = shufflevector <8 x i64> [[BROADCAST_SPLATINSERT]], <8 x i64> poison, <8 x i32> zeroinitializer
	; CHECK-NEXT: [[INDUCTION:%.*]] = add <8 x i64> [[BROADCAST_SPLAT]], <i64 0, i64 1, i64 2, i64 3, i64 4, i64 5, i64 6, i64 7>			; CHECK-NEXT: [[INDUCTION:%.*]] = add <8 x i64> [[BROADCAST_SPLAT]], <i64 0, i64 1, i64 2, i64 3, i64 4, i64 5, i64 6, i64 7>
	; CHECK-NEXT: [[TMP0:%.*]] = add i64 [[INDEX]], 0
	; CHECK-NEXT: [[TMP1:%.*]] = icmp ule <8 x i64> [[INDUCTION]], <i64 429, i64 429, i64 429, i64 429, i64 429, i64 429, i64 429, i64 429>			; CHECK-NEXT: [[TMP1:%.*]] = icmp ule <8 x i64> [[INDUCTION]], <i64 429, i64 429, i64 429, i64 429, i64 429, i64 429, i64 429, i64 429>
	; CHECK-NEXT: [[TMP2:%.*]] = getelementptr inbounds i32, i32* [[B:%.*]], i64 [[TMP0]]			; CHECK-NEXT: [[TMP2:%.*]] = getelementptr inbounds i32, i32* [[B:%.*]], i64 [[TMP0]]
	; CHECK-NEXT: [[TMP3:%.*]] = getelementptr inbounds i32, i32* [[TMP2]], i32 0			; CHECK-NEXT: [[TMP3:%.*]] = getelementptr inbounds i32, i32* [[TMP2]], i32 0
	; CHECK-NEXT: [[TMP4:%.*]] = bitcast i32* [[TMP3]] to <8 x i32>*			; CHECK-NEXT: [[TMP4:%.*]] = bitcast i32* [[TMP3]] to <8 x i32>*
	; CHECK-NEXT: [[WIDE_MASKED_LOAD:%.*]] = call <8 x i32> @llvm.masked.load.v8i32.p0v8i32(<8 x i32>* [[TMP4]], i32 4, <8 x i1> [[TMP1]], <8 x i32> poison)			; CHECK-NEXT: [[WIDE_MASKED_LOAD:%.*]] = call <8 x i32> @llvm.masked.load.v8i32.p0v8i32(<8 x i32>* [[TMP4]], i32 4, <8 x i1> [[TMP1]], <8 x i32> poison)
	; CHECK-NEXT: [[TMP5:%.*]] = getelementptr inbounds i32, i32* [[C:%.*]], i64 [[TMP0]]			; CHECK-NEXT: [[TMP5:%.*]] = getelementptr inbounds i32, i32* [[C:%.*]], i64 [[TMP0]]
	; CHECK-NEXT: [[TMP6:%.*]] = getelementptr inbounds i32, i32* [[TMP5]], i32 0			; CHECK-NEXT: [[TMP6:%.*]] = getelementptr inbounds i32, i32* [[TMP5]], i32 0
	; CHECK-NEXT: [[TMP7:%.*]] = bitcast i32* [[TMP6]] to <8 x i32>*			; CHECK-NEXT: [[TMP7:%.*]] = bitcast i32* [[TMP6]] to <8 x i32>*
	[... 69 lines not shown ...]
	; CHECK-NEXT: [[N_VEC:%.*]] = sub i64 [[N_RND_UP]], [[N_MOD_VF]]			; CHECK-NEXT: [[N_VEC:%.*]] = sub i64 [[N_RND_UP]], [[N_MOD_VF]]
	; CHECK-NEXT: [[TRIP_COUNT_MINUS_1:%.*]] = sub i64 [[TMP2]], 1			; CHECK-NEXT: [[TRIP_COUNT_MINUS_1:%.*]] = sub i64 [[TMP2]], 1
	; CHECK-NEXT: [[BROADCAST_SPLATINSERT:%.*]] = insertelement <8 x i64> poison, i64 [[TRIP_COUNT_MINUS_1]], i32 0			; CHECK-NEXT: [[BROADCAST_SPLATINSERT:%.*]] = insertelement <8 x i64> poison, i64 [[TRIP_COUNT_MINUS_1]], i32 0
	; CHECK-NEXT: [[BROADCAST_SPLAT:%.*]] = shufflevector <8 x i64> [[BROADCAST_SPLATINSERT]], <8 x i64> poison, <8 x i32> zeroinitializer			; CHECK-NEXT: [[BROADCAST_SPLAT:%.*]] = shufflevector <8 x i64> [[BROADCAST_SPLATINSERT]], <8 x i64> poison, <8 x i32> zeroinitializer
	; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]			; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]
	; CHECK: vector.body:			; CHECK: vector.body:
	; CHECK-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]			; CHECK-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
	; CHECK-NEXT: [[VEC_PHI:%.*]] = phi <8 x i32> [ zeroinitializer, [[VECTOR_PH]] ], [ [[TMP12:%.*]], [[VECTOR_BODY]] ]			; CHECK-NEXT: [[VEC_PHI:%.*]] = phi <8 x i32> [ zeroinitializer, [[VECTOR_PH]] ], [ [[TMP12:%.*]], [[VECTOR_BODY]] ]
				; CHECK-NEXT: [[TMP3:%.*]] = add i64 [[INDEX]], 0
	; CHECK-NEXT: [[BROADCAST_SPLATINSERT1:%.*]] = insertelement <8 x i64> poison, i64 [[INDEX]], i32 0			; CHECK-NEXT: [[BROADCAST_SPLATINSERT1:%.*]] = insertelement <8 x i64> poison, i64 [[INDEX]], i32 0
	; CHECK-NEXT: [[BROADCAST_SPLAT2:%.*]] = shufflevector <8 x i64> [[BROADCAST_SPLATINSERT1]], <8 x i64> poison, <8 x i32> zeroinitializer			; CHECK-NEXT: [[BROADCAST_SPLAT2:%.*]] = shufflevector <8 x i64> [[BROADCAST_SPLATINSERT1]], <8 x i64> poison, <8 x i32> zeroinitializer
	; CHECK-NEXT: [[INDUCTION:%.*]] = add <8 x i64> [[BROADCAST_SPLAT2]], <i64 0, i64 1, i64 2, i64 3, i64 4, i64 5, i64 6, i64 7>			; CHECK-NEXT: [[INDUCTION:%.*]] = add <8 x i64> [[BROADCAST_SPLAT2]], <i64 0, i64 1, i64 2, i64 3, i64 4, i64 5, i64 6, i64 7>
	; CHECK-NEXT: [[TMP3:%.*]] = add i64 [[INDEX]], 0
	; CHECK-NEXT: [[TMP4:%.*]] = icmp ule <8 x i64> [[INDUCTION]], [[BROADCAST_SPLAT]]			; CHECK-NEXT: [[TMP4:%.*]] = icmp ule <8 x i64> [[INDUCTION]], [[BROADCAST_SPLAT]]
	; CHECK-NEXT: [[TMP5:%.*]] = getelementptr inbounds i32, i32* [[A:%.*]], i64 [[TMP3]]			; CHECK-NEXT: [[TMP5:%.*]] = getelementptr inbounds i32, i32* [[A:%.*]], i64 [[TMP3]]
	; CHECK-NEXT: [[TMP6:%.*]] = getelementptr inbounds i32, i32* [[TMP5]], i32 0			; CHECK-NEXT: [[TMP6:%.*]] = getelementptr inbounds i32, i32* [[TMP5]], i32 0
	; CHECK-NEXT: [[TMP7:%.*]] = bitcast i32* [[TMP6]] to <8 x i32>*			; CHECK-NEXT: [[TMP7:%.*]] = bitcast i32* [[TMP6]] to <8 x i32>*
	; CHECK-NEXT: [[WIDE_MASKED_LOAD:%.*]] = call <8 x i32> @llvm.masked.load.v8i32.p0v8i32(<8 x i32>* [[TMP7]], i32 4, <8 x i1> [[TMP4]], <8 x i32> poison)			; CHECK-NEXT: [[WIDE_MASKED_LOAD:%.*]] = call <8 x i32> @llvm.masked.load.v8i32.p0v8i32(<8 x i32>* [[TMP7]], i32 4, <8 x i1> [[TMP4]], <8 x i32> poison)
	; CHECK-NEXT: [[TMP8:%.*]] = getelementptr inbounds i32, i32* [[B:%.*]], i64 [[TMP3]]			; CHECK-NEXT: [[TMP8:%.*]] = getelementptr inbounds i32, i32* [[B:%.*]], i64 [[TMP3]]
	; CHECK-NEXT: [[TMP9:%.*]] = getelementptr inbounds i32, i32* [[TMP8]], i32 0			; CHECK-NEXT: [[TMP9:%.*]] = getelementptr inbounds i32, i32* [[TMP8]], i32 0
	; CHECK-NEXT: [[TMP10:%.*]] = bitcast i32* [[TMP9]] to <8 x i32>*			; CHECK-NEXT: [[TMP10:%.*]] = bitcast i32* [[TMP9]] to <8 x i32>*
	[... 68 lines not shown ...]
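
As a reading aid for the `INDUCTION` / `icmp ule` check lines in tail_loop_folding.ll, here is a minimal C++ sketch of the per-iteration mask they describe: splat the scalar index, add the step vector `<0, 1, ..., VF-1>`, and compare each lane against the trip count minus one. `tailFoldMask` and `activeLanes` are hypothetical helper names, not LLVM APIs, and the sketch assumes that the constant 429 in the checks is the trip count minus one (so a trip count of 430).

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: models the mask computed in vector.body when the
// loop tail is folded by masking. Lane i is active iff index + i is still
// within the original iteration space.
template <std::size_t VF>
std::array<bool, VF> tailFoldMask(std::uint64_t index,
                                  std::uint64_t tripCount) {
  assert(tripCount > 0 && "trip count minus one must not wrap");
  const std::uint64_t tripCountMinus1 = tripCount - 1;
  std::array<bool, VF> mask{};
  for (std::size_t lane = 0; lane < VF; ++lane) {
    const std::uint64_t iv = index + lane;   // splat(index) + step vector
    mask[lane] = iv <= tripCountMinus1;      // icmp ule
  }
  return mask;
}

// Counts the lanes that would actually touch memory in a masked load/store.
template <std::size_t VF>
std::size_t activeLanes(const std::array<bool, VF>& mask) {
  std::size_t count = 0;
  for (bool bit : mask)
    count += bit ? 1 : 0;
  return count;
}
```

Under that assumed trip count and VF = 8, every lane is active until the final vector iteration, which starts at index 424 and leaves only the lanes for indices 424 through 429 enabled.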

llvm/test/Transforms/LoopVectorize/X86/vect.omp.force.small-tc.ll

	[... 88 lines not shown ...]
	define void @vectorized1(float* noalias nocapture %A, float* noalias nocapture readonly %B) {			define void @vectorized1(float* noalias nocapture %A, float* noalias nocapture readonly %B) {
	; CHECK-LABEL: @vectorized1(			; CHECK-LABEL: @vectorized1(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: br i1 false, label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]			; CHECK-NEXT: br i1 false, label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
	; CHECK: vector.ph:			; CHECK: vector.ph:
	; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]			; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]
	; CHECK: vector.body:			; CHECK: vector.body:
	; CHECK-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]			; CHECK-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
				; CHECK-NEXT: [[TMP0:%.*]] = add i64 [[INDEX]], 0
	; CHECK-NEXT: [[BROADCAST_SPLATINSERT:%.*]] = insertelement <8 x i64> poison, i64 [[INDEX]], i32 0			; CHECK-NEXT: [[BROADCAST_SPLATINSERT:%.*]] = insertelement <8 x i64> poison, i64 [[INDEX]], i32 0
	; CHECK-NEXT: [[BROADCAST_SPLAT:%.*]] = shufflevector <8 x i64> [[BROADCAST_SPLATINSERT]], <8 x i64> poison, <8 x i32> zeroinitializer			; CHECK-NEXT: [[BROADCAST_SPLAT:%.*]] = shufflevector <8 x i64> [[BROADCAST_SPLATINSERT]], <8 x i64> poison, <8 x i32> zeroinitializer
	; CHECK-NEXT: [[INDUCTION:%.*]] = add <8 x i64> [[BROADCAST_SPLAT]], <i64 0, i64 1, i64 2, i64 3, i64 4, i64 5, i64 6, i64 7>			; CHECK-NEXT: [[INDUCTION:%.*]] = add <8 x i64> [[BROADCAST_SPLAT]], <i64 0, i64 1, i64 2, i64 3, i64 4, i64 5, i64 6, i64 7>
	; CHECK-NEXT: [[TMP0:%.*]] = add i64 [[INDEX]], 0
	; CHECK-NEXT: [[TMP1:%.*]] = icmp ule <8 x i64> [[INDUCTION]], <i64 19, i64 19, i64 19, i64 19, i64 19, i64 19, i64 19, i64 19>			; CHECK-NEXT: [[TMP1:%.*]] = icmp ule <8 x i64> [[INDUCTION]], <i64 19, i64 19, i64 19, i64 19, i64 19, i64 19, i64 19, i64 19>
	; CHECK-NEXT: [[TMP2:%.*]] = getelementptr inbounds float, float* [[B:%.*]], i64 [[TMP0]]			; CHECK-NEXT: [[TMP2:%.*]] = getelementptr inbounds float, float* [[B:%.*]], i64 [[TMP0]]
	; CHECK-NEXT: [[TMP3:%.*]] = getelementptr inbounds float, float* [[TMP2]], i32 0			; CHECK-NEXT: [[TMP3:%.*]] = getelementptr inbounds float, float* [[TMP2]], i32 0
	; CHECK-NEXT: [[TMP4:%.*]] = bitcast float* [[TMP3]] to <8 x float>*			; CHECK-NEXT: [[TMP4:%.*]] = bitcast float* [[TMP3]] to <8 x float>*
	; CHECK-NEXT: [[WIDE_MASKED_LOAD:%.*]] = call <8 x float> @llvm.masked.load.v8f32.p0v8f32(<8 x float>* [[TMP4]], i32 4, <8 x i1> [[TMP1]], <8 x float> poison), !llvm.access.group !6			; CHECK-NEXT: [[WIDE_MASKED_LOAD:%.*]] = call <8 x float> @llvm.masked.load.v8f32.p0v8f32(<8 x float>* [[TMP4]], i32 4, <8 x i1> [[TMP1]], <8 x float> poison), !llvm.access.group !6
	; CHECK-NEXT: [[TMP5:%.*]] = getelementptr inbounds float, float* [[A:%.*]], i64 [[TMP0]]			; CHECK-NEXT: [[TMP5:%.*]] = getelementptr inbounds float, float* [[A:%.*]], i64 [[TMP0]]
	; CHECK-NEXT: [[TMP6:%.*]] = getelementptr inbounds float, float* [[TMP5]], i32 0			; CHECK-NEXT: [[TMP6:%.*]] = getelementptr inbounds float, float* [[TMP5]], i32 0
	; CHECK-NEXT: [[TMP7:%.*]] = bitcast float* [[TMP6]] to <8 x float>*			; CHECK-NEXT: [[TMP7:%.*]] = bitcast float* [[TMP6]] to <8 x float>*
	[... 52 lines not shown ...]
	define void @vectorized2(float* noalias nocapture %A, float* noalias nocapture readonly %B) {			define void @vectorized2(float* noalias nocapture %A, float* noalias nocapture readonly %B) {
	; CHECK-LABEL: @vectorized2(			; CHECK-LABEL: @vectorized2(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: br i1 false, label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]			; CHECK-NEXT: br i1 false, label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
	; CHECK: vector.ph:			; CHECK: vector.ph:
	; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]			; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]
	; CHECK: vector.body:			; CHECK: vector.body:
	; CHECK-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]			; CHECK-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
	; CHECK-NEXT: [[BROADCAST_SPLATINSERT:%.*]] = insertelement <8 x i64> poison, i64 [[INDEX]], i32 0
	; CHECK-NEXT: [[BROADCAST_SPLAT:%.*]] = shufflevector <8 x i64> [[BROADCAST_SPLATINSERT]], <8 x i64> poison, <8 x i32> zeroinitializer
	; CHECK-NEXT: [[INDUCTION:%.*]] = add <8 x i64> [[BROADCAST_SPLAT]], <i64 0, i64 1, i64 2, i64 3, i64 4, i64 5, i64 6, i64 7>
	; CHECK-NEXT: [[TMP0:%.*]] = add i64 [[INDEX]], 0			; CHECK-NEXT: [[TMP0:%.*]] = add i64 [[INDEX]], 0
	; CHECK-NEXT: [[TMP1:%.*]] = getelementptr inbounds float, float* [[B:%.*]], i64 [[TMP0]]			; CHECK-NEXT: [[TMP1:%.*]] = getelementptr inbounds float, float* [[B:%.*]], i64 [[TMP0]]
	; CHECK-NEXT: [[TMP2:%.*]] = getelementptr inbounds float, float* [[TMP1]], i32 0			; CHECK-NEXT: [[TMP2:%.*]] = getelementptr inbounds float, float* [[TMP1]], i32 0
	; CHECK-NEXT: [[TMP3:%.*]] = bitcast float* [[TMP2]] to <8 x float>*			; CHECK-NEXT: [[TMP3:%.*]] = bitcast float* [[TMP2]] to <8 x float>*
	; CHECK-NEXT: [[WIDE_LOAD:%.*]] = load <8 x float>, <8 x float>* [[TMP3]], align 4, !llvm.access.group !6			; CHECK-NEXT: [[WIDE_LOAD:%.*]] = load <8 x float>, <8 x float>* [[TMP3]], align 4, !llvm.access.group !6
	; CHECK-NEXT: [[TMP4:%.*]] = getelementptr inbounds float, float* [[A:%.*]], i64 [[TMP0]]			; CHECK-NEXT: [[TMP4:%.*]] = getelementptr inbounds float, float* [[A:%.*]], i64 [[TMP0]]
	; CHECK-NEXT: [[TMP5:%.*]] = getelementptr inbounds float, float* [[TMP4]], i32 0			; CHECK-NEXT: [[TMP5:%.*]] = getelementptr inbounds float, float* [[TMP4]], i32 0
	; CHECK-NEXT: [[TMP6:%.*]] = bitcast float* [[TMP5]] to <8 x float>*			; CHECK-NEXT: [[TMP6:%.*]] = bitcast float* [[TMP5]] to <8 x float>*
	[48 lines hidden]

llvm/test/Transforms/LoopVectorize/dont-fold-tail-for-const-TC.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt < %s -loop-vectorize -force-vector-interleave=3 -force-vector-width=2 -S | FileCheck %s			; RUN: opt < %s -loop-vectorize -force-vector-interleave=3 -force-vector-width=2 -S | FileCheck %s

	target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"			target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"

	; Make sure the loop is vectorized and unrolled under -Os without folding its			; Make sure the loop is vectorized and unrolled under -Os without folding its
	; tail based on its trip-count being provably divisible by chosen VFxIC.			; tail based on its trip-count being provably divisible by chosen VFxIC.

	define dso_local void @constTC(i32* noalias nocapture %A) optsize {			define dso_local void @constTC(i32* noalias nocapture %A) optsize {
	; CHECK-LABEL: @constTC(			; CHECK-LABEL: @constTC(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: br i1 false, label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]			; CHECK-NEXT: br i1 false, label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
	; CHECK: vector.ph:			; CHECK: vector.ph:
	; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]			; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]
	; CHECK: vector.body:			; CHECK: vector.body:
	; CHECK-NEXT: [[INDEX:%.*]] = phi i32 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]			; CHECK-NEXT: [[INDEX:%.*]] = phi i32 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
	; CHECK-NEXT: [[BROADCAST_SPLATINSERT:%.*]] = insertelement <2 x i32> poison, i32 [[INDEX]], i32 0
	; CHECK-NEXT: [[BROADCAST_SPLAT:%.*]] = shufflevector <2 x i32> [[BROADCAST_SPLATINSERT]], <2 x i32> poison, <2 x i32> zeroinitializer
	; CHECK-NEXT: [[INDUCTION:%.*]] = add <2 x i32> [[BROADCAST_SPLAT]], <i32 0, i32 1>
	; CHECK-NEXT: [[INDUCTION1:%.*]] = add <2 x i32> [[BROADCAST_SPLAT]], <i32 2, i32 3>
	; CHECK-NEXT: [[INDUCTION2:%.*]] = add <2 x i32> [[BROADCAST_SPLAT]], <i32 4, i32 5>
	; CHECK-NEXT: [[TMP0:%.*]] = add i32 [[INDEX]], 0			; CHECK-NEXT: [[TMP0:%.*]] = add i32 [[INDEX]], 0
	; CHECK-NEXT: [[TMP1:%.*]] = add i32 [[INDEX]], 2			; CHECK-NEXT: [[TMP1:%.*]] = add i32 [[INDEX]], 2
	; CHECK-NEXT: [[TMP2:%.*]] = add i32 [[INDEX]], 4			; CHECK-NEXT: [[TMP2:%.*]] = add i32 [[INDEX]], 4
	; CHECK-NEXT: [[TMP3:%.*]] = getelementptr inbounds i32, i32* [[A:%.*]], i32 [[TMP0]]			; CHECK-NEXT: [[TMP3:%.*]] = getelementptr inbounds i32, i32* [[A:%.*]], i32 [[TMP0]]
	; CHECK-NEXT: [[TMP4:%.*]] = getelementptr inbounds i32, i32* [[A]], i32 [[TMP1]]			; CHECK-NEXT: [[TMP4:%.*]] = getelementptr inbounds i32, i32* [[A]], i32 [[TMP1]]
	; CHECK-NEXT: [[TMP5:%.*]] = getelementptr inbounds i32, i32* [[A]], i32 [[TMP2]]			; CHECK-NEXT: [[TMP5:%.*]] = getelementptr inbounds i32, i32* [[A]], i32 [[TMP2]]
	; CHECK-NEXT: [[TMP6:%.*]] = getelementptr inbounds i32, i32* [[TMP3]], i32 0			; CHECK-NEXT: [[TMP6:%.*]] = getelementptr inbounds i32, i32* [[TMP3]], i32 0
	; CHECK-NEXT: [[TMP7:%.*]] = bitcast i32* [[TMP6]] to <2 x i32>*			; CHECK-NEXT: [[TMP7:%.*]] = bitcast i32* [[TMP6]] to <2 x i32>*
	[40 lines hidden]

llvm/test/Transforms/LoopVectorize/dont-fold-tail-for-divisible-TC.ll

	[12 lines hidden]
	; CHECK-NEXT: [[MIN_ITERS_CHECK:%.*]] = icmp ult i32 [[ALIGNEDTC]], 4			; CHECK-NEXT: [[MIN_ITERS_CHECK:%.*]] = icmp ult i32 [[ALIGNEDTC]], 4
	; CHECK-NEXT: br i1 [[MIN_ITERS_CHECK]], label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]			; CHECK-NEXT: br i1 [[MIN_ITERS_CHECK]], label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
	; CHECK: vector.ph:			; CHECK: vector.ph:
	; CHECK-NEXT: [[N_MOD_VF:%.*]] = urem i32 [[ALIGNEDTC]], 4			; CHECK-NEXT: [[N_MOD_VF:%.*]] = urem i32 [[ALIGNEDTC]], 4
	; CHECK-NEXT: [[N_VEC:%.*]] = sub i32 [[ALIGNEDTC]], [[N_MOD_VF]]			; CHECK-NEXT: [[N_VEC:%.*]] = sub i32 [[ALIGNEDTC]], [[N_MOD_VF]]
	; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]			; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]
	; CHECK: vector.body:			; CHECK: vector.body:
	; CHECK-NEXT: [[INDEX:%.*]] = phi i32 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]			; CHECK-NEXT: [[INDEX:%.*]] = phi i32 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
	; CHECK-NEXT: [[BROADCAST_SPLATINSERT:%.*]] = insertelement <4 x i32> poison, i32 [[INDEX]], i32 0
	; CHECK-NEXT: [[BROADCAST_SPLAT:%.*]] = shufflevector <4 x i32> [[BROADCAST_SPLATINSERT]], <4 x i32> poison, <4 x i32> zeroinitializer
	; CHECK-NEXT: [[INDUCTION:%.*]] = add <4 x i32> [[BROADCAST_SPLAT]], <i32 0, i32 1, i32 2, i32 3>
	; CHECK-NEXT: [[TMP0:%.*]] = add i32 [[INDEX]], 0			; CHECK-NEXT: [[TMP0:%.*]] = add i32 [[INDEX]], 0
	; CHECK-NEXT: [[TMP1:%.*]] = getelementptr inbounds i32, i32* [[A:%.*]], i32 [[TMP0]]			; CHECK-NEXT: [[TMP1:%.*]] = getelementptr inbounds i32, i32* [[A:%.*]], i32 [[TMP0]]
	; CHECK-NEXT: [[TMP2:%.*]] = getelementptr inbounds i32, i32* [[TMP1]], i32 0			; CHECK-NEXT: [[TMP2:%.*]] = getelementptr inbounds i32, i32* [[TMP1]], i32 0
	; CHECK-NEXT: [[TMP3:%.*]] = bitcast i32* [[TMP2]] to <4 x i32>*			; CHECK-NEXT: [[TMP3:%.*]] = bitcast i32* [[TMP2]] to <4 x i32>*
	; CHECK-NEXT: store <4 x i32> <i32 13, i32 13, i32 13, i32 13>, <4 x i32>* [[TMP3]], align 1			; CHECK-NEXT: store <4 x i32> <i32 13, i32 13, i32 13, i32 13>, <4 x i32>* [[TMP3]], align 1
	; CHECK-NEXT: [[INDEX_NEXT]] = add nuw i32 [[INDEX]], 4			; CHECK-NEXT: [[INDEX_NEXT]] = add nuw i32 [[INDEX]], 4
	; CHECK-NEXT: [[TMP4:%.*]] = icmp eq i32 [[INDEX_NEXT]], [[N_VEC]]			; CHECK-NEXT: [[TMP4:%.*]] = icmp eq i32 [[INDEX_NEXT]], [[N_VEC]]
	; CHECK-NEXT: br i1 [[TMP4]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], [[LOOP0:!llvm.loop !.*]]			; CHECK-NEXT: br i1 [[TMP4]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], [[LOOP0:!llvm.loop !.*]]
	[50 lines hidden]
	; CHECK-NEXT: [[MIN_ITERS_CHECK:%.*]] = icmp ult i32 [[N]], 4			; CHECK-NEXT: [[MIN_ITERS_CHECK:%.*]] = icmp ult i32 [[N]], 4
	; CHECK-NEXT: br i1 [[MIN_ITERS_CHECK]], label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]			; CHECK-NEXT: br i1 [[MIN_ITERS_CHECK]], label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
	; CHECK: vector.ph:			; CHECK: vector.ph:
	; CHECK-NEXT: [[N_MOD_VF:%.*]] = urem i32 [[N]], 4			; CHECK-NEXT: [[N_MOD_VF:%.*]] = urem i32 [[N]], 4
	; CHECK-NEXT: [[N_VEC:%.*]] = sub i32 [[N]], [[N_MOD_VF]]			; CHECK-NEXT: [[N_VEC:%.*]] = sub i32 [[N]], [[N_MOD_VF]]
	; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]			; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]
	; CHECK: vector.body:			; CHECK: vector.body:
	; CHECK-NEXT: [[INDEX:%.*]] = phi i32 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]			; CHECK-NEXT: [[INDEX:%.*]] = phi i32 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
	; CHECK-NEXT: [[BROADCAST_SPLATINSERT:%.*]] = insertelement <4 x i32> poison, i32 [[INDEX]], i32 0
	; CHECK-NEXT: [[BROADCAST_SPLAT:%.*]] = shufflevector <4 x i32> [[BROADCAST_SPLATINSERT]], <4 x i32> poison, <4 x i32> zeroinitializer
	; CHECK-NEXT: [[INDUCTION:%.*]] = add <4 x i32> [[BROADCAST_SPLAT]], <i32 0, i32 1, i32 2, i32 3>
	; CHECK-NEXT: [[TMP0:%.*]] = add i32 [[INDEX]], 0			; CHECK-NEXT: [[TMP0:%.*]] = add i32 [[INDEX]], 0
	; CHECK-NEXT: [[TMP1:%.*]] = getelementptr inbounds i32, i32* [[A:%.*]], i32 [[TMP0]]			; CHECK-NEXT: [[TMP1:%.*]] = getelementptr inbounds i32, i32* [[A:%.*]], i32 [[TMP0]]
	; CHECK-NEXT: [[TMP2:%.*]] = getelementptr inbounds i32, i32* [[TMP1]], i32 0			; CHECK-NEXT: [[TMP2:%.*]] = getelementptr inbounds i32, i32* [[TMP1]], i32 0
	; CHECK-NEXT: [[TMP3:%.*]] = bitcast i32* [[TMP2]] to <4 x i32>*			; CHECK-NEXT: [[TMP3:%.*]] = bitcast i32* [[TMP2]] to <4 x i32>*
	; CHECK-NEXT: store <4 x i32> <i32 13, i32 13, i32 13, i32 13>, <4 x i32>* [[TMP3]], align 1			; CHECK-NEXT: store <4 x i32> <i32 13, i32 13, i32 13, i32 13>, <4 x i32>* [[TMP3]], align 1
	; CHECK-NEXT: [[INDEX_NEXT]] = add nuw i32 [[INDEX]], 4			; CHECK-NEXT: [[INDEX_NEXT]] = add nuw i32 [[INDEX]], 4
	; CHECK-NEXT: [[TMP4:%.*]] = icmp eq i32 [[INDEX_NEXT]], [[N_VEC]]			; CHECK-NEXT: [[TMP4:%.*]] = icmp eq i32 [[INDEX_NEXT]], [[N_VEC]]
	; CHECK-NEXT: br i1 [[TMP4]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], [[LOOP4:!llvm.loop !.*]]			; CHECK-NEXT: br i1 [[TMP4]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], [[LOOP4:!llvm.loop !.*]]
	[155 lines hidden]

llvm/test/Transforms/LoopVectorize/first-order-recurrence-sink-replicate-region.ll

	[166 lines hidden]
	; CHECK-NEXT: Live-in vp<[[BTC:%.+]]> = backedge-taken count			; CHECK-NEXT: Live-in vp<[[BTC:%.+]]> = backedge-taken count
	; CHECK-EMPTY:			; CHECK-EMPTY:
	; CHECK-NEXT: <x1> vector loop: {			; CHECK-NEXT: <x1> vector loop: {
	; CHECK-NEXT: loop:			; CHECK-NEXT: loop:
	; CHECK-NEXT: EMIT vp<[[CAN_IV:%.+]]> = CANONICAL-INDUCTION			; CHECK-NEXT: EMIT vp<[[CAN_IV:%.+]]> = CANONICAL-INDUCTION
	; CHECK-NEXT: FIRST-ORDER-RECURRENCE-PHI ir<%recur> = phi ir<0>, ir<%recur.next>			; CHECK-NEXT: FIRST-ORDER-RECURRENCE-PHI ir<%recur> = phi ir<0>, ir<%recur.next>
	; CHECK-NEXT: WIDEN-INDUCTION %iv = phi 0, %iv.next			; CHECK-NEXT: WIDEN-INDUCTION %iv = phi 0, %iv.next
	; CHECK-NEXT: WIDEN-REDUCTION-PHI ir<%and.red> = phi ir<1234>, ir<%and.red.next>			; CHECK-NEXT: WIDEN-REDUCTION-PHI ir<%and.red> = phi ir<1234>, ir<%and.red.next>
	; CHECK-NEXT: EMIT vp<[[MASK:%.+]]> = icmp ule ir<%iv> vp<[[BTC]]>			; CHECK-NEXT: EMIT vp<[[WIDEN_CAN:%.+]]> = WIDEN-CANONICAL-INDUCTION vp<[[CAN_IV]]>
				; CHECK-NEXT: EMIT vp<[[MASK:%.+]]> = icmp ule vp<[[WIDEN_CAN]]> vp<[[BTC]]>
				Ayal: Should vp<%2> be vp<[[CAN_IV]]>?
				fhahn: Yep, updated, thanks!
	; CHECK-NEXT: Successor(s): loop.0			; CHECK-NEXT: Successor(s): loop.0
	; CHECK-EMPTY:			; CHECK-EMPTY:
	; CHECK-NEXT: loop.0:			; CHECK-NEXT: loop.0:
	; CHECK-NEXT: WIDEN ir<%recur.next> = sext ir<%y>			; CHECK-NEXT: WIDEN ir<%recur.next> = sext ir<%y>
	; CHECK-NEXT: EMIT vp<[[SPLICE:%.+]]> = first-order splice ir<%recur> ir<%recur.next>			; CHECK-NEXT: EMIT vp<[[SPLICE:%.+]]> = first-order splice ir<%recur> ir<%recur.next>
	; CHECK-NEXT: Successor(s): pred.srem			; CHECK-NEXT: Successor(s): pred.srem
	; CHECK-EMPTY:			; CHECK-EMPTY:
	; CHECK-NEXT: <xVFxUF> pred.srem: {			; CHECK-NEXT: <xVFxUF> pred.srem: {
	[228 lines hidden]

llvm/test/Transforms/LoopVectorize/first-order-recurrence.ll

	[6,339 lines hidden]
	; UNROLL-NO-VF-NEXT: [[TMP3:%.*]] = icmp ule i32 [[INDUCTION4]], [[TRIP_COUNT_MINUS_1]]			; UNROLL-NO-VF-NEXT: [[TMP3:%.*]] = icmp ule i32 [[INDUCTION4]], [[TRIP_COUNT_MINUS_1]]
	; UNROLL-NO-VF-NEXT: br i1 [[TMP2]], label [[PRED_UDIV_IF:%.*]], label [[PRED_UDIV_CONTINUE:%.*]]			; UNROLL-NO-VF-NEXT: br i1 [[TMP2]], label [[PRED_UDIV_IF:%.*]], label [[PRED_UDIV_CONTINUE:%.*]]
	; UNROLL-NO-VF: pred.udiv.if:			; UNROLL-NO-VF: pred.udiv.if:
	; UNROLL-NO-VF-NEXT: [[TMP4:%.*]] = udiv i32 219220132, [[INDUCTION]]			; UNROLL-NO-VF-NEXT: [[TMP4:%.*]] = udiv i32 219220132, [[INDUCTION]]
	; UNROLL-NO-VF-NEXT: br label [[PRED_UDIV_CONTINUE]]			; UNROLL-NO-VF-NEXT: br label [[PRED_UDIV_CONTINUE]]
	; UNROLL-NO-VF: pred.udiv.continue:			; UNROLL-NO-VF: pred.udiv.continue:
	; UNROLL-NO-VF-NEXT: [[TMP5:%.*]] = phi i32 [ poison, [[VECTOR_BODY]] ], [ [[TMP4]], [[PRED_UDIV_IF]] ]			; UNROLL-NO-VF-NEXT: [[TMP5:%.*]] = phi i32 [ poison, [[VECTOR_BODY]] ], [ [[TMP4]], [[PRED_UDIV_IF]] ]
	; UNROLL-NO-VF-NEXT: br i1 [[TMP3]], label [[PRED_UDIV_IF6:%.*]], label [[PRED_UDIV_CONTINUE7:%.*]]			; UNROLL-NO-VF-NEXT: br i1 [[TMP3]], label [[PRED_UDIV_IF6:%.*]], label [[PRED_UDIV_CONTINUE7:%.*]]
	; UNROLL-NO-VF: pred.udiv.if6:			; UNROLL-NO-VF: pred.udiv.if7:
	; UNROLL-NO-VF-NEXT: [[TMP6:%.*]] = udiv i32 219220132, [[INDUCTION2]]			; UNROLL-NO-VF-NEXT: [[TMP6:%.*]] = udiv i32 219220132, [[INDUCTION2]]
	; UNROLL-NO-VF-NEXT: br label [[PRED_UDIV_CONTINUE7]]			; UNROLL-NO-VF-NEXT: br label [[PRED_UDIV_CONTINUE7]]
	; UNROLL-NO-VF: pred.udiv.continue7:			; UNROLL-NO-VF: pred.udiv.continue8:
	; UNROLL-NO-VF-NEXT: [[TMP7]] = phi i32 [ poison, [[PRED_UDIV_CONTINUE]] ], [ [[TMP6]], [[PRED_UDIV_IF6]] ]			; UNROLL-NO-VF-NEXT: [[TMP7]] = phi i32 [ poison, [[PRED_UDIV_CONTINUE]] ], [ [[TMP6]], [[PRED_UDIV_IF6]] ]
	; UNROLL-NO-VF-NEXT: [[TMP8]] = add i32 [[VEC_PHI]], [[VECTOR_RECUR]]			; UNROLL-NO-VF-NEXT: [[TMP8]] = add i32 [[VEC_PHI]], [[VECTOR_RECUR]]
	; UNROLL-NO-VF-NEXT: [[TMP9]] = add i32 [[VEC_PHI5]], [[TMP5]]			; UNROLL-NO-VF-NEXT: [[TMP9]] = add i32 [[VEC_PHI5]], [[TMP5]]
	; UNROLL-NO-VF-NEXT: br i1 [[TMP2]], label [[PRED_STORE_IF:%.*]], label [[PRED_STORE_CONTINUE:%.*]]			; UNROLL-NO-VF-NEXT: br i1 [[TMP2]], label [[PRED_STORE_IF:%.*]], label [[PRED_STORE_CONTINUE:%.*]]
	; UNROLL-NO-VF: pred.store.if:			; UNROLL-NO-VF: pred.store.if:
	; UNROLL-NO-VF-NEXT: [[TMP10:%.*]] = getelementptr inbounds i32, i32* [[X:%.*]], i32 [[INDUCTION3]]			; UNROLL-NO-VF: [[SUNK_IND0:%.+]] = add i32 [[INDEX]], 0
				; UNROLL-NO-VF-NEXT: [[TMP10:%.*]] = getelementptr inbounds i32, i32* [[X:%.*]], i32 [[SUNK_IND0]]
	; UNROLL-NO-VF-NEXT: store i32 [[INDUCTION]], i32* [[TMP10]], align 4			; UNROLL-NO-VF-NEXT: store i32 [[INDUCTION]], i32* [[TMP10]], align 4
	; UNROLL-NO-VF-NEXT: br label [[PRED_STORE_CONTINUE]]			; UNROLL-NO-VF-NEXT: br label [[PRED_STORE_CONTINUE]]
	; UNROLL-NO-VF: pred.store.continue:			; UNROLL-NO-VF: pred.store.continue:
	; UNROLL-NO-VF-NEXT: br i1 [[TMP3]], label [[PRED_STORE_IF8:%.*]], label [[PRED_STORE_CONTINUE9]]			; UNROLL-NO-VF-NEXT: br i1 [[TMP3]], label [[PRED_STORE_IF8:%.*]], label [[PRED_STORE_CONTINUE9]]
	; UNROLL-NO-VF: pred.store.if8:			; UNROLL-NO-VF: pred.store.if9:
	; UNROLL-NO-VF-NEXT: [[TMP11:%.*]] = getelementptr inbounds i32, i32* [[X]], i32 [[INDUCTION4]]			; UNROLL-NO-VF-NEXT: [[SUNK_IND1:%.+]] = add i32 [[INDEX]], 1
				; UNROLL-NO-VF-NEXT: [[TMP11:%.*]] = getelementptr inbounds i32, i32* [[X]], i32 [[SUNK_IND1]]
	; UNROLL-NO-VF-NEXT: store i32 [[INDUCTION2]], i32* [[TMP11]], align 4			; UNROLL-NO-VF-NEXT: store i32 [[INDUCTION2]], i32* [[TMP11]], align 4
	; UNROLL-NO-VF-NEXT: br label [[PRED_STORE_CONTINUE9]]			; UNROLL-NO-VF-NEXT: br label [[PRED_STORE_CONTINUE9]]
	; UNROLL-NO-VF: pred.store.continue9:			; UNROLL-NO-VF: pred.store.continue10:
	; UNROLL-NO-VF-NEXT: [[TMP12:%.*]] = select i1 [[TMP2]], i32 [[TMP8]], i32 [[VEC_PHI]]			; UNROLL-NO-VF-NEXT: [[TMP12:%.*]] = select i1 [[TMP2]], i32 [[TMP8]], i32 [[VEC_PHI]]
	; UNROLL-NO-VF-NEXT: [[TMP13:%.*]] = select i1 [[TMP3]], i32 [[TMP9]], i32 [[VEC_PHI5]]			; UNROLL-NO-VF-NEXT: [[TMP13:%.*]] = select i1 [[TMP3]], i32 [[TMP9]], i32 [[VEC_PHI5]]
	; UNROLL-NO-VF-NEXT: [[INDEX_NEXT]] = add i32 [[INDEX]], 2			; UNROLL-NO-VF-NEXT: [[INDEX_NEXT]] = add i32 [[INDEX]], 2
	; UNROLL-NO-VF-NEXT: [[TMP14:%.*]] = icmp eq i32 [[INDEX_NEXT]], [[N_VEC]]			; UNROLL-NO-VF-NEXT: [[TMP14:%.*]] = icmp eq i32 [[INDEX_NEXT]], [[N_VEC]]
	; UNROLL-NO-VF-NEXT: br i1 [[TMP14]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !prof [[PROF51]], !llvm.loop [[LOOP55:![0-9]+]]			; UNROLL-NO-VF-NEXT: br i1 [[TMP14]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !prof [[PROF51]], !llvm.loop [[LOOP55:![0-9]+]]
	; UNROLL-NO-VF: middle.block:			; UNROLL-NO-VF: middle.block:
	; UNROLL-NO-VF-NEXT: [[BIN_RDX:%.*]] = add i32 [[TMP13]], [[TMP12]]			; UNROLL-NO-VF-NEXT: [[BIN_RDX:%.*]] = add i32 [[TMP13]], [[TMP12]]
	; UNROLL-NO-VF-NEXT: br i1 true, label [[BB1:%.*]], label [[SCALAR_PH]]			; UNROLL-NO-VF-NEXT: br i1 true, label [[BB1:%.*]], label [[SCALAR_PH]]
	[626 lines hidden]

llvm/test/Transforms/LoopVectorize/pr44488-predication.ll

	[12 lines hidden]
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: br i1 false, label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]			; CHECK-NEXT: br i1 false, label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
	; CHECK: vector.ph:			; CHECK: vector.ph:
	; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]			; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]
	; CHECK: vector.body:			; CHECK: vector.body:
	; CHECK-NEXT: [[INDEX:%.*]] = phi i32 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[PRED_SREM_CONTINUE4:%.*]] ]			; CHECK-NEXT: [[INDEX:%.*]] = phi i32 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[PRED_SREM_CONTINUE4:%.*]] ]
	; CHECK-NEXT: [[TMP0:%.*]] = trunc i32 [[INDEX]] to i16			; CHECK-NEXT: [[TMP0:%.*]] = trunc i32 [[INDEX]] to i16
	; CHECK-NEXT: [[OFFSET_IDX:%.*]] = add i16 99, [[TMP0]]			; CHECK-NEXT: [[OFFSET_IDX:%.*]] = add i16 99, [[TMP0]]
	; CHECK-NEXT: [[BROADCAST_SPLATINSERT:%.*]] = insertelement <2 x i16> poison, i16 [[OFFSET_IDX]], i32 0
	; CHECK-NEXT: [[BROADCAST_SPLAT:%.*]] = shufflevector <2 x i16> [[BROADCAST_SPLATINSERT]], <2 x i16> poison, <2 x i32> zeroinitializer
	; CHECK-NEXT: [[INDUCTION:%.*]] = add <2 x i16> [[BROADCAST_SPLAT]], <i16 0, i16 1>
	; CHECK-NEXT: [[TMP1:%.*]] = add i16 [[OFFSET_IDX]], 0			; CHECK-NEXT: [[TMP1:%.*]] = add i16 [[OFFSET_IDX]], 0
	; CHECK-NEXT: [[TMP2:%.*]] = load i16, i16* @v_38, align 1			; CHECK-NEXT: [[TMP2:%.*]] = load i16, i16* @v_38, align 1
	; CHECK-NEXT: [[BROADCAST_SPLATINSERT1:%.*]] = insertelement <2 x i16> poison, i16 [[TMP2]], i32 0			; CHECK-NEXT: [[BROADCAST_SPLATINSERT1:%.*]] = insertelement <2 x i16> poison, i16 [[TMP2]], i32 0
	; CHECK-NEXT: [[BROADCAST_SPLAT2:%.*]] = shufflevector <2 x i16> [[BROADCAST_SPLATINSERT1]], <2 x i16> poison, <2 x i32> zeroinitializer			; CHECK-NEXT: [[BROADCAST_SPLAT2:%.*]] = shufflevector <2 x i16> [[BROADCAST_SPLATINSERT1]], <2 x i16> poison, <2 x i32> zeroinitializer
	; CHECK-NEXT: [[TMP3:%.*]] = icmp eq <2 x i16> [[BROADCAST_SPLAT2]], <i16 32767, i16 32767>			; CHECK-NEXT: [[TMP3:%.*]] = icmp eq <2 x i16> [[BROADCAST_SPLAT2]], <i16 32767, i16 32767>
	; CHECK-NEXT: [[TMP4:%.*]] = icmp eq <2 x i16> [[BROADCAST_SPLAT2]], zeroinitializer			; CHECK-NEXT: [[TMP4:%.*]] = icmp eq <2 x i16> [[BROADCAST_SPLAT2]], zeroinitializer
	; CHECK-NEXT: [[TMP5:%.*]] = xor <2 x i1> [[TMP4]], <i1 true, i1 true>			; CHECK-NEXT: [[TMP5:%.*]] = xor <2 x i1> [[TMP4]], <i1 true, i1 true>
	; CHECK-NEXT: [[TMP6:%.*]] = extractelement <2 x i1> [[TMP5]], i32 0			; CHECK-NEXT: [[TMP6:%.*]] = extractelement <2 x i1> [[TMP5]], i32 0
	; CHECK-NEXT: br i1 [[TMP6]], label [[PRED_SREM_IF:%.*]], label [[PRED_SREM_CONTINUE:%.*]]			; CHECK-NEXT: br i1 [[TMP6]], label [[PRED_SREM_IF:%.*]], label [[PRED_SREM_CONTINUE:%.*]]
	; CHECK: pred.srem.if:			; CHECK: pred.srem.if:
	; CHECK-NEXT: [[TMP7:%.*]] = srem i16 5786, [[TMP2]]			; CHECK-NEXT: [[TMP7:%.*]] = srem i16 5786, [[TMP2]]
	; CHECK-NEXT: [[TMP8:%.*]] = insertelement <2 x i16> poison, i16 [[TMP7]], i32 0			; CHECK-NEXT: [[TMP8:%.*]] = insertelement <2 x i16> poison, i16 [[TMP7]], i32 0
	; CHECK-NEXT: br label [[PRED_SREM_CONTINUE]]			; CHECK-NEXT: br label [[PRED_SREM_CONTINUE]]
	; CHECK: pred.srem.continue:			; CHECK: pred.srem.continue:
	; CHECK-NEXT: [[TMP9:%.*]] = phi <2 x i16> [ poison, [[VECTOR_BODY]] ], [ [[TMP8]], [[PRED_SREM_IF]] ]			; CHECK-NEXT: [[TMP9:%.*]] = phi <2 x i16> [ poison, [[VECTOR_BODY]] ], [ [[TMP8]], [[PRED_SREM_IF]] ]
	; CHECK-NEXT: [[TMP10:%.*]] = extractelement <2 x i1> [[TMP5]], i32 1			; CHECK-NEXT: [[TMP10:%.*]] = extractelement <2 x i1> [[TMP5]], i32 1
	; CHECK-NEXT: br i1 [[TMP10]], label [[PRED_SREM_IF3:%.*]], label [[PRED_SREM_CONTINUE4]]			; CHECK-NEXT: br i1 [[TMP10]], label [[PRED_SREM_IF3:%.*]], label [[PRED_SREM_CONTINUE4]]
	; CHECK: pred.srem.if3:			; CHECK: pred.srem.if1:
	; CHECK-NEXT: [[TMP11:%.*]] = srem i16 5786, [[TMP2]]			; CHECK-NEXT: [[TMP11:%.*]] = srem i16 5786, [[TMP2]]
	; CHECK-NEXT: [[TMP12:%.*]] = insertelement <2 x i16> [[TMP9]], i16 [[TMP11]], i32 1			; CHECK-NEXT: [[TMP12:%.*]] = insertelement <2 x i16> [[TMP9]], i16 [[TMP11]], i32 1
	; CHECK-NEXT: br label [[PRED_SREM_CONTINUE4]]			; CHECK-NEXT: br label [[PRED_SREM_CONTINUE4]]
	; CHECK: pred.srem.continue4:			; CHECK: pred.srem.continue2:
	; CHECK-NEXT: [[TMP13:%.*]] = phi <2 x i16> [ [[TMP9]], [[PRED_SREM_CONTINUE]] ], [ [[TMP12]], [[PRED_SREM_IF3]] ]			; CHECK-NEXT: [[TMP13:%.*]] = phi <2 x i16> [ [[TMP9]], [[PRED_SREM_CONTINUE]] ], [ [[TMP12]], [[PRED_SREM_IF3]] ]
	; CHECK-NEXT: [[PREDPHI:%.*]] = select <2 x i1> [[TMP4]], <2 x i16> <i16 5786, i16 5786>, <2 x i16> [[TMP13]]			; CHECK-NEXT: [[PREDPHI:%.*]] = select <2 x i1> [[TMP4]], <2 x i16> <i16 5786, i16 5786>, <2 x i16> [[TMP13]]
	; CHECK-NEXT: [[TMP14:%.*]] = extractelement <2 x i16> [[PREDPHI]], i32 0			; CHECK-NEXT: [[TMP14:%.*]] = extractelement <2 x i16> [[PREDPHI]], i32 0
	; CHECK-NEXT: store i16 [[TMP14]], i16* @v_39, align 1			; CHECK-NEXT: store i16 [[TMP14]], i16* @v_39, align 1
	; CHECK-NEXT: [[TMP15:%.*]] = extractelement <2 x i16> [[PREDPHI]], i32 1			; CHECK-NEXT: [[TMP15:%.*]] = extractelement <2 x i16> [[PREDPHI]], i32 1
	; CHECK-NEXT: store i16 [[TMP15]], i16* @v_39, align 1			; CHECK-NEXT: store i16 [[TMP15]], i16* @v_39, align 1
	; CHECK-NEXT: [[INDEX_NEXT]] = add nuw i32 [[INDEX]], 2			; CHECK-NEXT: [[INDEX_NEXT]] = add nuw i32 [[INDEX]], 2
	; CHECK-NEXT: [[TMP16:%.*]] = icmp eq i32 [[INDEX_NEXT]], 12			; CHECK-NEXT: [[TMP16:%.*]] = icmp eq i32 [[INDEX_NEXT]], 12
	[56 lines hidden]

llvm/test/Transforms/LoopVectorize/pr45679-fold-tail-by-masking.ll

	[144 lines hidden]
	; VF1UF4-NEXT: [[INDUCTION2:%.*]] = add i32 [[INDEX]], 2			; VF1UF4-NEXT: [[INDUCTION2:%.*]] = add i32 [[INDEX]], 2
	; VF1UF4-NEXT: [[INDUCTION3:%.*]] = add i32 [[INDEX]], 3			; VF1UF4-NEXT: [[INDUCTION3:%.*]] = add i32 [[INDEX]], 3
	; VF1UF4-NEXT: [[TMP0:%.*]] = icmp ule i32 [[INDUCTION]], 13			; VF1UF4-NEXT: [[TMP0:%.*]] = icmp ule i32 [[INDUCTION]], 13
	; VF1UF4-NEXT: [[TMP1:%.*]] = icmp ule i32 [[INDUCTION1]], 13			; VF1UF4-NEXT: [[TMP1:%.*]] = icmp ule i32 [[INDUCTION1]], 13
	; VF1UF4-NEXT: [[TMP2:%.*]] = icmp ule i32 [[INDUCTION2]], 13			; VF1UF4-NEXT: [[TMP2:%.*]] = icmp ule i32 [[INDUCTION2]], 13
	; VF1UF4-NEXT: [[TMP3:%.*]] = icmp ule i32 [[INDUCTION3]], 13			; VF1UF4-NEXT: [[TMP3:%.*]] = icmp ule i32 [[INDUCTION3]], 13
	; VF1UF4-NEXT: br i1 [[TMP0]], label [[PRED_STORE_IF:%.*]], label [[PRED_STORE_CONTINUE:%.*]]			; VF1UF4-NEXT: br i1 [[TMP0]], label [[PRED_STORE_IF:%.*]], label [[PRED_STORE_CONTINUE:%.*]]
	; VF1UF4: pred.store.if:			; VF1UF4: pred.store.if:
	; VF1UF4-NEXT: [[TMP4:%.*]] = getelementptr inbounds i32, i32* [[A:%.*]], i32 [[INDUCTION]]			; VF1UF4-NEXT: [[SUNK_IND0:%.*]] = add i32 [[INDEX]], 0
				; VF1UF4-NEXT: [[TMP4:%.*]] = getelementptr inbounds i32, i32* [[A:%.*]], i32 [[SUNK_IND0]]
				Ayal: Hmm, this enables sink scalar operands to handle scalar steps.
				fhahn: I think this is caused by dropping the check for the scalar VF: we add the VPWidenCanonicalInduction, but the cost model knows that the induction recipe will be scalar only, so we cannot replace the VPWidenCanonicalInduction with the other induction recipe, and we end up with some extra steps. Given that VF=1, UF>1 is somewhat of an edge case, it seems cleaner not to check for the scalar VF, as suggested earlier.
	; VF1UF4-NEXT: store i32 13, i32* [[TMP4]], align 1			; VF1UF4-NEXT: store i32 13, i32* [[TMP4]], align 1
	; VF1UF4-NEXT: br label [[PRED_STORE_CONTINUE]]			; VF1UF4-NEXT: br label [[PRED_STORE_CONTINUE]]
	; VF1UF4: pred.store.continue:			; VF1UF4: pred.store.continue:
	; VF1UF4-NEXT: br i1 [[TMP1]], label [[PRED_STORE_IF4:%.*]], label [[PRED_STORE_CONTINUE5:%.*]]			; VF1UF4-NEXT: br i1 [[TMP1]], label [[PRED_STORE_IF4:%.*]], label [[PRED_STORE_CONTINUE5:%.*]]
	; VF1UF4: pred.store.if4:			; VF1UF4: pred.store.if7:
	; VF1UF4-NEXT: [[TMP5:%.*]] = getelementptr inbounds i32, i32* [[A]], i32 [[INDUCTION1]]			; VF1UF4-NEXT: [[SUNK_IND1:%.*]] = add i32 [[INDEX]], 1
				; VF1UF4-NEXT: [[TMP5:%.*]] = getelementptr inbounds i32, i32* [[A]], i32 [[SUNK_IND1]]
	; VF1UF4-NEXT: store i32 13, i32* [[TMP5]], align 1			; VF1UF4-NEXT: store i32 13, i32* [[TMP5]], align 1
	; VF1UF4-NEXT: br label [[PRED_STORE_CONTINUE5]]			; VF1UF4-NEXT: br label [[PRED_STORE_CONTINUE5]]
	; VF1UF4: pred.store.continue5:			; VF1UF4: pred.store.continue8:
	; VF1UF4-NEXT: br i1 [[TMP2]], label [[PRED_STORE_IF6:%.*]], label [[PRED_STORE_CONTINUE7:%.*]]			; VF1UF4-NEXT: br i1 [[TMP2]], label [[PRED_STORE_IF6:%.*]], label [[PRED_STORE_CONTINUE7:%.*]]
	; VF1UF4: pred.store.if6:			; VF1UF4: pred.store.if9:
	; VF1UF4-NEXT: [[TMP6:%.*]] = getelementptr inbounds i32, i32* [[A]], i32 [[INDUCTION2]]			; VF1UF4-NEXT: [[SUNK_IND2:%.*]] = add i32 [[INDEX]], 2
				; VF1UF4-NEXT: [[TMP6:%.*]] = getelementptr inbounds i32, i32* [[A]], i32 [[SUNK_IND2]]
	; VF1UF4-NEXT: store i32 13, i32* [[TMP6]], align 1			; VF1UF4-NEXT: store i32 13, i32* [[TMP6]], align 1
	; VF1UF4-NEXT: br label [[PRED_STORE_CONTINUE7]]			; VF1UF4-NEXT: br label [[PRED_STORE_CONTINUE7]]
	; VF1UF4: pred.store.continue7:			; VF1UF4: pred.store.continue10:
	; VF1UF4-NEXT: br i1 [[TMP3]], label [[PRED_STORE_IF8:%.*]], label [[PRED_STORE_CONTINUE9]]			; VF1UF4-NEXT: br i1 [[TMP3]], label [[PRED_STORE_IF8:%.*]], label [[PRED_STORE_CONTINUE9]]
	; VF1UF4: pred.store.if8:			; VF1UF4: pred.store.if11:
	; VF1UF4-NEXT: [[TMP7:%.*]] = getelementptr inbounds i32, i32* [[A]], i32 [[INDUCTION3]]			; VF1UF4-NEXT: [[SUNK_IND3:%.*]] = add i32 [[INDEX]], 3
				; VF1UF4-NEXT: [[TMP7:%.*]] = getelementptr inbounds i32, i32* [[A]], i32 [[SUNK_IND3]]
	; VF1UF4-NEXT: store i32 13, i32* [[TMP7]], align 1			; VF1UF4-NEXT: store i32 13, i32* [[TMP7]], align 1
	; VF1UF4-NEXT: br label [[PRED_STORE_CONTINUE9]]			; VF1UF4-NEXT: br label [[PRED_STORE_CONTINUE9]]
	; VF1UF4: pred.store.continue9:			; VF1UF4: pred.store.continue12:
	; VF1UF4-NEXT: [[INDEX_NEXT]] = add i32 [[INDEX]], 4			; VF1UF4-NEXT: [[INDEX_NEXT]] = add i32 [[INDEX]], 4
	; VF1UF4-NEXT: [[TMP8:%.*]] = icmp eq i32 [[INDEX_NEXT]], 16			; VF1UF4-NEXT: [[TMP8:%.*]] = icmp eq i32 [[INDEX_NEXT]], 16
	; VF1UF4-NEXT: br i1 [[TMP8]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]]			; VF1UF4-NEXT: br i1 [[TMP8]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]]
	; VF1UF4: middle.block:			; VF1UF4: middle.block:
	; VF1UF4-NEXT: br i1 true, label [[EXIT:%.*]], label [[SCALAR_PH]]			; VF1UF4-NEXT: br i1 true, label [[EXIT:%.*]], label [[SCALAR_PH]]
	; VF1UF4: scalar.ph:			; VF1UF4: scalar.ph:
	; VF1UF4-NEXT: [[BC_RESUME_VAL:%.*]] = phi i32 [ 16, [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]			; VF1UF4-NEXT: [[BC_RESUME_VAL:%.*]] = phi i32 [ 16, [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]
	; VF1UF4-NEXT: br label [[LOOP:%.*]]			; VF1UF4-NEXT: br label [[LOOP:%.*]]
	[24 lines hidden]
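
As a rough illustration of the arithmetic behind the VF1UF4 checks above (this is not code from the patch): for each unrolled part, the lanes of the widened canonical IV are the scalar index plus the part offset plus the lane number, and the tail-folding mask keeps a lane when its IV value is at most the backedge-taken count (13 in this test). The names widenCanonicalIV and tailFoldMask are hypothetical helpers that only model the values; they are not LLVM APIs.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical model of one unrolled part of a widened canonical IV:
// lane value = index + part * VF + lane, i.e. splat(index) + step vector.
std::vector<std::uint64_t> widenCanonicalIV(std::uint64_t index, unsigned vf,
                                            unsigned part) {
  assert(vf > 0 && "vectorization factor must be positive");
  std::vector<std::uint64_t> lanes(vf);
  for (unsigned lane = 0; lane < vf; ++lane)
    lanes[lane] = index + static_cast<std::uint64_t>(part) * vf + lane;
  return lanes;
}

// Hypothetical model of the tail-folding mask: a lane is active when its
// IV value is unsigned-less-or-equal to the backedge-taken count (icmp ule).
std::vector<bool> tailFoldMask(const std::vector<std::uint64_t> &iv,
                               std::uint64_t backedgeTakenCount) {
  std::vector<bool> mask(iv.size());
  for (std::size_t i = 0; i < iv.size(); ++i)
    mask[i] = iv[i] <= backedgeTakenCount;
  return mask;
}
```

With VF=1 and UF=4, the four parts at index 12 take the values 12, 13, 14 and 15, so under a backedge-taken count of 13 only the first two predicated stores would execute in the last vector iteration.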

llvm/test/Transforms/LoopVectorize/pr46525-expander-insertpoint.ll

	[25 lines hidden]
	; CHECK-NEXT: [[IND_END:%.*]] = mul i64 [[N_VEC]], [[INC]]			; CHECK-NEXT: [[IND_END:%.*]] = mul i64 [[N_VEC]], [[INC]]
	; CHECK-NEXT: [[TRIP_COUNT_MINUS_1:%.*]] = sub i64 [[TMP2]], 1			; CHECK-NEXT: [[TRIP_COUNT_MINUS_1:%.*]] = sub i64 [[TMP2]], 1
	; CHECK-NEXT: [[BROADCAST_SPLATINSERT:%.*]] = insertelement <2 x i64> poison, i64 [[TRIP_COUNT_MINUS_1]], i32 0			; CHECK-NEXT: [[BROADCAST_SPLATINSERT:%.*]] = insertelement <2 x i64> poison, i64 [[TRIP_COUNT_MINUS_1]], i32 0
	; CHECK-NEXT: [[BROADCAST_SPLAT:%.*]] = shufflevector <2 x i64> [[BROADCAST_SPLATINSERT]], <2 x i64> poison, <2 x i32> zeroinitializer			; CHECK-NEXT: [[BROADCAST_SPLAT:%.*]] = shufflevector <2 x i64> [[BROADCAST_SPLATINSERT]], <2 x i64> poison, <2 x i32> zeroinitializer
	; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]			; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]
	; CHECK: vector.body:			; CHECK: vector.body:
	; CHECK-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[PRED_STORE_CONTINUE12:%.*]] ]			; CHECK-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[PRED_STORE_CONTINUE12:%.*]] ]
	; CHECK-NEXT: [[OFFSET_IDX:%.*]] = mul i64 [[INDEX]], [[INC]]			; CHECK-NEXT: [[OFFSET_IDX:%.*]] = mul i64 [[INDEX]], [[INC]]
	; CHECK-NEXT: [[BROADCAST_SPLATINSERT1:%.*]] = insertelement <2 x i64> poison, i64 [[OFFSET_IDX]], i32 0
	; CHECK-NEXT: [[BROADCAST_SPLAT2:%.*]] = shufflevector <2 x i64> [[BROADCAST_SPLATINSERT1]], <2 x i64> poison, <2 x i32> zeroinitializer
	; CHECK-NEXT: [[DOTSPLATINSERT:%.*]] = insertelement <2 x i64> poison, i64 [[INC]], i32 0
	; CHECK-NEXT: [[DOTSPLAT:%.*]] = shufflevector <2 x i64> [[DOTSPLATINSERT]], <2 x i64> poison, <2 x i32> zeroinitializer
	; CHECK-NEXT: [[TMP3:%.*]] = mul <2 x i64> <i64 0, i64 1>, [[DOTSPLAT]]
	; CHECK-NEXT: [[INDUCTION:%.*]] = add <2 x i64> [[BROADCAST_SPLAT2]], [[TMP3]]
	; CHECK-NEXT: [[TMP4:%.*]] = mul i64 0, [[INC]]			; CHECK-NEXT: [[TMP4:%.*]] = mul i64 0, [[INC]]
	; CHECK-NEXT: [[TMP5:%.*]] = add i64 [[OFFSET_IDX]], [[TMP4]]			; CHECK-NEXT: [[TMP5:%.*]] = add i64 [[OFFSET_IDX]], [[TMP4]]
	; CHECK-NEXT: [[OFFSET_IDX3:%.*]] = mul i64 [[INDEX]], [[INC]]			; CHECK-NEXT: [[OFFSET_IDX3:%.*]] = mul i64 [[INDEX]], [[INC]]
	; CHECK-NEXT: [[TMP6:%.*]] = trunc i64 [[OFFSET_IDX3]] to i8			; CHECK-NEXT: [[TMP6:%.*]] = trunc i64 [[OFFSET_IDX3]] to i8
	; CHECK-NEXT: [[TMP7:%.*]] = trunc i64 [[INC]] to i8			; CHECK-NEXT: [[TMP7:%.*]] = trunc i64 [[INC]] to i8
	; CHECK-NEXT: [[BROADCAST_SPLATINSERT4:%.*]] = insertelement <2 x i8> poison, i8 [[TMP6]], i32 0
	; CHECK-NEXT: [[BROADCAST_SPLAT5:%.*]] = shufflevector <2 x i8> [[BROADCAST_SPLATINSERT4]], <2 x i8> poison, <2 x i32> zeroinitializer
	; CHECK-NEXT: [[DOTSPLATINSERT6:%.*]] = insertelement <2 x i8> poison, i8 [[TMP7]], i32 0
	; CHECK-NEXT: [[DOTSPLAT7:%.*]] = shufflevector <2 x i8> [[DOTSPLATINSERT6]], <2 x i8> poison, <2 x i32> zeroinitializer
	; CHECK-NEXT: [[TMP8:%.*]] = mul <2 x i8> <i8 0, i8 1>, [[DOTSPLAT7]]
	; CHECK-NEXT: [[INDUCTION8:%.*]] = add <2 x i8> [[BROADCAST_SPLAT5]], [[TMP8]]
	; CHECK-NEXT: [[TMP9:%.*]] = mul i8 0, [[TMP7]]			; CHECK-NEXT: [[TMP9:%.*]] = mul i8 0, [[TMP7]]
	; CHECK-NEXT: [[TMP10:%.*]] = add i8 [[TMP6]], [[TMP9]]			; CHECK-NEXT: [[TMP10:%.*]] = add i8 [[TMP6]], [[TMP9]]
	; CHECK-NEXT: [[BROADCAST_SPLATINSERT9:%.*]] = insertelement <2 x i64> poison, i64 [[INDEX]], i32 0			; CHECK-NEXT: [[BROADCAST_SPLATINSERT9:%.*]] = insertelement <2 x i64> poison, i64 [[INDEX]], i32 0
	; CHECK-NEXT: [[BROADCAST_SPLAT10:%.*]] = shufflevector <2 x i64> [[BROADCAST_SPLATINSERT9]], <2 x i64> poison, <2 x i32> zeroinitializer			; CHECK-NEXT: [[BROADCAST_SPLAT10:%.*]] = shufflevector <2 x i64> [[BROADCAST_SPLATINSERT9]], <2 x i64> poison, <2 x i32> zeroinitializer
	; CHECK-NEXT: [[VEC_IV:%.*]] = add <2 x i64> [[BROADCAST_SPLAT10]], <i64 0, i64 1>			; CHECK-NEXT: [[VEC_IV:%.*]] = add <2 x i64> [[BROADCAST_SPLAT10]], <i64 0, i64 1>
	; CHECK-NEXT: [[TMP11:%.*]] = icmp ule <2 x i64> [[VEC_IV]], [[BROADCAST_SPLAT]]			; CHECK-NEXT: [[TMP11:%.*]] = icmp ule <2 x i64> [[VEC_IV]], [[BROADCAST_SPLAT]]
	; CHECK-NEXT: [[TMP12:%.*]] = extractelement <2 x i1> [[TMP11]], i32 0			; CHECK-NEXT: [[TMP12:%.*]] = extractelement <2 x i1> [[TMP11]], i32 0
	; CHECK-NEXT: br i1 [[TMP12]], label [[PRED_STORE_IF:%.*]], label [[PRED_STORE_CONTINUE:%.*]]			; CHECK-NEXT: br i1 [[TMP12]], label [[PRED_STORE_IF:%.*]], label [[PRED_STORE_CONTINUE:%.*]]
	; CHECK: pred.store.if:			; CHECK: pred.store.if:
	; CHECK-NEXT: store i32 0, i32* [[PTR:%.*]], align 4			; CHECK-NEXT: store i32 0, i32* [[PTR:%.*]], align 4
	; CHECK-NEXT: br label [[PRED_STORE_CONTINUE]]			; CHECK-NEXT: br label [[PRED_STORE_CONTINUE]]
	; CHECK: pred.store.continue:			; CHECK: pred.store.continue:
	; CHECK-NEXT: [[TMP13:%.*]] = extractelement <2 x i1> [[TMP11]], i32 1			; CHECK-NEXT: [[TMP13:%.*]] = extractelement <2 x i1> [[TMP11]], i32 1
	; CHECK-NEXT: br i1 [[TMP13]], label [[PRED_STORE_IF11:%.*]], label [[PRED_STORE_CONTINUE12]]			; CHECK-NEXT: br i1 [[TMP13]], label [[PRED_STORE_IF11:%.*]], label [[PRED_STORE_CONTINUE12]]
	; CHECK: pred.store.if11:			; CHECK: pred.store.if4:
	; CHECK-NEXT: store i32 0, i32* [[PTR]], align 4			; CHECK-NEXT: store i32 0, i32* [[PTR]], align 4
	; CHECK-NEXT: br label [[PRED_STORE_CONTINUE12]]			; CHECK-NEXT: br label [[PRED_STORE_CONTINUE12]]
	; CHECK: pred.store.continue12:			; CHECK: pred.store.continue5:
	; CHECK-NEXT: [[TMP14:%.*]] = add i8 [[TMP10]], 1			; CHECK-NEXT: [[TMP14:%.*]] = add i8 [[TMP10]], 1
	; CHECK-NEXT: [[INDEX_NEXT]] = add i64 [[INDEX]], 2			; CHECK-NEXT: [[INDEX_NEXT]] = add i64 [[INDEX]], 2
	; CHECK-NEXT: [[TMP15:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]			; CHECK-NEXT: [[TMP15:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]
	; CHECK-NEXT: br i1 [[TMP15]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP0:![0-9]+]]			; CHECK-NEXT: br i1 [[TMP15]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP0:![0-9]+]]
	; CHECK: middle.block:			; CHECK: middle.block:
	; CHECK-NEXT: br i1 true, label [[LOOP_EXIT:%.*]], label [[SCALAR_PH]]			; CHECK-NEXT: br i1 true, label [[LOOP_EXIT:%.*]], label [[SCALAR_PH]]
	; CHECK: scalar.ph:			; CHECK: scalar.ph:
	; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ [[IND_END]], [[MIDDLE_BLOCK]] ], [ 0, [[LOOP_PREHEADER]] ]			; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ [[IND_END]], [[MIDDLE_BLOCK]] ], [ 0, [[LOOP_PREHEADER]] ]

llvm/test/Transforms/LoopVectorize/pr51614-fold-tail-by-masking.ll

	; CHECK-NEXT: br i1 false, label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]			; CHECK-NEXT: br i1 false, label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
	; CHECK: vector.ph:			; CHECK: vector.ph:
	; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]			; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]
	; CHECK: vector.body:			; CHECK: vector.body:
	; CHECK-NEXT: [[INDEX:%.*]] = phi i32 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[PRED_LOAD_CONTINUE4:%.*]] ]			; CHECK-NEXT: [[INDEX:%.*]] = phi i32 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[PRED_LOAD_CONTINUE4:%.*]] ]
	; CHECK-NEXT: [[VEC_PHI:%.*]] = phi <2 x i16> [ zeroinitializer, [[VECTOR_PH]] ], [ [[TMP25:%.*]], [[PRED_LOAD_CONTINUE4]] ]			; CHECK-NEXT: [[VEC_PHI:%.*]] = phi <2 x i16> [ zeroinitializer, [[VECTOR_PH]] ], [ [[TMP25:%.*]], [[PRED_LOAD_CONTINUE4]] ]
	; CHECK-NEXT: [[TMP0:%.*]] = trunc i32 [[INDEX]] to i16			; CHECK-NEXT: [[TMP0:%.*]] = trunc i32 [[INDEX]] to i16
	; CHECK-NEXT: [[OFFSET_IDX:%.*]] = sub i16 41, [[TMP0]]			; CHECK-NEXT: [[OFFSET_IDX:%.*]] = sub i16 41, [[TMP0]]
	; CHECK-NEXT: [[BROADCAST_SPLATINSERT:%.*]] = insertelement <2 x i16> poison, i16 [[OFFSET_IDX]], i32 0
	; CHECK-NEXT: [[BROADCAST_SPLAT:%.*]] = shufflevector <2 x i16> [[BROADCAST_SPLATINSERT]], <2 x i16> poison, <2 x i32> zeroinitializer
	; CHECK-NEXT: [[INDUCTION:%.*]] = add <2 x i16> [[BROADCAST_SPLAT]], <i16 0, i16 -1>
	; CHECK-NEXT: [[BROADCAST_SPLATINSERT1:%.*]] = insertelement <2 x i32> poison, i32 [[INDEX]], i32 0			; CHECK-NEXT: [[BROADCAST_SPLATINSERT1:%.*]] = insertelement <2 x i32> poison, i32 [[INDEX]], i32 0
	; CHECK-NEXT: [[BROADCAST_SPLAT2:%.*]] = shufflevector <2 x i32> [[BROADCAST_SPLATINSERT1]], <2 x i32> poison, <2 x i32> zeroinitializer			; CHECK-NEXT: [[BROADCAST_SPLAT2:%.*]] = shufflevector <2 x i32> [[BROADCAST_SPLATINSERT1]], <2 x i32> poison, <2 x i32> zeroinitializer
	; CHECK-NEXT: [[VEC_IV:%.*]] = add <2 x i32> [[BROADCAST_SPLAT2]], <i32 0, i32 1>			; CHECK-NEXT: [[VEC_IV:%.*]] = add <2 x i32> [[BROADCAST_SPLAT2]], <i32 0, i32 1>
	; CHECK-NEXT: [[TMP1:%.*]] = icmp ule <2 x i32> [[VEC_IV]], <i32 40, i32 40>			; CHECK-NEXT: [[TMP1:%.*]] = icmp ule <2 x i32> [[VEC_IV]], <i32 40, i32 40>
	; CHECK-NEXT: [[TMP2:%.*]] = extractelement <2 x i1> [[TMP1]], i32 0			; CHECK-NEXT: [[TMP2:%.*]] = extractelement <2 x i1> [[TMP1]], i32 0
	; CHECK-NEXT: br i1 [[TMP2]], label [[PRED_LOAD_IF:%.*]], label [[PRED_LOAD_CONTINUE:%.*]]			; CHECK-NEXT: br i1 [[TMP2]], label [[PRED_LOAD_IF:%.*]], label [[PRED_LOAD_CONTINUE:%.*]]
	; CHECK: pred.load.if:			; CHECK: pred.load.if:
	; CHECK-NEXT: [[TMP3:%.*]] = add i16 [[OFFSET_IDX]], 0			; CHECK-NEXT: [[TMP3:%.*]] = add i16 [[OFFSET_IDX]], 0
	; CHECK-NEXT: [[TMP4:%.*]] = add nsw i16 [[TMP3]], -1			; CHECK-NEXT: [[TMP4:%.*]] = add nsw i16 [[TMP3]], -1
	; CHECK-NEXT: [[TMP5:%.*]] = getelementptr inbounds [40 x [4 x i16]], [40 x [4 x i16]]* @A, i16 0, i16 [[TMP4]], i16 0			; CHECK-NEXT: [[TMP5:%.*]] = getelementptr inbounds [40 x [4 x i16]], [40 x [4 x i16]]* @A, i16 0, i16 [[TMP4]], i16 0
	; CHECK-NEXT: [[TMP6:%.*]] = load i16, i16* [[TMP5]], align 1			; CHECK-NEXT: [[TMP6:%.*]] = load i16, i16* [[TMP5]], align 1
	; CHECK-NEXT: [[TMP7:%.*]] = insertelement <2 x i16> poison, i16 [[TMP6]], i32 0			; CHECK-NEXT: [[TMP7:%.*]] = insertelement <2 x i16> poison, i16 [[TMP6]], i32 0
	; CHECK-NEXT: [[TMP8:%.*]] = getelementptr inbounds [40 x [4 x i16]], [40 x [4 x i16]]* @A, i16 0, i16 [[TMP4]], i16 3			; CHECK-NEXT: [[TMP8:%.*]] = getelementptr inbounds [40 x [4 x i16]], [40 x [4 x i16]]* @A, i16 0, i16 [[TMP4]], i16 3
	; CHECK-NEXT: [[TMP9:%.*]] = load i16, i16* [[TMP8]], align 1			; CHECK-NEXT: [[TMP9:%.*]] = load i16, i16* [[TMP8]], align 1
	; CHECK-NEXT: [[TMP10:%.*]] = insertelement <2 x i16> poison, i16 [[TMP9]], i32 0			; CHECK-NEXT: [[TMP10:%.*]] = insertelement <2 x i16> poison, i16 [[TMP9]], i32 0
	; CHECK-NEXT: br label [[PRED_LOAD_CONTINUE]]			; CHECK-NEXT: br label [[PRED_LOAD_CONTINUE]]
	; CHECK: pred.load.continue:			; CHECK: pred.load.continue:
	; CHECK-NEXT: [[TMP11:%.*]] = phi <2 x i16> [ poison, [[VECTOR_BODY]] ], [ [[TMP7]], [[PRED_LOAD_IF]] ]			; CHECK-NEXT: [[TMP11:%.*]] = phi <2 x i16> [ poison, [[VECTOR_BODY]] ], [ [[TMP7]], [[PRED_LOAD_IF]] ]
	; CHECK-NEXT: [[TMP12:%.*]] = phi <2 x i16> [ poison, [[VECTOR_BODY]] ], [ [[TMP10]], [[PRED_LOAD_IF]] ]			; CHECK-NEXT: [[TMP12:%.*]] = phi <2 x i16> [ poison, [[VECTOR_BODY]] ], [ [[TMP10]], [[PRED_LOAD_IF]] ]
	; CHECK-NEXT: [[TMP13:%.*]] = extractelement <2 x i1> [[TMP1]], i32 1			; CHECK-NEXT: [[TMP13:%.*]] = extractelement <2 x i1> [[TMP1]], i32 1
	; CHECK-NEXT: br i1 [[TMP13]], label [[PRED_LOAD_IF3:%.*]], label [[PRED_LOAD_CONTINUE4]]			; CHECK-NEXT: br i1 [[TMP13]], label [[PRED_LOAD_IF3:%.*]], label [[PRED_LOAD_CONTINUE4]]
	; CHECK: pred.load.if3:			; CHECK: pred.load.if1:
	; CHECK-NEXT: [[TMP14:%.*]] = add i16 [[OFFSET_IDX]], -1			; CHECK-NEXT: [[TMP14:%.*]] = add i16 [[OFFSET_IDX]], -1
	; CHECK-NEXT: [[TMP15:%.*]] = add nsw i16 [[TMP14]], -1			; CHECK-NEXT: [[TMP15:%.*]] = add nsw i16 [[TMP14]], -1
	; CHECK-NEXT: [[TMP16:%.*]] = getelementptr inbounds [40 x [4 x i16]], [40 x [4 x i16]]* @A, i16 0, i16 [[TMP15]], i16 0			; CHECK-NEXT: [[TMP16:%.*]] = getelementptr inbounds [40 x [4 x i16]], [40 x [4 x i16]]* @A, i16 0, i16 [[TMP15]], i16 0
	; CHECK-NEXT: [[TMP17:%.*]] = load i16, i16* [[TMP16]], align 1			; CHECK-NEXT: [[TMP17:%.*]] = load i16, i16* [[TMP16]], align 1
	; CHECK-NEXT: [[TMP18:%.*]] = insertelement <2 x i16> [[TMP11]], i16 [[TMP17]], i32 1			; CHECK-NEXT: [[TMP18:%.*]] = insertelement <2 x i16> [[TMP11]], i16 [[TMP17]], i32 1
	; CHECK-NEXT: [[TMP19:%.*]] = getelementptr inbounds [40 x [4 x i16]], [40 x [4 x i16]]* @A, i16 0, i16 [[TMP15]], i16 3			; CHECK-NEXT: [[TMP19:%.*]] = getelementptr inbounds [40 x [4 x i16]], [40 x [4 x i16]]* @A, i16 0, i16 [[TMP15]], i16 3
	; CHECK-NEXT: [[TMP20:%.*]] = load i16, i16* [[TMP19]], align 1			; CHECK-NEXT: [[TMP20:%.*]] = load i16, i16* [[TMP19]], align 1
	; CHECK-NEXT: [[TMP21:%.*]] = insertelement <2 x i16> [[TMP12]], i16 [[TMP20]], i32 1			; CHECK-NEXT: [[TMP21:%.*]] = insertelement <2 x i16> [[TMP12]], i16 [[TMP20]], i32 1
	; CHECK-NEXT: br label [[PRED_LOAD_CONTINUE4]]			; CHECK-NEXT: br label [[PRED_LOAD_CONTINUE4]]
	; CHECK: pred.load.continue4:			; CHECK: pred.load.continue2:
	; CHECK-NEXT: [[TMP22:%.*]] = phi <2 x i16> [ [[TMP11]], [[PRED_LOAD_CONTINUE]] ], [ [[TMP18]], [[PRED_LOAD_IF3]] ]			; CHECK-NEXT: [[TMP22:%.*]] = phi <2 x i16> [ [[TMP11]], [[PRED_LOAD_CONTINUE]] ], [ [[TMP18]], [[PRED_LOAD_IF3]] ]
	; CHECK-NEXT: [[TMP23:%.*]] = phi <2 x i16> [ [[TMP12]], [[PRED_LOAD_CONTINUE]] ], [ [[TMP21]], [[PRED_LOAD_IF3]] ]			; CHECK-NEXT: [[TMP23:%.*]] = phi <2 x i16> [ [[TMP12]], [[PRED_LOAD_CONTINUE]] ], [ [[TMP21]], [[PRED_LOAD_IF3]] ]
	; CHECK-NEXT: [[TMP24:%.*]] = add nsw <2 x i16> [[TMP22]], [[TMP23]]			; CHECK-NEXT: [[TMP24:%.*]] = add nsw <2 x i16> [[TMP22]], [[TMP23]]
	; CHECK-NEXT: [[TMP25]] = add <2 x i16> [[VEC_PHI]], [[TMP24]]			; CHECK-NEXT: [[TMP25]] = add <2 x i16> [[VEC_PHI]], [[TMP24]]
	; CHECK-NEXT: [[TMP26:%.*]] = select <2 x i1> [[TMP1]], <2 x i16> [[TMP25]], <2 x i16> [[VEC_PHI]]			; CHECK-NEXT: [[TMP26:%.*]] = select <2 x i1> [[TMP1]], <2 x i16> [[TMP25]], <2 x i16> [[VEC_PHI]]
	; CHECK-NEXT: [[INDEX_NEXT]] = add i32 [[INDEX]], 2			; CHECK-NEXT: [[INDEX_NEXT]] = add i32 [[INDEX]], 2
	; CHECK-NEXT: [[TMP27:%.*]] = icmp eq i32 [[INDEX_NEXT]], 42			; CHECK-NEXT: [[TMP27:%.*]] = icmp eq i32 [[INDEX_NEXT]], 42
	; CHECK-NEXT: br i1 [[TMP27]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP0:![0-9]+]]			; CHECK-NEXT: br i1 [[TMP27]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP0:![0-9]+]]

llvm/test/Transforms/LoopVectorize/select-reduction.ll

	; CHECK-NEXT: [[TRIP_COUNT_MINUS_1:%.*]] = sub i64 [[EXTRA_ITER]], 1			; CHECK-NEXT: [[TRIP_COUNT_MINUS_1:%.*]] = sub i64 [[EXTRA_ITER]], 1
	; CHECK-NEXT: [[BROADCAST_SPLATINSERT:%.*]] = insertelement <4 x i64> poison, i64 [[TRIP_COUNT_MINUS_1]], i32 0			; CHECK-NEXT: [[BROADCAST_SPLATINSERT:%.*]] = insertelement <4 x i64> poison, i64 [[TRIP_COUNT_MINUS_1]], i32 0
	; CHECK-NEXT: [[BROADCAST_SPLAT:%.*]] = shufflevector <4 x i64> [[BROADCAST_SPLATINSERT]], <4 x i64> poison, <4 x i32> zeroinitializer			; CHECK-NEXT: [[BROADCAST_SPLAT:%.*]] = shufflevector <4 x i64> [[BROADCAST_SPLATINSERT]], <4 x i64> poison, <4 x i32> zeroinitializer
	; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]			; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]
	; CHECK: vector.body:			; CHECK: vector.body:
	; CHECK-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]			; CHECK-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
	; CHECK-NEXT: [[VEC_PHI:%.*]] = phi <4 x i32> [ zeroinitializer, [[VECTOR_PH]] ], [ [[TMP3:%.*]], [[VECTOR_BODY]] ]			; CHECK-NEXT: [[VEC_PHI:%.*]] = phi <4 x i32> [ zeroinitializer, [[VECTOR_PH]] ], [ [[TMP3:%.*]], [[VECTOR_BODY]] ]
	; CHECK-NEXT: [[OFFSET_IDX:%.*]] = sub i64 [[EXTRA_ITER]], [[INDEX]]			; CHECK-NEXT: [[OFFSET_IDX:%.*]] = sub i64 [[EXTRA_ITER]], [[INDEX]]
	; CHECK-NEXT: [[BROADCAST_SPLATINSERT1:%.*]] = insertelement <4 x i64> poison, i64 [[OFFSET_IDX]], i32 0
	; CHECK-NEXT: [[BROADCAST_SPLAT2:%.*]] = shufflevector <4 x i64> [[BROADCAST_SPLATINSERT1]], <4 x i64> poison, <4 x i32> zeroinitializer
	; CHECK-NEXT: [[INDUCTION:%.*]] = add <4 x i64> [[BROADCAST_SPLAT2]], <i64 0, i64 -1, i64 -2, i64 -3>
	; CHECK-NEXT: [[TMP0:%.*]] = add i64 [[OFFSET_IDX]], 0			; CHECK-NEXT: [[TMP0:%.*]] = add i64 [[OFFSET_IDX]], 0
	; CHECK-NEXT: [[BROADCAST_SPLATINSERT3:%.*]] = insertelement <4 x i64> poison, i64 [[INDEX]], i32 0			; CHECK-NEXT: [[BROADCAST_SPLATINSERT3:%.*]] = insertelement <4 x i64> poison, i64 [[INDEX]], i32 0
	; CHECK-NEXT: [[BROADCAST_SPLAT4:%.*]] = shufflevector <4 x i64> [[BROADCAST_SPLATINSERT3]], <4 x i64> poison, <4 x i32> zeroinitializer			; CHECK-NEXT: [[BROADCAST_SPLAT4:%.*]] = shufflevector <4 x i64> [[BROADCAST_SPLATINSERT3]], <4 x i64> poison, <4 x i32> zeroinitializer
	; CHECK-NEXT: [[VEC_IV:%.*]] = add <4 x i64> [[BROADCAST_SPLAT4]], <i64 0, i64 1, i64 2, i64 3>			; CHECK-NEXT: [[VEC_IV:%.*]] = add <4 x i64> [[BROADCAST_SPLAT4]], <i64 0, i64 1, i64 2, i64 3>
	; CHECK-NEXT: [[TMP1:%.*]] = icmp ule <4 x i64> [[VEC_IV]], [[BROADCAST_SPLAT]]			; CHECK-NEXT: [[TMP1:%.*]] = icmp ule <4 x i64> [[VEC_IV]], [[BROADCAST_SPLAT]]
	; CHECK-NEXT: [[TMP2:%.*]] = icmp sgt <4 x i32> [[VEC_PHI]], <i32 10, i32 10, i32 10, i32 10>			; CHECK-NEXT: [[TMP2:%.*]] = icmp sgt <4 x i32> [[VEC_PHI]], <i32 10, i32 10, i32 10, i32 10>
	; CHECK-NEXT: [[TMP3]] = select <4 x i1> [[TMP2]], <4 x i32> [[VEC_PHI]], <4 x i32> <i32 10, i32 10, i32 10, i32 10>			; CHECK-NEXT: [[TMP3]] = select <4 x i1> [[TMP2]], <4 x i32> [[VEC_PHI]], <4 x i32> <i32 10, i32 10, i32 10, i32 10>
	; CHECK-NEXT: [[TMP4:%.*]] = select <4 x i1> [[TMP1]], <4 x i32> [[TMP3]], <4 x i32> [[VEC_PHI]]			; CHECK-NEXT: [[TMP4:%.*]] = select <4 x i1> [[TMP1]], <4 x i32> [[TMP3]], <4 x i32> [[VEC_PHI]]

llvm/test/Transforms/LoopVectorize/tail-folding-vectorization-factor-1.ll

	; CHECK-NEXT: [[INDUCTION2:%.*]] = add i64 [[INDEX]], 2			; CHECK-NEXT: [[INDUCTION2:%.*]] = add i64 [[INDEX]], 2
	; CHECK-NEXT: [[INDUCTION3:%.*]] = add i64 [[INDEX]], 3			; CHECK-NEXT: [[INDUCTION3:%.*]] = add i64 [[INDEX]], 3
	; CHECK-NEXT: [[TMP0:%.*]] = icmp ule i64 [[INDUCTION]], 14			; CHECK-NEXT: [[TMP0:%.*]] = icmp ule i64 [[INDUCTION]], 14
	; CHECK-NEXT: [[TMP1:%.*]] = icmp ule i64 [[INDUCTION1]], 14			; CHECK-NEXT: [[TMP1:%.*]] = icmp ule i64 [[INDUCTION1]], 14
	; CHECK-NEXT: [[TMP2:%.*]] = icmp ule i64 [[INDUCTION2]], 14			; CHECK-NEXT: [[TMP2:%.*]] = icmp ule i64 [[INDUCTION2]], 14
	; CHECK-NEXT: [[TMP3:%.*]] = icmp ule i64 [[INDUCTION3]], 14			; CHECK-NEXT: [[TMP3:%.*]] = icmp ule i64 [[INDUCTION3]], 14
	; CHECK-NEXT: br i1 [[TMP0]], label [[PRED_STORE_IF:%.*]], label [[PRED_STORE_CONTINUE:%.*]]			; CHECK-NEXT: br i1 [[TMP0]], label [[PRED_STORE_IF:%.*]], label [[PRED_STORE_CONTINUE:%.*]]
	; CHECK: pred.store.if:			; CHECK: pred.store.if:
	; CHECK-NEXT: [[TMP4:%.*]] = getelementptr inbounds i32, i32* [[DST:%.*]], i64 [[INDUCTION]]			; CHECK-NEXT: [[SUNK_IND0:%.*]] = add i64 [[INDEX]], 0
				; CHECK-NEXT: [[TMP4:%.*]] = getelementptr inbounds i32, i32* [[DST:%.*]], i64 [[SUNK_IND0]]
	; CHECK-NEXT: store i32 0, i32* [[TMP4]], align 4			; CHECK-NEXT: store i32 0, i32* [[TMP4]], align 4
	; CHECK-NEXT: br label [[PRED_STORE_CONTINUE]]			; CHECK-NEXT: br label [[PRED_STORE_CONTINUE]]
	; CHECK: pred.store.continue:			; CHECK: pred.store.continue:
	; CHECK-NEXT: br i1 [[TMP1]], label [[PRED_STORE_IF4:%.*]], label [[PRED_STORE_CONTINUE5:%.*]]			; CHECK-NEXT: br i1 [[TMP1]], label [[PRED_STORE_IF4:%.*]], label [[PRED_STORE_CONTINUE5:%.*]]
	; CHECK: pred.store.if4:			; CHECK: pred.store.if7:
	; CHECK-NEXT: [[TMP5:%.*]] = getelementptr inbounds i32, i32* [[DST]], i64 [[INDUCTION1]]			; CHECK-NEXT: [[SUNK_IND1:%.*]] = add i64 [[INDEX]], 1
				; CHECK-NEXT: [[TMP5:%.*]] = getelementptr inbounds i32, i32* [[DST]], i64 [[SUNK_IND1]]
	; CHECK-NEXT: store i32 0, i32* [[TMP5]], align 4			; CHECK-NEXT: store i32 0, i32* [[TMP5]], align 4
	; CHECK-NEXT: br label [[PRED_STORE_CONTINUE5]]			; CHECK-NEXT: br label [[PRED_STORE_CONTINUE5]]
	; CHECK: pred.store.continue5:			; CHECK: pred.store.continue8:
	; CHECK-NEXT: br i1 [[TMP2]], label [[PRED_STORE_IF6:%.*]], label [[PRED_STORE_CONTINUE7:%.*]]			; CHECK-NEXT: br i1 [[TMP2]], label [[PRED_STORE_IF6:%.*]], label [[PRED_STORE_CONTINUE7:%.*]]
	; CHECK: pred.store.if6:			; CHECK: pred.store.if9:
	; CHECK-NEXT: [[TMP6:%.*]] = getelementptr inbounds i32, i32* [[DST]], i64 [[INDUCTION2]]			; CHECK-NEXT: [[SUNK_IND2:%.*]] = add i64 [[INDEX]], 2
				; CHECK-NEXT: [[TMP6:%.*]] = getelementptr inbounds i32, i32* [[DST]], i64 [[SUNK_IND2]]
	; CHECK-NEXT: store i32 0, i32* [[TMP6]], align 4			; CHECK-NEXT: store i32 0, i32* [[TMP6]], align 4
	; CHECK-NEXT: br label [[PRED_STORE_CONTINUE7]]			; CHECK-NEXT: br label [[PRED_STORE_CONTINUE7]]
	; CHECK: pred.store.continue7:			; CHECK: pred.store.continue10:
	; CHECK-NEXT: br i1 [[TMP3]], label [[PRED_STORE_IF8:%.*]], label [[PRED_STORE_CONTINUE9]]			; CHECK-NEXT: br i1 [[TMP3]], label [[PRED_STORE_IF8:%.*]], label [[PRED_STORE_CONTINUE9]]
	; CHECK: pred.store.if8:			; CHECK: pred.store.if11:
	; CHECK-NEXT: [[TMP7:%.*]] = getelementptr inbounds i32, i32* [[DST]], i64 [[INDUCTION3]]			; CHECK-NEXT: [[SUNK_IND3:%.*]] = add i64 [[INDEX]], 3
				; CHECK-NEXT: [[TMP7:%.*]] = getelementptr inbounds i32, i32* [[DST]], i64 [[SUNK_IND3]]
	; CHECK-NEXT: store i32 0, i32* [[TMP7]], align 4			; CHECK-NEXT: store i32 0, i32* [[TMP7]], align 4
	; CHECK-NEXT: br label [[PRED_STORE_CONTINUE9]]			; CHECK-NEXT: br label [[PRED_STORE_CONTINUE9]]
	; CHECK: pred.store.continue9:			; CHECK: pred.store.continue12:
	; CHECK-NEXT: [[INDEX_NEXT]] = add i64 [[INDEX]], 4			; CHECK-NEXT: [[INDEX_NEXT]] = add i64 [[INDEX]], 4
	; CHECK-NEXT: [[TMP8:%.*]] = icmp eq i64 [[INDEX_NEXT]], 16			; CHECK-NEXT: [[TMP8:%.*]] = icmp eq i64 [[INDEX_NEXT]], 16
	; CHECK-NEXT: br i1 [[TMP8]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP0:![0-9]+]]			; CHECK-NEXT: br i1 [[TMP8]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP0:![0-9]+]]
	; CHECK: middle.block:			; CHECK: middle.block:
	; CHECK-NEXT: br i1 true, label [[FOR_COND_CLEANUP:%.*]], label [[SCALAR_PH]]			; CHECK-NEXT: br i1 true, label [[FOR_COND_CLEANUP:%.*]], label [[SCALAR_PH]]
	; CHECK: scalar.ph:			; CHECK: scalar.ph:
	; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ 16, [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]			; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ 16, [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]
	; CHECK-NEXT: br label [[FOR_BODY:%.*]]			; CHECK-NEXT: br label [[FOR_BODY:%.*]]
