As Vector Predication intrinsics are being introduced in LLVM, we propose extending the Loop Vectorizer to target these intrinsics. SIMD ISAs with active vector length predication support, such as the RISC-V V-extension, NEC SX-Aurora, and Power VSX, can especially benefit from this, since there is currently no reasonable way in the IR to model the active vector length of vector instructions.
ISAs with masked vector predication support, such as AVX512 and ARM SVE, would benefit by being able to predicate operations beyond just the memory operations currently covered by the masked load/store/gather/scatter intrinsics.
This patch is a proof-of-concept implementation that demonstrates the Loop Vectorizer generating VP intrinsics for simple integer operations on fixed-width vectors.
Details and Strategy
Currently the Loop Vectorizer supports vector predication in a very limited capacity, via tail-folding and the masked load/store/gather/scatter intrinsics. However, this does not let architectures with active vector length predication take advantage of their capabilities, and architectures with general masked predication can apply predication only to memory operations. By giving the Loop Vectorizer a way to generate Vector Predication intrinsics, which (will) provide a target-independent way to model predicated vector instructions, these architectures can make better use of their predication capabilities.
Our first approach (implemented in this patch) builds on top of the existing tail-folding mechanism in the LV, but instead of generating masked intrinsics for memory operations it generates VP intrinsics for all arithmetic operations as well.
The other important part of this approach is how the Explicit Vector Length is computed. (We use "active vector length" and "explicit vector length" interchangeably; the VP intrinsics define this vector length parameter as the Explicit Vector Length (EVL).) We consider the following three ways to compute the EVL parameter for the VP intrinsics.
- The simplest way is to use the VF as EVL and rely solely on the mask parameter to control predication. The mask parameter is the same as computed for current tail-folding implementation.
- The second way is to insert instructions to compute min(VF, trip_count - index) for each vector iteration.
- For architectures like RISC-V, which have a special instruction to compute/set an explicit vector length, we also introduce an experimental intrinsic, set_vector_length, that can be lowered to architecture-specific instruction(s) to compute the EVL.
For the last two ways, if there is no outer mask, we use an all-true boolean vector for the mask parameter of the VP intrinsics. (We do not yet support control flow in the loop body.)
We have also extended VPlan to add new recipes for PREDICATED-WIDENING of arithmetic operations and memory operations, and a recipe to emit instructions for computing EVL. Using VPlan in this way will eventually help build and compare VPlans corresponding to different strategies and alternatives.
Alternate vectorization strategies with predication
Besides the tail-folding based vectorization strategy, we are considering two other vectorization strategies (not yet implemented):
- Non-predicated body followed by a predicated vectorized tail - This generates a vector body without any predication (except for control flow), the same as the existing approach of a vector body with a scalar tail loop. The tail, however, is vectorized using the VP intrinsics with EVL = trip_count % VF. While this approach results in larger code size, it might be more efficient than our currently implemented approach: the tail is straight-line code, and the vector body is free of all the overhead of using the intrinsics.
- Another strategy could be to use tail-folding based approach but use predication only for memory operations. This might be beneficial for architectures like Power VSX that support vector length predication only for memory operations.
Caveats / Current limitations / Current status of things
This patch is far from complete; it is meant as a proof of concept (and will be broken into smaller, more concrete patches once we have more feedback from the community) with the aim to:
- demonstrate the feasibility of the Loop Vectorizer to target VP intrinsics.
- start a deeper implementation-backed discussion around vector predication support in LLVM.
That being said, there are several limitations at the moment; some need more supporting implementation and some need more discussion:
- For the purpose of demonstration, VP intrinsic support is forced via a command line switch, and it requires tail-folding to be enabled in order to work.
- VP intrinsic development is going on in parallel; upstream currently supports only the integer arithmetic intrinsics.
- No support for control flow in the loop.
- No support for interleaving.
- We need more discussion around the best approach for computing the EVL parameter. If an intrinsic is used, more thought needs to go into its semantics. Also, the VPlan recipe for the EVL is currently a dummy recipe, with the widening delegated to the vectorizer.
- We also do not use the get.active.lane.mask intrinsic yet, but it is something we consider for the future.
- No support for scalable vectors yet (due to missing support for tail folding with scalable vectors).
Note: If you are interested in how this may work end-to-end for scalable vectors, take a look at our downstream implementation for RISC-V [ RVV-Impl ] and an end-to-end demo on the Godbolt compiler explorer [ Demo ].
Note: This patch also includes our own implementation of the vp_load and vp_store intrinsics. There is currently a more complete patch [ D99355 ] open for review, which we will switch to once it lands.
Tentative Development Roadmap
Our plan is to start by integrating the functionality in this patch, with the changes/enhancements agreed upon by the community. As next steps, we want to:
- Support VP intrinsics for vectorization of scalable vectors (starting with enabling tail folding for scalable vectors, if that is still required by then).
- Support for floating point operations.
- Support for control flow in the loop.
- Support for more complicated loops - reductions, inductions, recurrences, reverse.