This is an archive of the discontinued LLVM Phabricator instance.

Differential D99750

[LV, VP]VP intrinsics support for the Loop Vectorizer
Needs Review, Public

Authored by ABataev on Apr 1 2021, 10:32 AM.

Details

Reviewers

rogfer01
simoll
sdesmalen
dmgreen
craig.topper
bmahjour
fhahn
hussainjk
cameron.mcinally
vkmr
reames
Ayal
evandro

Summary

This patch introduces generating VP intrinsics in the Loop Vectorizer.

Currently the Loop Vectorizer supports vector predication in a very limited capacity via tail-folding and masked load/store/gather/scatter intrinsics. However, this does not let architectures with active vector length predication support take advantage of their capabilities. Architectures with general masked predication support can also only take advantage of predication on memory operations. By having a way for the Loop Vectorizer to generate Vector Predication intrinsics, which (will) provide a target-independent way to model predicated vector instructions, these architectures can make better use of their predication capabilities.

Our first approach (implemented in this patch) builds on top of the existing tail-folding mechanism in the LV, but instead of generating masked intrinsics for memory operations it generates VP intrinsics for load/store instructions.

Another important part of this approach is how the Explicit Vector Length is computed. (We use active vector length and explicit vector length interchangeably; VP intrinsics define this vector length parameter as Explicit Vector Length (EVL).) We consider the following three ways to compute the EVL parameter for the VP intrinsics.

1. The simplest way is to use the VF as EVL and rely solely on the mask parameter to control predication. The mask parameter is the same as the one computed for the current tail-folding implementation.
2. The second way is to insert instructions to compute min(VF, trip_count - index) for each vector iteration.
3. For architectures like RISC-V, which have a special instruction to compute/set an explicit vector length, we also introduce an experimental intrinsic get_vector_length, which can be lowered to architecture-specific instruction(s) to compute EVL.
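As a rough illustration of the first two ways, here is a scalar C++ model of a tail-folded loop. It is only a sketch under stated assumptions: the names `evlFromVF`, `evlFromMin` and `runWithEVL` are hypothetical and do not appear in the patch, the inner loop merely stands in for VP load/add/store operations, and the intrinsic-based third way is not modeled.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Way 1 (hypothetical helper): EVL is always VF; the mask alone would have
// to disable the tail lanes.
inline uint64_t evlFromVF(uint64_t VF) { return VF; }

// Way 2 (hypothetical helper): EVL = min(VF, trip_count - index).
inline uint64_t evlFromMin(uint64_t VF, uint64_t TripCount, uint64_t Index) {
  return std::min(VF, TripCount - Index);
}

// Scalar model of `for (i = 0; i < N; ++i) Dst[i] = Src[i] + 1` driven by
// way 2. Returns the EVL used in each vector iteration.
inline std::vector<uint64_t> runWithEVL(const std::vector<int> &Src,
                                        std::vector<int> &Dst, uint64_t VF) {
  std::vector<uint64_t> EVLs;
  const uint64_t TripCount = Src.size();
  Dst.assign(TripCount, 0);
  for (uint64_t Index = 0; Index < TripCount;) {
    const uint64_t EVL = evlFromMin(VF, TripCount, Index);
    // Stand-in for the VP memory and arithmetic operations: only the first
    // EVL lanes do any work.
    for (uint64_t Lane = 0; Lane < EVL; ++Lane)
      Dst[Index + Lane] = Src[Index + Lane] + 1;
    EVLs.push_back(EVL);
    Index += EVL;
  }
  return EVLs;
}
```

With 10 elements and VF = 4, this model uses full vectors for the first two iterations and a shorter vector for the tail, without any per-lane mask.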

We also added a new recipe to emit instructions for computing EVL. Using VPlan in this way will eventually help build and compare VPlans corresponding to different strategies and alternatives.

Tentative Development Roadmap

Use vp-intrinsics for all possible vector operations. That work has 2 possible implementations:
1. Introduce a new pass which transforms emitted vector instructions to vp intrinsics if the loop was transformed to use predication for loads/stores. The advantage of this approach is that it does not require many changes in the loop vectorizer itself. The disadvantage is that it may require copying some existing functionality from the loop vectorizer in a separate patch, having similar code in different passes, and performing the same analysis at least twice.
2. Extend Loop Vectorizer using VectorBuilder and make it emit vp intrinsics automatically in presence of EVL value. The advantage is that it does not require a separate pass, thus it may reduce compile time. Plus, we can avoid code duplication. It requires some extra work in the LoopVectorizer to add VectorBuilder support and smart vector instructions/vp intrinsics emission. Also, to fully support Loop Vectorizer it will require adding a new PHI recipe to handle EVL on the previous iteration + extending several existing recipes with the new operands (depends on the design).
Switch to vp-intrinsics for memory operations for VLS and VLA vectorizations.

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

There are a very large number of changes, so older changes are hidden.

Rebase, fix issue with exit condition

Harbormaster completed remote builds in B257753: Diff 557587. Oct 4 2023, 9:44 AM

Rebase

Harbormaster completed remote builds in B257817: Diff 557696. Oct 12 2023, 8:41 PM

Thanks for the latest update! It looks like some of the unit tests (check-llvm-unit) are failing (maybe related to the verifier changes?), would be good to check and update them.

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
6338	nit: if EVL is preferred?
9608	nit: `EC` unused?
9613	nit: `If EVLPart`
llvm/lib/Transforms/Vectorize/VPlan.h
2144 ↗	(On Diff #557696)	`VPExplicitVectorLengthExitPHISC`?
2155 ↗	(On Diff #557696)	nit: active lane mask phi?
llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp
1725 ↗	(On Diff #557696)	nit: VPWidenPHIRecipe implies vector phi, but this is a scalar phi.
llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
966 ↗	(On Diff #557696)	I think using `Exit` condition may be confusing, it replaces the predicate for the header mask.
1026 ↗	(On Diff #557696)	nit: `Entry->Header`
1028 ↗	(On Diff #557696)	How does ActiveLaneMask factor in here?
1037 ↗	(On Diff #557696)	Would be good to describe here what recipes are added and what's changed.
1039 ↗	(On Diff #557696)	Simpler to first get the canonical IV via `getCanonicalIV` and use it to access its increment?
1055 ↗	(On Diff #557696)	IIUC this introduces `EVLExitPhi` and uses it for the canonical IV increment only, which controls the exit and adjusts the canonical IV. Would it be possible to do it the other way around, i.e. keep the canonical induction incremented by the canonical IV increment (thus keeping it canonical), and instead have `EVLExitPhi` updated by `ExplicitVectorLengthIVIncrement`, and updating the relevant users (all except the canonical increment?) to use that one?
832 ↗	(On Diff #555150)	Thanks for clarifying! Does the current codegen in the patch work correctly for cases where we execute more than 1 iteration for `EVL < VF`? IIUC the current approach with rounding up the trip count and using VF as increment assumes only one extra iteration.
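The concern in the last two comments can be made concrete with a toy scalar model. This is a sketch under stated assumptions, not the patch's codegen: `EVLPolicy`, `elementsProcessed`, `minPolicy` and `halfPolicy` are hypothetical names. The canonical IV advances by VF and alone controls the exit against the trip count rounded up to a multiple of VF, while a separate EVL-based IV advances by whatever EVL the policy returns.

```cpp
#include <algorithm>
#include <cstdint>
#include <functional>

// Hypothetical EVL policy: given the remaining element count and VF, return
// how many elements the next vector iteration processes.
using EVLPolicy = std::function<uint64_t(uint64_t Remaining, uint64_t VF)>;

// The canonical IV steps by VF and decides when the loop exits; the
// EVL-based IV steps by the EVL actually used. Returns the number of
// elements processed when the loop exits.
inline uint64_t elementsProcessed(uint64_t TripCount, uint64_t VF,
                                  const EVLPolicy &Policy) {
  const uint64_t RoundedTC = (TripCount + VF - 1) / VF * VF;
  uint64_t EVLBasedIV = 0;
  for (uint64_t CanonicalIV = 0; CanonicalIV != RoundedTC; CanonicalIV += VF) {
    const uint64_t Remaining = TripCount - EVLBasedIV;
    EVLBasedIV += Policy(Remaining, VF);
  }
  return EVLBasedIV;
}

// Policy equivalent to min(VF, remaining).
inline uint64_t minPolicy(uint64_t Remaining, uint64_t VF) {
  return std::min(Remaining, VF);
}

// Deliberately pessimistic policy: never more than half of VF per iteration.
inline uint64_t halfPolicy(uint64_t Remaining, uint64_t VF) {
  return std::min(Remaining, std::max<uint64_t>(VF / 2, 1));
}
```

In this model, `minPolicy` covers all 10 elements for VF = 4, whereas `halfPolicy` leaves the loop after three iterations with only 6 elements processed. That is the situation the question about more than one extra iteration is probing.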

ABataev added inline comments. Oct 16 2023, 4:08 AM

llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
832 ↗	(On Diff #555150)	The total number of iterations is the same, just the vector length changes by balancing the value. If EVL is less than VLMAX, EVL is used as vector length. Only if VLMAX < AVL < 2 * VLMAX some magic may happen, i.e. in last 2 (vectorized) iterations.
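The balancing described in this reply can be sketched as follows. The rule encoded in `balancedVL` is an assumption of this sketch, modeled on the RISC-V vsetvli constraints (when VLMAX < AVL < 2 * VLMAX, any vl between ceil(AVL / 2) and VLMAX is permitted, and the sketch picks ceil(AVL / 2)). The helper names are hypothetical.

```cpp
#include <cstdint>
#include <vector>

// Simplified vsetvli-style rule (an assumption of this sketch):
//   AVL <= VLMAX            -> vl = AVL
//   VLMAX < AVL < 2 * VLMAX -> vl = ceil(AVL / 2)  (one permitted choice)
//   AVL >= 2 * VLMAX        -> vl = VLMAX
inline uint64_t balancedVL(uint64_t AVL, uint64_t VLMAX) {
  if (AVL <= VLMAX)
    return AVL;
  if (AVL < 2 * VLMAX)
    return (AVL + 1) / 2;
  return VLMAX;
}

// Returns the vector length used in each vector iteration of a loop that
// processes TripCount elements.
inline std::vector<uint64_t> vectorLengths(uint64_t TripCount, uint64_t VLMAX) {
  std::vector<uint64_t> VLs;
  for (uint64_t Done = 0; Done < TripCount;) {
    const uint64_t VL = balancedVL(TripCount - Done, VLMAX);
    VLs.push_back(VL);
    Done += VL;
  }
  return VLs;
}
```

For 20 elements and VLMAX = 8, the model yields three iterations, the same count as ceil(20 / 8), with only the last two lengths differing from a plain min(VLMAX, remaining) split.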

Rebase, address comments

Harbormaster completed remote builds in B257861: Diff 557762. Oct 18 2023, 10:28 AM

fhahn added inline comments. Oct 19 2023, 7:53 AM

llvm/lib/Transforms/Vectorize/VPlan.h
2139 ↗	(On Diff #557762)	This is out of date now I think, as the CanonicalIV is now left unchanged and used for the countable exit condition. There might be a better name for the recipe, as the current one doesn't seem to capture the essence of the recipe, which effectively represents the current index of elements to process in the current iteration, by accounting for EVL.
llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
966 ↗	(On Diff #557762)	nit: `replaceHeaderPredicateWithIdiom`?
1027 ↗	(On Diff #557762)	nit: all uses except the canonical IV increment.
1028 ↗	(On Diff #557762)	nit: the only user is CanonicalIVIncrement I think, branch-on-count uses the increment.
1079 ↗	(On Diff #557762)	The only users remaining should be CanonicalIVIncrement. Simpler to just do the below?
      // Replace all uses of VPCanonicalIVPHIRecipe by
      // VPExplicitVectorLengthPHIRecipe except for
    - // VPInstruction::CanonicalIVIncrement and VPCanonicalIVPHIRecipe itself.
    - for (VPUser *U : to_vector(CanonicalIVPHI->users())) {
    -   if (auto *I = dyn_cast<VPInstruction>(U);
    -       I && I->getOpcode() == VPInstruction::CanonicalIVIncrement)
    -     continue;
    -   if (isa<VPCanonicalIVPHIRecipe>(U))
    -     continue;
    -   auto *UI = dyn_cast<VPRecipeBase>(U);
    -   if (!UI)
    -     continue;
    -   for (unsigned Idx = 0, E = UI->getNumOperands(); Idx < E; ++Idx)
    -     if (UI->getOperand(Idx) == CanonicalIVPHI)
    -       UI->setOperand(Idx, EVLPhi);
    - }
    - // Cleanup dead recipes after the transformation.
    - removeDeadRecipes(Plan);
    + // VPInstruction::CanonicalIVIncrement.
    + CanonicalIVPHI->replaceAllUsesWith(EVLPhi);
    + CanonicalIVIncrement->setOperand(0, CanonicalIVPHI);
1093 ↗	(On Diff #557762)	This shouldn't introduce any new dead recipes, unless the canonical IV is only used to control the exit, so IIUC the only dead recipes introduced here should be due to `replacePredicateWithIdiom`. May be good to either clean up the recipes there or move `addVectorPredication` after `optimizeInductions`, but that would require passing in an extra flag to `optimize()`, so maybe better to clean up the dead recipes directly after removing their users.
llvm/lib/Transforms/Vectorize/VPlanTransforms.h
83 ↗	(On Diff #557762)	nit: not all users, all users except the canonical IV increment, right?
87 ↗	(On Diff #557762)	nit: `addExplicitVectorLength`?

Rebase, address comments

Thanks for the update. Some more comments inline. Mostly small suggestions, but there's one question if masked mem ops are handled correctly and a clarification about active vs effective vector length.

llvm/lib/Target/RISCV/RISCVTargetTransformInfo.h
84 ↗	(On Diff #547860)	Just following up on this, should the name be changed in TTI? Do you know the reason for referring to it as active vector length there vs effective vector length in the patch?
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
9607–9608	Is the gather scatter case handled correctly for EVL at the moment?
llvm/lib/Transforms/Vectorize/VPlan.h
2140 ↗	(On Diff #557781)	Might be good to explicitly say that it starts at 0 and gets incremented by EVL in each iteration of the vector loop?
llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp
338 ↗	(On Diff #557781)	nit: `Compute EVL.` would be more accurate IIUC, nothing gets set in this function AFAICT.
371 ↗	(On Diff #557781)	might be good to have a test case where the induction is `i32` and no cast is needed.
1730 ↗	(On Diff #557781)	`evl.based.iv`?
1739 ↗	(On Diff #557781)	This still needs updating I think
llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
991 ↗	(On Diff #557781)	`VPWidenCanonicalIVRecipe` should inherit from VPValue via VPHeaderPHIRecipe, so it should only define a single value and `WideCanonicalIV->getNumUsers()` should be sufficient.
1028 ↗	(On Diff #557781)	needs updating with new name
1030 ↗	(On Diff #557781)	needs updating with new name
1062 ↗	(On Diff #557781)	name needs updating
1080 ↗	(On Diff #557781)	name needs updating
llvm/lib/Transforms/Vectorize/VPlanTransforms.h
84 ↗	(On Diff #557781)	`VPExplicitVectorLengthPHIRecipe` needs updating with new name
85 ↗	(On Diff #557781)	maybe `only used to control the loop` or something like that
llvm/lib/Transforms/Vectorize/VPlanVerifier.cpp
210 ↗	(On Diff #557781)	`Plan` here should never be `nullptr`

ABataev marked 13 inline comments as done. Oct 23 2023, 8:03 AM

ABataev added inline comments.

llvm/lib/Target/RISCV/RISCVTargetTransformInfo.h
84 ↗	(On Diff #547860)	We can do it later in a separate patch. EVL stands for Explicit vector length, TTI interface was introduced long before this patch. There are just different abbrevs for the same technique - Active Vector Length, Explicit Vector length, etc.
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
9607–9608	Added support for this.
llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp
371 ↗	(On Diff #557781)	Added @iv32 function in the test
llvm/lib/Transforms/Vectorize/VPlanVerifier.cpp
210 ↗	(On Diff #557781)	It can be nullptr in the unit tests, had to add this to avoid crashes in unit tests.

Rebase, address comments

Harbormaster completed remote builds in B257912: Diff 557842. Oct 23 2023, 10:40 AM

fhahn added inline comments. Oct 26 2023, 11:46 AM

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
9607–9608	Not sure if we also have a test case for this path, do you know if this would be handled correctly at the moment?
9607–9608	Great thanks! Now that there is VP intrinsic handling in multiple places, would it be better to handle all EVL related codegen together, i.e. something like below to avoid complicating reading the existing non-EVL code. WDYT?
    for ()
      Value *VectorGep = State.get(getAddr(), Part);
      if (Value *EVLPart = State.EVL ? State.get(State.EVL, Part) : nullptr) {
        NewSI = lowerUsingVectorIntrinsics(vectorGEP..)
      } else {
        existing code....
      }
llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
1089 ↗	(On Diff #557842)	Thinking a bit more about this, at the moment it is only safe to replace the mask with TrueMask for VPWidenMemoryInstructionRecipes; for other recipes that use the mask it would be a potential mis-compile, correct? If so, it would be good to have an assert that all users of the mask we replace are VPWidenMemoryInstructionRecipes. Might be good to also have a test case for such a scenario.
llvm/lib/Transforms/Vectorize/VPlanVerifier.cpp
210 ↗	(On Diff #557781)	Ah I see. Do you happen to know which unit tests? I suspect they need fixing and also checking why this isn't already caught by verification.

Rebase, address comments

Fixed reversed loads/stores

Harbormaster completed remote builds in B257946: Diff 557905. Oct 26 2023, 4:40 PM

fhahn added inline comments. Oct 27 2023, 6:51 AM

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
9607–9608	can leave unchanged?
9607–9608	can leave unchanged?
9607–9608	can leave unchanged now?
9608–9609	can leave unchanged?
9613	Could you elaborate what better means here? Might have missed it, does the current code handle reverse?
llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
1089 ↗	(On Diff #557842)	Just to double-check, as it looks like the above may have been missed in the latest update, WDYT?

ABataev added inline comments. Oct 27 2023, 6:55 AM

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
9613	I disabled reverse support, see useVPWithVPEVLVectorization(), need support for vp_reverse intrinsic, which is not added yet.
llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
1089 ↗	(On Diff #557842)	Yes, missed it, will fix

Rebase, address comments

ABataev marked 4 inline comments as done. Oct 27 2023, 7:47 AM

Harbormaster completed remote builds in B257951: Diff 557913. Oct 27 2023, 10:53 AM

Thanks for the updates! I think all correctness issues should now have been addressed AFAICT, some minor comments left inline

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
9607–9608	Indent looks off here
9613	nit: drop `Better` here, as reverse loading isn't supported at all at the moment. Would be good to assert here that the load isn't reversed.
9613	nit: Drop `better` here, as reverse storing isn't supported at all at the moment. Would be good to also assert.
llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
1020 ↗	(On Diff #557913)	Would it be simpler to just have a common function for finding all the header predicates and then have the code to update the users at the call sites?
1023 ↗	(On Diff #557913)	Added `VPValue::replaceUsesWithIf` in a002271972fb3fb2877bdb4abf9275b2c1291036 as there were already multiple places hand-rolling that functionality.

ABataev marked 5 inline comments as done. Nov 7 2023, 6:26 AM

ABataev added inline comments.

llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
1020 ↗	(On Diff #557913)	Tried it, but it does not look better. WideCanonicalIV needs to be removed, which means finding it again after the procedure; this leads to code duplication and a size increase. Replaced the parameter with the predicate function instead.

Rebase, address comments

Harbormaster completed remote builds in B258034: Diff 558035. Nov 7 2023, 7:39 AM

shiva0217 added a subscriber: shiva0217. Nov 8 2023, 12:16 AM

shiva0217 added inline comments.

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
1650	If the loop contains reduction variables, a mask might be needed to merge the results of the last two iterations.
    int a[128];
    int foo (int end) {
      int size = 0;
      for (int i = 0; i < end; i++)
        size += a[i];
      return size;
    }
Should the case be guarded by `Legal->getReductionVars().empty() &&` ?
llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp
1743 ↗	(On Diff #558035)	There is a case where the PHI was not inserted at the top of the basic block.
    int foo (int value, int buf, int end) {
      int tmp;
      for (tmp = buf; tmp < end; tmp++)
        value -= tmp;
      return value;
    }
Should we specify the insertion point? Something like:
    PHINode *EntryPart = PHINode::Create(
        Start->getType(), 2, "evl.based.iv",
        &*State.CFG.PrevBB->getFirstInsertionPt());
llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
986 ↗	(On Diff #558035)	There is a case where VPWidenCanonicalIVRecipe was not generated with tail folding.
    int i;
    int foo (int q, int z) {
      int e = 0;
      while (z < 1) {
        q = z * 2;
        if (q != 0)
          for (i = 0; i < 2; ++i)
            e = 5;
        ++z;
      }
      return e;
    }
`for (i = 0; i < 2; ++i)` has been simplified to `store i32 2, ptr @i`. Both the pointer and the stored value are loop-invariant, so the mask (VPWidenCanonicalIVRecipe) might not be generated. Should we suppress the replacement when the mask is not available?
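To illustrate the reduction concern raised above for LoopVectorize.cpp line 1650, here is a scalar sketch of a sum reduction kept in a VF-lane accumulator. It rests on assumptions: `sumWithEVL` is a hypothetical name, and clobbering lanes at or beyond EVL with 0 is only a stand-in for results on disabled lanes not being usable. It is not how the patch handles reductions.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <vector>

// Sum `A` using a VF-lane accumulator and EVL = min(VF, remaining).
// When MergeInactive is true, lanes at or beyond EVL keep their previous
// accumulator value (the job a merge/select would do). When false, those
// lanes are overwritten with 0 to model losing their partial sums.
inline int64_t sumWithEVL(const std::vector<int> &A, std::size_t VF,
                          bool MergeInactive) {
  std::vector<int64_t> Acc(VF, 0);
  for (std::size_t Index = 0; Index < A.size();) {
    const std::size_t EVL = std::min(VF, A.size() - Index);
    for (std::size_t Lane = 0; Lane < VF; ++Lane) {
      if (Lane < EVL)
        Acc[Lane] += A[Index + Lane];
      else if (!MergeInactive)
        Acc[Lane] = 0;
    }
    Index += EVL;
  }
  // Final horizontal reduction across all VF lanes.
  return std::accumulate(Acc.begin(), Acc.end(), int64_t{0});
}
```

Summing 1..10 with VF = 4 gives 55 when inactive lanes are merged, but 33 when the two inactive lanes of the last iteration lose their partial sums.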

ABataev updated this revision to Diff 558054. Nov 8 2023, 6:36 AM

ABataev marked 3 inline comments as done.

Rebase, address comments

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
1650	Added
llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp
1743 ↗	(On Diff #558035)	Fixed in VPlanTransforms.cpp by inserting the recipe immediately after CanonicalIVPHI.
llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
986 ↗	(On Diff #558035)	Fixed, added the test

Harbormaster completed remote builds in B258045: Diff 558054. Nov 8 2023, 9:21 AM

Rebase, ping!

Harbormaster completed remote builds in B258072: Diff 558097. Nov 14 2023, 8:41 AM

fhahn added inline comments. Nov 15 2023, 6:43 AM

llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp
1743 ↗	(On Diff #558035)	I think `VPEVLBasedIVPHIRecipe` should be turned into a subclass of VPHeaderPHIRecipe, this will also ensure that the VPlan verifier checks it is in the header section
llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
986 ↗	(On Diff #558035)	Was this fixed by adding the `bool KeepVPCanonicalWidenRecipes` flag? What's the test case for this? There's a new `no_masking` case, but it has an empty body and no vector code is generated?
llvm/test/Transforms/LoopVectorize/RISCV/vectorize-vp-intrinsics.ll
5 ↗	(On Diff #558097)	no need to redirect stderr here?
7 ↗	(On Diff #558097)	Can this configuration be used for target-independent tests?
385 ↗	(On Diff #558097)	This test file is getting quite big with 3 different run lines. I think it would be good to try to split this up a bit, to make it easier to see what's going on. I'd recommend having the test cases for various legality issues as target-independent tests with force flags (force EVL support, VF and IC). And keep cost-model specific tests target specific.
llvm/test/Transforms/LoopVectorize/RISCV/vplan-vp-intrinsics.ll
3 ↗	(On Diff #558097)	Can this test be target independent? does it need to check the no VP case?

ABataev added inline comments. Nov 15 2023, 6:55 AM

llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
986 ↗	(On Diff #558035)	Yes. Yes, this test is a reduced version of the failed case
llvm/test/Transforms/LoopVectorize/RISCV/vectorize-vp-intrinsics.ll
5 ↗	(On Diff #558097)	Will drop
7 ↗	(On Diff #558097)	Not now, it relies on the check of the TTI interface for now
llvm/test/Transforms/LoopVectorize/RISCV/vplan-vp-intrinsics.ll
3 ↗	(On Diff #558097)	No. Yes, need to check that the option works correctly

ABataev added inline comments. Nov 16 2023, 8:47 AM

llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp
1743 ↗	(On Diff #558035)	You mean it should be derived from VPEVLBasedIVPHIRecipe? Already done.

Rebase, address comments

llvm/test/Transforms/LoopVectorize/RISCV/vectorize-vp-intrinsics.ll
7 ↗	(On Diff #558097)	Added several target independent tests
385 ↗	(On Diff #558097)	Done
llvm/test/Transforms/LoopVectorize/RISCV/vplan-vp-intrinsics.ll
3 ↗	(On Diff #558097)	Added also target independent version

Harbormaster completed remote builds in B258086: Diff 558115. Nov 16 2023, 6:01 PM

fhahn added inline comments. Nov 17 2023, 12:51 PM

llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
986 ↗	(On Diff #558035)	Is it possible it is over-reduced? The same IR seems to be generated both with and without `KeepVPCanonicalWidenRecipes` IIUC, because the loop is not vectorized due to being empty. What's the issue if there's no mask/canonical widen recipe? Wouldn't it be fine to just not replace anything?

fhahn added inline comments. Nov 17 2023, 12:53 PM

llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp
1743 ↗	(On Diff #558035)	Great!

ABataev added inline comments. Nov 17 2023, 12:56 PM

llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
986 ↗	(On Diff #558035)	When I checked it, it crashed without this parameter, maybe there were some other changes.

fhahn added inline comments. Nov 17 2023, 1:20 PM

llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
986 ↗	(On Diff #558035)	Can you check if it still crashes? Would be good to understand exactly what the issue is, and if possible avoid having a separate `KeepVPCanonicalWidenRecipes` flag

shiva0217 added inline comments. Nov 19 2023, 11:57 PM

llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
986 ↗	(On Diff #558035)	`KeepVPCanonicalWidenRecipes` might be motivated by the case where VPWidenCanonicalIVRecipe once existed but was optimized out to (VPWidenIntOrFpInductionRecipe IV ule trip-count), which makes replaceHeaderPredicateWithIdiom fail to replace (VPWidenCanonicalIV ule trip-count) with an all-true mask.
    void foo (char *a) {
      for (int i = 0; i < 256; i++)
        if (i != '\n')
          a[i] = 0;
    }

Rebase
Removed the KeepVPCanonicalWidenRecipes parameter since there is a check for VPWidenCanonicalIVRecipe presence in replaceHeaderPredicateWithIdiom

Harbormaster completed remote builds in B258100: Diff 558136. Nov 20 2023, 1:40 PM

Rebase, ping!

Harbormaster completed remote builds in B258128: Diff 558174. Nov 27 2023, 6:30 AM

fhahn added inline comments. Nov 27 2023, 12:43 PM

llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
986 ↗	(On Diff #558035)	`VPWidenCanonicalIV` should only be replaced by something else if there's a user that accesses the vector values of it, I think. Could you share an IR test case that would show an issue? It is still not clear to me what the exact issue would be. @ABataev just to double check, the latest version shouldn't have any issues with @shiva0217's test case, correct?

I think it may be also good to run clang-format again on the latest version of the patch if possible

In D99750#4657540, @fhahn wrote:

I think it may be also good to run clang-format again on the latest version of the patch if possible

I have a special hook for formatting checking, it does not report any issues about formatting

llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
986 ↗	(On Diff #558035)	Yes, I added reduced test cases to this patch already and they do not crash

Rebase, fixes.

Harbormaster completed remote builds in B258148: Diff 558199. Nov 30 2023, 5:02 PM

Rebase

Harbormaster completed remote builds in B258154: Diff 558211. Dec 5 2023, 6:39 AM

GitHub <noreply@github.com> mentioned this in rGa5891fa4d2b7: [VPlan] Initial modeling of VF * UF as VPValue. (#74761). Dec 8 2023, 10:30 AM

Went over parts of this patch again, have other parts yet to go over again. Status of various past comments should be clarified.

llvm/include/llvm/IR/IRBuilder.h
2576–2581 ↗	(On Diff #558054)	This is used by VectorBuilder, committed in 9959cdb66a02b, some 1800 lines above.
838 ↗	(On Diff #558136)	Remove getTrueVector() below, given getAllOnesMask() here.
llvm/lib/Target/RISCV/RISCVTargetTransformInfo.h
79 ↗	(On Diff #512838)	Documentation of "hasActiveVectorLength()" indicator here should better explain the intent. E.g., whether the target supports Active Vector Length intrinsics for given \p Opcode, \p DataType and \p Alignment.
82 ↗	(On Diff #512838)	Looks like it to me, specifically to its 3rd TTI related bullet, but @ABataev should confirm.
84 ↗	(On Diff #547860)	Better use a single term, consistently, than have different abbreviations for the same thing. To clarify:
`VF`: a constant number-of-elements provided within the Type, either a compile-time constant or a runtime constant, aka Fixed or Scalable, respectively.
`mask`: a boolean vector of type `<VF x i1>` used by all vector intrinsics - both `llvm.masked.loads/stores/gather/scatter` and `llvm.vp.*`.
Let `FirstVF` denote the number-of-elements that can be processed in the first vector iteration. If all iterations can process FirstVF elements - because we know the trip count of the loop `N` is a multiple of FirstVF or because we leave a tail of remaining leftover iterations to be processed by a subsequent loop, then there is no need for any "Active/Explicit Vector Length" support. Otherwise, when folding the tail - the last vector iteration operates on `LastVF = N - IV <= FirstVF` elements where IV is the canonical induction variable of the loop. Combining LastVF with VF for all but last iteration yields `IterVF = min(N - IV, VF)` which varies per iteration.
Conceptually, the vector type inside a tail-folded loop should use `IterVF` as its number-of-elements rather than `VF`, if types would support non-constant VF's. Instead, the type continues to use a conservative constant `VF` as a maximal/full number-of-elements across all loop iterations, and the excessive lanes of last iteration are masked-out by computing a vector `tail_mask` using a compare: `<VF x i1> %tail_mask = icmp ule <IV,IV+1,...,IV+VF-1>, <N,N,...,N>` or an intrinsic `<VF x i1> %tail_mask = llvm.get.active.lane.mask_VF(IV, N)` which captures this comparison w/o relying on vectorizing or broadcasting IV and N explicitly. This tail_mask is then folded into `mask &= %tail_mask`.
The term `active` seems to refer to the dynamic, changing nature of the mask. But the name `get.active.lane.mask` is confusing - every mask by definition is there to indicate which lanes are active. Perhaps a more accurate name would be `get.variable.vf.mask`. Another option could be `get.prefix.mask_VF(N-IV)` which simply produces a mask whose 'on' bits are the first N-IV bits, but there's an overflow case to consider.
Now the `evl` support mentioned here, is closely tied to RVV's setvl, and involves two aspects: (1) providing a variable VF as a separate scalar operand to vector intrinsics - only the `llvm.vp.*` ones - alongside the mask operand rather than inside it; and (2) obtaining the variable VF for a given iteration by some target-dependent computation which may differ from min(N - IV, VF) for the last two iterations. The term `explicit` seems to refer to (1), in contrast to implicitly embedding the variable-VF inside the constant-VF via `mask`. Perhaps a more accurate name would be `get.variable.vf`, analogous to the above suggestion but providing the scalar variable VF rather than a mask thereof.
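The relationship between the tail mask and the per-iteration element count described in this comment can be sketched in scalar C++. The function names are hypothetical. The sketch assumes a strict less-than compare against the trip count N (a `ule` compare would instead pair with N - 1), and it assumes IV < N so that N - IV does not wrap.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Lane-wise compare: lane i is active iff IV + i < N.
inline std::vector<bool> tailMask(uint64_t IV, uint64_t N, uint64_t VF) {
  std::vector<bool> Mask(VF);
  for (uint64_t Lane = 0; Lane < VF; ++Lane)
    Mask[Lane] = IV + Lane < N;
  return Mask;
}

// IterVF = min(N - IV, VF); assumes IV < N.
inline uint64_t iterVF(uint64_t IV, uint64_t N, uint64_t VF) {
  return std::min(N - IV, VF);
}

// Prefix mask: exactly the first Count lanes are on.
inline std::vector<bool> prefixMask(uint64_t Count, uint64_t VF) {
  std::vector<bool> Mask(VF);
  for (uint64_t Lane = 0; Lane < VF; ++Lane)
    Mask[Lane] = Lane < Count;
  return Mask;
}
```

Under these assumptions the two forms carry the same information: the tail mask for a given IV equals the prefix mask built from `iterVF`, which is why the variable element count can be passed as a scalar operand instead of being embedded in the mask.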
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
297–303	(zlxu-rivai) IMHO, the vector length predication could be regarded as another form of tail-folding, but not like ARM SVE which is based on active.lane.mask but based on an integer as vector length. Based on this notion of unification, it would be better to augment the enum TailFoldingStyle, for example, add new entries like TailFoldingStyle::DataWithEVL (serving the purpose of EVLOption::IfEVLSupport) and the existing TailFoldingStyle::None can be reused as EVLOption::NoPredication.
(ABataev) Not necessarily. Generally speaking, we would like to support vectorized loop remainder in the future as one of the alternative solutions, so better to keep them separate.
I agree with zlxu-rivai, this patch deals with yet another Style of tail folding, as in the proposed `DataWithEVL`. If/when EVL is desired w/o tail folding, a separate patch with potential options should be discussed.
308	Looks like it. Tail folding is "enabled", "forced", or absent.
1206	Better check if CurRec is a header PHI recipe?
1630	Reiterating the suggestion to use `VPI` to denote `llvm.vp.*` vector predicate intrinsics, if/where needed, and "foldTailByEVL()" for deciding to use EVL style tail-folding.
1648	nit: move this TODO into a FIXME next to checking isSafeForAnyVectorWidth as other cases?
1774	+1 Suffice to say `PreferEVL`, as that implies `VPI`.
5428	Shouldn't/Doesn't ForceEVLSupport imply useVPWithVPEVLVectorization()? Other cases of using EVL can scalarize-and-predicate loads and stores? Early-return earlier.
5837	The logic of this method is getting excessively complicated. Suggest to turn this into an early-exiting if (!Legal->prepareToFoldTailByMasking()).
5842	Is there a test? Message intends to say that preference was indicated but ignored, i.e., tail will be folded (with UF>1) but w/o VP intrinsics - w/ or w/o EVL?
5849	ScalableVF is surely scalable, assert rather than ask? isNonZero or isVector?
5852	This method already has an undesired side-effect of setting CanFoldTailByMasking independent of returning MaxFactors, better avoid having it set yet another side-effect PreferVPWithVPEVLIntrinsics. Perhaps extend CanFoldTailByMasking from a bool to also indicate how the tail is folded, and/or extend the return value to carry more than a FixedScalableVFPair? Logic can be simplified into
    PreferVPWithVPEVLIntrinsics =
        PreferPredicateWithVPEVLIntrinsics == EVLOption::IfEVLSupported ||
        TTI.hasActiveVectorLength(0, nullptr, Align());
the rest is debug prints placed under !NDEBUG, or a single LLVM_DEBUG with some ternary ? selecting if the preference indicated is followed or ignored.
5858	"if the target support vector length predication" - we already checked that it does?
5869	This deserves a comment, explaining that a tail folded using VP intrinsics restricts the VF to be scalable.
9252–9253	What's the relation between useVPWithVPEVLVectorization and useActiveLaneMask? Should the latter cover the former, so that it suffices to check if (useActiveLaneMask(Style)), or is it meaningful to have both true?
9601	Reason explained above: execute() of recipes should be straightforward. This is one of the main guidelines outlined in the VPlan roadmap. This recipe is getting too complicated, should probably separate gather/scatter from wide load/store, and separate the pointer setting (as in [VPlan] Model address separately. #72164), independent of this patch. Recipe should indicate statically if EVL is used or not, to simplify code-gen and facilitate cost estimation, rather than having to check State.EVL during execute(). If multiple recipes share some common core, it can be shared via a common base class, as in VPHeaderPHIRecipe and VPRecipeWithIRFlags.
9602	These two functions each have a single caller, better defined as lambdas next to them? Simpler to separate into two separate store/scatter functions? Remove EVLPart because EVL currently works w/o unrolling, introduce it in the future as part of enabling EVL with unrolling?
9607–9608	Better simply check if (State.EVL) and then State.get() it, or could the latter return null? Raised as a nit below. Better avoid checking if State.EVL altogether during VPlan execution, as noted above.
9614	Moving tail folding to a transform, following VPlan roadmap, is already quite an endeavor, which would hopefully better facilitate this patch. Better avoid complicating it further.
llvm/lib/Transforms/Vectorize/VPlan.h
2155 ↗	(On Diff #557696)	Is a dedicated recipe needed, or would some general header-phi recipe/VPInstruction suffice?
llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp
371 ↗	(On Diff #557781)	Have Phi and EVL be of the same type instead of needing to cast the latter to the former? Is a dedicated VPInstruction needed, or would a general Add suffice, similar to [VPlan] Initial modeling of VF * UF as VPValue. #74761?
349 ↗	(On Diff #558136)	This patch enables EVL for scalable VF's only, so best avoid pretending otherwise (w/o testing) and pass `true` for now. Extend to VF.isScalable when EVL is extended to handle fixed VF's - along with suitable tests.
357 ↗	(On Diff #558136)
359 ↗	(On Diff #558136)	to match RISC-V vsetvli terminology. Avoid AVL which also stands for Active Vector Length...
llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
832 ↗	(On Diff #555150)	As @nikolaypanchenko clarified in https://reviews.llvm.org/D99750#inline-1521321 Hence the trip-count can be computed by taking the ceiling of TC divided by the EVL of first vector iteration, which can be computed in the pre-header. But if the loop iterates until an index reaches an upper bound, where the index is repeatedly bumped by a non-invariant EVL, then the index is not an Induction Variable and the upper bound does not qualify as a (scaled) trip-count, countability of the loop is hidden away.
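As a rough illustration of the countability argument above, the following C++ sketch (not part of the patch) assumes EVL is computed as min(VF, TC - index), the second scheme described in the summary. The helper names `computeEVL`, `countByBumpingEVL`, and `countInPreheader` are hypothetical. Under that assumption, the number of vector iterations obtained by repeatedly bumping the index by EVL matches ceil(TC / EVL0), where EVL0 is the EVL of the first vector iteration and can be computed in the pre-header.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical helper: EVL for the vector iteration that starts at Index,
// assumed here to be min(VF, TC - Index).
inline uint64_t computeEVL(uint64_t VF, uint64_t TC, uint64_t Index) {
  assert(VF > 0 && Index <= TC && "expects a non-zero VF and Index <= TC");
  return std::min(VF, TC - Index);
}

// Counts vector iterations by simulating the loop: the index is bumped by
// the (non-invariant) EVL until it reaches the upper bound TC.
inline uint64_t countByBumpingEVL(uint64_t VF, uint64_t TC) {
  uint64_t Iterations = 0;
  for (uint64_t Index = 0; Index < TC; Index += computeEVL(VF, TC, Index))
    ++Iterations;
  return Iterations;
}

// Closed form that only needs values available in the pre-header:
// ceil(TC / EVL0), with EVL0 the EVL of the first vector iteration.
inline uint64_t countInPreheader(uint64_t VF, uint64_t TC) {
  if (TC == 0)
    return 0;
  uint64_t EVL0 = computeEVL(VF, TC, 0);
  return (TC + EVL0 - 1) / EVL0;
}
```

For example, with VF = 4 and TC = 10 both functions yield 3 iterations (EVLs of 4, 4 and 2). A target whose EVL is not min(VF, remaining elements) would need a different argument.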
llvm/test/Transforms/LoopVectorize/RISCV/vplan-vp-intrinsics.ll
19 ↗	(On Diff #558054)	UF>=1 or UF=1?
31 ↗	(On Diff #558054)	scalar-steps is needed only for UF>1, as this feeds scalar/cloned geps only?
33 ↗	(On Diff #558054)	Widened loads and store here should say that they use EVL, implicitly.
39 ↗	(On Diff #558054)	Can this simply be an ADD of vp<[[EVL_PHI]]> and vp<[[EVL]]>?
41 ↗	(On Diff #558054)	IF-EVL and FORCE-EVL are the same, check them together?
82 ↗	(On Diff #558054)	MASK is unused?
116 ↗	(On Diff #558054)	safe dep relies on VF*UF < 100, where VF relies on -riscv-v-vector-bits-max=128, vscale x 4 is omitted from VF, and UF>=1 is restricted?
llvm/test/Transforms/LoopVectorize/X86/vectorize-vp-intrinsics.ll
19–20 ↗	(On Diff #558054)	nit: can remove CHECKs for this redundant block which never jumps to the dead scalar loop.
50–67 ↗	(On Diff #558054)	nit: can remove the CHECKs for the dead scalar loop.
1 ↗	(On Diff #522575)	The three runs produce the same results, can combine their CHECKs.
llvm/test/Transforms/LoopVectorize/X86/vplan-vp-intrinsics.ll
19–43 ↗	(On Diff #558054)	All three runs {IF-EVL-NEXT, FORCE-EVL, NO-VP} result in the same VPlan; combine their checks together?
llvm/test/Transforms/LoopVectorize/vectorize-vp-intrinsics.ll
18 ↗	(On Diff #558136)	IF-EVL can share checks with NO-VP?
llvm/test/Transforms/LoopVectorize/vplan-vp-intrinsics.ll
4 ↗	(On Diff #558136)	This is IF-EVL, but its checks coincide with those of NO-VP? Perhaps use some combined CHECK

Rebase, address comments.
Moving it to github, since Phab does not send e-mails anymore.

Harbormaster completed remote builds in B258177: Diff 558247. Thu, Dec 21, 10:11 AM

Diff 334865

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp

#include "llvm/Transforms/Vectorize/LoopVectorize.h"
#include "LoopVectorizationPlanner.h"
#include "VPRecipeBuilder.h"
#include "VPlan.h"
#include "VPlanHCFGBuilder.h"
#include "VPlanPredicator.h"
#include "VPlanTransforms.h"
#include "VPlanValue.h"
#include "llvm/ADT/APInt.h"
#include "llvm/ADT/ArrayRef.h"
#include "llvm/ADT/DenseMap.h"
#include "llvm/ADT/DenseMapInfo.h"
#include "llvm/ADT/Hashing.h"
#include "llvm/ADT/MapVector.h"
#include "llvm/ADT/None.h"
#include "llvm/ADT/Optional.h"
[...] cl::values(
clEnumValN(PreferVPIntrinsicsTy::IfAVLSupported,
"if-active-vector-length-support",
"Only generate VP intrinsics if the target supports vector "
"length predication."),
clEnumValN(PreferVPIntrinsicsTy::WithoutAVLSupport,
"without-active-vector-length-support",
"Generate VP intrinsics even if vector length predication "
"is not supported. This option is discouraged."),
clEnumValN(PreferVPIntrinsicsTy::ForceAVLSupport,
		bmahjour (Done): don't see ForceAVLSupport being used anywhere in this patch.
		vkmr (Done): Not explicitly but implicitly, as the default choice if other options do not match. See the test cases for more clarity on how it works and what it results in.
"force-active-vector-length-support",
"Assume that the target supports vector length predication "
"and generate VP intrinsics accordingly.")));

static cl::opt<bool> MaximizeBandwidth(
		Ayal (Done): Drop "experimental"?
"vectorizer-maximize-bandwidth", cl::init(false), cl::Hidden,
cl::desc("Maximize bandwidth when selecting vectorization factor which "
"will be determined by the smallest type in loop."));

static cl::opt<bool> EnableInterleavedMemAccesses(
"enable-interleaved-mem-accesses", cl::init(false), cl::Hidden,
		hiraditya (Done): What is the fourth value?
cl::desc("Enable vectorization on interleaved memory accesses in a loop"));

/// An interleave-group may need masking if it resides in a block that needs
/// predication, or in order to mask away gaps.
static cl::opt<bool> EnableMaskedInterleavedMemAccesses(
		Ayal (Done): Drop "experimental" and "which will be removed in [the] future"? Or drop altogether: is a switch enabling EVL for the default target needed, or does it suffice to test for a concrete RISC-V EVL target?
"enable-masked-interleaved-mem-accesses", cl::init(false), cl::Hidden,
cl::desc("Enable vectorization on masked interleaved memory accesses in a loop"));

static cl::opt<unsigned> TinyTripCountInterleaveThreshold(
"tiny-trip-count-interleave-threshold", cl::init(128), cl::Hidden,
cl::desc("We don't interleave loops with a estimated constant trip count "
"below this number"));

		zlxu-rivai (Done): IMHO, vector length predication could be regarded as another form of tail-folding, one based on an integer vector length rather than on `active.lane.mask` as with ARM SVE. Based on this notion of unification, it would be better to augment the enum TailFoldingStyle, for example, add new entries like `TailFoldingStyle::DataWithEVL` (serving the purpose of `EVLOption::IfEVLSupport`), and the existing `TailFoldingStyle::None` can be reused as `EVLOption::NoPredication`.
		ABataev (Author, Done): Not necessarily. Generally speaking, we would like to support vectorized loop remainder in the future as one of the alternative solutions, so better to keep them separate.
		zlxu-rivai (Done): Forgive my ignorance: is there any scenario in which epilogue vectorization is profitable when EVL is supported? In my understanding, eliminating the loop epilogue is an important point of EVL.
		ABataev (Author, Done): It may be, at least for some time. Some of the optimizations work better for countable loops. It requires some extra work and time; before that, at least, a vectorized loop remainder may be faster in some cases.
		Ayal (Not Done): Do you mean to say "Some optimizations work better for an invariant VF", i.e., that of VL0, rather than "for countable loops"? Vectorizing tail with EVL may still produce countable vector loops, as noted above.
		ABataev (Author, Done): Currently, vectorization with EVL will result in an uncountable loop. Tail vectorization is not always the best option for some targets.
		Ayal (Done): (zlxu-rivai) IMHO, the vector length predication could be regarded as an another form of tail-folding, but not like ARM SVE which is based on active.lane.mask but based on an integer as vector length. Based on this notion of unification, it would be better to augment the enum TailFoldingStyle, for example, add new entries like TailFoldingStyle::DataWithEVL (serving the purpose of EVLOption::IfEVLSupport) and the existent TailFoldingStyle::None can be reused as EVLOption::NoPredication. (ABataev) Not necessarily. Generally speaking, we would like to support vectorized loop remainder in the future as one of the alternative solutions, so better to keep them separate. I agree with zlxu-rivai, this patch deals with yet another Style of tail folding, as in the proposed `DataWithEVL`. If/when EVL is desired w/o tail folding, a separate patch with potential options should be discussed.
static cl::opt<unsigned> ForceTargetNumScalarRegs(
"force-target-num-scalar-regs", cl::init(0), cl::Hidden,
cl::desc("A flag that overrides the target's number of scalar registers."));

static cl::opt<unsigned> ForceTargetNumVectorRegs(
		Ayal (Done): Description talks about tail-folding, name of variable and command-line argument do not.
		fhahn (Not Done): Looks like the description still needs updating?
		Ayal (Done): Looks like it. Tail folding is "enabled", "forced", or absent.
"force-target-num-vector-regs", cl::init(0), cl::Hidden,
cl::desc("A flag that overrides the target's number of vector registers."));

static cl::opt<unsigned> ForceTargetMaxScalarInterleaveFactor(
"force-target-max-scalar-interleave", cl::init(0), cl::Hidden,
cl::desc("A flag that overrides the target's max interleave factor for "
"scalar loops."));

[...] public:
/// instruction (shuffle) for loop invariant values and for the induction
/// value. If this is the induction variable then we extend it to N, N+1, ...
/// this is needed because each iteration in the loop corresponds to a SIMD
/// element.
virtual Value *getBroadcastInstrs(Value *V);

/// Create Instructions to compute Explicit Vector Length when using VP
/// intrinsics.
Value *createEVL();
		bmahjour (Not Done): Could this be moved to the `VPWidenEVLRecipe` class?
		vkmr (Done): This is called from the `execute` method of the recipe and is responsible for generating IR instructions to compute `EVL`. Keeping it in `InnerLoopVectorizer` is consistent with how the widening methods for other recipes are a part of `InnerLoopVectorizer`.

protected:
friend class LoopVectorizationPlanner;

/// A small list of PHINodes.
using PhiVector = SmallVector<PHINode *, 4>;

/// A type for scalarized values in the new loop. Each value from the
[...] else
// Just print the module name.
OS << L->getHeader()->getParent()->getParent()->getModuleIdentifier();
OS.flush();
}
return Result;
}
#endif

void InnerLoopVectorizer::addNewMetadata(Instruction *To,
		Ayal (Done): Better check if CurRec is a header PHI recipe?
const Instruction *Orig) {
// If the loop was versioned with memchecks, add the corresponding no-alias
// metadata.
if (LVer && (isa<LoadInst>(Orig) || isa<StoreInst>(Orig)))
LVer->annotateInstWithNoAlias(To, Orig);
}

void InnerLoopVectorizer::addMetadata(Instruction *To,
[...] public:
/// Returns true if all loop blocks should be masked to fold tail loop.
bool foldTailByMasking() const { return FoldTailByMasking; }

bool blockNeedsPredication(BasicBlock *BB) const {
return foldTailByMasking() || Legal->blockNeedsPredication(BB);
}

/// Returns true if VP intrinsics should be generated in the tail folded loop.
bool preferVPIntrinsics() const {
		hiraditya (Done): nit: IMO the function name doesn't imply the semantics precisely. nit: reorder to avoid the call if possible: `PreferVPIntrinsics && foldTailByMasking();`
		Ayal (Done): +1 The acronym `VP` is overloaded here and mainly short for "VPlan", as in VPInstruction. Perhaps `VPI` would be clearer, short for VP Intrinsics. Seems to mean "foldTailByEVL()" or "foldTailByVPI()"?
		fhahn (Not Done): Agreed, it should be possible to use VP intrinsics without EVL, so it would be good to have the name reflect that this is focused on introducing EVL? Same for the user-exposed options.
		Ayal (Done): Reiterating the suggestion to use `VPI` to denote `llvm.vp.*` vector predicate intrinsics, if/where needed, and "foldTailByEVL()" for deciding to use EVL style tail-folding.
return foldTailByMasking() && PreferVPIntrinsics;
}
		fhahn (Not Done): Looks like this needs a test, and the TODO from the recipe should probably go here to make clear why there's this restriction for now.
		ABataev (Author, Done): Will add a test

/// A SmallMapVector to store the InLoop reduction op chains, mapping phi
/// nodes to the chain of instructions representing the reductions. Uses a
/// MapVector to ensure deterministic iteration order.
using ReductionChainMap =
SmallMapVector<PHINode *, SmallVector<Instruction *, 4>, 4>;

/// Return the chain of instructions representing an inloop reduction.
const ReductionChainMap &getInLoopReductionChains() const {
return InLoopReductionChains;
}

/// Returns true if the Phi is part of an inloop reduction.
bool isInLoopReduction(PHINode *Phi) const {
return InLoopReductionChains.count(Phi);
}
		Ayal (Done): nit: move this TODO into a FIXME next to checking isSafeForAnyVectorWidth as other cases?

/// Estimate cost of an intrinsic call instruction CI if it were vectorized
		shiva0217 (Done): If the loop contains reduction variables, a mask might be needed to merge the last two iteration results, e.g. `int a[128]; int foo (int end) { int size = 0; for (int i = 0; i < end; i++) size += a[i]; return size; }`. Should the case be guarded by `Legal->getReductionVars().empty() &&`?
		ABataev (Author, Done): Added
/// with factor VF. Return the cost of the instruction, including
/// scalarization overhead if it's needed.
InstructionCost getVectorIntrinsicCost(CallInst *CI, ElementCount VF) const;

/// Estimate cost of a call instruction CI if it were vectorized with factor
/// VF. Return the cost of the instruction, including scalarization overhead
/// if it's needed. The flag NeedToScalarize shows if the call needs to be
/// scalarized -
[...] private:

/// All blocks of loop are to be masked to fold tail of scalar iterations.
bool FoldTailByMasking = false;

/// Control whether to generate VP intrinsics in vectorized code.
bool PreferVPIntrinsics = false;

/// A map holding scalar costs for different vectorization factors. The
/// presence of a cost for an instruction in the mapping indicates that the
		fhahn (Not Done): This is only for using EVL for now, would be good to clarify name and comment.
		Ayal (Not Done): +1 Suffice to say `PreferEVL`, as that implies `VPI`.
/// instruction will be scalarized when vectorizing with the associated
/// vectorization factor. The entries are VF-ScalarCostTy pairs.
DenseMap<ElementCount, ScalarCostsTy> InstsToScalarize;

/// Holds the instructions known to be uniform after vectorization.
/// The data is collected per VF.
DenseMap<ElementCount, SmallPtrSet<Instruction *, 4>> Uniforms;

[...] void InnerLoopVectorizer::vectorizeMemoryInstruction(
// reverse consecutive.
bool Reverse = (Decision == LoopVectorizationCostModel::CM_Widen_Reverse);
bool ConsecutiveStride =
Reverse || (Decision == LoopVectorizationCostModel::CM_Widen);
bool CreateGatherScatter =
(Decision == LoopVectorizationCostModel::CM_GatherScatter);

if (Reverse)
assert(!EVL &&
		bmahjour (Not Done): How do we guard against this situation? I think this should be part of the legality check, e.g. `isLegalMaskedLoad` checks for `Legal->isConsecutivePtr(Ptr)`. If we had a function similar to isLegalMaskedLoad (say isLegalEVLLoad), then it could check for -1 stride and return false; then we'd be guaranteed not to hit this assert.
		vkmr (Done): Do you mean the legality checker should determine that if it's a reverse operation then it is illegal to use an EVL predicated load, and it should default to a masked load (if that is legal)? Eventually, there will be support for reverse with EVL predication and the required legality checks will be added. For this PoC patch, however, it is more explicit and straightforward to have an assert here.
"Vector reverse not supported for predicated vectorization.");
if (CreateGatherScatter)
assert(!EVL && "Gather/Scatter operations not supported for "
"predicated vectorization.");

// Either Ptr feeds a vector load/store, or a vector GEP should feed a vector
// gather/scatter. Otherwise Decision should have been to Scalarize.
assert((ConsecutiveStride || CreateGatherScatter) &&
[...] for (unsigned Part = 0; Part < UF; ++Part) {
if (Reverse) {
// If we store to reverse consecutive memory locations, then we need
// to reverse the order of elements in the stored value.
StoredVal = reverseVector(StoredVal);
// We don't want to update the value in the map as it might be used in
// another expression. So don't call resetVectorValue(StoredVal).
}
auto *VecPtr = CreateVecPtr(Part, State.get(Addr, VPIteration(0, 0)));
// if EVLPart is not null, we can vectorize using predicated
// intrinsic.
if (EVLPart) {
assert(isMaskRequired &&
"Mask argument is required for VP intrinsics.");
VectorType *StoredValTy = cast<VectorType>(StoredVal->getType());
		frasercrmck (Done): Does this need to be a `VectorType`? Can't you pass `StoredVal->getType()` straight through to `CreateIntrinsic`?
Value *BlockInMaskPart = BlockInMaskParts[Part];
Value *EVLPartI32 = Builder.CreateSExtOrTrunc(
EVLPart, Type::getInt32Ty(Builder.getContext()));
		craig.topper (Done): I think you can use Builder.getInt32Ty
NewSI = Builder.CreateIntrinsic(
Intrinsic::vp_store, {StoredValTy, VecPtr->getType()},
{StoredVal, VecPtr, Builder.getInt32(Alignment.value()),
BlockInMaskPart, EVLPartI32});
} else if (isMaskRequired) {
		fhahn (Not Done): Do we gain a lot of code-reuse by adding this to `::vectorizeMemoryInstruction`? Seems like it would be cleaner to handle predicated codegen in a separate function?
		vkmr (Not Done): I am not too happy about not having a separate function for `::vectorizeMemoryInstruction`! But there is substantial code overlap. Perhaps a better approach would be to abstract out much of the shared code in a separate function and duplicate some parts in a separate `vectorizePredicatedMemoryInstruction`.
NewSI = Builder.CreateMaskedStore(StoredVal, VecPtr, Alignment,
BlockInMaskParts[Part]);
} else {
NewSI = Builder.CreateAlignedStore(StoredVal, VecPtr, Alignment);
}
}
addMetadata(NewSI, SI);
}
[...] if (CreateGatherScatter) {
addMetadata(NewLI, LI);
} else {
auto *VecPtr = CreateVecPtr(Part, State.get(Addr, VPIteration(0, 0)));
if (EVLPart) {
assert(isMaskRequired &&
"Mask argument is required for VP intrinsics.");
Value *BlockInMaskPart = BlockInMaskParts[Part];
Value *EVLPartI32 = Builder.CreateSExtOrTrunc(
EVLPart, Type::getInt32Ty(Builder.getContext()));
		craig.topper (Done): Same here
NewLI = Builder.CreateIntrinsic(
Intrinsic::vp_load,
{VecPtr->getType()->getPointerElementType(), VecPtr->getType()},
{VecPtr, Builder.getInt32(Alignment.value()), BlockInMaskPart,
EVLPartI32},
nullptr, "vp.op.load");
} else if (isMaskRequired) {
NewLI = Builder.CreateMaskedLoad(
▲ Show 20 Lines • Show All 146 Lines • ▼ Show 20 Lines
}		}

Value InnerLoopVectorizer::getOrCreateVectorTripCount(Loop L) {		Value InnerLoopVectorizer::getOrCreateVectorTripCount(Loop L) {
if (VectorTripCount)		if (VectorTripCount)
return VectorTripCount;		return VectorTripCount;

Value *TC = getOrCreateTripCount(L);		Value *TC = getOrCreateTripCount(L);
IRBuilder<> Builder(L->getLoopPreheader()->getTerminator());		IRBuilder<> Builder(L->getLoopPreheader()->getTerminator());

fhahn (Done): Is there a test with multiple exits?
ABataev (Author, Done): It is currently not supported by this patch. This is just an initial patch; it won't handle all the corner cases.
fhahn (Not Done): Reading this back now, the check below is for loops requiring scalar epilogues, not necessarily multiple exits. Would be good to update the comment and also add a test to make sure this is captured. AFAICT this is something that the patch already handles and should be tested (e.g. a test with an interleave group that requires a scalar epilogue).
ABataev (Author, Done): Will add a test.
ABataev (Author, Done): Will add a test for multiple exits only; interleave groups are invalidated if foldTailByMasking() is true.
ABataev (Author, Done): Double-checked it: currently we cannot have this situation at all, so I am converting it to an assert.
Ayal (Not Done): Re "interleave groups are invalidated if foldTailByMasking() is true": unless useMaskedInterleavedAccesses() is also true?
ABataev (Author, Done): I can bring the check back, but currently cannot provide a test, since the target invalidates the costs for interleaved groups when making cost-based decisions.
Type *Ty = TC->getType();
// This is where we can make the step a runtime constant.
Value *Step = createStepForVF(Builder, ConstantInt::get(Ty, UF), VF);
Ayal (Not Done): What is the meaning of setting VectorTripCount = TC?
ABataev (Author, Done): We don't need to round the vector trip count, so we just set it to the original trip count value (EVL is adjusted automatically so that it is not larger than the trip count).
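The reply above says the vector trip count need not be rounded because the EVL is clamped. A minimal sketch of that idea follows; it is not the patch's code. It assumes each vector iteration processes min(VF, TC - Index) elements, as in the second EVL strategy from the summary. The helper name `evlPerIteration` is hypothetical, and advancing the index by EVL rather than by VF is an assumption (for this min-based formula both give the same EVL sequence).

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical model (not LLVM code): returns the EVL used by each vector
// iteration when the loop runs up to the original trip count TC and every
// iteration handles min(VF, TC - Index) elements.
inline std::vector<uint64_t> evlPerIteration(uint64_t TC, uint64_t VF) {
  assert(VF != 0 && "VF must be non-zero");
  std::vector<uint64_t> EVLs;
  for (uint64_t Index = 0; Index < TC;) {
    // Clamp the vector length to the number of remaining iterations.
    uint64_t EVL = std::min(VF, TC - Index);
    EVLs.push_back(EVL);
    Index += EVL; // Assumption: the index advances by the elements processed.
  }
  return EVLs;
}
```

With TC = 10 and VF = 4 the model yields EVLs of 4, 4 and 2, so the loop covers exactly TC elements without a rounded-up trip count or a scalar epilogue.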

// If the tail is to be folded by masking, round the number of iterations N
// up to a multiple of Step instead of rounding down. This is done by first
// adding Step-1 and then rounding down. Note that it's ok if this addition
// overflows: the vector induction variable will eventually wrap to zero given
// that it starts at zero and its Step is a power of two; the loop will then
// exit, with the last early-exit vector comparison also producing all-true.
if (Cost->foldTailByMasking()) {
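The rounding described in the comment above can be checked with a small standalone sketch. `roundTripCountUp` and `roundTripCountDown` are hypothetical helpers, not LLVM functions, and 32-bit unsigned arithmetic is assumed so that the overflow case wraps the way the comment describes.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not an LLVM API): rounds the trip count up to a
// multiple of Step by adding Step - 1 and then rounding down. Unsigned
// arithmetic wraps on overflow, which is the case the comment calls "ok".
inline uint32_t roundTripCountUp(uint32_t TC, uint32_t Step) {
  assert(Step != 0 && (Step & (Step - 1)) == 0 &&
         "Step must be a power of two");
  uint32_t N = TC + (Step - 1); // May wrap around to a small value.
  return N - (N % Step);        // Round down to a multiple of Step.
}

// Without tail folding the count is rounded down and the remaining
// iterations are left to a scalar epilogue.
inline uint32_t roundTripCountDown(uint32_t TC, uint32_t Step) {
  assert(Step != 0 && "Step must be non-zero");
  return TC - (TC % Step);
}
```

For TC = 10 and Step = 4 the folded-tail count is 12 (three masked vector iterations), while the rounded-down count is 8 with two iterations left for the epilogue. When TC + Step - 1 wraps, the rounded value is 0, the value a zero-based induction variable with a power-of-two step reaches when it wraps.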
[... 1,709 lines hidden ...]
Context: assert((I.getOpcode() == Instruction::UDiv ||
I.getOpcode() == Instruction::URem ||
I.getOpcode() == Instruction::SRem) &&
"Unexpected instruction");
Value *Divisor = I.getOperand(1);
auto *CInt = dyn_cast<ConstantInt>(Divisor);
return !CInt || CInt->isZero();
}

void InnerLoopVectorizer::widenPredicatedInstruction(Instruction &I,
bmahjour (Not Done): I guess targets that don't need or have the concept of predicated binary operations must provide lowering for all these calls to ultimately generate non-predicated vector code. I'd expect that to be a large effort with little gain. To allow EVL exploitation while the lowering is being provided, would it make sense to have a path in this function where we just fall back to `InnerLoopVectorizer::widenInstruction` under an option?
simoll (Not Done): D78203 implements lowering from VP intrinsics to regular SIMD instructions for all targets. If targets support VP, they should get it. If IR with VP intrinsics ends up being compiled for a non-VP target, the intrinsics will disappear before they hit lowering.
vkmr (Done): To add to @simoll's comment, eventually we will be using the `VectorBuilder` (earlier `VPBuilder`) to widen to VP intrinsics instead of manually creating each intrinsic here. `VectorBuilder` can also make some decisions on when and how to use the intrinsics.
VPValue *Def, VPUser &User,
VPTransformState &State,
VPValue *BlockInMask,
VPValue *EVL) {
auto getVPIntrInstr = [](unsigned Opcode) {
switch (Opcode) {
case Instruction::Add:
return Intrinsic::vp_add;
[... 24 lines hidden ...]
Context: auto getVPIntrInstr = [](unsigned Opcode) {
}
return Intrinsic::not_intrinsic;
};

unsigned Opcode = I.getOpcode();
assert(getVPIntrInstr(Opcode) != Intrinsic::not_intrinsic &&
"Instruction does not have VP intrinsic support.");

// Just widen unops and binops.
frasercrmck (Done): Is this comment out of place?
setDebugLocFromInst(Builder, &I);

for (unsigned Part = 0; Part < UF; ++Part) {
SmallVector<Value *, 2> Ops;
for (unsigned OpIdx = 0; OpIdx < User.getNumOperands() - 2; OpIdx++)
Ops.push_back(State.get(User.getOperand(OpIdx), Part));

VectorType *OpTy = cast<VectorType>(Ops[0]->getType());
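For readers unfamiliar with what the widened call computes, here is a scalar model of a VP binary operation such as `llvm.vp.add`. It is an illustration only, not code from the patch: `VPResult` and `emulateVPAdd` are hypothetical names. The model takes the mask and EVL as separate trailing arguments, which is presumably why the operand loop above stops two operands short of `getNumOperands()`; that reading is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical result type: Defined[i] is false for lanes whose result a VP
// intrinsic would leave as poison; Value[i] is only meaningful when
// Defined[i] is true.
struct VPResult {
  std::vector<int32_t> Value;
  std::vector<bool> Defined;
};

// Scalar model of a predicated add: a lane is computed only when its index
// is below EVL and its mask bit is set.
inline VPResult emulateVPAdd(const std::vector<int32_t> &A,
                             const std::vector<int32_t> &B,
                             const std::vector<bool> &Mask, std::size_t EVL) {
  assert(A.size() == B.size() && A.size() == Mask.size());
  assert(EVL <= A.size() && "EVL must not exceed the vector length");
  VPResult R;
  R.Value.assign(A.size(), 0);
  R.Defined.assign(A.size(), false);
  for (std::size_t Lane = 0; Lane < A.size(); ++Lane) {
    if (Lane < EVL && Mask[Lane]) {
      R.Value[Lane] = A[Lane] + B[Lane];
      R.Defined[Lane] = true;
    }
  }
  return R;
}
```

With a four-lane input, mask {1, 0, 1, 1} and EVL = 3, only lanes 0 and 2 produce a result: lane 1 is masked off and lane 3 lies beyond the EVL.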
[... 413 lines hidden ...]
Context: default:
break;
case Instruction::Load:
case Instruction::Store: {
if (!Legal->isMaskRequired(I))
return false;
auto *Ptr = getLoadStorePointerOperand(I);
auto *Ty = getMemInstValueType(I);
// We have already decided how to vectorize this instruction, get that
// result.
Ayal (Done): Shouldn't/Doesn't ForceEVLSupport imply useVPWithVPEVLVectorization()? Other cases of using EVL can scalarize-and-predicate loads and stores? Early-return earlier.
if (VF.isVector()) {
InstWidening WideningDecision = getWideningDecision(I, VF);
assert(WideningDecision != CM_Unknown &&
"Widening decision should be ready at this moment");
return WideningDecision == CM_Scalarize;
}
const Align Alignment = getLoadStoreAlignment(I);
return isa<LoadInst>(I) ? !(isLegalMaskedLoad(Ty, Ptr, Alignment) ||
[... 392 lines hidden ...]
Context: if (Rem->isZero()) {
LLVM_DEBUG(dbgs() << "LV: No tail will remain for any chosen VF.\n");
return MaxVF;
}

// If we don't know the precise trip count, or if the trip count that we
// found modulo the vectorization factor is not zero, try to fold the tail
// by masking.
// FIXME: look for a smaller MaxVF that does divide TC rather than masking.
if (Legal->prepareToFoldTailByMasking()) {
Ayal (Not Done): The logic of this method is getting excessively complicated. Suggest turning this into an early-exiting if (!Legal->prepareToFoldTailByMasking()).
FoldTailByMasking = true;
if (!PreferPredicateWithVPIntrinsics)
return MaxVF;

if (UserIC > 1) {
fhahn (Done): It looks like there may not be a test for this code path?
ABataev (Author, Done): I'll check what we can do about it.
Ayal (Done): Is there a test? The message intends to say that a preference was indicated but ignored, i.e., the tail will be folded (with UF>1) but w/o VP intrinsics - w/ or w/o EVL?
LLVM_DEBUG(dbgs() << "LV: Preference for VP intrinsics indicated. Will "
"not generate VP intrinsics since interleave count "
Ayal (Not Done): It may be confusing to keep `PreferVPIntrinsics = false` when "Preference for VP intrinsics indicated". The (names and) relation between `PreferPredicateWithVPIntrinsics` and `PreferVPIntrinsics` should be clarified. Separate the part that sets PreferVPIntrinsics, possibly switching on PreferPredicateWithVPIntrinsics?
ABataev (Author, Done): PreferPredicateWithVPIntrinsics is just the option that controls whether we can emit VP intrinsics at all. If we later see that it is not possible for some reason, we do not do it, so that we can still produce correct code. I don't see what else should be adjusted here.
"specified is greater than 1.\n");
return MaxVF;
}

if (PreferPredicateWithVPIntrinsics ==
Ayal (Done): ScalableVF is surely scalable, assert rather than ask? isNonZero or isVector?
PreferVPIntrinsicsTy::IfAVLSupported) {
PreferVPIntrinsics = TTI.hasActiveVectorLength();
Ayal (Not Done): Add expected-fail tests to cover opcodes/data types that are neither supported nor checked?
ABataev (Author, Done): Currently there is no such test, since we support only loads/stores, which are supported by all potential targets.
LLVM_DEBUG(dbgs() << "LV: Preference for VP intrinsics indicated. Will "
Ayal (Done): This method already has an undesired side effect of setting CanFoldTailByMasking independently of returning MaxFactors; better to avoid having it set yet another side effect, PreferVPWithVPEVLIntrinsics. Perhaps extend CanFoldTailByMasking from a bool to also indicate how the tail is folded, and/or extend the return value to carry more than a FixedScalableVFPair? The logic can be simplified into PreferVPWithVPEVLIntrinsics = PreferPredicateWithVPEVLIntrinsics == EVLOption::IfEVLSupported || TTI.hasActiveVectorLength(0, nullptr, Align()); the rest is debug prints placed under !NDEBUG, or a single LLVM_DEBUG with a ternary selecting whether the indicated preference is followed or ignored.
"try to generate VP Intrinsics if the target "
"support vector length predication.\n");
} else {
Ayal (Done): May as well also inform if the target supports it or not?
PreferVPIntrinsics = true;
LLVM_DEBUG(dbgs() << "LV: Preference for VP intrinsics indicated. Will "
"try to generate VP Intrinsics.\n");
Ayal (Not Done): "if the target support vector length predication" - we already checked that it does?
}

return MaxVF;
}
fhahn (Done): Would be good to document what is going on here (effectively disabling fixed-width vectorization and why).
ABataev (Author, Done): OK, will add.
Ayal (Not Done): So EVL is applied to scalable VFs only, right?
ABataev (Author, Done): For now, yes.
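Since several comments in this thread ask how the option, the interleave count and the target query interact, the branch structure shown above can be restated as a small pure function. This is a sketch under stated assumptions, not the patch's implementation: `VPPreference`, its `NoPredication` and `ForceAVLSupport` enumerators, and `shouldUseVPIntrinsics` are hypothetical names; only `IfAVLSupported` appears in the code above. `TargetHasAVL` stands in for the result of `TTI.hasActiveVectorLength()`.

```cpp
#include <cassert>

// Hypothetical stand-in for the command-line preference. Only IfAVLSupported
// is taken from the code above; the other names are illustrative.
enum class VPPreference { NoPredication, IfAVLSupported, ForceAVLSupport };

// Restates the decision shown above as a side-effect-free function.
inline bool shouldUseVPIntrinsics(bool CanFoldTail, VPPreference Pref,
                                  unsigned UserIC, bool TargetHasAVL) {
  // VP intrinsics are only considered when the tail is folded by masking
  // and the option asks for them.
  if (!CanFoldTail || Pref == VPPreference::NoPredication)
    return false;
  // A user-specified interleave count above 1 disables VP intrinsics.
  if (UserIC > 1)
    return false;
  // "If supported" defers to the target; any other setting forces them on.
  if (Pref == VPPreference::IfAVLSupported)
    return TargetHasAVL;
  return true;
}
```

Writing the decision this way keeps the preference computation separate from the MaxVF return value, which is the separation the review comments above ask for.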

// If there was a tail-folding hint/switch, but we can't fold the tail by
// masking, fallback to a vectorization with a scalar epilogue.
if (ScalarEpilogueStatus == CM_ScalarEpilogueNotNeededUsePredicate) {
LLVM_DEBUG(dbgs() << "LV: Cannot fold tail by masking: vectorize with a "
"scalar epilogue instead.\n");
ScalarEpilogueStatus = CM_ScalarEpilogueAllowed;
Ayal (Done): This deserves a comment, explaining that a tail folded using VP intrinsics restricts the VF to be scalable.
return MaxVF;
}

if (ScalarEpilogueStatus == CM_ScalarEpilogueNotAllowedUsePredicate) {
LLVM_DEBUG(dbgs() << "LV: Can't fold tail by masking: don't vectorize\n");
return None;
}

[... 452 lines hidden ...]
Context: unsigned LoopVectorizationCostModel::selectInterleaveCount(ElementCount VF,
// 2. If the loop is really small, then we interleave to reduce the loop
// overhead.
// 3. We don't interleave if we think that we will spill registers to memory
// due to the increased register pressure.

if (!isScalarEpilogueAllowed())
return 1;

// Do not interleave if VP intrinsics are preferred and no User IC is
fhahn (Not Done): nit: if EVL is preferred?
// specified.
if (preferVPIntrinsics())
return 1;

// We used the distance for the interleave count.
if (Legal->getMaxSafeDepDistBytes() != -1U)
return 1;

[... 2,189 lines hidden ...]
Context: if (SrcMask) { // Otherwise block in-mask is all-one, no need to AND.
// EdgeMask is poison. Using 'and' here introduces undefined behavior.
VPValue *False = Plan->getOrAddVPValue(
ConstantInt::getFalse(BI->getCondition()->getType()));
EdgeMask = Builder.createSelect(SrcMask, EdgeMask, False);
}

return EdgeMaskCache[Edge] = EdgeMask;
}

VPValue *VPRecipeBuilder::createBlockInMask(BasicBlock *BB, VPlanPtr &Plan) {
fhahn (Not Done): Better to replace the mask together with introducing EVL to make sure EVL gets added when the mask gets removed?
ABataev (Author, Done): Currently it will require some extra work. We'll need to handle both cases, with active-lane intrinsics and with direct comparison. Would it be possible to keep it for now and fix it once you land emission of the active-lane intrinsic in a VPlan-to-VPlan transform?
fhahn (Not Done): With the latest version, can the `useVPWithVPEVLVectorization` part be dropped (if the transform is updated to remove the mask from load/stores)?
ABataev (Author, Done): Not quite, it will require an extra VPValue, something like VPAllTrueMask, which should replace IV <= BTC. Shall I add it?
fhahn (Not Done): Would a live-in `i1 true` work? I think that may work as is. As EVL is only used for lowering of loads/stores at the moment, it should be only removed there for now?
ABataev (Author, Done): You mean scalar i1 true?
fhahn (Not Done): Yes, that should be broadcast across all vector lanes.
ABataev (Author, Done): I don't like that we won't match the actual type here. I thought about another possible solution: overload VPActiveLaneMask and make it return an all-true mask for RVL targets. WDYT?
assert(OrigLoop->contains(BB) && "Block is not a part of a loop");

// Look for cached value.
BlockMaskCacheTy::iterator BCEntryIt = BlockMaskCache.find(BB);
if (BCEntryIt != BlockMaskCache.end())
return BCEntryIt->second;

// All-one mask is modelled as no-mask following the convention for masked
// load/store/gather/scatter. Initialize BlockMask to no-mask.
VPValue *BlockMask = nullptr;

if (OrigLoop->getHeader() == BB) {
if (!CM.blockNeedsPredication(BB))
Ayal (Not Done): If `useVPVectorization()` is another reason to predicate the header BB, it should be included in `blockNeedsPredicationForAnyReason()`. But since the former requires tail folding, is it really another reason, or can it be dropped?
ABataev (Author, Done): It is another reason and cannot be dropped.
return BlockMaskCache[BB] = BlockMask; // Loop incoming mask is all-one.

// if header block needs predication then it is only because tail-folding is
// enabled. If we are using VP intrinsics for a target with vector length
// predication support, this mask (icmp ule %IV %BTC) becomes redundant with
// EVL, which means unless we are using VP intrinsics without vector length
// predication support we can replace this mask with an all-true mask for
// possibly better latency.
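The redundancy claimed in the comment above can be checked lane by lane. The sketch below is illustrative only (`headerMask` and `evlMask` are hypothetical helpers, not LLVM code) and assumes BTC = TC - 1 and EVL = min(VF, TC - IV), with IV below TC on entry to each vector iteration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Header mask used by tail folding: lane L is active when IV + L <= BTC,
// where BTC is the backedge-taken count (trip count minus one).
inline std::vector<bool> headerMask(uint64_t IV, uint64_t BTC, uint64_t VF) {
  std::vector<bool> Mask(VF);
  for (uint64_t Lane = 0; Lane < VF; ++Lane)
    Mask[Lane] = (IV + Lane) <= BTC;
  return Mask;
}

// The same set of lanes expressed through an explicit vector length:
// lane L is active when L < EVL, with EVL = min(VF, TC - IV).
inline std::vector<bool> evlMask(uint64_t IV, uint64_t TC, uint64_t VF) {
  assert(IV < TC && "the vector iteration must start inside the loop");
  uint64_t EVL = std::min(VF, TC - IV);
  std::vector<bool> Mask(VF);
  for (uint64_t Lane = 0; Lane < VF; ++Lane)
    Mask[Lane] = Lane < EVL;
  return Mask;
}
```

Because L < TC - IV is the same condition as IV + L <= TC - 1, the two masks agree in every iteration, so a target that honours EVL gains nothing from also carrying the compare-based header mask.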
[... 76 lines hidden ...]
Context: auto willWiden = [&](ElementCount VF) -> bool {
if (Decision == LoopVectorizationCostModel::CM_Interleave)
return true;
if (CM.isScalarAfterVectorization(I, VF) ||
CM.isProfitableToScalarize(I, VF))
return false;
return Decision != LoopVectorizationCostModel::CM_Scalarize;
};

return (LoopVectorizationPlanner::getDecisionAndClampRange(willWiden, Range));
}

VPRecipeBase *VPRecipeBuilder::tryToWidenMemory(Instruction *I, VFRange &Range,
VPlanPtr &Plan) {
if (!validateWidenMemory(I, Range))
frasercrmck (Done): Unnecessary parens around this statement.
return nullptr;

VPValue *Mask = nullptr;
if (Legal->isMaskRequired(I))
Mask = createBlockInMask(I->getParent(), Plan);

VPValue *Addr = Plan->getOrAddVPValue(getLoadStorePointerOperand(I));
if (LoadInst *Load = dyn_cast<LoadInst>(I))
[... 302 lines hidden ...]
VPRecipeOrVPValueTy VPRecipeBuilder::tryToCreateWidenRecipe(Instruction *Instr,
VFRange &Range,
VPlanPtr &Plan) {
// First, check for specific widening recipes that deal with calls, memory
// operations, inductions and Phi nodes.
if (auto *CI = dyn_cast<CallInst>(Instr))
return toVPRecipeResult(tryToWidenCall(CI, Range, *Plan));

if (isa<LoadInst>(Instr) || isa<StoreInst>(Instr)) {
bmahjour (Not Done): Not sure if this is already on your todo list, but apart from "preferring" to predicate, we need to check that the predication is legal for the target platform. This should probably be done in `isScalarWithPredication` with calls similar to `isLegalMaskedLoad`.
vkmr (Done): `preferPredicatedWiden` takes into account `FoldTailByMasking` and `TTI.hasActiveVectorLength()`. `TTI.hasActiveVectorLength()` checks whether EVL-based predication is legal for the target. For this initial patch, the legality checker and cost model are purposely avoided and the explicit command-line option is used to 1) keep things simple and 2) demonstrate different approaches.
if (preferPredicatedWiden()) {
craig.topper (Done): Drop curly braces since it's only 1 line
return toVPRecipeResult(tryToPredicatedWidenMemory(Instr, Range, Plan));
}
return toVPRecipeResult(tryToWidenMemory(Instr, Range, Plan));
}

VPRecipeBase *Recipe;
if (auto Phi = dyn_cast<PHINode>(Instr)) {
if (Phi->getParent() != OrigLoop->getHeader())
[... 24 lines hidden ...]
Context: VPRecipeOrVPValueTy VPRecipeBuilder::tryToCreateWidenRecipe(Instruction *Instr,

if (auto *SI = dyn_cast<SelectInst>(Instr)) {
bool InvariantCond =
PSE.getSE()->isLoopInvariant(PSE.getSCEV(SI->getOperand(0)), OrigLoop);
return toVPRecipeResult(new VPWidenSelectRecipe(
*SI, Plan->mapToVPValues(SI->operands()), InvariantCond));
}

if (preferPredicatedWiden()) {
craig.topper (Done): Drop curly braces
return toVPRecipeResult(tryToPredicatedWiden(Instr, Plan));
}
return toVPRecipeResult(tryToWiden(Instr, *Plan));
}

void LoopVectorizationPlanner::buildVPlansWithVPRecipes(ElementCount MinVF,
ElementCount MaxVF) {
assert(OrigLoop->isInnermost() && "Inner loop expected.");
[... 25 lines hidden ...]
Context: for (ElementCount VF = MinVF; ElementCount::isKnownLT(VF, MaxVFPlusOne);) {
VPlans.push_back(
buildVPlanWithVPRecipes(SubRange, DeadInstructions, SinkAfter));
VF = SubRange.End;
}
}

VPlanPtr LoopVectorizationPlanner::buildVPlanWithVPRecipes(
VFRange &Range, SmallPtrSetImpl<Instruction *> &DeadInstructions,
const DenseMap<Instruction *, Instruction *> &SinkAfter) {
fhahn (Not Done): nit: would it be possible to move the code to create the VPEVLRecipe here, reducing the scope of the `VPEVL` variable?

fhahn (Not Done): There should be only a single VPEVL recipe; it would be good to check this in the VPlanVerifier.
SmallPtrSet<const InterleaveGroup<Instruction> *, 1> InterleaveGroups;

VPRecipeBuilder RecipeBuilder(OrigLoop, TLI, Legal, CM, PSE, Builder);

// ---------------------------------------------------------------------------
// Pre-construction: record ingredients whose recipes we'll need to further
// process after constructing the initial VPlan.
// ---------------------------------------------------------------------------
[... 88 lines hidden ...]
Context: for (Instruction &I : BB->instructionsWithoutDebug()) {
RecipeBuilder.setRecipe(Instr, Recipe);
VPBB->appendRecipe(Recipe);
continue;
}

// Otherwise, if all widening options failed, Instruction is to be
// replicated. This may create a successor for VPBB.
VPBasicBlock *NextVPBB =
RecipeBuilder.handleReplication(Instr, Range, VPBB, Plan);
fhahn (Done): Is this still needed? The latest version only introduces `VPEVLRecipe` in `addCanonicalIVRecipes`.
ABataev (Author, Done): Still needed. VPWidenIntOrFpInductionRecipe is emitted after addCanonicalIVRecipes and needs to be inserted as a PHI recipe. I'm not sure about the TODO; I think it should be removed. But we still need to insert VPWidenIntOrFpInductionRecipe before VPEVLRecipe, which is emitted immediately after the canonical IV recipe.
fhahn (Not Done): I think the TODO is not really needed. As the EVL recipe needs to be created before other recipe construction, there's not much we can do about it.
Ayal (Not Done): (Another) Premature creation of a recipe out of place/time?
if (NextVPBB != VPBB) {
VPBB = NextVPBB;
VPBB->setName(BB->hasName() ? BB->getName() + "." + Twine(VPBBsForBB++)
: "");
}
}
}

[... 70 lines not shown ...]
  if (CM.foldTailByMasking() && !Legal->getReductionVars().empty()) {
    for (auto &Reduction : Legal->getReductionVars()) {
      if (CM.isInLoopReduction(Reduction.first))
        continue;
      VPValue *Phi = Plan->getOrAddVPValue(Reduction.first);
      VPValue *Red = Plan->getOrAddVPValue(Reduction.second.getLoopExitInstr());
      Builder.createNaryOp(Instruction::Select, {Cond, Red, Phi});
    }
  }

  std::string PlanName;
Ayal (Done): What's the relation between useVPWithVPEVLVectorization and useActiveLaneMask? Should the latter cover the former, so that it suffices to check if (useActiveLaneMask(Style)), or is it meaningful to have both true?
  raw_string_ostream RSO(PlanName);
  ElementCount VF = Range.Start;
  Plan->addVF(VF);
  RSO << "Initial VPlan for VF={" << VF;
  for (VF *= 2; ElementCount::isKnownLT(VF, Range.End); VF *= 2) {
    Plan->addVF(VF);
    RSO << "," << VF;
  }
[... 27 lines not shown ...]
  if (EnableVPlanPredication) {
    VPlanPredicator VPP(*Plan);
    VPP.predicate();

    // Avoid running transformation to recipes until masked code generation in
    // VPlan-native path is in place.
    return Plan;
  }

  SmallPtrSet<Instruction *, 1> DeadInstructions;
  VPlanTransforms::VPInstructionsToVPRecipes(OrigLoop, Plan,
                                             Legal->getInductionVars(),
                                             DeadInstructions, *PSE.getSE());
Ayal (Not Done): Restrict EVL to inner loop vectorization only, for now?
  return Plan;
}

// Adjust the recipes for any inloop reductions. The chain of instructions
// leading from the loop exit instr to the phi need to be converted to
// reductions, with one operand being vector and the other being the scalar
// reduction chain.
void LoopVectorizationPlanner::adjustRecipesForInLoopReductions(
[... 284 lines not shown ...]
    if (State.hasVectorValue(getOperand(0), Part)) {
      // ...
    else
      State.set(this, Phi, *State.Instance);
    // NOTE: Currently we need to update the value of the operand, so the next
    // predicated iteration inserts its generated value in the correct vector.
    State.reset(getOperand(0), Phi, *State.Instance);
  }
}

void VPWidenMemoryInstructionRecipe::execute(VPTransformState &State) {
Ayal (Not Done): Recipes should strive to have straightforward code-gen as much as possible (contrary to "smart vector instructions/vp intrinsics emission" of the 2nd bullet in the summary's Tentative Development Roadmap). This is already challenged by the existing (non-EVL) VPWidenMemoryInstructionRecipe::execute. Design dedicated recipe(s) for widening memory instructions under EVL, and introduce them instead of the existing non-EVL recipes, preferably as a VPlan-to-VPlan transformation, rather than try to fit everything here, and potentially elsewhere? Also discussed below in https://reviews.llvm.org/D99750#inline-967127, and iinm in earlier revisions.
ABataev (author, Done): Why? It already handles masking, so why should it not be extended with the handling of EVL? A new recipe will not add anything new here, it will just be a copy/paste of the existing recipes.
Ayal (Not Done): Reason explained above: execute() of recipes should be straightforward. This is one of the main guidelines outlined in the VPlan roadmap. This recipe is getting too complicated, should probably separate gather/scatter from wide load/store, and separate the pointer setting (as in [VPlan] Model address separately. #72164), independent of this patch. Recipe should indicate statically if EVL is used or not, to simplify code-gen and facilitate cost estimation, rather than having to check State.EVL during execute(). If multiple recipes share some common core, it can be shared via a common base class, as in VPHeaderPHIRecipe and VPRecipeWithIRFlags.
  VPValue *StoredValue = isStore() ? getStoredValue() : nullptr;
Ayal (Done): These two functions each have a single caller, better defined as lambdas next to them? Simpler to separate into two separate store/scatter functions? Remove EVLPart because EVL currently works w/o unrolling, introduce it in the future as part of enabling EVL with unrolling?
  State.ILV->vectorizeMemoryInstruction(&Ingredient, State,
                                        StoredValue ? nullptr : getVPValue(),
                                        getAddr(), StoredValue, getMask());
}
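
Ayal's suggestion above (a recipe that states statically whether it uses EVL, with any common core shared through a base class) could look roughly like the following toy sketch. None of these classes exist in LLVM: `MemoryRecipeBase`, `MaskedMemoryRecipe`, `EVLMemoryRecipe` and `makeRecipe` are hypothetical names, and `execute()` only returns the name of the intrinsic family it would emit rather than generating IR.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Toy stand-ins for illustration only; these are not LLVM classes.
class MemoryRecipeBase {
public:
  explicit MemoryRecipeBase(bool IsStore) : IsStore(IsStore) {}
  virtual ~MemoryRecipeBase() = default;

  // Fixed by the recipe's type, so a cost model can query it without
  // inspecting any execution-time state.
  virtual bool usesEVL() const = 0;

  // Each subclass has exactly one code-gen path.
  virtual std::string execute() const = 0;

protected:
  // Common core shared by both recipes.
  std::string opSuffix() const { return IsStore ? "store" : "load"; }

private:
  bool IsStore;
};

class MaskedMemoryRecipe : public MemoryRecipeBase {
public:
  using MemoryRecipeBase::MemoryRecipeBase;
  bool usesEVL() const override { return false; }
  std::string execute() const override { return "llvm.masked." + opSuffix(); }
};

class EVLMemoryRecipe : public MemoryRecipeBase {
public:
  using MemoryRecipeBase::MemoryRecipeBase;
  bool usesEVL() const override { return true; }
  std::string execute() const override { return "llvm.vp." + opSuffix(); }
};

// Hypothetical factory: the EVL decision is made once, when the recipe is
// created, instead of being re-checked inside execute().
std::unique_ptr<MemoryRecipeBase> makeRecipe(bool IsStore, bool UseEVL) {
  if (UseEVL)
    return std::make_unique<EVLMemoryRecipe>(IsStore);
  return std::make_unique<MaskedMemoryRecipe>(IsStore);
}
```

For example, `makeRecipe(/*IsStore=*/true, /*UseEVL=*/true)->execute()` yields `"llvm.vp.store"`. ABataev's objection still applies to a sketch like this: if the two `execute()` bodies end up nearly identical, the split mostly duplicates code.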

void VPPredicatedWidenMemoryInstructionRecipe::execute(
fhahn (Done): can leave unchanged?
fhahn (Not Done): nit: `EC` unused?
fhahn (Done): Is the gather scatter case handled correctly for EVL at the moment?
ABataev (author, Done): Added support for this.
fhahn (Not Done): Not sure if we also have a test case for this path, do you know if this would be handled correctly at the moment?
fhahn (Not Done): Great thanks! Now that there is VP intrinsic handling in multiple places, would it be better to handle all EVL related codegen together, i.e. something like below to avoid complicating reading the existing non-EVL code. WDYT?
    for ()
      Value *VectorGep = State.get(getAddr(), Part);
      if (Value *EVLPart = State.EVL ? State.get(State.EVL, Part) : nullptr) {
        NewSI = lowerUsingVectorIntrinsics(vectorGEP..)
      } else {
        existing code....
      }
fhahn (Done): can leave unchanged?
fhahn (Done): can leave unchanged now?
fhahn (Done): Indent looks off here
Ayal (Done): Better simply check if (State.EVL) and then State.get() it, or could the latter return null? Raised as a nit below. Better avoid checking if State.EVL altogether during VPlan execution, as noted above.
    VPTransformState &State) {
fhahn (Not Done): For correctness, would we also need special handling for interleave groups?
ABataev (author, Done): if foldTailByMasking() is true (for VP support it is true), interleave groups are invalidated.
Ayal (Not Done):
> if foldTailByMasking() is true (for VP support it is true), interleave groups are invalidated.
unless useMaskedInterleavedAccesses() is also true?
  VPValue *StoredValue = isStore() ? getStoredValue() : nullptr;
  State.ILV->vectorizeMemoryInstruction(
fhahn (Not Done): A more direct way that doesn't require accessing the preheader would be to use something like `State.Builder.GetInsertBlock()->getModule();`
      &Ingredient, State, StoredValue ? nullptr : getVPValue(), getAddr(),
fhahn (Not Done): Can this be sunk inside the `else {`?
Ayal (Not Done): nit: `if (State.EVL) { Value *EVLPart = State.get(State.EVL, Part); ...`? Why deal with Parts when EVL mandates UF=1?
ABataev (author, Done): The loop executes to State.UF (1 here, yes), so everything is fine here
      StoredValue, getMask(), getEVL());
fhahn (Not Done): nit: `If EVLPart`
fhahn (Not Done): Could you elaborate what better means here? Might have missed it, does the current code handle reverse?
ABataev (author, Done): I disabled reverse support, see useVPWithVPEVLVectorization(), need support for vp_reverse intrinsic, which is not added yet.
fhahn (Done): nit: drop `Better` here, as reverse loading isn't supported at all at the moment. Would be good to assert here that the load isn't reversed
fhahn (Done): nit: Drop `better` here, as reverse storing isn't supported at all at the moment. Would be good to also assert.
}
fhahn (Not Done): It's not clear to me what the reasoning is here to simplify this to the all-true mask without checking the operands of the compare. If the compare can be simplified to the all-true mask, then this would be suitable to do as VPlan-to-VPlan simplification instead of during codegen.
ABataev (author, Done): Added extra check.
fhahn (Not Done): Thanks for the update. Looking at this again, it seems like it would be better to either perform this simplification when creating the tail-folding mask or optimize as VPlan-to-VPlan transform, as presumably this applies to all users of the top-level mask? Might be good to tie in with D157037 which would always create the top-level mask in the beginning when tail-folding.
ABataev (author, Done): If I understand your proposal correctly, it will require adding special VPValue like AllTrueVPValue. This lambda returns Value *, not VPValue, so it cannot be done directly without adding new live-in(?).
fhahn (Not Done): I think it would be preferable to handle this more generically during VPlan optimizations. I've been working on moving tail folding to a VPlan transform instead of combining it with regular block mask creation (D157713). With that, this simplification could be handled generically by not adding the header mask during the tail folding transform.
ABataev (author, Done): Shall I wait for your patch or we can land this with the current implementation so you can later adjust this in your patch?
Ayal (Done): Moving tail folding to a transform, following VPlan roadmap, is already quite an endeavor, which would hopefully better facilitate this patch. Better avoid complicating it further.
fhahn (Not Done): Can the `maskRequired` check be sunk into `MaskValue` or even better be taken care on recipe construction?
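
The all-true-mask simplification discussed in this thread can be checked on a small model. The sketch below assumes that the EVL for a vector iteration is min(VF, TripCount - Index) and that a VP operation enables lane i only when its mask bit is set and i < EVL. Under those assumptions the tail-folding header mask adds nothing that the EVL does not already enforce. `effectiveLanes` and `headerMask` are hypothetical helper names, not functions from the patch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Lane i of a VP operation is enabled iff mask[i] is set and i < evl.
std::vector<bool> effectiveLanes(const std::vector<bool> &mask, uint32_t evl) {
  std::vector<bool> active(mask.size());
  for (std::size_t i = 0; i < mask.size(); ++i)
    active[i] = mask[i] && i < evl;
  return active;
}

// Tail-folding header mask: lane i is active iff index + i < tripCount.
std::vector<bool> headerMask(uint64_t tripCount, uint64_t index, uint32_t vf) {
  std::vector<bool> mask(vf);
  for (uint32_t i = 0; i < vf; ++i)
    mask[i] = index + i < tripCount;
  return mask;
}
```

With a trip count of 10, VF = 4 and Index = 8, the EVL is 2. The header mask is then {1, 1, 0, 0}, and combining either that mask or an all-true mask with EVL = 2 enables the same two lanes. This only covers the top-level (header) mask. A mask that also encodes a condition from the loop body cannot be replaced this way, which is the concern fhahn raises about checking the operands of the compare.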

Value *InnerLoopVectorizer::createEVL() {
  assert(PreferPredicateWithVPIntrinsics !=
craig.topper (Not Done): Should set the alignment attribute on the load.
             PreferVPIntrinsicsTy::NoPredication &&
         "Predication with VP intrinsics turned off.");
craig.topper (Not Done): Should set the alignment attribute on the pointer operand. Something like `NewSI->addParamAttr(1, Attribute::getWithAlignment(NewSI->getContext(), Alignment));`
fhahn (Not Done): Is there a reason for creating the VP intrinsic call explicitly rather than using the VP builder?
fhahn (Not Done): Ah I originally thought that the vector builder provides the same interface as IRBuilder, but uses the vector predication intrinsics when mask/EVL is set. Together with performing the `MaskValue` simplification as VP2VP transform, this would effectively reduce the changes here to a few lines, i.e. only setting EVL in addition to the mask. Another thing came to mind: is there a plan to converge the separate masked memory intrinsics and the vector predication versions, i.e. long term, should the masked memory intrinsics be superseded by the vector predication ones?
ABataev (author, Done):
> Ah I originally thought that the vector builder provides the same interface as IRBuilder, but uses the vector predication intrinsics when mask/EVL is set. Together with performing the `MaskValue` simplification as VP2VP transform, this would effectively reduce the changes here to a few lines, i.e. only setting EVL in addition to the mask. Another thing came to mind: is there a plan to converge the separate masked memory intrinsics and the vector predication versions, i.e. long term, should the masked memory intrinsics be superseded by the vector predication ones?
We did not discuss it yet and it should be a separate discussion/decision. Though it would be good to make it part of masked intrinsics or even instructions (along with the mask).

  if (PreferPredicateWithVPIntrinsics == PreferVPIntrinsicsTy::IfAVLSupported)
    assert(TTI->hasActiveVectorLength() &&
           "Target does not support vector length predication.");

  auto *MinVF = Builder.getInt32(VF.getKnownMinValue());
  Value *RuntimeVL =
      VF.isScalable() ? Builder.CreateVScale(MinVF, "vscale.x.vf") : MinVF;

  if (PreferPredicateWithVPIntrinsics ==
          PreferVPIntrinsicsTy::WithoutAVLSupport &&
      !TTI->hasActiveVectorLength()) {
    return RuntimeVL;
  }

  Value *Remaining = Builder.CreateSub(TripCount, Induction);
bmahjour (Not Done): `Value *Remaining = Builder.CreateSub(TripCount, Induction, "tc.minus.iv");`
  // FIXME: This is a proof-of-concept naive implementation to demonstrate using
  // a target dependent intrinisc to compute the vector length.
craig.topper (Done): intrinisc -> intrinsic
  if (TTI->useCustomActiveVectorLengthIntrinsic()) {
    // Set Element width to the widest type used in the loop.
    unsigned SmallestType, WidestType;
    std::tie(SmallestType, WidestType) = Cost->getSmallestAndWidestTypes();
    Constant *ElementWidth = Builder.getInt32(WidestType);
    // Set Register width factor to 1.
    Constant *RegWidthFactor = Builder.getInt32(1);
    return Builder.CreateIntrinsic(Intrinsic::experimental_set_vector_length,
                                   {Remaining->getType()},
                                   {Remaining, ElementWidth, RegWidthFactor});
  }

  Value *RuntimeVLExt = Builder.CreateZExt(RuntimeVL, Remaining->getType());
bmahjour (Not Done): can we try to initialize RuntimeVL with the right type, to avoid having to zero extend it here?
vkmr (Not Done): The core issue here is that the Vectorizer sort of implicitly uses i32 for everything related to `VF` and Vector Length. `%evl` parameter in VP intrinsics is also of type i32. `TC`, `BTC` and `IV` are however i64. So whenever we mix these two we end up having extends and truncates. That being said, in this particular case, it should be possible to initialize `MinVF` and thus `RuntimeVL` to i64 if we are going to use it with `TC` and `IV` and i32 otherwise.
  Value *EVL =
      Builder.CreateBinaryIntrinsic(Intrinsic::umin, RuntimeVLExt, Remaining);
bmahjour (Not Done): you can avoid calling the intrinsic, by just doing a `cmp` and a `select`. (ie check that Remaining is less than RuntimeVL and if so pick Remaining, otherwise select RuntimeVL).
vkmr (Not Done): I may be wrong here but from what I understand, patches D81829 and D84125 recently introduced these intrinsics to avoid these IR patterns. I am not sure if their use is discouraged for some reason.
  return Builder.CreateTrunc(EVL, Builder.getInt32Ty());
bmahjour (Not Done): Can we avoid truncating to 32-bit on 64-bit targets that take in 64-bit length? I think the type should be the same as the IV.
vkmr (Done): See previous related comment. %evl parameter in VP intrinsics is of type i32.
}
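
The arithmetic at the end of `createEVL()` can be modelled in plain C++ to follow the type discussion above. This is only a sketch: it assumes a 64-bit trip count and induction variable, a 32-bit VF, and an induction variable that advances by VF per vector iteration. `computeEVL`, `computeEVLSelect` and `evlSequence` are hypothetical names. The first helper mirrors the zext / umin / trunc sequence, the second mirrors bmahjour's `cmp` + `select` alternative.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Models umin(zext(VF), TripCount - Index) followed by a truncation to i32.
uint32_t computeEVL(uint64_t tripCount, uint64_t index, uint32_t vf) {
  assert(index < tripCount && "no iterations remain");
  uint64_t remaining = tripCount - index;           // sub
  uint64_t evl = std::min<uint64_t>(vf, remaining); // zext + umin
  // The truncation loses nothing: evl <= vf, and vf fits in 32 bits.
  return static_cast<uint32_t>(evl);
}

// Same value computed with a compare and a select instead of umin.
uint32_t computeEVLSelect(uint64_t tripCount, uint64_t index, uint32_t vf) {
  assert(index < tripCount && "no iterations remain");
  uint64_t remaining = tripCount - index;
  uint64_t vfExt = vf;
  uint64_t evl = remaining < vfExt ? remaining : vfExt; // icmp ult + select
  return static_cast<uint32_t>(evl);
}

// EVL of every vector iteration, assuming the index advances by vf each time.
std::vector<uint32_t> evlSequence(uint64_t tripCount, uint32_t vf) {
  assert(vf > 0 && "vector factor must be positive");
  std::vector<uint32_t> out;
  for (uint64_t index = 0; index < tripCount; index += vf)
    out.push_back(computeEVL(tripCount, index, vf));
  return out;
}
```

For a trip count of 10 and VF = 4, `evlSequence` gives 4, 4, 2: two full vector iterations and a final one covering the two remaining elements. Both helpers agree on every input, so the choice between `umin` and `cmp` + `select` affects only the shape of the IR, not the result.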

void VPWidenEVLRecipe::execute(VPTransformState &State) {
  // FIXME: Interleaving with predicated vectorization is not yet supported.
  // Since VPlan only provides set methods for per Part or per Instance, we use
  // the per Part set method to store the same EVL for each Part (State.UF would
  // be 1 for now.)
  for (unsigned Part = 0; Part < State.UF; Part++)
[... 695 lines not shown ...]

Diff 334865
