This is an archive of the discontinued LLVM Phabricator instance.

Differential D142015

[LV] Plan with and without FoldTailByMasking
Needs Review · Public

Authored by dmgreen on Jan 18 2023, 6:33 AM.

Details

Reviewers

fhahn
Ayal
SjoerdMeijer
sdesmalen
david-arm
bmahjour

Summary

Currently the loop vectorizer has a single parameter in the CostModel that controls FoldTailByMasking. It is set fairly early and can't be changed later, meaning we need to pick between tail folding and non-tail folding before we have done much cost modelling. This patch aims to alter that so that there is not a single parameter, moving it eventually into the vplan, so that we can have plans both with and without FoldTailByMasking that can be costed against one another, with the best one picked for vectorization.

A lot of the changes are fairly mechanical and attempted to be non-disruptive to keep the patch simpler, but there are still a fair number of changes. The important parts are:

FoldTailByMasking is removed from CostModel. It now has a MayFoldTailByMasking variable to hold whether FoldTailByMasking VPlans can be created.
A number of maps in the cost model like InstsToScalarize/Uniforms/Scalars/ForcedScalars were made conditional on the pair of (FoldTailByMasking, VF), so that they still contain the information required in both FoldTailByMasking=true and FoldTailByMasking=false cases. The semi-random name VectorizationScheme was used to describe the pair of (FoldTailByMasking, VF). Alternative suggestions welcome.
We then create vplans with both FoldTailByMasking=false and FoldTailByMasking=true if we MayFoldTailByMasking. The VPlan from then on stores whether it is FoldTailByMasking. For VF=1 only non-predicated plans are created.
Tail folded and non tail folded vplans then need to be able to be costed against one another. This patch makes FoldTailByMasking win when the costs are equal, which is more consistent with the vectorization before this patch.
Epilogue loops for the moment always use FoldTailByMasking=false. This can hopefully be changed in the near future to allow predicated epilogues for unpredicated loop bodies.

Overall this has the effect of allowing us to model and cost tail folded and non tail folded vplans against one another, and should in the future allow us to generate predicated epilogues for unpredicated vector loops.
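The idea of keying cost-model tables on the pair of (FoldTailByMasking, VF) can be sketched as follows. This is a minimal illustration and not the patch's code: the `VectorizationScheme` struct here is a simplified stand-in (a fixed-width VF only), and `SchemeMap`, `InstrName` and `isScalarFor` are hypothetical names introduced for the example.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <tuple>

// Simplified stand-in for the (FoldTailByMasking, VF) pair from the summary.
struct VectorizationScheme {
  bool FoldTailByMasking;
  unsigned Width; // assumption: fixed-width VF only, no scalable vectors

  // Lexicographic ordering so the scheme can be used as a std::map key.
  bool operator<(const VectorizationScheme &Other) const {
    return std::tie(FoldTailByMasking, Width) <
           std::tie(Other.FoldTailByMasking, Other.Width);
  }
};

// Hypothetical per-scheme table, in the spirit of Uniforms/Scalars.
using InstrName = std::string;
using SchemeMap = std::map<VectorizationScheme, std::set<InstrName>>;

// Returns true if Instr was recorded as scalar for exactly this scheme.
inline bool isScalarFor(const SchemeMap &Scalars, VectorizationScheme VS,
                        const InstrName &Instr) {
  assert(VS.Width >= 1 && "VF must be at least 1");
  auto It = Scalars.find(VS);
  return It != Scalars.end() && It->second.count(Instr) != 0;
}
```

Because the key includes the tail-folding flag, an entry recorded for (true, 4) does not leak into a query for (false, 4), which is the property the summary asks of these maps.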

Event Timeline

dmgreen created this revision. Jan 18 2023, 6:33 AM

Herald added a project: Restricted Project. Jan 18 2023, 6:33 AM

Herald added subscribers: luke, StephenFan, frasercrmck and 23 others.

dmgreen requested review of this revision. Jan 18 2023, 6:33 AM

Herald added subscribers: pcwang-thead, vkmr, MaskRay.

Harbormaster completed remote builds in B208473: Diff 490112. Jan 18 2023, 6:34 AM

Thanks for working on this. General direction looks good to me, this is exactly what we want.

VectorizationScheme was used to describe the pair of (FoldTailByMasking, VF). Alternative suggestions welcome

I was going to suggest PredicationScheme. But the VF is also part of it, so PredicationScheme does not cover it. Perhaps VectorizationScheme is just fine. :)

I have done only a first scan of the patch, but it is big, so need to do it again, which I will do soon.

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
5748–5749	Nit: perhaps rename this and in some other places to VS, so that the VF.Width references below become VS.Width.

This makes a lot of sense to me, so with the nits addressed, this LGTM.
But wait a day in case there are other ideas about this.

llvm/lib/Transforms/Vectorize/LoopVectorizationPlanner.h
182	This comment needs to be moved to line no. 224.
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
7585	nit: TODO

This revision is now accepted and ready to land. Jan 26 2023, 12:46 AM

Hi @dmgreen, I'm sorry I've not looked into this in more detail, but from the description you gave (which is very detailed - thanks!) I do have one concern about choosing tail-folding by default in a tie. One major problem we have at the moment is that the active lane mask call is effectively free because, unless I'm mistaken, we don't add the cost of this intrinsic to the loop. A simple IV increment (an add instruction) is likely to be cheaper than the codegen for the loop predicate for many targets? However, I do appreciate that targets that can't efficiently generate loop predicates probably also can't do masked loads and stores, so the tail-folded loop cost is likely to be high anyway. I wonder if it's better to be conservative and choose the non-tail-folded version in a tie until we've got a fairer comparison between different vectorisation styles?
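To make the tie-break concern concrete, here is a minimal sketch of selecting between candidate plans with a configurable tie policy and an optional charge for generating the loop predicate. It is not LoopVectorize's selection code: `PlanCost`, `TiePolicy`, `pickBest` and the `MaskCost` parameter are hypothetical, and the cost numbers used with it are made up.

```cpp
#include <cassert>
#include <vector>

struct PlanCost {
  bool FoldTailByMasking;
  unsigned Width;    // simplified fixed-width VF
  unsigned LoopCost; // hypothetical cost units for one vector iteration
};

enum class TiePolicy { PreferTailFolded, PreferUnpredicated };

// Picks the cheapest candidate by cost per lane; Policy decides exact ties
// between a tail-folded and a non-tail-folded plan. MaskCost is an assumed
// extra charge for computing the loop predicate in tail-folded plans.
inline PlanCost pickBest(const std::vector<PlanCost> &Candidates,
                         TiePolicy Policy, unsigned MaskCost) {
  assert(!Candidates.empty() && "need at least one candidate plan");
  auto Total = [MaskCost](const PlanCost &P) {
    return P.LoopCost + (P.FoldTailByMasking ? MaskCost : 0u);
  };
  PlanCost Best = Candidates.front();
  for (const PlanCost &C : Candidates) {
    // Cross-multiply to compare Total/Width without floating point.
    unsigned long long L = 1ull * Total(C) * Best.Width;
    unsigned long long R = 1ull * Total(Best) * C.Width;
    bool Tie = L == R && C.FoldTailByMasking != Best.FoldTailByMasking;
    bool TieWins =
        Tie && (C.FoldTailByMasking == (Policy == TiePolicy::PreferTailFolded));
    if (L < R || TieWins)
      Best = C;
  }
  return Best;
}
```

With `MaskCost` set to 0 the two plans below tie and the policy alone decides; with any non-zero `MaskCost` the non-tail-folded plan wins regardless of policy, which is the effect of costing the predicate that the comment describes.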

fhahn mentioned this in D142669: [VPlan] Allow planning with different cost models. Jan 26 2023, 2:12 PM

fhahn mentioned this in D142670: [LV] Allow forcing tail folding when constructing the cost model (WIP). Jan 26 2023, 2:15 PM

I think the overall direction is very desirable! However I am not sure if adding a FoldTail argument to the cost functions is an ideal way to get there as it requires a large number of changes and is not really scalable (e.g. adding support to generate a 3rd variant of plans may mean adding another argument to all functions)

I am not sure if I am missing something, but would it be possible to instead run the planner multiple times with different cost-model configurations? I tried to sketch an option for more general support for that in D142669. Then we could generate the plans for tail-folding by just instantiating another instance of the cost model, sketched in D142670. Maybe that could help to simplify the patch?
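A minimal sketch of the "run the planner once per cost-model configuration" idea, assuming nothing about the actual code in D142669 or D142670. `CostModelConfig`, `Plan`, `planFor` and `planAll` are hypothetical names, and VFs are simplified to fixed power-of-two widths.

```cpp
#include <cassert>
#include <vector>

// Hypothetical cost-model configuration; only the tail-folding choice varies.
struct CostModelConfig {
  bool FoldTailByMasking;
};

struct Plan {
  bool FoldTailByMasking;
  unsigned Width;
};

// Builds one plan per power-of-two VF in [MinVF, MaxVF] for a single config.
inline std::vector<Plan> planFor(const CostModelConfig &Cfg, unsigned MinVF,
                                 unsigned MaxVF) {
  assert(MinVF >= 1 && MinVF <= MaxVF && "invalid VF range");
  std::vector<Plan> Plans;
  for (unsigned VF = MinVF; VF <= MaxVF; VF *= 2)
    Plans.push_back({Cfg.FoldTailByMasking, VF});
  return Plans;
}

// Runs the planner once per configuration and concatenates the results.
inline std::vector<Plan> planAll(const std::vector<CostModelConfig> &Configs,
                                 unsigned MinVF, unsigned MaxVF) {
  std::vector<Plan> All;
  for (const CostModelConfig &Cfg : Configs) {
    std::vector<Plan> Plans = planFor(Cfg, MinVF, MaxVF);
    All.insert(All.end(), Plans.begin(), Plans.end());
  }
  return All;
}
```

In this shape a third plan variant would add a field to the configuration rather than an argument to every cost function, which is the scalability point being raised.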

Another thing I noticed is that the patch changes skeleton creation to take a VPlan as argument. I think it would be preferable to do this the other way around if possible, so instead model the difference between plans with and without tail-folding explicitly in the pre-header blocks in the VPlan. I've not looked at the details here to see how difficult that would be yet unfortunately though. But more than happy to iterate on that aspect together!

This revision now requires changes to proceed. Jan 26 2023, 3:09 PM

Could you model this approach as a VPlan transformation instead of hardcoding unscalable flags? Transform: add tail folding. Transform: add masked tail folding. Transform: add scalar tail. Then the cost model can decide which is more desirable.

dmgreen added a child revision: D142875: [LV] Predicated epilog vectorization. Jan 30 2023, 1:18 AM

In D142015#4066159, @SjoerdMeijer wrote:

Thanks for working on this. General direction looks good to me, this is exactly what we want.

VectorizationScheme was used to describe the pair of (FoldTailByMasking, VF). Alternative suggestions welcome

I was going to suggest PredicationScheme. But the VF is also part of it, so PredicationScheme does not cover it. Perhaps VectorizationScheme is just fine. :)

Thanks. There are still some MVE performance issues I need to work through, where it now picks a higher unpredicated trip count over tail predication, and a combination of it being inside a nested loop and low cheap counts make it unprofitable. There are some good improvements too, but that one is a little too large. I think with epilog vectorization and a few other tricks it can get to the point where it too is an improvement.

In D142015#4082011, @david-arm wrote:

Hi @dmgreen, I'm sorry I've not looked into this in more detail, but from the description you gave (which is very detailed - thanks!) I do have one concern about choosing tail-folding by default in a tie. One major problem we have at the moment is that the active lane mask call is effectively free because, unless I'm mistaken, we don't add the cost of this intrinsic to the loop. A simple IV increment (an add instruction) is likely to be cheaper than the codegen for the loop predicate for many targets? However, I do appreciate that targets that can't efficiently generate loop predicates probably also can't do masked loads and stores, so the tail-folded loop cost is likely to be high anyway. I wonder if it's better to be conservative and choose the non-tail-folded version in a tie until we've got a fairer comparison between different vectorisation styles?

Yep - that's hopefully step 3. Once we have this and epilog vectorization (which should be a fairly simple addition) we can then have the option on SVE of choosing between unpredicated body + predicated remainder vs a full predicated body, depending on the target. Targets where the while instructions are not a bottleneck can get the benefits of full predication. I'm not sure yet whether it will just be a hard limit or an added cost. That's the idea at least, there are plenty of details to sort out first and we will have to see how it performs in practice.

In D142015#4084191, @fhahn wrote:

I think the overall direction is very desirable! However I am not sure if adding a FoldTail argument to the cost functions is an ideal way to get there as it requires a large number of changes and is not really scalable (e.g. adding support to generate a 3rd variant of plans may mean adding another argument to all functions)

I am not sure if I am missing something, but would it be possible to instead run the planner multiple times with different cost-model configurations? I tried to sketch an option for more general support for that in D142669. Then we could generate the plans for tail-folding by just instantiating another instance of the cost model, sketched in D142670. Maybe that could help to simplify the patch?

Multiple cost models was something I considered, but dismissed fairly quickly. I didn't think it would be something that anyone would agree to in a review - that we invent a new hierarchy of multiple cost-models each containing multiple vplans. The vplans should be flatter than that, and there is more in the cost-model that isn't dependent on FoldTail than is. I think we should be trying to reduce the number of cost-models, not increase it! The same argument about new variants could equally apply, just to an explosion in the number of cost-models needed.

This patch only pushed the FoldTailByMasking to the places that need it, just treating FoldTailByMasking like the VF already is. How married are you to the idea of multiple cost models? (Or how anti this are you?) I know this patch is fairly large but it feels like a cleaner end result once it is done. Let me know and I can try and get everything working the other way if necessary.

Another thing I noticed is that the patch changes skeleton creation to take a VPlan as argument. I think it would be preferable to do this the other way around if possible, so instead model the difference between plans with and without tail-folding explicitly in the pre-header blocks in the VPlan. I've not looked at the details here to see how difficult that would be yet unfortunately though. But more than happy to iterate on that aspect together!

I was hoping for one step at a time. Firstly we can break the dependency on specifying FoldTailByMasking early in the costmodel, whilst keeping the number of other changes needed minimal. Bigger changes can come later, but might require more changes. (It is useful, for example, to know that a plan is FoldTailByMasking when picking epilog plans).

Matt added a subscriber: Matt. Jan 30 2023, 3:01 PM

syzaara added a subscriber: syzaara. Jan 31 2023, 9:41 AM

In D142015#4090401, @dmgreen wrote:

In D142015#4084191, @fhahn wrote:

I think the overall direction is very desirable! However I am not sure if adding a FoldTail argument to the cost functions is an ideal way to get there as it requires a large number of changes and is not really scalable (e.g. adding support to generate a 3rd variant of plans may mean adding another argument to all functions)

I am not sure if I am missing something, but would it be possible to instead run the planner multiple times with different cost-model configurations? I tried to sketch an option for more general support for that in D142669. Then we could generate the plans for tail-folding by just instantiating another instance of the cost model, sketched in D142670. Maybe that could help to simplify the patch?

Multiple cost models was something I considered, but dismissed fairly quickly. I didn't think it would be something that anyone would agree to in a review - that we invent a new hierarchy of multiple cost-models each containing multiple vplans. The vplans should be flatter than that, and there is more in the cost-model that isn't dependent on FoldTail than is. I think we should be trying to reduce the number of cost-models, not increase it! The same argument about new variants could equally apply, just to an explosion in the number of cost-models needed.

IIUC we in practice already have multiple cost models (e.g. tail folded vs regular), but at the moment we effectively pick which one to run very early. So I don't think this adds a new hierarchy, but I might be missing some aspects you are thinking about here?

I agree that we shouldn't have new hierarchies containing multiple plans. IIUC this is referring to D142669 where it selects the list of plans for the most profitable VF returned by plan. This was mostly to keep the sketch of supporting multiple cost models simple (in terms of time to implement). I *think* this could be improved by computing the cost for each plan directly, instead of doing so after constructing all plans in selectVectorizationFactor. I'll try to see if there are any stumbling blocks down this path.

I was also hoping to untangle some of the cost model functionality that computes the max VF, but unfortunately things are very tied together and it is not really feasible to split those parts off from computing costs per VF.

This patch only pushed the FoldTailByMasking to the places that need it, just treating FoldTailByMasking like the VF already is. How married are you to the idea of multiple cost models? (Or how anti this are you?) I know this patch is fairly large but it feels like a cleaner end result once it is done. Let me know and I can try and get everything working the other way if necessary.

The changes are mostly mechanical, but it makes the mappings in the cost model more complicated and adding the additional argument potentially increases maintenance cost and makes the signatures slightly more complex. I just want to make sure we have considered potential tradeoffs/alternatives if we go down that route.

Another thing I noticed is that the patch changes skeleton creation to take a VPlan as argument. I think it would be preferable to do this the other way around if possible, so instead model the difference between plans with and without tail-folding explicitly in the pre-header blocks in the VPlan. I've not looked at the details here to see how difficult that would be yet unfortunately though. But more than happy to iterate on that aspect together!

I was hoping for one step at a time. Firstly we can break the dependency on specifying FoldTailByMasking early in the costmodel, whilst keeping the number of other changes needed minimal. Bigger changes can come later, but might require more changes. (It is useful, for example, to know that a plan is FoldTailByMasking when picking epilog plans).

Agreed, I think there are existing places where it would be convenient to know if the plan is FoldTailByMasking. One example is adjustRecipesForReductions, where this is the main thing for removing the reliance on the cost-model there.

fhahn mentioned this in D143938: [VPlan] Compute costs for plans directly after construction. Feb 13 2023, 12:10 PM

In D142015#4114045, @fhahn wrote:

I agree that we shouldn't have new hierarchies containing multiple plans. IIUC this is referring to D142669 where it selects the list of plans for the most profitable VF returned by plan. This was mostly to keep the sketch of supporting multiple cost models simple (in terms of time to implement). I *think* this could be improved by computing the cost for each plan directly, instead of doing so after constructing all plans in selectVectorizationFactor. I'll try to see if there are any stumbling blocks down this path.

Sketched in D143938 and updated D142669.

This is just a rebase of the existing patch - I think not much has changed other than adjustments needed for the rebase.

Harbormaster completed remote builds in B221918: Diff 508526. Mar 27 2023, 1:49 AM

OK. This now attempts to use the scheme @fhahn suggested where there are multiple cost models that get created with TailFold=true and TailFold=false. It should hopefully work with allowing planning with and without tail folding (this patch), plus the predicated epilog vectorization in the followup.

Some of what was in CostModel has moved into the Planner. This patch adds a reference from the vplan to the costmodel to make sure the correct model is used with the correct plan. Hopefully that can be reduced and removed eventually. It does manage to remove the VFCandidates array which is a nice little simplification.

Harbormaster completed remote builds in B221934: Diff 508547. Mar 27 2023, 2:56 AM

Hi @dmgreen, I'm starting to review this patch again, but it's quite large so it might take a while. Just a quick skim through so far I thought of a few ideas about some NFC patches that could be useful in their own right and might help to reduce the diff, if you agree?

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
5289	Some changes like this don't really need to be in this patch, right?
5293	Can't we just pass in `TheFunction` instead?
5349	Again, I'm a bit concerned about the effect this will have when the active lane mask call is not costed into the loop. I'd prefer for now to be conservative and opt for the non-predicated scheme until we've accounted for the IV costs.
5749	Again, it looks like the change in prototype doesn't need to be part of this patch and might be a useful tidy-up NFC patch.
llvm/test/Transforms/LoopVectorize/vplan-sink-scalars-and-merge.ll
973	These debug output changes look useful by themselves outside of this patch - not sure if it's possible to pass in the `FoldTailByMasking` flag in a separate patch?

In D142015#4223465, @dmgreen wrote:

OK. This now attempts to use the scheme @fhahn suggested where there are multiple cost models that get created with TailFold=true and TailFold=false. It should hopefully work with allowing planning with and without tail folding (this patch), plus the predicated epilog vectorization in the followup.

Some of what was in CostModel has moved into the Planner. This patch adds a reference from the vplan to the costmodel to make sure the correct model is used with the correct plan. Hopefully that can be reduced and removed eventually. It does manage to remove the VFCandidates array which is a nice little simplification.

I updated D143938 and D142669 recently and I think they should be in good shape now and should be ready for review. Rebasing the patch on top of them should hopefully simplify the diff. There are a few additional dependencies to remove a few more places that rely on the cost model after construction.

I put all patches on a branch, if that helps: https://github.com/fhahn/llvm-project/tree/vplan-cost-upfront

dmgreen added inline comments. Apr 6 2023, 9:30 AM

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
5289	Yeah sure, I can give that a try. It seems a bit odd on its own to be honest, like a patch that makes things worse on its own and feels like we change it just so that we change it again later, creating needless busywork. But I have for the moment moved it out of this patch.
5293	I think I chose Loop because the LoopVectorizationPlanner doesn't store the Function, so would need pass L->getHeader()->getParent() as argument and it simplified the interface a little. Happy to change it though.
5321	I also have pulled this part out into D147720, as a functional change that can be done separately.
5349	For MVE in the past, when we returned true for preferPredicateOverEpilogue, we would get a tail-predicated loop (or not vectorize). Now that we get both tail-folded and non-tail-folded loops to cost against one another, we need to pick the tail-folded version on a tie to not be worse than before. The scores between the two are often the same, so to be closer to the old codegen the conservative option is to choose FoldTail on a tie. I believe that the only target that returns true from preferPredicateOverEpilogue at the moment is MVE. SVE can be adjusted later if needed, but my current thinking was to use UsePredicatedEpilogue from D145925 as a first step to get the benefits of epilog vectorization whilst hopefully not messing anything else up. That was the plan for how to treat SVE conservatively, and we can expand things if needed in the future. It doesn't really make a lot of sense to add up disparate throughput costs and expect them to mean anything, but we can perhaps come up with something if we need and if not we can always just force unpredicated body + predicated epilog. Let me know what you think.
5749	Thanks Yeah - That is a good idea. I can actually just remove this from the current version of the patch, I believe.
llvm/test/Transforms/LoopVectorize/vplan-sink-scalars-and-merge.ll
973	Hmm. It would involve pulling out VPlan->FoldTailByMasking into a separate patch. That feels like it is the core of this patch, to be honest. Pulling it out just for some debug messages that are usually present elsewhere feels like a bit of an odd patch on its own. When you look at the whole debug output there are already parts explaining whether the CostModel is FoldTailByMasking.

dmgreen updated this revision to Diff 511438. Apr 6 2023, 9:31 AM

Harbormaster completed remote builds in B224041: Diff 511438. Apr 6 2023, 9:32 AM

Hi @dmgreen, this patch looks a lot smaller and tidier than last time - thanks a lot! I've just got a few more comments.

llvm/lib/Transforms/Vectorize/LoopVectorizationPlanner.h
195	Perhaps this comment should now be something like: /// Cost of the loop with that width and vectorization style. What do you think?
302	I think the comment needs updating now because it no longer returns a VF.
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
1551	This is not your fault, but whilst you are here could you update the comment to reflect the function behaviour? It looks like the comment probably got out of sync with the function at some point. :) Perhaps something like: /// Returns true if a scalar epilogue is allowed. It may return false if: /// 1. We are optimising for code size /// 2. There is a loop hint annotation /// etc.
1555	Do you know the scenarios in which we aren't folding the tail by masking, but a scalar epilogue is not needed? I think CM_ScalarEpilogueNotNeededUsePredicate is only set for these cases: 1) the user has supplied a hint, 2) the user has set the prefer-predicate-over-epilogue flag, 3) the TTI hook preferPredicateOverEpilogue has returned true. But I'd expect that for at least 3) we've set FoldTailByMasking to true?
5073–5076	In this case it looks like we're going to create two sets of plans - one with tail-folding and one without - and in both cases we're actually going to tail-fold anyway. Is that right? Not saying we shouldn't do that, but just trying to understand how this works.
5082	Don't we have to also set `ScalarEpilogueStatus = CM_ScalarEpilogueAllowed` here?
5460	This should always be false for VF=1, right?
7544	Perhaps worth adding a `/* ... */` comment for the new argument too? Same for any other similar places in the file.
7605	Is it worth changing `hasVF` to take the `FoldTailByMasking` as a second argument so you can just write: assert(count_if(VPlans, [VF, FoldTailByMasking](const VPlanPtr &Plan) { return Plan->hasVF(VF, FoldTailByMasking); }) == 1 && "Best VF has not a single VPlan.");
8682	Wouldn't it be simpler to just write: for (ElementCount VF = MinVF; ElementCount::isKnownLE(VF, MaxVF);
llvm/test/Transforms/LoopVectorize/AArch64/maximize-bandwidth-invalidate.ll
17	This seems odd. Perhaps I'm mistaken, but I thought with your patch we wouldn't decide to tail-fold with a VF of 1?
llvm/test/Transforms/LoopVectorize/AArch64/sve-tail-folding.ll
805 ↗	(On Diff #511438)	I was confused at first, but actually this looks like an improvement - nice! I think before we were still using an interleave count of 1 as if we were tail-folding, but now we've fallen back on the non-tail-folding plan that uses interleaving.
llvm/test/Transforms/LoopVectorize/first-order-recurrence.ll
2915–2916	For what it's worth this new version looks better, but do you know why? Is it because we no longer allow tail-folding for a VF of 1? I assume it was trying to tail-fold before due to the low trip count.
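The `hasVF` suggestion for line 7605 above can be sketched in a self-contained form. This is an illustration only: `Plan` is a minimal stand-in for a VPlan, the two-argument `hasVF` overload is the hypothetical API being proposed in the comment (it is not confirmed to exist), and `countMatching` and `checkSinglePlan` are names invented for the example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <memory>
#include <set>
#include <vector>

// Minimal stand-in for a VPlan: the VFs it covers plus its tail-folding flag.
struct Plan {
  std::set<unsigned> VFs;
  bool FoldTailByMasking;

  bool hasVF(unsigned VF) const { return VFs.count(VF) != 0; }
  // Hypothetical two-argument overload along the lines of the suggestion.
  bool hasVF(unsigned VF, bool FoldTail) const {
    return hasVF(VF) && FoldTailByMasking == FoldTail;
  }
};

using PlanPtr = std::unique_ptr<Plan>;

// Counts the plans matching both the VF and the tail-folding choice.
inline std::ptrdiff_t countMatching(const std::vector<PlanPtr> &Plans,
                                    unsigned VF, bool FoldTail) {
  return std::count_if(
      Plans.begin(), Plans.end(),
      [VF, FoldTail](const PlanPtr &P) { return P->hasVF(VF, FoldTail); });
}

// Mirrors the quoted assertion: exactly one plan may match the chosen scheme.
inline void checkSinglePlan(const std::vector<PlanPtr> &Plans, unsigned VF,
                            bool FoldTail) {
  assert(countMatching(Plans, VF, FoldTail) == 1 &&
         "Best VF has not a single VPlan.");
}
```

Keeping the one-argument `hasVF` alongside the overload leaves existing call sites that only care about the VF unchanged.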

dmgreen added a parent revision: D147720: [LV] Use the known trip count when costing non-tail folded VFs. Apr 24 2023, 2:31 AM

Rebase and address comments. Thanks for taking a look.

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
1551	Yeah this is a bit of a weird one nowadays. I had tried to remove it, but I think it makes more sense to keep around for specifying when a scalar epilogue cannot be used.
1555	For 3 it would be both with and without FoldTailByMasking. D145925 adds a way for the target to control the SEL more directly.
5073–5076	This falls through in both cases. In the code below we may find that the MaxVF is a multiple of the known tripcount, and if so return plans with TailFold=false. Else it will not create TailFold=false plans, just using the TailFold=true. There are a lot of edge cases.
5082	I don't believe so. The ScalarEpilogueStatus tells us at a high level whether we should be predicating. FoldTailByMasking then controls whether the individual vplans are predicated. So it felt better to keep the original ScalarEpilogueStatus in place to refer back to, in case we would like to differentiate between CM_ScalarEpilogueAllowed and CM_ScalarEpilogueNotNeededUsePredicate cases.
5460	Not in all cases. Cases that are always predicated (like CM_ScalarEpilogueNotAllowedUsePredicate or CM_ScalarEpilogueNotAllowedLowTripLoop) will have VF=1 with tail folding still. They won't have any unpredicated vplans.
7605	It feels like a separate parameter to me, that would want to be checked separately. I can change it if you think it's better but there are places that call hasVF on its own.
llvm/test/Transforms/LoopVectorize/first-order-recurrence.ll
2915–2916	This was because: // Don't use a predicated vector body if the user has forced a vectorization // factor of 1. if (UserVF.isScalar()) SEL = CM_ScalarEpilogueAllowed; It overrides the "tiny trip count" SEL that was applying previously. I think that for VF=1 it makes a lot of sense to not predicate the body, but I've removed that from this patch to keep the diff down. We can re-add it in the future if needed.
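The replies above describe which (FoldTailByMasking, VF) combinations end up being planned. A rough sketch of that enumeration follows, pieced together from the summary and the answers for lines 5073–5076 and 5460. It is an assumption-laden simplification and not the patch's logic: the enum is a local mirror of the names quoted in the thread, `schemesToPlan` and `alwaysPredicated` are hypothetical, VFs are fixed powers of two, and the trip-count edge cases mentioned above are ignored.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Local mirror of the scalar-epilogue lowering names quoted in the thread.
enum ScalarEpilogueLowering {
  CM_ScalarEpilogueAllowed,
  CM_ScalarEpilogueNotNeededUsePredicate,
  CM_ScalarEpilogueNotAllowedUsePredicate,
  CM_ScalarEpilogueNotAllowedLowTripLoop,
};

using Scheme = std::pair<bool, unsigned>; // (FoldTailByMasking, VF)

// The statuses described in the thread as "always predicated".
inline bool alwaysPredicated(ScalarEpilogueLowering SEL) {
  return SEL == CM_ScalarEpilogueNotAllowedUsePredicate ||
         SEL == CM_ScalarEpilogueNotAllowedLowTripLoop;
}

// Enumerates the schemes to plan for power-of-two VFs in [1, MaxVF].
inline std::vector<Scheme> schemesToPlan(ScalarEpilogueLowering SEL,
                                         bool MayFoldTailByMasking,
                                         unsigned MaxVF) {
  assert(MaxVF >= 1 && "MaxVF must be at least 1");
  std::vector<Scheme> Schemes;
  // Assumption: if predication is mandatory but impossible, plan nothing.
  if (alwaysPredicated(SEL) && !MayFoldTailByMasking)
    return Schemes;
  for (unsigned VF = 1; VF <= MaxVF; VF *= 2) {
    if (alwaysPredicated(SEL)) {
      Schemes.push_back({true, VF}); // predicated only, including VF=1
      continue;
    }
    Schemes.push_back({false, VF});
    if (MayFoldTailByMasking && VF > 1) // no tail-folded plan for VF=1 here
      Schemes.push_back({true, VF});
  }
  return Schemes;
}
```

For example, `schemesToPlan(CM_ScalarEpilogueAllowed, true, 4)` yields both variants for VF=2 and VF=4 but only the unpredicated one for VF=1, while the low-trip-count status yields tail-folded schemes only.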

Harbormaster completed remote builds in B229059: Diff 518200. Apr 29 2023, 12:32 PM

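Revision Contents

Path	Size
llvm/include/llvm/Transforms/Vectorize/LoopVectorizationLegality.h	15 lines
llvm/lib/Transforms/Vectorize/LoopVectorizationLegality.cpp	11 lines
llvm/lib/Transforms/Vectorize/LoopVectorizationPlanner.h	76 lines
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp	671 lines
llvm/lib/Transforms/Vectorize/VPlan.h	19 lines
llvm/lib/Transforms/Vectorize/VPlan.cpp	2 lines
llvm/test/Transforms/LoopVectorize/AArch64/masked-call.ll	120 lines
llvm/test/Transforms/LoopVectorize/AArch64/maximize-bandwidth-invalidate.ll	2 lines
llvm/test/Transforms/LoopVectorize/AArch64/sve-tail-folding-forced.ll	2 lines
llvm/test/Transforms/LoopVectorize/AArch64/tail-folding-styles.ll	43 lines
llvm/test/Transforms/LoopVectorize/ARM/mve-known-trip-count.ll	6 lines
llvm/test/Transforms/LoopVectorize/ARM/tail-folding-reduces-vf.ll	4 lines
llvm/test/Transforms/LoopVectorize/PowerPC/reg-usage.ll	10 lines
llvm/test/Transforms/LoopVectorize/RISCV/riscv-vector-reverse.ll	6 lines
llvm/test/Transforms/LoopVectorize/X86/vect.omp.force.small-tc.ll	22 lines
llvm/test/Transforms/LoopVectorize/first-order-recurrence-sink-replicate-region.ll	12 lines
llvm/test/Transforms/LoopVectorize/first-order-recurrence.ll	202 lines
llvm/test/Transforms/LoopVectorize/icmp-uniforms.ll	2 lines
llvm/test/Transforms/LoopVectorize/pr45679-fold-tail-by-masking.ll	130 lines
llvm/test/Transforms/LoopVectorize/vplan-sink-scalars-and-merge.ll	22 lines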

Diff 508547

llvm/include/llvm/Transforms/Vectorize/LoopVectorizationLegality.h

Show First 20 Lines • Show All 369 Lines • ▼ Show 20 Lines	public:
uint64_t getMaxSafeVectorWidthInBits() const {		uint64_t getMaxSafeVectorWidthInBits() const {
return LAI->getDepChecker().getMaxSafeVectorWidthInBits();		return LAI->getDepChecker().getMaxSafeVectorWidthInBits();
}		}

bool hasStride(Value *V) { return LAI->hasStride(V); }		bool hasStride(Value *V) { return LAI->hasStride(V); }

/// Returns true if vector representation of the instruction \p I		/// Returns true if vector representation of the instruction \p I
/// requires mask.		/// requires mask.
bool isMaskRequired(const Instruction *I) const {		bool isMaskRequired(bool FoldTailByMasking, const Instruction *I) const {
return MaskedOp.contains(I);		return MaskedOp.contains(I) \|\|
		(FoldTailByMasking && FoldTailMaskedOp.contains(I));
}		}

unsigned getNumStores() const { return LAI->getNumStores(); }		unsigned getNumStores() const { return LAI->getNumStores(); }
unsigned getNumLoads() const { return LAI->getNumLoads(); }		unsigned getNumLoads() const { return LAI->getNumLoads(); }

/// Returns all assume calls in predicated blocks. They need to be dropped		/// Returns all assume calls in predicated blocks. They need to be dropped
/// when flattening the CFG.		/// when flattening the CFG.
const SmallPtrSetImpl<Instruction *> &getConditionalAssumes() const {		const SmallPtrSetImpl<Instruction *> &
return ConditionalAssumes;		getConditionalAssumes(bool FoldTailByMasking) const {
		return FoldTailByMasking ? FoldTailConditionalAssumes : ConditionalAssumes;
}		}

private:		private:
/// Return true if the pre-header, exiting and latch blocks of \p Lp and all		/// Return true if the pre-header, exiting and latch blocks of \p Lp and all
/// its nested loops are considered legal for vectorization. These legal		/// its nested loops are considered legal for vectorization. These legal
/// checks are common for inner and outer loop vectorization.		/// checks are common for inner and outer loop vectorization.
/// Temporarily taking UseVPlanNativePath parameter. If true, take		/// Temporarily taking UseVPlanNativePath parameter. If true, take
/// the new code path being implemented for outer loop vectorization		/// the new code path being implemented for outer loop vectorization
▲ Show 20 Lines • Show All 143 Lines • ▼ Show 20 Lines	private:
/// While vectorizing these instructions we have to generate a		/// While vectorizing these instructions we have to generate a
/// call to the appropriate masked intrinsic		/// call to the appropriate masked intrinsic
SmallPtrSet<const Instruction *, 8> MaskedOp;		SmallPtrSet<const Instruction *, 8> MaskedOp;

/// Assume instructions in predicated blocks must be dropped if the CFG gets		/// Assume instructions in predicated blocks must be dropped if the CFG gets
/// flattened.		/// flattened.
SmallPtrSet<Instruction *, 8> ConditionalAssumes;		SmallPtrSet<Instruction *, 8> ConditionalAssumes;

		/// Same as MaskedOp above when folding tail by masking.
		SmallPtrSet<const Instruction *, 8> FoldTailMaskedOp;
		/// Same as ConditionalAssumes above when folding tail by masking.
		SmallPtrSet<Instruction *, 8> FoldTailConditionalAssumes;

/// BFI and PSI are used to check for profile guided size optimizations.		/// BFI and PSI are used to check for profile guided size optimizations.
BlockFrequencyInfo *BFI;		BlockFrequencyInfo *BFI;
ProfileSummaryInfo *PSI;		ProfileSummaryInfo *PSI;
};		};

} // namespace llvm		} // namespace llvm

#endif // LLVM_TRANSFORMS_VECTORIZE_LOOPVECTORIZATIONLEGALITY_H		#endif // LLVM_TRANSFORMS_VECTORIZE_LOOPVECTORIZATIONLEGALITY_H
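
Note on the header change above: the legality class now keeps one set of conditional assumes per tail-folding mode, and the getter picks between them from a flag supplied by the caller. A minimal standalone sketch of that selection follows. `LegalitySketch`, the string-based "instructions" and the two `add...` helpers are hypothetical stand-ins for illustration and are not part of the patch.

```cpp
#include <set>
#include <string>

// Hypothetical stand-in for LoopVectorizationLegality. Only the two sets
// touched by this part of the diff are modelled, and instructions are
// represented by plain strings.
class LegalitySketch {
public:
  using AssumeSet = std::set<std::string>;

  // Mirrors the shape of the new getter: the caller says which kind of plan
  // it is building and receives the matching set.
  const AssumeSet &getConditionalAssumes(bool FoldTailByMasking) const {
    return FoldTailByMasking ? FoldTailConditionalAssumes : ConditionalAssumes;
  }

  // Hypothetical helpers used only to populate the sketch.
  void addConditionalAssume(const std::string &I) {
    ConditionalAssumes.insert(I);
  }
  void addFoldTailConditionalAssume(const std::string &I) {
    FoldTailConditionalAssumes.insert(I);
  }

private:
  AssumeSet ConditionalAssumes;         // tail not folded
  AssumeSet FoldTailConditionalAssumes; // tail folded by masking
};
```

Keeping both sets alive at once is what lets plans with and without tail folding be built from the same legality object.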

llvm/lib/Transforms/Vectorize/LoopVectorizationLegality.cpp

[... 1,418 lines not shown ...]		for (User *U : AE->users()) {
<< *UI << "\n");		<< *UI << "\n");
return false;		return false;
}		}
}		}

// The list of pointers that we can safely read and write to remains empty.		// The list of pointers that we can safely read and write to remains empty.
SmallPtrSet<Value *, 8> SafePointers;		SmallPtrSet<Value *, 8> SafePointers;

SmallPtrSet<const Instruction *, 8> TmpMaskedOp;
SmallPtrSet<Instruction *, 8> TmpConditionalAssumes;

// Check and mark all blocks for predication, including those that ordinarily		// Check and mark all blocks for predication, including those that ordinarily
// do not need predication such as the header block.		// do not need predication such as the header block.
for (BasicBlock *BB : TheLoop->blocks()) {		for (BasicBlock *BB : TheLoop->blocks()) {
if (!blockCanBePredicated(BB, SafePointers, TmpMaskedOp,		if (!blockCanBePredicated(BB, SafePointers, FoldTailMaskedOp,
TmpConditionalAssumes)) {		FoldTailConditionalAssumes)) {
LLVM_DEBUG(dbgs() << "LV: Cannot fold tail by masking as requested.\n");		LLVM_DEBUG(dbgs() << "LV: Cannot fold tail by masking as requested.\n");
return false;		return false;
}		}
}		}

LLVM_DEBUG(dbgs() << "LV: can fold tail by masking.\n");		LLVM_DEBUG(dbgs() << "LV: can fold tail by masking.\n");

MaskedOp.insert(TmpMaskedOp.begin(), TmpMaskedOp.end());
ConditionalAssumes.insert(TmpConditionalAssumes.begin(),
TmpConditionalAssumes.end());

return true;		return true;
}		}

} // namespace llvm		} // namespace llvm
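
Note on the change above: the right-hand side records the tail-folding results directly in the dedicated `FoldTail...` sets instead of copying temporaries into the shared sets on success. A rough sketch of that shape, assuming blocks and instructions are modelled as plain integers, is shown here. `TailFoldSets`, the callback signature and the integer encoding are assumptions made for illustration. As in the right-hand side of the diff, a failed attempt may leave partial entries in the fold-tail set; how the patch treats that case is not visible in this excerpt.

```cpp
#include <functional>
#include <set>
#include <vector>

// Hypothetical model: blocks and masked "instructions" are plain ints.
struct TailFoldSets {
  std::set<int> MaskedOp;         // consulted when the tail is not folded
  std::set<int> FoldTailMaskedOp; // consulted when the tail is folded
};

// Returns true if every block can be predicated. Masked operations found
// along the way go straight into FoldTailMaskedOp; MaskedOp is never touched.
bool prepareToFoldTailByMasking(
    const std::vector<int> &Blocks,
    const std::function<bool(int, std::set<int> &)> &BlockCanBePredicated,
    TailFoldSets &Sets) {
  for (int BB : Blocks)
    if (!BlockCanBePredicated(BB, Sets.FoldTailMaskedOp))
      return false; // tail folding is not possible for this loop
  return true;
}
```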

llvm/lib/Transforms/Vectorize/LoopVectorizationPlanner.h

[... 173 lines not shown ...]		public:

InsertPointGuard(const InsertPointGuard &) = delete;		InsertPointGuard(const InsertPointGuard &) = delete;
InsertPointGuard &operator=(const InsertPointGuard &) = delete;		InsertPointGuard &operator=(const InsertPointGuard &) = delete;

~InsertPointGuard() { Builder.restoreIP(VPInsertPoint(Block, Point)); }		~InsertPointGuard() { Builder.restoreIP(VPInsertPoint(Block, Point)); }
};		};
};		};

/// TODO: The following VectorizationFactor was pulled out of		/// TODO: The following VectorizationFactor was pulled out of
		[Inline comment, Not Done] SjoerdMeijer: This comment needs to be moved to line no. 224.
/// LoopVectorizationCostModel class. LV also deals with		/// LoopVectorizationCostModel class. LV also deals with
/// VectorizerParams::VectorizationFactor and VectorizationCostTy.		/// VectorizerParams::VectorizationFactor and VectorizationCostTy.
/// We need to streamline them.		/// We need to streamline them.

/// Information about vectorization costs.		/// Information about vectorization costs.
struct VectorizationFactor {		struct VectorizationFactor {
/// Vector width with best cost.		/// Vector width with best cost.
ElementCount Width;		ElementCount Width;

		/// Whether the entire loop is predicated.
		bool FoldTailByMasking;

/// Cost of the loop with that width.		/// Cost of the loop with that width.
		[Inline comment, Done] david-arm: Perhaps this comment should now be something like: /// Cost of the loop with that width and vectorization style. What do you think?
InstructionCost Cost;		InstructionCost Cost;

/// Cost of the scalar loop.		/// Cost of the scalar loop.
InstructionCost ScalarCost;		InstructionCost ScalarCost;

/// The minimum trip count required to make vectorization profitable, e.g. due		/// The minimum trip count required to make vectorization profitable, e.g. due
/// to runtime checks.		/// to runtime checks.
ElementCount MinProfitableTripCount;		ElementCount MinProfitableTripCount;

VectorizationFactor(ElementCount Width, InstructionCost Cost,		VectorizationFactor(ElementCount Width, bool FoldTailByMasking,
InstructionCost ScalarCost)		InstructionCost Cost, InstructionCost ScalarCost)
: Width(Width), Cost(Cost), ScalarCost(ScalarCost) {}		: Width(Width), FoldTailByMasking(FoldTailByMasking), Cost(Cost),
		ScalarCost(ScalarCost) {}

/// Width 1 means no vectorization, cost 0 means uncomputed cost.		/// Width 1 means no vectorization, cost 0 means uncomputed cost.
static VectorizationFactor Disabled() {		static VectorizationFactor Disabled() {
return {ElementCount::getFixed(1), 0, 0};		return {ElementCount::getFixed(1), false, 0, 0};
}		}

bool operator==(const VectorizationFactor &rhs) const {		bool operator==(const VectorizationFactor &rhs) const {
return Width == rhs.Width && Cost == rhs.Cost;		return Width == rhs.Width && FoldTailByMasking == rhs.FoldTailByMasking &&
		Cost == rhs.Cost;
}		}

bool operator!=(const VectorizationFactor &rhs) const {		bool operator!=(const VectorizationFactor &rhs) const {
return !(*this == rhs);		return !(*this == rhs);
}		}
};		};

/// A class that represents two vectorization factors (initialized with 0 by		/// A class that represents two vectorization factors (initialized with 0 by
[... 40 lines not shown ...]		class LoopVectorizationPlanner {
const TargetLibraryInfo *TLI;		const TargetLibraryInfo *TLI;

/// Target Transform Info.		/// Target Transform Info.
const TargetTransformInfo *TTI;		const TargetTransformInfo *TTI;

/// The legality analysis.		/// The legality analysis.
LoopVectorizationLegality *Legal;		LoopVectorizationLegality *Legal;

/// The profitability analysis.
LoopVectorizationCostModel &CM;

/// The interleaved access analysis.		/// The interleaved access analysis.
InterleavedAccessInfo &IAI;		InterleavedAccessInfo &IAI;

PredicatedScalarEvolution &PSE;		PredicatedScalarEvolution &PSE;

const LoopVectorizeHints &Hints;		const LoopVectorizeHints &Hints;

OptimizationRemarkEmitter *ORE;		OptimizationRemarkEmitter *ORE;

SmallVector<VPlanPtr, 4> VPlans;		SmallVector<VPlanPtr, 4> VPlans;

/// A builder used to construct the current plan.		/// A builder used to construct the current plan.
VPBuilder Builder;		VPBuilder Builder;

		/// Profitable vector factors.
		SmallVector<VectorizationFactor, 8> ProfitableVFs;

public:		public:
LoopVectorizationPlanner(Loop *L, LoopInfo *LI, const TargetLibraryInfo *TLI,		LoopVectorizationPlanner(Loop *L, LoopInfo *LI, const TargetLibraryInfo *TLI,
const TargetTransformInfo *TTI,		const TargetTransformInfo *TTI,
LoopVectorizationLegality *Legal,		LoopVectorizationLegality *Legal,
LoopVectorizationCostModel &CM,
InterleavedAccessInfo &IAI,		InterleavedAccessInfo &IAI,
PredicatedScalarEvolution &PSE,		PredicatedScalarEvolution &PSE,
const LoopVectorizeHints &Hints,		const LoopVectorizeHints &Hints,
OptimizationRemarkEmitter *ORE)		OptimizationRemarkEmitter *ORE)
: OrigLoop(L), LI(LI), TLI(TLI), TTI(TTI), Legal(Legal), CM(CM), IAI(IAI),		: OrigLoop(L), LI(LI), TLI(TLI), TTI(TTI), Legal(Legal), IAI(IAI),
PSE(PSE), Hints(Hints), ORE(ORE) {}		PSE(PSE), Hints(Hints), ORE(ORE) {}

/// Plan how to best vectorize, return the best VF and its cost, or		/// Plan how to best vectorize, return the best VF and its cost, or
		[Inline comment, Done] david-arm: I think the comment needs updating now because it no longer returns a VF.
/// std::nullopt if vectorization and interleaving should be avoided up front.		/// std::nullopt if vectorization and interleaving should be avoided up front.
std::optional<VectorizationFactor> plan(ElementCount UserVF, unsigned UserIC);		void plan(LoopVectorizationCostModel &CM, ElementCount UserVF,
		unsigned UserIC);

		/// \return The most profitable vectorization factor and the cost of that VF.
		/// This method checks every VF in the plans in \p VPlans. If UserVF is not
		/// ZERO then this vectorization factor will be selected if vectorization is
		/// possible.
		std::optional<VectorizationFactor> selectVectorizationFactor();

		VectorizationFactor
		selectEpilogueVectorizationFactor(const VectorizationFactor &MainVF);

/// Use the VPlan-native path to plan how to best vectorize, return the best		/// Use the VPlan-native path to plan how to best vectorize, return the best
/// VF and its cost.		/// VF and its cost.
VectorizationFactor planInVPlanNativePath(ElementCount UserVF);		VectorizationFactor planInVPlanNativePath(LoopVectorizationCostModel &CM,
		ElementCount UserVF);

/// Return the best VPlan for \p VF.		/// Return the best VPlan for \p VF.
VPlan &getBestPlanFor(ElementCount VF) const;		VPlan &getBestPlanFor(ElementCount VF, bool FoldTailByMasking) const;

/// Generate the IR code for the body of the vectorized loop according to the		/// Generate the IR code for the body of the vectorized loop according to the
/// best selected \p VF, \p UF and VPlan \p BestPlan.		/// best selected \p VF, \p UF and VPlan \p BestPlan.
/// TODO: \p IsEpilogueVectorization is needed to avoid issues due to epilogue		/// TODO: \p IsEpilogueVectorization is needed to avoid issues due to epilogue
/// vectorization re-using plans for both the main and epilogue vector loops.		/// vectorization re-using plans for both the main and epilogue vector loops.
/// It should be removed once the re-use issue has been fixed.		/// It should be removed once the re-use issue has been fixed.
void executePlan(ElementCount VF, unsigned UF, VPlan &BestPlan,		void executePlan(ElementCount VF, unsigned UF, VPlan &BestPlan,
InnerLoopVectorizer &LB, DominatorTree *DT,		InnerLoopVectorizer &LB, DominatorTree *DT,
bool IsEpilogueVectorization);		bool IsEpilogueVectorization);

#if !defined(NDEBUG) || defined(LLVM_ENABLE_DUMP)		#if !defined(NDEBUG) || defined(LLVM_ENABLE_DUMP)
void printPlans(raw_ostream &O);		void printPlans(raw_ostream &O);
#endif		#endif

/// Look through the existing plans and return true if we have one with all		/// Look through the existing plans and return true if we have one with all
/// the vectorization factors in question.		/// the vectorization factors in question.
bool hasPlanWithVF(ElementCount VF) const {		bool hasPlanWithVF(ElementCount VF, bool FoldTailByMasking) const {
return any_of(VPlans,		return any_of(VPlans, [&](const VPlanPtr &Plan) {
[&](const VPlanPtr &Plan) { return Plan->hasVF(VF); });		return Plan->hasVF(VF) && Plan->foldTailByMasking() == FoldTailByMasking;
		});
}		}

/// Test a \p Predicate on a \p Range of VF's. Return the value of applying		/// Test a \p Predicate on a \p Range of VF's. Return the value of applying
/// \p Predicate on Range.Start, possibly decreasing Range.End such that the		/// \p Predicate on Range.Start, possibly decreasing Range.End such that the
/// returned value holds for the entire \p Range.		/// returned value holds for the entire \p Range.
static bool		static bool
getDecisionAndClampRange(const std::function<bool(ElementCount)> &Predicate,		getDecisionAndClampRange(const std::function<bool(ElementCount)> &Predicate,
VFRange &Range);		VFRange &Range);

/// Check if the number of runtime checks exceeds the threshold.		/// Check if the number of runtime checks exceeds the threshold.
bool requiresTooManyRuntimeChecks() const;		bool requiresTooManyRuntimeChecks() const;

protected:		protected:
/// Build VPlans for power-of-2 VF's between \p MinVF and \p MaxVF inclusive,		/// Build VPlans for power-of-2 VF's between \p MinVF and \p MaxVF inclusive,
/// according to the information gathered by Legal when it checked if it is		/// according to the information gathered by Legal when it checked if it is
/// legal to vectorize the loop.		/// legal to vectorize the loop.
void buildVPlans(ElementCount MinVF, ElementCount MaxVF);		void buildVPlans(LoopVectorizationCostModel &CM, ElementCount MinVF,
		ElementCount MaxVF);

private:		private:
/// Build a VPlan according to the information gathered by Legal. \return a		/// Build a VPlan according to the information gathered by Legal. \return a
/// VPlan for vectorization factors \p Range.Start and up to \p Range.End		/// VPlan for vectorization factors \p Range.Start and up to \p Range.End
/// exclusive, possibly decreasing \p Range.End.		/// exclusive, possibly decreasing \p Range.End.
VPlanPtr buildVPlan(VFRange &Range);		VPlanPtr buildVPlan(LoopVectorizationCostModel &CM, VFRange &Range);

/// Build a VPlan using VPRecipes according to the information gather by		/// Build a VPlan using VPRecipes according to the information gather by
/// Legal. This method is only used for the legacy inner loop vectorizer.		/// Legal. This method is only used for the legacy inner loop vectorizer.
VPlanPtr		VPlanPtr
buildVPlanWithVPRecipes(VFRange &Range,		buildVPlanWithVPRecipes(LoopVectorizationCostModel &CM, VFRange &Range,
SmallPtrSetImpl<Instruction *> &DeadInstructions);		SmallPtrSetImpl<Instruction *> &DeadInstructions);

/// Build VPlans for power-of-2 VF's between \p MinVF and \p MaxVF inclusive,		/// Build VPlans for power-of-2 VF's between \p MinVF and \p MaxVF inclusive,
/// according to the information gathered by Legal when it checked if it is		/// according to the information gathered by Legal when it checked if it is
/// legal to vectorize the loop. This method creates VPlans using VPRecipes.		/// legal to vectorize the loop. This method creates VPlans using VPRecipes.
void buildVPlansWithVPRecipes(ElementCount MinVF, ElementCount MaxVF);		void buildVPlansWithVPRecipes(LoopVectorizationCostModel &CM,
		ElementCount MinVF, ElementCount MaxVF);

// Adjust the recipes for reductions. For in-loop reductions the chain of		// Adjust the recipes for reductions. For in-loop reductions the chain of
// instructions leading from the loop exit instr to the phi need to be		// instructions leading from the loop exit instr to the phi need to be
// converted to reductions, with one operand being vector and the other being		// converted to reductions, with one operand being vector and the other being
// the scalar reduction chain. For other reductions, a select is introduced		// the scalar reduction chain. For other reductions, a select is introduced
// between the phi and live-out recipes when folding the tail.		// between the phi and live-out recipes when folding the tail.
void adjustRecipesForReductions(VPBasicBlock *LatchVPBB, VPlanPtr &Plan,		void adjustRecipesForReductions(LoopVectorizationCostModel &CM,
		VPBasicBlock *LatchVPBB, VPlanPtr &Plan,
VPRecipeBuilder &RecipeBuilder,		VPRecipeBuilder &RecipeBuilder,
ElementCount MinVF);		ElementCount MinVF);

		/// Determines if we have the infrastructure to vectorize loop \p L and its
		/// epilogue, assuming the main loop is vectorized by \p VF.
		bool isCandidateForEpilogueVectorization(const Loop &L,
		const ElementCount VF) const;

		/// Returns true if the per-lane cost of VectorizationFactor A is lower than
		/// that of B.
		bool isMoreProfitable(const VectorizationFactor &A,
		const VectorizationFactor &B) const;

		/// Returns true if epilogue vectorization is considered profitable, and
		/// false otherwise.
		/// \p VF is the vectorization factor chosen for the original loop.
		bool isEpilogueVectorizationProfitable(const ElementCount VF) const;
};		};

} // namespace llvm		} // namespace llvm

#endif // LLVM_TRANSFORMS_VECTORIZE_LOOPVECTORIZATIONPLANNER_H		#endif // LLVM_TRANSFORMS_VECTORIZE_LOOPVECTORIZATIONPLANNER_H
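
Note on the planner changes above: plans are now looked up by the pair (VF, FoldTailByMasking), and candidates from both kinds of plan are compared by per-lane cost. The following is a simplified sketch of those two ideas, assuming fixed-width VFs and integer costs only. `PlanSketch` and `FactorSketch` are hypothetical stand-ins for `VPlan` and `VectorizationFactor`, and the comparison leaves out scalable vectors and any trip-count-based adjustment the real `isMoreProfitable` may apply.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical, simplified stand-in for a VPlan.
struct PlanSketch {
  std::vector<unsigned> VFs; // fixed-width VFs covered by this plan
  bool FoldTailByMasking;    // whether the plan folds the tail by masking
  bool hasVF(unsigned VF) const {
    return std::find(VFs.begin(), VFs.end(), VF) != VFs.end();
  }
};

// Hypothetical, simplified stand-in for VectorizationFactor.
struct FactorSketch {
  unsigned Width;
  bool FoldTailByMasking;
  uint64_t Cost;
};

// A plan matches only if it covers the VF *and* has the requested
// tail-folding mode, mirroring the new two-argument lookup.
bool hasPlanWithVF(const std::vector<PlanSketch> &Plans, unsigned VF,
                   bool FoldTailByMasking) {
  return std::any_of(Plans.begin(), Plans.end(), [&](const PlanSketch &P) {
    return P.hasVF(VF) && P.FoldTailByMasking == FoldTailByMasking;
  });
}

// Per-lane comparison: Cost(A)/Width(A) < Cost(B)/Width(B), written with
// cross-multiplication so no division is needed.
bool isMoreProfitable(const FactorSketch &A, const FactorSketch &B) {
  return A.Cost * B.Width < B.Cost * A.Width;
}
```

With this shape, a tail-folded candidate at one width can be ranked directly against an unfolded candidate at another width, which is the comparison the summary describes as the goal of the patch.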

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp


[... 230 lines not shown ...]		cl::values(clEnumValN(PreferPredicateTy::ScalarEpilogue,
"predicate-dont-vectorize",		"predicate-dont-vectorize",
"prefers tail-folding, don't attempt vectorization if "		"prefers tail-folding, don't attempt vectorization if "
"tail-folding fails.")));		"tail-folding fails.")));

static cl::opt<TailFoldingStyle> ForceTailFoldingStyle(		static cl::opt<TailFoldingStyle> ForceTailFoldingStyle(
"force-tail-folding-style", cl::desc("Force the tail folding style"),		"force-tail-folding-style", cl::desc("Force the tail folding style"),
cl::init(TailFoldingStyle::None),		cl::init(TailFoldingStyle::None),
cl::values(		cl::values(
clEnumValN(TailFoldingStyle::None, "none", "Disable tail folding"),
clEnumValN(		clEnumValN(
TailFoldingStyle::Data, "data",		TailFoldingStyle::Data, "data",
"Create lane mask for data only, using active.lane.mask intrinsic"),		"Create lane mask for data only, using active.lane.mask intrinsic"),
clEnumValN(TailFoldingStyle::DataWithoutLaneMask,		clEnumValN(TailFoldingStyle::DataWithoutLaneMask,
"data-without-lane-mask",		"data-without-lane-mask",
"Create lane mask with compare/stepvector"),		"Create lane mask with compare/stepvector"),
clEnumValN(TailFoldingStyle::DataAndControlFlow, "data-and-control",		clEnumValN(TailFoldingStyle::DataAndControlFlow, "data-and-control",
"Create lane mask using active.lane.mask intrinsic, and use "		"Create lane mask using active.lane.mask intrinsic, and use "
[... 931 lines not shown ...]
/// vectorization.		/// vectorization.
/// In many cases vectorization is not profitable. This can happen because of		/// In many cases vectorization is not profitable. This can happen because of
/// a number of reasons. In this class we mainly attempt to predict the		/// a number of reasons. In this class we mainly attempt to predict the
/// expected speedup/slowdowns due to the supported instruction set. We use the		/// expected speedup/slowdowns due to the supported instruction set. We use the
/// TargetTransformInfo to query the different backends for the cost of		/// TargetTransformInfo to query the different backends for the cost of
/// different operations.		/// different operations.
class LoopVectorizationCostModel {		class LoopVectorizationCostModel {
public:		public:
LoopVectorizationCostModel(ScalarEpilogueLowering SEL, Loop *L,		LoopVectorizationCostModel(bool FoldTailByMasking, ScalarEpilogueLowering SEL,
PredicatedScalarEvolution &PSE, LoopInfo *LI,		Loop *L, PredicatedScalarEvolution &PSE,
LoopVectorizationLegality *Legal,		LoopInfo *LI, LoopVectorizationLegality *Legal,
const TargetTransformInfo &TTI,		const TargetTransformInfo &TTI,
const TargetLibraryInfo *TLI, DemandedBits *DB,		const TargetLibraryInfo *TLI, DemandedBits *DB,
AssumptionCache *AC,		AssumptionCache *AC,
OptimizationRemarkEmitter *ORE, const Function *F,		OptimizationRemarkEmitter *ORE, const Function *F,
const LoopVectorizeHints *Hints,		const LoopVectorizeHints *Hints,
InterleavedAccessInfo &IAI)		InterleavedAccessInfo &IAI)
: ScalarEpilogueStatus(SEL), TheLoop(L), PSE(PSE), LI(LI), Legal(Legal),		: ScalarEpilogueStatus(SEL), FoldTailByMasking(FoldTailByMasking),
TTI(TTI), TLI(TLI), DB(DB), AC(AC), ORE(ORE), TheFunction(F),		TheLoop(L), PSE(PSE), LI(LI), Legal(Legal), TTI(TTI), TLI(TLI), DB(DB),
Hints(Hints), InterleaveInfo(IAI) {}		AC(AC), ORE(ORE), TheFunction(F), Hints(Hints), InterleaveInfo(IAI) {}

/// \return An upper bound for the vectorization factors (both fixed and		/// \return An upper bound for the vectorization factors (both fixed and
/// scalable). If the factors are 0, vectorization and interleaving should be		/// scalable). If the factors are 0, vectorization and interleaving should be
/// avoided up front.		/// avoided up front.
FixedScalableVFPair computeMaxVF(ElementCount UserVF, unsigned UserIC);		FixedScalableVFPair computeMaxVF(ElementCount UserVF, unsigned UserIC);

/// \return True if runtime checks are required for vectorization, and false		/// \return True if runtime checks are required for vectorization, and false
/// otherwise.		/// otherwise.
bool runtimeChecksRequired();		bool runtimeChecksRequired();

/// \return The most profitable vectorization factor and the cost of that VF.
/// This method checks every VF in \p CandidateVFs. If UserVF is not ZERO
/// then this vectorization factor will be selected if vectorization is
/// possible.
VectorizationFactor
selectVectorizationFactor(const ElementCountSet &CandidateVFs);

VectorizationFactor
selectEpilogueVectorizationFactor(const ElementCount MaxVF,
const LoopVectorizationPlanner &LVP);

/// Setup cost-based decisions for user vectorization factor.		/// Setup cost-based decisions for user vectorization factor.
/// \return true if the UserVF is a feasible VF to be chosen.		/// \return true if the UserVF is a feasible VF to be chosen.
bool selectUserVectorizationFactor(ElementCount UserVF) {		bool selectUserVectorizationFactor(ElementCount UserVF) {
collectUniformsAndScalars(UserVF);		collectUniformsAndScalars(UserVF);
collectInstsToScalarize(UserVF);		collectInstsToScalarize(UserVF);
return expectedCost(UserVF).first.isValid();		return expectedCost(UserVF).first.isValid();
}		}

/// \return The size (in bits) of the smallest and widest types in the code		/// \return The size (in bits) of the smallest and widest types in the code
/// that needs to be vectorized. We ignore values that remain scalar such as		/// that needs to be vectorized. We ignore values that remain scalar such as
/// 64 bit loop indices.		/// 64 bit loop indices.
std::pair<unsigned, unsigned> getSmallestAndWidestTypes();		std::pair<unsigned, unsigned> getSmallestAndWidestTypes();

/// \return The desired interleave count.		/// \return The desired interleave count.
/// If interleave count has been specified by metadata it will be returned.		/// If interleave count has been specified by metadata it will be returned.
/// Otherwise, the interleave count is computed and returned. VF and LoopCost		/// Otherwise, the interleave count is computed and returned. VF and LoopCost
/// are the selected vectorization factor and the cost of the selected VF.		/// are the selected vectorization factor and the cost of the selected VF.
unsigned selectInterleaveCount(ElementCount VF, InstructionCost LoopCost);		unsigned selectInterleaveCount(VectorizationFactor VF);

/// Memory access instruction may be vectorized in more than one way.		/// Memory access instruction may be vectorized in more than one way.
/// Form of instruction after vectorization depends on cost.		/// Form of instruction after vectorization depends on cost.
/// This function takes cost-based decisions for Load/Store instructions		/// This function takes cost-based decisions for Load/Store instructions
/// and collects them in a map. This decisions map is used for building		/// and collects them in a map. This decisions map is used for building
/// the lists of loop-uniform and loop-scalar instructions.		/// the lists of loop-uniform and loop-scalar instructions.
/// The calculated cost is saved with widening decision in order to		/// The calculated cost is saved with widening decision in order to
/// avoid redundant calculations.		/// avoid redundant calculations.
[... 309 lines not shown ...]		if (!isScalarEpilogueAllowed())
return false;		return false;
// If we might exit from anywhere but the latch, must run the exiting		// If we might exit from anywhere but the latch, must run the exiting
// iteration in scalar form.		// iteration in scalar form.
if (TheLoop->getExitingBlock() != TheLoop->getLoopLatch())		if (TheLoop->getExitingBlock() != TheLoop->getLoopLatch())
return true;		return true;
return VF.isVector() && InterleaveInfo.requiresScalarEpilogue();		return VF.isVector() && InterleaveInfo.requiresScalarEpilogue();
}		}

/// Returns true if a scalar epilogue is not allowed due to optsize or a		/// Returns true if a scalar epilogue is not allowed due to optsize or a
		[Inline comment, Not Done] david-arm: This is not your fault, but whilst you are here could you update the comment to reflect the function behaviour? It looks like the comment probably got out of sync with the function at some point. :) Perhaps something like:
		/// Returns true if a scalar epilogue is allowed. It may return false if:
		/// 1. We are optimising for code size
		/// 2. There is a loop hint annotation
		/// etc.
		[Inline comment, Done] dmgreen (author): Yeah this is a bit of a weird one nowadays. I had tried to remove it, but I think it makes more sense to keep around for specifying when a scalar epilogue cannot be used.
/// loop hint annotation.		/// loop hint annotation.
bool isScalarEpilogueAllowed() const {		bool isScalarEpilogueAllowed() const {
return ScalarEpilogueStatus == CM_ScalarEpilogueAllowed;		return ScalarEpilogueStatus == CM_ScalarEpilogueAllowed ||
		(!FoldTailByMasking &&
		[Inline comment, Not Done] david-arm: Do you know the scenarios in which we aren't folding the tail by masking, but a scalar epilogue is not needed? I think CM_ScalarEpilogueNotNeededUsePredicate is only set for these cases:
		1. The user has supplied a hint.
		2. The user has set the prefer-predicate-over-epilogue flag.
		3. The TTI hook preferPredicateOverEpilogue has returned true.
		but I'd expect that for at least 3) we've set FoldTailByMasking to true?
		[Inline comment, Done] dmgreen (author): For 3 it would be both with and without FoldTailByMasking. D145925 adds a way for the target to control the SEL more directly.
		ScalarEpilogueStatus == CM_ScalarEpilogueNotNeededUsePredicate);
}		}

/// Returns the TailFoldingStyle that is best for the current loop.		/// Returns the TailFoldingStyle that is best for the current loop.
TailFoldingStyle		TailFoldingStyle
getTailFoldingStyle(bool IVUpdateMayOverflow = true) const {		getTailFoldingStyle(bool IVUpdateMayOverflow = true) const {
if (!CanFoldTailByMasking)		if (!FoldTailByMasking)
return TailFoldingStyle::None;		return TailFoldingStyle::None;

if (ForceTailFoldingStyle.getNumOccurrences())		if (ForceTailFoldingStyle.getNumOccurrences())
return ForceTailFoldingStyle;		return ForceTailFoldingStyle;

return TTI.getPreferredTailFoldingStyle(IVUpdateMayOverflow);		return TTI.getPreferredTailFoldingStyle(IVUpdateMayOverflow);
}		}

/// Returns true if all loop blocks should be masked to fold tail loop.		/// Returns true if all loop blocks should be masked to fold tail loop.
bool foldTailByMasking() const {		bool foldTailByMasking() const { return FoldTailByMasking; }
return getTailFoldingStyle() != TailFoldingStyle::None;
}

/// Returns true if the instructions in this block requires predication		/// Returns true if the instructions in this block requires predication
/// for any reason, e.g. because tail folding now requires a predicate		/// for any reason, e.g. because tail folding now requires a predicate
/// or because the block in the original loop was predicated.		/// or because the block in the original loop was predicated.
bool blockNeedsPredicationForAnyReason(BasicBlock *BB) const {		bool blockNeedsPredicationForAnyReason(BasicBlock *BB) const {
return foldTailByMasking() || Legal->blockNeedsPredication(BB);		return foldTailByMasking() || Legal->blockNeedsPredication(BB);
}		}

[... 22 lines not shown ...]		public:
/// VF. Return the cost of the instruction, including scalarization overhead		/// VF. Return the cost of the instruction, including scalarization overhead
/// if it's needed. The flag NeedToScalarize shows if the call needs to be		/// if it's needed. The flag NeedToScalarize shows if the call needs to be
/// scalarized -		/// scalarized -
/// i.e. either vector version isn't available, or is too expensive.		/// i.e. either vector version isn't available, or is too expensive.
InstructionCost getVectorCallCost(CallInst *CI, ElementCount VF,		InstructionCost getVectorCallCost(CallInst *CI, ElementCount VF,
Function **Variant,		Function **Variant,
bool *NeedsMask = nullptr) const;		bool *NeedsMask = nullptr) const;

/// Returns true if the per-lane cost of VectorizationFactor A is lower than
/// that of B.
bool isMoreProfitable(const VectorizationFactor &A,
const VectorizationFactor &B) const;

/// Invalidates decisions already taken by the cost model.		/// Invalidates decisions already taken by the cost model.
void invalidateCostModelingDecisions() {		void invalidateCostModelingDecisions() {
WideningDecisions.clear();		WideningDecisions.clear();
Uniforms.clear();		Uniforms.clear();
Scalars.clear();		Scalars.clear();
}		}

/// Convenience function that returns the value of vscale_range iff		/// The vectorization cost is a combination of the cost itself and a boolean
/// vscale_range.min == vscale_range.max or otherwise returns the value		/// indicating whether any of the contributing operations will actually
/// returned by the corresponding TLI method.		/// operate on vector values after type legalization in the backend. If this
std::optional<unsigned> getVScaleForTuning() const;		/// latter value is false, then all operations will be scalarized (i.e. no
		/// vectorization has actually taken place).
		using VectorizationCostTy = std::pair<InstructionCost, bool>;

		/// Returns the expected execution cost. The unit of the cost does
		/// not matter because we use the 'cost' units to compare different
		/// vector widths. The cost that is returned is not normalized by
		/// the factor width. If \p Invalid is not nullptr, this function
		/// will add a pair(Instruction*, ElementCount) to \p Invalid for
		/// each instruction that has an Invalid cost for the given VF.
		VectorizationCostTy
		expectedCost(ElementCount VF,
		SmallVectorImpl<InstructionVFPair> *Invalid = nullptr);

		/// Return the NumPredStores, to be checked by the Planner.
		unsigned getNumPredStores() { return NumPredStores; }

private:		private:
unsigned NumPredStores = 0;		unsigned NumPredStores = 0;

/// \return An upper bound for the vectorization factors for both		/// \return An upper bound for the vectorization factors for both
/// fixed and scalable vectorization, where the minimum-known number of		/// fixed and scalable vectorization, where the minimum-known number of
/// elements is a power-of-2 larger than zero. If scalable vectorization is		/// elements is a power-of-2 larger than zero. If scalable vectorization is
/// disabled or unsupported, then the scalable part will be equal to		/// disabled or unsupported, then the scalable part will be equal to
[... 10 lines not shown ...]		ElementCount getMaximizedVFForTarget(unsigned ConstTripCount,
unsigned WidestType,		unsigned WidestType,
ElementCount MaxSafeVF,		ElementCount MaxSafeVF,
bool FoldTailByMasking);		bool FoldTailByMasking);

/// \return the maximum legal scalable VF, based on the safe max number		/// \return the maximum legal scalable VF, based on the safe max number
/// of elements.		/// of elements.
ElementCount getMaxLegalScalableVF(unsigned MaxSafeElements);		ElementCount getMaxLegalScalableVF(unsigned MaxSafeElements);

/// The vectorization cost is a combination of the cost itself and a boolean
/// indicating whether any of the contributing operations will actually
/// operate on vector values after type legalization in the backend. If this
/// latter value is false, then all operations will be scalarized (i.e. no
/// vectorization has actually taken place).
using VectorizationCostTy = std::pair<InstructionCost, bool>;

/// Returns the expected execution cost. The unit of the cost does
/// not matter because we use the 'cost' units to compare different
/// vector widths. The cost that is returned is not normalized by
/// the factor width. If \p Invalid is not nullptr, this function
/// will add a pair(Instruction*, ElementCount) to \p Invalid for
/// each instruction that has an Invalid cost for the given VF.
VectorizationCostTy
expectedCost(ElementCount VF,
SmallVectorImpl<InstructionVFPair> *Invalid = nullptr);

/// Returns the execution time cost of an instruction for a given vector		/// Returns the execution time cost of an instruction for a given vector
/// width. Vector width of one means scalar.		/// width. Vector width of one means scalar.
VectorizationCostTy getInstructionCost(Instruction *I, ElementCount VF);		VectorizationCostTy getInstructionCost(Instruction *I, ElementCount VF);

/// The cost-computation logic from getInstructionCost which provides		/// The cost-computation logic from getInstructionCost which provides
/// the vector type as an output parameter.		/// the vector type as an output parameter.
InstructionCost getInstructionCost(Instruction *I, ElementCount VF,		InstructionCost getInstructionCost(Instruction *I, ElementCount VF,
Type *&VectorTy);		Type *&VectorTy);
[55 lines not shown]	private:
/// aliasing/dependence checks fail, or to handle the tail/remainder		/// aliasing/dependence checks fail, or to handle the tail/remainder
/// iterations when the trip count is unknown or doesn't divide by the VF,		/// iterations when the trip count is unknown or doesn't divide by the VF,
/// or as a peel-loop to handle gaps in interleave-groups.		/// or as a peel-loop to handle gaps in interleave-groups.
/// Under optsize and when the trip count is very small we don't allow any		/// Under optsize and when the trip count is very small we don't allow any
/// iterations to execute in the scalar loop.		/// iterations to execute in the scalar loop.
ScalarEpilogueLowering ScalarEpilogueStatus = CM_ScalarEpilogueAllowed;		ScalarEpilogueLowering ScalarEpilogueStatus = CM_ScalarEpilogueAllowed;

/// All blocks of loop are to be masked to fold tail of scalar iterations.		/// All blocks of loop are to be masked to fold tail of scalar iterations.
bool CanFoldTailByMasking = false;		bool FoldTailByMasking = false;

/// A map holding scalar costs for different vectorization factors. The		/// A map holding scalar costs for different vectorization factors. The
/// presence of a cost for an instruction in the mapping indicates that the		/// presence of a cost for an instruction in the mapping indicates that the
/// instruction will be scalarized when vectorizing with the associated		/// instruction will be scalarized when vectorizing with the associated
/// vectorization factor. The entries are VF-ScalarCostTy pairs.		/// vectorization factor. The entries are VF-ScalarCostTy pairs.
DenseMap<ElementCount, ScalarCostsTy> InstsToScalarize;		DenseMap<ElementCount, ScalarCostsTy> InstsToScalarize;

/// Holds the instructions known to be uniform after vectorization.		/// Holds the instructions known to be uniform after vectorization.
[74 lines not shown]	private:

/// Returns a range containing only operands needing to be extracted.		/// Returns a range containing only operands needing to be extracted.
SmallVector<Value *, 4> filterExtractingOperands(Instruction::op_range Ops,		SmallVector<Value *, 4> filterExtractingOperands(Instruction::op_range Ops,
ElementCount VF) const {		ElementCount VF) const {
return SmallVector<Value *, 4>(make_filter_range(		return SmallVector<Value *, 4>(make_filter_range(
Ops, [this, VF](Value *V) { return this->needsExtract(V, VF); }));		Ops, [this, VF](Value *V) { return this->needsExtract(V, VF); }));
}		}

/// Determines if we have the infrastructure to vectorize loop \p L and its
/// epilogue, assuming the main loop is vectorized by \p VF.
bool isCandidateForEpilogueVectorization(const Loop &L,
const ElementCount VF) const;

/// Returns true if epilogue vectorization is considered profitable, and
/// false otherwise.
/// \p VF is the vectorization factor chosen for the original loop.
bool isEpilogueVectorizationProfitable(const ElementCount VF) const;

public:		public:
/// The loop that we evaluate.		/// The loop that we evaluate.
Loop *TheLoop;		Loop *TheLoop;

/// Predicated scalar evolution analysis.		/// Predicated scalar evolution analysis.
PredicatedScalarEvolution &PSE;		PredicatedScalarEvolution &PSE;

/// Loop Info analysis.		/// Loop Info analysis.
[29 lines not shown]	public:
/// Values to ignore in the cost model.		/// Values to ignore in the cost model.
SmallPtrSet<const Value *, 16> ValuesToIgnore;		SmallPtrSet<const Value *, 16> ValuesToIgnore;

/// Values to ignore in the cost model when VF > 1.		/// Values to ignore in the cost model when VF > 1.
SmallPtrSet<const Value *, 16> VecValuesToIgnore;		SmallPtrSet<const Value *, 16> VecValuesToIgnore;

/// All element types found in the loop.		/// All element types found in the loop.
SmallPtrSet<Type *, 16> ElementTypesInLoop;		SmallPtrSet<Type *, 16> ElementTypesInLoop;

/// Profitable vector factors.
SmallVector<VectorizationFactor, 8> ProfitableVFs;
};		};
} // end namespace llvm		} // end namespace llvm

namespace {		namespace {
/// Helper struct to manage generating runtime checks for vectorization.		/// Helper struct to manage generating runtime checks for vectorization.
///		///
/// The runtime checks are created up-front in temporary blocks to allow better		/// The runtime checks are created up-front in temporary blocks to allow better
/// estimating the cost and un-linked from the existing IR. After deciding to		/// estimating the cost and un-linked from the existing IR. After deciding to
[1,596 lines not shown]	if (VecFunc) {
VectorType::get(		VectorType::get(
IntegerType::getInt1Ty(VecFunc->getFunctionType()->getContext()),		IntegerType::getInt1Ty(VecFunc->getFunctionType()->getContext()),
VF));		VF));
}		}
}		}

// We don't support masked function calls yet, but we can scalarize a		// We don't support masked function calls yet, but we can scalarize a
// masked call with branches (unless VF is scalable).		// masked call with branches (unless VF is scalable).
if (!TLI || CI->isNoBuiltin() || !VecFunc || Legal->isMaskRequired(CI))		if (!TLI || CI->isNoBuiltin() || !VecFunc ||
		Legal->isMaskRequired(foldTailByMasking(), CI))
return VF.isScalable() ? InstructionCost::getInvalid() : Cost;		return VF.isScalable() ? InstructionCost::getInvalid() : Cost;

// If the corresponding vector cost is cheaper, return its cost.		// If the corresponding vector cost is cheaper, return its cost.
InstructionCost VectorCallCost =		InstructionCost VectorCallCost =
TTI.getCallInstrCost(nullptr, RetTy, Tys, CostKind) + MaskCost;		TTI.getCallInstrCost(nullptr, RetTy, Tys, CostKind) + MaskCost;
if (VectorCallCost < Cost) {		if (VectorCallCost < Cost) {
*Variant = VecFunc;		*Variant = VecFunc;
Cost = VectorCallCost;		Cost = VectorCallCost;
[407 lines not shown]	void InnerLoopVectorizer::fixReduction(VPReductionPHIRecipe *PhiR,

VPBasicBlock *LatchVPBB =		VPBasicBlock *LatchVPBB =
PhiR->getParent()->getEnclosingLoopRegion()->getExitingBasicBlock();		PhiR->getParent()->getEnclosingLoopRegion()->getExitingBasicBlock();
BasicBlock *VectorLoopLatch = State.CFG.VPBB2IRBB[LatchVPBB];		BasicBlock *VectorLoopLatch = State.CFG.VPBB2IRBB[LatchVPBB];
// If tail is folded by masking, the vector value to leave the loop should be		// If tail is folded by masking, the vector value to leave the loop should be
// a Select choosing between the vectorized LoopExitInst and vectorized Phi,		// a Select choosing between the vectorized LoopExitInst and vectorized Phi,
// instead of the former. For an inloop reduction the reduction will already		// instead of the former. For an inloop reduction the reduction will already
// be predicated, and does not need to be handled here.		// be predicated, and does not need to be handled here.
if (Cost->foldTailByMasking() && !PhiR->isInLoop()) {		if (State.Plan->foldTailByMasking() && !PhiR->isInLoop()) {
for (unsigned Part = 0; Part < UF; ++Part) {		for (unsigned Part = 0; Part < UF; ++Part) {
Value *VecLoopExitInst = State.get(LoopExitInstDef, Part);		Value *VecLoopExitInst = State.get(LoopExitInstDef, Part);
SelectInst *Sel = nullptr;		SelectInst *Sel = nullptr;
for (User *U : VecLoopExitInst->users()) {		for (User *U : VecLoopExitInst->users()) {
if (isa<SelectInst>(U)) {		if (isa<SelectInst>(U)) {
assert(!Sel && "Reduction exit feeding two selects");		assert(!Sel && "Reduction exit feeding two selects");
Sel = cast<SelectInst>(U);		Sel = cast<SelectInst>(U);
} else		} else
[510 lines not shown]	bool LoopVectorizationCostModel::isPredicatedInst(Instruction *I) const {

// Can we prove this instruction is safe to unconditionally execute?		// Can we prove this instruction is safe to unconditionally execute?
// If not, we must use some form of predication.		// If not, we must use some form of predication.
switch(I->getOpcode()) {		switch(I->getOpcode()) {
default:		default:
return false;		return false;
case Instruction::Load:		case Instruction::Load:
case Instruction::Store: {		case Instruction::Store: {
if (!Legal->isMaskRequired(I))		if (!Legal->isMaskRequired(foldTailByMasking(), I))
return false;		return false;
// When we know the load's address is loop invariant and the instruction		// When we know the load's address is loop invariant and the instruction
// in the original scalar loop was unconditionally executed then we		// in the original scalar loop was unconditionally executed then we
// don't need to mark it as a predicated instruction. Tail folding may		// don't need to mark it as a predicated instruction. Tail folding may
// introduce additional predication, but we're guaranteed to always have		// introduce additional predication, but we're guaranteed to always have
// at least one active lane. We call Legal->blockNeedsPredication here		// at least one active lane. We call Legal->blockNeedsPredication here
// because it doesn't query tail-folding. For stores, we need to prove		// because it doesn't query tail-folding. For stores, we need to prove
// both speculation safety (which follows from the same argument as loads),		// both speculation safety (which follows from the same argument as loads),
[10 lines not shown]	bool LoopVectorizationCostModel::isPredicatedInst(Instruction *I) const {
case Instruction::UDiv:		case Instruction::UDiv:
case Instruction::SDiv:		case Instruction::SDiv:
case Instruction::SRem:		case Instruction::SRem:
case Instruction::URem:		case Instruction::URem:
// TODO: We can use the loop-preheader as context point here and get		// TODO: We can use the loop-preheader as context point here and get
// context sensitive reasoning		// context sensitive reasoning
return !isSafeToSpeculativelyExecute(I);		return !isSafeToSpeculativelyExecute(I);
case Instruction::Call:		case Instruction::Call:
return Legal->isMaskRequired(I);		return Legal->isMaskRequired(foldTailByMasking(), I);
}		}
}		}

std::pair<InstructionCost, InstructionCost>		std::pair<InstructionCost, InstructionCost>
LoopVectorizationCostModel::getDivRemSpeculationCost(Instruction *I,		LoopVectorizationCostModel::getDivRemSpeculationCost(Instruction *I,
ElementCount VF) const {		ElementCount VF) const {
assert(I->getOpcode() == Instruction::UDiv ||		assert(I->getOpcode() == Instruction::UDiv ||
I->getOpcode() == Instruction::SDiv ||		I->getOpcode() == Instruction::SDiv ||
I->getOpcode() == Instruction::SRem ||		I->getOpcode() == Instruction::SRem ||
I->getOpcode() == Instruction::URem);		I->getOpcode() == Instruction::URem);
assert(!isSafeToSpeculativelyExecute(I));		assert(!isSafeToSpeculativelyExecute(I));

const TTI::TargetCostKind CostKind = TTI::TCK_RecipThroughput;		const TTI::TargetCostKind CostKind = TTI::TCK_RecipThroughput;

[89 lines not shown]	bool LoopVectorizationCostModel::interleavedAccessCanBeWidened(

// Check if masking is required.		// Check if masking is required.
// A Group may need masking for one of two reasons: it resides in a block that		// A Group may need masking for one of two reasons: it resides in a block that
// needs predication, or it was decided to use masking to deal with gaps		// needs predication, or it was decided to use masking to deal with gaps
// (either a gap at the end of a load-access that may result in a speculative		// (either a gap at the end of a load-access that may result in a speculative
// load, or any gaps in a store-access).		// load, or any gaps in a store-access).
bool PredicatedAccessRequiresMasking =		bool PredicatedAccessRequiresMasking =
blockNeedsPredicationForAnyReason(I->getParent()) &&		blockNeedsPredicationForAnyReason(I->getParent()) &&
Legal->isMaskRequired(I);		Legal->isMaskRequired(foldTailByMasking(), I);
bool LoadAccessWithGapsRequiresEpilogMasking =		bool LoadAccessWithGapsRequiresEpilogMasking =
isa<LoadInst>(I) && Group->requiresScalarEpilogue() &&		isa<LoadInst>(I) && Group->requiresScalarEpilogue() &&
!isScalarEpilogueAllowed();		!isScalarEpilogueAllowed();
bool StoreAccessWithGapsRequiresMasking =		bool StoreAccessWithGapsRequiresMasking =
isa<StoreInst>(I) && (Group->getNumMembers() < Group->getFactor());		isa<StoreInst>(I) && (Group->getNumMembers() < Group->getFactor());
if (!PredicatedAccessRequiresMasking &&		if (!PredicatedAccessRequiresMasking &&
!LoadAccessWithGapsRequiresEpilogMasking &&		!LoadAccessWithGapsRequiresEpilogMasking &&
!StoreAccessWithGapsRequiresMasking)		!StoreAccessWithGapsRequiresMasking)
[483 lines not shown]	reportVectorizationFailure("Single iteration (non) loop",
"SingleIterationLoop", ORE, TheLoop);		"SingleIterationLoop", ORE, TheLoop);
return FixedScalableVFPair::getNone();		return FixedScalableVFPair::getNone();
}		}

switch (ScalarEpilogueStatus) {		switch (ScalarEpilogueStatus) {
case CM_ScalarEpilogueAllowed:		case CM_ScalarEpilogueAllowed:
return computeFeasibleMaxVF(TC, UserVF, false);		return computeFeasibleMaxVF(TC, UserVF, false);
case CM_ScalarEpilogueNotAllowedUsePredicate:		case CM_ScalarEpilogueNotAllowedUsePredicate:
[[fallthrough]];		LLVM_DEBUG(dbgs() << "LV: vector predicate hint/switch found.\n"
		<< "LV: Not allowing scalar epilogue, creating "
		"predicated vector loop.\n");
		break;
		david-arm: In this case it looks like we're going to create two sets of plans - one with tail-folding and one without - and in both cases we're actually going to tail-fold anyway. Is that right? Not saying we shouldn't do that, but just trying to understand how this works.
		dmgreen (author): This falls through in both cases. In the code below we may find that the MaxVF is a multiple of the known trip count, and if so return plans with TailFold=false. Otherwise it will not create TailFold=false plans and will just use TailFold=true. There are a lot of edge cases.
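For illustration, the case described in this reply can be sketched as below. This is a minimal sketch and not code from the patch: `PlanKinds` and `plansForAlwaysPredicated` are hypothetical names, the sketch assumes tail folding is legal for the loop, and it treats a trip count of 0 as "unknown".

```cpp
#include <cassert>

// Hypothetical summary of which kinds of plans end up being created.
struct PlanKinds {
  bool Unpredicated; // plans with TailFold=false
  bool Predicated;   // plans with TailFold=true
};

// Assumption: the scalar epilogue is not allowed and tail folding is legal.
// TripCount == 0 stands for an unknown trip count.
PlanKinds plansForAlwaysPredicated(unsigned TripCount, unsigned MaxVFTimesIC) {
  assert(MaxVFTimesIC > 0 && "expected a non-zero vector step");
  // No tail remains when the known trip count is a multiple of MaxVF * IC.
  bool NoTail = TripCount != 0 && TripCount % MaxVFTimesIC == 0;
  if (NoTail)
    return {true, false}; // only unpredicated plans are needed
  return {false, true};   // otherwise only tail-folded plans are created
}
```

With a known trip count of 16 and a step of 8 only unpredicated plans come back, while a trip count of 10 (or an unknown one) yields only tail-folded plans.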
case CM_ScalarEpilogueNotNeededUsePredicate:		case CM_ScalarEpilogueNotNeededUsePredicate:
LLVM_DEBUG(		// If this cost model is for predicated plans then fall through to the
dbgs() << "LV: vector predicate hint/switch found.\n"		// prepareToFoldTailByMasking checks below, else return the unpredicated max
<< "LV: Not allowing scalar epilogue, creating predicated "		// size.
<< "vector loop.\n");		if (!FoldTailByMasking)
		return computeFeasibleMaxVF(TC, UserVF, false);
		david-arm: Don't we have to also set `ScalarEpilogueStatus = CM_ScalarEpilogueAllowed` here?
		dmgreen (author): I don't believe so. The ScalarEpilogueStatus tells us at a high level whether we should be predicating. FoldTailByMasking then controls whether the individual VPlans are predicated. So it felt better to keep the original ScalarEpilogueStatus in place to refer back to, in case we would like to differentiate between the CM_ScalarEpilogueAllowed and CM_ScalarEpilogueNotNeededUsePredicate cases.
		LLVM_DEBUG(dbgs() << "LV: vector predicate hint/switch found.\n"
		<< "LV: Trying predicated vector loop.\n");
break;		break;
case CM_ScalarEpilogueNotAllowedLowTripLoop:		case CM_ScalarEpilogueNotAllowedLowTripLoop:
// fallthrough as a special case of OptForSize		// fallthrough as a special case of OptForSize
case CM_ScalarEpilogueNotAllowedOptSize:		case CM_ScalarEpilogueNotAllowedOptSize:
if (ScalarEpilogueStatus == CM_ScalarEpilogueNotAllowedOptSize)		if (ScalarEpilogueStatus == CM_ScalarEpilogueNotAllowedOptSize)
LLVM_DEBUG(		LLVM_DEBUG(
dbgs() << "LV: Not allowing scalar epilogue due to -Os/-Oz.\n");		dbgs() << "LV: Not allowing scalar epilogue due to -Os/-Oz.\n");
else		else
LLVM_DEBUG(dbgs() << "LV: Not allowing scalar epilogue due to low trip "		LLVM_DEBUG(dbgs() << "LV: Not allowing scalar epilogue due to low trip "
<< "count.\n");		<< "count.\n");

// Bail if runtime checks are required, which are not good when optimising		// Bail if runtime checks are required, which are not good when optimising
// for size.		// for size.
if (runtimeChecksRequired())		if (runtimeChecksRequired())
return FixedScalableVFPair::getNone();		return FixedScalableVFPair::getNone();

break;		break;
}		}

// The only loops we can vectorize without a scalar epilogue, are loops with		// The only loops we can vectorize without a scalar epilogue, are loops with
// a bottom-test and a single exiting block. We'd have to handle the fact		// a bottom-test and a single exiting block. We'd have to handle the fact
// that not every instruction executes on the last iteration. This will		// that not every instruction executes on the last iteration. This will
// require a lane mask which varies through the vector loop body. (TODO)		// require a lane mask which varies through the vector loop body. (TODO)
if (TheLoop->getExitingBlock() != TheLoop->getLoopLatch()) {		if (TheLoop->getExitingBlock() != TheLoop->getLoopLatch())
// If there was a tail-folding hint/switch, but we can't fold the tail by
// masking, fallback to a vectorization with a scalar epilogue.
if (ScalarEpilogueStatus == CM_ScalarEpilogueNotNeededUsePredicate) {
LLVM_DEBUG(dbgs() << "LV: Cannot fold tail by masking: vectorize with a "
"scalar epilogue instead.\n");
ScalarEpilogueStatus = CM_ScalarEpilogueAllowed;
return computeFeasibleMaxVF(TC, UserVF, false);
}
return FixedScalableVFPair::getNone();		return FixedScalableVFPair::getNone();
}

// Now try the tail folding		// Now try the tail folding

// Invalidate interleave groups that require an epilogue if we can't mask		// Invalidate interleave groups that require an epilogue if we can't mask
// the interleave-group.		// the interleave-group.
if (!useMaskedInterleavedAccesses(TTI)) {		if (!useMaskedInterleavedAccesses(TTI)) {
assert(WideningDecisions.empty() && Uniforms.empty() && Scalars.empty() &&		assert(WideningDecisions.empty() && Uniforms.empty() && Scalars.empty() &&
"No decisions should have been taken at this point");		"No decisions should have been taken at this point");
[18 lines not shown]	if (MaxFactors.FixedVF.isVector() && !MaxFactors.ScalableVF) {
const SCEV *ExitCount = SE->getAddExpr(		const SCEV *ExitCount = SE->getAddExpr(
BackedgeTakenCount, SE->getOne(BackedgeTakenCount->getType()));		BackedgeTakenCount, SE->getOne(BackedgeTakenCount->getType()));
const SCEV *Rem = SE->getURemExpr(		const SCEV *Rem = SE->getURemExpr(
SE->applyLoopGuards(ExitCount, TheLoop),		SE->applyLoopGuards(ExitCount, TheLoop),
SE->getConstant(BackedgeTakenCount->getType(), MaxVFtimesIC));		SE->getConstant(BackedgeTakenCount->getType(), MaxVFtimesIC));
if (Rem->isZero()) {		if (Rem->isZero()) {
// Accept MaxFixedVF if we do not have a tail.		// Accept MaxFixedVF if we do not have a tail.
LLVM_DEBUG(dbgs() << "LV: No tail will remain for any chosen VF.\n");		LLVM_DEBUG(dbgs() << "LV: No tail will remain for any chosen VF.\n");
return MaxFactors;		return FoldTailByMasking ? FixedScalableVFPair::getNone() : MaxFactors;
}		}
}		}

		// If this cost model is not for tail folding then return at this point and
		// leave it for the other model.
		if (!FoldTailByMasking &&
		ScalarEpilogueStatus != CM_ScalarEpilogueNotNeededUsePredicate)
		return FixedScalableVFPair::getNone();

// If we don't know the precise trip count, or if the trip count that we		// If we don't know the precise trip count, or if the trip count that we
// found modulo the vectorization factor is not zero, try to fold the tail		// found modulo the vectorization factor is not zero, try to fold the tail
// by masking.		// by masking.
// FIXME: look for a smaller MaxVF that does divide TC rather than masking.		// FIXME: look for a smaller MaxVF that does divide TC rather than masking.
if (Legal->prepareToFoldTailByMasking()) {		if (Legal->prepareToFoldTailByMasking()) {
CanFoldTailByMasking = true;		assert(FoldTailByMasking);
return MaxFactors;
}

// If there was a tail-folding hint/switch, but we can't fold the tail by
// masking, fallback to a vectorization with a scalar epilogue.
if (ScalarEpilogueStatus == CM_ScalarEpilogueNotNeededUsePredicate) {
LLVM_DEBUG(dbgs() << "LV: Cannot fold tail by masking: vectorize with a "
"scalar epilogue instead.\n");
ScalarEpilogueStatus = CM_ScalarEpilogueAllowed;
return MaxFactors;		return MaxFactors;
}		}

if (ScalarEpilogueStatus == CM_ScalarEpilogueNotAllowedUsePredicate) {		if (ScalarEpilogueStatus == CM_ScalarEpilogueNotAllowedUsePredicate) {
LLVM_DEBUG(dbgs() << "LV: Can't fold tail by masking: don't vectorize\n");		LLVM_DEBUG(dbgs() << "LV: Can't fold tail by masking: don't vectorize\n");
return FixedScalableVFPair::getNone();		return FixedScalableVFPair::getNone();
}		}

[113 lines not shown]	if (MaximizeBandwidth || (MaximizeBandwidth.getNumOccurrences() == 0 &&
// Invalidate any widening decisions we might have made, in case the loop		// Invalidate any widening decisions we might have made, in case the loop
// requires prediction (decided later), but we have already made some		// requires prediction (decided later), but we have already made some
// load/store widening decisions.		// load/store widening decisions.
invalidateCostModelingDecisions();		invalidateCostModelingDecisions();
}		}
return MaxVF;		return MaxVF;
}		}

std::optional<unsigned> LoopVectorizationCostModel::getVScaleForTuning() const {		/// Convenience function that returns the value of vscale_range iff
		david-arm: Some changes like this don't really need to be in this patch, right?
		dmgreen (author): Yeah sure, I can give that a try. It seems a bit odd on its own to be honest, like a patch that makes things worse on its own and feels like we change it just so that we can change it again later, creating needless busywork. But I have for the moment moved it out of this patch.
		/// vscale_range.min == vscale_range.max or otherwise returns the value
		/// returned by the corresponding TLI method.
		static std::optional<unsigned>
		getVScaleForTuning(const Loop *L, const TargetTransformInfo &TTI) {
		david-arm: Can't we just pass in `TheFunction` instead?
		dmgreen (author): I think I chose Loop because the LoopVectorizationPlanner doesn't store the Function, so we would need to pass L->getHeader()->getParent() as an argument, and it simplified the interface a little. Happy to change it though.
		Function *TheFunction = L->getHeader()->getParent();
if (TheFunction->hasFnAttribute(Attribute::VScaleRange)) {		if (TheFunction->hasFnAttribute(Attribute::VScaleRange)) {
auto Attr = TheFunction->getFnAttribute(Attribute::VScaleRange);		auto Attr = TheFunction->getFnAttribute(Attribute::VScaleRange);
auto Min = Attr.getVScaleRangeMin();		auto Min = Attr.getVScaleRangeMin();
auto Max = Attr.getVScaleRangeMax();		auto Max = Attr.getVScaleRangeMax();
if (Max && Min == Max)		if (Max && Min == Max)
return Max;		return Max;
}		}

return TTI.getVScaleForTuning();		return TTI.getVScaleForTuning();
}		}

bool LoopVectorizationCostModel::isMoreProfitable(		bool LoopVectorizationPlanner::isMoreProfitable(
const VectorizationFactor &A, const VectorizationFactor &B) const {		const VectorizationFactor &A, const VectorizationFactor &B) const {
InstructionCost CostA = A.Cost;		InstructionCost CostA = A.Cost;
InstructionCost CostB = B.Cost;		InstructionCost CostB = B.Cost;

unsigned MaxTripCount = PSE.getSE()->getSmallConstantMaxTripCount(TheLoop);		unsigned MaxTripCount = PSE.getSE()->getSmallConstantMaxTripCount(OrigLoop);

		if (!A.Width.isScalable() && !B.Width.isScalable() && MaxTripCount) {
		// If the trip count is a known (possibly small) constant, the trip count
		// will be rounded up to an integer number of iterations under
		// FoldTailByMasking. The total cost in that case will be
		// PerIterationCost*ceil(TripCount/VF). When not folding the tail, the total
		// cost will be PerIterationCost*floor(TC/VF) + Scalar remainder cost, where
		// the scalar cost is approximated using the correct fraction of the vector
		// cost.
		auto RTCostA =
		dmgreen (author): I have also pulled this part out into D147720, as a functional change that can be done separately.
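To make the two formulas in the code comment above concrete, here is a small standalone sketch. It is not the patch's code: `estimatedTotalCost` is a hypothetical helper, `divideCeil` is re-implemented locally instead of using LLVM's, costs are plain integers rather than `InstructionCost`, and branching on a fold-tail flag is an assumption based on what the comment describes.

```cpp
#include <cassert>
#include <cstdint>

// Local stand-in for an integer ceiling division (assumes D > 0).
constexpr uint64_t divideCeil(uint64_t N, uint64_t D) { return (N + D - 1) / D; }

// Hypothetical helper: estimated cost of executing TripCount scalar
// iterations with a vector body that costs PerIterCost at width VF.
uint64_t estimatedTotalCost(uint64_t PerIterCost, uint64_t TripCount,
                            uint64_t VF, bool FoldTail) {
  assert(VF > 0 && "vector width must be non-zero");
  if (FoldTail)
    // Tail folded: the trip count is rounded up to whole vector iterations.
    return PerIterCost * divideCeil(TripCount, VF);
  // Not folded: floor(TC/VF) vector iterations plus a scalar remainder,
  // approximated here as the matching fraction of the vector cost.
  return (PerIterCost * TripCount) / VF;
}
```

For a trip count of 10, VF=4 and a per-iteration cost of 8, the folded estimate is 8 * ceil(10/4) = 24 and the unfolded estimate is (8 * 10) / 4 = 20. When VF divides the trip count the two estimates coincide.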
		A.Width.getFixedValue()
		? (CostA * divideCeil(MaxTripCount, A.Width.getFixedValue()))
		: (CostA * MaxTripCount) / A.Width.getFixedValue();
		auto RTCostB =
		B.Width.getFixedValue()
		? (CostB * divideCeil(MaxTripCount, B.Width.getFixedValue()))
		: (CostB * MaxTripCount) / B.Width.getFixedValue();

		if (A.FoldTailByMasking && !B.FoldTailByMasking)
		return RTCostA <= RTCostB;

if (!A.Width.isScalable() && !B.Width.isScalable() && foldTailByMasking() &&
MaxTripCount) {
// If we are folding the tail and the trip count is a known (possibly small)
// constant, the trip count will be rounded up to an integer number of
// iterations. The total cost will be PerIterationCost*ceil(TripCount/VF),
// which we compare directly. When not folding the tail, the total cost will
// be PerIterationCost*floor(TC/VF) + Scalar remainder cost, and so is
// approximated with the per-lane cost below instead of using the tripcount
// as here.
auto RTCostA = CostA * divideCeil(MaxTripCount, A.Width.getFixedValue());
auto RTCostB = CostB * divideCeil(MaxTripCount, B.Width.getFixedValue());
return RTCostA < RTCostB;		return RTCostA < RTCostB;
}		}

// Improve estimate for the vector width if it is scalable.		// Improve estimate for the vector width if it is scalable.
unsigned EstimatedWidthA = A.Width.getKnownMinValue();		unsigned EstimatedWidthA = A.Width.getKnownMinValue();
unsigned EstimatedWidthB = B.Width.getKnownMinValue();		unsigned EstimatedWidthB = B.Width.getKnownMinValue();
if (std::optional<unsigned> VScale = getVScaleForTuning()) {		if (std::optional<unsigned> VScale = getVScaleForTuning(OrigLoop, *TTI)) {
if (A.Width.isScalable())		if (A.Width.isScalable())
EstimatedWidthA = VScale;		EstimatedWidthA = VScale;
if (B.Width.isScalable())		if (B.Width.isScalable())
EstimatedWidthB = VScale;		EstimatedWidthB = VScale;
}		}

		// If one plan is predicated and the other is not, opt for the predicated
		// scheme on a tie.
		if (A.FoldTailByMasking && !B.FoldTailByMasking)
		return (CostA * EstimatedWidthB) <= (CostB * EstimatedWidthA);
		david-arm: Again, I'm a bit concerned about the effect this will have when the active lane mask call is not costed into the loop. I'd prefer for now to be conservative and opt for the non-predicated scheme until we've accounted for the IV costs.
		dmgreen (author): For MVE in the past, when we returned true for preferPredicateOverEpilogue, we would get a tail-predicated loop (or not vectorize). Now that we get both tail-folded and non-tail-folded loops to cost against one another, we need to pick the tail-folded version on a tie to not be worse than before. The scores between the two are often the same, so to be closer to the old codegen the conservative option is to choose FoldTail on a tie. I believe that the only target that returns true from preferPredicateOverEpilogue at the moment is MVE. SVE can be adjusted later if needed, but my current thinking was to use UsePredicatedEpilogue from D145925 as a first step to get the benefits of epilogue vectorization whilst hopefully not messing anything else up. That was the plan for how to treat SVE conservatively, and we can expand things if needed in the future. It doesn't really make a lot of sense to add up disparate throughput costs and expect them to mean anything, but we can perhaps come up with something if we need to, and if not we can always just force an unpredicated body + predicated epilogue. Let me know what you think.

// Assume vscale may be larger than 1 (or the value being tuned for),		// Assume vscale may be larger than 1 (or the value being tuned for),
// so that scalable vectorization is slightly favorable over fixed-width		// so that scalable vectorization is slightly favorable over fixed-width
// vectorization.		// vectorization.
if (A.Width.isScalable() && !B.Width.isScalable())		if (A.Width.isScalable() && !B.Width.isScalable())
return (CostA * B.Width.getFixedValue()) <= (CostB * EstimatedWidthA);		return (CostA * B.Width.getFixedValue()) <= (CostB * EstimatedWidthA);

// To avoid the need for FP division:		// To avoid the need for FP division:
// (CostA / A.Width) < (CostB / B.Width)		// (CostA / A.Width) < (CostB / B.Width)
[60 lines not shown]	if (Subset == Tail || Tail[Subset.size()].first != I) {
Tail = Tail.drop_front(Subset.size());		Tail = Tail.drop_front(Subset.size());
Subset = {};		Subset = {};
} else		} else
// Grow the subset by one element		// Grow the subset by one element
Subset = Tail.take_front(Subset.size() + 1);		Subset = Tail.take_front(Subset.size() + 1);
} while (!Tail.empty());		} while (!Tail.empty());
}		}

VectorizationFactor LoopVectorizationCostModel::selectVectorizationFactor(		std::optional<VectorizationFactor>
const ElementCountSet &VFCandidates) {		LoopVectorizationPlanner::selectVectorizationFactor() {
InstructionCost ExpectedCost = expectedCost(ElementCount::getFixed(1)).first;		LLVM_DEBUG(printPlans(dbgs()));

		// If we had no plans as they were all invalid, return the invalid cost
		if (VPlans.size() == 0)
		return std::nullopt;

		// If we only have one plan due to the UserVF, return it. We try with both
		// predicated and unpredicated loops.
		ElementCount UserVF = Hints.getWidth();
		bool UserPredicated = Hints.getPredicate();
		if (UserVF && hasPlanWithVF(UserVF, UserPredicated)) {
		VPlan &Plan = getBestPlanFor(UserVF, UserPredicated);
		auto Cost = Plan.getCostModel()->expectedCost(UserVF);
		if (Cost.first.isValid())
		return VectorizationFactor(UserVF, UserPredicated, Cost.first, 0);
		} else if (UserVF && hasPlanWithVF(UserVF, !UserPredicated)) {
		VPlan &Plan = getBestPlanFor(UserVF, !UserPredicated);
		auto Cost = Plan.getCostModel()->expectedCost(UserVF);
		if (Cost.first.isValid())
		return VectorizationFactor(UserVF, !UserPredicated, Cost.first, 0);
		}

		assert(VPlans[0]->hasScalarVFOnly() &&
		"Expected Scalar VPlan to be a the first candidate");

		InstructionCost ExpectedCost =
		VPlans[0]->getCostModel()->expectedCost(ElementCount::getFixed(1)).first;
LLVM_DEBUG(dbgs() << "LV: Scalar loop costs: " << ExpectedCost << ".\n");		LLVM_DEBUG(dbgs() << "LV: Scalar loop costs: " << ExpectedCost << ".\n");
assert(ExpectedCost.isValid() && "Unexpected invalid cost for scalar loop");		assert(ExpectedCost.isValid() && "Unexpected invalid cost for scalar loop");
assert(VFCandidates.count(ElementCount::getFixed(1)) &&
"Expected Scalar VF to be a candidate");

const VectorizationFactor ScalarCost(ElementCount::getFixed(1), ExpectedCost,		const VectorizationFactor ScalarCost(ElementCount::getFixed(1),
ExpectedCost);		VPlans[0]->foldTailByMasking(),
		david-arm: This should always be false for VF=1, right?
		dmgreen (Author): Not in all cases. Cases that are always predicated (like CM_ScalarEpilogueNotAllowedUsePredicate or CM_ScalarEpilogueNotAllowedLowTripLoop) will still have VF=1 with tail folding. They won't have any unpredicated vplans.
		ExpectedCost, ExpectedCost);
VectorizationFactor ChosenFactor = ScalarCost;		VectorizationFactor ChosenFactor = ScalarCost;

bool ForceVectorization = Hints->getForce() == LoopVectorizeHints::FK_Enabled;		bool ForceVectorization = Hints.getForce() == LoopVectorizeHints::FK_Enabled;
if (ForceVectorization && VFCandidates.size() > 1) {		if (ForceVectorization && VPlans.size() > 1) {
// Ignore scalar width, because the user explicitly wants vectorization.		// Ignore scalar width, because the user explicitly wants vectorization.
// Initialize cost to max so that VF = 2 is, at least, chosen during cost		// Initialize cost to max so that VF = 2 is, at least, chosen during cost
// evaluation.		// evaluation.
ChosenFactor.Cost = InstructionCost::getMax();		ChosenFactor.Cost = InstructionCost::getMax();
}		}

SmallVector<InstructionVFPair> InvalidCosts;		SmallVector<InstructionVFPair> InvalidCosts;
for (const auto &i : VFCandidates) {		for (const VPlanPtr &VPlan : drop_begin(VPlans)) {
		for (const ElementCount &i : VPlan->getVFs()) {
// The cost for scalar VF=1 is already calculated, so ignore it.		// The cost for scalar VF=1 is already calculated, so ignore it.
if (i.isScalar())		if (i.isScalar())
continue;		continue;

VectorizationCostTy C = expectedCost(i, &InvalidCosts);		LoopVectorizationCostModel::VectorizationCostTy C =
VectorizationFactor Candidate(i, C.first, ScalarCost.ScalarCost);		VPlan->getCostModel()->expectedCost(i, &InvalidCosts);
		VectorizationFactor Candidate(i, VPlan->foldTailByMasking(), C.first,
		ScalarCost.ScalarCost);

#ifndef NDEBUG		#ifndef NDEBUG
unsigned AssumedMinimumVscale = 1;		unsigned AssumedMinimumVscale = 1;
if (std::optional<unsigned> VScale = getVScaleForTuning())		if (std::optional<unsigned> VScale = getVScaleForTuning(OrigLoop, *TTI))
AssumedMinimumVscale = *VScale;		AssumedMinimumVscale = *VScale;
unsigned Width =		unsigned Width =
Candidate.Width.isScalable()		Candidate.Width.isScalable()
? Candidate.Width.getKnownMinValue() * AssumedMinimumVscale		? Candidate.Width.getKnownMinValue() * AssumedMinimumVscale
: Candidate.Width.getFixedValue();		: Candidate.Width.getFixedValue();
LLVM_DEBUG(dbgs() << "LV: Vector loop of width " << i << " costs: "		LLVM_DEBUG(
<< Candidate.Cost << " => " << (Candidate.Cost / Width));		dbgs() << "LV: " << (VPlan->foldTailByMasking() ? "Tail folded " : "")
		<< "Vector loop of width " << i << " costs: " << Candidate.Cost
		<< " => " << (Candidate.Cost / Width));
if (i.isScalable())		if (i.isScalable())
LLVM_DEBUG(dbgs() << " (assuming a minimum vscale of "		LLVM_DEBUG(dbgs() << " (assuming a minimum vscale of "
<< AssumedMinimumVscale << ")");		<< AssumedMinimumVscale << ")");
LLVM_DEBUG(dbgs() << ".\n");		LLVM_DEBUG(dbgs() << ".\n");
#endif		#endif

if (!C.second && !ForceVectorization) {		if (!C.second && !ForceVectorization) {
LLVM_DEBUG(		LLVM_DEBUG(
dbgs() << "LV: Not considering vector loop of width " << i		dbgs()
		<< "LV: Not considering vector loop of width " << i
<< " because it will not generate any vector instructions.\n");		<< " because it will not generate any vector instructions.\n");
continue;		continue;
}		}

		// FIXME: Possibly remove EnableCondStoresVectorization now.
		if (!EnableCondStoresVectorization &&
		VPlan->getCostModel()->getNumPredStores()) {
		reportVectorizationFailure(
		"There are conditional stores.",
		"store that is conditionally executed prevents vectorization",
		"ConditionalStore", ORE, OrigLoop);
		continue;
		}

// If profitable add it to ProfitableVF list.		// If profitable add it to ProfitableVF list.
if (isMoreProfitable(Candidate, ScalarCost))		if (isMoreProfitable(Candidate, ScalarCost))
ProfitableVFs.push_back(Candidate);		ProfitableVFs.push_back(Candidate);

if (isMoreProfitable(Candidate, ChosenFactor))		if (isMoreProfitable(Candidate, ChosenFactor))
ChosenFactor = Candidate;		ChosenFactor = Candidate;
}		}

emitInvalidCostRemarks(InvalidCosts, ORE, TheLoop);

if (!EnableCondStoresVectorization && NumPredStores) {
reportVectorizationFailure("There are conditional stores.",
"store that is conditionally executed prevents vectorization",
"ConditionalStore", ORE, TheLoop);
ChosenFactor = ScalarCost;
}		}

LLVM_DEBUG(if (ForceVectorization && !ChosenFactor.Width.isScalar() &&		emitInvalidCostRemarks(InvalidCosts, ORE, OrigLoop);
!isMoreProfitable(ChosenFactor, ScalarCost)) dbgs()
<< "LV: Vectorization seems to be not beneficial, "		LLVM_DEBUG({
<< "but was forced by a user.\n");		if (ForceVectorization && !ChosenFactor.Width.isScalar() &&
LLVM_DEBUG(dbgs() << "LV: Selecting VF: " << ChosenFactor.Width << ".\n");		!isMoreProfitable(ChosenFactor, ScalarCost))
		dbgs() << "LV: Vectorization seems to be not beneficial, "
		<< "but was forced by a user.\n";
		});
		LLVM_DEBUG(dbgs() << "LV: Selecting "
		<< (ChosenFactor.FoldTailByMasking ? "Tail folded " : "")
		<< "VF: " << ChosenFactor.Width << ".\n");
		assert((ChosenFactor.Width.isScalar() \|\| ChosenFactor.ScalarCost > 0) &&
		"when vectorizing, the scalar cost must be non-zero.");
return ChosenFactor;		return ChosenFactor;
}		}

bool LoopVectorizationCostModel::isCandidateForEpilogueVectorization(		bool LoopVectorizationPlanner::isCandidateForEpilogueVectorization(
const Loop &L, ElementCount VF) const {		const Loop &L, ElementCount VF) const {
// Cross iteration phis such as reductions need special handling and are		// Cross iteration phis such as reductions need special handling and are
// currently unsupported.		// currently unsupported.
if (any_of(L.getHeader()->phis(),		if (any_of(L.getHeader()->phis(),
[&](PHINode &Phi) { return Legal->isFixedOrderRecurrence(&Phi); }))		[&](PHINode &Phi) { return Legal->isFixedOrderRecurrence(&Phi); }))
return false;		return false;

// Phis with uses outside of the loop require special handling and are		// Phis with uses outside of the loop require special handling and are
[14 lines hidden]	bool LoopVectorizationPlanner::isCandidateForEpilogueVectorization(
// non-latch exits properly. It may be fine, but it needs auditted and		// non-latch exits properly. It may be fine, but it needs auditted and
// tested.		// tested.
if (L.getExitingBlock() != L.getLoopLatch())		if (L.getExitingBlock() != L.getLoopLatch())
return false;		return false;

return true;		return true;
}		}

bool LoopVectorizationCostModel::isEpilogueVectorizationProfitable(		bool LoopVectorizationPlanner::isEpilogueVectorizationProfitable(
const ElementCount VF) const {		const ElementCount VF) const {
// FIXME: We need a much better cost-model to take different parameters such		// FIXME: We need a much better cost-model to take different parameters such
// as register pressure, code size increase and cost of extra branches into		// as register pressure, code size increase and cost of extra branches into
// account. For now we apply a very crude heuristic and only consider loops		// account. For now we apply a very crude heuristic and only consider loops
// with vectorization factors larger than a certain value.		// with vectorization factors larger than a certain value.

// Allow the target to opt out entirely.		// Allow the target to opt out entirely.
if (!TTI.preferEpilogueVectorization())		if (!TTI->preferEpilogueVectorization())
return false;		return false;

// We also consider epilogue vectorization unprofitable for targets that don't		// We also consider epilogue vectorization unprofitable for targets that don't
// consider interleaving beneficial (eg. MVE).		// consider interleaving beneficial (eg. MVE).
if (TTI.getMaxInterleaveFactor(VF) <= 1)		if (TTI->getMaxInterleaveFactor(VF) <= 1)
return false;		return false;
// FIXME: We should consider changing the threshold for scalable		// FIXME: We should consider changing the threshold for scalable
// vectors to take VScaleForTuning into account.		// vectors to take VScaleForTuning into account.
if (VF.getKnownMinValue() >= EpilogueVectorizationMinVF)		if (VF.getKnownMinValue() >= EpilogueVectorizationMinVF)
return true;		return true;
return false;		return false;
}		}

VectorizationFactor		VectorizationFactor LoopVectorizationPlanner::selectEpilogueVectorizationFactor(
LoopVectorizationCostModel::selectEpilogueVectorizationFactor(		const VectorizationFactor &MainLoopVF) {
const ElementCount MainLoopVF, const LoopVectorizationPlanner &LVP) {
VectorizationFactor Result = VectorizationFactor::Disabled();		VectorizationFactor Result = VectorizationFactor::Disabled();
if (!EnableEpilogueVectorization) {		if (!EnableEpilogueVectorization) {
LLVM_DEBUG(dbgs() << "LEV: Epilogue vectorization is disabled.\n";);		LLVM_DEBUG(dbgs() << "LEV: Epilogue vectorization is disabled.\n";);
return Result;		return Result;
}		}

if (!isScalarEpilogueAllowed()) {		if (MainLoopVF.FoldTailByMasking) {
LLVM_DEBUG(		LLVM_DEBUG(dbgs() << "LEV: Epilogue not required as the vector loop is "
dbgs() << "LEV: Unable to vectorize epilogue because no epilogue is "		"predicated.\n";);
"allowed.\n";);
return Result;		return Result;
}		}

// Not really a cost consideration, but check for unsupported cases here to		// Not really a cost consideration, but check for unsupported cases here to
// simplify the logic.		// simplify the logic.
if (!isCandidateForEpilogueVectorization(*TheLoop, MainLoopVF)) {		if (!isCandidateForEpilogueVectorization(*OrigLoop, MainLoopVF.Width)) {
LLVM_DEBUG(		LLVM_DEBUG(
dbgs() << "LEV: Unable to vectorize epilogue because the loop is "		dbgs() << "LEV: Unable to vectorize epilogue because the loop is "
"not a supported candidate.\n";);		"not a supported candidate.\n";);
return Result;		return Result;
}		}

if (EpilogueVectorizationForceVF > 1) {		if (EpilogueVectorizationForceVF > 1) {
LLVM_DEBUG(dbgs() << "LEV: Epilogue vectorization factor is forced.\n";);		LLVM_DEBUG(dbgs() << "LEV: Epilogue vectorization factor is forced.\n";);
ElementCount ForcedEC = ElementCount::getFixed(EpilogueVectorizationForceVF);		ElementCount ForcedEC = ElementCount::getFixed(EpilogueVectorizationForceVF);
if (LVP.hasPlanWithVF(ForcedEC))		if (hasPlanWithVF(ForcedEC, false))
return {ForcedEC, 0, 0};		return {ForcedEC, false, 0, 0};
else {		else {
LLVM_DEBUG(		LLVM_DEBUG(
dbgs()		dbgs()
<< "LEV: Epilogue vectorization forced factor is not viable.\n";);		<< "LEV: Epilogue vectorization forced factor is not viable.\n";);
return Result;		return Result;
}		}
}		}

if (TheLoop->getHeader()->getParent()->hasOptSize() \|\|		if (OrigLoop->getHeader()->getParent()->hasOptSize() \|\|
TheLoop->getHeader()->getParent()->hasMinSize()) {		OrigLoop->getHeader()->getParent()->hasMinSize()) {
LLVM_DEBUG(		LLVM_DEBUG(
dbgs()		dbgs()
<< "LEV: Epilogue vectorization skipped due to opt for size.\n";);		<< "LEV: Epilogue vectorization skipped due to opt for size.\n";);
return Result;		return Result;
}		}

if (!isEpilogueVectorizationProfitable(MainLoopVF)) {		if (!isEpilogueVectorizationProfitable(MainLoopVF.Width)) {
LLVM_DEBUG(dbgs() << "LEV: Epilogue vectorization is not profitable for "		LLVM_DEBUG(dbgs() << "LEV: Epilogue vectorization is not profitable for "
"this loop\n");		"this loop\n");
return Result;		return Result;
}		}

// If MainLoopVF = vscale x 2, and vscale is expected to be 4, then we know		// If MainLoopVF = vscale x 2, and vscale is expected to be 4, then we know
// the main loop handles 8 lanes per iteration. We could still benefit from		// the main loop handles 8 lanes per iteration. We could still benefit from
// vectorizing the epilogue loop with VF=4.		// vectorizing the epilogue loop with VF=4.
ElementCount EstimatedRuntimeVF = MainLoopVF;		ElementCount EstimatedRuntimeVF = MainLoopVF.Width;
if (MainLoopVF.isScalable()) {		if (MainLoopVF.Width.isScalable()) {
EstimatedRuntimeVF = ElementCount::getFixed(MainLoopVF.getKnownMinValue());		EstimatedRuntimeVF =
if (std::optional<unsigned> VScale = getVScaleForTuning())		ElementCount::getFixed(MainLoopVF.Width.getKnownMinValue());
		if (std::optional<unsigned> VScale = getVScaleForTuning(OrigLoop, *TTI))
EstimatedRuntimeVF = VScale;		EstimatedRuntimeVF = VScale;
}		}

for (auto &NextVF : ProfitableVFs)		for (auto &NextVF : ProfitableVFs)
if (((!NextVF.Width.isScalable() && MainLoopVF.isScalable() &&		if (((!NextVF.Width.isScalable() && MainLoopVF.Width.isScalable() &&
ElementCount::isKnownLT(NextVF.Width, EstimatedRuntimeVF)) \|\|		ElementCount::isKnownLT(NextVF.Width, EstimatedRuntimeVF)) \|\|
ElementCount::isKnownLT(NextVF.Width, MainLoopVF)) &&		ElementCount::isKnownLT(NextVF.Width, MainLoopVF.Width)) &&
(Result.Width.isScalar() \|\| isMoreProfitable(NextVF, Result)) &&		(Result.Width.isScalar() \|\| isMoreProfitable(NextVF, Result)) &&
LVP.hasPlanWithVF(NextVF.Width))		hasPlanWithVF(NextVF.Width, NextVF.FoldTailByMasking))
Result = NextVF;		Result = NextVF;

if (Result != VectorizationFactor::Disabled())		if (Result != VectorizationFactor::Disabled())
LLVM_DEBUG(dbgs() << "LEV: Vectorizing epilogue loop with VF = "		LLVM_DEBUG(dbgs() << "LEV: Vectorizing epilogue loop with VF = "
<< Result.Width << "\n";);		<< Result.Width << "\n";);
return Result;		return Result;
}		}

[67 lines hidden]	for (Instruction &I : BB->instructionsWithoutDebug()) {
assert(T->isSized() &&		assert(T->isSized() &&
"Expected the load/store/recurrence type to be sized");		"Expected the load/store/recurrence type to be sized");

ElementTypesInLoop.insert(T);		ElementTypesInLoop.insert(T);
}		}
}		}
}		}

unsigned		unsigned
LoopVectorizationCostModel::selectInterleaveCount(ElementCount VF,		LoopVectorizationCostModel::selectInterleaveCount(VectorizationFactor VF) {
		SjoerdMeijer: Nit: perhaps rename this and in some other places to VS, so that the VF.Width references below become VS.Width.
		david-arm: Again, it looks like the change in prototype doesn't need to be part of this patch and might be a useful tidy-up NFC patch.
		dmgreen (Author): Thanks. Yeah, that is a good idea. I can actually just remove this from the current version of the patch, I believe.
InstructionCost LoopCost) {
// -- The interleave heuristics --		// -- The interleave heuristics --
// We interleave the loop in order to expose ILP and reduce the loop overhead.		// We interleave the loop in order to expose ILP and reduce the loop overhead.
// There are many micro-architectural considerations that we can't predict		// There are many micro-architectural considerations that we can't predict
// at this level. For example, frontend pressure (on decode or fetch) due to		// at this level. For example, frontend pressure (on decode or fetch) due to
// code size, or the number and capabilities of the execution ports.		// code size, or the number and capabilities of the execution ports.
//		//
// We use the following heuristics to select the interleave count:		// We use the following heuristics to select the interleave count:
// 1. If the code has reductions, then we interleave to break the cross		// 1. If the code has reductions, then we interleave to break the cross
// iteration dependency.		// iteration dependency.
// 2. If the loop is really small, then we interleave to reduce the loop		// 2. If the loop is really small, then we interleave to reduce the loop
// overhead.		// overhead.
// 3. We don't interleave if we think that we will spill registers to memory		// 3. We don't interleave if we think that we will spill registers to memory
// due to the increased register pressure.		// due to the increased register pressure.
		if (ScalarEpilogueStatus == CM_ScalarEpilogueNotAllowedUsePredicate \|\|
if (!isScalarEpilogueAllowed())		VF.FoldTailByMasking \|\| TheLoop->getHeader()->getParent()->hasOptSize())
return 1;		return 1;

// We used the distance for the interleave count.		// We used the distance for the interleave count.
if (Legal->getMaxSafeDepDistBytes() != -1U)		if (Legal->getMaxSafeDepDistBytes() != -1U)
return 1;		return 1;

auto BestKnownTC = getSmallBestKnownTC(*PSE.getSE(), TheLoop);		auto BestKnownTC = getSmallBestKnownTC(*PSE.getSE(), TheLoop);
const bool HasReductions = !Legal->getReductionVars().empty();		const bool HasReductions = !Legal->getReductionVars().empty();
// Do not interleave loops with a relatively small known or estimated trip		// Do not interleave loops with a relatively small known or estimated trip
// count. But we will interleave when InterleaveSmallLoopScalarReduction is		// count. But we will interleave when InterleaveSmallLoopScalarReduction is
// enabled, and the code has scalar reductions(HasReductions && VF = 1),		// enabled, and the code has scalar reductions(HasReductions && VF = 1),
// because with the above conditions interleaving can expose ILP and break		// because with the above conditions interleaving can expose ILP and break
// cross iteration dependences for reductions.		// cross iteration dependences for reductions.
if (BestKnownTC && (*BestKnownTC < TinyTripCountInterleaveThreshold) &&		if (BestKnownTC && (*BestKnownTC < TinyTripCountInterleaveThreshold) &&
!(InterleaveSmallLoopScalarReduction && HasReductions && VF.isScalar()))		!(InterleaveSmallLoopScalarReduction && HasReductions &&
		VF.Width.isScalar()))
return 1;		return 1;

// If we did not calculate the cost for VF (because the user selected the VF)		// If we did not calculate the cost for VF (because the user selected the VF)
// then we calculate the cost of VF here.		// then we calculate the cost of VF here.
if (LoopCost == 0) {		if (VF.Cost == 0) {
LoopCost = expectedCost(VF).first;		VF.Cost = expectedCost(VF.Width).first;
assert(LoopCost.isValid() && "Expected to have chosen a VF with valid cost");		assert(VF.Cost.isValid() && "Expected to have chosen a VF with valid cost");

// Loop body is free and there is no need for interleaving.		// Loop body is free and there is no need for interleaving.
if (LoopCost == 0)		if (VF.Cost == 0)
return 1;		return 1;
}		}

RegisterUsage R = calculateRegisterUsage({VF})[0];		RegisterUsage R = calculateRegisterUsage({VF.Width})[0];
// We divide by these constants so assume that we have at least one		// We divide by these constants so assume that we have at least one
// instruction that uses at least one register.		// instruction that uses at least one register.
for (auto& pair : R.MaxLocalUsers) {		for (auto& pair : R.MaxLocalUsers) {
pair.second = std::max(pair.second, 1U);		pair.second = std::max(pair.second, 1U);
}		}

// We calculate the interleave count using the following formula.		// We calculate the interleave count using the following formula.
// Subtract the number of loop invariants from the number of available		// Subtract the number of loop invariants from the number of available
// registers. These registers are used by all of the interleaved instances.		// registers. These registers are used by all of the interleaved instances.
// Next, divide the remaining registers by the number of registers that is		// Next, divide the remaining registers by the number of registers that is
// required by the loop, in order to estimate how many parallel instances		// required by the loop, in order to estimate how many parallel instances
// fit without causing spills. All of this is rounded down if necessary to be		// fit without causing spills. All of this is rounded down if necessary to be
// a power of two. We want power of two interleave count to simplify any		// a power of two. We want power of two interleave count to simplify any
// addressing operations or alignment considerations.		// addressing operations or alignment considerations.
// We also want power of two interleave counts to ensure that the induction		// We also want power of two interleave counts to ensure that the induction
// variable of the vector loop wraps to zero, when tail is folded by masking;		// variable of the vector loop wraps to zero, when tail is folded by masking;
// this currently happens when OptForSize, in which case IC is set to 1 above.		// this currently happens when OptForSize, in which case IC is set to 1 above.
unsigned IC = UINT_MAX;		unsigned IC = UINT_MAX;

for (auto& pair : R.MaxLocalUsers) {		for (auto& pair : R.MaxLocalUsers) {
unsigned TargetNumRegisters = TTI.getNumberOfRegisters(pair.first);		unsigned TargetNumRegisters = TTI.getNumberOfRegisters(pair.first);
LLVM_DEBUG(dbgs() << "LV: The target has " << TargetNumRegisters		LLVM_DEBUG(dbgs() << "LV: The target has " << TargetNumRegisters
<< " registers of "		<< " registers of "
<< TTI.getRegisterClassName(pair.first) << " register class\n");		<< TTI.getRegisterClassName(pair.first) << " register class\n");
if (VF.isScalar()) {		if (VF.Width.isScalar()) {
if (ForceTargetNumScalarRegs.getNumOccurrences() > 0)		if (ForceTargetNumScalarRegs.getNumOccurrences() > 0)
TargetNumRegisters = ForceTargetNumScalarRegs;		TargetNumRegisters = ForceTargetNumScalarRegs;
} else {		} else {
if (ForceTargetNumVectorRegs.getNumOccurrences() > 0)		if (ForceTargetNumVectorRegs.getNumOccurrences() > 0)
TargetNumRegisters = ForceTargetNumVectorRegs;		TargetNumRegisters = ForceTargetNumVectorRegs;
}		}
unsigned MaxLocalUsers = pair.second;		unsigned MaxLocalUsers = pair.second;
unsigned LoopInvariantRegs = 0;		unsigned LoopInvariantRegs = 0;
if (R.LoopInvariantRegs.find(pair.first) != R.LoopInvariantRegs.end())		if (R.LoopInvariantRegs.find(pair.first) != R.LoopInvariantRegs.end())
LoopInvariantRegs = R.LoopInvariantRegs[pair.first];		LoopInvariantRegs = R.LoopInvariantRegs[pair.first];

unsigned TmpIC = llvm::bit_floor((TargetNumRegisters - LoopInvariantRegs) /		unsigned TmpIC = llvm::bit_floor((TargetNumRegisters - LoopInvariantRegs) /
MaxLocalUsers);		MaxLocalUsers);
// Don't count the induction variable as interleaved.		// Don't count the induction variable as interleaved.
if (EnableIndVarRegisterHeur) {		if (EnableIndVarRegisterHeur) {
TmpIC = llvm::bit_floor((TargetNumRegisters - LoopInvariantRegs - 1) /		TmpIC = llvm::bit_floor((TargetNumRegisters - LoopInvariantRegs - 1) /
std::max(1U, (MaxLocalUsers - 1)));		std::max(1U, (MaxLocalUsers - 1)));
}		}

IC = std::min(IC, TmpIC);		IC = std::min(IC, TmpIC);
}		}

// Clamp the interleave ranges to reasonable counts.		// Clamp the interleave ranges to reasonable counts.
unsigned MaxInterleaveCount = TTI.getMaxInterleaveFactor(VF);		unsigned MaxInterleaveCount = TTI.getMaxInterleaveFactor(VF.Width);

// Check if the user has overridden the max.		// Check if the user has overridden the max.
if (VF.isScalar()) {		if (VF.Width.isScalar()) {
if (ForceTargetMaxScalarInterleaveFactor.getNumOccurrences() > 0)		if (ForceTargetMaxScalarInterleaveFactor.getNumOccurrences() > 0)
MaxInterleaveCount = ForceTargetMaxScalarInterleaveFactor;		MaxInterleaveCount = ForceTargetMaxScalarInterleaveFactor;
} else {		} else {
if (ForceTargetMaxVectorInterleaveFactor.getNumOccurrences() > 0)		if (ForceTargetMaxVectorInterleaveFactor.getNumOccurrences() > 0)
MaxInterleaveCount = ForceTargetMaxVectorInterleaveFactor;		MaxInterleaveCount = ForceTargetMaxVectorInterleaveFactor;
}		}

// If trip count is known or estimated compile time constant, limit the		// If trip count is known or estimated compile time constant, limit the
// interleave count to be less than the trip count divided by VF, provided it		// interleave count to be less than the trip count divided by VF, provided it
// is at least 1.		// is at least 1.
//		//
// For scalable vectors we can't know if interleaving is beneficial. It may		// For scalable vectors we can't know if interleaving is beneficial. It may
// not be beneficial for small loops if none of the lanes in the second vector		// not be beneficial for small loops if none of the lanes in the second vector
// iterations is enabled. However, for larger loops, there is likely to be a		// iterations is enabled. However, for larger loops, there is likely to be a
// similar benefit as for fixed-width vectors. For now, we choose to leave		// similar benefit as for fixed-width vectors. For now, we choose to leave
// the InterleaveCount as if vscale is '1', although if some information about		// the InterleaveCount as if vscale is '1', although if some information about
// the vector is known (e.g. min vector size), we can make a better decision.		// the vector is known (e.g. min vector size), we can make a better decision.
if (BestKnownTC) {		if (BestKnownTC) {
MaxInterleaveCount =		MaxInterleaveCount = std::min(*BestKnownTC / VF.Width.getKnownMinValue(),
std::min(*BestKnownTC / VF.getKnownMinValue(), MaxInterleaveCount);		MaxInterleaveCount);
// Make sure MaxInterleaveCount is greater than 0.		// Make sure MaxInterleaveCount is greater than 0.
MaxInterleaveCount = std::max(1u, MaxInterleaveCount);		MaxInterleaveCount = std::max(1u, MaxInterleaveCount);
}		}

assert(MaxInterleaveCount > 0 &&		assert(MaxInterleaveCount > 0 &&
"Maximum interleave count must be greater than 0");		"Maximum interleave count must be greater than 0");

// Clamp the calculated IC to be between the 1 and the max interleave count		// Clamp the calculated IC to be between the 1 and the max interleave count
// that the target and trip count allows.		// that the target and trip count allows.
if (IC > MaxInterleaveCount)		if (IC > MaxInterleaveCount)
IC = MaxInterleaveCount;		IC = MaxInterleaveCount;
else		else
// Make sure IC is greater than 0.		// Make sure IC is greater than 0.
IC = std::max(1u, IC);		IC = std::max(1u, IC);

assert(IC > 0 && "Interleave count must be greater than 0.");		assert(IC > 0 && "Interleave count must be greater than 0.");

// Interleave if we vectorized this loop and there is a reduction that could		// Interleave if we vectorized this loop and there is a reduction that could
// benefit from interleaving.		// benefit from interleaving.
if (VF.isVector() && HasReductions) {		if (VF.Width.isVector() && HasReductions) {
LLVM_DEBUG(dbgs() << "LV: Interleaving because of reductions.\n");		LLVM_DEBUG(dbgs() << "LV: Interleaving because of reductions.\n");
return IC;		return IC;
}		}

// For any scalar loop that either requires runtime checks or predication we		// For any scalar loop that either requires runtime checks or predication we
// are better off leaving this to the unroller. Note that if we've already		// are better off leaving this to the unroller. Note that if we've already
// vectorized the loop we will have done the runtime check and so interleaving		// vectorized the loop we will have done the runtime check and so interleaving
// won't require further checks.		// won't require further checks.
bool ScalarInterleavingRequiresPredication =		bool ScalarInterleavingRequiresPredication =
(VF.isScalar() && any_of(TheLoop->blocks(), [this](BasicBlock *BB) {		(VF.Width.isScalar() && any_of(TheLoop->blocks(), [this](BasicBlock *BB) {
return Legal->blockNeedsPredication(BB);		return Legal->blockNeedsPredication(BB);
}));		}));
bool ScalarInterleavingRequiresRuntimePointerCheck =		bool ScalarInterleavingRequiresRuntimePointerCheck =
(VF.isScalar() && Legal->getRuntimePointerChecking()->Need);		(VF.Width.isScalar() && Legal->getRuntimePointerChecking()->Need);

// We want to interleave small loops in order to reduce the loop overhead and		// We want to interleave small loops in order to reduce the loop overhead and
// potentially expose ILP opportunities.		// potentially expose ILP opportunities.
LLVM_DEBUG(dbgs() << "LV: Loop cost is " << LoopCost << '\n'		LLVM_DEBUG(dbgs() << "LV: Loop cost is " << VF.Cost << '\n'
<< "LV: IC is " << IC << '\n'		<< "LV: IC is " << IC << '\n'
<< "LV: VF is " << VF << '\n');		<< "LV: VF is " << VF.Width << '\n'
		<< "LV: Fold Tail is "
		<< (VF.FoldTailByMasking ? "true" : "false") << '\n');
const bool AggressivelyInterleaveReductions =		const bool AggressivelyInterleaveReductions =
TTI.enableAggressiveInterleaving(HasReductions);		TTI.enableAggressiveInterleaving(HasReductions);
if (!ScalarInterleavingRequiresRuntimePointerCheck &&		if (!ScalarInterleavingRequiresRuntimePointerCheck &&
!ScalarInterleavingRequiresPredication && LoopCost < SmallLoopCost) {		!ScalarInterleavingRequiresPredication && VF.Cost < SmallLoopCost) {
// We assume that the cost overhead is 1 and we use the cost model		// We assume that the cost overhead is 1 and we use the cost model
// to estimate the cost of the loop and interleave until the cost of the		// to estimate the cost of the loop and interleave until the cost of the
// loop overhead is about 5% of the cost of the loop.		// loop overhead is about 5% of the cost of the loop.
unsigned SmallIC = std::min(IC, (unsigned)llvm::bit_floor<uint64_t>(		unsigned SmallIC = std::min(IC, (unsigned)llvm::bit_floor<uint64_t>(
SmallLoopCost / *LoopCost.getValue()));		SmallLoopCost / *VF.Cost.getValue()));

// Interleave until store/load ports (estimated by max interleave count) are		// Interleave until store/load ports (estimated by max interleave count) are
// saturated.		// saturated.
unsigned NumStores = Legal->getNumStores();		unsigned NumStores = Legal->getNumStores();
unsigned NumLoads = Legal->getNumLoads();		unsigned NumLoads = Legal->getNumLoads();
unsigned StoresIC = IC / (NumStores ? NumStores : 1);		unsigned StoresIC = IC / (NumStores ? NumStores : 1);
unsigned LoadsIC = IC / (NumLoads ? NumLoads : 1);		unsigned LoadsIC = IC / (NumLoads ? NumLoads : 1);

[40 lines hidden]	if (EnableLoadStoreRuntimeInterleave &&
std::max(StoresIC, LoadsIC) > SmallIC) {		std::max(StoresIC, LoadsIC) > SmallIC) {
LLVM_DEBUG(		LLVM_DEBUG(
dbgs() << "LV: Interleaving to saturate store or load ports.\n");		dbgs() << "LV: Interleaving to saturate store or load ports.\n");
return std::max(StoresIC, LoadsIC);		return std::max(StoresIC, LoadsIC);
}		}

// If there are scalar reductions and TTI has enabled aggressive		// If there are scalar reductions and TTI has enabled aggressive
// interleaving for reductions, we will interleave to expose ILP.		// interleaving for reductions, we will interleave to expose ILP.
if (InterleaveSmallLoopScalarReduction && VF.isScalar() &&		if (InterleaveSmallLoopScalarReduction && VF.Width.isScalar() &&
AggressivelyInterleaveReductions) {		AggressivelyInterleaveReductions) {
LLVM_DEBUG(dbgs() << "LV: Interleaving to expose ILP.\n");		LLVM_DEBUG(dbgs() << "LV: Interleaving to expose ILP.\n");
// Interleave no less than SmallIC but not as aggressive as the normal IC		// Interleave no less than SmallIC but not as aggressive as the normal IC
// to satisfy the rare situation when resources are too limited.		// to satisfy the rare situation when resources are too limited.
return std::max(IC / 2, SmallIC);		return std::max(IC / 2, SmallIC);
} else {		} else {
LLVM_DEBUG(dbgs() << "LV: Interleaving to reduce branch cost.\n");		LLVM_DEBUG(dbgs() << "LV: Interleaving to reduce branch cost.\n");
return SmallIC;		return SmallIC;
▲ Show 20 Lines • Show All 536 Lines • ▼ Show 20 Lines	LoopVectorizationCostModel::getConsecutiveMemOpCost(Instruction *I,
unsigned AS = getLoadStoreAddressSpace(I);		unsigned AS = getLoadStoreAddressSpace(I);
int ConsecutiveStride = Legal->isConsecutivePtr(ValTy, Ptr);		int ConsecutiveStride = Legal->isConsecutivePtr(ValTy, Ptr);
enum TTI::TargetCostKind CostKind = TTI::TCK_RecipThroughput;		enum TTI::TargetCostKind CostKind = TTI::TCK_RecipThroughput;

assert((ConsecutiveStride == 1 || ConsecutiveStride == -1) &&		assert((ConsecutiveStride == 1 || ConsecutiveStride == -1) &&
"Stride should be 1 or -1 for consecutive memory access");		"Stride should be 1 or -1 for consecutive memory access");
const Align Alignment = getLoadStoreAlignment(I);		const Align Alignment = getLoadStoreAlignment(I);
InstructionCost Cost = 0;		InstructionCost Cost = 0;
if (Legal->isMaskRequired(I)) {		if (Legal->isMaskRequired(foldTailByMasking(), I)) {
Cost += TTI.getMaskedMemoryOpCost(I->getOpcode(), VectorTy, Alignment, AS,		Cost += TTI.getMaskedMemoryOpCost(I->getOpcode(), VectorTy, Alignment, AS,
CostKind);		CostKind);
} else {		} else {
TTI::OperandValueInfo OpInfo = TTI::getOperandInfo(I->getOperand(0));		TTI::OperandValueInfo OpInfo = TTI::getOperandInfo(I->getOperand(0));
Cost += TTI.getMemoryOpCost(I->getOpcode(), VectorTy, Alignment, AS,		Cost += TTI.getMemoryOpCost(I->getOpcode(), VectorTy, Alignment, AS,
CostKind, OpInfo, I);		CostKind, OpInfo, I);
}		}

Show All 37 Lines	LoopVectorizationCostModel::getGatherScatterCost(Instruction *I,
ElementCount VF) {		ElementCount VF) {
Type *ValTy = getLoadStoreType(I);		Type *ValTy = getLoadStoreType(I);
auto *VectorTy = cast<VectorType>(ToVectorTy(ValTy, VF));		auto *VectorTy = cast<VectorType>(ToVectorTy(ValTy, VF));
const Align Alignment = getLoadStoreAlignment(I);		const Align Alignment = getLoadStoreAlignment(I);
const Value *Ptr = getLoadStorePointerOperand(I);		const Value *Ptr = getLoadStorePointerOperand(I);

return TTI.getAddressComputationCost(VectorTy) +		return TTI.getAddressComputationCost(VectorTy) +
TTI.getGatherScatterOpCost(		TTI.getGatherScatterOpCost(
I->getOpcode(), VectorTy, Ptr, Legal->isMaskRequired(I), Alignment,		I->getOpcode(), VectorTy, Ptr,
		Legal->isMaskRequired(foldTailByMasking(), I), Alignment,
TargetTransformInfo::TCK_RecipThroughput, I);		TargetTransformInfo::TCK_RecipThroughput, I);
}		}

InstructionCost		InstructionCost
LoopVectorizationCostModel::getInterleaveGroupCost(Instruction *I,		LoopVectorizationCostModel::getInterleaveGroupCost(Instruction *I,
ElementCount VF) {		ElementCount VF) {
// TODO: Once we have support for interleaving with scalable vectors		// TODO: Once we have support for interleaving with scalable vectors
// we can calculate the cost properly here.		// we can calculate the cost properly here.
Show All 18 Lines	if (Group->getMember(IF))
Indices.push_back(IF);		Indices.push_back(IF);

// Calculate the cost of the whole interleaved group.		// Calculate the cost of the whole interleaved group.
bool UseMaskForGaps =		bool UseMaskForGaps =
(Group->requiresScalarEpilogue() && !isScalarEpilogueAllowed()) ||		(Group->requiresScalarEpilogue() && !isScalarEpilogueAllowed()) ||
(isa<StoreInst>(I) && (Group->getNumMembers() < Group->getFactor()));		(isa<StoreInst>(I) && (Group->getNumMembers() < Group->getFactor()));
InstructionCost Cost = TTI.getInterleavedMemoryOpCost(		InstructionCost Cost = TTI.getInterleavedMemoryOpCost(
I->getOpcode(), WideVecTy, Group->getFactor(), Indices, Group->getAlign(),		I->getOpcode(), WideVecTy, Group->getFactor(), Indices, Group->getAlign(),
AS, CostKind, Legal->isMaskRequired(I), UseMaskForGaps);		AS, CostKind, Legal->isMaskRequired(foldTailByMasking(), I),
		UseMaskForGaps);

if (Group->isReverse()) {		if (Group->isReverse()) {
// TODO: Add support for reversed masked interleaved access.		// TODO: Add support for reversed masked interleaved access.
assert(!Legal->isMaskRequired(I) &&		assert(!Legal->isMaskRequired(foldTailByMasking(), I) &&
"Reverse masked interleaved access not supported.");		"Reverse masked interleaved access not supported.");
Cost += Group->getNumMembers() *		Cost += Group->getNumMembers() *
TTI.getShuffleCost(TargetTransformInfo::SK_Reverse, VectorTy,		TTI.getShuffleCost(TargetTransformInfo::SK_Reverse, VectorTy,
std::nullopt, CostKind, 0);		std::nullopt, CostKind, 0);
}		}
return Cost;		return Cost;
}		}

▲ Show 20 Lines • Show All 699 Lines • ▼ Show 20 Lines	auto ComputeCCH = [&](Instruction *I) -> TTI::CastContextHint {

switch (getWideningDecision(I, VF)) {		switch (getWideningDecision(I, VF)) {
case LoopVectorizationCostModel::CM_GatherScatter:		case LoopVectorizationCostModel::CM_GatherScatter:
return TTI::CastContextHint::GatherScatter;		return TTI::CastContextHint::GatherScatter;
case LoopVectorizationCostModel::CM_Interleave:		case LoopVectorizationCostModel::CM_Interleave:
return TTI::CastContextHint::Interleave;		return TTI::CastContextHint::Interleave;
case LoopVectorizationCostModel::CM_Scalarize:		case LoopVectorizationCostModel::CM_Scalarize:
case LoopVectorizationCostModel::CM_Widen:		case LoopVectorizationCostModel::CM_Widen:
return Legal->isMaskRequired(I) ? TTI::CastContextHint::Masked		return Legal->isMaskRequired(foldTailByMasking(), I)
		? TTI::CastContextHint::Masked
: TTI::CastContextHint::Normal;		: TTI::CastContextHint::Normal;
case LoopVectorizationCostModel::CM_Widen_Reverse:		case LoopVectorizationCostModel::CM_Widen_Reverse:
return TTI::CastContextHint::Reversed;		return TTI::CastContextHint::Reversed;
case LoopVectorizationCostModel::CM_Unknown:		case LoopVectorizationCostModel::CM_Unknown:
llvm_unreachable("Instr did not go through cost modelling?");		llvm_unreachable("Instr did not go through cost modelling?");
}		}

llvm_unreachable("Unhandled case!");		llvm_unreachable("Unhandled case!");
};		};
▲ Show 20 Lines • Show All 151 Lines • ▼ Show 20 Lines
static unsigned determineVPlanVF(const unsigned WidestVectorRegBits,		static unsigned determineVPlanVF(const unsigned WidestVectorRegBits,
LoopVectorizationCostModel &CM) {		LoopVectorizationCostModel &CM) {
unsigned WidestType;		unsigned WidestType;
std::tie(std::ignore, WidestType) = CM.getSmallestAndWidestTypes();		std::tie(std::ignore, WidestType) = CM.getSmallestAndWidestTypes();
return WidestVectorRegBits / WidestType;		return WidestVectorRegBits / WidestType;
}		}

VectorizationFactor		VectorizationFactor
LoopVectorizationPlanner::planInVPlanNativePath(ElementCount UserVF) {		LoopVectorizationPlanner::planInVPlanNativePath(LoopVectorizationCostModel &CM,
		ElementCount UserVF) {
assert(!UserVF.isScalable() && "scalable vectors not yet supported");		assert(!UserVF.isScalable() && "scalable vectors not yet supported");
ElementCount VF = UserVF;		ElementCount VF = UserVF;
// Outer loop handling: They may require CFG and instruction level		// Outer loop handling: They may require CFG and instruction level
// transformations before even evaluating whether vectorization is profitable.		// transformations before even evaluating whether vectorization is profitable.
// Since we cannot modify the incoming IR, we need to build VPlan upfront in		// Since we cannot modify the incoming IR, we need to build VPlan upfront in
// the vectorization pipeline.		// the vectorization pipeline.
if (!OrigLoop->isInnermost()) {		if (!OrigLoop->isInnermost()) {
// If the user doesn't provide a vectorization factor, determine a		// If the user doesn't provide a vectorization factor, determine a
Show All 12 Lines	if (UserVF.isZero()) {
VF = ElementCount::getFixed(4);		VF = ElementCount::getFixed(4);
}		}
}		}
assert(EnableVPlanNativePath && "VPlan-native path is not enabled.");		assert(EnableVPlanNativePath && "VPlan-native path is not enabled.");
assert(isPowerOf2_32(VF.getKnownMinValue()) &&		assert(isPowerOf2_32(VF.getKnownMinValue()) &&
"VF needs to be a power of two");		"VF needs to be a power of two");
LLVM_DEBUG(dbgs() << "LV: Using " << (!UserVF.isZero() ? "user " : "")		LLVM_DEBUG(dbgs() << "LV: Using " << (!UserVF.isZero() ? "user " : "")
<< "VF " << VF << " to build VPlans.\n");		<< "VF " << VF << " to build VPlans.\n");
buildVPlans(VF, VF);		buildVPlans(CM, VF, VF);

// For VPlan build stress testing, we bail out after VPlan construction.		// For VPlan build stress testing, we bail out after VPlan construction.
if (VPlanBuildStressTest)		if (VPlanBuildStressTest)
return VectorizationFactor::Disabled();		return VectorizationFactor::Disabled();

return {VF, 0 /*Cost*/, 0 /* ScalarCost */};		return {VF, false, 0 /*Cost*/, 0 /* ScalarCost */};
		david-arm: Perhaps worth adding a `/* ... */` comment for the new argument too? Same for any other similar places in the file.
}		}

LLVM_DEBUG(		LLVM_DEBUG(
dbgs() << "LV: Not vectorizing. Inner loops aren't supported in the "		dbgs() << "LV: Not vectorizing. Inner loops aren't supported in the "
"VPlan-native path.\n");		"VPlan-native path.\n");
return VectorizationFactor::Disabled();		return VectorizationFactor::Disabled();
}		}

std::optional<VectorizationFactor>		void LoopVectorizationPlanner::plan(LoopVectorizationCostModel &CM,
LoopVectorizationPlanner::plan(ElementCount UserVF, unsigned UserIC) {		ElementCount UserVF, unsigned UserIC) {
		CM.collectValuesToIgnore();
		CM.collectElementTypesForWidening();

assert(OrigLoop->isInnermost() && "Inner loop expected.");		assert(OrigLoop->isInnermost() && "Inner loop expected.");
FixedScalableVFPair MaxFactors = CM.computeMaxVF(UserVF, UserIC);		FixedScalableVFPair MaxFactors = CM.computeMaxVF(UserVF, UserIC);
if (!MaxFactors) // Cases that should not to be vectorized nor interleaved.		if (!MaxFactors) // Cases that should not to be vectorized nor interleaved.
return std::nullopt;		return;

// Invalidate interleave groups if all blocks of loop will be predicated.		// Invalidate interleave groups if all blocks of loop will be predicated.
if (CM.blockNeedsPredicationForAnyReason(OrigLoop->getHeader()) &&		if (CM.blockNeedsPredicationForAnyReason(OrigLoop->getHeader()) &&
!useMaskedInterleavedAccesses(*TTI)) {		!useMaskedInterleavedAccesses(*TTI)) {
LLVM_DEBUG(		LLVM_DEBUG(
dbgs()		dbgs()
<< "LV: Invalidate all interleaved groups due to fold-tail by masking "		<< "LV: Invalidate all interleaved groups due to fold-tail by masking "
"which requires masked-interleaved support.\n");		"which requires masked-interleaved support.\n");
if (CM.InterleaveInfo.invalidateGroups())		if (CM.InterleaveInfo.invalidateGroups())
// Invalidating interleave groups also requires invalidating all decisions		// Invalidating interleave groups also requires invalidating all decisions
// based on them, which includes widening decisions and uniform and scalar		// based on them, which includes widening decisions and uniform and scalar
// values.		// values.
CM.invalidateCostModelingDecisions();		CM.invalidateCostModelingDecisions();
}		}

ElementCount MaxUserVF =		ElementCount MaxUserVF =
UserVF.isScalable() ? MaxFactors.ScalableVF : MaxFactors.FixedVF;		UserVF.isScalable() ? MaxFactors.ScalableVF : MaxFactors.FixedVF;
bool UserVFIsLegal = ElementCount::isKnownLE(UserVF, MaxUserVF);		bool UserVFIsLegal = ElementCount::isKnownLE(UserVF, MaxUserVF);
if (!UserVF.isZero() && UserVFIsLegal) {		if (!UserVF.isZero() && UserVFIsLegal) {
assert(isPowerOf2_32(UserVF.getKnownMinValue()) &&		assert(isPowerOf2_32(UserVF.getKnownMinValue()) &&
"VF needs to be a power of two");		"VF needs to be a power of two");
// Collect the instructions (and their associated costs) that will be more		// Collect the instructions (and their associated costs) that will be more
// profitable to scalarize.		// profitable to scalarize.
if (CM.selectUserVectorizationFactor(UserVF)) {		if (CM.selectUserVectorizationFactor(UserVF)) {
		SjoerdMeijer: nit: TODO
LLVM_DEBUG(dbgs() << "LV: Using user VF " << UserVF << ".\n");		LLVM_DEBUG(dbgs() << "LV: Using user VF " << UserVF << ".\n");
CM.collectInLoopReductions();		CM.collectInLoopReductions();
buildVPlansWithVPRecipes(UserVF, UserVF);		buildVPlansWithVPRecipes(CM, UserVF, UserVF);
LLVM_DEBUG(printPlans(dbgs()));		return;
return {{UserVF, 0, 0}};
} else		} else
reportVectorizationInfo("UserVF ignored because of invalid costs.",		reportVectorizationInfo("UserVF ignored because of invalid costs.",
"InvalidCost", ORE, OrigLoop);		"InvalidCost", ORE, OrigLoop);
}		}

// Populate the set of Vectorization Factor Candidates.
ElementCountSet VFCandidates;
for (auto VF = ElementCount::getFixed(1);
ElementCount::isKnownLE(VF, MaxFactors.FixedVF); VF *= 2)
VFCandidates.insert(VF);
for (auto VF = ElementCount::getScalable(1);
ElementCount::isKnownLE(VF, MaxFactors.ScalableVF); VF *= 2)
VFCandidates.insert(VF);

for (const auto &VF : VFCandidates) {
// Collect Uniform and Scalar instructions after vectorization with VF.
CM.collectUniformsAndScalars(VF);

// Collect the instructions (and their associated costs) that will be more
// profitable to scalarize.
if (VF.isVector())
CM.collectInstsToScalarize(VF);
}

CM.collectInLoopReductions();		CM.collectInLoopReductions();
buildVPlansWithVPRecipes(ElementCount::getFixed(1), MaxFactors.FixedVF);		buildVPlansWithVPRecipes(CM, ElementCount::getFixed(1), MaxFactors.FixedVF);
buildVPlansWithVPRecipes(ElementCount::getScalable(1), MaxFactors.ScalableVF);		buildVPlansWithVPRecipes(CM, ElementCount::getScalable(1),
		MaxFactors.ScalableVF);
LLVM_DEBUG(printPlans(dbgs()));
if (!MaxFactors.hasVector())
return VectorizationFactor::Disabled();

// Select the optimal vectorization factor.
VectorizationFactor VF = CM.selectVectorizationFactor(VFCandidates);
assert((VF.Width.isScalar() || VF.ScalarCost > 0) && "when vectorizing, the scalar cost must be non-zero.");
return VF;
}		}

VPlan &LoopVectorizationPlanner::getBestPlanFor(ElementCount VF) const {		VPlan &LoopVectorizationPlanner::getBestPlanFor(ElementCount VF,
		bool FoldTailByMasking) const {
assert(count_if(VPlans,		assert(count_if(VPlans,
[VF](const VPlanPtr &Plan) { return Plan->hasVF(VF); }) ==		[VF, FoldTailByMasking](const VPlanPtr &Plan) {
1 &&		return Plan->hasVF(VF) &&
		david-arm: Is it worth changing `hasVF` to take the `FoldTailByMasking` as a second argument so you can just write: `assert(count_if(VPlans, [VF, FoldTailByMasking](const VPlanPtr &Plan) { return Plan->hasVF(VF, FoldTailByMasking); }) == 1 && "Best VF has not a single VPlan.");`
		dmgreen (author): It feels like a separate parameter to me, one that would want to be checked separately. I can change it if you think it's better, but there are places that call `hasVF` on its own.
		Plan->foldTailByMasking() == FoldTailByMasking;
		}) == 1 &&
"Best VF has not a single VPlan.");		"Best VF has not a single VPlan.");

for (const VPlanPtr &Plan : VPlans) {		for (const VPlanPtr &Plan : VPlans) {
if (Plan->hasVF(VF))		if (Plan->hasVF(VF) && Plan->foldTailByMasking() == FoldTailByMasking)
return *Plan.get();		return *Plan.get();
}		}
llvm_unreachable("No plan found!");		llvm_unreachable("No plan found!");
}		}

static void AddRuntimeUnrollDisableMetaData(Loop *L) {		static void AddRuntimeUnrollDisableMetaData(Loop *L) {
SmallVector<Metadata *, 4> MDs;		SmallVector<Metadata *, 4> MDs;
// Reserve first location for self reference to the LoopID metadata node.		// Reserve first location for self reference to the LoopID metadata node.
Show All 33 Lines	void LoopVectorizationPlanner::executePlan(ElementCount BestVF, unsigned BestUF,
InnerLoopVectorizer &ILV,		InnerLoopVectorizer &ILV,
DominatorTree *DT,		DominatorTree *DT,
bool IsEpilogueVectorization) {		bool IsEpilogueVectorization) {
assert(BestVPlan.hasVF(BestVF) &&		assert(BestVPlan.hasVF(BestVF) &&
"Trying to execute plan with unsupported VF");		"Trying to execute plan with unsupported VF");
assert(BestVPlan.hasUF(BestUF) &&		assert(BestVPlan.hasUF(BestUF) &&
"Trying to execute plan with unsupported UF");		"Trying to execute plan with unsupported UF");

LLVM_DEBUG(dbgs() << "Executing best plan with VF=" << BestVF << ", UF=" << BestUF		LLVM_DEBUG(dbgs() << "Executing best plan with TailFold="
<< '\n');		<< (BestVPlan.foldTailByMasking() ? "true" : "false")
		<< ", VF=" << BestVF << ", UF=" << BestUF << '\n');

// Workaround! Compute the trip count of the original loop and cache it		// Workaround! Compute the trip count of the original loop and cache it
// before we start modifying the CFG. This code has a systemic problem		// before we start modifying the CFG. This code has a systemic problem
// wherein it tries to run analysis over partially constructed IR; this is		// wherein it tries to run analysis over partially constructed IR; this is
// wrong, and not simply for SCEV. The trip count of the original loop		// wrong, and not simply for SCEV. The trip count of the original loop
// simply happens to be prone to hitting this in practice. In theory, we		// simply happens to be prone to hitting this in practice. In theory, we
// can hit the same issue for any SCEV, or ValueTracking query done during		// can hit the same issue for any SCEV, or ValueTracking query done during
// mutation. See PR49900.		// mutation. See PR49900.
▲ Show 20 Lines • Show All 381 Lines • ▼ Show 20 Lines	bool LoopVectorizationPlanner::getDecisionAndClampRange(
return PredicateAtRangeStart;		return PredicateAtRangeStart;
}		}

/// Build VPlans for the full range of feasible VF's = {\p MinVF, 2 * \p MinVF,		/// Build VPlans for the full range of feasible VF's = {\p MinVF, 2 * \p MinVF,
/// 4 * \p MinVF, ..., \p MaxVF} by repeatedly building a VPlan for a sub-range		/// 4 * \p MinVF, ..., \p MaxVF} by repeatedly building a VPlan for a sub-range
/// of VF's starting at a given VF and extending it as much as possible. Each		/// of VF's starting at a given VF and extending it as much as possible. Each
/// vectorization decision can potentially shorten this sub-range during		/// vectorization decision can potentially shorten this sub-range during
/// buildVPlan().		/// buildVPlan().
void LoopVectorizationPlanner::buildVPlans(ElementCount MinVF,		void LoopVectorizationPlanner::buildVPlans(LoopVectorizationCostModel &CM,
		ElementCount MinVF,
ElementCount MaxVF) {		ElementCount MaxVF) {
auto MaxVFPlusOne = MaxVF.getWithIncrement(1);		auto MaxVFPlusOne = MaxVF.getWithIncrement(1);
for (ElementCount VF = MinVF; ElementCount::isKnownLT(VF, MaxVFPlusOne);) {		for (ElementCount VF = MinVF; ElementCount::isKnownLT(VF, MaxVFPlusOne);) {
VFRange SubRange = {VF, MaxVFPlusOne};		VFRange SubRange = {VF, MaxVFPlusOne};
VPlans.push_back(buildVPlan(SubRange));		VPlans.push_back(buildVPlan(CM, SubRange));
VF = SubRange.End;		VF = SubRange.End;
}		}
}		}

VPValue *VPRecipeBuilder::createEdgeMask(BasicBlock *Src, BasicBlock *Dst,		VPValue *VPRecipeBuilder::createEdgeMask(BasicBlock *Src, BasicBlock *Dst,
VPlan &Plan) {		VPlan &Plan) {
assert(is_contained(predecessors(Dst), Src) && "Invalid edge");		assert(is_contained(predecessors(Dst), Src) && "Invalid edge");

▲ Show 20 Lines • Show All 49 Lines • ▼ Show 20 Lines	VPValue *VPRecipeBuilder::createBlockInMask(BasicBlock *BB, VPlan &Plan) {
// All-one mask is modelled as no-mask following the convention for masked		// All-one mask is modelled as no-mask following the convention for masked
// load/store/gather/scatter. Initialize BlockMask to no-mask.		// load/store/gather/scatter. Initialize BlockMask to no-mask.
VPValue *BlockMask = nullptr;		VPValue *BlockMask = nullptr;

if (OrigLoop->getHeader() == BB) {		if (OrigLoop->getHeader() == BB) {
if (!CM.blockNeedsPredicationForAnyReason(BB))		if (!CM.blockNeedsPredicationForAnyReason(BB))
return BlockMaskCache[BB] = BlockMask; // Loop incoming mask is all-one.		return BlockMaskCache[BB] = BlockMask; // Loop incoming mask is all-one.

assert(CM.foldTailByMasking() && "must fold the tail");		assert(Plan.foldTailByMasking() && "must fold the tail");

// If we're using the active lane mask for control flow, then we get the		// If we're using the active lane mask for control flow, then we get the
// mask from the active lane mask PHI that is cached in the VPlan.		// mask from the active lane mask PHI that is cached in the VPlan.
TailFoldingStyle TFStyle = CM.getTailFoldingStyle();		TailFoldingStyle TFStyle = CM.getTailFoldingStyle();
if (useActiveLaneMaskForControlFlow(TFStyle))		if (useActiveLaneMaskForControlFlow(TFStyle))
return BlockMaskCache[BB] = Plan.getActiveLaneMaskPhi();		return BlockMaskCache[BB] = Plan.getActiveLaneMaskPhi();

// Introduce the early-exit compare IV <= BTC to form header block mask.		// Introduce the early-exit compare IV <= BTC to form header block mask.
▲ Show 20 Lines • Show All 55 Lines • ▼ Show 20 Lines	if (CM.isScalarAfterVectorization(I, VF) ||
return false;		return false;
return Decision != LoopVectorizationCostModel::CM_Scalarize;		return Decision != LoopVectorizationCostModel::CM_Scalarize;
};		};

if (!LoopVectorizationPlanner::getDecisionAndClampRange(willWiden, Range))		if (!LoopVectorizationPlanner::getDecisionAndClampRange(willWiden, Range))
return nullptr;		return nullptr;

VPValue *Mask = nullptr;		VPValue *Mask = nullptr;
if (Legal->isMaskRequired(I))		if (Legal->isMaskRequired(Plan->foldTailByMasking(), I))
Mask = createBlockInMask(I->getParent(), *Plan);		Mask = createBlockInMask(I->getParent(), *Plan);

// Determine if the pointer operand of the access is either consecutive or		// Determine if the pointer operand of the access is either consecutive or
// reverse consecutive.		// reverse consecutive.
LoopVectorizationCostModel::InstWidening Decision =		LoopVectorizationCostModel::InstWidening Decision =
CM.getWideningDecision(I, Range.Start);		CM.getWideningDecision(I, Range.Start);
bool Reverse = Decision == LoopVectorizationCostModel::CM_Widen_Reverse;		bool Reverse = Decision == LoopVectorizationCostModel::CM_Widen_Reverse;
bool Consecutive =		bool Consecutive =
▲ Show 20 Lines • Show All 455 Lines • ▼ Show 20 Lines	VPRecipeBuilder::tryToCreateWidenRecipe(Instruction *Instr,
if (auto *SI = dyn_cast<SelectInst>(Instr)) {		if (auto *SI = dyn_cast<SelectInst>(Instr)) {
return toVPRecipeResult(new VPWidenSelectRecipe(		return toVPRecipeResult(new VPWidenSelectRecipe(
*SI, make_range(Operands.begin(), Operands.end())));		*SI, make_range(Operands.begin(), Operands.end())));
}		}

return toVPRecipeResult(tryToWiden(Instr, Operands, VPBB, Plan));		return toVPRecipeResult(tryToWiden(Instr, Operands, VPBB, Plan));
}		}

void LoopVectorizationPlanner::buildVPlansWithVPRecipes(ElementCount MinVF,		void LoopVectorizationPlanner::buildVPlansWithVPRecipes(
ElementCount MaxVF) {		LoopVectorizationCostModel &CM, ElementCount MinVF, ElementCount MaxVF) {
assert(OrigLoop->isInnermost() && "Inner loop expected.");		assert(OrigLoop->isInnermost() && "Inner loop expected.");

		auto MaxVFPlusOne = MaxVF.getWithIncrement(1);
		for (ElementCount VF = MinVF; ElementCount::isKnownLT(VF, MaxVFPlusOne);
		david-arm: Wouldn't it be simpler to just write: `for (ElementCount VF = MinVF; ElementCount::isKnownLE(VF, MaxVF);`
		VF *= 2) {
		// Collect Uniform and Scalar instructions after vectorization with VF.
		CM.collectUniformsAndScalars(VF);

		// Collect the instructions (and their associated costs) that will be more
		// profitable to scalarize.
		if (VF.isVector())
		CM.collectInstsToScalarize(VF);
		}

// Add assume instructions we need to drop to DeadInstructions, to prevent		// Add assume instructions we need to drop to DeadInstructions, to prevent
// them from being added to the VPlan.		// them from being added to the VPlan.
// TODO: We only need to drop assumes in blocks that get flattend. If the		// TODO: We only need to drop assumes in blocks that get flattend. If the
// control flow is preserved, we should keep them.		// control flow is preserved, we should keep them.
SmallPtrSet<Instruction *, 4> DeadInstructions;		SmallPtrSet<Instruction *, 4> DeadInstructions;
auto &ConditionalAssumes = Legal->getConditionalAssumes();		auto &ConditionalAssumes =
		Legal->getConditionalAssumes(CM.foldTailByMasking());
DeadInstructions.insert(ConditionalAssumes.begin(), ConditionalAssumes.end());		DeadInstructions.insert(ConditionalAssumes.begin(), ConditionalAssumes.end());

auto MaxVFPlusOne = MaxVF.getWithIncrement(1);
for (ElementCount VF = MinVF; ElementCount::isKnownLT(VF, MaxVFPlusOne);) {		for (ElementCount VF = MinVF; ElementCount::isKnownLT(VF, MaxVFPlusOne);) {
VFRange SubRange = {VF, MaxVFPlusOne};		VFRange SubRange = {VF, MaxVFPlusOne};
VPlans.push_back(buildVPlanWithVPRecipes(SubRange, DeadInstructions));		VPlans.push_back(buildVPlanWithVPRecipes(CM, SubRange, DeadInstructions));
VF = SubRange.End;		VF = SubRange.End;
}		}
}		}

// Add the necessary canonical IV and branch recipes required to control the		// Add the necessary canonical IV and branch recipes required to control the
// loop.		// loop.
static void addCanonicalIVRecipes(VPlan &Plan, Type *IdxTy, DebugLoc DL,		static void addCanonicalIVRecipes(VPlan &Plan, Type *IdxTy, DebugLoc DL,
TailFoldingStyle Style) {		TailFoldingStyle Style) {
▲ Show 20 Lines • Show All 115 Lines • ▼ Show 20 Lines	for (PHINode &ExitPhi : ExitBB->phis()) {
Value *IncomingValue =		Value *IncomingValue =
ExitPhi.getIncomingValueForBlock(ExitingBB);		ExitPhi.getIncomingValueForBlock(ExitingBB);
VPValue *V = Plan.getOrAddVPValue(IncomingValue, true);		VPValue *V = Plan.getOrAddVPValue(IncomingValue, true);
Plan.addLiveOut(&ExitPhi, V);		Plan.addLiveOut(&ExitPhi, V);
}		}
}		}

VPlanPtr LoopVectorizationPlanner::buildVPlanWithVPRecipes(		VPlanPtr LoopVectorizationPlanner::buildVPlanWithVPRecipes(
VFRange &Range, SmallPtrSetImpl<Instruction *> &DeadInstructions) {		LoopVectorizationCostModel &CM, VFRange &Range,
		SmallPtrSetImpl<Instruction *> &DeadInstructions) {

SmallPtrSet<const InterleaveGroup<Instruction> *, 1> InterleaveGroups;		SmallPtrSet<const InterleaveGroup<Instruction> *, 1> InterleaveGroups;

VPRecipeBuilder RecipeBuilder(OrigLoop, TLI, Legal, CM, PSE, Builder);		VPRecipeBuilder RecipeBuilder(OrigLoop, TLI, Legal, CM, PSE, Builder);

// ---------------------------------------------------------------------------		// ---------------------------------------------------------------------------
// Pre-construction: record ingredients whose recipes we'll need to further		// Pre-construction: record ingredients whose recipes we'll need to further
// process after constructing the initial VPlan.		// process after constructing the initial VPlan.
Show All 17 Lines	for (const auto &Reduction : CM.getInLoopReductionChains()) {
}		}
}		}

// For each interleave group which is relevant for this (possibly trimmed)		// For each interleave group which is relevant for this (possibly trimmed)
// Range, add it to the set of groups to be later applied to the VPlan and add		// Range, add it to the set of groups to be later applied to the VPlan and add
// placeholders for its members' Recipes which we'll be replacing with a		// placeholders for its members' Recipes which we'll be replacing with a
// single VPInterleaveRecipe.		// single VPInterleaveRecipe.
for (InterleaveGroup<Instruction> *IG : IAI.getInterleaveGroups()) {		for (InterleaveGroup<Instruction> *IG : IAI.getInterleaveGroups()) {
auto applyIG = [IG, this](ElementCount VF) -> bool {		auto applyIG = [IG, &CM](ElementCount VF) -> bool {
return (VF.isVector() && // Query is illegal for VF == 1		return (VF.isVector() && // Query is illegal for VF == 1
CM.getWideningDecision(IG->getInsertPos(), VF) ==		CM.getWideningDecision(IG->getInsertPos(), VF) ==
LoopVectorizationCostModel::CM_Interleave);		LoopVectorizationCostModel::CM_Interleave);
};		};
if (!getDecisionAndClampRange(applyIG, Range))		if (!getDecisionAndClampRange(applyIG, Range))
continue;		continue;
InterleaveGroups.insert(IG);		InterleaveGroups.insert(IG);
for (unsigned i = 0; i < IG->getFactor(); i++)		for (unsigned i = 0; i < IG->getFactor(); i++)
if (Instruction *Member = IG->getMember(i))		if (Instruction *Member = IG->getMember(i))
RecipeBuilder.recordRecipeOf(Member);		RecipeBuilder.recordRecipeOf(Member);
};		};

// ---------------------------------------------------------------------------		// ---------------------------------------------------------------------------
// Build initial VPlan: Scan the body of the loop in a topological order to		// Build initial VPlan: Scan the body of the loop in a topological order to
// visit each basic block after having visited its predecessor basic blocks.		// visit each basic block after having visited its predecessor basic blocks.
// ---------------------------------------------------------------------------		// ---------------------------------------------------------------------------

// Create initial VPlan skeleton, starting with a block for the pre-header,		// Create initial VPlan skeleton, starting with a block for the pre-header,
// followed by a region for the vector loop, followed by the middle block. The		// followed by a region for the vector loop, followed by the middle block. The
// skeleton vector loop region contains a header and latch block.		// skeleton vector loop region contains a header and latch block.
VPBasicBlock *Preheader = new VPBasicBlock("vector.ph");		VPBasicBlock *Preheader = new VPBasicBlock("vector.ph");
auto Plan = std::make_unique<VPlan>(Preheader);		auto Plan = std::make_unique<VPlan>(CM.foldTailByMasking(), &CM, Preheader);

VPBasicBlock *HeaderVPBB = new VPBasicBlock("vector.body");		VPBasicBlock *HeaderVPBB = new VPBasicBlock("vector.body");
VPBasicBlock *LatchVPBB = new VPBasicBlock("vector.latch");		VPBasicBlock *LatchVPBB = new VPBasicBlock("vector.latch");
VPBlockUtils::insertBlockAfter(LatchVPBB, HeaderVPBB);		VPBlockUtils::insertBlockAfter(LatchVPBB, HeaderVPBB);
auto *TopRegion = new VPRegionBlock(HeaderVPBB, LatchVPBB, "vector loop");		auto *TopRegion = new VPRegionBlock(HeaderVPBB, LatchVPBB, "vector loop");
VPBlockUtils::insertBlockAfter(TopRegion, Preheader);		VPBlockUtils::insertBlockAfter(TopRegion, Preheader);
VPBasicBlock *MiddleVPBB = new VPBasicBlock("middle.block");		VPBasicBlock *MiddleVPBB = new VPBasicBlock("middle.block");
VPBlockUtils::insertBlockAfter(MiddleVPBB, TopRegion);		VPBlockUtils::insertBlockAfter(MiddleVPBB, TopRegion);
▲ Show 20 Lines • Show All 103 Lines • ▼ Show 20 Lines	VPlanPtr LoopVectorizationPlanner::buildVPlanWithVPRecipes(
// Transform initial VPlan: Apply previously taken decisions, in order, to		// Transform initial VPlan: Apply previously taken decisions, in order, to
// bring the VPlan to its final state.		// bring the VPlan to its final state.
// ---------------------------------------------------------------------------		// ---------------------------------------------------------------------------

VPlanTransforms::removeRedundantCanonicalIVs(*Plan);		VPlanTransforms::removeRedundantCanonicalIVs(*Plan);
VPlanTransforms::removeRedundantInductionCasts(*Plan);		VPlanTransforms::removeRedundantInductionCasts(*Plan);

// Adjust the recipes for any inloop reductions.		// Adjust the recipes for any inloop reductions.
adjustRecipesForReductions(cast<VPBasicBlock>(TopRegion->getExiting()), Plan,		adjustRecipesForReductions(CM, cast<VPBasicBlock>(TopRegion->getExiting()),
RecipeBuilder, Range.Start);		Plan, RecipeBuilder, Range.Start);

// Sink users of fixed-order recurrence past the recipe defining the previous		// Sink users of fixed-order recurrence past the recipe defining the previous
// value and introduce FirstOrderRecurrenceSplice VPInstructions.		// value and introduce FirstOrderRecurrenceSplice VPInstructions.
VPlanTransforms::adjustFixedOrderRecurrences(*Plan, Builder);		VPlanTransforms::adjustFixedOrderRecurrences(*Plan, Builder);

// Interleave memory: for each Interleave Group we marked earlier as relevant		// Interleave memory: for each Interleave Group we marked earlier as relevant
// for this VPlan, replace the Recipes widening its memory instructions with a		// for this VPlan, replace the Recipes widening its memory instructions with a
// single VPInterleaveRecipe at its insertion point.		// single VPInterleaveRecipe at its insertion point.
… 41 lines not shown …	VPlanPtr LoopVectorizationPlanner::buildVPlanWithVPRecipes(

VPlanTransforms::removeRedundantExpandSCEVRecipes(*Plan);		VPlanTransforms::removeRedundantExpandSCEVRecipes(*Plan);
VPlanTransforms::mergeBlocksIntoPredecessors(*Plan);		VPlanTransforms::mergeBlocksIntoPredecessors(*Plan);

assert(VPlanVerifier::verifyPlanIsValid(*Plan) && "VPlan is invalid");		assert(VPlanVerifier::verifyPlanIsValid(*Plan) && "VPlan is invalid");
return Plan;		return Plan;
}		}

VPlanPtr LoopVectorizationPlanner::buildVPlan(VFRange &Range) {		VPlanPtr LoopVectorizationPlanner::buildVPlan(LoopVectorizationCostModel &CM,
		VFRange &Range) {
// Outer loop handling: They may require CFG and instruction level		// Outer loop handling: They may require CFG and instruction level
// transformations before even evaluating whether vectorization is profitable.		// transformations before even evaluating whether vectorization is profitable.
// Since we cannot modify the incoming IR, we need to build VPlan upfront in		// Since we cannot modify the incoming IR, we need to build VPlan upfront in
// the vectorization pipeline.		// the vectorization pipeline.
assert(!OrigLoop->isInnermost());		assert(!OrigLoop->isInnermost());
assert(EnableVPlanNativePath && "VPlan-native path is not enabled.");		assert(EnableVPlanNativePath && "VPlan-native path is not enabled.");

// Create new empty VPlan		// Create new empty VPlan
auto Plan = std::make_unique<VPlan>();		auto Plan = std::make_unique<VPlan>(CM.foldTailByMasking(), &CM);

// Build hierarchical CFG		// Build hierarchical CFG
VPlanHCFGBuilder HCFGBuilder(OrigLoop, LI, *Plan);		VPlanHCFGBuilder HCFGBuilder(OrigLoop, LI, *Plan);
HCFGBuilder.buildHierarchicalCFG();		HCFGBuilder.buildHierarchicalCFG();

for (ElementCount VF = Range.Start; ElementCount::isKnownLT(VF, Range.End);		for (ElementCount VF = Range.Start; ElementCount::isKnownLT(VF, Range.End);
VF *= 2)		VF *= 2)
Plan->addVF(VF);		Plan->addVF(VF);

SmallPtrSet<Instruction *, 1> DeadInstructions;		SmallPtrSet<Instruction *, 1> DeadInstructions;
VPlanTransforms::VPInstructionsToVPRecipes(		VPlanTransforms::VPInstructionsToVPRecipes(
Plan,		Plan,
[this](PHINode *P) { return Legal->getIntOrFpInductionDescriptor(P); },		[this](PHINode *P) { return Legal->getIntOrFpInductionDescriptor(P); },
DeadInstructions, PSE.getSE(), TLI);		DeadInstructions, PSE.getSE(), TLI);

// Remove the existing terminator of the exiting block of the top-most region.		// Remove the existing terminator of the exiting block of the top-most region.
// A BranchOnCount will be added instead when adding the canonical IV recipes.		// A BranchOnCount will be added instead when adding the canonical IV recipes.
auto *Term =		auto *Term =
Plan->getVectorLoopRegion()->getExitingBasicBlock()->getTerminator();		Plan->getVectorLoopRegion()->getExitingBasicBlock()->getTerminator();
Term->eraseFromParent();		Term->eraseFromParent();

addCanonicalIVRecipes(*Plan, Legal->getWidestInductionType(), DebugLoc(),		addCanonicalIVRecipes(*Plan, Legal->getWidestInductionType(), DebugLoc(),
CM.getTailFoldingStyle());		CM.getTailFoldingStyle(false));
return Plan;		return Plan;
}		}

// Adjust the recipes for reductions. For in-loop reductions the chain of		// Adjust the recipes for reductions. For in-loop reductions the chain of
// instructions leading from the loop exit instr to the phi need to be converted		// instructions leading from the loop exit instr to the phi need to be converted
// to reductions, with one operand being vector and the other being the scalar		// to reductions, with one operand being vector and the other being the scalar
// reduction chain. For other reductions, a select is introduced between the phi		// reduction chain. For other reductions, a select is introduced between the phi
// and live-out recipes when folding the tail.		// and live-out recipes when folding the tail.
void LoopVectorizationPlanner::adjustRecipesForReductions(		void LoopVectorizationPlanner::adjustRecipesForReductions(
VPBasicBlock *LatchVPBB, VPlanPtr &Plan, VPRecipeBuilder &RecipeBuilder,		LoopVectorizationCostModel &CM, VPBasicBlock *LatchVPBB, VPlanPtr &Plan,
ElementCount MinVF) {		VPRecipeBuilder &RecipeBuilder, ElementCount MinVF) {
for (const auto &Reduction : CM.getInLoopReductionChains()) {		for (const auto &Reduction : CM.getInLoopReductionChains()) {
PHINode *Phi = Reduction.first;		PHINode *Phi = Reduction.first;
const RecurrenceDescriptor &RdxDesc =		const RecurrenceDescriptor &RdxDesc =
Legal->getReductionVars().find(Phi)->second;		Legal->getReductionVars().find(Phi)->second;
const SmallVector<Instruction *, 4> &ReductionOperations = Reduction.second;		const SmallVector<Instruction *, 4> &ReductionOperations = Reduction.second;

if (MinVF.isScalar() && !CM.useOrderedReductions(RdxDesc))		if (MinVF.isScalar() && !CM.useOrderedReductions(RdxDesc))
continue;		continue;
… 70 lines not shown …	for (Instruction *R : ReductionOperations) {
}		}
Chain = R;		Chain = R;
}		}
}		}

// If tail is folded by masking, introduce selects between the phi		// If tail is folded by masking, introduce selects between the phi
// and the live-out instruction of each reduction, at the beginning of the		// and the live-out instruction of each reduction, at the beginning of the
// dedicated latch block.		// dedicated latch block.
if (CM.foldTailByMasking()) {		if (Plan->foldTailByMasking()) {
Builder.setInsertPoint(LatchVPBB, LatchVPBB->begin());		Builder.setInsertPoint(LatchVPBB, LatchVPBB->begin());
for (VPRecipeBase &R :		for (VPRecipeBase &R :
Plan->getVectorLoopRegion()->getEntryBasicBlock()->phis()) {		Plan->getVectorLoopRegion()->getEntryBasicBlock()->phis()) {
VPReductionPHIRecipe *PhiR = dyn_cast<VPReductionPHIRecipe>(&R);		VPReductionPHIRecipe *PhiR = dyn_cast<VPReductionPHIRecipe>(&R);
if (!PhiR || PhiR->isInLoop())		if (!PhiR || PhiR->isInLoop())
continue;		continue;
VPValue *Cond =		VPValue *Cond =
RecipeBuilder.createBlockInMask(OrigLoop->getHeader(), *Plan);		RecipeBuilder.createBlockInMask(OrigLoop->getHeader(), *Plan);
… 651 lines not shown …	static bool processLoopInVPlanNativePath(
}		}
assert(EnableVPlanNativePath && "VPlan-native path is disabled.");		assert(EnableVPlanNativePath && "VPlan-native path is disabled.");
Function *F = L->getHeader()->getParent();		Function *F = L->getHeader()->getParent();
InterleavedAccessInfo IAI(PSE, L, DT, LI, LVL->getLAI());		InterleavedAccessInfo IAI(PSE, L, DT, LI, LVL->getLAI());

ScalarEpilogueLowering SEL = getScalarEpilogueLowering(		ScalarEpilogueLowering SEL = getScalarEpilogueLowering(
F, L, Hints, PSI, BFI, TTI, TLI, AC, LI, PSE.getSE(), DT, *LVL, &IAI);		F, L, Hints, PSI, BFI, TTI, TLI, AC, LI, PSE.getSE(), DT, *LVL, &IAI);

LoopVectorizationCostModel CM(SEL, L, PSE, LI, LVL, *TTI, TLI, DB, AC, ORE, F,		LoopVectorizationCostModel CM(false, SEL, L, PSE, LI, LVL, *TTI, TLI, DB, AC,
&Hints, IAI);		ORE, F, &Hints, IAI);
// Use the planner for outer loop vectorization.		// Use the planner for outer loop vectorization.
// TODO: CM is not used at this point inside the planner. Turn CM into an		LoopVectorizationPlanner LVP(L, LI, TLI, TTI, LVL, IAI, PSE, Hints, ORE);
// optional argument if we don't need it in the future.
LoopVectorizationPlanner LVP(L, LI, TLI, TTI, LVL, CM, IAI, PSE, Hints, ORE);

// Get user vectorization factor.		// Get user vectorization factor.
ElementCount UserVF = Hints.getWidth();		ElementCount UserVF = Hints.getWidth();

CM.collectElementTypesForWidening();		CM.collectElementTypesForWidening();

// Plan how to best vectorize, return the best VF and its cost.		// Plan how to best vectorize, return the best VF and its cost.
const VectorizationFactor VF = LVP.planInVPlanNativePath(UserVF);		const VectorizationFactor VF = LVP.planInVPlanNativePath(CM, UserVF);

// If we are stress testing VPlan builds, do not attempt to generate vector		// If we are stress testing VPlan builds, do not attempt to generate vector
// code. Masked vector code generation support will follow soon.		// code. Masked vector code generation support will follow soon.
// Also, do not attempt to vectorize if no vector code will be produced.		// Also, do not attempt to vectorize if no vector code will be produced.
if (VPlanBuildStressTest || VectorizationFactor::Disabled() == VF)		if (VPlanBuildStressTest || VectorizationFactor::Disabled() == VF)
return false;		return false;

VPlan &BestPlan = LVP.getBestPlanFor(VF.Width);		VPlan &BestPlan = LVP.getBestPlanFor(VF.Width, VF.FoldTailByMasking);

{		{
GeneratedRTChecks Checks(*PSE.getSE(), DT, LI, TTI,		GeneratedRTChecks Checks(*PSE.getSE(), DT, LI, TTI,
F->getParent()->getDataLayout());		F->getParent()->getDataLayout());
InnerLoopVectorizer LB(L, PSE, LI, DT, TLI, TTI, AC, ORE, VF.Width,		InnerLoopVectorizer LB(L, PSE, LI, DT, TLI, TTI, AC, ORE, VF.Width,
VF.Width, 1, LVL, &CM, BFI, PSI, Checks);		VF.Width, 1, LVL, &CM, BFI, PSI, Checks);
LLVM_DEBUG(dbgs() << "Vectorizing outer loop in \""		LLVM_DEBUG(dbgs() << "Vectorizing outer loop in \""
<< L->getHeader()->getParent()->getName() << "\"\n");		<< L->getHeader()->getParent()->getName() << "\"\n");
… 215 lines not shown …	#endif /* NDEBUG */
// the incoming IR, we need to build VPlan upfront in the vectorization		// the incoming IR, we need to build VPlan upfront in the vectorization
// pipeline.		// pipeline.
if (!L->isInnermost())		if (!L->isInnermost())
return processLoopInVPlanNativePath(L, PSE, LI, DT, &LVL, TTI, TLI, DB, AC,		return processLoopInVPlanNativePath(L, PSE, LI, DT, &LVL, TTI, TLI, DB, AC,
ORE, BFI, PSI, Hints, Requirements);		ORE, BFI, PSI, Hints, Requirements);

assert(L->isInnermost() && "Inner loop expected.");		assert(L->isInnermost() && "Inner loop expected.");

InterleavedAccessInfo IAI(PSE, L, DT, LI, LVL.getLAI());		InterleavedAccessInfo BaseIAI(PSE, L, DT, LI, LVL.getLAI());
bool UseInterleaved = TTI->enableInterleavedAccessVectorization();		bool UseInterleaved = TTI->enableInterleavedAccessVectorization();

// If an override option has been passed in for interleaved accesses, use it.		// If an override option has been passed in for interleaved accesses, use it.
if (EnableInterleavedMemAccesses.getNumOccurrences() > 0)		if (EnableInterleavedMemAccesses.getNumOccurrences() > 0)
UseInterleaved = EnableInterleavedMemAccesses;		UseInterleaved = EnableInterleavedMemAccesses;

// Analyze interleaved memory accesses.		// Analyze interleaved memory accesses.
if (UseInterleaved)		if (UseInterleaved)
IAI.analyzeInterleaving(useMaskedInterleavedAccesses(*TTI));		BaseIAI.analyzeInterleaving(useMaskedInterleavedAccesses(*TTI));

// Check the function attributes and profiles to find out if this function		// Check the function attributes and profiles to find out if this function
// should be optimized for size.		// should be optimized for size.
ScalarEpilogueLowering SEL = getScalarEpilogueLowering(		ScalarEpilogueLowering SEL = getScalarEpilogueLowering(
F, L, Hints, PSI, BFI, TTI, TLI, AC, LI, PSE.getSE(), DT, LVL, &IAI);		F, L, Hints, PSI, BFI, TTI, TLI, AC, LI, PSE.getSE(), DT, LVL, &BaseIAI);

// Check the loop for a trip count threshold: vectorize loops with a tiny trip		// Check the loop for a trip count threshold: vectorize loops with a tiny trip
// count by optimizing for size, to minimize overheads.		// count by optimizing for size, to minimize overheads.
auto ExpectedTC = getSmallBestKnownTC(*SE, L);		auto ExpectedTC = getSmallBestKnownTC(*SE, L);
if (ExpectedTC && *ExpectedTC < TinyTripCountVectorThreshold) {		if (ExpectedTC && *ExpectedTC < TinyTripCountVectorThreshold) {
LLVM_DEBUG(dbgs() << "LV: Found a loop with a very small trip count. "		LLVM_DEBUG(dbgs() << "LV: Found a loop with a very small trip count. "
<< "This loop is worth vectorizing only if no scalar "		<< "This loop is worth vectorizing only if no scalar "
<< "iteration overheads are incurred.");		<< "iteration overheads are incurred.");
… 57 lines not shown …	ORE->emit([&]() {
"floating-point operations";		"floating-point operations";
});		});
LLVM_DEBUG(dbgs() << "LV: loop not vectorized: cannot prove it is safe to "		LLVM_DEBUG(dbgs() << "LV: loop not vectorized: cannot prove it is safe to "
"reorder floating-point operations\n");		"reorder floating-point operations\n");
Hints.emitRemarkWithHints();		Hints.emitRemarkWithHints();
return false;		return false;
}		}

// Use the cost model.
LoopVectorizationCostModel CM(SEL, L, PSE, LI, &LVL, *TTI, TLI, DB, AC, ORE,
F, &Hints, IAI);
CM.collectValuesToIgnore();
CM.collectElementTypesForWidening();

// Use the planner for vectorization.		// Use the planner for vectorization.
LoopVectorizationPlanner LVP(L, LI, TLI, TTI, &LVL, CM, IAI, PSE, Hints, ORE);		LoopVectorizationPlanner LVP(L, LI, TLI, TTI, &LVL, BaseIAI, PSE, Hints, ORE);

// Get user vectorization factor and interleave count.		// Get user vectorization factor and interleave count.
ElementCount UserVF = Hints.getWidth();		ElementCount UserVF = Hints.getWidth();
unsigned UserIC = Hints.getInterleave();		unsigned UserIC = Hints.getInterleave();

		// Don't use a predicated vector body if the user has forced a vectorization
		// factor of 1.
		if (UserVF.isScalar())
		SEL = CM_ScalarEpilogueAllowed;

		// We plan with two different cost models with FoldTailByMasking = false and
		// true, adding the useful vplans from each and picking the best below in
		// selectVectorizationFactor.
		LoopVectorizationCostModel BaseCM(false, SEL, L, PSE, LI, &LVL, *TTI, TLI, DB,
		AC, ORE, F, &Hints, BaseIAI);
		LVP.plan(BaseCM, UserVF, UserIC);

		InterleavedAccessInfo PredIAI(PSE, L, DT, LI, LVL.getLAI());
		if (UseInterleaved && SEL != CM_ScalarEpilogueAllowed)
		PredIAI.analyzeInterleaving(useMaskedInterleavedAccesses(*TTI));
		LoopVectorizationCostModel PredCM(true, SEL, L, PSE, LI, &LVL, *TTI, TLI, DB,
		AC, ORE, F, &Hints, PredIAI);
		if (SEL != CM_ScalarEpilogueAllowed)
		LVP.plan(PredCM, UserVF, UserIC);

// Plan how to best vectorize, return the best VF and its cost.		// Plan how to best vectorize, return the best VF and its cost.
std::optional<VectorizationFactor> MaybeVF = LVP.plan(UserVF, UserIC);		// Use the cost model.
		std::optional<VectorizationFactor> MaybeVF = LVP.selectVectorizationFactor();

VectorizationFactor VF = VectorizationFactor::Disabled();		VectorizationFactor VF = VectorizationFactor::Disabled();
unsigned IC = 1;		unsigned IC = 1;

GeneratedRTChecks Checks(*PSE.getSE(), DT, LI, TTI,		GeneratedRTChecks Checks(*PSE.getSE(), DT, LI, TTI,
F->getParent()->getDataLayout());		F->getParent()->getDataLayout());
if (MaybeVF) {		if (MaybeVF) {
VF = *MaybeVF;		VF = *MaybeVF;
// Select the interleave count.		// Select the interleave count.
IC = CM.selectInterleaveCount(VF.Width, VF.Cost);		IC = VF.FoldTailByMasking ? PredCM.selectInterleaveCount(VF)
		: BaseCM.selectInterleaveCount(VF);

unsigned SelectedIC = std::max(IC, UserIC);		unsigned SelectedIC = std::max(IC, UserIC);
// Optimistically generate runtime checks if they are needed. Drop them if		// Optimistically generate runtime checks if they are needed. Drop them if
// they turn out to not be profitable.		// they turn out to not be profitable.
if (VF.Width.isVector() || SelectedIC > 1)		if (VF.Width.isVector() || SelectedIC > 1)
Checks.Create(L, *LVL.getLAI(), PSE.getPredicate(), VF.Width, SelectedIC);		Checks.Create(L, *LVL.getLAI(), PSE.getPredicate(), VF.Width, SelectedIC);

// Check if it is profitable to vectorize with runtime checks.		// Check if it is profitable to vectorize with runtime checks.
bool ForceVectorization =		bool ForceVectorization =
Hints.getForce() == LoopVectorizeHints::FK_Enabled;		Hints.getForce() == LoopVectorizeHints::FK_Enabled;
if (!ForceVectorization &&		if (!ForceVectorization &&
!areRuntimeChecksProfitable(Checks, VF, CM.getVScaleForTuning(), L,		!areRuntimeChecksProfitable(Checks, VF, getVScaleForTuning(L, *TTI), L,
*PSE.getSE())) {		*PSE.getSE())) {
ORE->emit([&]() {		ORE->emit([&]() {
return OptimizationRemarkAnalysisAliasing(		return OptimizationRemarkAnalysisAliasing(
DEBUG_TYPE, "CantReorderMemOps", L->getStartLoc(),		DEBUG_TYPE, "CantReorderMemOps", L->getStartLoc(),
L->getHeader())		L->getHeader())
<< "loop not vectorized: cannot prove it is safe to reorder "		<< "loop not vectorized: cannot prove it is safe to reorder "
"memory operations";		"memory operations";
});		});
… 88 lines not shown …	#endif /* NDEBUG */
bool DisableRuntimeUnroll = false;		bool DisableRuntimeUnroll = false;
MDNode *OrigLoopID = L->getLoopID();		MDNode *OrigLoopID = L->getLoopID();
{		{
using namespace ore;		using namespace ore;
if (!VectorizeLoop) {		if (!VectorizeLoop) {
assert(IC > 1 && "interleave count should not be 1 or 0");		assert(IC > 1 && "interleave count should not be 1 or 0");
// If we decided that it is not legal to vectorize the loop, then		// If we decided that it is not legal to vectorize the loop, then
// interleave it.		// interleave it.
		VPlan &BestPlan = LVP.getBestPlanFor(VF.Width, VF.FoldTailByMasking);
InnerLoopUnroller Unroller(L, PSE, LI, DT, TLI, TTI, AC, ORE, IC, &LVL,		InnerLoopUnroller Unroller(L, PSE, LI, DT, TLI, TTI, AC, ORE, IC, &LVL,
&CM, BFI, PSI, Checks);		BestPlan.getCostModel(), BFI, PSI, Checks);

VPlan &BestPlan = LVP.getBestPlanFor(VF.Width);
LVP.executePlan(VF.Width, IC, BestPlan, Unroller, DT, false);		LVP.executePlan(VF.Width, IC, BestPlan, Unroller, DT, false);

ORE->emit([&]() {		ORE->emit([&]() {
return OptimizationRemark(LV_NAME, "Interleaved", L->getStartLoc(),		return OptimizationRemark(LV_NAME, "Interleaved", L->getStartLoc(),
L->getHeader())		L->getHeader())
<< "interleaved loop (interleaved count: "		<< "interleaved loop (interleaved count: "
<< NV("InterleaveCount", IC) << ")";		<< NV("InterleaveCount", IC) << ")";
});		});
} else {		} else {
// If we decided that it is legal to vectorize the loop, then do it.		// If we decided that it is legal to vectorize the loop, then do it.

// Consider vectorizing the epilogue too if it's profitable.		// Consider vectorizing the epilogue too if it's profitable.
VectorizationFactor EpilogueVF =		VectorizationFactor EpilogueVF =
CM.selectEpilogueVectorizationFactor(VF.Width, LVP);		LVP.selectEpilogueVectorizationFactor(VF);
if (EpilogueVF.Width.isVector()) {		if (EpilogueVF.Width.isVector()) {

// The first pass vectorizes the main loop and creates a scalar epilogue		// The first pass vectorizes the main loop and creates a scalar epilogue
// to be vectorized by executing the plan (potentially with a different		// to be vectorized by executing the plan (potentially with a different
// factor) again shortly afterwards.		// factor) again shortly afterwards.
		// TODO: Predicated remainders
EpilogueLoopVectorizationInfo EPI(VF.Width, IC, EpilogueVF.Width, 1);		EpilogueLoopVectorizationInfo EPI(VF.Width, IC, EpilogueVF.Width, 1);
EpilogueVectorizerMainLoop MainILV(L, PSE, LI, DT, TLI, TTI, AC, ORE,
EPI, &LVL, &CM, BFI, PSI, Checks);

VPlan &BestMainPlan = LVP.getBestPlanFor(EPI.MainLoopVF);		VPlan &BestMainPlan = LVP.getBestPlanFor(EPI.MainLoopVF, false);
		EpilogueVectorizerMainLoop MainILV(
		L, PSE, LI, DT, TLI, TTI, AC, ORE, EPI, &LVL,
		BestMainPlan.getCostModel(), BFI, PSI, Checks);
LVP.executePlan(EPI.MainLoopVF, EPI.MainLoopUF, BestMainPlan, MainILV,		LVP.executePlan(EPI.MainLoopVF, EPI.MainLoopUF, BestMainPlan, MainILV,
DT, true);		DT, true);
++LoopsVectorized;		++LoopsVectorized;

// Second pass vectorizes the epilogue and adjusts the control flow		// Second pass vectorizes the epilogue and adjusts the control flow
// edges from the first pass.		// edges from the first pass.
EPI.MainLoopVF = EPI.EpilogueVF;		EPI.MainLoopVF = EPI.EpilogueVF;
EPI.MainLoopUF = EPI.EpilogueUF;		EPI.MainLoopUF = EPI.EpilogueUF;
EpilogueVectorizerEpilogueLoop EpilogILV(L, PSE, LI, DT, TLI, TTI, AC,		VPlan &BestEpiPlan = LVP.getBestPlanFor(EPI.EpilogueVF, false);
ORE, EPI, &LVL, &CM, BFI, PSI,		EpilogueVectorizerEpilogueLoop EpilogILV(
Checks);		L, PSE, LI, DT, TLI, TTI, AC, ORE, EPI, &LVL,
		BestEpiPlan.getCostModel(), BFI, PSI, Checks);

VPlan &BestEpiPlan = LVP.getBestPlanFor(EPI.EpilogueVF);
VPRegionBlock *VectorLoop = BestEpiPlan.getVectorLoopRegion();		VPRegionBlock *VectorLoop = BestEpiPlan.getVectorLoopRegion();
VPBasicBlock *Header = VectorLoop->getEntryBasicBlock();		VPBasicBlock *Header = VectorLoop->getEntryBasicBlock();
Header->setName("vec.epilog.vector.body");		Header->setName("vec.epilog.vector.body");

// Ensure that the start values for any VPWidenIntOrFpInductionRecipe,		// Ensure that the start values for any VPWidenIntOrFpInductionRecipe,
// VPWidenPointerInductionRecipe and VPReductionPHIRecipes are updated		// VPWidenPointerInductionRecipe and VPReductionPHIRecipes are updated
// before vectorizing the epilogue loop.		// before vectorizing the epilogue loop.
for (VPRecipeBase &R : Header->phis()) {		for (VPRecipeBase &R : Header->phis()) {
… 30 lines not shown …	if (!VectorizeLoop) {

LVP.executePlan(EPI.EpilogueVF, EPI.EpilogueUF, BestEpiPlan, EpilogILV,		LVP.executePlan(EPI.EpilogueVF, EPI.EpilogueUF, BestEpiPlan, EpilogILV,
DT, true);		DT, true);
++LoopsEpilogueVectorized;		++LoopsEpilogueVectorized;

if (!MainILV.areSafetyChecksAdded())		if (!MainILV.areSafetyChecksAdded())
DisableRuntimeUnroll = true;		DisableRuntimeUnroll = true;
} else {		} else {
		VPlan &BestPlan = LVP.getBestPlanFor(VF.Width, VF.FoldTailByMasking);
InnerLoopVectorizer LB(L, PSE, LI, DT, TLI, TTI, AC, ORE, VF.Width,		InnerLoopVectorizer LB(L, PSE, LI, DT, TLI, TTI, AC, ORE, VF.Width,
VF.MinProfitableTripCount, IC, &LVL, &CM, BFI,		VF.MinProfitableTripCount, IC, &LVL,
PSI, Checks);		BestPlan.getCostModel(), BFI, PSI, Checks);

VPlan &BestPlan = LVP.getBestPlanFor(VF.Width);
LVP.executePlan(VF.Width, IC, BestPlan, LB, DT, false);		LVP.executePlan(VF.Width, IC, BestPlan, LB, DT, false);
++LoopsVectorized;		++LoopsVectorized;

// Add metadata to disable runtime unrolling a scalar loop when there		// Add metadata to disable runtime unrolling a scalar loop when there
// are no runtime checks about strides and memory. A scalar loop that is		// are no runtime checks about strides and memory. A scalar loop that is
// rarely used is not worth unrolling.		// rarely used is not worth unrolling.
if (!LB.areSafetyChecksAdded())		if (!LB.areSafetyChecksAdded())
DisableRuntimeUnroll = true;		DisableRuntimeUnroll = true;
… 164 lines not shown …
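
The selection step in the LoopVectorize.cpp hunk above (plan once with FoldTailByMasking = false, once with true, then pick the best candidate) can be pictured with a small standalone sketch. Everything here is hypothetical and for illustration only: `Candidate` and `selectBest` are not LLVM types or functions, and the cost comparison is a simplified cost-per-lane check, not the patch's `selectVectorizationFactor` logic.

```cpp
#include <cassert>
#include <optional>
#include <vector>

// Hypothetical stand-in for one costed candidate. This is NOT LLVM's
// VectorizationFactor; it only mirrors the (Width, FoldTailByMasking) pair.
struct Candidate {
  unsigned Width;         // vectorization factor
  bool FoldTailByMasking; // true if the plan folds the tail by masking
  unsigned long long Cost; // estimated cost of one vector iteration
};

// Pick the candidate with the lowest cost per lane across both lists.
// Comparing CostA * WidthB < CostB * WidthA avoids a division.
std::optional<Candidate> selectBest(const std::vector<Candidate> &Base,
                                    const std::vector<Candidate> &Pred) {
  std::optional<Candidate> Best;
  auto Consider = [&Best](const Candidate &C) {
    if (!Best || C.Cost * Best->Width < Best->Cost * C.Width)
      Best = C;
  };
  for (const Candidate &C : Base) // plans built with FoldTailByMasking=false
    Consider(C);
  for (const Candidate &C : Pred) // plans built with FoldTailByMasking=true
    Consider(C);
  return Best;
}
```

The point of the sketch is only that both families of candidates end up in one comparison, so the tail-folding decision is made with cost information in hand instead of being fixed up front.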

llvm/lib/Transforms/Vectorize/VPlan.h

… 45 lines not shown …
namespace llvm {		namespace llvm {

class BasicBlock;		class BasicBlock;
class DominatorTree;		class DominatorTree;
class InductionDescriptor;		class InductionDescriptor;
class InnerLoopVectorizer;		class InnerLoopVectorizer;
class IRBuilderBase;		class IRBuilderBase;
class LoopInfo;		class LoopInfo;
		class LoopVectorizationCostModel;
class PredicateScalarEvolution;		class PredicateScalarEvolution;
class raw_ostream;		class raw_ostream;
class RecurrenceDescriptor;		class RecurrenceDescriptor;
class SCEV;		class SCEV;
class Type;		class Type;
class VPBasicBlock;		class VPBasicBlock;
class VPRegionBlock;		class VPRegionBlock;
class VPlan;		class VPlan;
… 2,154 lines not shown …	class VPlan {

/// Indicates whether it is safe to use the Value2VPValue mapping or if the		/// Indicates whether it is safe to use the Value2VPValue mapping or if the
/// mapping cannot be used any longer, because it is stale.		/// mapping cannot be used any longer, because it is stale.
bool Value2VPValueEnabled = true;		bool Value2VPValueEnabled = true;

/// Values used outside the plan.		/// Values used outside the plan.
MapVector<PHINode *, VPLiveOut *> LiveOuts;		MapVector<PHINode *, VPLiveOut *> LiveOuts;

		/// Whether this plan should foldTailByMasking
		bool FoldTailByMasking = false;

		/// The cost model used to construct and cost this VPlan.
		/// TODO: Remove this and the dependencies on the costmodel.
		LoopVectorizationCostModel *Cost;

public:		public:
VPlan(VPBlockBase *Entry = nullptr) : Entry(Entry) {		VPlan(bool FoldTailByMasking = false,
		LoopVectorizationCostModel *Cost = nullptr,
		VPBlockBase *Entry = nullptr)
		: Entry(Entry), FoldTailByMasking(FoldTailByMasking), Cost(Cost) {
if (Entry)		if (Entry)
Entry->setPlan(this);		Entry->setPlan(this);
}		}

~VPlan();		~VPlan();

/// Prepare the plan for execution, setting up the required live-in values.		/// Prepare the plan for execution, setting up the required live-in values.
void prepareToExecute(Value *TripCount, Value *VectorTripCount,		void prepareToExecute(Value *TripCount, Value *VectorTripCount,
… 28 lines not shown …	public:

/// The vector trip count.		/// The vector trip count.
VPValue &getVectorTripCount() { return VectorTripCount; }		VPValue &getVectorTripCount() { return VectorTripCount; }

/// Mark the plan to indicate that using Value2VPValue is not safe any		/// Mark the plan to indicate that using Value2VPValue is not safe any
/// longer, because it may be stale.		/// longer, because it may be stale.
void disableValue2VPValue() { Value2VPValueEnabled = false; }		void disableValue2VPValue() { Value2VPValueEnabled = false; }

		const SmallSetVector<ElementCount, 2> &getVFs() const { return VFs; }

void addVF(ElementCount VF) { VFs.insert(VF); }		void addVF(ElementCount VF) { VFs.insert(VF); }

void setVF(ElementCount VF) {		void setVF(ElementCount VF) {
assert(hasVF(VF) && "Cannot set VF not already in plan");		assert(hasVF(VF) && "Cannot set VF not already in plan");
VFs.clear();		VFs.clear();
VFs.insert(VF);		VFs.insert(VF);
}		}

… 122 lines not shown …	void removeLiveOut(PHINode *PN) {
delete LiveOuts[PN];		delete LiveOuts[PN];
LiveOuts.erase(PN);		LiveOuts.erase(PN);
}		}

const MapVector<PHINode *, VPLiveOut *> &getLiveOuts() const {		const MapVector<PHINode *, VPLiveOut *> &getLiveOuts() const {
return LiveOuts;		return LiveOuts;
}		}

		bool foldTailByMasking() const { return FoldTailByMasking; }

		LoopVectorizationCostModel *getCostModel() const { return Cost; }

private:		private:
/// Add to the given dominator tree the header block and every new basic block		/// Add to the given dominator tree the header block and every new basic block
/// that was created between it and the latch block, inclusive.		/// that was created between it and the latch block, inclusive.
static void updateDominatorTree(DominatorTree *DT, BasicBlock *LoopLatchBB,		static void updateDominatorTree(DominatorTree *DT, BasicBlock *LoopLatchBB,
BasicBlock *LoopPreHeaderBB,		BasicBlock *LoopPreHeaderBB,
BasicBlock *LoopExitBB);		BasicBlock *LoopExitBB);
};		};

… 320 lines not shown …
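
With the flag stored on the plan itself, callers such as `LVP.getBestPlanFor(VF.Width, VF.FoldTailByMasking)` need a lookup keyed on both values. The body of `getBestPlanFor` is not part of the hunks shown here, so the following is only a rough sketch of what such a lookup might look like. `Plan` and `findPlanFor` are hypothetical names, and plain `unsigned` stands in for `ElementCount`.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical, simplified plan: the VFs it covers plus the tail-folding
// flag that this patch adds to VPlan.
struct Plan {
  std::vector<unsigned> VFs;
  bool FoldTailByMasking;

  bool hasVF(unsigned VF) const {
    return std::find(VFs.begin(), VFs.end(), VF) != VFs.end();
  }
};

// Return the first plan that covers VF and has the requested tail-folding
// mode, or nullptr if no such plan was built.
const Plan *findPlanFor(const std::vector<Plan> &Plans, unsigned VF,
                        bool FoldTailByMasking) {
  for (const Plan &P : Plans)
    if (P.hasVF(VF) && P.FoldTailByMasking == FoldTailByMasking)
      return &P;
  return nullptr;
}
```

Under this scheme the same VF may appear in two plans, one per mode, which is why the width alone no longer identifies a plan.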

llvm/lib/Transforms/Vectorize/VPlan.cpp

… 789 lines not shown …	void VPlan::print(raw_ostream &O) const {

O << "}\n";		O << "}\n";
}		}

std::string VPlan::getName() const {		std::string VPlan::getName() const {
std::string Out;		std::string Out;
raw_string_ostream RSO(Out);		raw_string_ostream RSO(Out);
RSO << Name << " for ";		RSO << Name << " for ";
		if (FoldTailByMasking)
		RSO << "Tail Folded ";
if (!VFs.empty()) {		if (!VFs.empty()) {
RSO << "VF={" << VFs[0];		RSO << "VF={" << VFs[0];
for (ElementCount VF : drop_begin(VFs))		for (ElementCount VF : drop_begin(VFs))
RSO << "," << VF;		RSO << "," << VF;
RSO << "},";		RSO << "},";
}		}

if (UFs.empty()) {		if (UFs.empty()) {
… 340 lines not shown …

llvm/test/Transforms/LoopVectorize/AArch64/masked-call.ll

	… 100 lines not shown …
	; TFALWAYS-NEXT: [[INDVARS_IV_NEXT]] = add nuw nsw i64 [[INDVARS_IV]], 1			; TFALWAYS-NEXT: [[INDVARS_IV_NEXT]] = add nuw nsw i64 [[INDVARS_IV]], 1
	; TFALWAYS-NEXT: [[EXITCOND:%.*]] = icmp eq i64 [[INDVARS_IV_NEXT]], 1024			; TFALWAYS-NEXT: [[EXITCOND:%.*]] = icmp eq i64 [[INDVARS_IV_NEXT]], 1024
	; TFALWAYS-NEXT: br i1 [[EXITCOND]], label [[FOR_COND_CLEANUP]], label [[FOR_BODY]], !llvm.loop [[LOOP3:![0-9]+]]			; TFALWAYS-NEXT: br i1 [[EXITCOND]], label [[FOR_COND_CLEANUP]], label [[FOR_BODY]], !llvm.loop [[LOOP3:![0-9]+]]
	; TFALWAYS: for.cond.cleanup:			; TFALWAYS: for.cond.cleanup:
	; TFALWAYS-NEXT: ret void			; TFALWAYS-NEXT: ret void
	;			;
	; TFFALLBACK-LABEL: @test_widen(			; TFFALLBACK-LABEL: @test_widen(
	; TFFALLBACK-NEXT: entry:			; TFFALLBACK-NEXT: entry:
	; TFFALLBACK-NEXT: br i1 false, label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]			; TFFALLBACK-NEXT: [[TMP0:%.*]] = call i64 @llvm.vscale.i64()
				; TFFALLBACK-NEXT: [[TMP1:%.*]] = mul i64 [[TMP0]], 2
				; TFFALLBACK-NEXT: [[MIN_ITERS_CHECK:%.*]] = icmp ult i64 1024, [[TMP1]]
				; TFFALLBACK-NEXT: br i1 [[MIN_ITERS_CHECK]], label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
	; TFFALLBACK: vector.ph:			; TFFALLBACK: vector.ph:
				; TFFALLBACK-NEXT: [[TMP2:%.*]] = call i64 @llvm.vscale.i64()
				; TFFALLBACK-NEXT: [[TMP3:%.*]] = mul i64 [[TMP2]], 2
				; TFFALLBACK-NEXT: [[N_MOD_VF:%.*]] = urem i64 1024, [[TMP3]]
				; TFFALLBACK-NEXT: [[N_VEC:%.*]] = sub i64 1024, [[N_MOD_VF]]
	; TFFALLBACK-NEXT: br label [[VECTOR_BODY:%.*]]			; TFFALLBACK-NEXT: br label [[VECTOR_BODY:%.*]]
	; TFFALLBACK: vector.body:			; TFFALLBACK: vector.body:
	; TFFALLBACK-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[PRED_CALL_CONTINUE2:%.*]] ]			; TFFALLBACK-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
	; TFFALLBACK-NEXT: [[ACTIVE_LANE_MASK:%.*]] = phi <2 x i1> [ <i1 true, i1 true>, [[VECTOR_PH]] ], [ [[ACTIVE_LANE_MASK_NEXT:%.*]], [[PRED_CALL_CONTINUE2]] ]			; TFFALLBACK-NEXT: [[TMP4:%.*]] = getelementptr i64, ptr [[B:%.*]], i64 [[INDEX]]
	; TFFALLBACK-NEXT: [[TMP0:%.*]] = getelementptr i64, ptr [[B:%.*]], i64 [[INDEX]]			; TFFALLBACK-NEXT: [[WIDE_LOAD:%.*]] = load <vscale x 2 x i64>, ptr [[TMP4]], align 4
	; TFFALLBACK-NEXT: [[WIDE_MASKED_LOAD:%.*]] = call <2 x i64> @llvm.masked.load.v2i64.p0(ptr [[TMP0]], i32 4, <2 x i1> [[ACTIVE_LANE_MASK]], <2 x i64> poison)			; TFFALLBACK-NEXT: [[TMP5:%.*]] = call <vscale x 2 x i64> @foo_vector(<vscale x 2 x i64> [[WIDE_LOAD]], <vscale x 2 x i1> shufflevector (<vscale x 2 x i1> insertelement (<vscale x 2 x i1> poison, i1 true, i64 0), <vscale x 2 x i1> poison, <vscale x 2 x i32> zeroinitializer))
	; TFFALLBACK-NEXT: [[TMP1:%.*]] = extractelement <2 x i1> [[ACTIVE_LANE_MASK]], i32 0			; TFFALLBACK-NEXT: [[TMP6:%.*]] = getelementptr inbounds i64, ptr [[A:%.*]], i64 [[INDEX]]
	; TFFALLBACK-NEXT: br i1 [[TMP1]], label [[PRED_CALL_IF:%.*]], label [[PRED_CALL_CONTINUE:%.*]]			; TFFALLBACK-NEXT: store <vscale x 2 x i64> [[TMP5]], ptr [[TMP6]], align 4
	; TFFALLBACK: pred.call.if:			; TFFALLBACK-NEXT: [[TMP7:%.*]] = call i64 @llvm.vscale.i64()
	; TFFALLBACK-NEXT: [[TMP2:%.*]] = extractelement <2 x i64> [[WIDE_MASKED_LOAD]], i32 0			; TFFALLBACK-NEXT: [[TMP8:%.*]] = mul i64 [[TMP7]], 2
	; TFFALLBACK-NEXT: [[TMP3:%.*]] = call i64 @foo(i64 [[TMP2]]) #[[ATTR4:[0-9]+]]			; TFFALLBACK-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], [[TMP8]]
	; TFFALLBACK-NEXT: [[TMP4:%.*]] = insertelement <2 x i64> poison, i64 [[TMP3]], i32 0			; TFFALLBACK-NEXT: [[TMP9:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]
	; TFFALLBACK-NEXT: br label [[PRED_CALL_CONTINUE]]			; TFFALLBACK-NEXT: br i1 [[TMP9]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP0:![0-9]+]]
	; TFFALLBACK: pred.call.continue:
	; TFFALLBACK-NEXT: [[TMP5:%.*]] = phi <2 x i64> [ poison, [[VECTOR_BODY]] ], [ [[TMP4]], [[PRED_CALL_IF]] ]
	; TFFALLBACK-NEXT: [[TMP6:%.*]] = extractelement <2 x i1> [[ACTIVE_LANE_MASK]], i32 1
	; TFFALLBACK-NEXT: br i1 [[TMP6]], label [[PRED_CALL_IF1:%.*]], label [[PRED_CALL_CONTINUE2]]
	; TFFALLBACK: pred.call.if1:
	; TFFALLBACK-NEXT: [[TMP7:%.*]] = extractelement <2 x i64> [[WIDE_MASKED_LOAD]], i32 1
	; TFFALLBACK-NEXT: [[TMP8:%.*]] = call i64 @foo(i64 [[TMP7]]) #[[ATTR4]]
	; TFFALLBACK-NEXT: [[TMP9:%.*]] = insertelement <2 x i64> [[TMP5]], i64 [[TMP8]], i32 1
	; TFFALLBACK-NEXT: br label [[PRED_CALL_CONTINUE2]]
	; TFFALLBACK: pred.call.continue2:
	; TFFALLBACK-NEXT: [[TMP10:%.*]] = phi <2 x i64> [ [[TMP5]], [[PRED_CALL_CONTINUE]] ], [ [[TMP9]], [[PRED_CALL_IF1]] ]
	; TFFALLBACK-NEXT: [[TMP11:%.*]] = getelementptr inbounds i64, ptr [[A:%.*]], i64 [[INDEX]]
	; TFFALLBACK-NEXT: call void @llvm.masked.store.v2i64.p0(<2 x i64> [[TMP10]], ptr [[TMP11]], i32 4, <2 x i1> [[ACTIVE_LANE_MASK]])
	; TFFALLBACK-NEXT: [[INDEX_NEXT]] = add i64 [[INDEX]], 2
	; TFFALLBACK-NEXT: [[ACTIVE_LANE_MASK_NEXT]] = call <2 x i1> @llvm.get.active.lane.mask.v2i1.i64(i64 [[INDEX_NEXT]], i64 1024)
	; TFFALLBACK-NEXT: [[TMP12:%.*]] = xor <2 x i1> [[ACTIVE_LANE_MASK_NEXT]], <i1 true, i1 true>
	; TFFALLBACK-NEXT: [[TMP13:%.*]] = extractelement <2 x i1> [[TMP12]], i32 0
	; TFFALLBACK-NEXT: br i1 [[TMP13]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP0:![0-9]+]]
	; TFFALLBACK: middle.block:			; TFFALLBACK: middle.block:
	; TFFALLBACK-NEXT: br i1 true, label [[FOR_COND_CLEANUP:%.*]], label [[SCALAR_PH]]			; TFFALLBACK-NEXT: [[CMP_N:%.*]] = icmp eq i64 1024, [[N_VEC]]
				; TFFALLBACK-NEXT: br i1 [[CMP_N]], label [[FOR_COND_CLEANUP:%.*]], label [[SCALAR_PH]]
	; TFFALLBACK: scalar.ph:			; TFFALLBACK: scalar.ph:
	; TFFALLBACK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ 1024, [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]			; TFFALLBACK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ [[N_VEC]], [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]
	; TFFALLBACK-NEXT: br label [[FOR_BODY:%.*]]			; TFFALLBACK-NEXT: br label [[FOR_BODY:%.*]]
	; TFFALLBACK: for.body:			; TFFALLBACK: for.body:
	; TFFALLBACK-NEXT: [[INDVARS_IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[INDVARS_IV_NEXT:%.*]], [[FOR_BODY]] ]			; TFFALLBACK-NEXT: [[INDVARS_IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[INDVARS_IV_NEXT:%.*]], [[FOR_BODY]] ]
	; TFFALLBACK-NEXT: [[GEP:%.*]] = getelementptr i64, ptr [[B]], i64 [[INDVARS_IV]]			; TFFALLBACK-NEXT: [[GEP:%.*]] = getelementptr i64, ptr [[B]], i64 [[INDVARS_IV]]
	; TFFALLBACK-NEXT: [[LOAD:%.*]] = load i64, ptr [[GEP]], align 4			; TFFALLBACK-NEXT: [[LOAD:%.*]] = load i64, ptr [[GEP]], align 4
	; TFFALLBACK-NEXT: [[CALL:%.*]] = call i64 @foo(i64 [[LOAD]]) #[[ATTR4]]			; TFFALLBACK-NEXT: [[CALL:%.*]] = call i64 @foo(i64 [[LOAD]]) #[[ATTR2:[0-9]+]]
	; TFFALLBACK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds i64, ptr [[A]], i64 [[INDVARS_IV]]			; TFFALLBACK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds i64, ptr [[A]], i64 [[INDVARS_IV]]
	; TFFALLBACK-NEXT: store i64 [[CALL]], ptr [[ARRAYIDX]], align 4			; TFFALLBACK-NEXT: store i64 [[CALL]], ptr [[ARRAYIDX]], align 4
	; TFFALLBACK-NEXT: [[INDVARS_IV_NEXT]] = add nuw nsw i64 [[INDVARS_IV]], 1			; TFFALLBACK-NEXT: [[INDVARS_IV_NEXT]] = add nuw nsw i64 [[INDVARS_IV]], 1
	; TFFALLBACK-NEXT: [[EXITCOND:%.*]] = icmp eq i64 [[INDVARS_IV_NEXT]], 1024			; TFFALLBACK-NEXT: [[EXITCOND:%.*]] = icmp eq i64 [[INDVARS_IV_NEXT]], 1024
	; TFFALLBACK-NEXT: br i1 [[EXITCOND]], label [[FOR_COND_CLEANUP]], label [[FOR_BODY]], !llvm.loop [[LOOP3:![0-9]+]]			; TFFALLBACK-NEXT: br i1 [[EXITCOND]], label [[FOR_COND_CLEANUP]], label [[FOR_BODY]], !llvm.loop [[LOOP3:![0-9]+]]
	; TFFALLBACK: for.cond.cleanup:			; TFFALLBACK: for.cond.cleanup:
	; TFFALLBACK-NEXT: ret void			; TFFALLBACK-NEXT: ret void
	;			;
	; TFFALLBACK-NEXT: br label [[FOR_BODY:%.*]]			; TFFALLBACK-NEXT: br label [[FOR_BODY:%.*]]
	; TFFALLBACK: for.body:			; TFFALLBACK: for.body:
	; TFFALLBACK-NEXT: [[INDVARS_IV:%.*]] = phi i64 [ [[INDVARS_IV_NEXT:%.*]], [[IF_END:%.*]] ], [ 0, [[ENTRY:%.*]] ]			; TFFALLBACK-NEXT: [[INDVARS_IV:%.*]] = phi i64 [ [[INDVARS_IV_NEXT:%.*]], [[IF_END:%.*]] ], [ 0, [[ENTRY:%.*]] ]
	; TFFALLBACK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds i64, ptr [[A:%.*]], i64 [[INDVARS_IV]]			; TFFALLBACK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds i64, ptr [[A:%.*]], i64 [[INDVARS_IV]]
	; TFFALLBACK-NEXT: [[TMP0:%.*]] = load i64, ptr [[ARRAYIDX]], align 8			; TFFALLBACK-NEXT: [[TMP0:%.*]] = load i64, ptr [[ARRAYIDX]], align 8
	; TFFALLBACK-NEXT: [[CMP:%.*]] = icmp ugt i64 [[TMP0]], 50			; TFFALLBACK-NEXT: [[CMP:%.*]] = icmp ugt i64 [[TMP0]], 50
	; TFFALLBACK-NEXT: br i1 [[CMP]], label [[IF_THEN:%.*]], label [[IF_END]]			; TFFALLBACK-NEXT: br i1 [[CMP]], label [[IF_THEN:%.*]], label [[IF_END]]
	; TFFALLBACK: if.then:			; TFFALLBACK: if.then:
	; TFFALLBACK-NEXT: [[TMP1:%.*]] = call i64 @foo(i64 [[TMP0]]) #[[ATTR4]]			; TFFALLBACK-NEXT: [[TMP1:%.*]] = call i64 @foo(i64 [[TMP0]]) #[[ATTR2]]
	; TFFALLBACK-NEXT: br label [[IF_END]]			; TFFALLBACK-NEXT: br label [[IF_END]]
	; TFFALLBACK: if.end:			; TFFALLBACK: if.end:
	; TFFALLBACK-NEXT: [[TMP2:%.*]] = phi i64 [ [[TMP1]], [[IF_THEN]] ], [ 0, [[FOR_BODY]] ]			; TFFALLBACK-NEXT: [[TMP2:%.*]] = phi i64 [ [[TMP1]], [[IF_THEN]] ], [ 0, [[FOR_BODY]] ]
	; TFFALLBACK-NEXT: [[ARRAYIDX1:%.*]] = getelementptr inbounds i64, ptr [[B:%.*]], i64 [[INDVARS_IV]]			; TFFALLBACK-NEXT: [[ARRAYIDX1:%.*]] = getelementptr inbounds i64, ptr [[B:%.*]], i64 [[INDVARS_IV]]
	; TFFALLBACK-NEXT: store i64 [[TMP2]], ptr [[ARRAYIDX1]], align 8			; TFFALLBACK-NEXT: store i64 [[TMP2]], ptr [[ARRAYIDX1]], align 8
	; TFFALLBACK-NEXT: [[INDVARS_IV_NEXT]] = add nuw nsw i64 [[INDVARS_IV]], 1			; TFFALLBACK-NEXT: [[INDVARS_IV_NEXT]] = add nuw nsw i64 [[INDVARS_IV]], 1
	; TFFALLBACK-NEXT: [[EXITCOND:%.*]] = icmp eq i64 [[INDVARS_IV_NEXT]], 1024			; TFFALLBACK-NEXT: [[EXITCOND:%.*]] = icmp eq i64 [[INDVARS_IV_NEXT]], 1024
	; TFFALLBACK-NEXT: br i1 [[EXITCOND]], label [[FOR_COND_CLEANUP:%.*]], label [[FOR_BODY]]			; TFFALLBACK-NEXT: br i1 [[EXITCOND]], label [[FOR_COND_CLEANUP:%.*]], label [[FOR_BODY]]
	; TFFALLBACK-NEXT: br label [[FOR_BODY:%.*]]			; TFFALLBACK-NEXT: br label [[FOR_BODY:%.*]]
	; TFFALLBACK: for.body:			; TFFALLBACK: for.body:
	; TFFALLBACK-NEXT: [[INDVARS_IV:%.*]] = phi i64 [ [[INDVARS_IV_NEXT:%.*]], [[IF_END:%.*]] ], [ 0, [[ENTRY:%.*]] ]			; TFFALLBACK-NEXT: [[INDVARS_IV:%.*]] = phi i64 [ [[INDVARS_IV_NEXT:%.*]], [[IF_END:%.*]] ], [ 0, [[ENTRY:%.*]] ]
	; TFFALLBACK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds i64, ptr [[A:%.*]], i64 [[INDVARS_IV]]			; TFFALLBACK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds i64, ptr [[A:%.*]], i64 [[INDVARS_IV]]
	; TFFALLBACK-NEXT: [[TMP0:%.*]] = load i64, ptr [[ARRAYIDX]], align 8			; TFFALLBACK-NEXT: [[TMP0:%.*]] = load i64, ptr [[ARRAYIDX]], align 8
	; TFFALLBACK-NEXT: [[CMP:%.*]] = icmp ugt i64 [[TMP0]], 50			; TFFALLBACK-NEXT: [[CMP:%.*]] = icmp ugt i64 [[TMP0]], 50
	; TFFALLBACK-NEXT: br i1 [[CMP]], label [[IF_THEN:%.*]], label [[IF_ELSE:%.*]]			; TFFALLBACK-NEXT: br i1 [[CMP]], label [[IF_THEN:%.*]], label [[IF_ELSE:%.*]]
	; TFFALLBACK: if.then:			; TFFALLBACK: if.then:
	; TFFALLBACK-NEXT: [[TMP1:%.*]] = call i64 @foo(i64 [[TMP0]]) #[[ATTR5:[0-9]+]]			; TFFALLBACK-NEXT: [[TMP1:%.*]] = call i64 @foo(i64 [[TMP0]]) #[[ATTR3:[0-9]+]]
	; TFFALLBACK-NEXT: br label [[IF_END]]			; TFFALLBACK-NEXT: br label [[IF_END]]
	; TFFALLBACK: if.else:			; TFFALLBACK: if.else:
	; TFFALLBACK-NEXT: [[TMP2:%.*]] = call i64 @foo(i64 0) #[[ATTR5]]			; TFFALLBACK-NEXT: [[TMP2:%.*]] = call i64 @foo(i64 0) #[[ATTR3]]
	; TFFALLBACK-NEXT: br label [[IF_END]]			; TFFALLBACK-NEXT: br label [[IF_END]]
	; TFFALLBACK: if.end:			; TFFALLBACK: if.end:
	; TFFALLBACK-NEXT: [[TMP3:%.*]] = phi i64 [ [[TMP1]], [[IF_THEN]] ], [ [[TMP2]], [[IF_ELSE]] ]			; TFFALLBACK-NEXT: [[TMP3:%.*]] = phi i64 [ [[TMP1]], [[IF_THEN]] ], [ [[TMP2]], [[IF_ELSE]] ]
	; TFFALLBACK-NEXT: [[ARRAYIDX1:%.*]] = getelementptr inbounds i64, ptr [[B:%.*]], i64 [[INDVARS_IV]]			; TFFALLBACK-NEXT: [[ARRAYIDX1:%.*]] = getelementptr inbounds i64, ptr [[B:%.*]], i64 [[INDVARS_IV]]
	; TFFALLBACK-NEXT: store i64 [[TMP3]], ptr [[ARRAYIDX1]], align 8			; TFFALLBACK-NEXT: store i64 [[TMP3]], ptr [[ARRAYIDX1]], align 8
	; TFFALLBACK-NEXT: [[INDVARS_IV_NEXT]] = add nuw nsw i64 [[INDVARS_IV]], 1			; TFFALLBACK-NEXT: [[INDVARS_IV_NEXT]] = add nuw nsw i64 [[INDVARS_IV]], 1
	; TFFALLBACK-NEXT: [[EXITCOND:%.*]] = icmp eq i64 [[INDVARS_IV_NEXT]], 1024			; TFFALLBACK-NEXT: [[EXITCOND:%.*]] = icmp eq i64 [[INDVARS_IV_NEXT]], 1024
	; TFFALLBACK-NEXT: br i1 [[EXITCOND]], label [[FOR_COND_CLEANUP:%.*]], label [[FOR_BODY]]			; TFFALLBACK-NEXT: br i1 [[EXITCOND]], label [[FOR_COND_CLEANUP:%.*]], label [[FOR_BODY]]
	; TFFALLBACK-NEXT: br i1 [[CMP_N]], label [[FOR_COND_CLEANUP:%.*]], label [[SCALAR_PH]]			; TFFALLBACK-NEXT: br i1 [[CMP_N]], label [[FOR_COND_CLEANUP:%.*]], label [[SCALAR_PH]]
	; TFFALLBACK: scalar.ph:			; TFFALLBACK: scalar.ph:
	; TFFALLBACK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ [[N_VEC]], [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]			; TFFALLBACK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ [[N_VEC]], [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]
	; TFFALLBACK-NEXT: br label [[FOR_BODY:%.*]]			; TFFALLBACK-NEXT: br label [[FOR_BODY:%.*]]
	; TFFALLBACK: for.body:			; TFFALLBACK: for.body:
	; TFFALLBACK-NEXT: [[INDVARS_IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[INDVARS_IV_NEXT:%.*]], [[FOR_BODY]] ]			; TFFALLBACK-NEXT: [[INDVARS_IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[INDVARS_IV_NEXT:%.*]], [[FOR_BODY]] ]
	; TFFALLBACK-NEXT: [[GEP:%.*]] = getelementptr i64, ptr [[B]], i64 [[INDVARS_IV]]			; TFFALLBACK-NEXT: [[GEP:%.*]] = getelementptr i64, ptr [[B]], i64 [[INDVARS_IV]]
	; TFFALLBACK-NEXT: [[LOAD:%.*]] = load i64, ptr [[GEP]], align 4			; TFFALLBACK-NEXT: [[LOAD:%.*]] = load i64, ptr [[GEP]], align 4
	; TFFALLBACK-NEXT: [[CALL:%.*]] = call i64 @foo(i64 [[LOAD]]) #[[ATTR6:[0-9]+]]			; TFFALLBACK-NEXT: [[CALL:%.*]] = call i64 @foo(i64 [[LOAD]]) #[[ATTR4:[0-9]+]]
	; TFFALLBACK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds i64, ptr [[A]], i64 [[INDVARS_IV]]			; TFFALLBACK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds i64, ptr [[A]], i64 [[INDVARS_IV]]
	; TFFALLBACK-NEXT: store i64 [[CALL]], ptr [[ARRAYIDX]], align 4			; TFFALLBACK-NEXT: store i64 [[CALL]], ptr [[ARRAYIDX]], align 4
	; TFFALLBACK-NEXT: [[INDVARS_IV_NEXT]] = add nuw nsw i64 [[INDVARS_IV]], 1			; TFFALLBACK-NEXT: [[INDVARS_IV_NEXT]] = add nuw nsw i64 [[INDVARS_IV]], 1
	; TFFALLBACK-NEXT: [[EXITCOND:%.*]] = icmp eq i64 [[INDVARS_IV_NEXT]], 1024			; TFFALLBACK-NEXT: [[EXITCOND:%.*]] = icmp eq i64 [[INDVARS_IV_NEXT]], 1024
	; TFFALLBACK-NEXT: br i1 [[EXITCOND]], label [[FOR_COND_CLEANUP]], label [[FOR_BODY]], !llvm.loop [[LOOP5:![0-9]+]]			; TFFALLBACK-NEXT: br i1 [[EXITCOND]], label [[FOR_COND_CLEANUP]], label [[FOR_BODY]], !llvm.loop [[LOOP5:![0-9]+]]
	; TFFALLBACK: for.cond.cleanup:			; TFFALLBACK: for.cond.cleanup:
	; TFFALLBACK-NEXT: ret void			; TFFALLBACK-NEXT: ret void
	;			;
	; TFALWAYS-NEXT: [[INDVARS_IV_NEXT]] = add nuw nsw i64 [[INDVARS_IV]], 1			; TFALWAYS-NEXT: [[INDVARS_IV_NEXT]] = add nuw nsw i64 [[INDVARS_IV]], 1
	; TFALWAYS-NEXT: [[EXITCOND:%.*]] = icmp eq i64 [[INDVARS_IV_NEXT]], 1024			; TFALWAYS-NEXT: [[EXITCOND:%.*]] = icmp eq i64 [[INDVARS_IV_NEXT]], 1024
	; TFALWAYS-NEXT: br i1 [[EXITCOND]], label [[FOR_COND_CLEANUP]], label [[FOR_BODY]], !llvm.loop [[LOOP5:![0-9]+]]			; TFALWAYS-NEXT: br i1 [[EXITCOND]], label [[FOR_COND_CLEANUP]], label [[FOR_BODY]], !llvm.loop [[LOOP5:![0-9]+]]
	; TFALWAYS: for.cond.cleanup:			; TFALWAYS: for.cond.cleanup:
	; TFALWAYS-NEXT: ret void			; TFALWAYS-NEXT: ret void
	;			;
	; TFFALLBACK-LABEL: @test_widen_optmask(			; TFFALLBACK-LABEL: @test_widen_optmask(
	; TFFALLBACK-NEXT: entry:			; TFFALLBACK-NEXT: entry:
	; TFFALLBACK-NEXT: br i1 false, label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]			; TFFALLBACK-NEXT: [[TMP0:%.*]] = call i64 @llvm.vscale.i64()
				; TFFALLBACK-NEXT: [[TMP1:%.*]] = mul i64 [[TMP0]], 2
				; TFFALLBACK-NEXT: [[MIN_ITERS_CHECK:%.*]] = icmp ult i64 1024, [[TMP1]]
				; TFFALLBACK-NEXT: br i1 [[MIN_ITERS_CHECK]], label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
	; TFFALLBACK: vector.ph:			; TFFALLBACK: vector.ph:
				; TFFALLBACK-NEXT: [[TMP2:%.*]] = call i64 @llvm.vscale.i64()
				; TFFALLBACK-NEXT: [[TMP3:%.*]] = mul i64 [[TMP2]], 2
				; TFFALLBACK-NEXT: [[N_MOD_VF:%.*]] = urem i64 1024, [[TMP3]]
				; TFFALLBACK-NEXT: [[N_VEC:%.*]] = sub i64 1024, [[N_MOD_VF]]
	; TFFALLBACK-NEXT: br label [[VECTOR_BODY:%.*]]			; TFFALLBACK-NEXT: br label [[VECTOR_BODY:%.*]]
	; TFFALLBACK: vector.body:			; TFFALLBACK: vector.body:
	; TFFALLBACK-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[PRED_CALL_CONTINUE2:%.*]] ]			; TFFALLBACK-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
	; TFFALLBACK-NEXT: [[ACTIVE_LANE_MASK:%.*]] = phi <2 x i1> [ <i1 true, i1 true>, [[VECTOR_PH]] ], [ [[ACTIVE_LANE_MASK_NEXT:%.*]], [[PRED_CALL_CONTINUE2]] ]			; TFFALLBACK-NEXT: [[TMP4:%.*]] = getelementptr i64, ptr [[B:%.*]], i64 [[INDEX]]
	; TFFALLBACK-NEXT: [[TMP0:%.*]] = getelementptr i64, ptr [[B:%.*]], i64 [[INDEX]]			; TFFALLBACK-NEXT: [[WIDE_LOAD:%.*]] = load <vscale x 2 x i64>, ptr [[TMP4]], align 4
	; TFFALLBACK-NEXT: [[WIDE_MASKED_LOAD:%.*]] = call <2 x i64> @llvm.masked.load.v2i64.p0(ptr [[TMP0]], i32 4, <2 x i1> [[ACTIVE_LANE_MASK]], <2 x i64> poison)			; TFFALLBACK-NEXT: [[TMP5:%.*]] = call <vscale x 2 x i64> @foo_vector_nomask(<vscale x 2 x i64> [[WIDE_LOAD]])
	; TFFALLBACK-NEXT: [[TMP1:%.*]] = extractelement <2 x i1> [[ACTIVE_LANE_MASK]], i32 0			; TFFALLBACK-NEXT: [[TMP6:%.*]] = getelementptr inbounds i64, ptr [[A:%.*]], i64 [[INDEX]]
	; TFFALLBACK-NEXT: br i1 [[TMP1]], label [[PRED_CALL_IF:%.*]], label [[PRED_CALL_CONTINUE:%.*]]			; TFFALLBACK-NEXT: store <vscale x 2 x i64> [[TMP5]], ptr [[TMP6]], align 4
	; TFFALLBACK: pred.call.if:			; TFFALLBACK-NEXT: [[TMP7:%.*]] = call i64 @llvm.vscale.i64()
	; TFFALLBACK-NEXT: [[TMP2:%.*]] = extractelement <2 x i64> [[WIDE_MASKED_LOAD]], i32 0			; TFFALLBACK-NEXT: [[TMP8:%.*]] = mul i64 [[TMP7]], 2
	; TFFALLBACK-NEXT: [[TMP3:%.*]] = call i64 @foo(i64 [[TMP2]]) #[[ATTR7:[0-9]+]]			; TFFALLBACK-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], [[TMP8]]
	; TFFALLBACK-NEXT: [[TMP4:%.*]] = insertelement <2 x i64> poison, i64 [[TMP3]], i32 0			; TFFALLBACK-NEXT: [[TMP9:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]
	; TFFALLBACK-NEXT: br label [[PRED_CALL_CONTINUE]]			; TFFALLBACK-NEXT: br i1 [[TMP9]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP6:![0-9]+]]
	; TFFALLBACK: pred.call.continue:
	; TFFALLBACK-NEXT: [[TMP5:%.*]] = phi <2 x i64> [ poison, [[VECTOR_BODY]] ], [ [[TMP4]], [[PRED_CALL_IF]] ]
	; TFFALLBACK-NEXT: [[TMP6:%.*]] = extractelement <2 x i1> [[ACTIVE_LANE_MASK]], i32 1
	; TFFALLBACK-NEXT: br i1 [[TMP6]], label [[PRED_CALL_IF1:%.*]], label [[PRED_CALL_CONTINUE2]]
	; TFFALLBACK: pred.call.if1:
	; TFFALLBACK-NEXT: [[TMP7:%.*]] = extractelement <2 x i64> [[WIDE_MASKED_LOAD]], i32 1
	; TFFALLBACK-NEXT: [[TMP8:%.*]] = call i64 @foo(i64 [[TMP7]]) #[[ATTR7]]
	; TFFALLBACK-NEXT: [[TMP9:%.*]] = insertelement <2 x i64> [[TMP5]], i64 [[TMP8]], i32 1
	; TFFALLBACK-NEXT: br label [[PRED_CALL_CONTINUE2]]
	; TFFALLBACK: pred.call.continue2:
	; TFFALLBACK-NEXT: [[TMP10:%.*]] = phi <2 x i64> [ [[TMP5]], [[PRED_CALL_CONTINUE]] ], [ [[TMP9]], [[PRED_CALL_IF1]] ]
	; TFFALLBACK-NEXT: [[TMP11:%.*]] = getelementptr inbounds i64, ptr [[A:%.*]], i64 [[INDEX]]
	; TFFALLBACK-NEXT: call void @llvm.masked.store.v2i64.p0(<2 x i64> [[TMP10]], ptr [[TMP11]], i32 4, <2 x i1> [[ACTIVE_LANE_MASK]])
	; TFFALLBACK-NEXT: [[INDEX_NEXT]] = add i64 [[INDEX]], 2
	; TFFALLBACK-NEXT: [[ACTIVE_LANE_MASK_NEXT]] = call <2 x i1> @llvm.get.active.lane.mask.v2i1.i64(i64 [[INDEX_NEXT]], i64 1024)
	; TFFALLBACK-NEXT: [[TMP12:%.*]] = xor <2 x i1> [[ACTIVE_LANE_MASK_NEXT]], <i1 true, i1 true>
	; TFFALLBACK-NEXT: [[TMP13:%.*]] = extractelement <2 x i1> [[TMP12]], i32 0
	; TFFALLBACK-NEXT: br i1 [[TMP13]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP6:![0-9]+]]
	; TFFALLBACK: middle.block:			; TFFALLBACK: middle.block:
	; TFFALLBACK-NEXT: br i1 true, label [[FOR_COND_CLEANUP:%.*]], label [[SCALAR_PH]]			; TFFALLBACK-NEXT: [[CMP_N:%.*]] = icmp eq i64 1024, [[N_VEC]]
				; TFFALLBACK-NEXT: br i1 [[CMP_N]], label [[FOR_COND_CLEANUP:%.*]], label [[SCALAR_PH]]
	; TFFALLBACK: scalar.ph:			; TFFALLBACK: scalar.ph:
	; TFFALLBACK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ 1024, [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]			; TFFALLBACK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ [[N_VEC]], [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]
	; TFFALLBACK-NEXT: br label [[FOR_BODY:%.*]]			; TFFALLBACK-NEXT: br label [[FOR_BODY:%.*]]
	; TFFALLBACK: for.body:			; TFFALLBACK: for.body:
	; TFFALLBACK-NEXT: [[INDVARS_IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[INDVARS_IV_NEXT:%.*]], [[FOR_BODY]] ]			; TFFALLBACK-NEXT: [[INDVARS_IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[INDVARS_IV_NEXT:%.*]], [[FOR_BODY]] ]
	; TFFALLBACK-NEXT: [[GEP:%.*]] = getelementptr i64, ptr [[B]], i64 [[INDVARS_IV]]			; TFFALLBACK-NEXT: [[GEP:%.*]] = getelementptr i64, ptr [[B]], i64 [[INDVARS_IV]]
	; TFFALLBACK-NEXT: [[LOAD:%.*]] = load i64, ptr [[GEP]], align 4			; TFFALLBACK-NEXT: [[LOAD:%.*]] = load i64, ptr [[GEP]], align 4
	; TFFALLBACK-NEXT: [[CALL:%.*]] = call i64 @foo(i64 [[LOAD]]) #[[ATTR7]]			; TFFALLBACK-NEXT: [[CALL:%.*]] = call i64 @foo(i64 [[LOAD]]) #[[ATTR5:[0-9]+]]
	; TFFALLBACK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds i64, ptr [[A]], i64 [[INDVARS_IV]]			; TFFALLBACK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds i64, ptr [[A]], i64 [[INDVARS_IV]]
	; TFFALLBACK-NEXT: store i64 [[CALL]], ptr [[ARRAYIDX]], align 4			; TFFALLBACK-NEXT: store i64 [[CALL]], ptr [[ARRAYIDX]], align 4
	; TFFALLBACK-NEXT: [[INDVARS_IV_NEXT]] = add nuw nsw i64 [[INDVARS_IV]], 1			; TFFALLBACK-NEXT: [[INDVARS_IV_NEXT]] = add nuw nsw i64 [[INDVARS_IV]], 1
	; TFFALLBACK-NEXT: [[EXITCOND:%.*]] = icmp eq i64 [[INDVARS_IV_NEXT]], 1024			; TFFALLBACK-NEXT: [[EXITCOND:%.*]] = icmp eq i64 [[INDVARS_IV_NEXT]], 1024
	; TFFALLBACK-NEXT: br i1 [[EXITCOND]], label [[FOR_COND_CLEANUP]], label [[FOR_BODY]], !llvm.loop [[LOOP7:![0-9]+]]			; TFFALLBACK-NEXT: br i1 [[EXITCOND]], label [[FOR_COND_CLEANUP]], label [[FOR_BODY]], !llvm.loop [[LOOP7:![0-9]+]]
	; TFFALLBACK: for.cond.cleanup:			; TFFALLBACK: for.cond.cleanup:
	; TFFALLBACK-NEXT: ret void			; TFFALLBACK-NEXT: ret void
	;			;

llvm/test/Transforms/LoopVectorize/AArch64/maximize-bandwidth-invalidate.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; REQUIRES: asserts			; REQUIRES: asserts
	; RUN: opt < %s -passes=loop-vectorize -vectorizer-maximize-bandwidth -S 2>&1 | FileCheck %s			; RUN: opt < %s -passes=loop-vectorize -vectorizer-maximize-bandwidth -S 2>&1 | FileCheck %s
	; RUN: opt < %s -passes=loop-vectorize -vectorizer-maximize-bandwidth -S -debug-only=loop-vectorize 2>&1 -disable-output | FileCheck %s --check-prefix=COST			; RUN: opt < %s -passes=loop-vectorize -vectorizer-maximize-bandwidth -S -debug-only=loop-vectorize 2>&1 -disable-output | FileCheck %s --check-prefix=COST

	target datalayout = "e-m:e-i8:8:32-i16:16:32-i64:64-i128:128-n32:64-S128"			target datalayout = "e-m:e-i8:8:32-i16:16:32-i64:64-i128:128-n32:64-S128"
	target triple = "aarch64-none-unknown-eabi"			target triple = "aarch64-none-unknown-eabi"

	; Check that the maximize vector bandwidth option does not give incorrect costs			; Check that the maximize vector bandwidth option does not give incorrect costs
	; due to invalid cost decisions. The loop below has a low maximum trip count,			; due to invalid cost decisions. The loop below has a low maximum trip count,
	; so will be masked.			; so will be masked.

	; COST: LV: Found an estimated cost of 3000000 for VF 2 For instruction: %0 = load			; COST: LV: Found an estimated cost of 3000000 for VF 2 For instruction: %0 = load
	; COST: LV: Found an estimated cost of 3000000 for VF 4 For instruction: %0 = load			; COST: LV: Found an estimated cost of 3000000 for VF 4 For instruction: %0 = load
	; COST: LV: Found an estimated cost of 3000000 for VF 8 For instruction: %0 = load			; COST: LV: Found an estimated cost of 3000000 for VF 8 For instruction: %0 = load
	; COST: LV: Found an estimated cost of 3000000 for VF 16 For instruction: %0 = load			; COST: LV: Found an estimated cost of 3000000 for VF 16 For instruction: %0 = load
	; COST: LV: Selecting VF: 1.			; COST: LV: Selecting Tail folded VF: 1.
				david-arm: This seems odd. Perhaps I'm mistaken, but I thought with your patch we wouldn't decide to tail-fold with a VF of 1?

	define i32 @test(ptr nocapture noundef readonly %pInVec, ptr nocapture noundef readonly %pInA1, ptr nocapture noundef readonly %pInA2, ptr nocapture noundef readonly %pInA3, ptr nocapture noundef readonly %pInA4, i32 noundef %numCols) {			define i32 @test(ptr nocapture noundef readonly %pInVec, ptr nocapture noundef readonly %pInA1, ptr nocapture noundef readonly %pInA2, ptr nocapture noundef readonly %pInA3, ptr nocapture noundef readonly %pInA4, i32 noundef %numCols) {
	; CHECK-LABEL: @test(			; CHECK-LABEL: @test(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: [[AND:%.*]] = and i32 [[NUMCOLS:%.*]], 3			; CHECK-NEXT: [[AND:%.*]] = and i32 [[NUMCOLS:%.*]], 3
	; CHECK-NEXT: [[CMP_NOT32:%.*]] = icmp eq i32 [[AND]], 0			; CHECK-NEXT: [[CMP_NOT32:%.*]] = icmp eq i32 [[AND]], 0
	; CHECK-NEXT: br i1 [[CMP_NOT32]], label [[WHILE_END:%.*]], label [[WHILE_BODY_PREHEADER:%.*]]			; CHECK-NEXT: br i1 [[CMP_NOT32]], label [[WHILE_END:%.*]], label [[WHILE_BODY_PREHEADER:%.*]]
	; CHECK: while.body.preheader:			; CHECK: while.body.preheader:

llvm/test/Transforms/LoopVectorize/AArch64/sve-tail-folding-forced.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; REQUIRES: asserts			; REQUIRES: asserts
	; RUN: opt -S -passes=loop-vectorize -debug-only=loop-vectorize < %s 2>%t | FileCheck %s			; RUN: opt -S -passes=loop-vectorize -debug-only=loop-vectorize < %s 2>%t | FileCheck %s
	; RUN: cat %t | FileCheck %s --check-prefix=VPLANS			; RUN: cat %t | FileCheck %s --check-prefix=VPLANS

	; These tests ensure that tail-folding is enabled when the predicate.enable			; These tests ensure that tail-folding is enabled when the predicate.enable
	; loop attribute is set to true.			; loop attribute is set to true.

	target triple = "aarch64-unknown-linux-gnu"			target triple = "aarch64-unknown-linux-gnu"

	; VPLANS-LABEL: Checking a loop in 'simple_memset'			; VPLANS-LABEL: Checking a loop in 'simple_memset'
	; VPLANS: VPlan 'Initial VPlan for VF={vscale x 1,vscale x 2,vscale x 4},UF>=1' {			; VPLANS: VPlan 'Initial VPlan for Tail Folded VF={vscale x 1,vscale x 2,vscale x 4},UF>=1' {
	; VPLANS-NEXT: Live-in vp<[[TC:%[0-9]+]]> = original trip-count			; VPLANS-NEXT: Live-in vp<[[TC:%[0-9]+]]> = original trip-count
	; VPLANS-EMPTY:			; VPLANS-EMPTY:
	; VPLANS-NEXT: vector.ph:			; VPLANS-NEXT: vector.ph:
	; VPLANS-NEXT: EMIT vp<[[VF:%[0-9]+]]> = VF * Part + ir<0>			; VPLANS-NEXT: EMIT vp<[[VF:%[0-9]+]]> = VF * Part + ir<0>
	; VPLANS-NEXT: EMIT vp<[[NEWTC:%[0-9]+]]> = TC > VF ? TC - VF : 0 vp<[[TC]]>			; VPLANS-NEXT: EMIT vp<[[NEWTC:%[0-9]+]]> = TC > VF ? TC - VF : 0 vp<[[TC]]>
	; VPLANS-NEXT: EMIT vp<[[LANEMASK_ENTRY:%[0-9]+]]> = active lane mask vp<[[VF]]> vp<[[TC]]>			; VPLANS-NEXT: EMIT vp<[[LANEMASK_ENTRY:%[0-9]+]]> = active lane mask vp<[[VF]]> vp<[[TC]]>
	; VPLANS-NEXT: Successor(s): vector loop			; VPLANS-NEXT: Successor(s): vector loop
	; VPLANS-EMPTY:			; VPLANS-EMPTY:

llvm/test/Transforms/LoopVectorize/AArch64/tail-folding-styles.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt -S -passes=loop-vectorize -force-tail-folding-style=none < %s | FileCheck %s --check-prefix=NONE
	; RUN: opt -S -passes=loop-vectorize -force-tail-folding-style=data < %s | FileCheck %s --check-prefix=DATA			; RUN: opt -S -passes=loop-vectorize -force-tail-folding-style=data < %s | FileCheck %s --check-prefix=DATA
	; RUN: opt -S -passes=loop-vectorize -force-tail-folding-style=data-without-lane-mask < %s | FileCheck %s --check-prefix=DATA_NO_LANEMASK			; RUN: opt -S -passes=loop-vectorize -force-tail-folding-style=data-without-lane-mask < %s | FileCheck %s --check-prefix=DATA_NO_LANEMASK
	; RUN: opt -S -passes=loop-vectorize -force-tail-folding-style=data-and-control < %s | FileCheck %s --check-prefix=DATA_AND_CONTROL			; RUN: opt -S -passes=loop-vectorize -force-tail-folding-style=data-and-control < %s | FileCheck %s --check-prefix=DATA_AND_CONTROL
	; RUN: opt -S -passes=loop-vectorize -force-tail-folding-style=data-and-control-without-rt-check < %s | FileCheck %s --check-prefix=DATA_AND_CONTROL_NO_RT_CHECK			; RUN: opt -S -passes=loop-vectorize -force-tail-folding-style=data-and-control-without-rt-check < %s | FileCheck %s --check-prefix=DATA_AND_CONTROL_NO_RT_CHECK

	target triple = "aarch64-unknown-linux-gnu"			target triple = "aarch64-unknown-linux-gnu"

	; Test the different tail folding styles.			; Test the different tail folding styles.

	define void @simple_memset_tailfold(i32 %val, ptr %ptr, i64 %n) "target-features" = "+sve" {			define void @simple_memset_tailfold(i32 %val, ptr %ptr, i64 %n) "target-features" = "+sve" {
	; NONE-LABEL: @simple_memset_tailfold(
	; NONE-NEXT: entry:
	; NONE-NEXT: [[UMAX:%.*]] = call i64 @llvm.umax.i64(i64 [[N:%.*]], i64 1)
	; NONE-NEXT: [[TMP0:%.*]] = call i64 @llvm.vscale.i64()
	; NONE-NEXT: [[TMP1:%.*]] = mul i64 [[TMP0]], 4
	; NONE-NEXT: [[MIN_ITERS_CHECK:%.*]] = icmp ult i64 [[UMAX]], [[TMP1]]
	; NONE-NEXT: br i1 [[MIN_ITERS_CHECK]], label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
	; NONE: vector.ph:
	; NONE-NEXT: [[TMP2:%.*]] = call i64 @llvm.vscale.i64()
	; NONE-NEXT: [[TMP3:%.*]] = mul i64 [[TMP2]], 4
	; NONE-NEXT: [[N_MOD_VF:%.*]] = urem i64 [[UMAX]], [[TMP3]]
	; NONE-NEXT: [[N_VEC:%.*]] = sub i64 [[UMAX]], [[N_MOD_VF]]
	; NONE-NEXT: [[BROADCAST_SPLATINSERT:%.*]] = insertelement <vscale x 4 x i32> poison, i32 [[VAL:%.*]], i64 0
	; NONE-NEXT: [[BROADCAST_SPLAT:%.*]] = shufflevector <vscale x 4 x i32> [[BROADCAST_SPLATINSERT]], <vscale x 4 x i32> poison, <vscale x 4 x i32> zeroinitializer
	; NONE-NEXT: br label [[VECTOR_BODY:%.*]]
	; NONE: vector.body:
	; NONE-NEXT: [[INDEX1:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT2:%.*]], [[VECTOR_BODY]] ]
	; NONE-NEXT: [[TMP4:%.*]] = add i64 [[INDEX1]], 0
	; NONE-NEXT: [[TMP5:%.*]] = getelementptr i32, ptr [[PTR:%.*]], i64 [[TMP4]]
	; NONE-NEXT: [[TMP6:%.*]] = getelementptr i32, ptr [[TMP5]], i32 0
	; NONE-NEXT: store <vscale x 4 x i32> [[BROADCAST_SPLAT]], ptr [[TMP6]], align 4
	; NONE-NEXT: [[TMP7:%.*]] = call i64 @llvm.vscale.i64()
	; NONE-NEXT: [[TMP8:%.*]] = mul i64 [[TMP7]], 4
	; NONE-NEXT: [[INDEX_NEXT2]] = add nuw i64 [[INDEX1]], [[TMP8]]
	; NONE-NEXT: [[TMP9:%.*]] = icmp eq i64 [[INDEX_NEXT2]], [[N_VEC]]
	; NONE-NEXT: br i1 [[TMP9]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP0:![0-9]+]]
	; NONE: middle.block:
	; NONE-NEXT: [[CMP_N:%.*]] = icmp eq i64 [[UMAX]], [[N_VEC]]
	; NONE-NEXT: br i1 [[CMP_N]], label [[WHILE_END_LOOPEXIT:%.*]], label [[SCALAR_PH]]
	; NONE: scalar.ph:
	; NONE-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ [[N_VEC]], [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]
	; NONE-NEXT: br label [[WHILE_BODY:%.*]]
	; NONE: while.body:
	; NONE-NEXT: [[INDEX:%.*]] = phi i64 [ [[INDEX_NEXT:%.*]], [[WHILE_BODY]] ], [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ]
	; NONE-NEXT: [[GEP:%.*]] = getelementptr i32, ptr [[PTR]], i64 [[INDEX]]
	; NONE-NEXT: store i32 [[VAL]], ptr [[GEP]], align 4
	; NONE-NEXT: [[INDEX_NEXT]] = add nsw i64 [[INDEX]], 1
	; NONE-NEXT: [[CMP10:%.*]] = icmp ult i64 [[INDEX_NEXT]], [[N]]
	; NONE-NEXT: br i1 [[CMP10]], label [[WHILE_BODY]], label [[WHILE_END_LOOPEXIT]], !llvm.loop [[LOOP3:![0-9]+]]
	; NONE: while.end.loopexit:
	; NONE-NEXT: ret void
	;
	; DATA-LABEL: @simple_memset_tailfold(			; DATA-LABEL: @simple_memset_tailfold(
	; DATA-NEXT: entry:			; DATA-NEXT: entry:
	; DATA-NEXT: [[UMAX:%.*]] = call i64 @llvm.umax.i64(i64 [[N:%.*]], i64 1)			; DATA-NEXT: [[UMAX:%.*]] = call i64 @llvm.umax.i64(i64 [[N:%.*]], i64 1)
	; DATA-NEXT: [[TMP0:%.*]] = sub i64 -1, [[UMAX]]			; DATA-NEXT: [[TMP0:%.*]] = sub i64 -1, [[UMAX]]
	; DATA-NEXT: [[TMP1:%.*]] = call i64 @llvm.vscale.i64()			; DATA-NEXT: [[TMP1:%.*]] = call i64 @llvm.vscale.i64()
	; DATA-NEXT: [[TMP2:%.*]] = mul i64 [[TMP1]], 4			; DATA-NEXT: [[TMP2:%.*]] = mul i64 [[TMP1]], 4
	; DATA-NEXT: [[TMP3:%.*]] = icmp ult i64 [[TMP0]], [[TMP2]]			; DATA-NEXT: [[TMP3:%.*]] = icmp ult i64 [[TMP0]], [[TMP2]]
	; DATA-NEXT: br i1 [[TMP3]], label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]			; DATA-NEXT: br i1 [[TMP3]], label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
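Note for readers of this archive: the NONE checks above describe a vector loop over `N_VEC = UMAX - (UMAX urem (vscale * 4))` elements followed by a scalar remainder loop, while the tail-folded styles instead mask off the inactive lanes of the last vector iteration. The following is a minimal C++ sketch of that difference, not code from the patch. The function names and the `vf` parameter (standing in for the runtime value `vscale * 4`) are hypothetical, and the masked variant only models tail folding by masking in general, not the exact IR emitted for any particular style.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Models the NONE checks: vector loop over N_VEC elements, then a scalar tail.
// `vf` is a hypothetical stand-in for vscale * 4; buf must hold max(n, 1) elements.
void memsetWithScalarTail(std::vector<int32_t> &buf, int32_t val,
                          uint64_t n, uint64_t vf) {
  uint64_t tripCount = std::max<uint64_t>(n, 1); // llvm.umax(n, 1)
  uint64_t index = 0;
  if (tripCount >= vf) {                         // MIN_ITERS_CHECK is false
    uint64_t nVec = tripCount - tripCount % vf;  // N_VEC = UMAX - N_MOD_VF
    for (; index != nVec; index += vf)           // vector.body
      for (uint64_t lane = 0; lane < vf; ++lane)
        buf[index + lane] = val;                 // unmasked wide store
  }
  for (; index < tripCount; ++index)             // scalar remainder loop
    buf[index] = val;
}

// Models tail folding by masking: every iteration is a vector iteration,
// and lanes at or beyond the trip count are disabled by a per-lane predicate.
// Overflow of `index + lane` is ignored here for brevity.
void memsetTailFolded(std::vector<int32_t> &buf, int32_t val,
                      uint64_t n, uint64_t vf) {
  uint64_t tripCount = std::max<uint64_t>(n, 1);
  for (uint64_t index = 0; index < tripCount; index += vf)
    for (uint64_t lane = 0; lane < vf; ++lane)
      if (index + lane < tripCount)              // active lane mask
        buf[index + lane] = val;                 // masked store
}
```

Both functions write the same elements for any `n` and `vf >= 1`. The variants differ in code shape (remainder loop versus predication), which is the trade-off the patch wants to cost per VPlan.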

llvm/test/Transforms/LoopVectorize/ARM/mve-known-trip-count.ll

; RUN: opt -passes=loop-vectorize -debug-only=loop-vectorize -disable-output < %s 2>&1 | FileCheck %s		; RUN: opt -passes=loop-vectorize -debug-only=loop-vectorize -disable-output < %s 2>&1 | FileCheck %s
; REQUIRES: asserts		; REQUIRES: asserts

target datalayout = "e-m:e-p:32:32-Fi8-i64:64-v128:64:128-a:0:32-n32-S64"		target datalayout = "e-m:e-p:32:32-Fi8-i64:64-v128:64:128-a:0:32-n32-S64"
target triple = "thumbv8.1m.main-arm-none-eabi"		target triple = "thumbv8.1m.main-arm-none-eabi"

; Trip count of 5 - shouldn't be vectorized.		; Trip count of 5 - shouldn't be vectorized.
; CHECK-LABEL: tripcount5		; CHECK-LABEL: tripcount5
; CHECK: LV: Selecting VF: 1		; CHECK: LV: Selecting Tail folded VF: 1
define void @tripcount5(ptr nocapture readonly %in, ptr nocapture %out, ptr nocapture readonly %consts, i32 %n) #0 {		define void @tripcount5(ptr nocapture readonly %in, ptr nocapture %out, ptr nocapture readonly %consts, i32 %n) #0 {
entry:		entry:
%arrayidx20 = getelementptr inbounds i32, ptr %out, i32 1		%arrayidx20 = getelementptr inbounds i32, ptr %out, i32 1
%arrayidx38 = getelementptr inbounds i32, ptr %out, i32 2		%arrayidx38 = getelementptr inbounds i32, ptr %out, i32 2
%arrayidx56 = getelementptr inbounds i32, ptr %out, i32 3		%arrayidx56 = getelementptr inbounds i32, ptr %out, i32 3
%arrayidx74 = getelementptr inbounds i32, ptr %out, i32 4		%arrayidx74 = getelementptr inbounds i32, ptr %out, i32 4
%arrayidx92 = getelementptr inbounds i32, ptr %out, i32 5		%arrayidx92 = getelementptr inbounds i32, ptr %out, i32 5
%arrayidx110 = getelementptr inbounds i32, ptr %out, i32 6		%arrayidx110 = getelementptr inbounds i32, ptr %out, i32 6
[365 lines not shown]	for.body: ; preds = %entry, %for.body
%add138 = add nsw i32 %mul136, %add129		%add138 = add nsw i32 %mul136, %add129
%add139 = add nuw nsw i32 %hop.0236, 16		%add139 = add nuw nsw i32 %hop.0236, 16
%cmp = icmp ult i32 %hop.0236, 112		%cmp = icmp ult i32 %hop.0236, 112
br i1 %cmp, label %for.body, label %for.cond.cleanup		br i1 %cmp, label %for.body, label %for.cond.cleanup
}		}

; Larger example with predication that should also not be vectorized		; Larger example with predication that should also not be vectorized
; CHECK-LABEL: predicated		; CHECK-LABEL: predicated
; CHECK: LV: Selecting VF: 1		; CHECK: LV: Selecting Tail folded VF: 1
; CHECK: LV: Selecting VF: 1		; CHECK: LV: Selecting Tail folded VF: 1
define dso_local i32 @predicated(i32 noundef %0, ptr %glob) #0 {		define dso_local i32 @predicated(i32 noundef %0, ptr %glob) #0 {
%2 = alloca [101 x i32], align 4		%2 = alloca [101 x i32], align 4
%3 = alloca [21 x i32], align 4		%3 = alloca [21 x i32], align 4
call void @llvm.lifetime.start.p0(i64 404, ptr nonnull %2)		call void @llvm.lifetime.start.p0(i64 404, ptr nonnull %2)
call void @llvm.lifetime.start.p0(i64 84, ptr nonnull %3)		call void @llvm.lifetime.start.p0(i64 84, ptr nonnull %3)
%4 = icmp sgt i32 %0, 0		%4 = icmp sgt i32 %0, 0
br i1 %4, label %5, label %159		br i1 %4, label %5, label %159

[198 lines not shown]

llvm/test/Transforms/LoopVectorize/ARM/tail-folding-reduces-vf.ll

	; RUN: opt -opaque-pointers=0 < %s -mattr=+mve,+mve.fp -passes=loop-vectorize -tail-predication=disabled -S | FileCheck %s --check-prefixes=DEFAULT			; RUN: opt -opaque-pointers=0 < %s -mattr=+mve,+mve.fp -passes=loop-vectorize -tail-predication=disabled -S | FileCheck %s --check-prefixes=DEFAULT
	; RUN: opt -opaque-pointers=0 < %s -mattr=+mve,+mve.fp -passes=loop-vectorize -prefer-predicate-over-epilogue=predicate-else-scalar-epilogue -S | FileCheck %s --check-prefixes=TAILPRED			; RUN: opt -opaque-pointers=0 < %s -mattr=+mve,+mve.fp -passes=loop-vectorize -prefer-predicate-over-epilogue=predicate-dont-vectorize -S | FileCheck %s --check-prefixes=TAILPRED
				; RUN: opt -opaque-pointers=0 < %s -mattr=+mve,+mve.fp -passes=loop-vectorize -prefer-predicate-over-epilogue=predicate-else-scalar-epilogue -S | FileCheck %s --check-prefixes=DEFAULT

	target datalayout = "e-m:e-p:32:32-Fi8-i64:64-v128:64:128-a:0:32-n32-S64"			target datalayout = "e-m:e-p:32:32-Fi8-i64:64-v128:64:128-a:0:32-n32-S64"
	target triple = "thumbv8.1m.main-arm-none-eabi"			target triple = "thumbv8.1m.main-arm-none-eabi"

	; When TP is disabled, this test can vectorize with a VF of 16.			; When TP is disabled, this test can vectorize with a VF of 16.
	; When TP is enabled, this test should vectorize with a VF of 8.			; When TP is enabled, this test should vectorize with a VF of 8.
				; When both are allowed, the VF=16 without tail folding should win out.
	;			;
	; DEFAULT: load <16 x i8>, <16 x i8>*			; DEFAULT: load <16 x i8>, <16 x i8>*
	; DEFAULT: sext <16 x i8> %{{.*}} to <16 x i16>			; DEFAULT: sext <16 x i8> %{{.*}} to <16 x i16>
	; DEFAULT: add <16 x i16>			; DEFAULT: add <16 x i16>
	; DEFAULT-NOT: llvm.masked.load			; DEFAULT-NOT: llvm.masked.load
	; DEFAULT-NOT: llvm.masked.store			; DEFAULT-NOT: llvm.masked.store
	;			;
	; TAILPRED: llvm.masked.load.v8i8.p0v8i8			; TAILPRED: llvm.masked.load.v8i8.p0v8i8
	[95 lines not shown]

llvm/test/Transforms/LoopVectorize/PowerPC/reg-usage.ll

; RUN: opt < %s -debug-only=loop-vectorize -passes='function(loop-vectorize),default<O2>' -vectorizer-maximize-bandwidth -mtriple=powerpc64-unknown-linux -S -mcpu=pwr8 2>&1 | FileCheck %s --check-prefixes=CHECK,CHECK-PWR8		; RUN: opt < %s -debug-only=loop-vectorize -passes='function(loop-vectorize),default<O2>' -vectorizer-maximize-bandwidth -mtriple=powerpc64-unknown-linux -S -mcpu=pwr8 2>&1 | FileCheck %s --check-prefixes=CHECK,CHECK-PWR8
; RUN: opt < %s -debug-only=loop-vectorize -passes='function(loop-vectorize),default<O2>' -vectorizer-maximize-bandwidth -mtriple=powerpc64le-unknown-linux -S -mcpu=pwr9 2>&1 | FileCheck %s --check-prefixes=CHECK,CHECK-PWR9		; RUN: opt < %s -debug-only=loop-vectorize -passes='function(loop-vectorize),default<O2>' -vectorizer-maximize-bandwidth -mtriple=powerpc64le-unknown-linux -S -mcpu=pwr9 2>&1 | FileCheck %s --check-prefixes=CHECK,CHECK-PWR9
; REQUIRES: asserts		; REQUIRES: asserts

@a = global [1024 x i8] zeroinitializer, align 16		@a = global [1024 x i8] zeroinitializer, align 16
@b = global [1024 x i8] zeroinitializer, align 16		@b = global [1024 x i8] zeroinitializer, align 16

define i32 @foo() {		define i32 @foo() {
; CHECK-LABEL: foo		; CHECK-LABEL: foo

; CHECK-PWR8: Executing best plan with VF=16, UF=4		; CHECK-PWR8: Executing best plan with TailFold=false, VF=16, UF=4

; CHECK-PWR9: Executing best plan with VF=8, UF=8		; CHECK-PWR9: Executing best plan with TailFold=false, VF=8, UF=8


entry:		entry:
br label %for.body		br label %for.body

for.cond.cleanup:		for.cond.cleanup:
%add.lcssa = phi i32 [ %add, %for.body ]		%add.lcssa = phi i32 [ %add, %for.body ]
ret i32 %add.lcssa		ret i32 %add.lcssa
[19 lines not shown]

define i32 @goo() {		define i32 @goo() {
; For indvars.iv used in a computation chain only feeding into getelementptr or cmp,		; For indvars.iv used in a computation chain only feeding into getelementptr or cmp,
; it will not have vector version and the vector register usage will not exceed the		; it will not have vector version and the vector register usage will not exceed the
; available vector register number.		; available vector register number.

; CHECK-LABEL: goo		; CHECK-LABEL: goo

; CHECK: Executing best plan with VF=16, UF=4		; CHECK: Executing best plan with TailFold=false, VF=16, UF=4

entry:		entry:
br label %for.body		br label %for.body

for.cond.cleanup: ; preds = %for.body		for.cond.cleanup: ; preds = %for.body
%add.lcssa = phi i32 [ %add, %for.body ]		%add.lcssa = phi i32 [ %add, %for.body ]
ret i32 %add.lcssa		ret i32 %add.lcssa

[16 lines not shown]	for.body: ; preds = %for.body, %entry
%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1		%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
%exitcond = icmp eq i64 %indvars.iv.next, 1024		%exitcond = icmp eq i64 %indvars.iv.next, 1024
br i1 %exitcond, label %for.cond.cleanup, label %for.body		br i1 %exitcond, label %for.cond.cleanup, label %for.body
}		}

define i64 @bar(ptr nocapture %a) {		define i64 @bar(ptr nocapture %a) {
; CHECK-LABEL: bar		; CHECK-LABEL: bar

; CHECK: Executing best plan with VF=2, UF=12		; CHECK: Executing best plan with TailFold=false, VF=2, UF=12

entry:		entry:
br label %for.body		br label %for.body

for.cond.cleanup:		for.cond.cleanup:
%add2.lcssa = phi i64 [ %add2, %for.body ]		%add2.lcssa = phi i64 [ %add2, %for.body ]
ret i64 %add2.lcssa		ret i64 %add2.lcssa

[11 lines not shown]
}		}

@d = external global [0 x i64], align 8		@d = external global [0 x i64], align 8
@e = external global [0 x i32], align 4		@e = external global [0 x i32], align 4
@c = external global [0 x i32], align 4		@c = external global [0 x i32], align 4

define void @hoo(i32 %n) {		define void @hoo(i32 %n) {
; CHECK-LABEL: hoo		; CHECK-LABEL: hoo
; CHECK: Executing best plan with VF=1, UF=12		; CHECK: Executing best plan with TailFold=false, VF=1, UF=12

entry:		entry:
br label %for.body		br label %for.body

for.body: ; preds = %for.body, %entry		for.body: ; preds = %for.body, %entry
%indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]		%indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]
%arrayidx = getelementptr inbounds [0 x i64], ptr @d, i64 0, i64 %indvars.iv		%arrayidx = getelementptr inbounds [0 x i64], ptr @d, i64 0, i64 %indvars.iv
%tmp = load i64, ptr %arrayidx, align 8		%tmp = load i64, ptr %arrayidx, align 8
[163 lines not shown]

llvm/test/Transforms/LoopVectorize/RISCV/riscv-vector-reverse.ll

	[102 lines not shown]
	; CHECK-NEXT: LV(REG): RegisterClass: RISCV::VRRC, 2 registers			; CHECK-NEXT: LV(REG): RegisterClass: RISCV::VRRC, 2 registers
	; CHECK-NEXT: LV(REG): Found invariant usage: 1 item			; CHECK-NEXT: LV(REG): Found invariant usage: 1 item
	; CHECK-NEXT: LV(REG): RegisterClass: RISCV::GPRRC, 1 registers			; CHECK-NEXT: LV(REG): RegisterClass: RISCV::GPRRC, 1 registers
	; CHECK-NEXT: LV: The target has 31 registers of RISCV::GPRRC register class			; CHECK-NEXT: LV: The target has 31 registers of RISCV::GPRRC register class
	; CHECK-NEXT: LV: The target has 32 registers of RISCV::VRRC register class			; CHECK-NEXT: LV: The target has 32 registers of RISCV::VRRC register class
	; CHECK-NEXT: LV: Loop cost is 23			; CHECK-NEXT: LV: Loop cost is 23
	; CHECK-NEXT: LV: IC is 1			; CHECK-NEXT: LV: IC is 1
	; CHECK-NEXT: LV: VF is vscale x 4			; CHECK-NEXT: LV: VF is vscale x 4
				; CHECK-NEXT: LV: Fold Tail is false
	; CHECK-NEXT: LV: Not Interleaving.			; CHECK-NEXT: LV: Not Interleaving.
	; CHECK-NEXT: LV: Interleaving is not beneficial.			; CHECK-NEXT: LV: Interleaving is not beneficial.
	; CHECK-NEXT: LV: Found a vectorizable loop (vscale x 4) in <stdin>			; CHECK-NEXT: LV: Found a vectorizable loop (vscale x 4) in <stdin>
	; CHECK-NEXT: LEV: Epilogue vectorization is not profitable for this loop			; CHECK-NEXT: LEV: Epilogue vectorization is not profitable for this loop
	; CHECK-NEXT: Executing best plan with VF=vscale x 4, UF=1			; CHECK-NEXT: Executing best plan with TailFold=false, VF=vscale x 4, UF=1
	; CHECK-NEXT: LV: Interleaving disabled by the pass manager			; CHECK-NEXT: LV: Interleaving disabled by the pass manager
	;			;
	entry:			entry:
	%cmp7 = icmp sgt i32 %n, 0			%cmp7 = icmp sgt i32 %n, 0
	br i1 %cmp7, label %for.body.preheader, label %for.cond.cleanup			br i1 %cmp7, label %for.body.preheader, label %for.cond.cleanup

	for.body.preheader: ; preds = %entry			for.body.preheader: ; preds = %entry
	%0 = zext i32 %n to i64			%0 = zext i32 %n to i64
	[111 lines not shown]
	; CHECK-NEXT: LV(REG): RegisterClass: RISCV::VRRC, 2 registers			; CHECK-NEXT: LV(REG): RegisterClass: RISCV::VRRC, 2 registers
	; CHECK-NEXT: LV(REG): Found invariant usage: 1 item			; CHECK-NEXT: LV(REG): Found invariant usage: 1 item
	; CHECK-NEXT: LV(REG): RegisterClass: RISCV::GPRRC, 1 registers			; CHECK-NEXT: LV(REG): RegisterClass: RISCV::GPRRC, 1 registers
	; CHECK-NEXT: LV: The target has 31 registers of RISCV::GPRRC register class			; CHECK-NEXT: LV: The target has 31 registers of RISCV::GPRRC register class
	; CHECK-NEXT: LV: The target has 32 registers of RISCV::VRRC register class			; CHECK-NEXT: LV: The target has 32 registers of RISCV::VRRC register class
	; CHECK-NEXT: LV: Loop cost is 23			; CHECK-NEXT: LV: Loop cost is 23
	; CHECK-NEXT: LV: IC is 1			; CHECK-NEXT: LV: IC is 1
	; CHECK-NEXT: LV: VF is vscale x 4			; CHECK-NEXT: LV: VF is vscale x 4
				; CHECK-NEXT: LV: Fold Tail is false
	; CHECK-NEXT: LV: Not Interleaving.			; CHECK-NEXT: LV: Not Interleaving.
	; CHECK-NEXT: LV: Interleaving is not beneficial.			; CHECK-NEXT: LV: Interleaving is not beneficial.
	; CHECK-NEXT: LV: Found a vectorizable loop (vscale x 4) in <stdin>			; CHECK-NEXT: LV: Found a vectorizable loop (vscale x 4) in <stdin>
	; CHECK-NEXT: LEV: Epilogue vectorization is not profitable for this loop			; CHECK-NEXT: LEV: Epilogue vectorization is not profitable for this loop
	; CHECK-NEXT: Executing best plan with VF=vscale x 4, UF=1			; CHECK-NEXT: Executing best plan with TailFold=false, VF=vscale x 4, UF=1
	; CHECK-NEXT: LV: Interleaving disabled by the pass manager			; CHECK-NEXT: LV: Interleaving disabled by the pass manager
	;			;
	entry:			entry:
	%cmp7 = icmp sgt i32 %n, 0			%cmp7 = icmp sgt i32 %n, 0
	br i1 %cmp7, label %for.body.preheader, label %for.cond.cleanup			br i1 %cmp7, label %for.body.preheader, label %for.cond.cleanup

	for.body.preheader: ; preds = %entry			for.body.preheader: ; preds = %entry
	%0 = zext i32 %n to i64			%0 = zext i32 %n to i64
	[25 lines not shown]

llvm/test/Transforms/LoopVectorize/X86/vect.omp.force.small-tc.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt < %s -passes=loop-vectorize -mcpu=corei7-avx -S -vectorizer-min-trip-count=21 | FileCheck %s			; RUN: opt < %s -passes=loop-vectorize -mcpu=corei7-avx -S -vectorizer-min-trip-count=21 | FileCheck %s

	target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-n8:16:32:64-S128"			target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-n8:16:32:64-S128"
	target triple = "x86_64-unknown-linux"			target triple = "x86_64-unknown-linux"

	;			;
	; The source code for the test:			; The source code for the test:
	;			;
	; void foo(ptr restrict A, ptr restrict B)			; void foo(ptr restrict A, ptr restrict B)
	; {			; {
	; for (int i = 0; i < 20; ++i) A[i] += B[i];			; for (int i = 0; i < 20; ++i) A[i] += B[i];
	; }			; }
	;			;

	;			;
	; This loop will be vectorized, although the trip count is below the threshold, but vectorization is explicitly forced in metadata.			; This loop will be vectorized, although the trip count is below the threshold, but
				; vectorization is explicitly forced in metadata. A VF of 4 is chosen as
				; it more nicely divides the loop count of 20, producing a lower total cost.
	;			;
	define void @vectorized(ptr noalias nocapture %A, ptr noalias nocapture readonly %B) {			define void @vectorized(ptr noalias nocapture %A, ptr noalias nocapture readonly %B) {
	; CHECK-LABEL: @vectorized(			; CHECK-LABEL: @vectorized(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: br i1 false, label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]			; CHECK-NEXT: br i1 false, label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
	; CHECK: vector.ph:			; CHECK: vector.ph:
	; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]			; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]
	; CHECK: vector.body:			; CHECK: vector.body:
	; CHECK-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]			; CHECK-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
	; CHECK-NEXT: [[TMP0:%.*]] = add i64 [[INDEX]], 0			; CHECK-NEXT: [[TMP0:%.*]] = add i64 [[INDEX]], 0
	; CHECK-NEXT: [[TMP1:%.*]] = getelementptr inbounds float, ptr [[B:%.*]], i64 [[TMP0]]			; CHECK-NEXT: [[TMP1:%.*]] = getelementptr inbounds float, ptr [[B:%.*]], i64 [[TMP0]]
	; CHECK-NEXT: [[TMP2:%.*]] = getelementptr inbounds float, ptr [[TMP1]], i32 0			; CHECK-NEXT: [[TMP2:%.*]] = getelementptr inbounds float, ptr [[TMP1]], i32 0
	; CHECK-NEXT: [[WIDE_LOAD:%.*]] = load <8 x float>, ptr [[TMP2]], align 4, !llvm.access.group [[ACC_GRP0:![0-9]+]]			; CHECK-NEXT: [[WIDE_LOAD:%.*]] = load <4 x float>, ptr [[TMP2]], align 4, !llvm.access.group [[ACC_GRP0:![0-9]+]]
	; CHECK-NEXT: [[TMP3:%.*]] = getelementptr inbounds float, ptr [[A:%.*]], i64 [[TMP0]]			; CHECK-NEXT: [[TMP3:%.*]] = getelementptr inbounds float, ptr [[A:%.*]], i64 [[TMP0]]
	; CHECK-NEXT: [[TMP4:%.*]] = getelementptr inbounds float, ptr [[TMP3]], i32 0			; CHECK-NEXT: [[TMP4:%.*]] = getelementptr inbounds float, ptr [[TMP3]], i32 0
	; CHECK-NEXT: [[WIDE_LOAD1:%.*]] = load <8 x float>, ptr [[TMP4]], align 4, !llvm.access.group [[ACC_GRP0]]			; CHECK-NEXT: [[WIDE_LOAD1:%.*]] = load <4 x float>, ptr [[TMP4]], align 4, !llvm.access.group [[ACC_GRP0]]
	; CHECK-NEXT: [[TMP5:%.*]] = fadd fast <8 x float> [[WIDE_LOAD]], [[WIDE_LOAD1]]			; CHECK-NEXT: [[TMP5:%.*]] = fadd fast <4 x float> [[WIDE_LOAD]], [[WIDE_LOAD1]]
	; CHECK-NEXT: store <8 x float> [[TMP5]], ptr [[TMP4]], align 4, !llvm.access.group [[ACC_GRP0]]			; CHECK-NEXT: store <4 x float> [[TMP5]], ptr [[TMP4]], align 4, !llvm.access.group [[ACC_GRP0]]
	; CHECK-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 8			; CHECK-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 4
	; CHECK-NEXT: [[TMP6:%.*]] = icmp eq i64 [[INDEX_NEXT]], 16			; CHECK-NEXT: [[TMP6:%.*]] = icmp eq i64 [[INDEX_NEXT]], 20
	; CHECK-NEXT: br i1 [[TMP6]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP1:![0-9]+]]			; CHECK-NEXT: br i1 [[TMP6]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP1:![0-9]+]]
	; CHECK: middle.block:			; CHECK: middle.block:
	; CHECK-NEXT: [[CMP_N:%.*]] = icmp eq i64 20, 16			; CHECK-NEXT: [[CMP_N:%.*]] = icmp eq i64 20, 20
	; CHECK-NEXT: br i1 [[CMP_N]], label [[FOR_END:%.*]], label [[SCALAR_PH]]			; CHECK-NEXT: br i1 [[CMP_N]], label [[FOR_END:%.*]], label [[SCALAR_PH]]
	; CHECK: scalar.ph:			; CHECK: scalar.ph:
	; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ 16, [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]			; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ 20, [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]
	; CHECK-NEXT: br label [[FOR_BODY:%.*]]			; CHECK-NEXT: br label [[FOR_BODY:%.*]]
	; CHECK: for.body:			; CHECK: for.body:
	; CHECK-NEXT: [[INDVARS_IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[INDVARS_IV_NEXT:%.*]], [[FOR_BODY]] ]			; CHECK-NEXT: [[INDVARS_IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[INDVARS_IV_NEXT:%.*]], [[FOR_BODY]] ]
	; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds float, ptr [[B]], i64 [[INDVARS_IV]]			; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds float, ptr [[B]], i64 [[INDVARS_IV]]
	; CHECK-NEXT: [[TMP7:%.*]] = load float, ptr [[ARRAYIDX]], align 4, !llvm.access.group [[ACC_GRP0]]			; CHECK-NEXT: [[TMP7:%.*]] = load float, ptr [[ARRAYIDX]], align 4, !llvm.access.group [[ACC_GRP0]]
	; CHECK-NEXT: [[ARRAYIDX2:%.*]] = getelementptr inbounds float, ptr [[A]], i64 [[INDVARS_IV]]			; CHECK-NEXT: [[ARRAYIDX2:%.*]] = getelementptr inbounds float, ptr [[A]], i64 [[INDVARS_IV]]
	; CHECK-NEXT: [[TMP8:%.*]] = load float, ptr [[ARRAYIDX2]], align 4, !llvm.access.group [[ACC_GRP0]]			; CHECK-NEXT: [[TMP8:%.*]] = load float, ptr [[ARRAYIDX2]], align 4, !llvm.access.group [[ACC_GRP0]]
	; CHECK-NEXT: [[ADD:%.*]] = fadd fast float [[TMP7]], [[TMP8]]			; CHECK-NEXT: [[ADD:%.*]] = fadd fast float [[TMP7]], [[TMP8]]
	; CHECK-NEXT: store float [[ADD]], ptr [[ARRAYIDX2]], align 4, !llvm.access.group [[ACC_GRP0]]			; CHECK-NEXT: store float [[ADD]], ptr [[ARRAYIDX2]], align 4, !llvm.access.group [[ACC_GRP0]]
	; CHECK-NEXT: [[INDVARS_IV_NEXT]] = add nuw nsw i64 [[INDVARS_IV]], 1			; CHECK-NEXT: [[INDVARS_IV_NEXT]] = add nuw nsw i64 [[INDVARS_IV]], 1
	; CHECK-NEXT: [[EXITCOND:%.*]] = icmp eq i64 [[INDVARS_IV_NEXT]], 20			; CHECK-NEXT: [[EXITCOND:%.*]] = icmp eq i64 [[INDVARS_IV_NEXT]], 20
	; CHECK-NEXT: br i1 [[EXITCOND]], label [[FOR_END]], label [[FOR_BODY]], !llvm.loop [[LOOP4:![0-9]+]]			; CHECK-NEXT: br i1 [[EXITCOND]], label [[FOR_END]], label [[FOR_BODY]], !llvm.loop [[LOOP5:![0-9]+]]
	; CHECK: for.end:			; CHECK: for.end:
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	entry:			entry:
	br label %for.body			br label %for.body

	for.body:			for.body:
	%indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]			%indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]
	[152 lines not shown]
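
The change in the CHECK lines above (vector loop exit at 16 with VF=8 versus exit at 20 with VF=4) follows from simple trip-count arithmetic. The sketch below is only an illustration of that arithmetic, not code from the patch: `TripSplit` and `splitTripCount` are hypothetical names, and it assumes a constant trip count, an interleave count of 1, and no tail folding. It needs C++17 for the message-less `static_assert`.

```cpp
#include <cstdint>

// Hypothetical helper, not part of LLVM: splits a constant trip count into the
// part handled by the vector loop and the part left for the scalar loop when
// the tail is NOT folded by masking (interleave count assumed to be 1).
struct TripSplit {
  std::uint64_t VectorIterations; // how many times the vector body executes
  std::uint64_t VectorTripCount;  // scalar iterations covered by the vector loop
  std::uint64_t ScalarRemainder;  // iterations left for the scalar loop
};

constexpr TripSplit splitTripCount(std::uint64_t TripCount, std::uint64_t VF) {
  return {TripCount / VF, TripCount - TripCount % VF, TripCount % VF};
}

// VF=8 on a trip count of 20: the vector loop stops at index 16 and
// 4 iterations remain for the scalar loop (old CHECK lines).
static_assert(splitTripCount(20, 8).VectorTripCount == 16);
static_assert(splitTripCount(20, 8).ScalarRemainder == 4);

// VF=4 on a trip count of 20: the vector loop stops at index 20 and
// nothing remains for the scalar loop (new CHECK lines).
static_assert(splitTripCount(20, 4).VectorTripCount == 20);
static_assert(splitTripCount(20, 4).ScalarRemainder == 0);
```

With VF=4 the vector body runs five times and no scalar iterations remain, which matches the test comment's remark that 4 divides the loop count of 20 more nicely. The actual cost comparison performed by the vectorizer is not modelled here.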

llvm/test/Transforms/LoopVectorize/first-order-recurrence-sink-replicate-region.ll

; REQUIRES: asserts		; REQUIRES: asserts
; RUN: opt < %s -passes=loop-vectorize -force-vector-width=2 -force-vector-interleave=1 -force-widen-divrem-via-safe-divisor=0 -disable-output -debug-only=loop-vectorize 2>&1 | FileCheck %s		; RUN: opt < %s -passes=loop-vectorize -force-vector-width=2 -force-vector-interleave=1 -force-widen-divrem-via-safe-divisor=0 -disable-output -debug-only=loop-vectorize 2>&1 | FileCheck %s

target datalayout = "e-m:e-i64:64-i128:128-n32:64-S128"		target datalayout = "e-m:e-i64:64-i128:128-n32:64-S128"

; Test cases for PR50009, which require sinking a replicate-region due to a		; Test cases for PR50009, which require sinking a replicate-region due to a
; first-order recurrence.		; first-order recurrence.

define void @sink_replicate_region_1(i32 %x, ptr %ptr, ptr noalias %dst) optsize {		define void @sink_replicate_region_1(i32 %x, ptr %ptr, ptr noalias %dst) optsize {
; CHECK-LABEL: sink_replicate_region_1		; CHECK-LABEL: sink_replicate_region_1
; CHECK: VPlan 'Initial VPlan for VF={2},UF>=1' {		; CHECK: VPlan 'Initial VPlan for Tail Folded VF={2},UF>=1' {
; CHECK-NEXT: Live-in vp<[[VEC_TC:%.+]]> = vector-trip-count		; CHECK-NEXT: Live-in vp<[[VEC_TC:%.+]]> = vector-trip-count
; CHECK-NEXT: Live-in vp<[[BTC:%.+]]> = backedge-taken count		; CHECK-NEXT: Live-in vp<[[BTC:%.+]]> = backedge-taken count
; CHECK-EMPTY:		; CHECK-EMPTY:
; CHECK-NEXT: vector.ph:		; CHECK-NEXT: vector.ph:
; CHECK-NEXT: Successor(s): vector loop		; CHECK-NEXT: Successor(s): vector loop
; CHECK-EMPTY:		; CHECK-EMPTY:
; CHECK-NEXT: <x1> vector loop: {		; CHECK-NEXT: <x1> vector loop: {
; CHECK-NEXT: vector.body:		; CHECK-NEXT: vector.body:
[72 lines not shown]	loop:
br i1 %ec, label %exit, label %loop		br i1 %ec, label %exit, label %loop

exit:		exit:
ret void		ret void
}		}

define void @sink_replicate_region_2(i32 %x, i8 %y, ptr %ptr) optsize {		define void @sink_replicate_region_2(i32 %x, i8 %y, ptr %ptr) optsize {
; CHECK-LABEL: sink_replicate_region_2		; CHECK-LABEL: sink_replicate_region_2
; CHECK: VPlan 'Initial VPlan for VF={2},UF>=1' {		; CHECK: VPlan 'Initial VPlan for Tail Folded VF={2},UF>=1' {
; CHECK-NEXT: Live-in vp<[[VEC_TC:%.+]]> = vector-trip-count		; CHECK-NEXT: Live-in vp<[[VEC_TC:%.+]]> = vector-trip-count
; CHECK-NEXT: Live-in vp<[[BTC:%.+]]> = backedge-taken count		; CHECK-NEXT: Live-in vp<[[BTC:%.+]]> = backedge-taken count
; CHECK-EMPTY:		; CHECK-EMPTY:
; CHECK-NEXT: vector.ph:		; CHECK-NEXT: vector.ph:
; CHECK-NEXT: Successor(s): vector loop		; CHECK-NEXT: Successor(s): vector loop
; CHECK-EMPTY:		; CHECK-EMPTY:
; CHECK-NEXT: <x1> vector loop: {		; CHECK-NEXT: <x1> vector loop: {
; CHECK-NEXT: vector.body:		; CHECK-NEXT: vector.body:
[51 lines not shown]	loop:
br i1 %ec, label %exit, label %loop		br i1 %ec, label %exit, label %loop

exit:		exit:
ret void		ret void
}		}

define i32 @sink_replicate_region_3_reduction(i32 %x, i8 %y, ptr %ptr) optsize {		define i32 @sink_replicate_region_3_reduction(i32 %x, i8 %y, ptr %ptr) optsize {
; CHECK-LABEL: sink_replicate_region_3_reduction		; CHECK-LABEL: sink_replicate_region_3_reduction
; CHECK: VPlan 'Initial VPlan for VF={2},UF>=1' {		; CHECK: VPlan 'Initial VPlan for Tail Folded VF={2},UF>=1' {
; CHECK-NEXT: Live-in vp<[[VEC_TC:%.+]]> = vector-trip-count		; CHECK-NEXT: Live-in vp<[[VEC_TC:%.+]]> = vector-trip-count
; CHECK-NEXT: Live-in vp<[[BTC:%.+]]> = backedge-taken count		; CHECK-NEXT: Live-in vp<[[BTC:%.+]]> = backedge-taken count
; CHECK-EMPTY:		; CHECK-EMPTY:
; CHECK-NEXT: vector.ph:		; CHECK-NEXT: vector.ph:
; CHECK-NEXT: Successor(s): vector loop		; CHECK-NEXT: Successor(s): vector loop
; CHECK-EMPTY:		; CHECK-EMPTY:
; CHECK-NEXT: <x1> vector loop: {		; CHECK-NEXT: <x1> vector loop: {
; CHECK-NEXT: vector.body:		; CHECK-NEXT: vector.body:
[56 lines not shown]	exit:
%res = phi i32 [ %and.red.next, %loop ]		%res = phi i32 [ %and.red.next, %loop ]
ret i32 %res		ret i32 %res
}		}

; To sink the replicate region containing %rem, we need to split the block		; To sink the replicate region containing %rem, we need to split the block
; containing %conv at the end, because %conv is the last recipe in the block.		; containing %conv at the end, because %conv is the last recipe in the block.
define void @sink_replicate_region_4_requires_split_at_end_of_block(i32 %x, ptr %ptr, ptr noalias %dst) optsize {		define void @sink_replicate_region_4_requires_split_at_end_of_block(i32 %x, ptr %ptr, ptr noalias %dst) optsize {
; CHECK-LABEL: sink_replicate_region_4_requires_split_at_end_of_block		; CHECK-LABEL: sink_replicate_region_4_requires_split_at_end_of_block
; CHECK: VPlan 'Initial VPlan for VF={2},UF>=1' {		; CHECK: VPlan 'Initial VPlan for Tail Folded VF={2},UF>=1' {
; CHECK-NEXT: Live-in vp<[[VEC_TC:%.+]]> = vector-trip-count		; CHECK-NEXT: Live-in vp<[[VEC_TC:%.+]]> = vector-trip-count
; CHECK-NEXT: Live-in vp<[[BTC:%.+]]> = backedge-taken count		; CHECK-NEXT: Live-in vp<[[BTC:%.+]]> = backedge-taken count
; CHECK-EMPTY:		; CHECK-EMPTY:
; CHECK-NEXT: vector.ph:		; CHECK-NEXT: vector.ph:
; CHECK-NEXT: Successor(s): vector loop		; CHECK-NEXT: Successor(s): vector loop
; CHECK-EMPTY:		; CHECK-EMPTY:
; CHECK-NEXT: <x1> vector loop: {		; CHECK-NEXT: <x1> vector loop: {
; CHECK-NEXT: vector.body:		; CHECK-NEXT: vector.body:
[80 lines not shown]

exit:		exit:
ret void		ret void
}		}

; Test case that requires sinking a recipe in a replicate region after another replicate region.		; Test case that requires sinking a recipe in a replicate region after another replicate region.
define void @sink_replicate_region_after_replicate_region(ptr %ptr, ptr noalias %dst.2, i32 %x, i8 %y) optsize {		define void @sink_replicate_region_after_replicate_region(ptr %ptr, ptr noalias %dst.2, i32 %x, i8 %y) optsize {
; CHECK-LABEL: sink_replicate_region_after_replicate_region		; CHECK-LABEL: sink_replicate_region_after_replicate_region
; CHECK: VPlan 'Initial VPlan for VF={2},UF>=1' {		; CHECK: VPlan 'Initial VPlan for Tail Folded VF={2},UF>=1' {
; CHECK-NEXT: Live-in vp<[[VEC_TC:%.+]]> = vector-trip-count		; CHECK-NEXT: Live-in vp<[[VEC_TC:%.+]]> = vector-trip-count
; CHECK-NEXT: Live-in vp<[[BTC:%.+]]> = backedge-taken count		; CHECK-NEXT: Live-in vp<[[BTC:%.+]]> = backedge-taken count
; CHECK-EMPTY:		; CHECK-EMPTY:
; CHECK-NEXT: vector.ph:		; CHECK-NEXT: vector.ph:
; CHECK-NEXT: Successor(s): vector loop		; CHECK-NEXT: Successor(s): vector loop
; CHECK-EMPTY:		; CHECK-EMPTY:
; CHECK-NEXT: <x1> vector loop: {		; CHECK-NEXT: <x1> vector loop: {
; CHECK-NEXT: vector.body:		; CHECK-NEXT: vector.body:
[56 lines not shown]	loop: ; preds = %loop, %entry
br i1 %C, label %exit, label %loop		br i1 %C, label %exit, label %loop

exit: ; preds = %loop		exit: ; preds = %loop
ret void		ret void
}		}

define void @need_new_block_after_sinking_pr56146(i32 %x, ptr %src, ptr noalias %dst) {		define void @need_new_block_after_sinking_pr56146(i32 %x, ptr %src, ptr noalias %dst) {
; CHECK-LABEL: need_new_block_after_sinking_pr56146		; CHECK-LABEL: need_new_block_after_sinking_pr56146
; CHECK: VPlan 'Initial VPlan for VF={2},UF>=1' {		; CHECK: VPlan 'Initial VPlan for Tail Folded VF={2},UF>=1' {
; CHECK-NEXT: Live-in vp<[[VEC_TC:%.+]]> = vector-trip-count		; CHECK-NEXT: Live-in vp<[[VEC_TC:%.+]]> = vector-trip-count
; CHECK-NEXT: Live-in vp<[[BTC:%.+]]> = backedge-taken count		; CHECK-NEXT: Live-in vp<[[BTC:%.+]]> = backedge-taken count
; CHECK-EMPTY:		; CHECK-EMPTY:
; CHECK-NEXT: vector.ph:		; CHECK-NEXT: vector.ph:
; CHECK-NEXT: Successor(s): vector loop		; CHECK-NEXT: Successor(s): vector loop
; CHECK-EMPTY:		; CHECK-EMPTY:
; CHECK-NEXT: <x1> vector loop: {		; CHECK-NEXT: <x1> vector loop: {
; CHECK-NEXT: vector.body:		; CHECK-NEXT: vector.body:
[55 lines not shown]

llvm/test/Transforms/LoopVectorize/first-order-recurrence.ll

	[68 lines not shown]
	; UNROLL-NO-IC-NEXT: [[INDVARS_IV_NEXT]] = add nuw nsw i64 [[INDVARS_IV]], 1			; UNROLL-NO-IC-NEXT: [[INDVARS_IV_NEXT]] = add nuw nsw i64 [[INDVARS_IV]], 1
	; UNROLL-NO-IC-NEXT: [[ARRAYIDX32:%.*]] = getelementptr inbounds i32, ptr [[A]], i64 [[INDVARS_IV_NEXT]]			; UNROLL-NO-IC-NEXT: [[ARRAYIDX32:%.*]] = getelementptr inbounds i32, ptr [[A]], i64 [[INDVARS_IV_NEXT]]
	; UNROLL-NO-IC-NEXT: [[TMP20]] = load i32, ptr [[ARRAYIDX32]], align 4			; UNROLL-NO-IC-NEXT: [[TMP20]] = load i32, ptr [[ARRAYIDX32]], align 4
	; UNROLL-NO-IC-NEXT: [[ARRAYIDX34:%.*]] = getelementptr inbounds i32, ptr [[B]], i64 [[INDVARS_IV]]			; UNROLL-NO-IC-NEXT: [[ARRAYIDX34:%.*]] = getelementptr inbounds i32, ptr [[B]], i64 [[INDVARS_IV]]
	; UNROLL-NO-IC-NEXT: [[ADD35:%.*]] = add i32 [[TMP20]], [[SCALAR_RECUR]]			; UNROLL-NO-IC-NEXT: [[ADD35:%.*]] = add i32 [[TMP20]], [[SCALAR_RECUR]]
	; UNROLL-NO-IC-NEXT: store i32 [[ADD35]], ptr [[ARRAYIDX34]], align 4			; UNROLL-NO-IC-NEXT: store i32 [[ADD35]], ptr [[ARRAYIDX34]], align 4
	; UNROLL-NO-IC-NEXT: [[LFTR_WIDEIV:%.*]] = trunc i64 [[INDVARS_IV_NEXT]] to i32			; UNROLL-NO-IC-NEXT: [[LFTR_WIDEIV:%.*]] = trunc i64 [[INDVARS_IV_NEXT]] to i32
	; UNROLL-NO-IC-NEXT: [[EXITCOND:%.*]] = icmp eq i32 [[LFTR_WIDEIV]], [[N]]			; UNROLL-NO-IC-NEXT: [[EXITCOND:%.*]] = icmp eq i32 [[LFTR_WIDEIV]], [[N]]
	; UNROLL-NO-IC-NEXT: br i1 [[EXITCOND]], label [[FOR_EXIT]], label [[SCALAR_BODY]], !llvm.loop [[LOOP2:![0-9]+]]			; UNROLL-NO-IC-NEXT: br i1 [[EXITCOND]], label [[FOR_EXIT]], label [[SCALAR_BODY]], !llvm.loop [[LOOP3:![0-9]+]]
	; UNROLL-NO-IC: for.exit:			; UNROLL-NO-IC: for.exit:
	; UNROLL-NO-IC-NEXT: ret void			; UNROLL-NO-IC-NEXT: ret void
	;			;
	; UNROLL-NO-VF-LABEL: @recurrence_1(			; UNROLL-NO-VF-LABEL: @recurrence_1(
	; UNROLL-NO-VF-NEXT: entry:			; UNROLL-NO-VF-NEXT: entry:
	; UNROLL-NO-VF-NEXT: br label [[FOR_PREHEADER:%.*]]			; UNROLL-NO-VF-NEXT: br label [[FOR_PREHEADER:%.*]]
	; UNROLL-NO-VF: for.preheader:			; UNROLL-NO-VF: for.preheader:
	; UNROLL-NO-VF-NEXT: [[PRE_LOAD:%.*]] = load i32, ptr [[A:%.*]], align 4			; UNROLL-NO-VF-NEXT: [[PRE_LOAD:%.*]] = load i32, ptr [[A:%.*]], align 4
	[39 lines not shown]
	; UNROLL-NO-VF-NEXT: [[INDVARS_IV_NEXT]] = add nuw nsw i64 [[INDVARS_IV]], 1			; UNROLL-NO-VF-NEXT: [[INDVARS_IV_NEXT]] = add nuw nsw i64 [[INDVARS_IV]], 1
	; UNROLL-NO-VF-NEXT: [[ARRAYIDX32:%.*]] = getelementptr inbounds i32, ptr [[A]], i64 [[INDVARS_IV_NEXT]]			; UNROLL-NO-VF-NEXT: [[ARRAYIDX32:%.*]] = getelementptr inbounds i32, ptr [[A]], i64 [[INDVARS_IV_NEXT]]
	; UNROLL-NO-VF-NEXT: [[TMP16]] = load i32, ptr [[ARRAYIDX32]], align 4			; UNROLL-NO-VF-NEXT: [[TMP16]] = load i32, ptr [[ARRAYIDX32]], align 4
	; UNROLL-NO-VF-NEXT: [[ARRAYIDX34:%.*]] = getelementptr inbounds i32, ptr [[B]], i64 [[INDVARS_IV]]			; UNROLL-NO-VF-NEXT: [[ARRAYIDX34:%.*]] = getelementptr inbounds i32, ptr [[B]], i64 [[INDVARS_IV]]
	; UNROLL-NO-VF-NEXT: [[ADD35:%.*]] = add i32 [[TMP16]], [[SCALAR_RECUR]]			; UNROLL-NO-VF-NEXT: [[ADD35:%.*]] = add i32 [[TMP16]], [[SCALAR_RECUR]]
	; UNROLL-NO-VF-NEXT: store i32 [[ADD35]], ptr [[ARRAYIDX34]], align 4			; UNROLL-NO-VF-NEXT: store i32 [[ADD35]], ptr [[ARRAYIDX34]], align 4
	; UNROLL-NO-VF-NEXT: [[LFTR_WIDEIV:%.*]] = trunc i64 [[INDVARS_IV_NEXT]] to i32			; UNROLL-NO-VF-NEXT: [[LFTR_WIDEIV:%.*]] = trunc i64 [[INDVARS_IV_NEXT]] to i32
	; UNROLL-NO-VF-NEXT: [[EXITCOND:%.*]] = icmp eq i32 [[LFTR_WIDEIV]], [[N]]			; UNROLL-NO-VF-NEXT: [[EXITCOND:%.*]] = icmp eq i32 [[LFTR_WIDEIV]], [[N]]
	; UNROLL-NO-VF-NEXT: br i1 [[EXITCOND]], label [[FOR_EXIT]], label [[SCALAR_BODY]], !llvm.loop [[LOOP2:![0-9]+]]			; UNROLL-NO-VF-NEXT: br i1 [[EXITCOND]], label [[FOR_EXIT]], label [[SCALAR_BODY]], !llvm.loop [[LOOP3:![0-9]+]]
	; UNROLL-NO-VF: for.exit:			; UNROLL-NO-VF: for.exit:
	; UNROLL-NO-VF-NEXT: ret void			; UNROLL-NO-VF-NEXT: ret void
	;			;
	; SINK-AFTER-LABEL: @recurrence_1(			; SINK-AFTER-LABEL: @recurrence_1(
	; SINK-AFTER-NEXT: entry:			; SINK-AFTER-NEXT: entry:
	; SINK-AFTER-NEXT: br label [[FOR_PREHEADER:%.*]]			; SINK-AFTER-NEXT: br label [[FOR_PREHEADER:%.*]]
	; SINK-AFTER: for.preheader:			; SINK-AFTER: for.preheader:
	; SINK-AFTER-NEXT: [[PRE_LOAD:%.]] = load i32, ptr [[A:%.]], align 4			; SINK-AFTER-NEXT: [[PRE_LOAD:%.]] = load i32, ptr [[A:%.]], align 4
	; SINK-AFTER-NEXT: [[INDVARS_IV_NEXT]] = add nuw nsw i64 [[INDVARS_IV]], 1			; SINK-AFTER-NEXT: [[INDVARS_IV_NEXT]] = add nuw nsw i64 [[INDVARS_IV]], 1
	; SINK-AFTER-NEXT: [[ARRAYIDX32:%.*]] = getelementptr inbounds i32, ptr [[A]], i64 [[INDVARS_IV_NEXT]]			; SINK-AFTER-NEXT: [[ARRAYIDX32:%.*]] = getelementptr inbounds i32, ptr [[A]], i64 [[INDVARS_IV_NEXT]]
	; SINK-AFTER-NEXT: [[TMP12]] = load i32, ptr [[ARRAYIDX32]], align 4			; SINK-AFTER-NEXT: [[TMP12]] = load i32, ptr [[ARRAYIDX32]], align 4
	; SINK-AFTER-NEXT: [[ARRAYIDX34:%.*]] = getelementptr inbounds i32, ptr [[B]], i64 [[INDVARS_IV]]			; SINK-AFTER-NEXT: [[ARRAYIDX34:%.*]] = getelementptr inbounds i32, ptr [[B]], i64 [[INDVARS_IV]]
	; SINK-AFTER-NEXT: [[ADD35:%.*]] = add i32 [[TMP12]], [[SCALAR_RECUR]]			; SINK-AFTER-NEXT: [[ADD35:%.*]] = add i32 [[TMP12]], [[SCALAR_RECUR]]
	; SINK-AFTER-NEXT: store i32 [[ADD35]], ptr [[ARRAYIDX34]], align 4			; SINK-AFTER-NEXT: store i32 [[ADD35]], ptr [[ARRAYIDX34]], align 4
	; SINK-AFTER-NEXT: [[LFTR_WIDEIV:%.*]] = trunc i64 [[INDVARS_IV_NEXT]] to i32			; SINK-AFTER-NEXT: [[LFTR_WIDEIV:%.*]] = trunc i64 [[INDVARS_IV_NEXT]] to i32
	; SINK-AFTER-NEXT: [[EXITCOND:%.*]] = icmp eq i32 [[LFTR_WIDEIV]], [[N]]			; SINK-AFTER-NEXT: [[EXITCOND:%.*]] = icmp eq i32 [[LFTR_WIDEIV]], [[N]]
	; SINK-AFTER-NEXT: br i1 [[EXITCOND]], label [[FOR_EXIT]], label [[SCALAR_BODY]], !llvm.loop [[LOOP2:![0-9]+]]			; SINK-AFTER-NEXT: br i1 [[EXITCOND]], label [[FOR_EXIT]], label [[SCALAR_BODY]], !llvm.loop [[LOOP3:![0-9]+]]
	; SINK-AFTER: for.exit:			; SINK-AFTER: for.exit:
	; SINK-AFTER-NEXT: ret void			; SINK-AFTER-NEXT: ret void
	;			;
	entry:			entry:
	br label %for.preheader			br label %for.preheader

	for.preheader:			for.preheader:
	%pre_load = load i32, ptr %a			%pre_load = load i32, ptr %a
	; UNROLL-NO-VF-NEXT: [[TMP13:%.*]] = select i1 [[TMP11]], i32 [[TMP9]], i32 0			; UNROLL-NO-VF-NEXT: [[TMP13:%.*]] = select i1 [[TMP11]], i32 [[TMP9]], i32 0
	; UNROLL-NO-VF-NEXT: [[TMP14:%.*]] = select i1 [[TMP12]], i32 [[TMP10]], i32 0			; UNROLL-NO-VF-NEXT: [[TMP14:%.*]] = select i1 [[TMP12]], i32 [[TMP10]], i32 0
	; UNROLL-NO-VF-NEXT: [[TMP15:%.*]] = icmp slt i32 [[VEC_PHI]], [[TMP13]]			; UNROLL-NO-VF-NEXT: [[TMP15:%.*]] = icmp slt i32 [[VEC_PHI]], [[TMP13]]
	; UNROLL-NO-VF-NEXT: [[TMP16:%.*]] = icmp slt i32 [[VEC_PHI1]], [[TMP14]]			; UNROLL-NO-VF-NEXT: [[TMP16:%.*]] = icmp slt i32 [[VEC_PHI1]], [[TMP14]]
	; UNROLL-NO-VF-NEXT: [[TMP17]] = select i1 [[TMP15]], i32 [[VEC_PHI]], i32 [[TMP13]]			; UNROLL-NO-VF-NEXT: [[TMP17]] = select i1 [[TMP15]], i32 [[VEC_PHI]], i32 [[TMP13]]
	; UNROLL-NO-VF-NEXT: [[TMP18]] = select i1 [[TMP16]], i32 [[VEC_PHI1]], i32 [[TMP14]]			; UNROLL-NO-VF-NEXT: [[TMP18]] = select i1 [[TMP16]], i32 [[VEC_PHI1]], i32 [[TMP14]]
	; UNROLL-NO-VF-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 2			; UNROLL-NO-VF-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 2
	; UNROLL-NO-VF-NEXT: [[TMP19:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]			; UNROLL-NO-VF-NEXT: [[TMP19:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]
	; UNROLL-NO-VF-NEXT: br i1 [[TMP19]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP3:![0-9]+]]			; UNROLL-NO-VF-NEXT: br i1 [[TMP19]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP4:![0-9]+]]
	; UNROLL-NO-VF: middle.block:			; UNROLL-NO-VF: middle.block:
	; UNROLL-NO-VF-NEXT: [[RDX_MINMAX_CMP:%.*]] = icmp slt i32 [[TMP17]], [[TMP18]]			; UNROLL-NO-VF-NEXT: [[RDX_MINMAX_CMP:%.*]] = icmp slt i32 [[TMP17]], [[TMP18]]
	; UNROLL-NO-VF-NEXT: [[RDX_MINMAX_SELECT:%.*]] = select i1 [[RDX_MINMAX_CMP]], i32 [[TMP17]], i32 [[TMP18]]			; UNROLL-NO-VF-NEXT: [[RDX_MINMAX_SELECT:%.*]] = select i1 [[RDX_MINMAX_CMP]], i32 [[TMP17]], i32 [[TMP18]]
	; UNROLL-NO-VF-NEXT: [[CMP_N:%.*]] = icmp eq i64 [[TMP2]], [[N_VEC]]			; UNROLL-NO-VF-NEXT: [[CMP_N:%.*]] = icmp eq i64 [[TMP2]], [[N_VEC]]
	; UNROLL-NO-VF-NEXT: br i1 [[CMP_N]], label [[FOR_COND_CLEANUP_LOOPEXIT:%.*]], label [[SCALAR_PH]]			; UNROLL-NO-VF-NEXT: br i1 [[CMP_N]], label [[FOR_COND_CLEANUP_LOOPEXIT:%.*]], label [[SCALAR_PH]]
	; UNROLL-NO-VF: scalar.ph:			; UNROLL-NO-VF: scalar.ph:
	; UNROLL-NO-VF-NEXT: [[SCALAR_RECUR_INIT:%.*]] = phi i32 [ [[DOTPRE]], [[FOR_PREHEADER]] ], [ [[TMP8]], [[MIDDLE_BLOCK]] ]			; UNROLL-NO-VF-NEXT: [[SCALAR_RECUR_INIT:%.*]] = phi i32 [ [[DOTPRE]], [[FOR_PREHEADER]] ], [ [[TMP8]], [[MIDDLE_BLOCK]] ]
	; UNROLL-NO-VF-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ [[N_VEC]], [[MIDDLE_BLOCK]] ], [ 0, [[FOR_PREHEADER]] ]			; UNROLL-NO-VF-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ [[N_VEC]], [[MIDDLE_BLOCK]] ], [ 0, [[FOR_PREHEADER]] ]
	; UNROLL-NO-VF-NEXT: [[SUB3:%.*]] = sub nsw i32 [[TMP20]], [[SCALAR_RECUR]]			; UNROLL-NO-VF-NEXT: [[SUB3:%.*]] = sub nsw i32 [[TMP20]], [[SCALAR_RECUR]]
	; UNROLL-NO-VF-NEXT: [[CMP4:%.*]] = icmp sgt i32 [[SUB3]], 0			; UNROLL-NO-VF-NEXT: [[CMP4:%.*]] = icmp sgt i32 [[SUB3]], 0
	; UNROLL-NO-VF-NEXT: [[COND:%.*]] = select i1 [[CMP4]], i32 [[SUB3]], i32 0			; UNROLL-NO-VF-NEXT: [[COND:%.*]] = select i1 [[CMP4]], i32 [[SUB3]], i32 0
	; UNROLL-NO-VF-NEXT: [[CMP5:%.*]] = icmp slt i32 [[MINMAX_028]], [[COND]]			; UNROLL-NO-VF-NEXT: [[CMP5:%.*]] = icmp slt i32 [[MINMAX_028]], [[COND]]
	; UNROLL-NO-VF-NEXT: [[MINMAX_0_COND]] = select i1 [[CMP5]], i32 [[MINMAX_028]], i32 [[COND]]			; UNROLL-NO-VF-NEXT: [[MINMAX_0_COND]] = select i1 [[CMP5]], i32 [[MINMAX_028]], i32 [[COND]]
	; UNROLL-NO-VF-NEXT: [[INDVARS_IV_NEXT]] = add nuw nsw i64 [[INDVARS_IV]], 1			; UNROLL-NO-VF-NEXT: [[INDVARS_IV_NEXT]] = add nuw nsw i64 [[INDVARS_IV]], 1
	; UNROLL-NO-VF-NEXT: [[LFTR_WIDEIV:%.*]] = trunc i64 [[INDVARS_IV_NEXT]] to i32			; UNROLL-NO-VF-NEXT: [[LFTR_WIDEIV:%.*]] = trunc i64 [[INDVARS_IV_NEXT]] to i32
	; UNROLL-NO-VF-NEXT: [[EXITCOND:%.*]] = icmp eq i32 [[LFTR_WIDEIV]], [[N]]			; UNROLL-NO-VF-NEXT: [[EXITCOND:%.*]] = icmp eq i32 [[LFTR_WIDEIV]], [[N]]
	; UNROLL-NO-VF-NEXT: br i1 [[EXITCOND]], label [[FOR_COND_CLEANUP_LOOPEXIT]], label [[SCALAR_BODY]], !llvm.loop [[LOOP4:![0-9]+]]			; UNROLL-NO-VF-NEXT: br i1 [[EXITCOND]], label [[FOR_COND_CLEANUP_LOOPEXIT]], label [[SCALAR_BODY]], !llvm.loop [[LOOP5:![0-9]+]]
	;			;
	; SINK-AFTER-LABEL: @recurrence_2(			; SINK-AFTER-LABEL: @recurrence_2(
	; SINK-AFTER-NEXT: entry:			; SINK-AFTER-NEXT: entry:
	; SINK-AFTER-NEXT: [[CMP27:%.*]] = icmp sgt i32 [[N:%.*]], 0			; SINK-AFTER-NEXT: [[CMP27:%.*]] = icmp sgt i32 [[N:%.*]], 0
	; SINK-AFTER-NEXT: br i1 [[CMP27]], label [[FOR_PREHEADER:%.*]], label [[FOR_COND_CLEANUP:%.*]]			; SINK-AFTER-NEXT: br i1 [[CMP27]], label [[FOR_PREHEADER:%.*]], label [[FOR_COND_CLEANUP:%.*]]
	; SINK-AFTER: for.preheader:			; SINK-AFTER: for.preheader:
	; SINK-AFTER-NEXT: [[ARRAYIDX2_PHI_TRANS_INSERT:%.*]] = getelementptr inbounds i32, ptr [[A:%.*]], i64 -1			; SINK-AFTER-NEXT: [[ARRAYIDX2_PHI_TRANS_INSERT:%.*]] = getelementptr inbounds i32, ptr [[A:%.*]], i64 -1
	; SINK-AFTER-NEXT: [[DOTPRE:%.*]] = load i32, ptr [[ARRAYIDX2_PHI_TRANS_INSERT]], align 4			; SINK-AFTER-NEXT: [[DOTPRE:%.*]] = load i32, ptr [[ARRAYIDX2_PHI_TRANS_INSERT]], align 4
	; UNROLL-NO-VF-NEXT: [[TMP16:%.*]] = fsub fast double [[TMP10]], [[TMP14]]			; UNROLL-NO-VF-NEXT: [[TMP16:%.*]] = fsub fast double [[TMP10]], [[TMP14]]
	; UNROLL-NO-VF-NEXT: [[TMP17:%.*]] = fsub fast double [[TMP11]], [[TMP15]]			; UNROLL-NO-VF-NEXT: [[TMP17:%.*]] = fsub fast double [[TMP11]], [[TMP15]]
	; UNROLL-NO-VF-NEXT: [[TMP18:%.*]] = getelementptr inbounds double, ptr [[B]], i64 [[TMP4]]			; UNROLL-NO-VF-NEXT: [[TMP18:%.*]] = getelementptr inbounds double, ptr [[B]], i64 [[TMP4]]
	; UNROLL-NO-VF-NEXT: [[TMP19:%.*]] = getelementptr inbounds double, ptr [[B]], i64 [[TMP5]]			; UNROLL-NO-VF-NEXT: [[TMP19:%.*]] = getelementptr inbounds double, ptr [[B]], i64 [[TMP5]]
	; UNROLL-NO-VF-NEXT: store double [[TMP16]], ptr [[TMP18]], align 8			; UNROLL-NO-VF-NEXT: store double [[TMP16]], ptr [[TMP18]], align 8
	; UNROLL-NO-VF-NEXT: store double [[TMP17]], ptr [[TMP19]], align 8			; UNROLL-NO-VF-NEXT: store double [[TMP17]], ptr [[TMP19]], align 8
	; UNROLL-NO-VF-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 2			; UNROLL-NO-VF-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 2
	; UNROLL-NO-VF-NEXT: [[TMP20:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]			; UNROLL-NO-VF-NEXT: [[TMP20:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]
	; UNROLL-NO-VF-NEXT: br i1 [[TMP20]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP5:![0-9]+]]			; UNROLL-NO-VF-NEXT: br i1 [[TMP20]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP6:![0-9]+]]
	; UNROLL-NO-VF: middle.block:			; UNROLL-NO-VF: middle.block:
	; UNROLL-NO-VF-NEXT: [[CMP_N:%.*]] = icmp eq i64 [[TMP3]], [[N_VEC]]			; UNROLL-NO-VF-NEXT: [[CMP_N:%.*]] = icmp eq i64 [[TMP3]], [[N_VEC]]
	; UNROLL-NO-VF-NEXT: br i1 [[CMP_N]], label [[FOR_END_LOOPEXIT:%.*]], label [[SCALAR_PH]]			; UNROLL-NO-VF-NEXT: br i1 [[CMP_N]], label [[FOR_END_LOOPEXIT:%.*]], label [[SCALAR_PH]]
	; UNROLL-NO-VF: scalar.ph:			; UNROLL-NO-VF: scalar.ph:
	; UNROLL-NO-VF-NEXT: [[SCALAR_RECUR_INIT:%.*]] = phi i16 [ [[TMP0]], [[FOR_PREHEADER]] ], [ [[TMP9]], [[MIDDLE_BLOCK]] ]			; UNROLL-NO-VF-NEXT: [[SCALAR_RECUR_INIT:%.*]] = phi i16 [ [[TMP0]], [[FOR_PREHEADER]] ], [ [[TMP9]], [[MIDDLE_BLOCK]] ]
	; UNROLL-NO-VF-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ [[IND_END]], [[MIDDLE_BLOCK]] ], [ 1, [[FOR_PREHEADER]] ]			; UNROLL-NO-VF-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ [[IND_END]], [[MIDDLE_BLOCK]] ], [ 1, [[FOR_PREHEADER]] ]
	; UNROLL-NO-VF-NEXT: br label [[SCALAR_BODY:%.*]]			; UNROLL-NO-VF-NEXT: br label [[SCALAR_BODY:%.*]]
	; UNROLL-NO-VF: scalar.body:			; UNROLL-NO-VF: scalar.body:
	; UNROLL-NO-VF-NEXT: [[SCALAR_RECUR:%.*]] = phi i16 [ [[SCALAR_RECUR_INIT]], [[SCALAR_PH]] ], [ [[TMP21:%.*]], [[SCALAR_BODY]] ]			; UNROLL-NO-VF-NEXT: [[SCALAR_RECUR:%.*]] = phi i16 [ [[SCALAR_RECUR_INIT]], [[SCALAR_PH]] ], [ [[TMP21:%.*]], [[SCALAR_BODY]] ]
	; UNROLL-NO-VF-NEXT: [[ADVARS_IV:%.*]] = phi i64 [ [[ADVARS_IV_NEXT:%.*]], [[SCALAR_BODY]] ], [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ]			; UNROLL-NO-VF-NEXT: [[ADVARS_IV:%.*]] = phi i64 [ [[ADVARS_IV_NEXT:%.*]], [[SCALAR_BODY]] ], [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ]
	; UNROLL-NO-VF-NEXT: [[ARRAYIDX5:%.*]] = getelementptr inbounds i16, ptr [[A]], i64 [[ADVARS_IV]]			; UNROLL-NO-VF-NEXT: [[ARRAYIDX5:%.*]] = getelementptr inbounds i16, ptr [[A]], i64 [[ADVARS_IV]]
	; UNROLL-NO-VF-NEXT: [[TMP21]] = load i16, ptr [[ARRAYIDX5]], align 2			; UNROLL-NO-VF-NEXT: [[TMP21]] = load i16, ptr [[ARRAYIDX5]], align 2
	; UNROLL-NO-VF-NEXT: [[CONV6:%.*]] = sitofp i16 [[TMP21]] to double			; UNROLL-NO-VF-NEXT: [[CONV6:%.*]] = sitofp i16 [[TMP21]] to double
	; UNROLL-NO-VF-NEXT: [[CONV11:%.*]] = sitofp i16 [[SCALAR_RECUR]] to double			; UNROLL-NO-VF-NEXT: [[CONV11:%.*]] = sitofp i16 [[SCALAR_RECUR]] to double
	; UNROLL-NO-VF-NEXT: [[MUL12:%.*]] = fmul fast double [[CONV11]], [[CONV1]]			; UNROLL-NO-VF-NEXT: [[MUL12:%.*]] = fmul fast double [[CONV11]], [[CONV1]]
	; UNROLL-NO-VF-NEXT: [[SUB13:%.*]] = fsub fast double [[CONV6]], [[MUL12]]			; UNROLL-NO-VF-NEXT: [[SUB13:%.*]] = fsub fast double [[CONV6]], [[MUL12]]
	; UNROLL-NO-VF-NEXT: [[ARRAYIDX15:%.*]] = getelementptr inbounds double, ptr [[B]], i64 [[ADVARS_IV]]			; UNROLL-NO-VF-NEXT: [[ARRAYIDX15:%.*]] = getelementptr inbounds double, ptr [[B]], i64 [[ADVARS_IV]]
	; UNROLL-NO-VF-NEXT: store double [[SUB13]], ptr [[ARRAYIDX15]], align 8			; UNROLL-NO-VF-NEXT: store double [[SUB13]], ptr [[ARRAYIDX15]], align 8
	; UNROLL-NO-VF-NEXT: [[ADVARS_IV_NEXT]] = add nuw nsw i64 [[ADVARS_IV]], 1			; UNROLL-NO-VF-NEXT: [[ADVARS_IV_NEXT]] = add nuw nsw i64 [[ADVARS_IV]], 1
	; UNROLL-NO-VF-NEXT: [[LFTR_WIDEIV:%.*]] = trunc i64 [[ADVARS_IV_NEXT]] to i32			; UNROLL-NO-VF-NEXT: [[LFTR_WIDEIV:%.*]] = trunc i64 [[ADVARS_IV_NEXT]] to i32
	; UNROLL-NO-VF-NEXT: [[EXITCOND:%.*]] = icmp eq i32 [[LFTR_WIDEIV]], [[N]]			; UNROLL-NO-VF-NEXT: [[EXITCOND:%.*]] = icmp eq i32 [[LFTR_WIDEIV]], [[N]]
	; UNROLL-NO-VF-NEXT: br i1 [[EXITCOND]], label [[FOR_END_LOOPEXIT]], label [[SCALAR_BODY]], !llvm.loop [[LOOP6:![0-9]+]]			; UNROLL-NO-VF-NEXT: br i1 [[EXITCOND]], label [[FOR_END_LOOPEXIT]], label [[SCALAR_BODY]], !llvm.loop [[LOOP7:![0-9]+]]
	; UNROLL-NO-VF: for.end.loopexit:			; UNROLL-NO-VF: for.end.loopexit:
	; UNROLL-NO-VF-NEXT: br label [[FOR_END]]			; UNROLL-NO-VF-NEXT: br label [[FOR_END]]
	; UNROLL-NO-VF: for.end:			; UNROLL-NO-VF: for.end:
	; UNROLL-NO-VF-NEXT: ret void			; UNROLL-NO-VF-NEXT: ret void
	;			;
	; SINK-AFTER-LABEL: @recurrence_3(			; SINK-AFTER-LABEL: @recurrence_3(
	; SINK-AFTER-NEXT: entry:			; SINK-AFTER-NEXT: entry:
	; SINK-AFTER-NEXT: [[TMP0:%.*]] = load i16, ptr [[A:%.*]], align 2			; SINK-AFTER-NEXT: [[TMP0:%.*]] = load i16, ptr [[A:%.*]], align 2
	; UNROLL-NO-VF-NEXT: br i1 [[MIN_ITERS_CHECK]], label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]			; UNROLL-NO-VF-NEXT: br i1 [[MIN_ITERS_CHECK]], label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
	; UNROLL-NO-VF: vector.ph:			; UNROLL-NO-VF: vector.ph:
	; UNROLL-NO-VF-NEXT: [[N_MOD_VF:%.*]] = urem i32 [[TMP0]], 2			; UNROLL-NO-VF-NEXT: [[N_MOD_VF:%.*]] = urem i32 [[TMP0]], 2
	; UNROLL-NO-VF-NEXT: [[N_VEC:%.*]] = sub i32 [[TMP0]], [[N_MOD_VF]]			; UNROLL-NO-VF-NEXT: [[N_VEC:%.*]] = sub i32 [[TMP0]], [[N_MOD_VF]]
	; UNROLL-NO-VF-NEXT: [[IND_END:%.*]] = sub i32 [[I_016]], [[N_VEC]]			; UNROLL-NO-VF-NEXT: [[IND_END:%.*]] = sub i32 [[I_016]], [[N_VEC]]
	; UNROLL-NO-VF-NEXT: br label [[VECTOR_BODY:%.*]]			; UNROLL-NO-VF-NEXT: br label [[VECTOR_BODY:%.*]]
	; UNROLL-NO-VF: vector.body:			; UNROLL-NO-VF: vector.body:
	; UNROLL-NO-VF-NEXT: [[INDEX:%.*]] = phi i32 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]			; UNROLL-NO-VF-NEXT: [[INDEX:%.*]] = phi i32 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
	; UNROLL-NO-VF-NEXT: [[VECTOR_RECUR:%.*]] = phi i32 [ [[E_015]], [[VECTOR_PH]] ], [ [[TMP1:%.*]], [[VECTOR_BODY]] ]			; UNROLL-NO-VF-NEXT: [[VECTOR_RECUR:%.*]] = phi i32 [ [[E_015]], [[VECTOR_PH]] ], [ [[TMP2:%.*]], [[VECTOR_BODY]] ]
	; UNROLL-NO-VF-NEXT: [[OFFSET_IDX:%.*]] = sub i32 [[I_016]], [[INDEX]]			; UNROLL-NO-VF-NEXT: [[OFFSET_IDX:%.*]] = sub i32 [[I_016]], [[INDEX]]
	; UNROLL-NO-VF-NEXT: [[TMP2:%.*]] = add i32 [[OFFSET_IDX]], 0			; UNROLL-NO-VF-NEXT: [[TMP1:%.*]] = add i32 [[OFFSET_IDX]], 0
	; UNROLL-NO-VF-NEXT: [[TMP3:%.*]] = add i32 [[OFFSET_IDX]], -1			; UNROLL-NO-VF-NEXT: [[TMP2]] = add i32 [[OFFSET_IDX]], -1
	; UNROLL-NO-VF-NEXT: [[INDEX_NEXT]] = add nuw i32 [[INDEX]], 2			; UNROLL-NO-VF-NEXT: [[INDEX_NEXT]] = add nuw i32 [[INDEX]], 2
	; UNROLL-NO-VF-NEXT: [[TMP1:%.*]] = icmp eq i32 [[INDEX_NEXT]], [[N_VEC]]			; UNROLL-NO-VF-NEXT: [[TMP3:%.*]] = icmp eq i32 [[INDEX_NEXT]], [[N_VEC]]
	; UNROLL-NO-VF-NEXT: br i1 [[TMP1]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP7:![0-9]+]]			; UNROLL-NO-VF-NEXT: br i1 [[TMP3]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP8:![0-9]+]]
	; UNROLL-NO-VF: middle.block:			; UNROLL-NO-VF: middle.block:
	; UNROLL-NO-VF-NEXT: [[CMP_N:%.*]] = icmp eq i32 [[TMP0]], [[N_VEC]]			; UNROLL-NO-VF-NEXT: [[CMP_N:%.*]] = icmp eq i32 [[TMP0]], [[N_VEC]]
	; UNROLL-NO-VF-NEXT: br i1 [[CMP_N]], label [[FOR_COND_CLEANUP3]], label [[SCALAR_PH]]			; UNROLL-NO-VF-NEXT: br i1 [[CMP_N]], label [[FOR_COND_CLEANUP3]], label [[SCALAR_PH]]
	; UNROLL-NO-VF: scalar.ph:			; UNROLL-NO-VF: scalar.ph:
	; UNROLL-NO-VF-NEXT: [[SCALAR_RECUR_INIT:%.*]] = phi i32 [ [[E_015]], [[FOR_COND1_PREHEADER]] ], [ [[TMP3]], [[MIDDLE_BLOCK]] ]			; UNROLL-NO-VF-NEXT: [[SCALAR_RECUR_INIT:%.*]] = phi i32 [ [[E_015]], [[FOR_COND1_PREHEADER]] ], [ [[TMP2]], [[MIDDLE_BLOCK]] ]
	; UNROLL-NO-VF-NEXT: [[BC_RESUME_VAL:%.*]] = phi i32 [ [[IND_END]], [[MIDDLE_BLOCK]] ], [ [[I_016]], [[FOR_COND1_PREHEADER]] ]			; UNROLL-NO-VF-NEXT: [[BC_RESUME_VAL:%.*]] = phi i32 [ [[IND_END]], [[MIDDLE_BLOCK]] ], [ [[I_016]], [[FOR_COND1_PREHEADER]] ]
	; UNROLL-NO-VF-NEXT: br label [[FOR_COND1:%.*]]			; UNROLL-NO-VF-NEXT: br label [[FOR_COND1:%.*]]
	; UNROLL-NO-VF: for.cond.cleanup:			; UNROLL-NO-VF: for.cond.cleanup:
	; UNROLL-NO-VF-NEXT: [[E_1_LCSSA_LCSSA:%.*]] = phi i32 [ [[E_1_LCSSA]], [[FOR_COND_CLEANUP3]] ]			; UNROLL-NO-VF-NEXT: [[E_1_LCSSA_LCSSA:%.*]] = phi i32 [ [[E_1_LCSSA]], [[FOR_COND_CLEANUP3]] ]
	; UNROLL-NO-VF-NEXT: ret i32 [[E_1_LCSSA_LCSSA]]			; UNROLL-NO-VF-NEXT: ret i32 [[E_1_LCSSA_LCSSA]]
	; UNROLL-NO-VF: for.cond1:			; UNROLL-NO-VF: for.cond1:
	; UNROLL-NO-VF-NEXT: [[SCALAR_RECUR:%.*]] = phi i32 [ [[K_0:%.*]], [[FOR_COND1]] ], [ [[SCALAR_RECUR_INIT]], [[SCALAR_PH]] ]			; UNROLL-NO-VF-NEXT: [[SCALAR_RECUR:%.*]] = phi i32 [ [[K_0:%.*]], [[FOR_COND1]] ], [ [[SCALAR_RECUR_INIT]], [[SCALAR_PH]] ]
	; UNROLL-NO-VF-NEXT: [[K_0]] = phi i32 [ [[DEC:%.*]], [[FOR_COND1]] ], [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ]			; UNROLL-NO-VF-NEXT: [[K_0]] = phi i32 [ [[DEC:%.*]], [[FOR_COND1]] ], [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ]
	; UNROLL-NO-VF-NEXT: [[CMP2:%.*]] = icmp sgt i32 [[K_0]], 1			; UNROLL-NO-VF-NEXT: [[CMP2:%.*]] = icmp sgt i32 [[K_0]], 1
	; UNROLL-NO-VF-NEXT: [[DEC]] = add nsw i32 [[K_0]], -1			; UNROLL-NO-VF-NEXT: [[DEC]] = add nsw i32 [[K_0]], -1
	; UNROLL-NO-VF-NEXT: br i1 [[CMP2]], label [[FOR_COND1]], label [[FOR_COND_CLEANUP3]], !llvm.loop [[LOOP8:![0-9]+]]			; UNROLL-NO-VF-NEXT: br i1 [[CMP2]], label [[FOR_COND1]], label [[FOR_COND_CLEANUP3]], !llvm.loop [[LOOP9:![0-9]+]]
	; UNROLL-NO-VF: for.cond.cleanup3:			; UNROLL-NO-VF: for.cond.cleanup3:
	; UNROLL-NO-VF-NEXT: [[E_1_LCSSA]] = phi i32 [ [[SCALAR_RECUR]], [[FOR_COND1]] ], [ [[TMP2]], [[MIDDLE_BLOCK]] ]			; UNROLL-NO-VF-NEXT: [[E_1_LCSSA]] = phi i32 [ [[SCALAR_RECUR]], [[FOR_COND1]] ], [ [[TMP1]], [[MIDDLE_BLOCK]] ]
	; UNROLL-NO-VF-NEXT: [[INC]] = add nuw nsw i32 [[I_016]], 1			; UNROLL-NO-VF-NEXT: [[INC]] = add nuw nsw i32 [[I_016]], 1
	; UNROLL-NO-VF-NEXT: [[EXITCOND:%.*]] = icmp eq i32 [[INC]], 49			; UNROLL-NO-VF-NEXT: [[EXITCOND:%.*]] = icmp eq i32 [[INC]], 49
	; UNROLL-NO-VF-NEXT: [[INDVAR_NEXT]] = add i32 [[INDVAR]], 1			; UNROLL-NO-VF-NEXT: [[INDVAR_NEXT]] = add i32 [[INDVAR]], 1
	; UNROLL-NO-VF-NEXT: br i1 [[EXITCOND]], label [[FOR_COND_CLEANUP:%.*]], label [[FOR_COND1_PREHEADER]]			; UNROLL-NO-VF-NEXT: br i1 [[EXITCOND]], label [[FOR_COND_CLEANUP:%.*]], label [[FOR_COND1_PREHEADER]]
	;			;
	; SINK-AFTER-LABEL: @PR27246(			; SINK-AFTER-LABEL: @PR27246(
	; SINK-AFTER-NEXT: entry:			; SINK-AFTER-NEXT: entry:
	; SINK-AFTER-NEXT: br label [[FOR_COND1_PREHEADER:%.*]]			; SINK-AFTER-NEXT: br label [[FOR_COND1_PREHEADER:%.*]]
	; UNROLL-NO-VF-NEXT: [[TMP5:%.*]] = add nuw nsw i64 [[TMP3]], 2			; UNROLL-NO-VF-NEXT: [[TMP5:%.*]] = add nuw nsw i64 [[TMP3]], 2
	; UNROLL-NO-VF-NEXT: [[TMP6:%.*]] = add nuw nsw i64 [[TMP4]], 2			; UNROLL-NO-VF-NEXT: [[TMP6:%.*]] = add nuw nsw i64 [[TMP4]], 2
	; UNROLL-NO-VF-NEXT: [[TMP7:%.*]] = getelementptr inbounds i32, ptr [[A:%.*]], i64 [[TMP5]]			; UNROLL-NO-VF-NEXT: [[TMP7:%.*]] = getelementptr inbounds i32, ptr [[A:%.*]], i64 [[TMP5]]
	; UNROLL-NO-VF-NEXT: [[TMP8:%.*]] = getelementptr inbounds i32, ptr [[A]], i64 [[TMP6]]			; UNROLL-NO-VF-NEXT: [[TMP8:%.*]] = getelementptr inbounds i32, ptr [[A]], i64 [[TMP6]]
	; UNROLL-NO-VF-NEXT: [[TMP9:%.*]] = load i32, ptr [[TMP7]], align 4			; UNROLL-NO-VF-NEXT: [[TMP9:%.*]] = load i32, ptr [[TMP7]], align 4
	; UNROLL-NO-VF-NEXT: [[TMP10]] = load i32, ptr [[TMP8]], align 4			; UNROLL-NO-VF-NEXT: [[TMP10]] = load i32, ptr [[TMP8]], align 4
	; UNROLL-NO-VF-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 2			; UNROLL-NO-VF-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 2
	; UNROLL-NO-VF-NEXT: [[TMP11:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]			; UNROLL-NO-VF-NEXT: [[TMP11:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]
	; UNROLL-NO-VF-NEXT: br i1 [[TMP11]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP9:![0-9]+]]			; UNROLL-NO-VF-NEXT: br i1 [[TMP11]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP10:![0-9]+]]
	; UNROLL-NO-VF: middle.block:			; UNROLL-NO-VF: middle.block:
	; UNROLL-NO-VF-NEXT: [[CMP_N:%.*]] = icmp eq i64 [[TMP2]], [[N_VEC]]			; UNROLL-NO-VF-NEXT: [[CMP_N:%.*]] = icmp eq i64 [[TMP2]], [[N_VEC]]
	; UNROLL-NO-VF-NEXT: br i1 [[CMP_N]], label [[FOR_END:%.*]], label [[SCALAR_PH]]			; UNROLL-NO-VF-NEXT: br i1 [[CMP_N]], label [[FOR_END:%.*]], label [[SCALAR_PH]]
	; UNROLL-NO-VF: scalar.ph:			; UNROLL-NO-VF: scalar.ph:
	; UNROLL-NO-VF-NEXT: [[SCALAR_RECUR_INIT:%.*]] = phi i32 [ [[PRE_LOAD]], [[ENTRY:%.*]] ], [ [[TMP10]], [[MIDDLE_BLOCK]] ]			; UNROLL-NO-VF-NEXT: [[SCALAR_RECUR_INIT:%.*]] = phi i32 [ [[PRE_LOAD]], [[ENTRY:%.*]] ], [ [[TMP10]], [[MIDDLE_BLOCK]] ]
	; UNROLL-NO-VF-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ [[IND_END]], [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY]] ]			; UNROLL-NO-VF-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ [[IND_END]], [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY]] ]
	; UNROLL-NO-VF-NEXT: br label [[SCALAR_BODY:%.*]]			; UNROLL-NO-VF-NEXT: br label [[SCALAR_BODY:%.*]]
	; UNROLL-NO-VF: scalar.body:			; UNROLL-NO-VF: scalar.body:
	; UNROLL-NO-VF-NEXT: [[I:%.*]] = phi i64 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[I_NEXT:%.*]], [[SCALAR_BODY]] ]			; UNROLL-NO-VF-NEXT: [[I:%.*]] = phi i64 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[I_NEXT:%.*]], [[SCALAR_BODY]] ]
	; UNROLL-NO-VF-NEXT: [[SCALAR_RECUR:%.*]] = phi i32 [ [[SCALAR_RECUR_INIT]], [[SCALAR_PH]] ], [ [[VAR2:%.*]], [[SCALAR_BODY]] ]			; UNROLL-NO-VF-NEXT: [[SCALAR_RECUR:%.*]] = phi i32 [ [[SCALAR_RECUR_INIT]], [[SCALAR_PH]] ], [ [[VAR2:%.*]], [[SCALAR_BODY]] ]
	; UNROLL-NO-VF-NEXT: [[I_NEXT]] = add nuw nsw i64 [[I]], 2			; UNROLL-NO-VF-NEXT: [[I_NEXT]] = add nuw nsw i64 [[I]], 2
	; UNROLL-NO-VF-NEXT: [[VAR1:%.*]] = getelementptr inbounds i32, ptr [[A]], i64 [[I_NEXT]]			; UNROLL-NO-VF-NEXT: [[VAR1:%.*]] = getelementptr inbounds i32, ptr [[A]], i64 [[I_NEXT]]
	; UNROLL-NO-VF-NEXT: [[VAR2]] = load i32, ptr [[VAR1]], align 4			; UNROLL-NO-VF-NEXT: [[VAR2]] = load i32, ptr [[VAR1]], align 4
	; UNROLL-NO-VF-NEXT: [[COND:%.*]] = icmp eq i64 [[I_NEXT]], [[N]]			; UNROLL-NO-VF-NEXT: [[COND:%.*]] = icmp eq i64 [[I_NEXT]], [[N]]
	; UNROLL-NO-VF-NEXT: br i1 [[COND]], label [[FOR_END]], label [[SCALAR_BODY]], !llvm.loop [[LOOP10:![0-9]+]]			; UNROLL-NO-VF-NEXT: br i1 [[COND]], label [[FOR_END]], label [[SCALAR_BODY]], !llvm.loop [[LOOP11:![0-9]+]]
	; UNROLL-NO-VF: for.end:			; UNROLL-NO-VF: for.end:
	; UNROLL-NO-VF-NEXT: ret void			; UNROLL-NO-VF-NEXT: ret void
	;			;
	; SINK-AFTER-LABEL: @PR30183(			; SINK-AFTER-LABEL: @PR30183(
	; SINK-AFTER-NEXT: entry:			; SINK-AFTER-NEXT: entry:
	; SINK-AFTER-NEXT: [[TMP0:%.*]] = add i64 [[N:%.*]], -2			; SINK-AFTER-NEXT: [[TMP0:%.*]] = add i64 [[N:%.*]], -2
	; SINK-AFTER-NEXT: [[TMP1:%.*]] = lshr i64 [[TMP0]], 1			; SINK-AFTER-NEXT: [[TMP1:%.*]] = lshr i64 [[TMP0]], 1
	; SINK-AFTER-NEXT: [[TMP2:%.*]] = add nuw i64 [[TMP1]], 1			; SINK-AFTER-NEXT: [[TMP2:%.*]] = add nuw i64 [[TMP1]], 1
	; UNROLL-NO-VF-NEXT: br label [[VECTOR_BODY:%.*]]			; UNROLL-NO-VF-NEXT: br label [[VECTOR_BODY:%.*]]
	; UNROLL-NO-VF: vector.body:			; UNROLL-NO-VF: vector.body:
	; UNROLL-NO-VF-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]			; UNROLL-NO-VF-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
	; UNROLL-NO-VF-NEXT: [[VECTOR_RECUR:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[TMP1:%.*]], [[VECTOR_BODY]] ]			; UNROLL-NO-VF-NEXT: [[VECTOR_RECUR:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[TMP1:%.*]], [[VECTOR_BODY]] ]
	; UNROLL-NO-VF-NEXT: [[TMP0:%.*]] = add i64 0, 1			; UNROLL-NO-VF-NEXT: [[TMP0:%.*]] = add i64 0, 1
	; UNROLL-NO-VF-NEXT: [[TMP1]] = add i64 0, 1			; UNROLL-NO-VF-NEXT: [[TMP1]] = add i64 0, 1
	; UNROLL-NO-VF-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 2			; UNROLL-NO-VF-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 2
	; UNROLL-NO-VF-NEXT: [[TMP2:%.*]] = icmp eq i64 [[INDEX_NEXT]], undef			; UNROLL-NO-VF-NEXT: [[TMP2:%.*]] = icmp eq i64 [[INDEX_NEXT]], undef
	; UNROLL-NO-VF-NEXT: br i1 [[TMP2]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP11:![0-9]+]]			; UNROLL-NO-VF-NEXT: br i1 [[TMP2]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP12:![0-9]+]]
	; UNROLL-NO-VF: middle.block:			; UNROLL-NO-VF: middle.block:
	; UNROLL-NO-VF-NEXT: [[CMP_N:%.*]] = icmp eq i64 undef, undef			; UNROLL-NO-VF-NEXT: [[CMP_N:%.*]] = icmp eq i64 undef, undef
	; UNROLL-NO-VF-NEXT: br i1 [[CMP_N]], label [[FOR_END:%.*]], label [[SCALAR_PH]]			; UNROLL-NO-VF-NEXT: br i1 [[CMP_N]], label [[FOR_END:%.*]], label [[SCALAR_PH]]
	; UNROLL-NO-VF: scalar.ph:			; UNROLL-NO-VF: scalar.ph:
	; UNROLL-NO-VF-NEXT: [[SCALAR_RECUR_INIT:%.*]] = phi i64 [ 0, [[ENTRY:%.*]] ], [ [[TMP1]], [[MIDDLE_BLOCK]] ]			; UNROLL-NO-VF-NEXT: [[SCALAR_RECUR_INIT:%.*]] = phi i64 [ 0, [[ENTRY:%.*]] ], [ [[TMP1]], [[MIDDLE_BLOCK]] ]
	; UNROLL-NO-VF-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ undef, [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY]] ]			; UNROLL-NO-VF-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ undef, [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY]] ]
	; UNROLL-NO-VF-NEXT: br label [[SCALAR_BODY:%.*]]			; UNROLL-NO-VF-NEXT: br label [[SCALAR_BODY:%.*]]
	; UNROLL-NO-VF: scalar.body:			; UNROLL-NO-VF: scalar.body:
	; UNROLL-NO-VF-NEXT: [[I:%.*]] = phi i64 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[I_NEXT:%.*]], [[SCALAR_BODY]] ]			; UNROLL-NO-VF-NEXT: [[I:%.*]] = phi i64 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[I_NEXT:%.*]], [[SCALAR_BODY]] ]
	; UNROLL-NO-VF-NEXT: [[SCALAR_RECUR:%.*]] = phi i64 [ [[SCALAR_RECUR_INIT]], [[SCALAR_PH]] ], [ [[VAR3:%.*]], [[SCALAR_BODY]] ]			; UNROLL-NO-VF-NEXT: [[SCALAR_RECUR:%.*]] = phi i64 [ [[SCALAR_RECUR_INIT]], [[SCALAR_PH]] ], [ [[VAR3:%.*]], [[SCALAR_BODY]] ]
	; UNROLL-NO-VF-NEXT: [[VAR3]] = add i64 0, 1			; UNROLL-NO-VF-NEXT: [[VAR3]] = add i64 0, 1
	; UNROLL-NO-VF-NEXT: [[I_NEXT]] = add nuw nsw i64 [[I]], 1			; UNROLL-NO-VF-NEXT: [[I_NEXT]] = add nuw nsw i64 [[I]], 1
	; UNROLL-NO-VF-NEXT: [[COND:%.*]] = icmp eq i64 [[I_NEXT]], undef			; UNROLL-NO-VF-NEXT: [[COND:%.*]] = icmp eq i64 [[I_NEXT]], undef
	; UNROLL-NO-VF-NEXT: br i1 [[COND]], label [[FOR_END]], label [[SCALAR_BODY]], !llvm.loop [[LOOP12:![0-9]+]]			; UNROLL-NO-VF-NEXT: br i1 [[COND]], label [[FOR_END]], label [[SCALAR_BODY]], !llvm.loop [[LOOP13:![0-9]+]]
	; UNROLL-NO-VF: for.end:			; UNROLL-NO-VF: for.end:
	; UNROLL-NO-VF-NEXT: ret void			; UNROLL-NO-VF-NEXT: ret void
	;			;
	; SINK-AFTER-LABEL: @constant_folded_previous_value(			; SINK-AFTER-LABEL: @constant_folded_previous_value(
	; SINK-AFTER-NEXT: entry:			; SINK-AFTER-NEXT: entry:
	; SINK-AFTER-NEXT: br i1 false, label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]			; SINK-AFTER-NEXT: br i1 false, label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
	; SINK-AFTER: vector.ph:			; SINK-AFTER: vector.ph:
	; SINK-AFTER-NEXT: br label [[VECTOR_BODY:%.*]]			; SINK-AFTER-NEXT: br label [[VECTOR_BODY:%.*]]
	; UNROLL-NO-VF-NEXT: [[INDEX:%.*]] = phi i32 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]			; UNROLL-NO-VF-NEXT: [[INDEX:%.*]] = phi i32 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
	; UNROLL-NO-VF-NEXT: [[VECTOR_RECUR:%.*]] = phi i32 [ 0, [[VECTOR_PH]] ], [ [[TMP3:%.*]], [[VECTOR_BODY]] ]			; UNROLL-NO-VF-NEXT: [[VECTOR_RECUR:%.*]] = phi i32 [ 0, [[VECTOR_PH]] ], [ [[TMP3:%.*]], [[VECTOR_BODY]] ]
	; UNROLL-NO-VF-NEXT: [[TMP0:%.*]] = add i32 [[INDEX]], 0			; UNROLL-NO-VF-NEXT: [[TMP0:%.*]] = add i32 [[INDEX]], 0
	; UNROLL-NO-VF-NEXT: [[TMP1:%.*]] = add i32 [[INDEX]], 1			; UNROLL-NO-VF-NEXT: [[TMP1:%.*]] = add i32 [[INDEX]], 1
	; UNROLL-NO-VF-NEXT: [[TMP2:%.*]] = add i32 [[TMP0]], [[X:%.*]]			; UNROLL-NO-VF-NEXT: [[TMP2:%.*]] = add i32 [[TMP0]], [[X:%.*]]
	; UNROLL-NO-VF-NEXT: [[TMP3]] = add i32 [[TMP1]], [[X]]			; UNROLL-NO-VF-NEXT: [[TMP3]] = add i32 [[TMP1]], [[X]]
	; UNROLL-NO-VF-NEXT: [[INDEX_NEXT]] = add nuw i32 [[INDEX]], 2			; UNROLL-NO-VF-NEXT: [[INDEX_NEXT]] = add nuw i32 [[INDEX]], 2
	; UNROLL-NO-VF-NEXT: [[TMP4:%.*]] = icmp eq i32 [[INDEX_NEXT]], 96			; UNROLL-NO-VF-NEXT: [[TMP4:%.*]] = icmp eq i32 [[INDEX_NEXT]], 96
	; UNROLL-NO-VF-NEXT: br i1 [[TMP4]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP13:![0-9]+]]			; UNROLL-NO-VF-NEXT: br i1 [[TMP4]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP14:![0-9]+]]
	; UNROLL-NO-VF: middle.block:			; UNROLL-NO-VF: middle.block:
	; UNROLL-NO-VF-NEXT: [[CMP_N:%.*]] = icmp eq i32 96, 96			; UNROLL-NO-VF-NEXT: [[CMP_N:%.*]] = icmp eq i32 96, 96
	; UNROLL-NO-VF-NEXT: br i1 [[CMP_N]], label [[FOR_END:%.*]], label [[SCALAR_PH]]			; UNROLL-NO-VF-NEXT: br i1 [[CMP_N]], label [[FOR_END:%.*]], label [[SCALAR_PH]]
	; UNROLL-NO-VF: scalar.ph:			; UNROLL-NO-VF: scalar.ph:
	; UNROLL-NO-VF-NEXT: [[SCALAR_RECUR_INIT:%.*]] = phi i32 [ 0, [[ENTRY:%.*]] ], [ [[TMP3]], [[MIDDLE_BLOCK]] ]			; UNROLL-NO-VF-NEXT: [[SCALAR_RECUR_INIT:%.*]] = phi i32 [ 0, [[ENTRY:%.*]] ], [ [[TMP3]], [[MIDDLE_BLOCK]] ]
	; UNROLL-NO-VF-NEXT: [[BC_RESUME_VAL:%.*]] = phi i32 [ 96, [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY]] ]			; UNROLL-NO-VF-NEXT: [[BC_RESUME_VAL:%.*]] = phi i32 [ 96, [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY]] ]
	; UNROLL-NO-VF-NEXT: br label [[FOR_BODY:%.*]]			; UNROLL-NO-VF-NEXT: br label [[FOR_BODY:%.*]]
	; UNROLL-NO-VF: for.body:			; UNROLL-NO-VF: for.body:
	; UNROLL-NO-VF-NEXT: [[INC_PHI:%.*]] = phi i32 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[INC:%.*]], [[FOR_BODY]] ]			; UNROLL-NO-VF-NEXT: [[INC_PHI:%.*]] = phi i32 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[INC:%.*]], [[FOR_BODY]] ]
	; UNROLL-NO-VF-NEXT: [[SCALAR_RECUR:%.*]] = phi i32 [ [[SCALAR_RECUR_INIT]], [[SCALAR_PH]] ], [ [[ADDX:%.*]], [[FOR_BODY]] ]			; UNROLL-NO-VF-NEXT: [[SCALAR_RECUR:%.*]] = phi i32 [ [[SCALAR_RECUR_INIT]], [[SCALAR_PH]] ], [ [[ADDX:%.*]], [[FOR_BODY]] ]
	; UNROLL-NO-VF-NEXT: [[INC]] = add i32 [[INC_PHI]], 1			; UNROLL-NO-VF-NEXT: [[INC]] = add i32 [[INC_PHI]], 1
	; UNROLL-NO-VF-NEXT: [[BC:%.*]] = zext i32 [[INC_PHI]] to i64			; UNROLL-NO-VF-NEXT: [[BC:%.*]] = zext i32 [[INC_PHI]] to i64
	; UNROLL-NO-VF-NEXT: [[ADDX]] = add i32 [[INC_PHI]], [[X]]			; UNROLL-NO-VF-NEXT: [[ADDX]] = add i32 [[INC_PHI]], [[X]]
	; UNROLL-NO-VF-NEXT: [[CMP:%.*]] = icmp eq i32 [[INC_PHI]], 95			; UNROLL-NO-VF-NEXT: [[CMP:%.*]] = icmp eq i32 [[INC_PHI]], 95
	; UNROLL-NO-VF-NEXT: br i1 [[CMP]], label [[FOR_END]], label [[FOR_BODY]], !llvm.loop [[LOOP14:![0-9]+]]			; UNROLL-NO-VF-NEXT: br i1 [[CMP]], label [[FOR_END]], label [[FOR_BODY]], !llvm.loop [[LOOP15:![0-9]+]]
	; UNROLL-NO-VF: for.end:			; UNROLL-NO-VF: for.end:
	; UNROLL-NO-VF-NEXT: [[VAL_PHI_LCSSA:%.*]] = phi i32 [ [[SCALAR_RECUR]], [[FOR_BODY]] ], [ [[TMP2]], [[MIDDLE_BLOCK]] ]			; UNROLL-NO-VF-NEXT: [[VAL_PHI_LCSSA:%.*]] = phi i32 [ [[SCALAR_RECUR]], [[FOR_BODY]] ], [ [[TMP2]], [[MIDDLE_BLOCK]] ]
	; UNROLL-NO-VF-NEXT: ret i32 [[VAL_PHI_LCSSA]]			; UNROLL-NO-VF-NEXT: ret i32 [[VAL_PHI_LCSSA]]
	;			;
	; SINK-AFTER-LABEL: @extract_second_last_iteration(			; SINK-AFTER-LABEL: @extract_second_last_iteration(
	; SINK-AFTER-NEXT: entry:			; SINK-AFTER-NEXT: entry:
	; SINK-AFTER-NEXT: br i1 false, label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]			; SINK-AFTER-NEXT: br i1 false, label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
	; SINK-AFTER: vector.ph:			; SINK-AFTER: vector.ph:
	; UNROLL-NO-VF-NEXT: [[TMP10:%.*]] = fcmp une double [[TMP8]], 0.000000e+00			; UNROLL-NO-VF-NEXT: [[TMP10:%.*]] = fcmp une double [[TMP8]], 0.000000e+00
	; UNROLL-NO-VF-NEXT: [[TMP11:%.*]] = fcmp une double [[TMP9]], 0.000000e+00			; UNROLL-NO-VF-NEXT: [[TMP11:%.*]] = fcmp une double [[TMP9]], 0.000000e+00
	; UNROLL-NO-VF-NEXT: [[TMP12:%.*]] = zext i1 [[TMP10]] to i32			; UNROLL-NO-VF-NEXT: [[TMP12:%.*]] = zext i1 [[TMP10]] to i32
	; UNROLL-NO-VF-NEXT: [[TMP13:%.*]] = zext i1 [[TMP11]] to i32			; UNROLL-NO-VF-NEXT: [[TMP13:%.*]] = zext i1 [[TMP11]] to i32
	; UNROLL-NO-VF-NEXT: [[TMP14]] = add i32 [[VEC_PHI]], [[TMP12]]			; UNROLL-NO-VF-NEXT: [[TMP14]] = add i32 [[VEC_PHI]], [[TMP12]]
	; UNROLL-NO-VF-NEXT: [[TMP15]] = add i32 [[VEC_PHI3]], [[TMP13]]			; UNROLL-NO-VF-NEXT: [[TMP15]] = add i32 [[VEC_PHI3]], [[TMP13]]
	; UNROLL-NO-VF-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 2			; UNROLL-NO-VF-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 2
	; UNROLL-NO-VF-NEXT: [[TMP16:%.*]] = icmp eq i64 [[INDEX_NEXT]], 10240			; UNROLL-NO-VF-NEXT: [[TMP16:%.*]] = icmp eq i64 [[INDEX_NEXT]], 10240
	; UNROLL-NO-VF-NEXT: br i1 [[TMP16]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP15:![0-9]+]]			; UNROLL-NO-VF-NEXT: br i1 [[TMP16]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP16:![0-9]+]]
	; UNROLL-NO-VF: middle.block:			; UNROLL-NO-VF: middle.block:
	; UNROLL-NO-VF-NEXT: [[BIN_RDX:%.*]] = add i32 [[TMP15]], [[TMP14]]			; UNROLL-NO-VF-NEXT: [[BIN_RDX:%.*]] = add i32 [[TMP15]], [[TMP14]]
	; UNROLL-NO-VF-NEXT: [[CMP_N:%.*]] = icmp eq i64 10240, 10240			; UNROLL-NO-VF-NEXT: [[CMP_N:%.*]] = icmp eq i64 10240, 10240
	; UNROLL-NO-VF-NEXT: br i1 [[CMP_N]], label [[FOR_COND_CLEANUP:%.*]], label [[SCALAR_PH]]			; UNROLL-NO-VF-NEXT: br i1 [[CMP_N]], label [[FOR_COND_CLEANUP:%.*]], label [[SCALAR_PH]]
	; UNROLL-NO-VF: scalar.ph:			; UNROLL-NO-VF: scalar.ph:
	; UNROLL-NO-VF-NEXT: [[SCALAR_RECUR_INIT:%.*]] = phi double [ [[J]], [[ENTRY:%.*]] ], [ [[TMP7]], [[MIDDLE_BLOCK]] ]			; UNROLL-NO-VF-NEXT: [[SCALAR_RECUR_INIT:%.*]] = phi double [ [[J]], [[ENTRY:%.*]] ], [ [[TMP7]], [[MIDDLE_BLOCK]] ]
	; UNROLL-NO-VF-NEXT: [[BC_RESUME_VAL:%.*]] = phi ptr [ [[IND_END]], [[MIDDLE_BLOCK]] ], [ [[B]], [[ENTRY]] ]			; UNROLL-NO-VF-NEXT: [[BC_RESUME_VAL:%.*]] = phi ptr [ [[IND_END]], [[MIDDLE_BLOCK]] ], [ [[B]], [[ENTRY]] ]
	; UNROLL-NO-VF-NEXT: [[BC_RESUME_VAL1:%.*]] = phi i32 [ 10240, [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY]] ]			; UNROLL-NO-VF-NEXT: [[BC_RESUME_VAL1:%.*]] = phi i32 [ 10240, [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY]] ]
	[11 lines not shown]
	; UNROLL-NO-VF-NEXT: [[TMP17]] = load double, ptr [[ARRAYIDX]], align 8			; UNROLL-NO-VF-NEXT: [[TMP17]] = load double, ptr [[ARRAYIDX]], align 8
	; UNROLL-NO-VF-NEXT: [[MUL:%.*]] = fmul double [[SCALAR_RECUR]], [[TMP17]]			; UNROLL-NO-VF-NEXT: [[MUL:%.*]] = fmul double [[SCALAR_RECUR]], [[TMP17]]
	; UNROLL-NO-VF-NEXT: [[TOBOOL:%.*]] = fcmp une double [[MUL]], 0.000000e+00			; UNROLL-NO-VF-NEXT: [[TOBOOL:%.*]] = fcmp une double [[MUL]], 0.000000e+00
	; UNROLL-NO-VF-NEXT: [[INC:%.*]] = zext i1 [[TOBOOL]] to i32			; UNROLL-NO-VF-NEXT: [[INC:%.*]] = zext i1 [[TOBOOL]] to i32
	; UNROLL-NO-VF-NEXT: [[A_1]] = add nsw i32 [[A_010]], [[INC]]			; UNROLL-NO-VF-NEXT: [[A_1]] = add nsw i32 [[A_010]], [[INC]]
	; UNROLL-NO-VF-NEXT: [[INC1]] = add nuw nsw i32 [[I_011]], 1			; UNROLL-NO-VF-NEXT: [[INC1]] = add nuw nsw i32 [[I_011]], 1
	; UNROLL-NO-VF-NEXT: [[ADD_PTR]] = getelementptr inbounds double, ptr [[B_ADDR_012]], i64 25			; UNROLL-NO-VF-NEXT: [[ADD_PTR]] = getelementptr inbounds double, ptr [[B_ADDR_012]], i64 25
	; UNROLL-NO-VF-NEXT: [[EXITCOND:%.*]] = icmp eq i32 [[INC1]], 10240			; UNROLL-NO-VF-NEXT: [[EXITCOND:%.*]] = icmp eq i32 [[INC1]], 10240
	; UNROLL-NO-VF-NEXT: br i1 [[EXITCOND]], label [[FOR_COND_CLEANUP]], label [[FOR_BODY]], !llvm.loop [[LOOP16:![0-9]+]]			; UNROLL-NO-VF-NEXT: br i1 [[EXITCOND]], label [[FOR_COND_CLEANUP]], label [[FOR_BODY]], !llvm.loop [[LOOP17:![0-9]+]]
	;			;
	; SINK-AFTER-LABEL: @PR33613(			; SINK-AFTER-LABEL: @PR33613(
	; SINK-AFTER-NEXT: entry:			; SINK-AFTER-NEXT: entry:
	; SINK-AFTER-NEXT: [[IDXPROM:%.*]] = sext i32 [[D:%.*]] to i64			; SINK-AFTER-NEXT: [[IDXPROM:%.*]] = sext i32 [[D:%.*]] to i64
	; SINK-AFTER-NEXT: br i1 false, label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]			; SINK-AFTER-NEXT: br i1 false, label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
	; SINK-AFTER: vector.ph:			; SINK-AFTER: vector.ph:
	; SINK-AFTER-NEXT: [[IND_END:%.*]] = getelementptr i8, ptr [[B:%.*]], i64 2048000			; SINK-AFTER-NEXT: [[IND_END:%.*]] = getelementptr i8, ptr [[B:%.*]], i64 2048000
	; SINK-AFTER-NEXT: [[VECTOR_RECUR_INIT:%.*]] = insertelement <4 x double> poison, double [[J:%.*]], i32 3			; SINK-AFTER-NEXT: [[VECTOR_RECUR_INIT:%.*]] = insertelement <4 x double> poison, double [[J:%.*]], i32 3
	[190 lines not shown]
	; UNROLL-NO-VF-NEXT: [[TMP12:%.*]] = mul nsw i32 [[TMP10]], [[TMP8]]			; UNROLL-NO-VF-NEXT: [[TMP12:%.*]] = mul nsw i32 [[TMP10]], [[TMP8]]
	; UNROLL-NO-VF-NEXT: [[TMP13:%.*]] = mul nsw i32 [[TMP11]], [[TMP9]]			; UNROLL-NO-VF-NEXT: [[TMP13:%.*]] = mul nsw i32 [[TMP11]], [[TMP9]]
	; UNROLL-NO-VF-NEXT: [[TMP14:%.*]] = getelementptr inbounds i32, ptr [[B:%.*]], i64 [[TMP0]]			; UNROLL-NO-VF-NEXT: [[TMP14:%.*]] = getelementptr inbounds i32, ptr [[B:%.*]], i64 [[TMP0]]
	; UNROLL-NO-VF-NEXT: [[TMP15:%.*]] = getelementptr inbounds i32, ptr [[B]], i64 [[TMP1]]			; UNROLL-NO-VF-NEXT: [[TMP15:%.*]] = getelementptr inbounds i32, ptr [[B]], i64 [[TMP1]]
	; UNROLL-NO-VF-NEXT: store i32 [[TMP12]], ptr [[TMP14]], align 4			; UNROLL-NO-VF-NEXT: store i32 [[TMP12]], ptr [[TMP14]], align 4
	; UNROLL-NO-VF-NEXT: store i32 [[TMP13]], ptr [[TMP15]], align 4			; UNROLL-NO-VF-NEXT: store i32 [[TMP13]], ptr [[TMP15]], align 4
	; UNROLL-NO-VF-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 2			; UNROLL-NO-VF-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 2
	; UNROLL-NO-VF-NEXT: [[TMP16:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]			; UNROLL-NO-VF-NEXT: [[TMP16:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]
	; UNROLL-NO-VF-NEXT: br i1 [[TMP16]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP17:![0-9]+]]			; UNROLL-NO-VF-NEXT: br i1 [[TMP16]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP18:![0-9]+]]
	; UNROLL-NO-VF: middle.block:			; UNROLL-NO-VF: middle.block:
	; UNROLL-NO-VF-NEXT: [[CMP_N:%.*]] = icmp eq i64 [[N]], [[N_VEC]]			; UNROLL-NO-VF-NEXT: [[CMP_N:%.*]] = icmp eq i64 [[N]], [[N_VEC]]
	; UNROLL-NO-VF-NEXT: br i1 [[CMP_N]], label [[FOR_END:%.*]], label [[SCALAR_PH]]			; UNROLL-NO-VF-NEXT: br i1 [[CMP_N]], label [[FOR_END:%.*]], label [[SCALAR_PH]]
	; UNROLL-NO-VF: scalar.ph:			; UNROLL-NO-VF: scalar.ph:
	; UNROLL-NO-VF-NEXT: [[SCALAR_RECUR_INIT:%.*]] = phi i16 [ [[DOTPRE]], [[ENTRY:%.*]] ], [ [[TMP7]], [[MIDDLE_BLOCK]] ]			; UNROLL-NO-VF-NEXT: [[SCALAR_RECUR_INIT:%.*]] = phi i16 [ [[DOTPRE]], [[ENTRY:%.*]] ], [ [[TMP7]], [[MIDDLE_BLOCK]] ]
	; UNROLL-NO-VF-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ [[N_VEC]], [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY]] ]			; UNROLL-NO-VF-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ [[N_VEC]], [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY]] ]
	; UNROLL-NO-VF-NEXT: br label [[FOR_BODY:%.*]]			; UNROLL-NO-VF-NEXT: br label [[FOR_BODY:%.*]]
	; UNROLL-NO-VF: for.body:			; UNROLL-NO-VF: for.body:
	; UNROLL-NO-VF-NEXT: [[SCALAR_RECUR:%.*]] = phi i16 [ [[SCALAR_RECUR_INIT]], [[SCALAR_PH]] ], [ [[TMP17:%.*]], [[FOR_BODY]] ]			; UNROLL-NO-VF-NEXT: [[SCALAR_RECUR:%.*]] = phi i16 [ [[SCALAR_RECUR_INIT]], [[SCALAR_PH]] ], [ [[TMP17:%.*]], [[FOR_BODY]] ]
	; UNROLL-NO-VF-NEXT: [[INDVARS_IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[INDVARS_IV_NEXT:%.*]], [[FOR_BODY]] ]			; UNROLL-NO-VF-NEXT: [[INDVARS_IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[INDVARS_IV_NEXT:%.*]], [[FOR_BODY]] ]
	; UNROLL-NO-VF-NEXT: [[CONV:%.*]] = sext i16 [[SCALAR_RECUR]] to i32			; UNROLL-NO-VF-NEXT: [[CONV:%.*]] = sext i16 [[SCALAR_RECUR]] to i32
	; UNROLL-NO-VF-NEXT: [[INDVARS_IV_NEXT]] = add nuw nsw i64 [[INDVARS_IV]], 1			; UNROLL-NO-VF-NEXT: [[INDVARS_IV_NEXT]] = add nuw nsw i64 [[INDVARS_IV]], 1
	; UNROLL-NO-VF-NEXT: [[ARRAYIDX2:%.*]] = getelementptr inbounds i16, ptr [[A]], i64 [[INDVARS_IV_NEXT]]			; UNROLL-NO-VF-NEXT: [[ARRAYIDX2:%.*]] = getelementptr inbounds i16, ptr [[A]], i64 [[INDVARS_IV_NEXT]]
	; UNROLL-NO-VF-NEXT: [[TMP17]] = load i16, ptr [[ARRAYIDX2]], align 2			; UNROLL-NO-VF-NEXT: [[TMP17]] = load i16, ptr [[ARRAYIDX2]], align 2
	; UNROLL-NO-VF-NEXT: [[CONV3:%.*]] = sext i16 [[TMP17]] to i32			; UNROLL-NO-VF-NEXT: [[CONV3:%.*]] = sext i16 [[TMP17]] to i32
	; UNROLL-NO-VF-NEXT: [[MUL:%.*]] = mul nsw i32 [[CONV3]], [[CONV]]			; UNROLL-NO-VF-NEXT: [[MUL:%.*]] = mul nsw i32 [[CONV3]], [[CONV]]
	; UNROLL-NO-VF-NEXT: [[ARRAYIDX5:%.*]] = getelementptr inbounds i32, ptr [[B]], i64 [[INDVARS_IV]]			; UNROLL-NO-VF-NEXT: [[ARRAYIDX5:%.*]] = getelementptr inbounds i32, ptr [[B]], i64 [[INDVARS_IV]]
	; UNROLL-NO-VF-NEXT: store i32 [[MUL]], ptr [[ARRAYIDX5]], align 4			; UNROLL-NO-VF-NEXT: store i32 [[MUL]], ptr [[ARRAYIDX5]], align 4
	; UNROLL-NO-VF-NEXT: [[EXITCOND:%.*]] = icmp eq i64 [[INDVARS_IV_NEXT]], [[N]]			; UNROLL-NO-VF-NEXT: [[EXITCOND:%.*]] = icmp eq i64 [[INDVARS_IV_NEXT]], [[N]]
	; UNROLL-NO-VF-NEXT: br i1 [[EXITCOND]], label [[FOR_END]], label [[FOR_BODY]], !llvm.loop [[LOOP18:![0-9]+]]			; UNROLL-NO-VF-NEXT: br i1 [[EXITCOND]], label [[FOR_END]], label [[FOR_BODY]], !llvm.loop [[LOOP19:![0-9]+]]
	; UNROLL-NO-VF: for.end:			; UNROLL-NO-VF: for.end:
	; UNROLL-NO-VF-NEXT: ret void			; UNROLL-NO-VF-NEXT: ret void
	;			;
	; SINK-AFTER-LABEL: @sink_after(			; SINK-AFTER-LABEL: @sink_after(
	; SINK-AFTER-NEXT: entry:			; SINK-AFTER-NEXT: entry:
	; SINK-AFTER-NEXT: [[DOTPRE:%.*]] = load i16, ptr [[A:%.*]], align 2			; SINK-AFTER-NEXT: [[DOTPRE:%.*]] = load i16, ptr [[A:%.*]], align 2
	; SINK-AFTER-NEXT: [[MIN_ITERS_CHECK:%.*]] = icmp ult i64 [[N:%.*]], 4			; SINK-AFTER-NEXT: [[MIN_ITERS_CHECK:%.*]] = icmp ult i64 [[N:%.*]], 4
	; SINK-AFTER-NEXT: br i1 [[MIN_ITERS_CHECK]], label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]			; SINK-AFTER-NEXT: br i1 [[MIN_ITERS_CHECK]], label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
	[208 lines not shown]
	; UNROLL-NO-VF-NEXT: [[TMP12:%.*]] = mul nsw i32 [[TMP10]], [[TMP8]]			; UNROLL-NO-VF-NEXT: [[TMP12:%.*]] = mul nsw i32 [[TMP10]], [[TMP8]]
	; UNROLL-NO-VF-NEXT: [[TMP13:%.*]] = mul nsw i32 [[TMP11]], [[TMP9]]			; UNROLL-NO-VF-NEXT: [[TMP13:%.*]] = mul nsw i32 [[TMP11]], [[TMP9]]
	; UNROLL-NO-VF-NEXT: [[TMP14:%.*]] = getelementptr inbounds i32, ptr [[B:%.*]], i64 [[TMP0]]			; UNROLL-NO-VF-NEXT: [[TMP14:%.*]] = getelementptr inbounds i32, ptr [[B:%.*]], i64 [[TMP0]]
	; UNROLL-NO-VF-NEXT: [[TMP15:%.*]] = getelementptr inbounds i32, ptr [[B]], i64 [[TMP1]]			; UNROLL-NO-VF-NEXT: [[TMP15:%.*]] = getelementptr inbounds i32, ptr [[B]], i64 [[TMP1]]
	; UNROLL-NO-VF-NEXT: store i32 [[TMP12]], ptr [[TMP14]], align 4			; UNROLL-NO-VF-NEXT: store i32 [[TMP12]], ptr [[TMP14]], align 4
	; UNROLL-NO-VF-NEXT: store i32 [[TMP13]], ptr [[TMP15]], align 4			; UNROLL-NO-VF-NEXT: store i32 [[TMP13]], ptr [[TMP15]], align 4
	; UNROLL-NO-VF-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 2			; UNROLL-NO-VF-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 2
	; UNROLL-NO-VF-NEXT: [[TMP16:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]			; UNROLL-NO-VF-NEXT: [[TMP16:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]
	; UNROLL-NO-VF-NEXT: br i1 [[TMP16]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP19:![0-9]+]]			; UNROLL-NO-VF-NEXT: br i1 [[TMP16]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP20:![0-9]+]]
	; UNROLL-NO-VF: middle.block:			; UNROLL-NO-VF: middle.block:
	; UNROLL-NO-VF-NEXT: [[CMP_N:%.*]] = icmp eq i64 [[N]], [[N_VEC]]			; UNROLL-NO-VF-NEXT: [[CMP_N:%.*]] = icmp eq i64 [[N]], [[N_VEC]]
	; UNROLL-NO-VF-NEXT: br i1 [[CMP_N]], label [[FOR_END:%.*]], label [[SCALAR_PH]]			; UNROLL-NO-VF-NEXT: br i1 [[CMP_N]], label [[FOR_END:%.*]], label [[SCALAR_PH]]
	; UNROLL-NO-VF: scalar.ph:			; UNROLL-NO-VF: scalar.ph:
	; UNROLL-NO-VF-NEXT: [[SCALAR_RECUR_INIT:%.*]] = phi i16 [ [[DOTPRE]], [[ENTRY:%.*]] ], [ [[TMP7]], [[MIDDLE_BLOCK]] ]			; UNROLL-NO-VF-NEXT: [[SCALAR_RECUR_INIT:%.*]] = phi i16 [ [[DOTPRE]], [[ENTRY:%.*]] ], [ [[TMP7]], [[MIDDLE_BLOCK]] ]
	; UNROLL-NO-VF-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ [[N_VEC]], [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY]] ]			; UNROLL-NO-VF-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ [[N_VEC]], [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY]] ]
	; UNROLL-NO-VF-NEXT: br label [[FOR_BODY:%.*]]			; UNROLL-NO-VF-NEXT: br label [[FOR_BODY:%.*]]
	; UNROLL-NO-VF: for.body:			; UNROLL-NO-VF: for.body:
	; UNROLL-NO-VF-NEXT: [[SCALAR_RECUR:%.*]] = phi i16 [ [[SCALAR_RECUR_INIT]], [[SCALAR_PH]] ], [ [[TMP17:%.*]], [[FOR_BODY]] ]			; UNROLL-NO-VF-NEXT: [[SCALAR_RECUR:%.*]] = phi i16 [ [[SCALAR_RECUR_INIT]], [[SCALAR_PH]] ], [ [[TMP17:%.*]], [[FOR_BODY]] ]
	; UNROLL-NO-VF-NEXT: [[INDVARS_IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[INDVARS_IV_NEXT:%.*]], [[FOR_BODY]] ]			; UNROLL-NO-VF-NEXT: [[INDVARS_IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[INDVARS_IV_NEXT:%.*]], [[FOR_BODY]] ]
	; UNROLL-NO-VF-NEXT: [[ARRAYCIDX:%.*]] = getelementptr inbounds i32, ptr [[C]], i64 [[INDVARS_IV]]			; UNROLL-NO-VF-NEXT: [[ARRAYCIDX:%.*]] = getelementptr inbounds i32, ptr [[C]], i64 [[INDVARS_IV]]
	; UNROLL-NO-VF-NEXT: [[CUR_INDEX:%.*]] = getelementptr inbounds [2 x i16], ptr [[A]], i64 [[INDVARS_IV]], i64 1			; UNROLL-NO-VF-NEXT: [[CUR_INDEX:%.*]] = getelementptr inbounds [2 x i16], ptr [[A]], i64 [[INDVARS_IV]], i64 1
	; UNROLL-NO-VF-NEXT: store i32 7, ptr [[ARRAYCIDX]], align 4			; UNROLL-NO-VF-NEXT: store i32 7, ptr [[ARRAYCIDX]], align 4
	; UNROLL-NO-VF-NEXT: [[CONV:%.*]] = sext i16 [[SCALAR_RECUR]] to i32			; UNROLL-NO-VF-NEXT: [[CONV:%.*]] = sext i16 [[SCALAR_RECUR]] to i32
	; UNROLL-NO-VF-NEXT: [[TMP17]] = load i16, ptr [[CUR_INDEX]], align 2			; UNROLL-NO-VF-NEXT: [[TMP17]] = load i16, ptr [[CUR_INDEX]], align 2
	; UNROLL-NO-VF-NEXT: [[CONV3:%.*]] = sext i16 [[TMP17]] to i32			; UNROLL-NO-VF-NEXT: [[CONV3:%.*]] = sext i16 [[TMP17]] to i32
	; UNROLL-NO-VF-NEXT: [[MUL:%.*]] = mul nsw i32 [[CONV3]], [[CONV]]			; UNROLL-NO-VF-NEXT: [[MUL:%.*]] = mul nsw i32 [[CONV3]], [[CONV]]
	; UNROLL-NO-VF-NEXT: [[ARRAYIDX5:%.*]] = getelementptr inbounds i32, ptr [[B]], i64 [[INDVARS_IV]]			; UNROLL-NO-VF-NEXT: [[ARRAYIDX5:%.*]] = getelementptr inbounds i32, ptr [[B]], i64 [[INDVARS_IV]]
	; UNROLL-NO-VF-NEXT: store i32 [[MUL]], ptr [[ARRAYIDX5]], align 4			; UNROLL-NO-VF-NEXT: store i32 [[MUL]], ptr [[ARRAYIDX5]], align 4
	; UNROLL-NO-VF-NEXT: [[INDVARS_IV_NEXT]] = add nuw nsw i64 [[INDVARS_IV]], 1			; UNROLL-NO-VF-NEXT: [[INDVARS_IV_NEXT]] = add nuw nsw i64 [[INDVARS_IV]], 1
	; UNROLL-NO-VF-NEXT: [[EXITCOND:%.*]] = icmp eq i64 [[INDVARS_IV_NEXT]], [[N]]			; UNROLL-NO-VF-NEXT: [[EXITCOND:%.*]] = icmp eq i64 [[INDVARS_IV_NEXT]], [[N]]
	; UNROLL-NO-VF-NEXT: br i1 [[EXITCOND]], label [[FOR_END]], label [[FOR_BODY]], !llvm.loop [[LOOP20:![0-9]+]]			; UNROLL-NO-VF-NEXT: br i1 [[EXITCOND]], label [[FOR_END]], label [[FOR_BODY]], !llvm.loop [[LOOP21:![0-9]+]]
	; UNROLL-NO-VF: for.end:			; UNROLL-NO-VF: for.end:
	; UNROLL-NO-VF-NEXT: ret void			; UNROLL-NO-VF-NEXT: ret void
	;			;
	; SINK-AFTER-LABEL: @PR34711(			; SINK-AFTER-LABEL: @PR34711(
	; SINK-AFTER-NEXT: entry:			; SINK-AFTER-NEXT: entry:
	; SINK-AFTER-NEXT: [[DOTPRE:%.*]] = load i16, ptr [[A:%.*]], align 2			; SINK-AFTER-NEXT: [[DOTPRE:%.*]] = load i16, ptr [[A:%.*]], align 2
	; SINK-AFTER-NEXT: [[MIN_ITERS_CHECK:%.*]] = icmp ult i64 [[N:%.*]], 4			; SINK-AFTER-NEXT: [[MIN_ITERS_CHECK:%.*]] = icmp ult i64 [[N:%.*]], 4
	; SINK-AFTER-NEXT: br i1 [[MIN_ITERS_CHECK]], label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]			; SINK-AFTER-NEXT: br i1 [[MIN_ITERS_CHECK]], label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
	[191 lines not shown]
	; UNROLL-NO-VF-NEXT: [[TMP14:%.*]] = mul nsw i32 [[TMP10]], [[TMP12]]			; UNROLL-NO-VF-NEXT: [[TMP14:%.*]] = mul nsw i32 [[TMP10]], [[TMP12]]
	; UNROLL-NO-VF-NEXT: [[TMP15:%.*]] = mul nsw i32 [[TMP11]], [[TMP13]]			; UNROLL-NO-VF-NEXT: [[TMP15:%.*]] = mul nsw i32 [[TMP11]], [[TMP13]]
	; UNROLL-NO-VF-NEXT: [[TMP16:%.*]] = getelementptr inbounds i32, ptr [[B:%.*]], i64 [[TMP0]]			; UNROLL-NO-VF-NEXT: [[TMP16:%.*]] = getelementptr inbounds i32, ptr [[B:%.*]], i64 [[TMP0]]
	; UNROLL-NO-VF-NEXT: [[TMP17:%.*]] = getelementptr inbounds i32, ptr [[B]], i64 [[TMP1]]			; UNROLL-NO-VF-NEXT: [[TMP17:%.*]] = getelementptr inbounds i32, ptr [[B]], i64 [[TMP1]]
	; UNROLL-NO-VF-NEXT: store i32 [[TMP14]], ptr [[TMP16]], align 4			; UNROLL-NO-VF-NEXT: store i32 [[TMP14]], ptr [[TMP16]], align 4
	; UNROLL-NO-VF-NEXT: store i32 [[TMP15]], ptr [[TMP17]], align 4			; UNROLL-NO-VF-NEXT: store i32 [[TMP15]], ptr [[TMP17]], align 4
	; UNROLL-NO-VF-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 2			; UNROLL-NO-VF-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 2
	; UNROLL-NO-VF-NEXT: [[TMP18:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]			; UNROLL-NO-VF-NEXT: [[TMP18:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]
	; UNROLL-NO-VF-NEXT: br i1 [[TMP18]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP21:![0-9]+]]			; UNROLL-NO-VF-NEXT: br i1 [[TMP18]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP22:![0-9]+]]
	; UNROLL-NO-VF: middle.block:			; UNROLL-NO-VF: middle.block:
	; UNROLL-NO-VF-NEXT: [[CMP_N:%.*]] = icmp eq i64 [[N]], [[N_VEC]]			; UNROLL-NO-VF-NEXT: [[CMP_N:%.*]] = icmp eq i64 [[N]], [[N_VEC]]
	; UNROLL-NO-VF-NEXT: br i1 [[CMP_N]], label [[FOR_END:%.*]], label [[SCALAR_PH]]			; UNROLL-NO-VF-NEXT: br i1 [[CMP_N]], label [[FOR_END:%.*]], label [[SCALAR_PH]]
	; UNROLL-NO-VF: scalar.ph:			; UNROLL-NO-VF: scalar.ph:
	; UNROLL-NO-VF-NEXT: [[SCALAR_RECUR_INIT:%.*]] = phi i16 [ [[DOTPRE]], [[ENTRY:%.*]] ], [ [[TMP7]], [[MIDDLE_BLOCK]] ]			; UNROLL-NO-VF-NEXT: [[SCALAR_RECUR_INIT:%.*]] = phi i16 [ [[DOTPRE]], [[ENTRY:%.*]] ], [ [[TMP7]], [[MIDDLE_BLOCK]] ]
	; UNROLL-NO-VF-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ [[N_VEC]], [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY]] ]			; UNROLL-NO-VF-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ [[N_VEC]], [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY]] ]
	; UNROLL-NO-VF-NEXT: br label [[FOR_BODY:%.*]]			; UNROLL-NO-VF-NEXT: br label [[FOR_BODY:%.*]]
	; UNROLL-NO-VF: for.body:			; UNROLL-NO-VF: for.body:
	; UNROLL-NO-VF-NEXT: [[SCALAR_RECUR:%.*]] = phi i16 [ [[SCALAR_RECUR_INIT]], [[SCALAR_PH]] ], [ [[TMP19:%.*]], [[FOR_BODY]] ]			; UNROLL-NO-VF-NEXT: [[SCALAR_RECUR:%.*]] = phi i16 [ [[SCALAR_RECUR_INIT]], [[SCALAR_PH]] ], [ [[TMP19:%.*]], [[FOR_BODY]] ]
	; UNROLL-NO-VF-NEXT: [[INDVARS_IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[INDVARS_IV_NEXT:%.*]], [[FOR_BODY]] ]			; UNROLL-NO-VF-NEXT: [[INDVARS_IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[INDVARS_IV_NEXT:%.*]], [[FOR_BODY]] ]
	; UNROLL-NO-VF-NEXT: [[CONV:%.*]] = sext i16 [[SCALAR_RECUR]] to i32			; UNROLL-NO-VF-NEXT: [[CONV:%.*]] = sext i16 [[SCALAR_RECUR]] to i32
	; UNROLL-NO-VF-NEXT: [[ADD:%.*]] = add nsw i32 [[CONV]], 2			; UNROLL-NO-VF-NEXT: [[ADD:%.*]] = add nsw i32 [[CONV]], 2
	; UNROLL-NO-VF-NEXT: [[INDVARS_IV_NEXT]] = add nuw nsw i64 [[INDVARS_IV]], 1			; UNROLL-NO-VF-NEXT: [[INDVARS_IV_NEXT]] = add nuw nsw i64 [[INDVARS_IV]], 1
	; UNROLL-NO-VF-NEXT: [[ARRAYIDX2:%.*]] = getelementptr inbounds i16, ptr [[A]], i64 [[INDVARS_IV_NEXT]]			; UNROLL-NO-VF-NEXT: [[ARRAYIDX2:%.*]] = getelementptr inbounds i16, ptr [[A]], i64 [[INDVARS_IV_NEXT]]
	; UNROLL-NO-VF-NEXT: [[TMP19]] = load i16, ptr [[ARRAYIDX2]], align 2			; UNROLL-NO-VF-NEXT: [[TMP19]] = load i16, ptr [[ARRAYIDX2]], align 2
	; UNROLL-NO-VF-NEXT: [[CONV3:%.*]] = sext i16 [[TMP19]] to i32			; UNROLL-NO-VF-NEXT: [[CONV3:%.*]] = sext i16 [[TMP19]] to i32
	; UNROLL-NO-VF-NEXT: [[MUL:%.*]] = mul nsw i32 [[ADD]], [[CONV3]]			; UNROLL-NO-VF-NEXT: [[MUL:%.*]] = mul nsw i32 [[ADD]], [[CONV3]]
	; UNROLL-NO-VF-NEXT: [[ARRAYIDX5:%.*]] = getelementptr inbounds i32, ptr [[B]], i64 [[INDVARS_IV]]			; UNROLL-NO-VF-NEXT: [[ARRAYIDX5:%.*]] = getelementptr inbounds i32, ptr [[B]], i64 [[INDVARS_IV]]
	; UNROLL-NO-VF-NEXT: store i32 [[MUL]], ptr [[ARRAYIDX5]], align 4			; UNROLL-NO-VF-NEXT: store i32 [[MUL]], ptr [[ARRAYIDX5]], align 4
	; UNROLL-NO-VF-NEXT: [[EXITCOND:%.*]] = icmp eq i64 [[INDVARS_IV_NEXT]], [[N]]			; UNROLL-NO-VF-NEXT: [[EXITCOND:%.*]] = icmp eq i64 [[INDVARS_IV_NEXT]], [[N]]
	; UNROLL-NO-VF-NEXT: br i1 [[EXITCOND]], label [[FOR_END]], label [[FOR_BODY]], !llvm.loop [[LOOP22:![0-9]+]]			; UNROLL-NO-VF-NEXT: br i1 [[EXITCOND]], label [[FOR_END]], label [[FOR_BODY]], !llvm.loop [[LOOP23:![0-9]+]]
	; UNROLL-NO-VF: for.end:			; UNROLL-NO-VF: for.end:
	; UNROLL-NO-VF-NEXT: ret void			; UNROLL-NO-VF-NEXT: ret void
	;			;
	; SINK-AFTER-LABEL: @sink_after_with_multiple_users(			; SINK-AFTER-LABEL: @sink_after_with_multiple_users(
	; SINK-AFTER-NEXT: entry:			; SINK-AFTER-NEXT: entry:
	; SINK-AFTER-NEXT: [[DOTPRE:%.*]] = load i16, ptr [[A:%.*]], align 2			; SINK-AFTER-NEXT: [[DOTPRE:%.*]] = load i16, ptr [[A:%.*]], align 2
	; SINK-AFTER-NEXT: [[MIN_ITERS_CHECK:%.*]] = icmp ult i64 [[N:%.*]], 4			; SINK-AFTER-NEXT: [[MIN_ITERS_CHECK:%.*]] = icmp ult i64 [[N:%.*]], 4
	; SINK-AFTER-NEXT: br i1 [[MIN_ITERS_CHECK]], label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]			; SINK-AFTER-NEXT: br i1 [[MIN_ITERS_CHECK]], label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
	[240 lines not shown]
	; UNROLL-NO-VF-NEXT: [[TMP2:%.*]] = add i16 [[TMP0]], 1			; UNROLL-NO-VF-NEXT: [[TMP2:%.*]] = add i16 [[TMP0]], 1
	; UNROLL-NO-VF-NEXT: [[TMP3:%.*]] = add i16 [[TMP1]], 1			; UNROLL-NO-VF-NEXT: [[TMP3:%.*]] = add i16 [[TMP1]], 1
	; UNROLL-NO-VF-NEXT: [[TMP4:%.*]] = zext i16 [[TMP2]] to i32			; UNROLL-NO-VF-NEXT: [[TMP4:%.*]] = zext i16 [[TMP2]] to i32
	; UNROLL-NO-VF-NEXT: [[TMP5]] = zext i16 [[TMP3]] to i32			; UNROLL-NO-VF-NEXT: [[TMP5]] = zext i16 [[TMP3]] to i32
	; UNROLL-NO-VF-NEXT: [[TMP6:%.*]] = add i16 [[TMP2]], 5			; UNROLL-NO-VF-NEXT: [[TMP6:%.*]] = add i16 [[TMP2]], 5
	; UNROLL-NO-VF-NEXT: [[TMP7]] = add i16 [[TMP3]], 5			; UNROLL-NO-VF-NEXT: [[TMP7]] = add i16 [[TMP3]], 5
	; UNROLL-NO-VF-NEXT: [[INDEX_NEXT]] = add nuw i32 [[INDEX]], 2			; UNROLL-NO-VF-NEXT: [[INDEX_NEXT]] = add nuw i32 [[INDEX]], 2
	; UNROLL-NO-VF-NEXT: [[TMP8:%.*]] = icmp eq i32 [[INDEX_NEXT]], 42			; UNROLL-NO-VF-NEXT: [[TMP8:%.*]] = icmp eq i32 [[INDEX_NEXT]], 42
	; UNROLL-NO-VF-NEXT: br i1 [[TMP8]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP23:![0-9]+]]			; UNROLL-NO-VF-NEXT: br i1 [[TMP8]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP24:![0-9]+]]
	; UNROLL-NO-VF: middle.block:			; UNROLL-NO-VF: middle.block:
	; UNROLL-NO-VF-NEXT: [[CMP_N:%.*]] = icmp eq i32 43, 42			; UNROLL-NO-VF-NEXT: [[CMP_N:%.*]] = icmp eq i32 43, 42
	; UNROLL-NO-VF-NEXT: br i1 [[CMP_N]], label [[FOR_END:%.*]], label [[SCALAR_PH]]			; UNROLL-NO-VF-NEXT: br i1 [[CMP_N]], label [[FOR_END:%.*]], label [[SCALAR_PH]]
	; UNROLL-NO-VF: scalar.ph:			; UNROLL-NO-VF: scalar.ph:
	; UNROLL-NO-VF-NEXT: [[SCALAR_RECUR_INIT2:%.*]] = phi i32 [ -27, [[ENTRY:%.*]] ], [ [[TMP5]], [[MIDDLE_BLOCK]] ]			; UNROLL-NO-VF-NEXT: [[SCALAR_RECUR_INIT2:%.*]] = phi i32 [ -27, [[ENTRY:%.*]] ], [ [[TMP5]], [[MIDDLE_BLOCK]] ]
	; UNROLL-NO-VF-NEXT: [[SCALAR_RECUR_INIT:%.*]] = phi i16 [ 0, [[ENTRY]] ], [ [[TMP7]], [[MIDDLE_BLOCK]] ]			; UNROLL-NO-VF-NEXT: [[SCALAR_RECUR_INIT:%.*]] = phi i16 [ 0, [[ENTRY]] ], [ [[TMP7]], [[MIDDLE_BLOCK]] ]
	; UNROLL-NO-VF-NEXT: [[BC_RESUME_VAL:%.*]] = phi i16 [ 15, [[MIDDLE_BLOCK]] ], [ -27, [[ENTRY]] ]			; UNROLL-NO-VF-NEXT: [[BC_RESUME_VAL:%.*]] = phi i16 [ 15, [[MIDDLE_BLOCK]] ], [ -27, [[ENTRY]] ]
	; UNROLL-NO-VF-NEXT: br label [[FOR_COND:%.*]]			; UNROLL-NO-VF-NEXT: br label [[FOR_COND:%.*]]
	; UNROLL-NO-VF: for.cond:			; UNROLL-NO-VF: for.cond:
	; UNROLL-NO-VF-NEXT: [[IV:%.*]] = phi i16 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[IV_NEXT:%.*]], [[FOR_COND]] ]			; UNROLL-NO-VF-NEXT: [[IV:%.*]] = phi i16 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[IV_NEXT:%.*]], [[FOR_COND]] ]
	; UNROLL-NO-VF-NEXT: [[SCALAR_RECUR:%.*]] = phi i16 [ [[SCALAR_RECUR_INIT]], [[SCALAR_PH]] ], [ [[REC_1_PREV:%.*]], [[FOR_COND]] ]			; UNROLL-NO-VF-NEXT: [[SCALAR_RECUR:%.*]] = phi i16 [ [[SCALAR_RECUR_INIT]], [[SCALAR_PH]] ], [ [[REC_1_PREV:%.*]], [[FOR_COND]] ]
	; UNROLL-NO-VF-NEXT: [[SCALAR_RECUR3:%.*]] = phi i32 [ [[SCALAR_RECUR_INIT2]], [[SCALAR_PH]] ], [ [[REC_2_PREV:%.*]], [[FOR_COND]] ]			; UNROLL-NO-VF-NEXT: [[SCALAR_RECUR3:%.*]] = phi i32 [ [[SCALAR_RECUR_INIT2]], [[SCALAR_PH]] ], [ [[REC_2_PREV:%.*]], [[FOR_COND]] ]
	; UNROLL-NO-VF-NEXT: [[USE_REC_1:%.*]] = sub i16 [[SCALAR_RECUR]], 10			; UNROLL-NO-VF-NEXT: [[USE_REC_1:%.*]] = sub i16 [[SCALAR_RECUR]], 10
	; UNROLL-NO-VF-NEXT: [[CMP:%.*]] = icmp eq i32 [[SCALAR_RECUR3]], 15			; UNROLL-NO-VF-NEXT: [[CMP:%.*]] = icmp eq i32 [[SCALAR_RECUR3]], 15
	; UNROLL-NO-VF-NEXT: [[IV_NEXT]] = add i16 [[IV]], 1			; UNROLL-NO-VF-NEXT: [[IV_NEXT]] = add i16 [[IV]], 1
	; UNROLL-NO-VF-NEXT: [[REC_2_PREV]] = zext i16 [[IV_NEXT]] to i32			; UNROLL-NO-VF-NEXT: [[REC_2_PREV]] = zext i16 [[IV_NEXT]] to i32
	; UNROLL-NO-VF-NEXT: [[REC_1_PREV]] = add i16 [[IV_NEXT]], 5			; UNROLL-NO-VF-NEXT: [[REC_1_PREV]] = add i16 [[IV_NEXT]], 5
	; UNROLL-NO-VF-NEXT: br i1 [[CMP]], label [[FOR_END]], label [[FOR_COND]], !llvm.loop [[LOOP24:![0-9]+]]			; UNROLL-NO-VF-NEXT: br i1 [[CMP]], label [[FOR_END]], label [[FOR_COND]], !llvm.loop [[LOOP25:![0-9]+]]
	; UNROLL-NO-VF: for.end:			; UNROLL-NO-VF: for.end:
	; UNROLL-NO-VF-NEXT: ret void			; UNROLL-NO-VF-NEXT: ret void
	;			;
	; SINK-AFTER-LABEL: @sink_dead_inst(			; SINK-AFTER-LABEL: @sink_dead_inst(
	; SINK-AFTER-NEXT: entry:			; SINK-AFTER-NEXT: entry:
	; SINK-AFTER-NEXT: br i1 false, label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]			; SINK-AFTER-NEXT: br i1 false, label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
	; SINK-AFTER: vector.ph:			; SINK-AFTER: vector.ph:
	; SINK-AFTER-NEXT: br label [[VECTOR_BODY:%.*]]			; SINK-AFTER-NEXT: br label [[VECTOR_BODY:%.*]]
	[187 lines not shown]
	; UNROLL-NO-IC-NEXT: [[VAR9:%.*]] = icmp slt i32 [[VAR3]], 2			; UNROLL-NO-IC-NEXT: [[VAR9:%.*]] = icmp slt i32 [[VAR3]], 2
	; UNROLL-NO-IC-NEXT: br i1 [[VAR9]], label [[BB1]], label [[BB2]], !prof [[PROF28:![0-9]+]], !llvm.loop [[LOOP29:![0-9]+]]			; UNROLL-NO-IC-NEXT: br i1 [[VAR9]], label [[BB1]], label [[BB2]], !prof [[PROF28:![0-9]+]], !llvm.loop [[LOOP29:![0-9]+]]
	;			;
	; UNROLL-NO-VF-LABEL: @sink_into_replication_region(			; UNROLL-NO-VF-LABEL: @sink_into_replication_region(
	; UNROLL-NO-VF-NEXT: bb:			; UNROLL-NO-VF-NEXT: bb:
	; UNROLL-NO-VF-NEXT: [[TMP0:%.*]] = add i32 [[Y:%.*]], 1			; UNROLL-NO-VF-NEXT: [[TMP0:%.*]] = add i32 [[Y:%.*]], 1
	; UNROLL-NO-VF-NEXT: [[SMIN:%.*]] = call i32 @llvm.smin.i32(i32 [[Y]], i32 1)			; UNROLL-NO-VF-NEXT: [[SMIN:%.*]] = call i32 @llvm.smin.i32(i32 [[Y]], i32 1)
	; UNROLL-NO-VF-NEXT: [[TMP1:%.*]] = sub i32 [[TMP0]], [[SMIN]]			; UNROLL-NO-VF-NEXT: [[TMP1:%.*]] = sub i32 [[TMP0]], [[SMIN]]
	; UNROLL-NO-VF-NEXT: br i1 false, label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]			; UNROLL-NO-VF-NEXT: [[MIN_ITERS_CHECK:%.*]] = icmp ult i32 [[TMP1]], 2
				; UNROLL-NO-VF-NEXT: br i1 [[MIN_ITERS_CHECK]], label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
	; UNROLL-NO-VF: vector.ph:			; UNROLL-NO-VF: vector.ph:
	; UNROLL-NO-VF-NEXT: [[N_RND_UP:%.*]] = add i32 [[TMP1]], 1			; UNROLL-NO-VF-NEXT: [[N_MOD_VF:%.*]] = urem i32 [[TMP1]], 2
	; UNROLL-NO-VF-NEXT: [[N_MOD_VF:%.*]] = urem i32 [[N_RND_UP]], 2			; UNROLL-NO-VF-NEXT: [[N_VEC:%.*]] = sub i32 [[TMP1]], [[N_MOD_VF]]
	; UNROLL-NO-VF-NEXT: [[N_VEC:%.*]] = sub i32 [[N_RND_UP]], [[N_MOD_VF]]
	; UNROLL-NO-VF-NEXT: [[IND_END:%.*]] = sub i32 [[Y]], [[N_VEC]]			; UNROLL-NO-VF-NEXT: [[IND_END:%.*]] = sub i32 [[Y]], [[N_VEC]]
	; UNROLL-NO-VF-NEXT: [[TRIP_COUNT_MINUS_1:%.*]] = sub i32 [[TMP1]], 1
	; UNROLL-NO-VF-NEXT: br label [[VECTOR_BODY:%.*]]			; UNROLL-NO-VF-NEXT: br label [[VECTOR_BODY:%.*]]
	; UNROLL-NO-VF: vector.body:			; UNROLL-NO-VF: vector.body:
	; UNROLL-NO-VF-NEXT: [[INDEX:%.*]] = phi i32 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[PRED_UDIV_CONTINUE4:%.*]] ]			; UNROLL-NO-VF-NEXT: [[INDEX:%.*]] = phi i32 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
				david-arm: For what it's worth, this new version looks better, but do you know why? Is it because we no longer allow tail-folding for a VF of 1? I assume it was trying to tail-fold before due to the low trip count.
				dmgreen (Author): This was because:
				    // Don't use a predicated vector body if the user has forced a vectorization
				    // factor of 1.
				    if (UserVF.isScalar())
				      SEL = CM_ScalarEpilogueAllowed;
				It overrides the "tiny trip count" SEL that was applying previously. I think that for VF=1 it makes a lot of sense not to predicate the body, but I've removed that from this patch to keep the diff down. We can re-add it in the future if needed.
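				The arithmetic behind the changed `vector.ph` and `middle.block` checks in this hunk can be sketched in C++. This is a minimal illustration, not LLVM code: `vectorTripCount` and `needsScalarRemainder` are hypothetical helper names, and `Step` stands for VF * UF (1 * 2 = 2 in this test). It only mirrors the `n.rnd.up` / `n.mod.vf` / `n.vec` computations visible in the checks and ignores other reasons a scalar epilogue might be required.

```cpp
#include <cstdint>

// Hypothetical helper (not an LLVM API): iterations covered by the vector
// loop, given the scalar trip count TC and Step = VF * UF.
// Assumes Step > 0 and that TC + Step - 1 does not overflow.
uint32_t vectorTripCount(uint32_t TC, uint32_t Step, bool FoldTail) {
  // With tail folding the count is rounded up to a multiple of Step
  // (n.rnd.up in the left-hand checks); without it, it is rounded down.
  uint32_t N = FoldTail ? TC + (Step - 1) : TC;
  return N - N % Step; // n.vec = N - n.mod.vf
}

// Hypothetical helper: true when iterations are left over for the scalar
// loop after the vector loop. With tail folding the masked vector body
// covers every iteration, so the middle block can branch on `true`.
bool needsScalarRemainder(uint32_t TC, uint32_t Step, bool FoldTail) {
  return !FoldTail && TC != vectorTripCount(TC, Step, FoldTail);
}
```

				For a trip count of 5 and a step of 2, the folded form runs the vector loop up to index 6 with the last lane masked off, while the unfolded form stops at 4 and leaves one iteration to the scalar loop, which is why the right-hand side gains the `cmp.n` comparison.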
	; UNROLL-NO-VF-NEXT: [[VECTOR_RECUR:%.*]] = phi i32 [ 0, [[VECTOR_PH]] ], [ [[TMP9:%.*]], [[PRED_UDIV_CONTINUE4]] ]			; UNROLL-NO-VF-NEXT: [[VECTOR_RECUR:%.*]] = phi i32 [ 0, [[VECTOR_PH]] ], [ [[TMP5:%.*]], [[VECTOR_BODY]] ]
	; UNROLL-NO-VF-NEXT: [[VEC_PHI:%.*]] = phi i32 [ 0, [[VECTOR_PH]] ], [ [[TMP10:%.*]], [[PRED_UDIV_CONTINUE4]] ]			; UNROLL-NO-VF-NEXT: [[VEC_PHI:%.*]] = phi i32 [ 0, [[VECTOR_PH]] ], [ [[TMP6:%.*]], [[VECTOR_BODY]] ]
	; UNROLL-NO-VF-NEXT: [[VEC_PHI1:%.*]] = phi i32 [ 0, [[VECTOR_PH]] ], [ [[TMP11:%.*]], [[PRED_UDIV_CONTINUE4]] ]			; UNROLL-NO-VF-NEXT: [[VEC_PHI1:%.*]] = phi i32 [ 0, [[VECTOR_PH]] ], [ [[TMP7:%.*]], [[VECTOR_BODY]] ]
	; UNROLL-NO-VF-NEXT: [[OFFSET_IDX:%.*]] = sub i32 [[Y]], [[INDEX]]			; UNROLL-NO-VF-NEXT: [[OFFSET_IDX:%.*]] = sub i32 [[Y]], [[INDEX]]
	; UNROLL-NO-VF-NEXT: [[VEC_IV:%.*]] = add i32 [[INDEX]], 0			; UNROLL-NO-VF-NEXT: [[TMP2:%.*]] = add i32 [[OFFSET_IDX]], 0
	; UNROLL-NO-VF-NEXT: [[VEC_IV2:%.*]] = add i32 [[INDEX]], 1			; UNROLL-NO-VF-NEXT: [[TMP3:%.*]] = add i32 [[OFFSET_IDX]], -1
	; UNROLL-NO-VF-NEXT: [[TMP2:%.*]] = icmp ule i32 [[VEC_IV]], [[TRIP_COUNT_MINUS_1]]			; UNROLL-NO-VF-NEXT: [[TMP4:%.*]] = udiv i32 219220132, [[TMP2]]
	; UNROLL-NO-VF-NEXT: [[TMP3:%.*]] = icmp ule i32 [[VEC_IV2]], [[TRIP_COUNT_MINUS_1]]			; UNROLL-NO-VF-NEXT: [[TMP5]] = udiv i32 219220132, [[TMP3]]
	; UNROLL-NO-VF-NEXT: br i1 [[TMP2]], label [[PRED_UDIV_IF:%.*]], label [[PRED_UDIV_CONTINUE:%.*]]			; UNROLL-NO-VF-NEXT: [[TMP6]] = add i32 [[VEC_PHI]], [[VECTOR_RECUR]]
	; UNROLL-NO-VF: pred.udiv.if:			; UNROLL-NO-VF-NEXT: [[TMP7]] = add i32 [[VEC_PHI1]], [[TMP4]]
	; UNROLL-NO-VF-NEXT: [[TMP4:%.*]] = add i32 [[OFFSET_IDX]], 0			; UNROLL-NO-VF-NEXT: [[INDEX_NEXT]] = add nuw i32 [[INDEX]], 2
	; UNROLL-NO-VF-NEXT: [[TMP5:%.*]] = udiv i32 219220132, [[TMP4]]			; UNROLL-NO-VF-NEXT: [[TMP8:%.*]] = icmp eq i32 [[INDEX_NEXT]], [[N_VEC]]
	; UNROLL-NO-VF-NEXT: br label [[PRED_UDIV_CONTINUE]]			; UNROLL-NO-VF-NEXT: br i1 [[TMP8]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !prof [[PROF26:![0-9]+]], !llvm.loop [[LOOP27:![0-9]+]]
	; UNROLL-NO-VF: pred.udiv.continue:
	; UNROLL-NO-VF-NEXT: [[TMP6:%.*]] = phi i32 [ poison, [[VECTOR_BODY]] ], [ [[TMP5]], [[PRED_UDIV_IF]] ]
	; UNROLL-NO-VF-NEXT: br i1 [[TMP3]], label [[PRED_UDIV_IF3:%.*]], label [[PRED_UDIV_CONTINUE4]]
	; UNROLL-NO-VF: pred.udiv.if3:
	; UNROLL-NO-VF-NEXT: [[TMP7:%.*]] = add i32 [[OFFSET_IDX]], -1
	; UNROLL-NO-VF-NEXT: [[TMP8:%.*]] = udiv i32 219220132, [[TMP7]]
	; UNROLL-NO-VF-NEXT: br label [[PRED_UDIV_CONTINUE4]]
	; UNROLL-NO-VF: pred.udiv.continue4:
	; UNROLL-NO-VF-NEXT: [[TMP9]] = phi i32 [ poison, [[PRED_UDIV_CONTINUE]] ], [ [[TMP8]], [[PRED_UDIV_IF3]] ]
	; UNROLL-NO-VF-NEXT: [[TMP10]] = add i32 [[VEC_PHI]], [[VECTOR_RECUR]]
	; UNROLL-NO-VF-NEXT: [[TMP11]] = add i32 [[VEC_PHI1]], [[TMP6]]
	; UNROLL-NO-VF-NEXT: [[TMP12:%.*]] = select i1 [[TMP2]], i32 [[TMP10]], i32 [[VEC_PHI]]
	; UNROLL-NO-VF-NEXT: [[TMP13:%.*]] = select i1 [[TMP3]], i32 [[TMP11]], i32 [[VEC_PHI1]]
	; UNROLL-NO-VF-NEXT: [[INDEX_NEXT]] = add i32 [[INDEX]], 2
	; UNROLL-NO-VF-NEXT: [[TMP14:%.*]] = icmp eq i32 [[INDEX_NEXT]], [[N_VEC]]
	; UNROLL-NO-VF-NEXT: br i1 [[TMP14]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !prof [[PROF25:![0-9]+]], !llvm.loop [[LOOP26:![0-9]+]]
	; UNROLL-NO-VF: middle.block:			; UNROLL-NO-VF: middle.block:
	; UNROLL-NO-VF-NEXT: [[BIN_RDX:%.*]] = add i32 [[TMP13]], [[TMP12]]			; UNROLL-NO-VF-NEXT: [[BIN_RDX:%.*]] = add i32 [[TMP7]], [[TMP6]]
	; UNROLL-NO-VF-NEXT: br i1 true, label [[BB1:%.*]], label [[SCALAR_PH]]			; UNROLL-NO-VF-NEXT: [[CMP_N:%.*]] = icmp eq i32 [[TMP1]], [[N_VEC]]
				; UNROLL-NO-VF-NEXT: br i1 [[CMP_N]], label [[BB1:%.*]], label [[SCALAR_PH]]
	; UNROLL-NO-VF: scalar.ph:			; UNROLL-NO-VF: scalar.ph:
	; UNROLL-NO-VF-NEXT: [[SCALAR_RECUR_INIT:%.*]] = phi i32 [ 0, [[BB:%.*]] ], [ [[TMP9]], [[MIDDLE_BLOCK]] ]			; UNROLL-NO-VF-NEXT: [[SCALAR_RECUR_INIT:%.*]] = phi i32 [ 0, [[BB:%.*]] ], [ [[TMP5]], [[MIDDLE_BLOCK]] ]
	; UNROLL-NO-VF-NEXT: [[BC_RESUME_VAL:%.*]] = phi i32 [ [[IND_END]], [[MIDDLE_BLOCK]] ], [ [[Y]], [[BB]] ]			; UNROLL-NO-VF-NEXT: [[BC_RESUME_VAL:%.*]] = phi i32 [ [[IND_END]], [[MIDDLE_BLOCK]] ], [ [[Y]], [[BB]] ]
	; UNROLL-NO-VF-NEXT: [[BC_MERGE_RDX:%.*]] = phi i32 [ 0, [[BB]] ], [ [[BIN_RDX]], [[MIDDLE_BLOCK]] ]			; UNROLL-NO-VF-NEXT: [[BC_MERGE_RDX:%.*]] = phi i32 [ 0, [[BB]] ], [ [[BIN_RDX]], [[MIDDLE_BLOCK]] ]
	; UNROLL-NO-VF-NEXT: br label [[BB2:%.*]]			; UNROLL-NO-VF-NEXT: br label [[BB2:%.*]]
	; UNROLL-NO-VF: bb1:			; UNROLL-NO-VF: bb1:
	; UNROLL-NO-VF-NEXT: [[VAR:%.*]] = phi i32 [ [[VAR6:%.*]], [[BB2]] ], [ [[BIN_RDX]], [[MIDDLE_BLOCK]] ]			; UNROLL-NO-VF-NEXT: [[VAR:%.*]] = phi i32 [ [[VAR6:%.*]], [[BB2]] ], [ [[BIN_RDX]], [[MIDDLE_BLOCK]] ]
	; UNROLL-NO-VF-NEXT: ret i32 [[VAR]]			; UNROLL-NO-VF-NEXT: ret i32 [[VAR]]
	; UNROLL-NO-VF: bb2:			; UNROLL-NO-VF: bb2:
	; UNROLL-NO-VF-NEXT: [[VAR3:%.*]] = phi i32 [ [[VAR8:%.*]], [[BB2]] ], [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ]			; UNROLL-NO-VF-NEXT: [[VAR3:%.*]] = phi i32 [ [[VAR8:%.*]], [[BB2]] ], [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ]
	; UNROLL-NO-VF-NEXT: [[SCALAR_RECUR:%.*]] = phi i32 [ [[VAR7:%.*]], [[BB2]] ], [ [[SCALAR_RECUR_INIT]], [[SCALAR_PH]] ]			; UNROLL-NO-VF-NEXT: [[SCALAR_RECUR:%.*]] = phi i32 [ [[VAR7:%.*]], [[BB2]] ], [ [[SCALAR_RECUR_INIT]], [[SCALAR_PH]] ]
	; UNROLL-NO-VF-NEXT: [[VAR5:%.*]] = phi i32 [ [[VAR6]], [[BB2]] ], [ [[BC_MERGE_RDX]], [[SCALAR_PH]] ]			; UNROLL-NO-VF-NEXT: [[VAR5:%.*]] = phi i32 [ [[VAR6]], [[BB2]] ], [ [[BC_MERGE_RDX]], [[SCALAR_PH]] ]
	; UNROLL-NO-VF-NEXT: [[VAR6]] = add i32 [[VAR5]], [[SCALAR_RECUR]]			; UNROLL-NO-VF-NEXT: [[VAR6]] = add i32 [[VAR5]], [[SCALAR_RECUR]]
	; UNROLL-NO-VF-NEXT: [[VAR7]] = udiv i32 219220132, [[VAR3]]			; UNROLL-NO-VF-NEXT: [[VAR7]] = udiv i32 219220132, [[VAR3]]
	; UNROLL-NO-VF-NEXT: [[VAR8]] = add nsw i32 [[VAR3]], -1			; UNROLL-NO-VF-NEXT: [[VAR8]] = add nsw i32 [[VAR3]], -1
	; UNROLL-NO-VF-NEXT: [[VAR9:%.*]] = icmp slt i32 [[VAR3]], 2			; UNROLL-NO-VF-NEXT: [[VAR9:%.*]] = icmp slt i32 [[VAR3]], 2
	; UNROLL-NO-VF-NEXT: br i1 [[VAR9]], label [[BB1]], label [[BB2]], !prof [[PROF27:![0-9]+]], !llvm.loop [[LOOP28:![0-9]+]]			; UNROLL-NO-VF-NEXT: br i1 [[VAR9]], label [[BB1]], label [[BB2]], !prof [[PROF28:![0-9]+]], !llvm.loop [[LOOP29:![0-9]+]]
	;			;
	; SINK-AFTER-LABEL: @sink_into_replication_region(			; SINK-AFTER-LABEL: @sink_into_replication_region(
	; SINK-AFTER-NEXT: bb:			; SINK-AFTER-NEXT: bb:
	; SINK-AFTER-NEXT: [[TMP0:%.*]] = add i32 [[Y:%.*]], 1			; SINK-AFTER-NEXT: [[TMP0:%.*]] = add i32 [[Y:%.*]], 1
	; SINK-AFTER-NEXT: [[SMIN:%.*]] = call i32 @llvm.smin.i32(i32 [[Y]], i32 1)			; SINK-AFTER-NEXT: [[SMIN:%.*]] = call i32 @llvm.smin.i32(i32 [[Y]], i32 1)
	; SINK-AFTER-NEXT: [[TMP1:%.*]] = sub i32 [[TMP0]], [[SMIN]]			; SINK-AFTER-NEXT: [[TMP1:%.*]] = sub i32 [[TMP0]], [[SMIN]]
	; SINK-AFTER-NEXT: br i1 false, label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]			; SINK-AFTER-NEXT: br i1 false, label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
	; SINK-AFTER: vector.ph:			; SINK-AFTER: vector.ph:
	; UNROLL-NO-IC-NEXT: [[VAR9:%.*]] = icmp slt i32 [[VAR3]], 2			; UNROLL-NO-IC-NEXT: [[VAR9:%.*]] = icmp slt i32 [[VAR3]], 2
	; UNROLL-NO-IC-NEXT: br i1 [[VAR9]], label [[BB1]], label [[BB2]], !prof [[PROF28]], !llvm.loop [[LOOP31:![0-9]+]]			; UNROLL-NO-IC-NEXT: br i1 [[VAR9]], label [[BB1]], label [[BB2]], !prof [[PROF28]], !llvm.loop [[LOOP31:![0-9]+]]
	;			;
	; UNROLL-NO-VF-LABEL: @sink_into_replication_region_multiple(			; UNROLL-NO-VF-LABEL: @sink_into_replication_region_multiple(
	; UNROLL-NO-VF-NEXT: bb:			; UNROLL-NO-VF-NEXT: bb:
	; UNROLL-NO-VF-NEXT: [[TMP0:%.*]] = add i32 [[Y:%.*]], 1			; UNROLL-NO-VF-NEXT: [[TMP0:%.*]] = add i32 [[Y:%.*]], 1
	; UNROLL-NO-VF-NEXT: [[SMIN:%.*]] = call i32 @llvm.smin.i32(i32 [[Y]], i32 1)			; UNROLL-NO-VF-NEXT: [[SMIN:%.*]] = call i32 @llvm.smin.i32(i32 [[Y]], i32 1)
	; UNROLL-NO-VF-NEXT: [[TMP1:%.*]] = sub i32 [[TMP0]], [[SMIN]]			; UNROLL-NO-VF-NEXT: [[TMP1:%.*]] = sub i32 [[TMP0]], [[SMIN]]
	; UNROLL-NO-VF-NEXT: br i1 false, label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]			; UNROLL-NO-VF-NEXT: [[MIN_ITERS_CHECK:%.*]] = icmp ult i32 [[TMP1]], 2
				; UNROLL-NO-VF-NEXT: br i1 [[MIN_ITERS_CHECK]], label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
	; UNROLL-NO-VF: vector.ph:			; UNROLL-NO-VF: vector.ph:
	; UNROLL-NO-VF-NEXT: [[N_RND_UP:%.*]] = add i32 [[TMP1]], 1			; UNROLL-NO-VF-NEXT: [[N_MOD_VF:%.*]] = urem i32 [[TMP1]], 2
	; UNROLL-NO-VF-NEXT: [[N_MOD_VF:%.*]] = urem i32 [[N_RND_UP]], 2			; UNROLL-NO-VF-NEXT: [[N_VEC:%.*]] = sub i32 [[TMP1]], [[N_MOD_VF]]
	; UNROLL-NO-VF-NEXT: [[N_VEC:%.*]] = sub i32 [[N_RND_UP]], [[N_MOD_VF]]
	; UNROLL-NO-VF-NEXT: [[IND_END:%.*]] = sub i32 [[Y]], [[N_VEC]]			; UNROLL-NO-VF-NEXT: [[IND_END:%.*]] = sub i32 [[Y]], [[N_VEC]]
	; UNROLL-NO-VF-NEXT: [[TRIP_COUNT_MINUS_1:%.*]] = sub i32 [[TMP1]], 1
	; UNROLL-NO-VF-NEXT: br label [[VECTOR_BODY:%.*]]			; UNROLL-NO-VF-NEXT: br label [[VECTOR_BODY:%.*]]
	; UNROLL-NO-VF: vector.body:			; UNROLL-NO-VF: vector.body:
	; UNROLL-NO-VF-NEXT: [[INDEX:%.*]] = phi i32 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[PRED_STORE_CONTINUE7:%.*]] ]			; UNROLL-NO-VF-NEXT: [[INDEX:%.*]] = phi i32 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
	; UNROLL-NO-VF-NEXT: [[VECTOR_RECUR:%.*]] = phi i32 [ 0, [[VECTOR_PH]] ], [ [[TMP9:%.*]], [[PRED_STORE_CONTINUE7]] ]			; UNROLL-NO-VF-NEXT: [[VECTOR_RECUR:%.*]] = phi i32 [ 0, [[VECTOR_PH]] ], [ [[TMP9:%.*]], [[VECTOR_BODY]] ]
	; UNROLL-NO-VF-NEXT: [[VEC_PHI:%.*]] = phi i32 [ 0, [[VECTOR_PH]] ], [ [[TMP10:%.*]], [[PRED_STORE_CONTINUE7]] ]			; UNROLL-NO-VF-NEXT: [[VEC_PHI:%.*]] = phi i32 [ 0, [[VECTOR_PH]] ], [ [[TMP10:%.*]], [[VECTOR_BODY]] ]
	; UNROLL-NO-VF-NEXT: [[VEC_PHI2:%.*]] = phi i32 [ 0, [[VECTOR_PH]] ], [ [[TMP11:%.*]], [[PRED_STORE_CONTINUE7]] ]			; UNROLL-NO-VF-NEXT: [[VEC_PHI2:%.*]] = phi i32 [ 0, [[VECTOR_PH]] ], [ [[TMP11:%.*]], [[VECTOR_BODY]] ]
				; UNROLL-NO-VF-NEXT: [[TMP2:%.*]] = add i32 [[INDEX]], 0
				; UNROLL-NO-VF-NEXT: [[TMP3:%.*]] = add i32 [[INDEX]], 1
	; UNROLL-NO-VF-NEXT: [[OFFSET_IDX:%.*]] = sub i32 [[Y]], [[INDEX]]			; UNROLL-NO-VF-NEXT: [[OFFSET_IDX:%.*]] = sub i32 [[Y]], [[INDEX]]
	; UNROLL-NO-VF-NEXT: [[TMP2:%.*]] = add i32 [[OFFSET_IDX]], 0			; UNROLL-NO-VF-NEXT: [[TMP4:%.*]] = add i32 [[OFFSET_IDX]], 0
	; UNROLL-NO-VF-NEXT: [[TMP3:%.*]] = add i32 [[OFFSET_IDX]], -1			; UNROLL-NO-VF-NEXT: [[TMP5:%.*]] = add i32 [[OFFSET_IDX]], -1
	; UNROLL-NO-VF-NEXT: [[VEC_IV:%.*]] = add i32 [[INDEX]], 0			; UNROLL-NO-VF-NEXT: [[TMP6:%.*]] = getelementptr inbounds i32, ptr [[X:%.*]], i32 [[TMP2]]
	; UNROLL-NO-VF-NEXT: [[VEC_IV3:%.*]] = add i32 [[INDEX]], 1			; UNROLL-NO-VF-NEXT: [[TMP7:%.*]] = getelementptr inbounds i32, ptr [[X]], i32 [[TMP3]]
	; UNROLL-NO-VF-NEXT: [[TMP4:%.*]] = icmp ule i32 [[VEC_IV]], [[TRIP_COUNT_MINUS_1]]			; UNROLL-NO-VF-NEXT: [[TMP8:%.*]] = udiv i32 219220132, [[TMP4]]
	; UNROLL-NO-VF-NEXT: [[TMP5:%.*]] = icmp ule i32 [[VEC_IV3]], [[TRIP_COUNT_MINUS_1]]			; UNROLL-NO-VF-NEXT: [[TMP9]] = udiv i32 219220132, [[TMP5]]
	; UNROLL-NO-VF-NEXT: br i1 [[TMP4]], label [[PRED_UDIV_IF:%.*]], label [[PRED_UDIV_CONTINUE:%.*]]
	; UNROLL-NO-VF: pred.udiv.if:
	; UNROLL-NO-VF-NEXT: [[TMP6:%.*]] = udiv i32 219220132, [[TMP2]]
	; UNROLL-NO-VF-NEXT: br label [[PRED_UDIV_CONTINUE]]
	; UNROLL-NO-VF: pred.udiv.continue:
	; UNROLL-NO-VF-NEXT: [[TMP7:%.*]] = phi i32 [ poison, [[VECTOR_BODY]] ], [ [[TMP6]], [[PRED_UDIV_IF]] ]
	; UNROLL-NO-VF-NEXT: br i1 [[TMP5]], label [[PRED_UDIV_IF4:%.*]], label [[PRED_UDIV_CONTINUE5:%.*]]
	; UNROLL-NO-VF: pred.udiv.if4:
	; UNROLL-NO-VF-NEXT: [[TMP8:%.*]] = udiv i32 219220132, [[TMP3]]
	; UNROLL-NO-VF-NEXT: br label [[PRED_UDIV_CONTINUE5]]
	; UNROLL-NO-VF: pred.udiv.continue5:
	; UNROLL-NO-VF-NEXT: [[TMP9]] = phi i32 [ poison, [[PRED_UDIV_CONTINUE]] ], [ [[TMP8]], [[PRED_UDIV_IF4]] ]
	; UNROLL-NO-VF-NEXT: [[TMP10]] = add i32 [[VEC_PHI]], [[VECTOR_RECUR]]			; UNROLL-NO-VF-NEXT: [[TMP10]] = add i32 [[VEC_PHI]], [[VECTOR_RECUR]]
	; UNROLL-NO-VF-NEXT: [[TMP11]] = add i32 [[VEC_PHI2]], [[TMP7]]			; UNROLL-NO-VF-NEXT: [[TMP11]] = add i32 [[VEC_PHI2]], [[TMP8]]
	; UNROLL-NO-VF-NEXT: br i1 [[TMP4]], label [[PRED_STORE_IF:%.*]], label [[PRED_STORE_CONTINUE:%.*]]			; UNROLL-NO-VF-NEXT: store i32 [[TMP4]], ptr [[TMP6]], align 4
	; UNROLL-NO-VF: pred.store.if:			; UNROLL-NO-VF-NEXT: store i32 [[TMP5]], ptr [[TMP7]], align 4
	; UNROLL-NO-VF-NEXT: [[TMP12:%.*]] = add i32 [[INDEX]], 0			; UNROLL-NO-VF-NEXT: [[INDEX_NEXT]] = add nuw i32 [[INDEX]], 2
	; UNROLL-NO-VF-NEXT: [[TMP13:%.*]] = getelementptr inbounds i32, ptr [[X:%.*]], i32 [[TMP12]]			; UNROLL-NO-VF-NEXT: [[TMP12:%.*]] = icmp eq i32 [[INDEX_NEXT]], [[N_VEC]]
	; UNROLL-NO-VF-NEXT: store i32 [[TMP2]], ptr [[TMP13]], align 4			; UNROLL-NO-VF-NEXT: br i1 [[TMP12]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !prof [[PROF26]], !llvm.loop [[LOOP30:![0-9]+]]
	; UNROLL-NO-VF-NEXT: br label [[PRED_STORE_CONTINUE]]
	; UNROLL-NO-VF: pred.store.continue:
	; UNROLL-NO-VF-NEXT: br i1 [[TMP5]], label [[PRED_STORE_IF6:%.*]], label [[PRED_STORE_CONTINUE7]]
	; UNROLL-NO-VF: pred.store.if6:
	; UNROLL-NO-VF-NEXT: [[TMP14:%.*]] = add i32 [[INDEX]], 1
	; UNROLL-NO-VF-NEXT: [[TMP15:%.*]] = getelementptr inbounds i32, ptr [[X]], i32 [[TMP14]]
	; UNROLL-NO-VF-NEXT: store i32 [[TMP3]], ptr [[TMP15]], align 4
	; UNROLL-NO-VF-NEXT: br label [[PRED_STORE_CONTINUE7]]
	; UNROLL-NO-VF: pred.store.continue7:
	; UNROLL-NO-VF-NEXT: [[TMP16:%.*]] = select i1 [[TMP4]], i32 [[TMP10]], i32 [[VEC_PHI]]
	; UNROLL-NO-VF-NEXT: [[TMP17:%.*]] = select i1 [[TMP5]], i32 [[TMP11]], i32 [[VEC_PHI2]]
	; UNROLL-NO-VF-NEXT: [[INDEX_NEXT]] = add i32 [[INDEX]], 2
	; UNROLL-NO-VF-NEXT: [[TMP18:%.*]] = icmp eq i32 [[INDEX_NEXT]], [[N_VEC]]
	; UNROLL-NO-VF-NEXT: br i1 [[TMP18]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !prof [[PROF25]], !llvm.loop [[LOOP29:![0-9]+]]
	; UNROLL-NO-VF: middle.block:			; UNROLL-NO-VF: middle.block:
	; UNROLL-NO-VF-NEXT: [[BIN_RDX:%.*]] = add i32 [[TMP17]], [[TMP16]]			; UNROLL-NO-VF-NEXT: [[BIN_RDX:%.*]] = add i32 [[TMP11]], [[TMP10]]
	; UNROLL-NO-VF-NEXT: br i1 true, label [[BB1:%.*]], label [[SCALAR_PH]]			; UNROLL-NO-VF-NEXT: [[CMP_N:%.*]] = icmp eq i32 [[TMP1]], [[N_VEC]]
				; UNROLL-NO-VF-NEXT: br i1 [[CMP_N]], label [[BB1:%.*]], label [[SCALAR_PH]]
	; UNROLL-NO-VF: scalar.ph:			; UNROLL-NO-VF: scalar.ph:
	; UNROLL-NO-VF-NEXT: [[SCALAR_RECUR_INIT:%.*]] = phi i32 [ 0, [[BB:%.*]] ], [ [[TMP9]], [[MIDDLE_BLOCK]] ]			; UNROLL-NO-VF-NEXT: [[SCALAR_RECUR_INIT:%.*]] = phi i32 [ 0, [[BB:%.*]] ], [ [[TMP9]], [[MIDDLE_BLOCK]] ]
	; UNROLL-NO-VF-NEXT: [[BC_RESUME_VAL:%.*]] = phi i32 [ [[IND_END]], [[MIDDLE_BLOCK]] ], [ [[Y]], [[BB]] ]			; UNROLL-NO-VF-NEXT: [[BC_RESUME_VAL:%.*]] = phi i32 [ [[IND_END]], [[MIDDLE_BLOCK]] ], [ [[Y]], [[BB]] ]
	; UNROLL-NO-VF-NEXT: [[BC_RESUME_VAL1:%.*]] = phi i32 [ [[N_VEC]], [[MIDDLE_BLOCK]] ], [ 0, [[BB]] ]			; UNROLL-NO-VF-NEXT: [[BC_RESUME_VAL1:%.*]] = phi i32 [ [[N_VEC]], [[MIDDLE_BLOCK]] ], [ 0, [[BB]] ]
	; UNROLL-NO-VF-NEXT: [[BC_MERGE_RDX:%.*]] = phi i32 [ 0, [[BB]] ], [ [[BIN_RDX]], [[MIDDLE_BLOCK]] ]			; UNROLL-NO-VF-NEXT: [[BC_MERGE_RDX:%.*]] = phi i32 [ 0, [[BB]] ], [ [[BIN_RDX]], [[MIDDLE_BLOCK]] ]
	; UNROLL-NO-VF-NEXT: br label [[BB2:%.*]]			; UNROLL-NO-VF-NEXT: br label [[BB2:%.*]]
	; UNROLL-NO-VF: bb1:			; UNROLL-NO-VF: bb1:
	; UNROLL-NO-VF-NEXT: [[VAR:%.*]] = phi i32 [ [[VAR6:%.*]], [[BB2]] ], [ [[BIN_RDX]], [[MIDDLE_BLOCK]] ]			; UNROLL-NO-VF-NEXT: [[VAR:%.*]] = phi i32 [ [[VAR6:%.*]], [[BB2]] ], [ [[BIN_RDX]], [[MIDDLE_BLOCK]] ]
	; UNROLL-NO-VF-NEXT: ret i32 [[VAR]]			; UNROLL-NO-VF-NEXT: ret i32 [[VAR]]
	; UNROLL-NO-VF: bb2:			; UNROLL-NO-VF: bb2:
	; UNROLL-NO-VF-NEXT: [[VAR3:%.*]] = phi i32 [ [[VAR8:%.*]], [[BB2]] ], [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ]			; UNROLL-NO-VF-NEXT: [[VAR3:%.*]] = phi i32 [ [[VAR8:%.*]], [[BB2]] ], [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ]
	; UNROLL-NO-VF-NEXT: [[IV:%.*]] = phi i32 [ [[IV_NEXT:%.*]], [[BB2]] ], [ [[BC_RESUME_VAL1]], [[SCALAR_PH]] ]			; UNROLL-NO-VF-NEXT: [[IV:%.*]] = phi i32 [ [[IV_NEXT:%.*]], [[BB2]] ], [ [[BC_RESUME_VAL1]], [[SCALAR_PH]] ]
	; UNROLL-NO-VF-NEXT: [[SCALAR_RECUR:%.*]] = phi i32 [ [[VAR7:%.*]], [[BB2]] ], [ [[SCALAR_RECUR_INIT]], [[SCALAR_PH]] ]			; UNROLL-NO-VF-NEXT: [[SCALAR_RECUR:%.*]] = phi i32 [ [[VAR7:%.*]], [[BB2]] ], [ [[SCALAR_RECUR_INIT]], [[SCALAR_PH]] ]
	; UNROLL-NO-VF-NEXT: [[VAR5:%.*]] = phi i32 [ [[VAR6]], [[BB2]] ], [ [[BC_MERGE_RDX]], [[SCALAR_PH]] ]			; UNROLL-NO-VF-NEXT: [[VAR5:%.*]] = phi i32 [ [[VAR6]], [[BB2]] ], [ [[BC_MERGE_RDX]], [[SCALAR_PH]] ]
	; UNROLL-NO-VF-NEXT: [[G:%.*]] = getelementptr inbounds i32, ptr [[X]], i32 [[IV]]			; UNROLL-NO-VF-NEXT: [[G:%.*]] = getelementptr inbounds i32, ptr [[X]], i32 [[IV]]
	; UNROLL-NO-VF-NEXT: [[VAR6]] = add i32 [[VAR5]], [[SCALAR_RECUR]]			; UNROLL-NO-VF-NEXT: [[VAR6]] = add i32 [[VAR5]], [[SCALAR_RECUR]]
	; UNROLL-NO-VF-NEXT: [[VAR7]] = udiv i32 219220132, [[VAR3]]			; UNROLL-NO-VF-NEXT: [[VAR7]] = udiv i32 219220132, [[VAR3]]
	; UNROLL-NO-VF-NEXT: store i32 [[VAR3]], ptr [[G]], align 4			; UNROLL-NO-VF-NEXT: store i32 [[VAR3]], ptr [[G]], align 4
	; UNROLL-NO-VF-NEXT: [[VAR8]] = add nsw i32 [[VAR3]], -1			; UNROLL-NO-VF-NEXT: [[VAR8]] = add nsw i32 [[VAR3]], -1
	; UNROLL-NO-VF-NEXT: [[IV_NEXT]] = add nsw i32 [[IV]], 1			; UNROLL-NO-VF-NEXT: [[IV_NEXT]] = add nsw i32 [[IV]], 1
	; UNROLL-NO-VF-NEXT: [[VAR9:%.*]] = icmp slt i32 [[VAR3]], 2			; UNROLL-NO-VF-NEXT: [[VAR9:%.*]] = icmp slt i32 [[VAR3]], 2
	; UNROLL-NO-VF-NEXT: br i1 [[VAR9]], label [[BB1]], label [[BB2]], !prof [[PROF27]], !llvm.loop [[LOOP30:![0-9]+]]			; UNROLL-NO-VF-NEXT: br i1 [[VAR9]], label [[BB1]], label [[BB2]], !prof [[PROF28]], !llvm.loop [[LOOP31:![0-9]+]]
	;			;
	; SINK-AFTER-LABEL: @sink_into_replication_region_multiple(			; SINK-AFTER-LABEL: @sink_into_replication_region_multiple(
	; SINK-AFTER-NEXT: bb:			; SINK-AFTER-NEXT: bb:
	; SINK-AFTER-NEXT: [[TMP0:%.*]] = add i32 [[Y:%.*]], 1			; SINK-AFTER-NEXT: [[TMP0:%.*]] = add i32 [[Y:%.*]], 1
	; SINK-AFTER-NEXT: [[SMIN:%.*]] = call i32 @llvm.smin.i32(i32 [[Y]], i32 1)			; SINK-AFTER-NEXT: [[SMIN:%.*]] = call i32 @llvm.smin.i32(i32 [[Y]], i32 1)
	; SINK-AFTER-NEXT: [[TMP1:%.*]] = sub i32 [[TMP0]], [[SMIN]]			; SINK-AFTER-NEXT: [[TMP1:%.*]] = sub i32 [[TMP0]], [[SMIN]]
	; SINK-AFTER-NEXT: br i1 false, label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]			; SINK-AFTER-NEXT: br i1 false, label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
	; SINK-AFTER: vector.ph:			; SINK-AFTER: vector.ph:
	; UNROLL-NO-VF-NEXT: [[TMP6:%.*]] = zext i16 [[TMP4]] to i32			; UNROLL-NO-VF-NEXT: [[TMP6:%.*]] = zext i16 [[TMP4]] to i32
	; UNROLL-NO-VF-NEXT: [[TMP7]] = zext i16 [[TMP5]] to i32			; UNROLL-NO-VF-NEXT: [[TMP7]] = zext i16 [[TMP5]] to i32
	; UNROLL-NO-VF-NEXT: [[TMP8:%.*]] = getelementptr i32, ptr [[A_PTR:%.*]], i16 [[TMP0]]			; UNROLL-NO-VF-NEXT: [[TMP8:%.*]] = getelementptr i32, ptr [[A_PTR:%.*]], i16 [[TMP0]]
	; UNROLL-NO-VF-NEXT: [[TMP9:%.*]] = getelementptr i32, ptr [[A_PTR]], i16 [[TMP1]]			; UNROLL-NO-VF-NEXT: [[TMP9:%.*]] = getelementptr i32, ptr [[A_PTR]], i16 [[TMP1]]
	; UNROLL-NO-VF-NEXT: store i32 0, ptr [[TMP8]], align 4			; UNROLL-NO-VF-NEXT: store i32 0, ptr [[TMP8]], align 4
	; UNROLL-NO-VF-NEXT: store i32 0, ptr [[TMP9]], align 4			; UNROLL-NO-VF-NEXT: store i32 0, ptr [[TMP9]], align 4
	; UNROLL-NO-VF-NEXT: [[INDEX_NEXT]] = add nuw i32 [[INDEX]], 2			; UNROLL-NO-VF-NEXT: [[INDEX_NEXT]] = add nuw i32 [[INDEX]], 2
	; UNROLL-NO-VF-NEXT: [[TMP10:%.*]] = icmp eq i32 [[INDEX_NEXT]], 16			; UNROLL-NO-VF-NEXT: [[TMP10:%.*]] = icmp eq i32 [[INDEX_NEXT]], 16
	; UNROLL-NO-VF-NEXT: br i1 [[TMP10]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP31:![0-9]+]]			; UNROLL-NO-VF-NEXT: br i1 [[TMP10]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP32:![0-9]+]]
	; UNROLL-NO-VF: middle.block:			; UNROLL-NO-VF: middle.block:
	; UNROLL-NO-VF-NEXT: [[CMP_N:%.*]] = icmp eq i32 16, 16			; UNROLL-NO-VF-NEXT: [[CMP_N:%.*]] = icmp eq i32 16, 16
	; UNROLL-NO-VF-NEXT: br i1 [[CMP_N]], label [[FOR_END:%.*]], label [[SCALAR_PH]]			; UNROLL-NO-VF-NEXT: br i1 [[CMP_N]], label [[FOR_END:%.*]], label [[SCALAR_PH]]
	; UNROLL-NO-VF: scalar.ph:			; UNROLL-NO-VF: scalar.ph:
	; UNROLL-NO-VF-NEXT: [[SCALAR_RECUR_INIT:%.*]] = phi i32 [ 0, [[ENTRY:%.*]] ], [ [[TMP7]], [[MIDDLE_BLOCK]] ]			; UNROLL-NO-VF-NEXT: [[SCALAR_RECUR_INIT:%.*]] = phi i32 [ 0, [[ENTRY:%.*]] ], [ [[TMP7]], [[MIDDLE_BLOCK]] ]
	; UNROLL-NO-VF-NEXT: [[BC_RESUME_VAL:%.*]] = phi i16 [ 16, [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY]] ]			; UNROLL-NO-VF-NEXT: [[BC_RESUME_VAL:%.*]] = phi i16 [ 16, [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY]] ]
	; UNROLL-NO-VF-NEXT: br label [[LOOP:%.*]]			; UNROLL-NO-VF-NEXT: br label [[LOOP:%.*]]
	; UNROLL-NO-VF: loop:			; UNROLL-NO-VF: loop:
	; UNROLL-NO-VF-NEXT: [[IV:%.*]] = phi i16 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[IV_NEXT:%.*]], [[LOOP]] ]			; UNROLL-NO-VF-NEXT: [[IV:%.*]] = phi i16 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[IV_NEXT:%.*]], [[LOOP]] ]
	; UNROLL-NO-VF-NEXT: [[SCALAR_RECUR:%.*]] = phi i32 [ [[SCALAR_RECUR_INIT]], [[SCALAR_PH]] ], [ [[FOR_PREV:%.*]], [[LOOP]] ]			; UNROLL-NO-VF-NEXT: [[SCALAR_RECUR:%.*]] = phi i32 [ [[SCALAR_RECUR_INIT]], [[SCALAR_PH]] ], [ [[FOR_PREV:%.*]], [[LOOP]] ]
	; UNROLL-NO-VF-NEXT: [[CMP:%.*]] = icmp eq i32 [[SCALAR_RECUR]], 15			; UNROLL-NO-VF-NEXT: [[CMP:%.*]] = icmp eq i32 [[SCALAR_RECUR]], 15
	; UNROLL-NO-VF-NEXT: [[C:%.*]] = icmp eq i1 [[CMP]], true			; UNROLL-NO-VF-NEXT: [[C:%.*]] = icmp eq i1 [[CMP]], true
	; UNROLL-NO-VF-NEXT: [[VEC_DEAD:%.*]] = and i1 [[C]], true			; UNROLL-NO-VF-NEXT: [[VEC_DEAD:%.*]] = and i1 [[C]], true
	; UNROLL-NO-VF-NEXT: [[IV_NEXT]] = add i16 [[IV]], 1			; UNROLL-NO-VF-NEXT: [[IV_NEXT]] = add i16 [[IV]], 1
	; UNROLL-NO-VF-NEXT: [[B1:%.*]] = or i16 [[IV_NEXT]], [[IV_NEXT]]			; UNROLL-NO-VF-NEXT: [[B1:%.*]] = or i16 [[IV_NEXT]], [[IV_NEXT]]
	; UNROLL-NO-VF-NEXT: [[B3:%.*]] = and i1 [[CMP]], [[C]]			; UNROLL-NO-VF-NEXT: [[B3:%.*]] = and i1 [[CMP]], [[C]]
	; UNROLL-NO-VF-NEXT: [[FOR_PREV]] = zext i16 [[B1]] to i32			; UNROLL-NO-VF-NEXT: [[FOR_PREV]] = zext i16 [[B1]] to i32
	; UNROLL-NO-VF-NEXT: [[EXT:%.*]] = zext i1 [[B3]] to i32			; UNROLL-NO-VF-NEXT: [[EXT:%.*]] = zext i1 [[B3]] to i32
	; UNROLL-NO-VF-NEXT: [[A_GEP:%.*]] = getelementptr i32, ptr [[A_PTR]], i16 [[IV]]			; UNROLL-NO-VF-NEXT: [[A_GEP:%.*]] = getelementptr i32, ptr [[A_PTR]], i16 [[IV]]
	; UNROLL-NO-VF-NEXT: store i32 0, ptr [[A_GEP]], align 4			; UNROLL-NO-VF-NEXT: store i32 0, ptr [[A_GEP]], align 4
	; UNROLL-NO-VF-NEXT: br i1 [[VEC_DEAD]], label [[FOR_END]], label [[LOOP]], !llvm.loop [[LOOP32:![0-9]+]]			; UNROLL-NO-VF-NEXT: br i1 [[VEC_DEAD]], label [[FOR_END]], label [[LOOP]], !llvm.loop [[LOOP33:![0-9]+]]
	; UNROLL-NO-VF: for.end:			; UNROLL-NO-VF: for.end:
	; UNROLL-NO-VF-NEXT: ret void			; UNROLL-NO-VF-NEXT: ret void
	;			;
	; SINK-AFTER-LABEL: @sink_after_dead_inst(			; SINK-AFTER-LABEL: @sink_after_dead_inst(
	; SINK-AFTER-NEXT: entry:			; SINK-AFTER-NEXT: entry:
	; SINK-AFTER-NEXT: br i1 false, label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]			; SINK-AFTER-NEXT: br i1 false, label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
	; SINK-AFTER: vector.ph:			; SINK-AFTER: vector.ph:
	; SINK-AFTER-NEXT: br label [[VECTOR_BODY:%.*]]			; SINK-AFTER-NEXT: br label [[VECTOR_BODY:%.*]]

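The UNROLL-NO-VF check lines for sink_into_replication_region_multiple above differ mainly in how the vector trip count is formed. One side rounds the trip count up before taking the remainder (N_RND_UP, N_MOD_VF, N_VEC) and ends the middle block with `br i1 true`; the other takes the remainder of the trip count directly and compares the trip count against N_VEC (CMP_N). A minimal C++ sketch of that arithmetic is given here for orientation only. The names roundedCount, vectorTripCount and needsScalarRemainder are hypothetical and are not LLVM APIs; the sketch assumes VF * UF is non-zero and uses wrapping 32-bit unsigned arithmetic, as the i32 IR does.

```cpp
#include <cstdint>

// Hypothetical helper (not an LLVM API): the value the remainder is taken from.
// With tail folding the trip count is first rounded up by Step - 1 (N_RND_UP);
// without it the trip count is used unchanged.
constexpr std::uint32_t roundedCount(std::uint32_t TripCount, std::uint32_t Step,
                                     bool FoldTail) {
  return FoldTail ? TripCount + (Step - 1) : TripCount;
}

// Hypothetical helper: N_VEC = Rounded - (Rounded urem Step), with Step = VF * UF.
// Assumes VF * UF != 0.
constexpr std::uint32_t vectorTripCount(std::uint32_t TripCount, std::uint32_t VF,
                                        std::uint32_t UF, bool FoldTail) {
  return roundedCount(TripCount, VF * UF, FoldTail) -
         roundedCount(TripCount, VF * UF, FoldTail) % (VF * UF);
}

// Hypothetical helper: mirrors the middle-block exit. A tail-folded loop always
// skips the scalar remainder (br i1 true); otherwise the remainder runs when
// the trip count differs from N_VEC (the CMP_N compare).
constexpr bool needsScalarRemainder(std::uint32_t TripCount, std::uint32_t VF,
                                    std::uint32_t UF, bool FoldTail) {
  return !FoldTail && TripCount != vectorTripCount(TripCount, VF, UF, false);
}

// Illustrative values: trip count 14 with VF=1, UF=4.
static_assert(vectorTripCount(14, 1, 4, true) == 16, "rounded up to a multiple of 4");
static_assert(vectorTripCount(14, 1, 4, false) == 12, "rounded down to a multiple of 4");
static_assert(needsScalarRemainder(14, 1, 4, false), "two iterations remain");
static_assert(!needsScalarRemainder(14, 1, 4, true), "tail is handled by masking");
```

With a trip count of 14 and VF=1, UF=4, the rounded-up form yields 16 and the plain form yields 12, which leaves two iterations for the scalar loop when the tail is not folded.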
llvm/test/Transforms/LoopVectorize/icmp-uniforms.ll


	for.end:			for.end:
	%tmp4 = phi i32 [ %tmp3, %for.body ]			%tmp4 = phi i32 [ %tmp3, %for.body ]
	ret i32 %tmp4			ret i32 %tmp4
	}			}

	; Check for crash exposed by D76992.			; Check for crash exposed by D76992.
	; CHECK-LABEL: 'test'			; CHECK-LABEL: 'test'
	; CHECK: VPlan 'Initial VPlan for VF={4},UF>=1' {			; CHECK: VPlan 'Initial VPlan for Tail Folded VF={4},UF>=1' {
	; CHECK-NEXT: Live-in vp<[[VEC_TC:%.+]]> = vector-trip-count			; CHECK-NEXT: Live-in vp<[[VEC_TC:%.+]]> = vector-trip-count
	; CHECK-NEXT: Live-in vp<[[BTC:%.+]]> = backedge-taken count			; CHECK-NEXT: Live-in vp<[[BTC:%.+]]> = backedge-taken count
	; CHECK-EMPTY:			; CHECK-EMPTY:
	; CHECK-NEXT: vector.ph:			; CHECK-NEXT: vector.ph:
	; CHECK-NEXT: Successor(s): vector loop			; CHECK-NEXT: Successor(s): vector loop
	; CHECK-EMPTY:			; CHECK-EMPTY:
	; CHECK-NEXT: <x1> vector loop: {			; CHECK-NEXT: <x1> vector loop: {
	; CHECK-NEXT: vector.body:			; CHECK-NEXT: vector.body:

llvm/test/Transforms/LoopVectorize/pr45679-fold-tail-by-masking.ll

	; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i32 [ 16, [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]			; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i32 [ 16, [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]
	; CHECK-NEXT: br label [[LOOP:%.*]]			; CHECK-NEXT: br label [[LOOP:%.*]]
	; CHECK: loop:			; CHECK: loop:
	; CHECK-NEXT: [[RIV:%.*]] = phi i32 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[RIVPLUS1:%.*]], [[LOOP]] ]			; CHECK-NEXT: [[RIV:%.*]] = phi i32 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[RIVPLUS1:%.*]], [[LOOP]] ]
	; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds i32, ptr [[A]], i32 [[RIV]]			; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds i32, ptr [[A]], i32 [[RIV]]
	; CHECK-NEXT: store i32 13, ptr [[ARRAYIDX]], align 1			; CHECK-NEXT: store i32 13, ptr [[ARRAYIDX]], align 1
	; CHECK-NEXT: [[RIVPLUS1]] = add nuw nsw i32 [[RIV]], 1			; CHECK-NEXT: [[RIVPLUS1]] = add nuw nsw i32 [[RIV]], 1
	; CHECK-NEXT: [[COND:%.*]] = icmp eq i32 [[RIVPLUS1]], 14			; CHECK-NEXT: [[COND:%.*]] = icmp eq i32 [[RIVPLUS1]], 14
	; CHECK-NEXT: br i1 [[COND]], label [[EXIT]], label [[LOOP]], !llvm.loop [[LOOP2:![0-9]+]]			; CHECK-NEXT: br i1 [[COND]], label [[EXIT]], label [[LOOP]], !llvm.loop [[LOOP3:![0-9]+]]
	; CHECK: exit:			; CHECK: exit:
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	; VF2UF2-LABEL: @pr45679(			; VF2UF2-LABEL: @pr45679(
	; VF2UF2-NEXT: entry:			; VF2UF2-NEXT: entry:
	; VF2UF2-NEXT: br i1 false, label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]			; VF2UF2-NEXT: br i1 false, label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
	; VF2UF2: vector.ph:			; VF2UF2: vector.ph:
	; VF2UF2-NEXT: br label [[VECTOR_BODY:%.*]]			; VF2UF2-NEXT: br label [[VECTOR_BODY:%.*]]
	; VF2UF2-NEXT: [[BC_RESUME_VAL:%.*]] = phi i32 [ 16, [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]			; VF2UF2-NEXT: [[BC_RESUME_VAL:%.*]] = phi i32 [ 16, [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]
	; VF2UF2-NEXT: br label [[LOOP:%.*]]			; VF2UF2-NEXT: br label [[LOOP:%.*]]
	; VF2UF2: loop:			; VF2UF2: loop:
	; VF2UF2-NEXT: [[RIV:%.*]] = phi i32 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[RIVPLUS1:%.*]], [[LOOP]] ]			; VF2UF2-NEXT: [[RIV:%.*]] = phi i32 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[RIVPLUS1:%.*]], [[LOOP]] ]
	; VF2UF2-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds i32, ptr [[A]], i32 [[RIV]]			; VF2UF2-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds i32, ptr [[A]], i32 [[RIV]]
	; VF2UF2-NEXT: store i32 13, ptr [[ARRAYIDX]], align 1			; VF2UF2-NEXT: store i32 13, ptr [[ARRAYIDX]], align 1
	; VF2UF2-NEXT: [[RIVPLUS1]] = add nuw nsw i32 [[RIV]], 1			; VF2UF2-NEXT: [[RIVPLUS1]] = add nuw nsw i32 [[RIV]], 1
	; VF2UF2-NEXT: [[COND:%.*]] = icmp eq i32 [[RIVPLUS1]], 14			; VF2UF2-NEXT: [[COND:%.*]] = icmp eq i32 [[RIVPLUS1]], 14
	; VF2UF2-NEXT: br i1 [[COND]], label [[EXIT]], label [[LOOP]], !llvm.loop [[LOOP2:![0-9]+]]			; VF2UF2-NEXT: br i1 [[COND]], label [[EXIT]], label [[LOOP]], !llvm.loop [[LOOP3:![0-9]+]]
	; VF2UF2: exit:			; VF2UF2: exit:
	; VF2UF2-NEXT: ret void			; VF2UF2-NEXT: ret void
	;			;
	; VF1UF4-LABEL: @pr45679(			; VF1UF4-LABEL: @pr45679(
	; VF1UF4-NEXT: entry:			; VF1UF4-NEXT: entry:
	; VF1UF4-NEXT: br i1 false, label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]			; VF1UF4-NEXT: br i1 false, label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
	; VF1UF4: vector.ph:			; VF1UF4: vector.ph:
	; VF1UF4-NEXT: br label [[VECTOR_BODY:%.*]]			; VF1UF4-NEXT: br label [[VECTOR_BODY:%.*]]
	; VF1UF4: vector.body:			; VF1UF4: vector.body:
	; VF1UF4-NEXT: [[INDEX:%.*]] = phi i32 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[PRED_STORE_CONTINUE12:%.*]] ]			; VF1UF4-NEXT: [[INDEX:%.*]] = phi i32 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
	; VF1UF4-NEXT: [[VEC_IV:%.*]] = add i32 [[INDEX]], 0			; VF1UF4-NEXT: [[TMP0:%.*]] = add i32 [[INDEX]], 0
	; VF1UF4-NEXT: [[VEC_IV4:%.*]] = add i32 [[INDEX]], 1			; VF1UF4-NEXT: [[TMP1:%.*]] = add i32 [[INDEX]], 1
	; VF1UF4-NEXT: [[VEC_IV5:%.*]] = add i32 [[INDEX]], 2			; VF1UF4-NEXT: [[TMP2:%.*]] = add i32 [[INDEX]], 2
	; VF1UF4-NEXT: [[VEC_IV6:%.*]] = add i32 [[INDEX]], 3			; VF1UF4-NEXT: [[TMP3:%.*]] = add i32 [[INDEX]], 3
	; VF1UF4-NEXT: [[TMP0:%.*]] = icmp ule i32 [[VEC_IV]], 13			; VF1UF4-NEXT: [[TMP4:%.*]] = getelementptr inbounds i32, ptr [[A:%.*]], i32 [[TMP0]]
	; VF1UF4-NEXT: [[TMP1:%.*]] = icmp ule i32 [[VEC_IV4]], 13			; VF1UF4-NEXT: [[TMP5:%.*]] = getelementptr inbounds i32, ptr [[A]], i32 [[TMP1]]
	; VF1UF4-NEXT: [[TMP2:%.*]] = icmp ule i32 [[VEC_IV5]], 13			; VF1UF4-NEXT: [[TMP6:%.*]] = getelementptr inbounds i32, ptr [[A]], i32 [[TMP2]]
	; VF1UF4-NEXT: [[TMP3:%.*]] = icmp ule i32 [[VEC_IV6]], 13			; VF1UF4-NEXT: [[TMP7:%.*]] = getelementptr inbounds i32, ptr [[A]], i32 [[TMP3]]
	; VF1UF4-NEXT: br i1 [[TMP0]], label [[PRED_STORE_IF:%.*]], label [[PRED_STORE_CONTINUE:%.*]]
	; VF1UF4: pred.store.if:
	; VF1UF4-NEXT: [[INDUCTION:%.*]] = add i32 [[INDEX]], 0
	; VF1UF4-NEXT: [[TMP4:%.*]] = getelementptr inbounds i32, ptr [[A:%.*]], i32 [[INDUCTION]]
	; VF1UF4-NEXT: store i32 13, ptr [[TMP4]], align 1			; VF1UF4-NEXT: store i32 13, ptr [[TMP4]], align 1
	; VF1UF4-NEXT: br label [[PRED_STORE_CONTINUE]]
	; VF1UF4: pred.store.continue:
	; VF1UF4-NEXT: br i1 [[TMP1]], label [[PRED_STORE_IF7:%.*]], label [[PRED_STORE_CONTINUE8:%.*]]
	; VF1UF4: pred.store.if4:
	; VF1UF4-NEXT: [[INDUCTION1:%.*]] = add i32 [[INDEX]], 1
	; VF1UF4-NEXT: [[TMP5:%.*]] = getelementptr inbounds i32, ptr [[A]], i32 [[INDUCTION1]]
	; VF1UF4-NEXT: store i32 13, ptr [[TMP5]], align 1			; VF1UF4-NEXT: store i32 13, ptr [[TMP5]], align 1
	; VF1UF4-NEXT: br label [[PRED_STORE_CONTINUE8]]
	; VF1UF4: pred.store.continue5:
	; VF1UF4-NEXT: br i1 [[TMP2]], label [[PRED_STORE_IF9:%.*]], label [[PRED_STORE_CONTINUE10:%.*]]
	; VF1UF4: pred.store.if6:
	; VF1UF4-NEXT: [[INDUCTION2:%.*]] = add i32 [[INDEX]], 2
	; VF1UF4-NEXT: [[TMP6:%.*]] = getelementptr inbounds i32, ptr [[A]], i32 [[INDUCTION2]]
	; VF1UF4-NEXT: store i32 13, ptr [[TMP6]], align 1			; VF1UF4-NEXT: store i32 13, ptr [[TMP6]], align 1
	; VF1UF4-NEXT: br label [[PRED_STORE_CONTINUE10]]
	; VF1UF4: pred.store.continue7:
	; VF1UF4-NEXT: br i1 [[TMP3]], label [[PRED_STORE_IF11:%.*]], label [[PRED_STORE_CONTINUE12]]
	; VF1UF4: pred.store.if8:
	; VF1UF4-NEXT: [[INDUCTION3:%.*]] = add i32 [[INDEX]], 3
	; VF1UF4-NEXT: [[TMP7:%.*]] = getelementptr inbounds i32, ptr [[A]], i32 [[INDUCTION3]]
	; VF1UF4-NEXT: store i32 13, ptr [[TMP7]], align 1			; VF1UF4-NEXT: store i32 13, ptr [[TMP7]], align 1
	; VF1UF4-NEXT: br label [[PRED_STORE_CONTINUE12]]			; VF1UF4-NEXT: [[INDEX_NEXT]] = add nuw i32 [[INDEX]], 4
	; VF1UF4: pred.store.continue9:			; VF1UF4-NEXT: [[TMP8:%.*]] = icmp eq i32 [[INDEX_NEXT]], 12
	; VF1UF4-NEXT: [[INDEX_NEXT]] = add i32 [[INDEX]], 4
	; VF1UF4-NEXT: [[TMP8:%.*]] = icmp eq i32 [[INDEX_NEXT]], 16
	; VF1UF4-NEXT: br i1 [[TMP8]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP0:![0-9]+]]			; VF1UF4-NEXT: br i1 [[TMP8]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP0:![0-9]+]]
	; VF1UF4: middle.block:			; VF1UF4: middle.block:
	; VF1UF4-NEXT: br i1 true, label [[EXIT:%.*]], label [[SCALAR_PH]]			; VF1UF4-NEXT: [[CMP_N:%.*]] = icmp eq i32 14, 12
				; VF1UF4-NEXT: br i1 [[CMP_N]], label [[EXIT:%.*]], label [[SCALAR_PH]]
	; VF1UF4: scalar.ph:			; VF1UF4: scalar.ph:
	; VF1UF4-NEXT: [[BC_RESUME_VAL:%.*]] = phi i32 [ 16, [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]			; VF1UF4-NEXT: [[BC_RESUME_VAL:%.*]] = phi i32 [ 12, [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]
	; VF1UF4-NEXT: br label [[LOOP:%.*]]			; VF1UF4-NEXT: br label [[LOOP:%.*]]
	; VF1UF4: loop:			; VF1UF4: loop:
	; VF1UF4-NEXT: [[RIV:%.*]] = phi i32 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[RIVPLUS1:%.*]], [[LOOP]] ]			; VF1UF4-NEXT: [[RIV:%.*]] = phi i32 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[RIVPLUS1:%.*]], [[LOOP]] ]
	; VF1UF4-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds i32, ptr [[A]], i32 [[RIV]]			; VF1UF4-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds i32, ptr [[A]], i32 [[RIV]]
	; VF1UF4-NEXT: store i32 13, ptr [[ARRAYIDX]], align 1			; VF1UF4-NEXT: store i32 13, ptr [[ARRAYIDX]], align 1
	; VF1UF4-NEXT: [[RIVPLUS1]] = add nuw nsw i32 [[RIV]], 1			; VF1UF4-NEXT: [[RIVPLUS1]] = add nuw nsw i32 [[RIV]], 1
	; VF1UF4-NEXT: [[COND:%.*]] = icmp eq i32 [[RIVPLUS1]], 14			; VF1UF4-NEXT: [[COND:%.*]] = icmp eq i32 [[RIVPLUS1]], 14
	; VF1UF4-NEXT: br i1 [[COND]], label [[EXIT]], label [[LOOP]], !llvm.loop [[LOOP2:![0-9]+]]			; VF1UF4-NEXT: br i1 [[COND]], label [[EXIT]], label [[LOOP]], !llvm.loop [[LOOP3:![0-9]+]]
	; VF1UF4: exit:			; VF1UF4: exit:
	; VF1UF4-NEXT: ret void			; VF1UF4-NEXT: ret void
	;			;
	entry:			entry:
	br label %loop			br label %loop

	loop:			loop:
	%riv = phi i32 [ 0, %entry ], [ %rivPlus1, %loop ]			%riv = phi i32 [ 0, %entry ], [ %rivPlus1, %loop ]
	; VF2UF2-NEXT: ret void			; VF2UF2-NEXT: ret void
	;			;
	; VF1UF4-LABEL: @load_variant(			; VF1UF4-LABEL: @load_variant(
	; VF1UF4-NEXT: entry:			; VF1UF4-NEXT: entry:
	; VF1UF4-NEXT: br i1 false, label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]			; VF1UF4-NEXT: br i1 false, label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
	; VF1UF4: vector.ph:			; VF1UF4: vector.ph:
	; VF1UF4-NEXT: br label [[VECTOR_BODY:%.*]]			; VF1UF4-NEXT: br label [[VECTOR_BODY:%.*]]
	; VF1UF4: vector.body:			; VF1UF4: vector.body:
	; VF1UF4-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[PRED_STORE_CONTINUE12:%.*]] ]			; VF1UF4-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
	; VF1UF4-NEXT: [[VEC_IV:%.*]] = add i64 [[INDEX]], 0			; VF1UF4-NEXT: [[TMP0:%.*]] = add i64 [[INDEX]], 0
	; VF1UF4-NEXT: [[VEC_IV4:%.*]] = add i64 [[INDEX]], 1			; VF1UF4-NEXT: [[TMP1:%.*]] = add i64 [[INDEX]], 1
	; VF1UF4-NEXT: [[VEC_IV5:%.*]] = add i64 [[INDEX]], 2			; VF1UF4-NEXT: [[TMP2:%.*]] = add i64 [[INDEX]], 2
	; VF1UF4-NEXT: [[VEC_IV6:%.*]] = add i64 [[INDEX]], 3			; VF1UF4-NEXT: [[TMP3:%.*]] = add i64 [[INDEX]], 3
	; VF1UF4-NEXT: [[TMP0:%.*]] = icmp ule i64 [[VEC_IV]], 13			; VF1UF4-NEXT: [[TMP4:%.*]] = getelementptr inbounds i64, ptr [[A:%.*]], i64 [[TMP0]]
	; VF1UF4-NEXT: [[TMP1:%.*]] = icmp ule i64 [[VEC_IV4]], 13			; VF1UF4-NEXT: [[TMP5:%.*]] = getelementptr inbounds i64, ptr [[A]], i64 [[TMP1]]
	; VF1UF4-NEXT: [[TMP2:%.*]] = icmp ule i64 [[VEC_IV5]], 13			; VF1UF4-NEXT: [[TMP6:%.*]] = getelementptr inbounds i64, ptr [[A]], i64 [[TMP2]]
	; VF1UF4-NEXT: [[TMP3:%.*]] = icmp ule i64 [[VEC_IV6]], 13			; VF1UF4-NEXT: [[TMP7:%.*]] = getelementptr inbounds i64, ptr [[A]], i64 [[TMP3]]
	; VF1UF4-NEXT: br i1 [[TMP0]], label [[PRED_STORE_IF:%.*]], label [[PRED_STORE_CONTINUE:%.*]]			; VF1UF4-NEXT: [[TMP8:%.*]] = load i64, ptr [[TMP4]], align 8
	; VF1UF4: pred.store.if:			; VF1UF4-NEXT: [[TMP9:%.*]] = load i64, ptr [[TMP5]], align 8
	; VF1UF4-NEXT: [[INDUCTION:%.*]] = add i64 [[INDEX]], 0			; VF1UF4-NEXT: [[TMP10:%.*]] = load i64, ptr [[TMP6]], align 8
	; VF1UF4-NEXT: [[TMP4:%.*]] = getelementptr inbounds i64, ptr [[A:%.*]], i64 [[INDUCTION]]			; VF1UF4-NEXT: [[TMP11:%.*]] = load i64, ptr [[TMP7]], align 8
	; VF1UF4-NEXT: [[TMP5:%.*]] = load i64, ptr [[TMP4]], align 8			; VF1UF4-NEXT: store i64 [[TMP8]], ptr [[B:%.*]], align 8
	; VF1UF4-NEXT: store i64 [[TMP5]], ptr [[B:%.*]], align 8			; VF1UF4-NEXT: store i64 [[TMP9]], ptr [[B]], align 8
	; VF1UF4-NEXT: br label [[PRED_STORE_CONTINUE]]			; VF1UF4-NEXT: store i64 [[TMP10]], ptr [[B]], align 8
	; VF1UF4: pred.store.continue:
	; VF1UF4-NEXT: [[TMP6:%.*]] = phi i64 [ poison, [[VECTOR_BODY]] ], [ [[TMP5]], [[PRED_STORE_IF]] ]
	; VF1UF4-NEXT: br i1 [[TMP1]], label [[PRED_STORE_IF7:%.*]], label [[PRED_STORE_CONTINUE8:%.*]]
	; VF1UF4: pred.store.if4:
	; VF1UF4-NEXT: [[INDUCTION1:%.*]] = add i64 [[INDEX]], 1
	; VF1UF4-NEXT: [[TMP7:%.*]] = getelementptr inbounds i64, ptr [[A]], i64 [[INDUCTION1]]
	; VF1UF4-NEXT: [[TMP8:%.*]] = load i64, ptr [[TMP7]], align 8
	; VF1UF4-NEXT: store i64 [[TMP8]], ptr [[B]], align 8
	; VF1UF4-NEXT: br label [[PRED_STORE_CONTINUE8]]
	; VF1UF4: pred.store.continue5:
	; VF1UF4-NEXT: [[TMP9:%.*]] = phi i64 [ poison, [[PRED_STORE_CONTINUE]] ], [ [[TMP8]], [[PRED_STORE_IF7]] ]
	; VF1UF4-NEXT: br i1 [[TMP2]], label [[PRED_STORE_IF9:%.*]], label [[PRED_STORE_CONTINUE10:%.*]]
	; VF1UF4: pred.store.if6:
	; VF1UF4-NEXT: [[INDUCTION2:%.*]] = add i64 [[INDEX]], 2
	; VF1UF4-NEXT: [[TMP10:%.*]] = getelementptr inbounds i64, ptr [[A]], i64 [[INDUCTION2]]
	; VF1UF4-NEXT: [[TMP11:%.*]] = load i64, ptr [[TMP10]], align 8
	; VF1UF4-NEXT: store i64 [[TMP11]], ptr [[B]], align 8			; VF1UF4-NEXT: store i64 [[TMP11]], ptr [[B]], align 8
	; VF1UF4-NEXT: br label [[PRED_STORE_CONTINUE10]]			; VF1UF4-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 4
	; VF1UF4: pred.store.continue7:			; VF1UF4-NEXT: [[TMP12:%.*]] = icmp eq i64 [[INDEX_NEXT]], 12
	; VF1UF4-NEXT: [[TMP12:%.*]] = phi i64 [ poison, [[PRED_STORE_CONTINUE8]] ], [ [[TMP11]], [[PRED_STORE_IF9]] ]			; VF1UF4-NEXT: br i1 [[TMP12]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP4:![0-9]+]]
	; VF1UF4-NEXT: br i1 [[TMP3]], label [[PRED_STORE_IF11:%.*]], label [[PRED_STORE_CONTINUE12]]
	; VF1UF4: pred.store.if8:
	; VF1UF4-NEXT: [[INDUCTION3:%.*]] = add i64 [[INDEX]], 3
	; VF1UF4-NEXT: [[TMP13:%.*]] = getelementptr inbounds i64, ptr [[A]], i64 [[INDUCTION3]]
	; VF1UF4-NEXT: [[TMP14:%.*]] = load i64, ptr [[TMP13]], align 8
	; VF1UF4-NEXT: store i64 [[TMP14]], ptr [[B]], align 8
	; VF1UF4-NEXT: br label [[PRED_STORE_CONTINUE12]]
	; VF1UF4: pred.store.continue9:
	; VF1UF4-NEXT: [[TMP15:%.*]] = phi i64 [ poison, [[PRED_STORE_CONTINUE10]] ], [ [[TMP14]], [[PRED_STORE_IF11]] ]
	; VF1UF4-NEXT: [[INDEX_NEXT]] = add i64 [[INDEX]], 4
	; VF1UF4-NEXT: [[TMP16:%.*]] = icmp eq i64 [[INDEX_NEXT]], 16
	; VF1UF4-NEXT: br i1 [[TMP16]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP3:![0-9]+]]
	; VF1UF4: middle.block:			; VF1UF4: middle.block:
	; VF1UF4-NEXT: br i1 true, label [[FOR_END:%.*]], label [[SCALAR_PH]]			; VF1UF4-NEXT: [[CMP_N:%.*]] = icmp eq i64 14, 12
				; VF1UF4-NEXT: br i1 [[CMP_N]], label [[FOR_END:%.*]], label [[SCALAR_PH]]
	; VF1UF4: scalar.ph:			; VF1UF4: scalar.ph:
	; VF1UF4-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ 16, [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]			; VF1UF4-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ 12, [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]
	; VF1UF4-NEXT: br label [[FOR_BODY:%.*]]			; VF1UF4-NEXT: br label [[FOR_BODY:%.*]]
	; VF1UF4: for.body:			; VF1UF4: for.body:
	; VF1UF4-NEXT: [[IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[IV_NEXT:%.*]], [[FOR_BODY]] ]			; VF1UF4-NEXT: [[IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[IV_NEXT:%.*]], [[FOR_BODY]] ]
	; VF1UF4-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds i64, ptr [[A]], i64 [[IV]]			; VF1UF4-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds i64, ptr [[A]], i64 [[IV]]
	; VF1UF4-NEXT: [[V:%.*]] = load i64, ptr [[ARRAYIDX]], align 8			; VF1UF4-NEXT: [[V:%.*]] = load i64, ptr [[ARRAYIDX]], align 8
	; VF1UF4-NEXT: store i64 [[V]], ptr [[B]], align 8			; VF1UF4-NEXT: store i64 [[V]], ptr [[B]], align 8
	; VF1UF4-NEXT: [[IV_NEXT]] = add nuw nsw i64 [[IV]], 1			; VF1UF4-NEXT: [[IV_NEXT]] = add nuw nsw i64 [[IV]], 1
	; VF1UF4-NEXT: [[EXITCOND_NOT:%.*]] = icmp eq i64 [[IV_NEXT]], 14			; VF1UF4-NEXT: [[EXITCOND_NOT:%.*]] = icmp eq i64 [[IV_NEXT]], 14
	; VF1UF4-NEXT: br i1 [[EXITCOND_NOT]], label [[FOR_END]], label [[FOR_BODY]], !llvm.loop [[LOOP4:![0-9]+]]			; VF1UF4-NEXT: br i1 [[EXITCOND_NOT]], label [[FOR_END]], label [[FOR_BODY]], !llvm.loop [[LOOP5:![0-9]+]]
	; VF1UF4: for.end:			; VF1UF4: for.end:
	; VF1UF4-NEXT: ret void			; VF1UF4-NEXT: ret void
	;			;
	entry:			entry:
	br label %for.body			br label %for.body

	for.body:			for.body:
	%iv = phi i64 [ 0, %entry ], [ %iv.next, %for.body ]			%iv = phi i64 [ 0, %entry ], [ %iv.next, %for.body ]
	[10 unchanged lines not shown]
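
For reference, the constants in the two sets of @load_variant check lines above follow from a trip count of 14 with VF=1 and UF=4: one side exits the vector loop at 16 and guards each part with `icmp ule ..., 13`, while the other exits at 12, compares `icmp eq i64 14, 12` in the middle block, and resumes the scalar loop at 12. A minimal sketch of that arithmetic is below. `LoopBounds`, `computeBounds` and `laneActive` are hypothetical names used only for illustration, not LLVM APIs, and the vectorizer's actual logic is considerably more involved.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper, not part of LLVM: bounds for a loop with a known trip
// count TC that is processed in steps of VF * UF.
struct LoopBounds {
  uint64_t VectorTripCount; // induction value at which the vector loop exits
  uint64_t ScalarRemainder; // iterations left for the scalar loop
};

LoopBounds computeBounds(uint64_t TC, uint64_t VF, uint64_t UF, bool FoldTail) {
  uint64_t Step = VF * UF;
  assert(Step != 0 && "VF and UF must be non-zero");
  if (FoldTail) {
    // Round up: the last vector iteration runs with some lanes masked off.
    return {(TC + Step - 1) / Step * Step, 0};
  }
  // Round down: the leftover iterations run in the scalar loop.
  return {TC / Step * Step, TC % Step};
}

// With tail folding, a lane is active iff its index is <= the backedge-taken
// count, which is TC - 1 (assumes TC > 0).
bool laneActive(uint64_t Index, uint64_t TC) {
  assert(TC > 0 && "trip count must be positive");
  return Index <= TC - 1;
}
```

With TC=14, VF=1 and UF=4, the folded case gives a vector trip count of 16 with lanes 14 and 15 inactive, and the unfolded case gives 12 with two iterations left for the scalar loop, matching the constants in the check lines.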

llvm/test/Transforms/LoopVectorize/vplan-sink-scalars-and-merge.ll

; REQUIRES: asserts		; REQUIRES: asserts

; RUN: opt -passes=loop-vectorize -force-vector-interleave=1 -force-vector-width=2 -debug -disable-output %s 2>&1 | FileCheck %s		; RUN: opt -passes=loop-vectorize -force-vector-interleave=1 -force-vector-width=2 -debug -disable-output %s 2>&1 | FileCheck %s

target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-n8:16:32:64-S128"		target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-n8:16:32:64-S128"

@a = common global [2048 x i32] zeroinitializer, align 16		@a = common global [2048 x i32] zeroinitializer, align 16
@b = common global [2048 x i32] zeroinitializer, align 16		@b = common global [2048 x i32] zeroinitializer, align 16
@c = common global [2048 x i32] zeroinitializer, align 16		@c = common global [2048 x i32] zeroinitializer, align 16


; CHECK-LABEL: LV: Checking a loop in 'sink1'		; CHECK-LABEL: LV: Checking a loop in 'sink1'
; CHECK: VPlan 'Initial VPlan for VF={2},UF>=1' {		; CHECK: VPlan 'Initial VPlan for Tail Folded VF={2},UF>=1' {
; CHECK-NEXT: Live-in vp<[[VEC_TC:%.+]]> = vector-trip-count		; CHECK-NEXT: Live-in vp<[[VEC_TC:%.+]]> = vector-trip-count
; CHECK-NEXT: Live-in vp<[[BTC:%.+]]> = backedge-taken count		; CHECK-NEXT: Live-in vp<[[BTC:%.+]]> = backedge-taken count
; CHECK-EMPTY:		; CHECK-EMPTY:
; CHECK-NEXT: vector.ph:		; CHECK-NEXT: vector.ph:
; CHECK-NEXT: Successor(s): vector loop		; CHECK-NEXT: Successor(s): vector loop
; CHECK-EMPTY:		; CHECK-EMPTY:
; CHECK-NEXT: <x1> vector loop: {		; CHECK-NEXT: <x1> vector loop: {
; CHECK-NEXT: vector.body:		; CHECK-NEXT: vector.body:
[46 unchanged lines not shown]	loop:
%realexit = or i1 %large, %exitcond		%realexit = or i1 %large, %exitcond
br i1 %realexit, label %exit, label %loop		br i1 %realexit, label %exit, label %loop

exit:		exit:
ret void		ret void
}		}

; CHECK-LABEL: LV: Checking a loop in 'sink2'		; CHECK-LABEL: LV: Checking a loop in 'sink2'
; CHECK: VPlan 'Initial VPlan for VF={2},UF>=1' {		; CHECK: VPlan 'Initial VPlan for Tail Folded VF={2},UF>=1' {
; CHECK-NEXT: Live-in vp<[[VEC_TC:%.+]]> = vector-trip-count		; CHECK-NEXT: Live-in vp<[[VEC_TC:%.+]]> = vector-trip-count
; CHECK-NEXT: Live-in vp<[[BTC:%.+]]> = backedge-taken count		; CHECK-NEXT: Live-in vp<[[BTC:%.+]]> = backedge-taken count
; CHECK-EMPTY:		; CHECK-EMPTY:
; CHECK-NEXT: vector.ph:		; CHECK-NEXT: vector.ph:
; CHECK-NEXT: Successor(s): vector loop		; CHECK-NEXT: Successor(s): vector loop
; CHECK-EMPTY:		; CHECK-EMPTY:
; CHECK-NEXT: <x1> vector loop: {		; CHECK-NEXT: <x1> vector loop: {
; CHECK-NEXT: vector.body:		; CHECK-NEXT: vector.body:
[61 unchanged lines not shown]	loop:
%realexit = or i1 %large, %exitcond		%realexit = or i1 %large, %exitcond
br i1 %realexit, label %exit, label %loop		br i1 %realexit, label %exit, label %loop

exit:		exit:
ret void		ret void
}		}

; CHECK-LABEL: LV: Checking a loop in 'sink3'		; CHECK-LABEL: LV: Checking a loop in 'sink3'
; CHECK: VPlan 'Initial VPlan for VF={2},UF>=1' {		; CHECK: VPlan 'Initial VPlan for Tail Folded VF={2},UF>=1' {
; CHECK-NEXT: Live-in vp<[[VEC_TC:%.+]]> = vector-trip-count		; CHECK-NEXT: Live-in vp<[[VEC_TC:%.+]]> = vector-trip-count
; CHECK-NEXT: Live-in vp<[[BTC:%.+]]> = backedge-taken count		; CHECK-NEXT: Live-in vp<[[BTC:%.+]]> = backedge-taken count
; CHECK-EMPTY:		; CHECK-EMPTY:
; CHECK-NEXT: vector.ph:		; CHECK-NEXT: vector.ph:
; CHECK-NEXT: Successor(s): vector loop		; CHECK-NEXT: Successor(s): vector loop
; CHECK-EMPTY:		; CHECK-EMPTY:
; CHECK-NEXT: <x1> vector loop: {		; CHECK-NEXT: <x1> vector loop: {
; CHECK-NEXT: vector.body:		; CHECK-NEXT: vector.body:
[63 unchanged lines not shown]

exit:		exit:
ret void		ret void
}		}

; Make sure we do not sink uniform instructions.		; Make sure we do not sink uniform instructions.
define void @uniform_gep(i64 %k, ptr noalias %A, ptr noalias %B) {		define void @uniform_gep(i64 %k, ptr noalias %A, ptr noalias %B) {
; CHECK-LABEL: LV: Checking a loop in 'uniform_gep'		; CHECK-LABEL: LV: Checking a loop in 'uniform_gep'
; CHECK: VPlan 'Initial VPlan for VF={2},UF>=1' {		; CHECK: VPlan 'Initial VPlan for Tail Folded VF={2},UF>=1' {
; CHECK-NEXT: Live-in vp<[[VEC_TC:%.+]]> = vector-trip-count		; CHECK-NEXT: Live-in vp<[[VEC_TC:%.+]]> = vector-trip-count
; CHECK-NEXT: Live-in vp<[[BTC:%.+]]> = backedge-taken count		; CHECK-NEXT: Live-in vp<[[BTC:%.+]]> = backedge-taken count
; CHECK-EMPTY:		; CHECK-EMPTY:
; CHECK-NEXT: vector.ph:		; CHECK-NEXT: vector.ph:
; CHECK-NEXT: Successor(s): vector loop		; CHECK-NEXT: Successor(s): vector loop
; CHECK-EMPTY:		; CHECK-EMPTY:
; CHECK-NEXT: <x1> vector loop: {		; CHECK-NEXT: <x1> vector loop: {
; CHECK-NEXT: vector.body:		; CHECK-NEXT: vector.body:
[52 unchanged lines not shown]	loop.latch:
br i1 %cmp179, label %loop, label %exit		br i1 %cmp179, label %loop, label %exit
exit:		exit:
ret void		ret void
}		}

; Loop with predicated load.		; Loop with predicated load.
define void @pred_cfg1(i32 %k, i32 %j) {		define void @pred_cfg1(i32 %k, i32 %j) {
; CHECK-LABEL: LV: Checking a loop in 'pred_cfg1'		; CHECK-LABEL: LV: Checking a loop in 'pred_cfg1'
; CHECK: VPlan 'Initial VPlan for VF={2},UF>=1' {		; CHECK: VPlan 'Initial VPlan for Tail Folded VF={2},UF>=1' {
; CHECK-NEXT: Live-in vp<[[VEC_TC:%.+]]> = vector-trip-count		; CHECK-NEXT: Live-in vp<[[VEC_TC:%.+]]> = vector-trip-count
; CHECK-NEXT: Live-in vp<[[BTC:%.+]]> = backedge-taken count		; CHECK-NEXT: Live-in vp<[[BTC:%.+]]> = backedge-taken count
; CHECK-EMPTY:		; CHECK-EMPTY:
; CHECK-NEXT: vector.ph:		; CHECK-NEXT: vector.ph:
; CHECK-NEXT: Successor(s): vector loop		; CHECK-NEXT: Successor(s): vector loop
; CHECK-EMPTY:		; CHECK-EMPTY:
; CHECK-NEXT: <x1> vector loop: {		; CHECK-NEXT: <x1> vector loop: {
; CHECK-NEXT: vector.body:		; CHECK-NEXT: vector.body:
[77 unchanged lines not shown]
exit:		exit:
ret void		ret void
}		}

; Loop with predicated load and store in separate blocks, store depends on		; Loop with predicated load and store in separate blocks, store depends on
; loaded value.		; loaded value.
define void @pred_cfg2(i32 %k, i32 %j) {		define void @pred_cfg2(i32 %k, i32 %j) {
; CHECK-LABEL: LV: Checking a loop in 'pred_cfg2'		; CHECK-LABEL: LV: Checking a loop in 'pred_cfg2'
; CHECK: VPlan 'Initial VPlan for VF={2},UF>=1' {		; CHECK: VPlan 'Initial VPlan for Tail Folded VF={2},UF>=1' {
; CHECK-NEXT: Live-in vp<[[VEC_TC:%.+]]> = vector-trip-count		; CHECK-NEXT: Live-in vp<[[VEC_TC:%.+]]> = vector-trip-count
; CHECK-NEXT: Live-in vp<[[BTC:%.+]]> = backedge-taken count		; CHECK-NEXT: Live-in vp<[[BTC:%.+]]> = backedge-taken count
; CHECK-EMPTY:		; CHECK-EMPTY:
; CHECK-NEXT: vector.ph:		; CHECK-NEXT: vector.ph:
; CHECK-NEXT: Successor(s): vector loop		; CHECK-NEXT: Successor(s): vector loop
; CHECK-EMPTY:		; CHECK-EMPTY:
; CHECK-NEXT: <x1> vector loop: {		; CHECK-NEXT: <x1> vector loop: {
; CHECK-NEXT: vector.body:		; CHECK-NEXT: vector.body:
[86 unchanged lines not shown]
exit:		exit:
ret void		ret void
}		}

; Loop with predicated load and store in separate blocks, store does not depend		; Loop with predicated load and store in separate blocks, store does not depend
; on loaded value.		; on loaded value.
define void @pred_cfg3(i32 %k, i32 %j) {		define void @pred_cfg3(i32 %k, i32 %j) {
; CHECK-LABEL: LV: Checking a loop in 'pred_cfg3'		; CHECK-LABEL: LV: Checking a loop in 'pred_cfg3'
; CHECK: VPlan 'Initial VPlan for VF={2},UF>=1' {		; CHECK: VPlan 'Initial VPlan for Tail Folded VF={2},UF>=1' {
; CHECK-NEXT: Live-in vp<[[VEC_TC:%.+]]> = vector-trip-count		; CHECK-NEXT: Live-in vp<[[VEC_TC:%.+]]> = vector-trip-count
; CHECK-NEXT: Live-in vp<[[BTC:%.+]]> = backedge-taken count		; CHECK-NEXT: Live-in vp<[[BTC:%.+]]> = backedge-taken count
; CHECK-EMPTY:		; CHECK-EMPTY:
; CHECK-NEXT: vector.ph:		; CHECK-NEXT: vector.ph:
; CHECK-NEXT: Successor(s): vector loop		; CHECK-NEXT: Successor(s): vector loop
; CHECK-EMPTY:		; CHECK-EMPTY:
; CHECK-NEXT: <x1> vector loop: {		; CHECK-NEXT: <x1> vector loop: {
; CHECK-NEXT: vector.body:		; CHECK-NEXT: vector.body:
[86 unchanged lines not shown]	next.1:
br i1 %realexit, label %exit, label %loop		br i1 %realexit, label %exit, label %loop

exit:		exit:
ret void		ret void
}		}

define void @merge_3_replicate_region(i32 %k, i32 %j) {		define void @merge_3_replicate_region(i32 %k, i32 %j) {
; CHECK-LABEL: LV: Checking a loop in 'merge_3_replicate_region'		; CHECK-LABEL: LV: Checking a loop in 'merge_3_replicate_region'
; CHECK: VPlan 'Initial VPlan for VF={2},UF>=1' {		; CHECK: VPlan 'Initial VPlan for Tail Folded VF={2},UF>=1' {
; CHECK-NEXT: Live-in vp<[[VEC_TC:%.+]]> = vector-trip-count		; CHECK-NEXT: Live-in vp<[[VEC_TC:%.+]]> = vector-trip-count
; CHECK-NEXT: Live-in vp<[[BTC:%.+]]> = backedge-taken count		; CHECK-NEXT: Live-in vp<[[BTC:%.+]]> = backedge-taken count
; CHECK-EMPTY:		; CHECK-EMPTY:
; CHECK-NEXT: vector.ph:		; CHECK-NEXT: vector.ph:
; CHECK-NEXT: Successor(s): vector loop		; CHECK-NEXT: Successor(s): vector loop
; CHECK-EMPTY:		; CHECK-EMPTY:
; CHECK-NEXT: <x1> vector loop: {		; CHECK-NEXT: <x1> vector loop: {
; CHECK-NEXT: vector.body:		; CHECK-NEXT: vector.body:
[82 unchanged lines not shown]

exit:		exit:
ret void		ret void
}		}


define void @update_2_uses_in_same_recipe_in_merged_block(i32 %k) {		define void @update_2_uses_in_same_recipe_in_merged_block(i32 %k) {
; CHECK-LABEL: LV: Checking a loop in 'update_2_uses_in_same_recipe_in_merged_block'		; CHECK-LABEL: LV: Checking a loop in 'update_2_uses_in_same_recipe_in_merged_block'
; CHECK: VPlan 'Initial VPlan for VF={2},UF>=1' {		; CHECK: VPlan 'Initial VPlan for Tail Folded VF={2},UF>=1' {
; CHECK-NEXT: Live-in vp<[[VEC_TC:%.+]]> = vector-trip-count		; CHECK-NEXT: Live-in vp<[[VEC_TC:%.+]]> = vector-trip-count
; CHECK-NEXT: Live-in vp<[[BTC:%.+]]> = backedge-taken count		; CHECK-NEXT: Live-in vp<[[BTC:%.+]]> = backedge-taken count
; CHECK-EMPTY:		; CHECK-EMPTY:
; CHECK-NEXT: vector.ph:		; CHECK-NEXT: vector.ph:
; CHECK-NEXT: Successor(s): vector loop		; CHECK-NEXT: Successor(s): vector loop
; CHECK-EMPTY:		; CHECK-EMPTY:
; CHECK-NEXT: <x1> vector loop: {		; CHECK-NEXT: <x1> vector loop: {
; CHECK-NEXT: vector.body:		; CHECK-NEXT: vector.body:
[44 unchanged lines not shown]	loop:
br i1 %realexit, label %exit, label %loop		br i1 %realexit, label %exit, label %loop

exit:		exit:
ret void		ret void
}		}

define void @recipe_in_merge_candidate_used_by_first_order_recurrence(i32 %k) {		define void @recipe_in_merge_candidate_used_by_first_order_recurrence(i32 %k) {
; CHECK-LABEL: LV: Checking a loop in 'recipe_in_merge_candidate_used_by_first_order_recurrence'		; CHECK-LABEL: LV: Checking a loop in 'recipe_in_merge_candidate_used_by_first_order_recurrence'
; CHECK: VPlan 'Initial VPlan for VF={2},UF>=1' {		; CHECK: VPlan 'Initial VPlan for Tail Folded VF={2},UF>=1' {
; CHECK-NEXT: Live-in vp<[[VEC_TC:%.+]]> = vector-trip-count		; CHECK-NEXT: Live-in vp<[[VEC_TC:%.+]]> = vector-trip-count
; CHECK-NEXT: Live-in vp<[[BTC:%.+]]> = backedge-taken count		; CHECK-NEXT: Live-in vp<[[BTC:%.+]]> = backedge-taken count
; CHECK-EMPTY:		; CHECK-EMPTY:
; CHECK-NEXT: vector.ph:		; CHECK-NEXT: vector.ph:
; CHECK-NEXT: Successor(s): vector loop		; CHECK-NEXT: Successor(s): vector loop
; CHECK-EMPTY:		; CHECK-EMPTY:
; CHECK-NEXT: <x1> vector loop: {		; CHECK-NEXT: <x1> vector loop: {
; CHECK-NEXT: vector.body:		; CHECK-NEXT: vector.body:
[193 unchanged lines not shown]
exit:		exit:
ret void		ret void
}		}

; Test case with a dead GEP between the load and store regions. Dead recipes		; Test case with a dead GEP between the load and store regions. Dead recipes
; need to be removed before merging.		; need to be removed before merging.
define void @merge_with_dead_gep_between_regions(i32 %n, ptr noalias %src, ptr noalias %dst) optsize {		define void @merge_with_dead_gep_between_regions(i32 %n, ptr noalias %src, ptr noalias %dst) optsize {
; CHECK-LABEL: LV: Checking a loop in 'merge_with_dead_gep_between_regions'		; CHECK-LABEL: LV: Checking a loop in 'merge_with_dead_gep_between_regions'
; CHECK: VPlan 'Initial VPlan for VF={2},UF>=1' {		; CHECK: VPlan 'Initial VPlan for Tail Folded VF={2},UF>=1' {
		david-arm (inline comment, not done): These debug output changes look useful by themselves outside of this patch - not sure if it's possible to pass in the `FoldTailByMasking` flag in a separate patch?
		dmgreen (author, inline comment, done): Hmm. It would involve pulling out VPlan->FoldTailByMasking into a separate patch. That feels like it is the core of this patch, to be honest. Pulling it out just for some debug messages that are usually present elsewhere feels like a bit of an odd patch on its own. When you look at the whole debug output, there are already parts explaining whether the CostModel is FoldTailByMasking.
; CHECK-NEXT: Live-in vp<[[VEC_TC:%.+]]> = vector-trip-count		; CHECK-NEXT: Live-in vp<[[VEC_TC:%.+]]> = vector-trip-count
; CHECK-NEXT: Live-in vp<[[BTC:%.+]]> = backedge-taken count		; CHECK-NEXT: Live-in vp<[[BTC:%.+]]> = backedge-taken count
; CHECK-EMPTY:		; CHECK-EMPTY:
; CHECK-NEXT: vector.ph:		; CHECK-NEXT: vector.ph:
; CHECK-NEXT: Successor(s): vector loop		; CHECK-NEXT: Successor(s): vector loop
; CHECK-EMPTY:		; CHECK-EMPTY:
; CHECK-NEXT: <x1> vector loop: {		; CHECK-NEXT: <x1> vector loop: {
; CHECK-NEXT: vector.body:		; CHECK-NEXT: vector.body:
[119 unchanged lines not shown]

Diff 508547
