
[VPlan, VP] 1/4 Introduce new recipes to support predicated vectorization
Needs ReviewPublic

Authored by simoll on May 25 2022, 3:10 AM.

Details

Summary

This patch introduces the new VPlan recipes VPWidenEVLRecipe, VPPredicatedWidenRecipe, and VPPredicatedWidenMemoryInstructionRecipe, and an AllTrueMask VPInstruction, as a first step towards enabling the predicated vectorization introduced in the RFC patch D99750.

co-authored-by: Simon Moll <moll@cs.uni-saarland.de>

Diff Detail

Event Timeline

simoll created this revision.May 25 2022, 3:10 AM
Herald added a project: Restricted Project. · View Herald TranscriptMay 25 2022, 3:10 AM
simoll requested review of this revision.May 25 2022, 3:10 AM

This review replaces D104608 , which I am effectively commandeering as a co-author for the four original sub-patches by Vineet Kumar.

Hi Simon, did you think about making EVL a member of VPlan, just like TripCount? In that case we might not need many of these new classes.

Hi Simon, did you think about making EVL a member of VPlan, just like TripCount? In that case we might not need many of these new classes.

Hi Alexey! The EVL behaves more like a mask and less like the TripCount. When used for tail predication, the value of EVL still depends on the current vector iteration and needs to be computed in the vector loop. Also - depending on your target - setting EVL is relatively lightweight, and instructions in the vector loop may have different EVLs in the future.

Hi Simon, did you think about making EVL a member of VPlan, just like TripCount? In that case we might not need many of these new classes.

Hi Alexey! The EVL behaves more like a mask and less like the TripCount. When used for tail predication, the value of EVL still depends on the current vector iteration and needs to be computed in the vector loop.

But you can also treat it as an effective vector factor and use it similarly to VectorTripCount. Introducing new nodes just to add an extra operand EVL does not look necessary

Also - depending on your target - setting EVL is relatively lightweight, and instructions in the vector loop may have different EVLs in the future.

In what case can this happen? For an unrolled loop, I believe? But it can be handled by the VPTransformState::Part and VPTransformState::set/get functions.

Hi Simon, did you think about making EVL a member of VPlan, just like TripCount? In that case we might not need many of these new classes.

Hi Alexey! The EVL behaves more like a mask and less like the TripCount. When used for tail predication, the value of EVL still depends on the current vector iteration and needs to be computed in the vector loop.

But you can also treat it as an effective vector factor and use it similarly to VectorTripCount. Introducing new nodes just to add an extra operand EVL does not look necessary

I was looking at the comments we got earlier for the reference implementation, in particular @fhahn's comment on the EVL being loop-invariant when it's not used for tail predication.
The thing is, when EVL is used for tail predication you need to re-compute it in every vector loop iteration. I don't see how EVL could be handled like VectorTripCount in this case. Could you elaborate?

Also - depending on you target - setting EVL is relatively light weight and instructions in the vector loop may have different EVLs in the future.

In what case this can happen? I believe for unrolled loop? But it can be handled by the VPTransformState::Part and VPTransformState::set/get functions.

For unrolling/interleaving, sure. I was thinking of optimizations that compact a mask and use EVL == number-of-ones-in-the-mask to densely operate on the compressed vectors - nothing we need to concern ourselves with for the time being.
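To illustrate the mask-compaction idea (this is a hypothetical model; the function name and the 64-bit mask encoding are assumptions, not anything from the patch): the compacted EVL would simply be the population count of the mask.

```cpp
#include <bitset>
#include <cstdint>

// Hypothetical model of the mask-compaction optimization mentioned above:
// after compacting the vector, the effective vector length equals the
// number of set bits (number-of-ones) in the mask.
uint64_t compactedEVL(uint64_t Mask) {
  return std::bitset<64>(Mask).count();
}
```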

Here is my suggestion:

  1. We get an explicit-vector-length recipe to compute EVL inside the vector loop. And this will be the only recipe we add because..
  2. We extend the existing recipes with an (optional) EVL operand. Presence of EVL implies that VP intrinsics are used for widening.

Hi Simon, did you think about making EVL a member of VPlan, just like TripCount? In that case we might not need many of these new classes.

Hi Alexey! The EVL behaves more like a mask and less like the TripCount. When used for tail predication, the value of EVL still depends on the current vector iteration and needs to be computed in the vector loop.

But you can also treat it as an effective vector factor and use it similarly to VectorTripCount. Introducing new nodes just to add an extra operand EVL does not look necessary

I was looking at the comments we got earlier for the reference implementation, in particular @fhahn's comment on the EVL being loop-invariant when it's not used for tail predication.
The thing is, when EVL is used for tail predication you need to re-compute it in every vector loop iteration. I don't see how EVL could be handled like VectorTripCount in this case. Could you elaborate?

I have this scheme in mind:

vector.body:
  %canon.iv = phi int
  %evl = evl(%canon.iv, %vector.trip.count)
  ...
  br ...

We store the %evl as the related value for VPValue *EVL using State.set(EVL, %evl, Part) and then get the required value using State.get(EVL, Part).
In this case we can treat EVL similarly to the canonical IV, which is not an invariant.
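As a scalar model of the scheme above - assuming evl = min(VF, trip-count - iv), which is one plausible definition; the actual semantics would be target-defined:

```cpp
#include <algorithm>
#include <cstdint>

// Scalar model of %evl = evl(%canon.iv, %vector.trip.count): in every
// vector iteration the EVL is the number of remaining elements, capped
// at the vectorization factor VF. The value changes per iteration, so
// like the canonical IV it is not a loop invariant.
uint64_t computeEVL(uint64_t CanonicalIV, uint64_t TripCount, uint64_t VF) {
  return std::min(VF, TripCount - CanonicalIV);
}
```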

Also - depending on your target - setting EVL is relatively lightweight, and instructions in the vector loop may have different EVLs in the future.

In what case can this happen? For an unrolled loop, I believe? But it can be handled by the VPTransformState::Part and VPTransformState::set/get functions.

For unrolling/interleaving, sure. I was thinking of optimizations that compact a mask and use EVL == number-of-ones-in-the-mask to densely operate on the compressed vectors - nothing we need to concern ourselves with for the time being.

Could you elaborate on how it may affect the EVL value? Can we have different EVLs in the same loop? Or is it not an EVL, but some kind of transformation of the original EVL value?

Here is my suggestion:

  1. We get an explicit-vector-length recipe to compute EVL inside the vector loop. And this will be the only recipe we add because..
  2. We extend the existing recipes with an (optional) EVL operand. Presence of EVL implies that VP intrinsics are used for widening.

I'm afraid that it will require a HUGE(!!!) amount of changes in VPlan. I assume there will still be the same recipe/vpvalue for EVL across all recipes/vpinstructions.

Hi Simon, did you think about making EVL a member of VPlan, just like TripCount? In that case we might not need many of these new classes.

Hi Alexey! The EVL behaves more like a mask and less like the TripCount. When used for tail predication, the value of EVL still depends on the current vector iteration and needs to be computed in the vector loop.

But you can also treat it as an effective vector factor and use it similarly to VectorTripCount. Introducing new nodes just to add an extra operand EVL does not look necessary

I was looking at the comments we got earlier for the reference implementation, in particular @fhahn's comment on the EVL being loop-invariant when it's not used for tail predication.
The thing is, when EVL is used for tail predication you need to re-compute it in every vector loop iteration. I don't see how EVL could be handled like VectorTripCount in this case. Could you elaborate?

I have this scheme in mind:

vector.body:
  %canon.iv = phi int
  %evl = evl(%canon.iv, %vector.trip.count)
  ...
  br ...

We store the %evl as the related value for VPValue *EVL using State.set(EVL, %evl, Part) and then get the required value using State.get(EVL, Part).
In this case we can treat EVL similarly to the canonical IV, which is not an invariant.

Ok. If that works then having one global EVL per State defined this way should be fine for us for now.

Also - depending on your target - setting EVL is relatively lightweight, and instructions in the vector loop may have different EVLs in the future.

In what case can this happen? For an unrolled loop, I believe? But it can be handled by the VPTransformState::Part and VPTransformState::set/get functions.

For unrolling/interleaving, sure. I was thinking of optimizations that compact a mask and use EVL == number-of-ones-in-the-mask to densely operate on the compressed vectors - nothing we need to concern ourselves with for the time being.

Could you elaborate on how it may affect the EVL value? Can we have different EVLs in the same loop? Or is it not an EVL, but some kind of transformation of the original EVL value?

Here is my suggestion:

  1. We get an explicit-vector-length recipe to compute EVL inside the vector loop. And this will be the only recipe we add because..
  2. We extend the existing recipes with an (optional) EVL operand. Presence of EVL implies that VP intrinsics are used for widening.

I'm afraid that it will require a HUGE(!!!) amount of changes in VPlan. I assume there will still be the same recipe/vpvalue for EVL across all recipes/vpinstructions.

How? I think there is some kind of misunderstanding here.
There is an existing prototype implementation that uses these three new recipes and a global EVL per vector loop.
The code for the additional recipes is small and - frankly - trivial when you know how to do it. If you use separate recipes, the existing recipes are completely unaffected by this.

What part of my suggestion makes you think that there would be huge changes? Is it adding EVL to existing recipes?

Hi Simon, did you think about making EVL a member of VPlan, just like TripCount? In that case we might not need many of these new classes.

Hi Alexey! The EVL behaves more like a mask and less like the TripCount. When used for tail predication, the value of EVL still depends on the current vector iteration and needs to be computed in the vector loop.

But you can also treat it as an effective vector factor and use it similarly to VectorTripCount. Introducing new nodes just to add an extra operand EVL does not look necessary

I was looking at the comments we got earlier for the reference implementation, in particular @fhahn's comment on the EVL being loop-invariant when it's not used for tail predication.
The thing is, when EVL is used for tail predication you need to re-compute it in every vector loop iteration. I don't see how EVL could be handled like VectorTripCount in this case. Could you elaborate?

I have this scheme in mind:

vector.body:
  %canon.iv = phi int
  %evl = evl(%canon.iv, %vector.trip.count)
  ...
  br ...

We store the %evl as the related value for VPValue *EVL using State.set(EVL, %evl, Part) and then get the required value using State.get(EVL, Part).
In this case we can treat EVL similarly to the canonical IV, which is not an invariant.

Ok. If that works then having one global EVL per State defined this way should be fine for us for now.

Yep, I think so. Maybe @fhahn could help us here and confirm that it is a good scenario?

Also - depending on your target - setting EVL is relatively lightweight, and instructions in the vector loop may have different EVLs in the future.

In what case can this happen? For an unrolled loop, I believe? But it can be handled by the VPTransformState::Part and VPTransformState::set/get functions.

For unrolling/interleaving, sure. I was thinking of optimizations that compact a mask and use EVL == number-of-ones-in-the-mask to densely operate on the compressed vectors - nothing we need to concern ourselves with for the time being.

Could you elaborate on how it may affect the EVL value? Can we have different EVLs in the same loop? Or is it not an EVL, but some kind of transformation of the original EVL value?

Here is my suggestion:

  1. We get an explicit-vector-length recipe to compute EVL inside the vector loop. And this will be the only recipe we add because..
  2. We extend the existing recipes with an (optional) EVL operand. Presence of EVL implies that VP intrinsics are used for widening.

I'm afraid that it will require a HUGE(!!!) amount of changes in VPlan. I assume there will still be the same recipe/vpvalue for EVL across all recipes/vpinstructions.

How? I think there is some kind of misunderstanding here.
There is an existing prototype implementation that uses these three new recipes and a global EVL per vector loop.
The code for the additional recipes is small and - frankly - trivial when you know how to do it. If you use separate recipes, the existing recipes are completely unaffected by this.

What part of my suggestion makes you think that there would be huge changes? Is it adding EVL to existing recipes?

Yes, it is about adding EVL to existing recipes. It is not huge (sorry, forgot to edit my initial response, I was thinking about adding new recipes/instructions :) ) but it still does not look like we need to keep the same EVL in all the recipes/instructions. Using a global EVL lets us just modify codegen and (maybe) the cost model.

We store the %evl as the related value for VPValue *EVL using State.set(EVL, %evl, Part) and then get the required value using State.get(EVL, Part).
In this case we can treat EVL similarly to the canonical IV, which is not an invariant.

Ok. If that works then having one global EVL per State defined this way should be fine for us for now.

The way VectorTripCount is handled at the moment is a workaround and probably shouldn't act as inspiration. AFAICT we already removed all uses during code-gen that were using State to access the vector trip count. If it is needed for code-gen of a recipe, it should be expressed as an operand.

If we need to generate code to compute the EVL, it should be modeled as a recipe. If the EVL depends on any other recipes (like the canonical induction), it needs to be a recipe. If all that is needed is an opcode and operands, then it should probably just be an additional opcode in VPInstruction, instead of a new recipe.

Here is my suggestion:

  1. We get an explicit-vector-length recipe to compute EVL inside the vector loop. And this will be the only recipe we add because..
  2. We extend the existing recipes with an (optional) EVL operand. Presence of EVL implies that VP intrinsics are used for widening.

I'm afraid that it will require a HUGE(!!!) amount of changes in VPlan. I assume there will still be the same recipe/vpvalue for EVL across all recipes/vpinstructions.

How? I think there is some kind of misunderstanding here.
There is an existing prototype implementation that uses these three new recipes and a global EVL per vector loop.
The code for the additional recipes is small and - frankly - trivial when you know how to do it. If you use separate recipes, the existing recipes are completely unaffected by this.

What part of my suggestion makes you think that there would be huge changes? Is it adding EVL to existing recipes?

I think when deciding whether to add new recipes here, a key question is what the differences are to other existing recipes. IIUC the only difference between VPPredicatedWidenRecipe and VPWidenRecipe modulo the extra mask & EVL operands is during code-gen, right? But fundamentally both still widen an (arithmetic) operation. Whether some elements may be masked out shouldn't really matter for VPlan based analysis at the moment.

We already have some precedent here with VPWidenMemoryInstructionRecipe, which has an optional mask. IIUC a VPWidenRecipe could have either an additional Mask operand or Mask & EVL, so it should not be too difficult to distinguish between the versions when printing the recipe and during codegen. This was something I mentioned in the earlier reviews IIRC.

I think it should also be possible to use predicated instructions without needing EVL , right? The modeling should be flexible enough to not force us to add redundant EVL operands when they are not needed IMO.

We store the %evl as the related value for VPValue *EVL using State.set(EVL, %evl, Part) and then get the required value using State.get(EVL, Part).
In this case we can treat EVL similarly to the canonical IV, which is not an invariant.

Ok. If that works then having one global EVL per State defined this way should be fine for us for now.

The way VectorTripCount is handled at the moment is a workaround and probably shouldn't act as inspiration. AFAICT we already removed all uses during code-gen that were using State to access the vector trip count. If it is needed for code-gen of a recipe, it should be expressed as an operand.

Better to treat EVL like the CanonicalIV. Yes, I agree that a recipe is the better choice here (similar to CanonicalIV), but it requires some extra work because VPWidenIntOrFpInductionRecipe should depend on EVL. We probably need to split VPWidenIntOrFpInductionRecipe into a PHI recipe and something like CanonicalIVIncrement; otherwise this dependency prevents it from being vectorized effectively.

If we need to generate code to compute the EVL, it should be modeled as a recipe. If the EVL depends on any other recipes (like the canonical induction), it needs to be a recipe. If all that is needed is an opcode and operands, then it should probably just be an additional opcode in VPInstruction, instead of a new recipe.

Here is my suggestion:

  1. We get an explicit-vector-length recipe to compute EVL inside the vector loop. And this will be the only recipe we add because..
  2. We extend the existing recipes with an (optional) EVL operand. Presence of EVL implies that VP intrinsics are used for widening.

I'm afraid that it will require a HUGE(!!!) amount of changes in VPlan. I assume there will still be the same recipe/vpvalue for EVL across all recipes/vpinstructions.

How? I think there is some kind of misunderstanding here.
There is an existing prototype implementation that uses these three new recipes and a global EVL per vector loop.
The code for the additional recipes is small and - frankly - trivial when you know how to do it. If you use separate recipes, the existing recipes are completely unaffected by this.

What part of my suggestion makes you think that there would be huge changes? Is it adding EVL to existing recipes?

I think when deciding whether to add new recipes here, a key question is what the differences are to other existing recipes. IIUC the only difference between VPPredicatedWidenRecipe and VPWidenRecipe modulo the extra mask & EVL operands is during code-gen, right? But fundamentally both still widen an (arithmetic) operation. Whether some elements may be masked out shouldn't really matter for VPlan based analysis at the moment.

We already have some precedent here with VPWidenMemoryInstructionRecipe, which has an optional mask. IIUC a VPWidenRecipe could have either an additional Mask operand or Mask & EVL, so it should not be too difficult to distinguish between the versions when printing the recipe and during codegen. This was something I mentioned in the earlier reviews IIRC.

I rather doubt we need to handle EVL the same way as Mask.

I think it should also be possible to use predicated instructions without needing EVL , right? The modeling should be flexible enough to not force us to add redundant EVL operands when they are not needed IMO.

Looks like all VP intrinsics require EVL. Some of them require the previous EVL (which, I assume, can simply be the VF?).

We need to generate code to compute the EVL; its operand should be the application vector length (AVL), and the EVL may depend on the canonical induction recipe.
For example,

for (i = 0; i < n; ++i)
  c[i] = a[i] + b[i];

VPlan should be modeled as:

n = trip count
 vector loop: {
  EMIT i = canonical induction
  EMIT avl = n - i
  EMIT evl = set.evl(avl, ...)  (generate explicit vector length)
  EMIT mask = (all true mask)
  WIDEN t0 = vp.load(a, mask, evl)
  WIDEN t1 = vp.load(b, mask, evl)
  WIDEN t2 = vp.add(t0, t1, mask, evl)
  WIDEN vp.store(t2, c, mask, evl)
  EMIT  i.next = i + evl
  EMIT  i < n    (branch-on-count)
}
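The strip-mined loop above can be simulated with scalar code. This sketch assumes set.evl(avl) == min(VF, avl) and an all-true mask, matching the example; it is a model of the intended semantics, not generated code:

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Scalar simulation of the strip-mined vector loop sketched above,
// assuming set.evl(avl) == min(VF, avl) and an all-true mask.
std::vector<int> vpAddLoop(const std::vector<int> &A,
                           const std::vector<int> &B, uint64_t VF) {
  uint64_t N = A.size();                 // n = trip count
  std::vector<int> C(N);
  for (uint64_t I = 0; I < N;) {         // i = canonical induction
    uint64_t AVL = N - I;                // avl = n - i
    uint64_t EVL = std::min(VF, AVL);    // evl = set.evl(avl)
    for (uint64_t L = 0; L < EVL; ++L)   // vp.load/vp.add/vp.store on evl lanes
      C[I + L] = A[I + L] + B[I + L];
    I += EVL;                            // i.next = i + evl
  }
  return C;
}
```

Note the final iteration naturally processes only the tail elements, which is exactly the tail-predication behavior the EVL is meant to provide.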

EVL may be different in each vector iteration, and set.evl may also require more parameters, such as the upper limit of the vector length or the width of the data type.
These various situations show that we need to model it as a recipe.
An "llvm.set.evl" intrinsic also needs to be added.
We may refer to the RVV architecture to place some restrictions on "llvm.set.evl", like:

evl = 0 if avl = 0
evl > 0 if avl > 0
evl ≤ VF
evl ≤ avl
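These restrictions can be captured in a small checker. This is purely illustrative - "llvm.set.evl" and its final constraints are still under discussion in this thread:

```cpp
#include <cstdint>

// Checks the proposed RVV-like restrictions on a set.evl result:
//   evl == 0 iff avl == 0, and otherwise evl > 0, evl <= VF, evl <= avl.
bool isValidEVL(uint64_t EVL, uint64_t AVL, uint64_t VF) {
  if (AVL == 0)
    return EVL == 0;                           // evl = 0 if avl = 0
  return EVL > 0 && EVL <= VF && EVL <= AVL;   // evl > 0, evl <= VF, evl <= avl
}
```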

We store the %evl as the related value for VPValue *EVL using State.set(EVL, %evl, Part) and then get the required value using State.get(EVL, Part).
In this case we can treat EVL similarly to the canonical IV, which is not an invariant.

Ok. If that works then having one global EVL per State defined this way should be fine for us for now.

The way VectorTripCount is handled at the moment is a workaround and probably shouldn't act as inspiration. AFAICT we already removed all uses during code-gen that were using State to access the vector trip count. If it is needed for code-gen of a recipe, it should be expressed as an operand.

Better to treat EVL like the CanonicalIV. Yes, I agree that a recipe is the better choice here (similar to CanonicalIV), but it requires some extra work because VPWidenIntOrFpInductionRecipe should depend on EVL. We probably need to split VPWidenIntOrFpInductionRecipe into a PHI recipe and something like CanonicalIVIncrement; otherwise this dependency prevents it from being vectorized effectively.

If we need to generate code to compute the EVL, it should be modeled as a recipe. If the EVL depends on any other recipes (like the canonical induction), it needs to be a recipe. If all that is needed is an opcode and operands, then it should probably just be an additional opcode in VPInstruction, instead of a new recipe.

Here is my suggestion:

  1. We get an explicit-vector-length recipe to compute EVL inside the vector loop. And this will be the only recipe we add because..
  2. We extend the existing recipes with an (optional) EVL operand. Presence of EVL implies that VP intrinsics are used for widening.

I'm afraid that it will require a HUGE(!!!) amount of changes in VPlan. I assume there will still be the same recipe/vpvalue for EVL across all recipes/vpinstructions.

How? I think there is some kind of misunderstanding here.
There is an existing prototype implementation that uses these three new recipes and a global EVL per vector loop.
The code for the additional recipes is small and - frankly - trivial when you know how to do it. If you use separate recipes, the existing recipes are completely unaffected by this.

What part of my suggestion makes you think that there would be huge changes? Is it adding EVL to existing recipes?

I think when deciding whether to add new recipes here, a key question is what the differences are to other existing recipes. IIUC the only difference between VPPredicatedWidenRecipe and VPWidenRecipe modulo the extra mask & EVL operands is during code-gen, right? But fundamentally both still widen an (arithmetic) operation. Whether some elements may be masked out shouldn't really matter for VPlan based analysis at the moment.

I suppose? These are the differences between VPPredicatedWidenRecipe and VPWidenRecipe that I see:

  • Mask & EVL operands. The Mask operand could be optional (which implies a constant all-ones mask); EVL is mandatory.
  • EVL has significant performance implications on RVV and VE. If you count costing as a VPlan-based analysis, then that would be one analysis in favor of having two recipe kinds.
  • VPPredicatedWiden* recipes always widen to VP intrinsics. VPWiden* recipes don't.

We already have some precedent here with VPWidenMemoryInstructionRecipe, which has an optional mask. IIUC a VPWidenRecipe could have either an additional Mask operand or Mask & EVL, so it should not be too difficult to distinguish between the versions when printing the recipe and during codegen. This was something I mentioned in the earlier reviews IIRC.

I rather doubt we need to handle EVL the same way as Mask.

Juggling multiple EVLs in one piece of code is not unheard of in VE/vector code. I suspect RVV programmers will eventually pull similar tricks (hardware permitting). You only need the EVL operands for that. The first VPlan-VP vector plan builder can pass in the same EVL for all of them, but having EVL operands will give us the flexibility to go further.

I think it should also be possible to use predicated instructions without needing EVL , right? The modeling should be flexible enough to not force us to add redundant EVL operands when they are not needed IMO.

Looks like all VP intrinsics require EVL. Some of them require the previous EVL (which, I assume, can simply be the VF?).

Yes, you can effectively disable the EVL in VP intrinsics by passing in the VF. The current VP infrastructure recognizes that.

We need to generate code to compute the EVL; its operand should be the application vector length (AVL), and the EVL may depend on the canonical induction recipe.
For example,

for (i = 0; i < n; ++i)
  c[i] = a[i] + b[i];

VPlan should be modeled as:

n = trip count
 vector loop: {
  EMIT i = canonical induction
  EMIT avl = n - i
  EMIT evl = set.evl(avl, ...)  (generate explicit vector length)
  EMIT mask = (all true mask)
  WIDEN t0 = vp.load(a, mask, evl)
  WIDEN t1 = vp.load(b, mask, evl)
  WIDEN t2 = vp.add(t0, t1, mask, evl)
  WIDEN vp.store(t2, c, mask, evl)
  EMIT  i.next = i + evl
  EMIT  i < n    (branch-on-count)
}

EVL may be different in each vector iteration, and set.evl may also require more parameters, such as the upper limit of the vector length or the width of the data type.
These various situations show that we need to model it as a recipe.
An "llvm.set.evl" intrinsic also needs to be added.
We may refer to the RVV architecture to place some restrictions on "llvm.set.evl", like:

evl = 0 if avl = 0
evl > 0 if avl > 0
evl ≤ VF
evl ≤ avl

Thanks for the example! We seem to have converged on the VPWidenEVL recipe among all active participants in the discussion. To make a concrete proposal:

evl = ExplicitVectorLength(TripCount, CanonicalInduction)

That's just pulled and rephrased from your example. This should be sufficient for RVV, right?

We need to generate code to compute the EVL; its operand should be the application vector length (AVL), and the EVL may depend on the canonical induction recipe.
For example,

for (i = 0; i < n; ++i)
  c[i] = a[i] + b[i];

VPlan should be modeled as:

n = trip count
 vector loop: {
  EMIT i = canonical induction
  EMIT avl = n - i
  EMIT evl = set.evl(avl, ...)  (generate explicit vector length)
  EMIT mask = (all true mask)
  WIDEN t0 = vp.load(a, mask, evl)
  WIDEN t1 = vp.load(b, mask, evl)
  WIDEN t2 = vp.add(t0, t1, mask, evl)
  WIDEN vp.store(t2, c, mask, evl)
  EMIT  i.next = i + evl
  EMIT  i < n    (branch-on-count)
}

EVL may be different in each vector iteration, and set.evl may also require more parameters, such as the upper limit of the vector length or the width of the data type.
These various situations show that we need to model it as a recipe.
An "llvm.set.evl" intrinsic also needs to be added.
We may refer to the RVV architecture to place some restrictions on "llvm.set.evl", like:

evl = 0 if avl = 0
evl > 0 if avl > 0
evl ≤ VF
evl ≤ avl

Thanks for the example! We seem to have converged on the VPWidenEVL recipe among all active participants in the discussion. To make a concrete proposal:

evl = ExplicitVectorLength(TripCount, CanonicalInduction)

That's just pulled and rephrased from your example. This should be sufficient for RVV, right?

Shall we rename it to something like VPCanonicalEVL... instead? It is scalar.

We need to generate code to compute the EVL; its operand should be the application vector length (AVL), and the EVL may depend on the canonical induction recipe.
For example,

for (i = 0; i < n; ++i)
  c[i] = a[i] + b[i];

VPlan should be modeled as:

n = trip count
 vector loop: {
  EMIT i = canonical induction
  EMIT avl = n - i
  EMIT evl = set.evl(avl, ...)  (generate explicit vector length)
  EMIT mask = (all true mask)
  WIDEN t0 = vp.load(a, mask, evl)
  WIDEN t1 = vp.load(b, mask, evl)
  WIDEN t2 = vp.add(t0, t1, mask, evl)
  WIDEN vp.store(t2, c, mask, evl)
  EMIT  i.next = i + evl
  EMIT  i < n    (branch-on-count)
}

EVL may be different in each vector iteration, and set.evl may also require more parameters, such as the upper limit of the vector length or the width of the data type.
These various situations show that we need to model it as a recipe.
An "llvm.set.evl" intrinsic also needs to be added.
We may refer to the RVV architecture to place some restrictions on "llvm.set.evl", like:

evl = 0 if avl = 0
evl > 0 if avl > 0
evl ≤ VF
evl ≤ avl

Thanks for the example! We seem to have converged on the VPWidenEVL recipe among all active participants in the discussion. To make a concrete proposal:

evl = ExplicitVectorLength(TripCount, CanonicalInduction)

That's just pulled and rephrased from your example. This should be sufficient for RVV, right?

Shall we rename it to something like VPCanonicalEVL... instead? It is scalar.

Sure. "widen" does not make much sense here.

We store the %evl as the related value for VPValue *EVL using State.set(EVL, %evl, Part) and then get the required value using State.get(EVL, Part).
In this case we can treat EVL similarly to the canonical IV, which is not an invariant.

Ok. If that works then having one global EVL per State defined this way should be fine for us for now.

The way VectorTripCount is handled at the moment is a workaround and probably shouldn't act as inspiration. AFAICT we already removed all uses during code-gen that were using State to access the vector trip count. If it is needed for code-gen of a recipe, it should be expressed as an operand.

Better to treat EVL like the CanonicalIV. Yes, I agree that a recipe is the better choice here (similar to CanonicalIV), but it requires some extra work because VPWidenIntOrFpInductionRecipe should depend on EVL. We probably need to split VPWidenIntOrFpInductionRecipe into a PHI recipe and something like CanonicalIVIncrement; otherwise this dependency prevents it from being vectorized effectively.

Is any of the existing PHI recipes suitable for this? The increment is just an add; I'm not sure that needs its own recipe.