
[VPlan, VP] 1/4 Introduce new recipes to support predicated vectorization
Needs ReviewPublic

Authored by simoll on May 25 2022, 3:10 AM.

Details

Summary

This patch introduces the new VPlan recipes VPWidenEVLRecipe, VPPredicatedWidenRecipe, and VPPredicatedWidenMemoryInstructionRecipe, and an AllTrueMask VPInstruction, as a first step towards enabling the predicated vectorization introduced in the RFC patch D99750.

co-authored-by: Simon Moll <moll@cs.uni-saarland.de>

Diff Detail

Event Timeline

simoll created this revision.May 25 2022, 3:10 AM
Herald added a project: Restricted Project. · View Herald TranscriptMay 25 2022, 3:10 AM
simoll requested review of this revision.May 25 2022, 3:10 AM

This review replaces D104608 , which I am effectively commandeering as a co-author for the four original sub-patches by Vineet Kumar.

Hi Simon, did you think about making EVL a member of VPlan, just like TripCount? In that case we might not need many of these new classes.

Hi Simon, did you think about making EVL a member of VPlan, just like TripCount? In that case we might not need many of these new classes.

Hi Alexey! The EVL behaves more like a mask and less like the TripCount. When used for tail predication, the value of EVL still depends on the current vector iteration and needs to be computed in the vector loop. Also - depending on your target - setting EVL is relatively lightweight, and instructions in the vector loop may have different EVLs in the future.

Hi Simon, did you think about making EVL a member of VPlan, just like TripCount? In that case we might not need many of these new classes.

Hi Alexey! The EVL behaves more like a mask and less like the TripCount. When used for tail predication, the value of EVL still depends on the current vector iteration and needs to be computed in the vector loop.

But you can also treat it as an effective vector factor and use it similarly to VectorTripCount. Introducing new nodes just to add an extra operand EVL does not look necessary

Also - depending on your target - setting EVL is relatively lightweight, and instructions in the vector loop may have different EVLs in the future.

In what case can this happen? For an unrolled loop, I believe? But it can be handled by the VPTransformState::Part and VPTransformState::set/get functions.

Hi Simon, did you think about making EVL a member of VPlan, just like TripCount? In that case we might not need many of these new classes.

Hi Alexey! The EVL behaves more like a mask and less like the TripCount. When used for tail predication, the value of EVL still depends on the current vector iteration and needs to be computed in the vector loop.

But you can also treat it as an effective vector factor and use it similarly to VectorTripCount. Introducing new nodes just to add an extra operand EVL does not look necessary

I was looking at the comments we got earlier for the reference implementation, in particular @fhahn's comment on the EVL being loop-invariant when it's not used for tail predication.
The thing is, when EVL is used for tail predication you need to re-compute it in every vector loop iteration. I don't see how EVL could be handled like VectorTripCount in this case. Could you elaborate?

Also - depending on you target - setting EVL is relatively light weight and instructions in the vector loop may have different EVLs in the future.

In what case this can happen? I believe for unrolled loop? But it can be handled by the VPTransformState::Part and VPTransformState::set/get functions.

For unrolling/interleaving, sure. I was thinking of optimizations that compact a mask and use EVL == number-of-ones-in-the-mask to densely operate on the compressed vectors - nothing we need to concern ourselves with for the time being.
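To illustrate the mask-compaction idea (this is a hypothetical model; the function name and the 64-bit mask encoding are assumptions, not anything from the patch): the compacted EVL would simply be the population count of the mask.

```cpp
#include <bitset>
#include <cstdint>

// Hypothetical model of the mask-compaction optimization mentioned above:
// after compacting the vector, the effective vector length equals the
// number of set bits (number-of-ones) in the mask.
uint64_t compactedEVL(uint64_t Mask) {
  return std::bitset<64>(Mask).count();
}
```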

Here is my suggestion:

  1. We get an explicit-vector-length recipe to compute EVL inside the vector loop. And this will be the only recipe we add because..
  2. We extend the existing recipes with an (optional) EVL operand. Presence of EVL implies that VP intrinsics are used for widening.

Hi Simon, did you think about making EVL a member of VPlan, just like TripCount? In that case we might not need many of these new classes.

Hi Alexey! The EVL behaves more like a mask and less like the TripCount. When used for tail predication, the value of EVL still depends on the current vector iteration and needs to be computed in the vector loop.

But you can also treat it as an effective vector factor and use it similarly to VectorTripCount. Introducing new nodes just to add an extra operand EVL does not look necessary

I was looking at the comments we got earlier for the reference implementation, in particular @fhahn's comment on the EVL being loop-invariant when it's not used for tail predication.
The thing is, when EVL is used for tail predication you need to re-compute it in every vector loop iteration. I don't see how EVL could be handled like VectorTripCount in this case. Could you elaborate?

I have this scheme in mind:

vector.body:
  %canon.iv = phi int
  %evl = evl(%canon.iv, %vector.trip.count)
  ...
  br ...

We store the %evl as the related value for VPValue *EVL using State.set(EVL, %evl, Part) and then get the required value using State.get(EVL, Part).
In this case we can treat EVL similarly to the canonical IV, which is not an invariant.
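As a scalar model of the scheme above - assuming evl = min(VF, trip-count - iv), which is one plausible definition; the actual semantics would be target-defined:

```cpp
#include <algorithm>
#include <cstdint>

// Scalar model of %evl = evl(%canon.iv, %vector.trip.count): in every
// vector iteration the EVL is the number of remaining elements, capped
// at the vectorization factor VF. The value changes per iteration, so
// like the canonical IV it is not a loop invariant.
uint64_t computeEVL(uint64_t CanonicalIV, uint64_t TripCount, uint64_t VF) {
  return std::min(VF, TripCount - CanonicalIV);
}
```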

Also - depending on your target - setting EVL is relatively lightweight, and instructions in the vector loop may have different EVLs in the future.

In what case can this happen? For an unrolled loop, I believe? But it can be handled by the VPTransformState::Part and VPTransformState::set/get functions.

For unrolling/interleaving, sure. I was thinking of optimizations that compact a mask and use EVL == number-of-ones-in-the-mask to densely operate on the compressed vectors - nothing we need to concern ourselves with for the time being.

Could you elaborate on how it may affect the EVL value? Can we have different EVLs in the same loop? Or is it not an EVL, but some kind of transformation of the original EVL value?

Here is my suggestion:

  1. We get an explicit-vector-length recipe to compute EVL inside the vector loop. And this will be the only recipe we add because..
  2. We extend the existing recipes with an (optional) EVL operand. Presence of EVL implies that VP intrinsics are used for widening.

I'm afraid that it will require a HUGE(!!!) amount of changes in VPlan. I assume there will still be the same recipe/vpvalue for EVL across all recipes/vpinstructions.

Hi Simon, did you think about making EVL a member of VPlan, just like TripCount? In that case we might not need many of these new classes.

Hi Alexey! The EVL behaves more like a mask and less like the TripCount. When used for tail predication, the value of EVL still depends on the current vector iteration and needs to be computed in the vector loop.

But you can also treat it as an effective vector factor and use it similarly to VectorTripCount. Introducing new nodes just to add an extra operand EVL does not look necessary

I was looking at the comments we got earlier for the reference implementation, in particular @fhahn's comment on the EVL being loop-invariant when it's not used for tail predication.
The thing is, when EVL is used for tail predication you need to re-compute it in every vector loop iteration. I don't see how EVL could be handled like VectorTripCount in this case. Could you elaborate?

I have this scheme in mind:

vector.body:
  %canon.iv = phi int
  %evl = evl(%canon.iv, %vector.trip.count)
  ...
  br ...

We store the %evl as the related value for VPValue *EVL using State.set(EVL, %evl, Part) and then get the required value using State.get(EVL, Part).
In this case we can treat EVL similarly to the canonical IV, which is not an invariant.

Ok. If that works then having one global EVL per State defined this way should be fine for us for now.

Also - depending on your target - setting EVL is relatively lightweight, and instructions in the vector loop may have different EVLs in the future.

In what case can this happen? For an unrolled loop, I believe? But it can be handled by the VPTransformState::Part and VPTransformState::set/get functions.

For unrolling/interleaving, sure. I was thinking of optimizations that compact a mask and use EVL == number-of-ones-in-the-mask to densely operate on the compressed vectors - nothing we need to concern ourselves with for the time being.

Could you elaborate on how it may affect the EVL value? Can we have different EVLs in the same loop? Or is it not an EVL, but some kind of transformation of the original EVL value?

Here is my suggestion:

  1. We get an explicit-vector-length recipe to compute EVL inside the vector loop. And this will be the only recipe we add because..
  2. We extend the existing recipes with an (optional) EVL operand. Presence of EVL implies that VP intrinsics are used for widening.

I'm afraid that it will require a HUGE(!!!) amount of changes in VPlan. I assume there will still be the same recipe/vpvalue for EVL across all recipes/vpinstructions.

How? I think there is some kind of misunderstanding here.
There is an existing prototype implementation that uses these three new recipes and a global EVL per vector loop.
The code for the additional recipes is small and - frankly - trivial when you know how to do it. If you use separate recipes, the existing recipes are completely unaffected by this.

What part of my suggestion makes you think that there would be huge changes? Is it adding EVL to existing recipes?

Hi Simon, did you think about making EVL a member of VPlan, just like TripCount? In that case we might not need many of these new classes.

Hi Alexey! The EVL behaves more like a mask and less like the TripCount. When used for tail predication, the value of EVL still depends on the current vector iteration and needs to be computed in the vector loop.

But you can also treat it as an effective vector factor and use it similarly to VectorTripCount. Introducing new nodes just to add an extra operand EVL does not look necessary

I was looking at the comments we got earlier for the reference implementation, in particular @fhahn's comment on the EVL being loop-invariant when it's not used for tail predication.
The thing is, when EVL is used for tail predication you need to re-compute it in every vector loop iteration. I don't see how EVL could be handled like VectorTripCount in this case. Could you elaborate?

I have this scheme in mind:

vector.body:
  %canon.iv = phi int
  %evl = evl(%canon.iv, %vector.trip.count)
  ...
  br ...

We store the %evl as the related value for VPValue *EVL using State.set(EVL, %evl, Part) and then get the required value using State.get(EVL, Part).
In this case we can treat EVL similarly to the canonical IV, which is not an invariant.

Ok. If that works then having one global EVL per State defined this way should be fine for us for now.

Yep, I think so. Maybe @fhahn could help us here and confirm that it is a good scenario?

Also - depending on your target - setting EVL is relatively lightweight, and instructions in the vector loop may have different EVLs in the future.

In what case can this happen? For an unrolled loop, I believe? But it can be handled by the VPTransformState::Part and VPTransformState::set/get functions.

For unrolling/interleaving, sure. I was thinking of optimizations that compact a mask and use EVL == number-of-ones-in-the-mask to densely operate on the compressed vectors - nothing we need to concern ourselves with for the time being.

Could you elaborate on how it may affect the EVL value? Can we have different EVLs in the same loop? Or is it not an EVL, but some kind of transformation of the original EVL value?

Here is my suggestion:

  1. We get an explicit-vector-length recipe to compute EVL inside the vector loop. And this will be the only recipe we add because..
  2. We extend the existing recipes with an (optional) EVL operand. Presence of EVL implies that VP intrinsics are used for widening.

I'm afraid that it will require a HUGE(!!!) amount of changes in VPlan. I assume there will still be the same recipe/vpvalue for EVL across all recipes/vpinstructions.

How? I think there is some kind of misunderstanding here.
There is an existing prototype implementation that uses these three new recipes and a global EVL per vector loop.
The code for the additional recipes is small and - frankly - trivial when you know how to do it. If you use separate recipes, the existing recipes are completely unaffected by this.

What part of my suggestion makes you think that there would be huge changes? Is it adding EVL to existing recipes?

Yes, it is about adding EVL to existing recipes. It is not huge (sorry, forgot to edit my initial response, I was thinking about adding new recipes/instructions :) ) but it still does not look like we need to keep the same EVL in all the recipes/instructions. Using a global EVL lets us just modify codegen and (maybe) the cost model.

We store the %evl as the related value for VPValue *EVL using State.set(EVL, %evl, Part) and then get the required value using State.get(EVL, Part).
In this case we can treat EVL similarly to the canonical IV, which is not an invariant.

Ok. If that works then having one global EVL per State defined this way should be fine for us for now.

The way VectorTripCount is handled at the moment is a workaround and probably shouldn't act as inspiration. AFAICT we already removed all uses during code-gen that were using State to access the vector trip count. If it is needed for code-gen of a recipe, it should be expressed as an operand.

If we need to generate code to compute the EVL, it should be modeled as a recipe. If the EVL depends on any other recipes (like the canonical induction), it needs to be a recipe. If all that is needed is an opcode and operands, then it should probably just be an additional opcode in VPInstruction, instead of a new recipe.

Here is my suggestion:

  1. We get an explicit-vector-length recipe to compute EVL inside the vector loop. And this will be the only recipe we add because..
  2. We extend the existing recipes with an (optional) EVL operand. Presence of EVL implies that VP intrinsics are used for widening.

I'm afraid that it will require a HUGE(!!!) amount of changes in VPlan. I assume there will still be the same recipe/vpvalue for EVL across all recipes/vpinstructions.

How? I think there is some kind of misunderstanding here.
There is an existing prototype implementation that uses these three new recipes and a global EVL per vector loop.
The code for the additional recipes is small and - frankly - trivial when you know how to do it. If you use separate recipes, the existing recipes are completely unaffected by this.

What part of my suggestion makes you think that there would be huge changes? Is it adding EVL to existing recipes?

I think when deciding whether to add new recipes here, a key question is what the differences are to other existing recipes. IIUC the only difference between VPPredicatedWidenRecipe and VPWidenRecipe modulo the extra mask & EVL operands is during code-gen, right? But fundamentally both still widen an (arithmetic) operation. Whether some elements may be masked out shouldn't really matter for VPlan based analysis at the moment.

We already have some precedent here with VPWidenMemoryInstructionRecipe, which has an optional mask. IIUC a VPWidenRecipe could have either an additional Mask operand or Mask & EVL, so it should not be too difficult to distinguish between the versions when printing the recipe and during codegen. This was something I mentioned in the earlier reviews IIRC.

I think it should also be possible to use predicated instructions without needing EVL , right? The modeling should be flexible enough to not force us to add redundant EVL operands when they are not needed IMO.

We store the %evl as the related value for VPValue *EVL using State.set(EVL, %evl, Part) and then get the required value using State.get(EVL, Part).
In this case we can treat EVL similarly to the canonical IV, which is not an invariant.

Ok. If that works then having one global EVL per State defined this way should be fine for us for now.

The way VectorTripCount is handled at the moment is a workaround and probably shouldn't act as inspiration. AFAICT we already removed all uses during code-gen that were using State to access the vector trip count. If it is needed for code-gen of a recipe, it should be expressed as an operand.

Better to treat EVL like the CanonicalIV. Yes, I agree that a recipe is the better choice here (similar to CanonicalIV), but it requires some extra work because VPWidenIntOrFpInductionRecipe should depend on EVL. We probably need to split VPWidenIntOrFpInductionRecipe into a PHI recipe and something like CanonicalIVIncrement; otherwise this dependency prevents it from being vectorized effectively.

If we need to generate code to compute the EVL, it should be modeled as a recipe. If the EVL depends on any other recipes (like the canonical induction), it needs to be a recipe. If all that is needed is an opcode and operands, then it should probably just be an additional opcode in VPInstruction, instead of a new recipe.

Here is my suggestion:

  1. We get an explicit-vector-length recipe to compute EVL inside the vector loop. And this will be the only recipe we add because..
  2. We extend the existing recipes with an (optional) EVL operand. Presence of EVL implies that VP intrinsics are used for widening.

I'm afraid that it will require a HUGE(!!!) amount of changes in VPlan. I assume there will still be the same recipe/vpvalue for EVL across all recipes/vpinstructions.

How? I think there is some kind of misunderstanding here.
There is an existing prototype implementation that uses these three new recipes and a global EVL per vector loop.
The code for the additional recipes is small and - frankly - trivial when you know how to do it. If you use separate recipes, the existing recipes are completely unaffected by this.

What part of my suggestion makes you think that there would be huge changes? Is it adding EVL to existing recipes?

I think when deciding whether to add new recipes here, a key question is what the differences are to other existing recipes. IIUC the only difference between VPPredicatedWidenRecipe and VPWidenRecipe modulo the extra mask & EVL operands is during code-gen, right? But fundamentally both still widen an (arithmetic) operation. Whether some elements may be masked out shouldn't really matter for VPlan based analysis at the moment.

We already have some precedent here with VPWidenMemoryInstructionRecipe, which has an optional mask. IIUC a VPWidenRecipe could have either an additional Mask operand or Mask & EVL, so it should not be too difficult to distinguish between the versions when printing the recipe and during codegen. This was something I mentioned in the earlier reviews IIRC.

I rather doubt we need to handle EVL the same way as Mask.

I think it should also be possible to use predicated instructions without needing EVL , right? The modeling should be flexible enough to not force us to add redundant EVL operands when they are not needed IMO.

Looks like all VP intrinsics require EVL. Some of them require the previous EVL (which, I assume, can simply be the VF?).

We need to generate code to compute the EVL; its operand should be the application vector length (AVL), and the EVL may depend on the canonical induction recipe.
For example,

for (i = 0; i < n; ++i)
  c[i] = a[i] + b[i];

VPlan should be modeled as:

n = trip count
 vector loop: {
  EMIT i = canonical induction
  EMIT avl = n - i
  EMIT evl = set.evl(avl, ...)  (generate explicit vector length)
  EMIT mask = (all true mask)
  WIDEN t0 = vp.load(a, mask, evl)
  WIDEN t1 = vp.load(b, mask, evl)
  WIDEN t2 = vp.add(t0, t1, mask, evl)
  WIDEN vp.store(t2, c, mask, evl)
  EMIT  i.next = i + evl
  EMIT  i < n    (branch-on-count)
}
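The strip-mined loop above can be simulated with scalar code. This sketch assumes set.evl(avl) == min(VF, avl) and an all-true mask, matching the example; it is a model of the intended semantics, not generated code:

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Scalar simulation of the strip-mined vector loop sketched above,
// assuming set.evl(avl) == min(VF, avl) and an all-true mask.
std::vector<int> vpAddLoop(const std::vector<int> &A,
                           const std::vector<int> &B, uint64_t VF) {
  uint64_t N = A.size();                 // n = trip count
  std::vector<int> C(N);
  for (uint64_t I = 0; I < N;) {         // i = canonical induction
    uint64_t AVL = N - I;                // avl = n - i
    uint64_t EVL = std::min(VF, AVL);    // evl = set.evl(avl)
    for (uint64_t L = 0; L < EVL; ++L)   // vp.load/vp.add/vp.store on evl lanes
      C[I + L] = A[I + L] + B[I + L];
    I += EVL;                            // i.next = i + evl
  }
  return C;
}
```

Note the final iteration naturally processes only the tail elements, which is exactly the tail-predication behavior the EVL is meant to provide.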

EVL may be different in each vector iteration, and set.evl may also require more parameters, such as the upper limit of the vector length or the width of the data type.
These various situations show that we need to model it as a recipe.
An "llvm.set.evl" intrinsic also needs to be added.
We may refer to the RVV architecture to place some restrictions on "llvm.set.evl", like:

evl = 0 if avl = 0
evl > 0 if avl > 0
evl ≤ VF
evl ≤ avl
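These restrictions can be captured in a small checker. This is purely illustrative - "llvm.set.evl" and its final constraints are still under discussion in this thread:

```cpp
#include <cstdint>

// Checks the proposed RVV-like restrictions on a set.evl result:
//   evl == 0 iff avl == 0, and otherwise evl > 0, evl <= VF, evl <= avl.
bool isValidEVL(uint64_t EVL, uint64_t AVL, uint64_t VF) {
  if (AVL == 0)
    return EVL == 0;                           // evl = 0 if avl = 0
  return EVL > 0 && EVL <= VF && EVL <= AVL;   // evl > 0, evl <= VF, evl <= avl
}
```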

We store the %evl as the related value for VPValue *EVL using State.set(EVL, %evl, Part) and then get the required value using State.get(EVL, Part).
In this case we can treat EVL similarly to the canonical IV, which is not an invariant.

Ok. If that works then having one global EVL per State defined this way should be fine for us for now.

The way VectorTripCount is handled at the moment is a workaround and probably shouldn't act as inspiration. AFAICT we already removed all uses during code-gen that were using State to access the vector trip count. If it is needed for code-gen of a recipe, it should be expressed as an operand.

Better to treat EVL like the CanonicalIV. Yes, I agree that a recipe is the better choice here (similar to CanonicalIV), but it requires some extra work because VPWidenIntOrFpInductionRecipe should depend on EVL. We probably need to split VPWidenIntOrFpInductionRecipe into a PHI recipe and something like CanonicalIVIncrement; otherwise this dependency prevents it from being vectorized effectively.

If we need to generate code to compute the EVL, it should be modeled as a recipe. If the EVL depends on any other recipes (like the canonical induction), it needs to be a recipe. If all that is needed is an opcode and operands, then it should probably just be an additional opcode in VPInstruction, instead of a new recipe.

Here is my suggestion:

  1. We get an explicit-vector-length recipe to compute EVL inside the vector loop. And this will be the only recipe we add because..
  2. We extend the existing recipes with an (optional) EVL operand. Presence of EVL implies that VP intrinsics are used for widening.

I'm afraid that it will require a HUGE(!!!) amount of changes in VPlan. I assume there will still be the same recipe/vpvalue for EVL across all recipes/vpinstructions.

How? I think there is some kind of misunderstanding here.
There is an existing prototype implementation that uses these three new recipes and a global EVL per vector loop.
The code for the additional recipes is small and - frankly - trivial when you know how to do it. If you use separate recipes, the existing recipes are completely unaffected by this.

What part of my suggestion makes you think that there would be huge changes? Is it adding EVL to existing recipes?

I think when deciding whether to add new recipes here, a key question is what the differences are to other existing recipes. IIUC the only difference between VPPredicatedWidenRecipe and VPWidenRecipe modulo the extra mask & EVL operands is during code-gen, right? But fundamentally both still widen an (arithmetic) operation. Whether some elements may be masked out shouldn't really matter for VPlan based analysis at the moment.

I suppose? These are the differences between VPPredicatedWidenRecipe and VPWidenRecipe that I see:

  • Mask & EVL operands. The Mask operand could be optional (which implies a constant all-ones mask); EVL is mandatory.
  • EVL has significant performance implications on RVV and VE. If you count costing as a VPlan-based analysis, then that would be one analysis in favor of having two recipe kinds.
  • VPPredicatedWiden* recipes always widen to VP intrinsics. VPWiden* recipes don't.

We already have some precedent here with VPWidenMemoryInstructionRecipe, which has an optional mask. IIUC a VPWidenRecipe could have either an additional Mask operand or Mask & EVL, so it should not be too difficult to distinguish between the versions when printing the recipe and during codegen. This was something I mentioned in the earlier reviews IIRC.

I rather doubt we need to handle EVL the same way as Mask.

Juggling multiple EVLs in one piece of code is not unheard of in VE/vector code. I suspect RVV programmers will eventually pull similar tricks (hardware permitting). You only need the EVL operands for that. The first VPlan-VP vector plan builder can pass in the same EVL for all of them, but having EVL operands will give us the flexibility to go further.

I think it should also be possible to use predicated instructions without needing EVL , right? The modeling should be flexible enough to not force us to add redundant EVL operands when they are not needed IMO.

Looks like all VP intrinsics require EVL. Some of them require the previous EVL (which, I assume, can simply be the VF?).

Yes, you can effectively disable the EVL in VP intrinsics by passing in the VF. The current VP infrastructure recognizes that.

We need to generate code to compute the EVL; its operand should be the application vector length (AVL), and the EVL may depend on the canonical induction recipe.
For example,

for (i = 0; i < n; ++i)
  c[i] = a[i] + b[i];

VPlan should be modeled as:

n = trip count
 vector loop: {
  EMIT i = canonical induction
  EMIT avl = n - i
  EMIT evl = set.evl(avl, ...)  (generate explicit vector length)
  EMIT mask = (all true mask)
  WIDEN t0 = vp.load(a, mask, evl)
  WIDEN t1 = vp.load(b, mask, evl)
  WIDEN t2 = vp.add(t0, t1, mask, evl)
  WIDEN vp.store(t2, c, mask, evl)
  EMIT  i.next = i + evl
  EMIT  i < n    (branch-on-count)
}

EVL may be different in each vector iteration, and set.evl may also require more parameters, such as the upper limit of the vector length or the width of the data type.
These various situations show that we need to model it as a recipe.
An "llvm.set.evl" intrinsic also needs to be added.
We may refer to the RVV architecture to place some restrictions on "llvm.set.evl", like:

evl = 0 if avl = 0
evl > 0 if avl > 0
evl ≤ VF
evl ≤ avl

Thanks for the example! We seem to have converged on the VPWidenEVL recipe among all active participants in the discussion. To make a concrete proposal:

evl = ExplicitVectorLength(TripCount, CanonicalInduction)

That's just pulled and rephrased from your example. This should be sufficient for RVV, right?

We need to generate code to compute the EVL; its operand should be the application vector length (AVL), and the EVL may depend on the canonical induction recipe.
For example,

for (i = 0; i < n; ++i)
  c[i] = a[i] + b[i];

VPlan should be modeled as:

n = trip count
 vector loop: {
  EMIT i = canonical induction
  EMIT avl = n - i
  EMIT evl = set.evl(avl, ...)  (generate explicit vector length)
  EMIT mask = (all true mask)
  WIDEN t0 = vp.load(a, mask, evl)
  WIDEN t1 = vp.load(b, mask, evl)
  WIDEN t2 = vp.add(t0, t1, mask, evl)
  WIDEN vp.store(t2, c, mask, evl)
  EMIT  i.next = i + evl
  EMIT  i < n    (branch-on-count)
}

EVL may be different in each vector iteration, and set.evl may also require more parameters, such as the upper limit of the vector length or the width of the data type.
These various situations show that we need to model it as a recipe.
An "llvm.set.evl" intrinsic also needs to be added.
We may refer to the RVV architecture to place some restrictions on "llvm.set.evl", like:

evl = 0 if avl = 0
evl > 0 if avl > 0
evl ≤ VF
evl ≤ avl

Thanks for the example! We seem to have converged on the VPWidenEVL recipe among all active participants in the discussion. To make a concrete proposal:

evl = ExplicitVectorLength(TripCount, CanonicalInduction)

That's just pulled and rephrased from your example. This should be sufficient for RVV, right?

Shall we rename it to something like VPCanonicalEVL... instead? It is scalar.

We need to generate code to compute the EVL; its operand should be the application vector length (AVL), and the EVL may depend on the canonical induction recipe.
For example,

for (i = 0; i < n; ++i)
  c[i] = a[i] + b[i];

VPlan should be modeled as:

n = trip count
 vector loop: {
  EMIT i = canonical induction
  EMIT avl = n - i
  EMIT evl = set.evl(avl, ...)  (generate explicit vector length)
  EMIT mask = (all true mask)
  WIDEN t0 = vp.load(a, mask, evl)
  WIDEN t1 = vp.load(b, mask, evl)
  WIDEN t2 = vp.add(t0, t1, mask, evl)
  WIDEN vp.store(t2, c, mask, evl)
  EMIT  i.next = i + evl
  EMIT  i < n    (branch-on-count)
}

EVL may be different in each vector iteration, and set.evl may also require more parameters, such as the upper limit of the vector length or the width of the data type.
These various situations show that we need to model it as a recipe.
An "llvm.set.evl" intrinsic also needs to be added.
We may refer to the RVV architecture to place some restrictions on "llvm.set.evl", like:

evl = 0 if avl = 0
evl > 0 if avl > 0
evl ≤ VF
evl ≤ avl

Thanks for the example! We seem to have converged on the VPWidenEVL recipe among all active participants in the discussion. To make a concrete proposal:

evl = ExplicitVectorLength(TripCount, CanonicalInduction)

That's just pulled and rephrased from your example. This should be sufficient for RVV, right?

Shall we rename it to something like VPCanonicalEVL... instead? It is scalar.

Sure. "widen" does not make much sense here.

We store the %evl as the related value for VPValue *EVL using State.set(EVL, %evl, Part) and then get the required value using State.get(EVL, Part).
In this case we can treat EVL similarly to the canonical IV, which is not an invariant.

Ok. If that works then having one global EVL per State defined this way should be fine for us for now.

The way VectorTripCount is handled at the moment is a workaround and probably shouldn't act as inspiration. AFAICT we already removed all uses during code-gen that were using State to access the vector trip count. If it is needed for code-gen of a recipe, it should be expressed as an operand.

Better to treat EVL like the CanonicalIV. Yes, I agree that a recipe is the better choice here (similar to CanonicalIV), but it requires some extra work because VPWidenIntOrFpInductionRecipe should depend on EVL. We probably need to split VPWidenIntOrFpInductionRecipe into a PHI recipe and something like CanonicalIVIncrement; otherwise this dependency prevents it from being vectorized effectively.

Is any of the existing PHI recipes suitable for this? The increment is just an add; I'm not sure that needs its own recipe.