This is an archive of the discontinued LLVM Phabricator instance.

[VPlan] Extend VPValue to also model sub- & 'virtual' values.
AbandonedPublic

Authored by fhahn on Sep 27 2020, 9:16 AM.

Details

Reviewers
Ayal
rengolin
gilr
Summary

This patch extends VPValue to model 3 different kinds of VPValues:

  1. Concrete VPValues Concrete VPValues are either live-ins coming from IR or instructions/ recipes in VPlan which produce a single value. They are the most common kind.
  2. Sub VPValues Sub-VPValues are result values from instructions/recipes in VPlan that produce multiple values. They contain a reference to the producing 'virtual' VPValue.
  3. Virtual VPValues Virtual VPValues are used to model instructions/recipes that either produce multiple subvalues or no values at all. A virtual VPValue does not refer to a concrete value, which means it cannot be used like concrete or subvalues. For example, they cannot be used as operands. They can be used to traverse the def-use chains upwards. They also provide convenient access to all users of all sub-values of the producer.

Most existing recipes will be concrete VPValues (e.g. VPInstruction,
VPWidenRecipe & so on).Sub-VPValues can be used to model multiple result
values for VPInterleaveRecipe. VPInterleaveRecipe itself is a 'virtual'
VPValue, which allows for convenient traversal of the def-use chains.

The main advantage of handling everything in a VPValue over introducing
a new sub-class for sub-values (D87752) is that it slightly simplifies
things further down the road, by turning VPRecipeBase itself directly
into a VPValue. This simplifies things, by removing the duplicated
VPRecipeBaseID.

Following on this patch, we could make VPRecipeBase inherit from VPValue (D88379)
and VPUser (D88378), which should allow full traversal. Note that for the last 2 patches,
we should probably migrate the remaining recipes to manage operands using VPUser and
turn them into VPValues individually.

Diff Detail

Event Timeline

fhahn created this revision.Sep 27 2020, 9:16 AM
Herald added a project: Restricted Project. · View Herald TranscriptSep 27 2020, 9:16 AM
fhahn requested review of this revision.Sep 27 2020, 9:16 AM
fhahn edited the summary of this revision. (Show Details)Sep 27 2020, 9:28 AM
dmgreen added a subscriber: dmgreen.Oct 6 2020, 7:01 AM
dmgreen added inline comments.
llvm/lib/Transforms/Vectorize/VPlanValue.h
55

"no values at all" could be concrete VPValues, just with void return type.

116

When would VPVirtualValueSC be used?

117

Would a VPMultiValueSC ever be used, or would it always be a VPVInterleaveSC? (Or whatever other recipe type it became)

144–145

The defining recipe would be another user of the value? The sounds like it would complicate the number of uses. When would it be useful to store this in both places?

203

Could this function just be checking for a new type of SubclassID?

fhahn added inline comments.Oct 6 2020, 8:09 AM
llvm/lib/Transforms/Vectorize/VPlanValue.h
55

If we have recipes that exclusively have void return types, 'virtual' values could indeed be used. The problem with the current recipes unfortunately is that for example VPWidenMemoryInstructionRecipe combines both load & store instructions and they share the same SubclassID. Same for others, like VPWidenCall. Not sure if we would create plain VPValues with void return type in practice.

116

This is only used for testing. Alternatively I could just keep VPInterleaveSC in this patch and use that instead.

117

This is left over from a previous version. I'll remove it.

122

This should be part of D84684

144–145

The defining recipe would be another user of the value?

As is yes. The main advantage is that this would allow clients to traverse the def-use chains without necessarily needing to account for 'virtual' values directly. A cleaner alternative would be to account for that when the client asks for the list of users and combine the users of all sub-values for virtual values on demand.

fhahn updated this revision to Diff 296489.Oct 6 2020, 9:49 AM

Streamline handling of underlying value/defining VPValue, use union & checking the SubclassID instead of using PointerSumType. Should be much simpler now.

fhahn added a comment.Nov 1 2020, 10:54 AM

I went ahead and also implemented an alternative approach that is along the lines of the first iteration, using VPMultiDef. Instead of extending VPValue it introduces a new VPDef class that mirrors VPuser but for defined values. This means VPValue and other parts end up being simpler, at the cost that VPValues cannot be dyn_casted to recipes directly. Instead, you have to get the VPDef for a VPValue first and cast that to a recipe. Overall this approach seems a bit simpler than making VPValue more complicated as in the this series.

The interesting patches are D90558, D90564, D90562 and D90565 (which changes VPInstruction to using VPDef and gives an idea into what changes are required to operate on VPDef).

Hello Florian. Sorry for the delay. I believe that the design of something is the most important part to get right - and the hardest part to change. I've been trying to take a look at these and the other new patches but I'm not sure I understand yet, likely because I have not had to deal with the complexities it is trying to address. I think my head-canon is simpler than yours, perhaps too simple! It mostly just thinks of things the same as IR instructions, which I like because it's familiar and dependable.

As far as I understand, correct me where I'm wrong:

  • Most instruction produce a single value. These are simple enough to deal with, we use a VPValue.
  • Most instructions also consume values, for which we use VPUser.
  • The complexity comes from interleaving groups which might need to deal with multiple values.
  • Interleaving stores do not produce multiple values, but do have multiple operands (?)
  • Interleaving loads do need to produce multiple values, in some way.
    • They can either use something like this built into the VPValue class (which I kept getting stuck on having two different type systems in VPValue).
    • They can have the Def/MultiDef from D90558 etal (which looks cleaner than the first version I was trying to look at last week, but still involves a lot of little nodes).
    • They could hold a vector of "Members" that are VPUsers and join those together with operands. The complexity gets moved into the VPInterleaveRecipe class and nothing is needed in VPValue/etc

As you might have guessed, I still quite like the last option. But that might well just be because there are somethings about it that I haven't had to deal with. I like how it mirrors the llvm-ir though, it to me seems simple and familiar.

Some other random questions:

  • Do you know how Store Interleaving recipes should be handled?
  • Do you think that vplan nodes will eventually need types? (From looking at some things I think the answer is probably yes).
  • Should we split VPInterleaveRecipe (and maybe VPWidenMemoryInstructionRecipe) into different load and store recipes?
fhahn added a comment.Nov 9 2020, 2:39 AM

Hello Florian. Sorry for the delay. I believe that the design of something is the most important part to get right - and the hardest part to change. I've been trying to take a look at these and the other new patches but I'm not sure I understand yet, likely because I have not had to deal with the complexities it is trying to address. I think my head-canon is simpler than yours, perhaps too simple! It mostly just thinks of things the same as IR instructions, which I like because it's familiar and dependable.

Thanks for taking a look. I realize those are a lot of changes. I think the points below are spot on, I tried to expand on a few.

As far as I understand, correct me where I'm wrong:

  • Most instruction produce a single value. These are simple enough to deal with, we use a VPValue.
  • Most instructions also consume values, for which we use VPUser.
  • The complexity comes from interleaving groups which might need to deal with multiple values.

Yes, interleave groups are the only example in the current code. But there very likely will be additional ones in the future. Another example that Ayal mentioned some time ago is modeling sincos for which various vector libraries like Accelerate (https://developer.apple.com/documentation/accelerate/vforce/3241297-sincos) or in Intel's SVML provide tuned implementations, which return separate vectors with the sin and cos results.

One benefit of modeling the fact that we can have recipes that define multiple values is that VPlan based analysis/transformations need and can account for this scenario in general. This would hopefully ensure the code is written in a way to safely handle future additions of 'multi-defs'. If we just special case the interleave case, this likely is not the case and introducing new multi-defs in the future will be more painful.

Also there's a trend towards more specialized vector instruction sets and I expect more specialized candidates for multi-defs to pop up there as well. Unfortunately some of those are not public, but I think modeling the multi-def case in general should ensure VPlan can be used for such specialized cases downstream without to much pain as well in the future.

  • Interleaving stores do not produce multiple values, but do have multiple operands (?)

yes, some recipes like VPWidenMemoryInstructionRecpipe, VPWidenCallRecipe or VPInterleaveRecipe may not actually define a VPValue, depending on the contents. That makes turning them into VPValues a little awkward, so those are probably likely candidates to break-down further in the future (see response further below as well)

  • Interleaving loads do need to produce multiple values, in some way.
    • They can either use something like this built into the VPValue class (which I kept getting stuck on having two different type systems in VPValue).
    • They can have the Def/MultiDef from D90558 etal (which looks cleaner than the first version I was trying to look at last week, but still involves a lot of little nodes).
    • They could hold a vector of "Members" that are VPUsers and join those together with operands. The complexity gets moved into the VPInterleaveRecipe class and nothing is needed in VPValue/etc

IIUC this is the same idea as the VPDef approach (D90558), where VPDef adds such a vector, but in a general fashion so we do not need to special case VPInterleaveRecipe. I think this vector has to hold VPValues in some form. I am not entirely sure how the VPUser fits into this vector, as I think the whole interleave group shares a single address as operand.

Note that D90558 & co should be simpler than the originally proposed MultiDef, which was spread out over multiple classes. The latest patches essentially just add the VPDef class, which just contains a vector of 'defined' VPValues. In the end, all recipes should inherit from it and VPDef can be folded directly into VPRecipeBase. So it should boil down to adding a single vector to VPRecipeBase.

It also requires a change to VPValue to allow getting the VPDef (effectively the recipe) that defines the value. I think there's no way around that, even if we just change VPInterleaveRecipe to keep a vector of VPValues it defines. To walk upwards the def-use chains, we need to be able to get the recipe/VPDef that defines a VPValue. In the single def cases, we can just cast the VPValue directly to the corresponding recipe, but that's not possible for VPValues defined by a 'multi-def'. Once we have to do that, I think we might as well add a general way to express that a recipe can define multiple values.

As you might have guessed, I still quite like the last option. But that might well just be because there are somethings about it that I haven't had to deal with. I like how it mirrors the llvm-ir though, it to me seems simple and familiar.

Some other random questions:

  • Do you know how Store Interleaving recipes should be handled?

(See below)

  • Do you think that vplan nodes will eventually need types? (From looking at some things I think the answer is probably yes).

Once we synthesize more VPInstructions or recipes that are not directly tied to the original IR, I think we will need a way to attach types to nodes for cost-modeling and code-generation. Currently we get away without types, because we mostly rely on the types of the underlying IR. Once all of code-gen is updated to work on VPValues I think we can start to think/work on this transition.

  • Should we split VPInterleaveRecipe (and maybe VPWidenMemoryInstructionRecipe) into different load and store recipes?

At the momentI think the current break-down mirrors the existing code-generation functions in LV (e.g. vectorizeMemoryInstrution, vectorizeInterelaveGroup). I think that might make sense to break down/split up some of those recipes further in the future, as we move towards migrating the code-gen code, VP cost-modeling & transforms.

IIUC this is the same idea as the VPDef approach (D90558), where VPDef adds such a vector, but in a general fashion so we do not need to special case VPInterleaveRecipe. I think this vector has to hold VPValues in some form. I am not entirely sure how the VPUser fits into this vector, as I think the whole interleave group shares a single address as operand.

Note that D90558 & co should be simpler than the originally proposed MultiDef, which was spread out over multiple classes. The latest patches essentially just add the VPDef class, which just contains a vector of 'defined' VPValues. In the end, all recipes should inherit from it and VPDef can be folded directly into VPRecipeBase. So it should boil down to adding a single vector to VPRecipeBase.

It also requires a change to VPValue to allow getting the VPDef (effectively the recipe) that defines the value. I think there's no way around that, even if we just change VPInterleaveRecipe to keep a vector of VPValues it defines. To walk upwards the def-use chains, we need to be able to get the recipe/VPDef that defines a VPValue. In the single def cases, we can just cast the VPValue directly to the corresponding recipe, but that's not possible for VPValues defined by a 'multi-def'. Once we have to do that, I think we might as well add a general way to express that a recipe can define multiple values.

Something like this is what I meant, to try and make it more concrete (this is just meant as pseudo code):

/// VPInterleaveRecipe is a recipe for transforming an interleave group of load
/// or stores into one wide load/store and shuffles.
class VPInterleaveRecipe : public VPRecipeBase {
  const InterleaveGroup<Instruction> *IG;
  SmallVector<std::unique_ptr<VPUser>, 4> Members;

public:
  VPInterleaveRecipe(const InterleaveGroup<Instruction> *IG, VPValue *Addr,
                     VPValue *Mask)
      : VPRecipeBase(VPInterleaveSC, nullptr, {Addr}), IG(IG) {
    if (Mask)
      addOperand(Mask);
    // FIXME: Only loads actually need this, but stores need something better for
    // detecting Mask operands.
    for (size_t i = 0; i < IG->getFactor(); i++)
      Members.emplace_back(
          new VPUser(VPValue::VPExtractSC, IG->getMember(i), {this}));
  }
  ~VPInterleaveRecipe() override = default;

  VPValue *getExtract(int Idx) const {
    return Members[Idx].get();
  }

The Members are the Values that the other recipes are connected to. They also become uses of the VPInterleaveRecipe (hence they are VPUsers, which glues together a def-use graph). Everything else is pretty standard, VPRecipeBase inherits from VPUser which inherits from VPValue.
I feel like either if you are having to make modifications that depend upon def-uses then you are altering VPInterleaveRecipe directly and you probably know how to deal with it's uses, or you are doing something like a replaceAllUsesWith on one of those "Members", which should then just work. There might be other things I'm not thinking about though, and I haven't tried to implement anything particularly on top of this scheme (other than the vmulh code, which did not have to deal with interleaving groups specifically, other than they might be a leaf node).

I think that method would be simpler and involve less things like creating new VPValues in the constructors of VPRecipes and storing the extra vectors for the defs. It also mirrors the llvm-ir, which like I said I'm a fan of because of it's familiararity. Take a think about it. Maybe try implementing something like the vmulh code on top of the VPDef code. (Or.. maybe something that actually does something with multi-node recipes).

If you think it's still the best way to go, I'll happily review those patches. So long as we have though through the options.

fhahn added a comment.Nov 10 2020, 2:21 PM

... snip ..

The Members are the Values that the other recipes are connected to. They also become uses of the VPInterleaveRecipe (hence they are VPUsers, which glues together a def-use graph). Everything else is pretty standard, VPRecipeBase inherits from VPUser which inherits from VPValue.
I feel like either if you are having to make modifications that depend upon def-uses then you are altering VPInterleaveRecipe directly and you probably know how to deal with it's uses, or you are doing something like a replaceAllUsesWith on one of those "Members", which should then just work. There might be other things I'm not thinking about though, and I haven't tried to implement anything particularly on top of this scheme (other than the vmulh code, which did not have to deal with interleaving groups specifically, other than they might be a leaf node).

Ah yes, I missed that this was the proposal with the VPInstruction extract nodes!

I think that method would be simpler and involve less things like creating new VPValues in the constructors of VPRecipes and storing the extra vectors for the defs. It also mirrors the llvm-ir, which like I said I'm a fan of because of it's familiararity.

Creating the VPValues in all the constructors for the recipes in earlier versions of the patches was indeed suboptimal & ugly. In the latest version of the patch, things have improved though I think. Now all recipes expect VPInterleaveRecipe keep inheriting from VPValue and the registration happens as part of the constructor.

Take a think about it. Maybe try implementing something like the vmulh code on top of the VPDef code. (Or.. maybe something that actually does something with multi-node recipes).

I spent some time porting the VMULH patch (D88152) on top of the VPDef system: D91198 (this excludes the cost-model & VPInstruction::execute changes, but they should not matter).

The only part that needed slight adjustments was the recursivelyDeleteUnusedRecipes implementation, which mostly boils down to iterating over all defined values when checking if the value is dead. I think code dealing only with single-def recipes should work unchanged.

void VPRecipeBase::recursivelyDeleteUnusedRecipes() {
  if (all_of(defined_values(), [](VPValue *Def) {
        return Def->getNumUsers() == 0;
      }) /* && isSafeToRemove()*/) {
    for (auto *Op : operands()) {
      Op->removeUser(*this);
      if (VPRecipeBase *R = dyn_cast<VPRecipeBase>(Op->getDef()))
        R->recursivelyDeleteUnusedRecipes();
    }
    eraseFromParent();
  }
}

If you think it's still the best way to go, I'll happily review those patches. So long as we have though through the options.

I think the ergonomics when dealing with single-def recipes is the same with all options.

The main advantage I see of modeling multi-defs is that we
(1) can avoid artificial/dummy VPValues; conceptually I think making VPInterleaveRecipe inherit from VPValue is not the right fit, it doesn't produce a value (granted, currently this is broken by stores and loads being handled in the same recipe, but that's easy to change);
(2) have a uniform way to define & handle multi-defs.

IMO it is quite hard to gauge up-front if the extra complexity is really warranted for a cleaner modeling, but given that the overhead seems minor (to me) I think it would preferable to go with the cleaner and uniform modeling. It would probably also be good to hear @Ayal 's or @gilr 's thoughts on this.

Some things that might be slightly harder when using extra VPINstructions to extract from a multi-def might be moving or predicating multi-def recipes.

Ayal added a comment.Nov 12 2020, 3:44 PM

Extending the recipe abstraction to support def/use relations is an important next step forward for VPlan, thanks for pushing this momentum forward!

Supporting multiple VPValues defined by a single recipe is relevant for VPInterleaveRecipe and to promote idioms such as sincos, add.with.overflow, addsub and potentially more. These are important to recognize and model during vectorization due to their associated atomic costs(*), and to facilitate more advanced "Loop Mixed" vectorization opportunities, where interleave groups form leaves and roots of "Loop aware" SLP trees (CGO'16).
While this multi-valued concept admittedly diverges from LLVM-IR's Instruction-is-a-User-is-a-single-Value, it is consistent with functions returning multiple values in MLIR. Furthermore, VPlan aims to model a region of code with live-ins and live-outs - typically a single loop-nest within a function. A VPValue that is generated by a recipe inside the region and used outside the region, could be represented by feeding a VPUser similar to in-region users; such a live-out VPUser however is out-of-scope, does not correspond to any recipe, and is therefore not a VPValue. Analogously, a VPUser of a recipe inside the region may have an operand from outside the region, which could be represented by a VPValue similar to in-region values; such a live-in VPValue however is out-of-scope, does not correspond to any recipe, and is therefore not a VPUser. The concept of VPDef to hold the zero, one or more VPValues defined by a recipe is analogous to VPUser which holds the operands of a recipe, as proposed by D90558. If/as all live-ins are single values, i.e., all multi-value VPDefs correspond to recipes, recipe and VPDef could be merged into one.

(*) Explicit extract VPInstructions could also be used following a wide VPInstruction load, once introduced into VPlan; but doing so effectively breaks the interleaved-load idiom along with its associated atomic cost.

OK sounds great. I did not know that MLIR could represent multiple values, that's good to see.

I still like honestly like a more "llvm" Value/User model more, but perhaps as I read the patches I will learn to like Defs too.

Ayal added a comment.Nov 13 2020, 5:33 AM

OK sounds great. I did not know that MLIR could represent multiple values, that's good to see.

I still like honestly like a more "llvm" Value/User model more, but perhaps as I read the patches I will learn to like Defs too.

Related references: https://mlir.llvm.org/docs/LangRef/#operations and the last "Traversing the def-use chains" section at the bottom of https://mlir.llvm.org/docs/Tutorials/UnderstandingTheIRStructure/ - hopefully this looks intuitive enough.

Perhaps VPDef should be renamed VPResults to clarify this correspondence, although other entities also have distinct names: VPUser/Operands and VPRecipeBase/Operation.

Just looking around! Glad to see VPlan moving forward! :)

llvm/lib/Transforms/Vectorize/VPlanValue.h
52

I'm trying to understand what 'Sub' refers to here. Not sure I can easily relate it to 'result'. Is this kind of VPValue intended to be use for something else in the future? Would "Result VPValue" or similar make it clearer?

59

So a Virtual VPValue is basically what an Operation is in MLIR, right?

196

This, indeed, looks pretty much like MLIR API :)

fhahn abandoned this revision.Nov 15 2020, 11:59 AM

Thanks for all the feedback. I'd propose to continue the discussion at the VPDef proposal (D90558). I'll abandon this one now, to avoid any confusion.

Just looking around! Glad to see VPlan moving forward! :)

Thanks for taking a look! We moved on to a slightly different implementation of the same ideas in D90558.

llvm/lib/Transforms/Vectorize/VPlanValue.h
196

Glad things align there :) We moved on to a slightly different implementation which represents the defined values of an operation through a dedicated VPDef class, without the need to complicate the VPValue hierarchy.