This is an archive of the discontinued LLVM Phabricator instance.

[LV] Model masking in VPlan, introducing VPInstructions
ClosedPublic

Authored by gilr on Oct 8 2017, 12:28 PM.

Details

Summary

This patch adds a new abstraction layer to VPlan and leverages it to model masking in VPlan. Masking is essential to the vectorizer's predication process, but is currently an ad-hoc side-effect of the vectorized IR-generation stage. Modeling masking directly in VPlan facilitates moving predication from IR-generation stage to the planning stage.

VPlan models the contents of VPBasicBlocks using Recipes, which do not currently expose their planned instructions nor the Def-Use relations between them. Modeling masking in VPlan requires VPlan to model Def-Use relations, specifically among the planned instructions that will generate, manipulate and consume masks.

The new VPValue and VPUser classes model how data flows into, through and out of a VPlan, forming the vertices of a planned Def-Use graph. The new VPInstruction class is a generic single-instruction Recipe that models a planned instruction along with its opcode, operands and users. See VectorizationPlan.rst for more details.

With this patch, VPlan models as VPInstructions the planned instructions that manipulate masks (AND, OR, NOT), introduced during predication. Their operands are other VPInstructions (e.g., a NOT feeding an AND), or VPValues coming from other Recipes or from live-in values. Their users are other VPInstructions or VPUsers employed to retrieve the mask during Recipes' execution (e.g., for generating calls to masked load/store intrinsics). More concretely:

  • createBlockInMask(), createEdgeMask() were moved from ILV to the Planner. Their code is essentially unchanged, except they now generate VPValues instead of Values. They are now called by Planner and passed down to mask-aware Recipes during VPlan construction.
  • ILV::vectorizeMemoryInstruction() so far generated the masks on-the-fly using createBlockInMask(). In this patch VPWidenMemoryInstructionRecipe provides the masks as an optional argument.
  • VPBlendRecipe takes over non-loop phi's. It generates the same sequence of selects, only based on VPlan masks.
  • VPBranchOnMaskRecipe now takes a VPValue mask.

Another aspect to notice is that until VPValues fully replace the existing scalar-IR-based ValueMap mechanism we effectively have two Def-Use graphs co-exist during vectorized code generation. Rewiring these two graphs together is done using
(a) Recipes taking VPValues and writing to ValueMap, and conversely
(b) VPValues that model Values still handled in ValueMap. The ILVCallback provides the bridge from VPlan’s DataState to ValueMap.

This abstraction layer facilitates modeling additional Def-Use relations in VPlan, to support bringing additional transformations to the planning stage.

Ayal & Gil

Diff Detail

Repository
rL LLVM

Event Timeline

gilr created this revision.Oct 8 2017, 12:28 PM

I haven't been keeping up with this area as much as I'd have liked, but have some comments. Overall the approach seems ok to me.

lib/Transforms/Vectorize/LoopVectorize.cpp
7950 ↗(On Diff #118173)

Can we use unique_ptr instead of a raw pointer to avoid repeating this?

8020 ↗(On Diff #118173)

This can be combined into line below.

lib/Transforms/Vectorize/VPlan.cpp
210 ↗(On Diff #118173)

Can you instead check if it's a BinaryOperator instead of having to repeat these opcodes?

lib/Transforms/Vectorize/VPlan.h
736 ↗(On Diff #118173)

Why delete the value from the map but leave the key?

755 ↗(On Diff #118173)

Just make this take a pointer?

756 ↗(On Diff #118173)

This seems like it should be an assert instead? Why would a VPValue ever be created twice for a given Value and VPlan pair?

gilr marked 5 inline comments as done.Oct 10 2017, 6:31 AM

Thanks, Amara!

lib/Transforms/Vectorize/VPlan.h
736 ↗(On Diff #118173)

These are Values from the existing IR code - VPlan doesn't own them, just maintains a VPValue to represent them at the VPlan-level Def-Use graph.

gilr updated this revision to Diff 118391.Oct 10 2017, 7:44 AM
  • Adressed comments.
  • Fixed bug: tryToWidenMemory() wasn't considering Cost Model's sink-scalar-operands scalarization decisions.
aemerson added inline comments.Oct 11 2017, 3:39 AM
lib/Transforms/Vectorize/VPlan.h
736 ↗(On Diff #118173)

I meant why not erase() the entry from the map completely. Lleaving the key in the map but with a now invalid pointer doesn't seem right, unless I'm missing something.

rengolin edited edge metadata.Oct 12 2017, 8:13 AM

Hi Ayal/Gil,

This is an interesting pattern and I welcome the docs changes with the code. But the patch is quite big and it's hard to understand what's the actual change and what's a pre-requisite.

AFAICS, there are two main stages here:

  1. Groundwork to get the new constructs (VPlanValue/VPlanBuilder/VPlan changes) and necessary changes to LV and ILV.
  2. New recipes that use them (VPBlendRecipe/VPWidenMemoryInstructionRecipe) and necessary cleanups in LV/ILV as well as VPlan*.

I'd imagine this patch can be split into at least three blocks:

  • New constructs + docs + (I)LV cleanups
  • VPBlendRecipe + tests
  • VPWidenMemoryInstructionRecipe + tests

but I wouldn't be surprised if you needed some generic cleanups before the first patch and after the last one.

I'm also not expecting a lot of new tests, given that this is just doing better what we already do.

Would it be possible to split the patch, so that we can review them in a more concise way?

Thanks!
--renato

gilr added a comment.Oct 13 2017, 2:34 PM

Would it be possible to split the patch, so that we can review them in a more concise way?

Hi Renato,

The patch is admittedly quite massive despite our deliberate minimization efforts.
To avoid introducing dead code into LV we're adding the new VPlan constructs along with a single, well-bounded use case.
Splitting the masking code in LoopVectorize.cpp is challenging: once VPlan takes care of the masks it must rewire them to the relevant Recipes (which are few).

Most of the changes/additions to the VPlan* files are rather simple and technical. The changes in LoopVectorize.cpp are the more complex part and deserve a more elaborate explanation, to be appended to the patch summary:

  • createBlockInMask(), createEdgeMask(): these two methods moved from ILV to the Planner. Their code is essentially unchanged, except they now generate VPValues instead of Values. They are now called by Planner and passed down to mask-aware Recipes during VPlan construction.
  • ILV::vectorizeMemoryInstruction() was so far called from VPWidenRecipe and generated the masks on-the-fly using createBlockInMask(). In this patch it is called by the new VPWidenMemoryInstructionRecipe, which provides the masks as an optional argument.
  • VPBlendRecipe takes over non-loop phi's. It generates the same sequence of selects, only based on VPlan masks.
  • VPBranchOnMaskRecipe now takes a VPValue mask.

Another aspect to notice is that until VPValues fully replace the existing scalar-IR-based ValueMap mechanism we effectively have two Def-Use graphs co-exist during vectorized code generation. Rewiring these two graphs together is done using
(a) Recipes taking VPValues and writing to ValueMap, and conversely
(b) VPValues that model Values still handled in ValueMap. The ILVCallback provides the bridge from VPlan’s DataState to ValueMap.

lib/Transforms/Vectorize/VPlan.h
736 ↗(On Diff #118173)

The invalid entries will be destroyed immediately after on exiting the destructor, but it would indeed be good to add an explicit call to Value2VPValue.clear(), similar to LICM's runOnLoop(). Thanks!

gilr edited the summary of this revision. (Show Details)Oct 14 2017, 12:26 AM
gilr updated this revision to Diff 119081.Oct 15 2017, 6:40 AM

Addressed comment: Added a call to Value2VPValue.clear() on ~VPlan().

aemerson edited edge metadata.Oct 15 2017, 7:25 AM

A minor nit but LGTM. I don't have an aversion to "dead code" if it's going to be used in the near future, so perhaps to make the patch smaller split it into an initial patch to add VPlanValue.h and then the VPlanBuilder.h as well as the changes to the documentation. Those pieces seem uncontroversial to me, before moving onto the vectorizer. @mkuper does this make sense?

lib/Transforms/Vectorize/VPlan.cpp
33 ↗(On Diff #119081)

shorter with const auto* here

lib/Transforms/Vectorize/VPlan.h
736 ↗(On Diff #118173)

Ah I'd gotten mixed up when I was looking at this originally. I'm ok with the original code, sorry for noise, although maybe it makes sense to store them as unique_ptrs?

A minor nit but LGTM. I don't have an aversion to "dead code" if it's going to be used in the near future, so perhaps to make the patch smaller split it into an initial patch to add VPlanValue.h and then the VPlanBuilder.h as well as the changes to the documentation. Those pieces seem uncontroversial to me, before moving onto the vectorizer. @mkuper does this make sense?

That was basically my point. :)

In D38676#897413, @gilr wrote:

To avoid introducing dead code into LV we're adding the new VPlan constructs along with a single, well-bounded use case.

Earlier patches in a patch series are not considered dead code. If you submit a 2-patch series, it'd make it easier for us to understand what's needed to create the infrastructure and what's its usage. As it is, it's hard to know what code movement is one or the other.

Splitting the masking code in LoopVectorize.cpp is challenging: once VPlan takes care of the masks it must rewire them to the relevant Recipes (which are few).

This could be part of the first patch.

Most of the changes/additions to the VPlan* files are rather simple and technical. The changes in LoopVectorize.cpp are the more complex part and deserve a more elaborate explanation, to be appended to the patch summary:

These could also be comments. There are a lot of comments in the vectoriser for that reason. :)

  • createBlockInMask(), createEdgeMask(): these two methods moved from ILV to the Planner. Their code is essentially unchanged, except they now generate VPValues instead of Values. They are now called by Planner and passed down to mask-aware Recipes during VPlan construction.

This looks like it's part of the first patch.

  • ILV::vectorizeMemoryInstruction() was so far called from VPWidenRecipe and generated the masks on-the-fly using createBlockInMask(). In this patch it is called by the new VPWidenMemoryInstructionRecipe, which provides the masks as an optional argument.
  • VPBlendRecipe takes over non-loop phi's. It generates the same sequence of selects, only based on VPlan masks.
  • VPBranchOnMaskRecipe now takes a VPValue mask.

These look like they're part of the second.

--renato

mkuper edited edge metadata.Oct 16 2017, 2:42 PM

A minor nit but LGTM. I don't have an aversion to "dead code" if it's going to be used in the near future, so perhaps to make the patch smaller split it into an initial patch to add VPlanValue.h and then the VPlanBuilder.h as well as the changes to the documentation. Those pieces seem uncontroversial to me, before moving onto the vectorizer. @mkuper does this make sense?

My $0.02:
Basically, I agree with @aemerson and @rengolin, at least to the extent that even if either way could work, it would be best to do it the way that ends up with simpler patches and will take less time (overall) to review well.

I don't think there's a problem of dead code, or "it's unclear how the infrastructure will be used", since you already have the code that actually uses it ready for review as well. So, it would probably best to split it into two dependent patches, and post them for review separately. Then, the second one can be rebased on top of the first one, if it has any significant changes, when it goes in. Admittedly, it's more work, but I think it's worth it.

WDYT?

gilr added a comment.Oct 17 2017, 1:06 PM

A minor nit but LGTM. I don't have an aversion to "dead code" if it's going to be used in the near future, so perhaps to make the patch smaller split it into an initial patch to add VPlanValue.h and then the VPlanBuilder.h as well as the changes to the documentation. Those pieces seem uncontroversial to me, before moving onto the vectorizer. @mkuper does this make sense?

My $0.02:
Basically, I agree with @aemerson and @rengolin, at least to the extent that even if either way could work, it would be best to do it the way that ends up with simpler patches and will take less time (overall) to review well.

I don't think there's a problem of dead code, or "it's unclear how the infrastructure will be used", since you already have the code that actually uses it ready for review as well. So, it would probably best to split it into two dependent patches, and post them for review separately. Then, the second one can be rebased on top of the first one, if it has any significant changes, when it goes in. Admittedly, it's more work, but I think it's worth it.

WDYT?

Yes, small & digestible patches are ideal. So, as you guys are all Ok with introducing the VPInstruction infrastructure w/o its use, let's go with that.
One change we can probably peel off before, though, is to first introduce VPBlendRecipe and VPWidenMemoryInstruction w/o the new masking code. This should later narrow the diff in LoopVectorize.cpp and reflect the masking changes on those Recipes as well.

gilr added a comment.Oct 18 2017, 1:05 PM
In D38676#900051, @gilr wrote:

One change we can probably peel off before, though, is to first introduce VPBlendRecipe and VPWidenMemoryInstruction w/o the new masking code. This should later narrow the diff in LoopVectorize.cpp and reflect the masking changes on those Recipes as well.

This is now up for review as D39068.

hsaito added a subscriber: hsaito.Oct 20 2017, 1:59 PM

Regarding VPValue/VPUser/VPInstruction concept ----- input from in-person chat with Hal Finkel and Chris Lattner before/during LLVM Dev Con
(more precisely speaking, this is my interpretation of the chat --- if in doubt, please verify with them):
Both agreed with me that many of the code we have/write should, in theory, be functional on a CFG subgraph and Instructions that aren't strictly attached
to the Function (being integral part of the IR state). Having said that, we all agreed that figuring out what works and what does not, in very high confidence,
is a lot of work. As such, given the current limited scope of VPInstruction usage in VPlan, we don't see a point of investing a lot of effort in making many of
code working on CFG/Instruction truly useful when the CFG subgraph is detached (or not yet attached) to the Function.

The design choice I'm advocating is that we should be able to make all of the vectorizer up to cost modeling can be made as a valid Analysis Pass.
That means no IR Instructions left unattached (if we do that, verifiers will complain). As a result, if we are to use IR instructions instead of VPInstructions
in VPlan, we have to either clone the entire Function, or create a new Function and copy the loop (nest) of interest to it. I hugely disagree to such a hack

  • hence have a desire to move forward with the VPInstruction direction.

If we all agree to go that way, please continue to watch us, in our effort for avoiding copying/pasting as much as we can.

Thanks,
Hideki

The design choice I'm advocating is that we should be able to make all of the vectorizer up to cost modeling can be made as a valid Analysis Pass.
That means no IR Instructions left unattached (if we do that, verifiers will complain). As a result, if we are to use IR instructions instead of VPInstructions
in VPlan, we have to either clone the entire Function, or create a new Function and copy the loop (nest) of interest to it. I hugely disagree to such a hack

  • hence have a desire to move forward with the VPInstruction direction.

Fully agreed! Thanks for the clarification.

--renato

bollu added a subscriber: bollu.Oct 30 2017, 12:52 AM
fhahn added a subscriber: fhahn.Oct 31 2017, 2:21 AM
gilr updated this revision to Diff 121904.Nov 7 2017, 7:44 AM
gilr edited the summary of this revision. (Show Details)

Rebasing this patch on its preparatory patch D39068.

rengolin added inline comments.Nov 15 2017, 1:00 PM
lib/Transforms/Vectorize/LoopVectorize.cpp
8374 ↗(On Diff #121904)

So, buildVPlans will call buildVPlan for each VF, which means we'll call new VPlan multiple times, but now the Plan is a class global and will hold the last created plan's pointer. What is the purpose of this pointer?

lib/Transforms/Vectorize/VPlan.h
241 ↗(On Diff #121904)

How is this saving the value in the DenseMap PerPartOutput?

I finished my review, and apart from my two final comments, everything looks fine.

Thanks for the hard work! :)

gilr added a comment.Nov 19 2017, 8:15 AM

I finished my review, and apart from my two final comments, everything looks fine.

Thanks for the hard work! :)

Excellent. Will upload a revised version shortly.
Thanks a lot for reviewing this!

lib/Transforms/Vectorize/LoopVectorize.cpp
8374 ↗(On Diff #121904)

The Plan class-global pointer is used for maintaining the current VPlan being constructed. But this indeed may be confusing - will instead propagate the Plan into the Planner's methods.

lib/Transforms/Vectorize/VPlan.h
241 ↗(On Diff #121904)

This method is only a getter. Assigning Values to VPValues is done using set(). The old ILV ValueMap mechanism is delegated the call for Defs which are still not fully ported to the VPInstruction framework (and have therefore not been set()). I'll try to improve this method's comments to express this more clearly.

gilr updated this revision to Diff 123500.Nov 19 2017, 8:17 AM

Addressed review comments.

rengolin accepted this revision.Nov 19 2017, 9:49 AM

Great, LGTM now, thanks!

lib/Transforms/Vectorize/LoopVectorize.cpp
8374 ↗(On Diff #121904)

Right, I see, it's the *current* plan. Let's keep it as an argument for now and try to find a better pattern later. Thanks!

lib/Transforms/Vectorize/VPlan.h
241 ↗(On Diff #121904)

Ok, I thought this was a direct caching mechanism, now it makes sense.

This revision is now accepted and ready to land.Nov 19 2017, 9:49 AM
This revision was automatically updated to reflect the committed changes.
XiaPZ added a subscriber: XiaPZ.Nov 13 2022, 5:53 AM
Herald added projects: Restricted Project, Restricted Project. · View Herald Transcript