Page MenuHomePhabricator

[LV] Using VPlan to model the vectorized code and drive its transformation
ClosedPublic

Authored by Ayal on May 4 2017, 10:10 AM.

Details

Summary

This patch contains the new VPlan model and uses it to represent the vectorized code and drive the generation of vectorized IR. It follows the breakdown of https://reviews.llvm.org/D28975.

The patch contains

  • https://reviews.llvm.org/D32200: once that patch is approved and committed, an updated and simplified version of this patch will be uploaded.
  • VectorizationPlan.rst: documenting the VPlan model, covering this and follow-up patches.
  • Protected methods made public to enable their reuse are annotated as such in-place to simplify the diff. They will eventually be moved and grouped together.

To recap, the introductory patch of VPlan is designed to

  • capture in VPlan all current vectorization decisions,
  • represent the control-flow of the vectorized loop body using VPlan's Hierarchical CFG,
  • retain current vectorizer output.

In this patch VPlan models the vectorized loop body: the vectorized control-flow is represented using VPlan's Hierarchical CFG, with predication refactored from being a post-vectorization-step into a vectorization planning step modeling if-then VPRegionBlocks, and generating code inline with non-predicated code. The vectorized code within each VPBasicBlock is represented as a sequence of Recipes, each responsible for modelling and generating a sequence of IR instructions. To keep the size of this commit manageable the Recipes in this patch are coarse-grained and capture large chunks of LV’s code-generation logic. The constructed VPlans are dumped under -debug.

Gil and Ayal.

Diff Detail

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
mkuper added inline comments.Jun 4 2017, 7:01 PM
lib/Transforms/Vectorize/VPlan.cpp
43

Any chance to make the const/non-const versions, here and below, share implementation? It seems like it ought to be possible.

191

This is extremely confusing. You're modifying what looks like a local variable to actually change the State->Instance for the State that will get passed to Block->execute(). Could you rewrite in a way that makes it clear what's going on?

210

Why do we need to save/restore the builder IP?

219

I don't understand this note. :-)

lib/Transforms/Vectorize/VPlan.h
187

Does this strictly need to be public? It looks like it would only be used in the classof of subclasses. Can it be protected?

462

Is it possible not to have an Entry here?
It seems like it shouldn't be, so this should probably be an assert.

497

Maybe a SmallSet?

520

Is this functionality we want?

522

I don't think we want a non-const version of this.

532

Should this be static? Or maybe even a free function in the implementation?

rengolin added inline comments.Jun 6 2017, 3:16 AM
lib/Transforms/Vectorize/LoopVectorize.cpp
7658

shouldn't you have some asserts here to make sure the PHI node is what you need it to be?

Same for other VPlans.

7870

I know you can re-use MinVF, but it would be more readable if you created a new variable VF to use inside the loop.

7875

This is confusing. Is this RSO *just* for the plan name?

It'd seem more appropriate to have the name be a property of the plans, not some outside property.

Ayal marked 3 inline comments as done.Jun 6 2017, 3:21 PM

(Addressing review comments; To be completed)

lib/Transforms/Vectorize/VPlan.cpp
43

Not sure how, w/o casting away const.

We can drop the non-const getEntryBasicBlock() as only its const version is currently used, but both versions of getExitBasicBlock() are used.

191

OK, sure; will do so using an Optional<Instance>.

210

Ah, we don't :-).

219

Some subsequent code dereferences getFirstInsertionPt(), in LV or in code called by LV, w/o checking if it might be the end of a block.

lib/Transforms/Vectorize/VPlan.h
115

ok, sure.

117

Yes, UID is used for printing only, including the Name. Thinking of having the Planner keep track of this ordinal.

174

(Right)

317

Not sure what the type safety concern is?

462

It is conceivable for code motion to completely empty a Region from all its blocks.

Ayal marked 3 inline comments as done.Jun 7 2017, 7:40 AM

(Addressing review comments; To be completed)

Done. Will upload updated version shortly.

lib/Transforms/Vectorize/VPlan.cpp
43

Leaving them as is.

lib/Transforms/Vectorize/VPlan.h
117

Assigning the IDs on the fly in the printer, instead.

187

It needs to be public. A method (classof) of a derived class cannot invoke a protected method on an object of the base class (not "this").

223

Added the following definition to clarify the terminology:

/// An Ancestor of a block B is any block containing B, including B itself.

So it's either an "enclosing region" or the block itself, which in turn may be a region or a basic-block.

390

Yes we can.
(May sound old-fashioned ;-)

394

This odd looking thing is part of the ilist_node_with_parent contract, used by getPrevNode() and getNextNode().

497

Initially wanted to iterate over the VFs, but that's not needed now.
Changed to SmallSet, and also replaced getVFs() with hasVF(unsigned).

520

Possibly, but not now, removed.

522

We can do w/o both versions, by only checking hasVF(unsigned).

532

Yes, this should be static. It's offloading the last part of VPlan::execute(), so good to keep adjacent.

Ayal updated this revision to Diff 101752.Jun 7 2017, 8:58 AM

(Addressing review comments; To be completed)

Done. Will upload updated version shortly.

Done.

mkuper added inline comments.Jun 9 2017, 6:21 PM
lib/Transforms/Vectorize/LoopVectorize.cpp
1825

Why?

2056

As above, please group public/private members together.

2158

A vector of unique_ptr, maybe?
Anyway, I'm fine with this as is.

2194

This seems like a somewhat odd API.
Why do we only support cutting from the top, and not from the bottom or the middle? Do we expect to only ever prune from the top? Otherwise, I'd expect the range to be represented as a set, rather than an interval, and use a filter over that set. The current state may be fine for simplicity's sake, but i'd like to understand this better.

Regardless, please rename the method. It's really surprising that a bool test...() modifies one of its arguments.

4522

Why the split between how we handle int and ptr inductions? To minimize the patch?

4566–4567

Perhaps this should now be named widenInstruction?

7486

I suppose this (and the rest of this code) is constructed this way because we intend to support more than one VPlan per VF/UF pair soon?

7603

Maybe break all of the recipe stuff out into a separate file? Two files, even? (header/implementation)

8095

Relevent -> Relevant

8103

Please add a comment somewhere explaining that the order in which we try the recipes matters. (e.g. tryToInterleave needs to come first.)
I'm not sure how we should be handling recipe priority in general, if at all, but that's not for this review.

8135

Shouldn't we be resetting LastWidenRecipe somewhere, if we encountered an instruction that uses a different recipe?

lib/Transforms/Vectorize/VPlan.h
223

I would strongly prefer less confusing terminology (I think ancestor very strongly evokes "direct or indirect predecessor"), but if I'm the only one getting confused by this, feel free to keep it.

test/Transforms/LoopVectorize/if-pred-non-void.ll
213–215

The test changes look benign, but I'm curious about why we have them.

Ayal marked 2 inline comments as done.Jun 12 2017, 2:35 AM
Ayal added inline comments.
lib/Transforms/Vectorize/LoopVectorize.cpp
1825

Good catch; isProfitableToScalarize() should be called only for VF > 1, as it compares scalarizing to vectorizing-for-VF. All original callers to isProfitableToScalarize() first check if VF > 1 before calling it. This patch adds a new caller inside tryToWiden(), which also avoids calling it with VF == 1.

Will adding an "assert(VF > 1);" instead of this "if (VF ==1) return true;".

2158

Originally D28975 did use "shared_ptr<VPlan>". However using regular pointers to VPlan seems simpler, just need to destroy them correctly at the end.

2194

Indeed a more general implementation of building VPlans could represent arbitrary sets of VF's rather than "Ranges" or intervals of VF's, which this patch uses. VPlans themselves are not confined to represent only ranges.

This implementation builds VPlans for the full {1,2,4,8,...,MaxVF} range of feasible VF's by repeatedly building a VPlan starting from a given VF up until the maximum VF possible. Each vectorization decision can potentially reduce this maximum. The two extremes we could end up with are: one VPlan for all VF's, and one VPlan for each VF. Decisions hopefully exhibit this form of continuity, but they certainly don't have to.

Will see if above should be added as a comment to buildVPlans().

Some alternative names we came up with:

testAndClampVFRange()
prefixTestVFRange()
VFRangePrefixTest()
testStartAndSetEndVF()

any one of these looks ok? Better suggestions are welcome.

4522

Yes, we're keeping ILV's current split between its widenIntOrFpInduction() and how it handles ptr inductions and other phi instructions. The former was last addressed and merged by r296145; the latter, appearing below, is simpler and remains inline.

In any case, addressing this split is independent of introducing VPlan.

4566–4567

Yup.

7486

Currently each VPlan covers a range of VF's and arbitrary UF's, where every feasible VF is covered by a single VPlan.

In the end a single VPlan should remain, which is the one that gets executed. As soon as possible all other VPlans are discharged, as done here.

7603

Yes, this large LoopVectorizer.cpp file should be broken into multiple smaller files. ILV in its current form hinders doing so with these recipes. Follow-up patches should help facilitate this desired change.

7658

The asserts are there when execute() is called, e.g., when it calls ILV's widenIntOrFpInduction().

7870

OK, sure.

7875

Yes, this is to concatenate the VF's into the plan name, which is indeed a property of the plan. Will move from buildVPlans() to the end of buildVPlan().

8103

Sure. This ordering reflects the existing logic in LV.

Note that in the future, some recipes such as Interleave Groups may be constructed as a VPlan-based transformation, instead of jointly with other recipes, relieving this specific ordering concern here.

8135

We could, as an optimization, because VPWidenRecipe's appendInstruction(Instr) currently succeeds only if Instr is its 'next' ingredient.

But it's not necessary - we can simply call appendInstruction(Instr) for the LastWidenRecipe even if other instructions have gone by (and have it fail if so). This way more relaxed forms of appendInstruction() can be supported.

OTOH, doing so would clutter each 'continue'.

lib/Transforms/Vectorize/VPlan.h
223

ok, "Ancestors" and "Predecessors" are admittedly ambiguous terms in English... will use "getEnclosingBlockWithPredecessors()" and "getEnclosingBlockWithSuccecessors()" instead, after defining an Enclosing Block of block B to be any block that contains B, including itself.

test/Transforms/LoopVectorize/if-pred-non-void.ll
213–215

These test changes are due to the order in which predicated-and-scalarized basic-blocks are created and filled.

In current trunk, when the “udiv” is generated, the “extract” feeding its nominator is also generated and placed before it, courtesy of getScalarValue(). Later the “udiv” is placed in a basic-block of its own w/o its operands. Finally sinkScalarOperands() kicks in, sinking both its operands to appear next to it.

In this patch, when the “udiv” is generated, the “extract” feeding its nominator is also generated and placed before it, courtesy of getOrCreateScalar(). But having already created a basic-block for the “udiv”, this “extract” is placed there as well, complying with LV's approach of placing extracts next to their uses. When finally sinkScalarOperands() kicks in, it has only the other operand to sink (the denominator).

This discrepancy in the sinking order swaps the final placement of the two operands.

Ayal updated this revision to Diff 102175.Jun 12 2017, 5:12 AM

Updated following review comments.

mssimpso added inline comments.Jun 15 2017, 8:57 AM
lib/Transforms/Vectorize/LoopVectorize.cpp
619–657

The interface to the vector and scalar instruction maps seems to be getting more complex and confusing with this patch. Do we actually need all of these additional functions? How do they interact with the existing ones (init* and get*)? Are the existing ones still necessary?

I think eventually we want to move this interface out of ILV and into a standalone storage class.

But it looks like one thing that changes with this patch is that the maps can be "incomplete" at a given instance (i.e., you call initVector with an Entry that has been initialized with null pointers, then later I assume use the new setters to complete the mapping). Currently, I think we only add a mapping to the storage once all the entry values are completely defined. Is there a reason this needs to change (it's fine if it does, I'm just trying to understand)? An advantage of the current approach would be that if a key exists in the map, we know that all of the entry values should be non-null (I think we've even caught some bugs this way).

getVectorValue was made to return a constant reference to enforce the idea that the entry values shouldn't be changed once added to the map (although getVector still had to exist for handling reductions, so this never really fit well, admittedly).

Sorry for the rambling comment - I think we need to better document the interface and remove the functions that are no longer necessary.

Ayal added inline comments.Jun 18 2017, 8:46 AM
lib/Transforms/Vectorize/LoopVectorize.cpp
619–657

Yes, the existing interfaces are still needed. This patch introduces the ability for scalarizeInstruction() to generate a single replica when needed for scalarized and predicated instructions, *alongside* the existing ability of generating all replica's for other scalarized instructions.

Yes, the newly added getOrCreateScalar() supports setting one replica at a time, and as such renders this mapping "incomplete" until the last replica is generated. In the case of generating scalarized and predicated instructions with VPlan, each scalar replica is generated separately as part of replicating the entire region, under the (existing) assumption that all uses of one replica are generated before generating the next replica.

An alternative interface which may be simpler and clearer is to only support setting and getting a single Value per part, or per part-and-lane. Setting a value can assert that no value has already been set; this assert can be overridden when needed using a specialized "resetting" method. This keeps the implementation of *MapStorage internal to ValueMap, albeit potentially being less efficient as it avoids batching (if not inlined). If so, it may be better done in a separate patch. Sounds reasonable?

Current interface:

ILV:
  const VectorParts &getVectorValue(Value *V)
  Value *getScalarValue(Value *V, unsigned Part, unsigned Lane)
ValueMap:
  bool hasVector(Value *Key)
  bool hasScalar(Value *Key)
  const VectorParts &initVector(Value *Key, const VectorParts &Entry)
  const ScalarParts &initScalar(Value *Key, const ScalarParts &Entry)
  VectorParts &getVector(Value *Key)

Alternative interface:

ILV:
  Value *getScalarValue(Value *V, const VPIteration &I) // Get scalarized or extract vectorized.
  Value *getVectorValue(Value *V, unsigned Part) // Get vectorized or pack/broadcast scalarized/uniform.

ValueMap:
  bool hasScalarValue(Value *V, const VPIteration &I)
  bool hasVectorValue(Value *V, unsigned Part)
  void setScalarValue(Value *V, const VPIteration &I, Value *Scalar) // Asserts value not already set.
  void setVectorValue(Value *V, unsigned Part, Value *Vector) // Asserts value not already set.
  void resetScalarValue(Value *V, const VPIteration &I, Value *Scalar) // Can assert value already set.
  void resetVectorValue(Value *V, unsigned Part, Value *Vector) // Can assert value already set.

Agreed, this mapping should eventually move out of ILV and into a standalone storage class, similar to the VPBB2IRBB mapping in TransformState.CFGState. We're trying to breakdown these and other changes into separate patches.

Sorry for the rambling comment - I think we need to better document the interface and remove the functions that are no longer necessary.

Seconded ;-)

Ayal added a comment.Jun 21 2017, 1:04 PM

An alternative interface which may be simpler and clearer ... it may be better done in a separate patch.

The alternative interface was uploaded as a separate patch D34473. It's not strictly required by this patch of VPlan, but can be committed separately if desired.

Ayal updated this revision to Diff 105036.Jul 3 2017, 1:45 AM

Patch updated to llvm trunk, adapted to the new ValueMap interface of D34473. ValueMap is extracted to a standalone struct VectorizerValueMap.

Hi Ayal,

All my concerns were addressed. @mssimpso @mkuper are you also happy with the patch?

This is a big intrusive change and we'll probably have a few fine tune patches here and there, but overall, I think this is the right direction to go.

cheers,
--renato

mssimpso edited edge metadata.Aug 2 2017, 7:48 AM

Hi Ayal,

Very sorry for the long delay. I don't have anymore comments for this patch. I agree with Renato that although it's large, we should begin iterating on it in tree (especially since we've already branched for the release at this point). But please let Michael have another look at it before committing.

Thanks!

mkuper accepted this revision.Aug 8 2017, 1:45 AM

I think the revision summary doesn't fully represent what's going on here at this point. :-)

In any case, I agree with Matt and Renato - this looks good to go, we can continue iterating on it in-tree.

lib/Transforms/Vectorize/LoopVectorize.cpp
2194

Ok, this still sounds odd, but we can iterate over this in-tree in the future.
Re naming, testAndClampVFRange() sounds better, I think, but I'm good with anything that indicates this changes the range.

7603

It seems like putting all the recipes in a separate file would be easy to start with (instead of going into an anonymous namespace here.)
If it isn't, I'm ok with doing this in follow-ups.

This revision is now accepted and ready to land.Aug 8 2017, 1:45 AM
Ayal updated this revision to Diff 111426.Aug 16 2017, 3:33 PM
Ayal edited the summary of this revision. (Show Details)

Uploading the version updated to top of trunk before committing, including merging with SinkAfter patch D33058 by reordering ingredients before constructing recipes for them.

Also changed InnerLoopVectorizer to be under llvm rather than anonymous namespace, because building with gcc 4.8.5 produced the following warning:

[1/3] Building CXX object lib/Transfor.../LLVMVectorize.dir/LoopVectorize.cpp.o
In file included from llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:50:0:
llvm/lib/Transforms/Vectorize/VPlan.h:194:8: warning: 'llvm::VPTransformState' has a field 'llvm::VPTransformState::ILV' whose type uses the anonymous namespace [enabled by default]
struct VPTransformState {

^

Caught late as building with clang 4.0 did not produce this warning, or any other.

This revision was automatically updated to reflect the committed changes.
Ayal updated this revision to Diff 111848.Aug 19 2017, 11:42 AM

Previous upload missed newly added VPlan.h and VPlan.cpp, including them here. This is the version that was committed.

Ayal updated this revision to Diff 112467.Aug 23 2017, 3:41 PM

Fix PR34248: pack a predicated scalar into a vector only when vectorizing; avoid doing so when only unrolling. Add a test derived from the reproducer of PR34248.

Hi, it seems this patch causes a functional regression. I'm adding comments to https://reviews.llvm.org/rL311849.

I would appreciate if you could help me with some hints to narrow down the root cause.