Page MenuHomePhabricator

[RFC][VPlan, SLP] Add simple SLP analysis on top of VPlan.
ClosedPublic

Authored by fhahn on Jul 18 2018, 8:50 AM.

Details

Summary

This patch adds an initial implementation of the look-ahead SLP tree
construction described in 'Look-Ahead SLP: Auto-vectorization in the Presence
of Commutative Operations, CGO 2018 by Vasileios Porpodas, Rodrigo C. O. Rocha,
Luís F. W. Góes'.

It returns an SLP tree represented as VPInstructions, with combined
instructions represented as a single, wider VPInstruction.

This initial version does not support instructions with multiple
different users (either inside or outside the SLP tree) or
non-instruction operands; it won't generate any shuffles or
insertelement instructions.

It also just adds the analysis that builds an SLP tree rooted in a set
of stores. It does not include any cost modeling or memory legality
checks. The plan is to integrate it with VPlan based cost modeling, once
available and to only apply it to operations that can be widened.

A follow-up patch will add a support for replacing instructions in a
VPlan with their SLP counter parts

Diff Detail

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
ABataev added inline comments.Jul 20 2018, 12:58 PM
lib/Transforms/Vectorize/VPlanSLP.cpp
65

llvm has special function llvm::to_vector<4>() that will allow to convert from ArrayRef to SmallVector

179–183

Do we need braces here?

195

Use llvm::None instead

303

Is it possible to replace it with emplace_back(Operands.first, Operands.second[0]);

303–309

You can preallocate the memory for Mode and FinalOrder vectors, because you know the number of elements is MultiNodeOps.size()

319

Also can preallocate the memory for the vector

335

use emplace_back()

fhahn updated this revision to Diff 156591.Jul 20 2018, 1:29 PM
fhahn updated this revision to Diff 156605.Jul 20 2018, 2:09 PM
fhahn marked 7 inline comments as done.
fhahn added inline comments.
lib/Transforms/Vectorize/VPlanSLP.cpp
65

Neat, thanks for pointing that out!

179–183

Not anymore, thanks

303

I don't think so, it fails to deduce the correct type for the second argument (with and without braces)

335

Is there are benefit by using emplace_back here? We are not constructing any objects here I think

fhahn added a comment.Aug 1 2018, 2:30 PM

Ping. Thanks for all the comments so far and the initial code review.

What do you think is the best way going forward overall: 1) evolving loop aware SLP initially in VPlan (independent of SLPVectorizer) and start using VPlan infrastructure for SLPVectorizer as it becomes available or 2) trying to share as much code between both from the start? Personally I think 1) would be better, as it allows us to evolve the VPlan parts quicker and I would like to avoid unnecessary code churn in SLPVectorizer and start using VPlan infrastructure there as it becomes clearly beneficial.

fhahn added a comment.Aug 1 2018, 2:32 PM

I'll also add some cleanup code for destroying the created instructions if the are not used in a bit.

Thanks for working on this, Florian, and sorry for my delayed response. I added some initial comments. I'll come back soon.

Ping. Thanks for all the comments so far and the initial code review.

What do you think is the best way going forward overall: 1) evolving loop aware SLP initially in VPlan (independent of SLPVectorizer) and start using VPlan infrastructure for SLPVectorizer as it becomes available or 2) trying to share as much code between both from the start? Personally I think 1) would be better, as it allows us to evolve the VPlan parts quicker and I would like to avoid unnecessary code churn in SLPVectorizer and start using VPlan infrastructure there as it becomes clearly beneficial.

I agree with you. I think it's better to have an initial end-to-end loop aware SLP PoC, properly define the representation of SLP operations, define how SLP transformation comes into play and interacts with the rest of the VPlan infrastructure, etc., before we move forward on SLPVectorizer. In the VPlan native path we have some room for experimentation. We can take advantage of that and start with SLPVectorizer once we have a solid model for SLP in VPlan.

In the long term, I think there is potential to share infrastructure with the SLPVectorizer on different levels: for example, we could potentially re-use VPlan based code generation, if the SLPVectorizer would emit VPInstructions; share the infrastructure to discover consecutive loads/stores; share different combination algorithms between VPlan and SLPVectorizer; potentially re-use part of the VPlan based cost model. One thing I want to try out is how easy/hard it would be to make the analysis work on VPInstruction and Instruction based basic blocks.

! In D49491#1166978, @rengolin wrote:
This is why we haven't gone too deep in the analysis, yet. Sharing code between SLPVec, which operates in IR, and VPlanSLP, which operates in VInstructions, can be confusing, limiting or even not possible. The analysis part mostly works, because VPInstruction is similar enough to Inst, but implementation and, worse, heuristics, can go wrong very quickly.

In terms of re-use, Value/VPValue templatization could work for some analyses. Templatizing IRBuilder/VPlanBuilder might also work for some parts of the implementation. However, we may want to think carefully. If for templatization to work we needed the whole VPValue hierarchy and internal details to be a carbon copy of Value hierarchy, we would be losing all the flexibility that VPValue/VPInstruction are bringing to VPlan. Of course, we would have to investigate this a bit more. Another alternative that I had in mind is introducing some kind of InstructionTraits. That would give us an unified interface while allowing more flexibility at VPValue/Value implementation level. Unfortunately, we haven't had have time to investigate this approach further.

lib/Transforms/Vectorize/VPlan.h
620

To be more specific, what about SLPLoad/SLPStore instead?

663

I wonder if having this interface is safe. Wouldn't it be problematic if we change the opcode of a VPInstruction object to an opcode of an operation which is represented with a VPInstruction subclass? Maybe it's better to not allow this and just create a VPInstruction using VPlanBuilder?

675

Make it protected per design principle described in getUnderlyingValue in VPlanValue.h? Same for setUnderlyingInstr.

1538

(parts) of -> (parts of)?

1543

SCEV -> VPValue?

I guess we couldn't have a single templatized version for sorted SmallVectors of any pointer, right?

1595

formatting

1606

Something to think about is the long term of this class. I wonder if it would be better to split it multiple ones, matching what we have for loop vectorization. For example, we may want to have a special VPlan (VPlanSLP) to represent the SLP planning information, a VPlanSLPBuilder, a VPlanSLPContext (if necessary)... We don't have to do it as part of this patch but it's something to keep in mind.

lib/Transforms/Vectorize/VPlanSLP.cpp
63

Would it make sense to move this to VPValue?

67

Some doc would be great. Same for some of the methods below.

211

If opcodes are the same at this point, shouldn't we keep only A or B checks but not both?
You could add assert(A->getOpcode() == B->getOpcode()) to be safe.

220

I would be great if you could elaborate a bit what 'score' is exactly.

fhahn updated this revision to Diff 159940.Aug 9 2018, 9:31 AM
fhahn marked 5 inline comments as done.

Addressed comments, thanks!

lib/Transforms/Vectorize/VPlan.h
663

Removed, it was a leftover from an earlier version.

1543

I think it is possible to provide a single implementation for SmallVectors. I've removed the implementation here and will upload a separate patch for it soon.

fhahn added a comment.Aug 22 2018, 9:43 AM

What do you think is the best way going forward overall: 1) evolving loop aware SLP initially in VPlan (independent of SLPVectorizer) and start using VPlan infrastructure for SLPVectorizer as it becomes available or 2) trying to share as much code between both from the start? Personally I think 1) would be better, as it allows us to evolve the VPlan parts quicker and I would like to avoid unnecessary code churn in SLPVectorizer and start using VPlan infrastructure there as it becomes clearly beneficial.

I agree with you. I think it's better to have an initial end-to-end loop aware SLP PoC, properly define the representation of SLP operations, define how SLP transformation comes into play and interacts with the rest of the VPlan infrastructure, etc., before we move forward on SLPVectorizer. In the VPlan native path we have some room for experimentation. We can take advantage of that and start with SLPVectorizer once we have a solid model for SLP in VPlan.

Yep! I think it would be good to get wider agreement of the direction. Is there anything I should look at/elaborate in more detail/summarize? Concerns with this approach that need to be addressed? It would be great to get some more feedback on this :)

This is why we haven't gone too deep in the analysis, yet. Sharing code between SLPVec, which operates in IR, and VPlanSLP, which operates in VInstructions, can be confusing, limiting or even not possible. The analysis part mostly works, because VPInstruction is similar enough to Inst, but implementation and, worse, heuristics, can go wrong very quickly.

In terms of re-use, Value/VPValue templatization could work for some analyses. Templatizing IRBuilder/VPlanBuilder might also work for some parts of the implementation. However, we may want to think carefully. If for templatization to work we needed the whole VPValue hierarchy and internal details to be a carbon copy of Value hierarchy, we would be losing all the flexibility that VPValue/VPInstruction are bringing to VPlan. Of course, we would have to investigate this a bit more. Another alternative that I had in mind is introducing some kind of InstructionTraits. That would give us an unified interface while allowing more flexibility at VPValue/Value implementation level. Unfortunately, we haven't had have time to investigate this approach further.

Right, I do not think we should restrict ourselves too much by coupling things together too early. I think in terms of analysis, we currently have access to def/uses, operands, opcodes in Instruction/VPInstruction and also DominatorTrees and LoopInfo for IR/VPlan, which should be enough for a set of analysis.

fhahn updated this revision to Diff 166309.Sep 20 2018, 9:05 AM

Rebased and added comments.

@RKSimon , @rengolin, @ABataev after the previous discussion, does getting this analysis in for VPlan only initially make sense to you? Or is there anything more I should investigate/write up?

fhahn added inline comments.
lib/Transforms/Vectorize/VPlan.h
1543

I've added a patch implementing DenseMap key info for smallvector: D52312

vporpo added inline comments.Sep 20 2018, 4:18 PM
lib/Transforms/Vectorize/VPlanSLP.cpp
135

Could we have other non-Store VPInstructions that could touch memory (e.g., calls) ?
Maybe it is safer to add a check whether the underlying IR instruction mayWriteToMemory().

141

I think we should also check that the operands are (i) all in the same BB, and (ii) in the same BB as the seed instructions (at least for now). A lit test for this would also be nice.

346

If I am not mistaken, since we are only allowing single user nodes in the graph, the graph is actually a tree. So this should not happen. We should have an assertion here to check this.

351

Use LLVM_DEBUG for dumpBundle()

364

Use LLVM_DEBUG for dumpBundle()

406

Hmm why no buildGraph() recursion here ?

fhahn updated this revision to Diff 166928.Sep 25 2018, 9:00 AM
fhahn marked 3 inline comments as done.

Thanks Vasileios! I hope I addressed all comments. The only follow up I need to do is the check that everything is in the same BB as the seed instructions, which I'll do tomorrow.

lib/Transforms/Vectorize/VPlanSLP.cpp
135

I've added a simple implementation for mayWriteToMemory to VPInstruction. We could just use mayWriteToMemory of the underlying instruction, but I think this would unnecessarily increase coupling of VPInstruction and IR instructions. What do you think?

141

I've added a check for (i) and I'll add a check for (ii) tomorrow. I'll think a bit more what the best way to do it would be.

I've added a unit test for (i), and will add IR tests once we VPlan SLP is hooked up to the vplan native code path.

346

Currently we check for unique users. For example, in the testSlpReuse_1 we have 2 add instruction which add the same value. In that case we should re-use the already created combined instruction I think

Hi Florian, yes I am happy with the changes, thanks!
Just please try to add an assertion that checks that the graph is a tree (see inline comment) to make sure that the graph is the one we expect.

lib/Transforms/Vectorize/VPlanSLP.cpp
135

I agree it looks better this way.

346

Ah yes, you are checking for unique users, so it should work the way you are describing. If I understand correctly, the graph can still be considered a tree since the only case when two nodes are connected by more than one path is when the user has two identical operands.

If this is the case, I still think that there should be some kind of assertion here checking that the graph is a tree and not a DAG, because SLP on a DAG is slightly different and would need a few more components to work correctly. I think that the only case where (I != BundleToCombined.end()) is true is the one you are describing: when the user instructions have identical operands. So maybe add an assertion that checks that the single users of Values have identical operands and a comment like "For now buildGraph should form a tree, not a DAG.".

fhahn updated this revision to Diff 168130.Oct 3 2018, 10:23 AM

Added assertion the ensure we only re-use nodes if the users of all values are equal and limited instructions considered to a single BB for now. I think this should now address all of @vporpo comments, thank you very much!

fhahn marked an inline comment as done.Oct 3 2018, 10:23 AM
ABataev added inline comments.Oct 31 2018, 10:52 AM
lib/Transforms/Vectorize/VPlanSLP.cpp
119

Seems to me the comment does not match the functionality

301

Try FinalOrder.emplace_back(Operands.first, Operands.second[0]);

fhahn updated this revision to Diff 172219.Nov 1 2018, 1:26 PM

Address comments, thanks!

fhahn added inline comments.Nov 1 2018, 1:29 PM
lib/Transforms/Vectorize/VPlanSLP.cpp
119

I've updated the comment and removed the code to check if all instructions are in the same BB. That's check earlier on. Does it make sense now?

301

I tried, but neither that nor emplace_back(Operands.first, {Operands.second[0])}; gets accepted unfortunately.

What about non-vectorizable loads/stores? Atomic, volaltile? Does it aware of those kind of instructions?

lib/Transforms/Vectorize/VPlanSLP.cpp
72

use try_emplace(to_vector<4>(Operands), New); instead

119

are no instructions?

fhahn updated this revision to Diff 172976.Nov 7 2018, 9:54 AM
fhahn marked an inline comment as done.

Address comments and add checks for simple loads/stores in areVectorizable().

Just one more question: how do you estimate that the vectorized code is more effective than the scalar one? What's the criterion?

lib/Transforms/Vectorize/VPlanValue.h
169

Capitalize i, must be I

fhahn updated this revision to Diff 173176.Nov 8 2018, 9:03 AM
fhahn marked an inline comment as done.
fhahn added a subscriber: sguggill.

Just one more question: how do you estimate that the vectorized code is more effective than the scalar one? What's the criterion?

This patch as a first step only adds support for the transform without cost modelling. This transform is intended as a VPlan2VPlan transformation and the general idea is to create multiple VPlans (say one with all the SLP opportunities applied and one with none) and evaluate the cost of the resulting VPlans at the end to choose the best one.

Together with @sguggill , we gave a brief overview of VP2VP transforms at the dev meeting (http://llvm.org/devmtg/2018-10/talk-abstracts.html#talk21) and this is something I want to write up a bit better in the actual VPlan design documentation.

I have no more comments.

fhahn added a comment.Nov 9 2018, 6:58 AM

I have no more comments.

Great, thanks for all the comments!

@RKSimon @rengolin @ABataev are you happy with the direction of getting this in as VPlan-only transform initially?

From the VPlan point of view, LGTM. I don't have any other comments.

Thanks, Florian!
Diego

rengolin accepted this revision.Nov 12 2018, 12:53 PM

Just had a look again and it's looking great, thanks Florian!

The unit tests help a lot in understanding what's the gist of the transformation and I really like where this is going.

I think time-wise, @RKSimon's concerns were addressed (ie. we're not going to switch to this any time soon, and certainly not without making sure all his work is kept/moved in).

Given that most people are happy with it, and there are no objections, I'll go on and approve it.

Thanks everyone!

This revision is now accepted and ready to land.Nov 12 2018, 12:53 PM

Thanks, everyone! Important first step towards the great converged Loop+SLP vectorizer.

I think time-wise, @RKSimon's concerns were addressed (ie. we're not going to switch to this any time soon, and certainly not without making sure all his work is kept/moved in).

Yup, I have no objections - cheers!

fhahn updated this revision to Diff 173830.Nov 13 2018, 3:45 AM

Thanks for all the review!

I've pulled in the DenseMapInfo changes from D52312 into this patch again, as making it generic requires some additional work. I will open an issue to follow up on this.

I'll commit this patch once D49489 gets accepted and committed.

fhahn closed this revision.Nov 14 2018, 8:14 AM

Committed as rL346857, thanks for all the comments! I will create a few tickets for follow up work.