This is an archive of the discontinued LLVM Phabricator instance.

Paths

Table of Contentst

-
lib/Transforms/Vectorize/
-
Transforms/
-
Vectorize/
-
CMakeLists.txt
-
VPlan.h
-
VPlanHCFGTransforms.h
3/20
VPlanHCFGTransforms.cpp
-
unittests/Transforms/Vectorize/
-
Transforms/
-
Vectorize/
-
CMakeLists.txt
1/3
VPlanHCFGTest.cpp

Differential D46827

[VPlan] Add VPInstruction to VPRecipe transformation.
ClosedPublic

Authored by fhahn on May 14 2018, 4:03 AM.

Download Raw Diff

Details

Reviewers

dcaballe
hsaito
mssimpso
hfinkel
rengolin
mkuper
javed.absar
sguggill

Commits

rG3385caaafd2c: [VPlan] Add VPInstruction to VPRecipe transformation.
rL334969: [VPlan] Add VPInstruction to VPRecipe transformation.

Summary

This patch introduces a VPInstructionToVPRecipe transformation, which
allows us to generate code for a VPInstruction based VPlan re-using the
existing infrastructure.

Diff Detail

Event Timeline

fhahn created this revision.May 14 2018, 4:03 AM

Herald added a reviewer: javed.absar. · View Herald TranscriptMay 14 2018, 4:03 AM

Herald added a subscriber: bollu. · View Herald Transcript

fhahn added a parent revision: D46826: [VPlan] Add VPlan based sinkInstructions utility..May 14 2018, 4:03 AM

fhahn added a parent revision: D44338: [LV][VPlan] Build plain CFG with simple VPInstructions for outer loops..May 14 2018, 7:27 AM

a.elovikov added a subscriber: a.elovikov.May 14 2018, 7:36 AM

fhahn mentioned this in D44338: [LV][VPlan] Build plain CFG with simple VPInstructions for outer loops..May 14 2018, 7:43 AM

Only add VPValues for branch conditions once for edge masks.

Hi Florian. Thanks for the patch!

This patch just moves things around and makes things clearer, but it modifies
the inner loop vectorizer. I think we should try to use as many parts of VPlan
for inner loop vectorization as well, to make sure we get as much
testing early on. But I am not entirely sure if it would be better to have
a completely separate inner loop vectorization path in the VPlan native path, to
eliminate the risk of breaking things in the inner loop vectorizers.
What do you think?

I agree with you but I think it's too soon to enable some of this code in the inner loop vectorizer. I would wait at least until we have codegen support for outer loops so that we can do a more proper testing before we use it for production. In addition, another big concern is that introducing this code in the inner loop vectorizer too soon will limit the current development flexibility that we have in the VPlan native path. For those reasons, I would prefer this patch to just do the VPInstruction2VPRecipe transformation for the VPlan native path and leave the inner loop vectorizer changes for later. Does it make sense to you?

lib/Transforms/Vectorize/LoopVectorizationPlanner.h
363 ↗	(On Diff #146616)	I think we should try to find a better place for this. The planner shouldn't be responsible for VPlan-to-VPlan transformations but just for doing the planning (orchestrating them). Otherwise, it will turn into a big brown bag of unrelated stuff. Since this is a small and temporal thing and it's kind of related to code generation, maybe we could include it in ILV class as a "prepare" CG step? Another option would be to create a specific class for it. We could rename VPlanHCFGBuilder.h/.cpp -> VPlanHCFGTransforms.h/.cpp and place there all the "small" VPlan-to-VPlan transformations.

In D46827#1098894, @dcaballe wrote:

I agree with you but I think it's too soon to enable some of this code in the inner loop vectorizer. I would wait at least until we have codegen support for outer loops so that we can do a more proper testing before we use it for production. In addition, another big concern is that introducing this code in the inner loop vectorizer too soon will limit the current development flexibility that we have in the VPlan native path. For those reasons, I would prefer this patch to just do the VPInstruction2VPRecipe transformation for the VPlan native path and leave the inner loop vectorizer changes for later. Does it make sense to you?

Yep, I think it would be better to move it to the VPlan native path, to have more flexibility when it comes to making changes.

The only problem is that the recipe generation depends on the cost model and we probably have to duplicate some code from the inner loop vectorizer in the VPlan native path, until we gradually replace them by VPlan implementations. I think that would give us a stable starting point (the VPlan native path should pass all existing inner loop vectorizer tests) and would help us to work towards VPlan native inner loop vectorization at a steady pace. Does this make sense?

lib/Transforms/Vectorize/LoopVectorizationPlanner.h
363 ↗	(On Diff #146616)	Agreed I just put it there because I was not sure where to best put it. I will move it to VPlanHCFGTransforms.

In D46827#1099761, @fhahn wrote:

The only problem is that the recipe generation depends on the cost model and we probably have to duplicate some code from the inner loop vectorizer in the VPlan native path, until we gradually replace them by VPlan implementations. I think that would give us a stable starting point (the VPlan native path should pass all existing inner loop vectorizer tests) and would help us to work towards VPlan native inner loop vectorization at a steady pace. Does this make sense?

What is the main dependence with cost model? Maybe @hsaito could have it into account since he is looking at how to refactor the current cost model.
In any case, I don't think we have to replicate too much code if we start with something simple, even if we generate inefficient code at the beginning. For example, I don't think we initially need the DeadInstructions code, anything related to masking or uniformity (since only uniform branches are allowed in the native path for now) or interleave accesses. That should simplify a lot the VPInstruction2VPRecipe step.

In D46827#1099973, @dcaballe wrote:

In D46827#1099761, @fhahn wrote:

The only problem is that the recipe generation depends on the cost model and we probably have to duplicate some code from the inner loop vectorizer in the VPlan native path, until we gradually replace them by VPlan implementations. I think that would give us a stable starting point (the VPlan native path should pass all existing inner loop vectorizer tests) and would help us to work towards VPlan native inner loop vectorization at a steady pace. Does this make sense?

What is the main dependence with cost model? Maybe @hsaito could have it into account since he is looking at how to refactor the current cost model.
In any case, I don't think we have to replicate too much code if we start with something simple, even if we generate inefficient code at the beginning. For example, I don't think we initially need the DeadInstructions code, anything related to masking or uniformity (since only uniform branches are allowed in the native path for now) or interleave accesses. That should simplify a lot the VPInstruction2VPRecipe step.

I'm working on CostModel starting from the last part, i.e., walk HCFG and VPRecipes/VPInstructions instead of BBs and Instructions, to compute the cost (and going backwards). There are bunch of decision processes, e.g., memref should be vectorized or serialized, that also need to be "converted" before the entire CostModel becomes VPlan ready ---- and the last part certainly depends on those decisions. I plan to create IR versions (which is existing code) and VPlan versions (new code), make sure we can compare side by side, become comfortable to switch to VPlan versions, and then finally retire the IR versions.

dcaballe mentioned this in D46825: [VPlan] Add moveAfter to VPRecipeBase..May 15 2018, 7:56 PM

dcaballe mentioned this in D46826: [VPlan] Add VPlan based sinkInstructions utility..May 15 2018, 8:09 PM

rkruppe added a subscriber: rkruppe.May 17 2018, 7:43 AM

I'll look into restructuring this soon.

I've tried restructuring the patch as suggested now and would appreciate your feedback on a few things before I update the patch:

Location of transformVPInstructionsToVPRecipies: I agree with Diego that LoopVectorizationPlanner is not the ideal place. But it needs access to the tryTo* functions to create VPRecipes, which unfortunately are defined in LoopVectorizationPlanner. I think there are 3 options: 1) keep transformVPInstructionsToVPRecipies in LoopVectorizationPlanner, 2) move the tryTo* functions to a better place (something like VPRecipeBuilder) or 3) pass a LoopVectorizationPlanner to the VPInstructionToVPRecipe transform. I think we should go for 1), if there is a concrete plan to get rid of VPRecipes in the near future or 2) if we need to keep the VPRecipes around for the time being (at least for the non-VPlan native path)

Move VPInstructionToVPRecipe to the VPlan native path: should be fairly straight forward, we already instantiate the legacy cost model in the VPlan native path, all we need to do is to allow inner loops in the VPlan native path, use the cost model to get the MaxVF and hook up VPlan execution, in case we decide to vectorize. We can migrate all those parts to VPlan infrastructure separately. Does that make sense?

Simplify buildVPlans() in the non VPlan native path: if we are pushing development of the VPlan based inner loop vectorization to the VPlan native path, we can avoid creating VPlans for each valid vectorization factor, as we throw them away without using them.

Hi Florian!

Location of transformVPInstructionsToVPRecipies: I agree with Diego that LoopVectorizationPlanner is not the ideal place. But it needs access to the tryTo* functions to create VPRecipes, which unfortunately are defined in LoopVectorizationPlanner. I think there are 3 options: 1) keep transformVPInstructionsToVPRecipies in LoopVectorizationPlanner, 2) move the tryTo* functions to a better place (something like VPRecipeBuilder) or 3) pass a LoopVectorizationPlanner to the VPInstructionToVPRecipe transform. I think we should go for 1), if there is a concrete plan to get rid of VPRecipes in the near future or 2) if we need to keep the VPRecipes around for the time being (at least for the non-VPlan native path)

Good suggestions! I would prefer #2 if it's feasible and it's not too much work. Moving the creation of the recipes to a different class sounds good to me since we'd prevent other developers to add similar code to the planner. VPRecipeBuilder sounds good to me. I also thought we could add them to VPBuilder since recipes are currently part of the IR representation, but I think it's not a good idea since these interfaces are doing much more than just creating a new recipe.

Move VPInstructionToVPRecipe to the VPlan native path: should be fairly straight forward, we already instantiate the legacy cost model in the VPlan native path, all we need to do is to allow inner loops in the VPlan native path, use the cost model to get the MaxVF and hook up VPlan execution, in case we decide to vectorize. We can migrate all those parts to VPlan infrastructure separately. Does that make sense?

My only concern is how that is going to work for outer loops. We need VPInstructionToVPRecipe working for them and, IMO, we shouldn't create another fork inside the VPlan native path to treat outer and inner loops differently. Reducing the number of recipes used in this step would also help us replace them quickly with VPInstructions in the VPlan native path. For those reasons, I suggested that we should start with a very simple VPInstructionToVPRecipe step that only created basic recipes (i.e., VPWiden*) for now, and leave interleaving and other optimizations for later, at least until we can make them work for outer loops. Do you see any problem with this approach? Is there something specific that you need for your follow-up work? We can think about how to accommodate it.

Simplify buildVPlans() in the non VPlan native path: if we are pushing development of the VPlan based inner loop vectorization to the VPlan native path, we can avoid creating VPlans for each valid vectorization factor, as we throw them away without using them.

I'm not sure I understand this point. In the non VPlan native path we are evaluating the cost for all VFs and choosing the best one. What do you want to change exactly?

Thanks!
Diego

Thanks Diego, I'll get the patches ready ASAP.

I'm not sure I understand this point. In the non VPlan native path we are evaluating the cost for all VFs and choosing the best one. What do you want to change exactly?

I've created D47477 to illustrate what I meant.

fhahn mentioned this in D47477: [VPlan] Move recipe based VPlan generation to separate function..May 29 2018, 2:41 PM

Update the patch to move inner loop vectorization using the VPInstr2VPRecipe transformation in the Vplan native path. I've updated the description of the revision with more details

Herald added a subscriber: tschuett. · View Herald TranscriptMay 31 2018, 10:06 AM

fhahn added a parent revision: D47595: [VPlan] Move recipe construction to VPRecipeBuilder..May 31 2018, 10:07 AM

My only concern is how that is going to work for outer loops. We need VPInstructionToVPRecipe working for them and, IMO, we shouldn't create another fork inside the VPlan native path to treat outer and inner loops differently. Reducing the number of recipes used in this step would also help us replace them quickly with VPInstructions in the VPlan native path. For those reasons, I suggested that we should start with a very simple VPInstructionToVPRecipe step that only created basic recipes (i.e., VPWiden*) for now, and leave interleaving and other optimizations for later, at least until we can make them work for outer loops. Do you see any problem with this approach? Is there something specific that you need for your follow-up work? We can think about how to accommodate it.

I played a around a bit with generating code for outer loops. I think there are a few things we need to do:

collect some required information about phis during the legality checks,
set which instructions to widen/scalarize in the outer/inner loops for the user provided VF, - that's all VPInstructionToVPRecipe needs for most instructions
handle inner loop PHI nodes and control flow between outer & inner loop.

sguggill added a subscriber: sguggill.Jun 4 2018, 9:47 AM

Thanks for this new version, Florian! Pretty excited to see that some inner loop tests are passing!
Satish is working on the support for outer loops in CG and we’ve been discussing your changes. We have to make sure that this patch aligns well with his next patch and also with the constraints that we stablished for this patch series and subsequent ones. Some comments in this regard:

We think that VPInstructionsToVPRecipes is still too smart for the vplan native path and bringing too much code (from CM and Legal, for example) an recipes that we would prefer to keep away from the native path until they are properly ported. Otherwise, we’d end up having the same development limitations as we have in the inner loop path. What we had in main for the Recipe to VPInstruction transition in the native path is something as simple as the code in https://reviews.llvm.org/D44338?vs=on&id=140081&whitespace=ignore-most, where createRecipesForVPBB is naively creating only VPWidenMemoryInstructionRecipe, VPWidenRecipe and VPWidenPHIRecipe. Satish will give you more details in this regard so that this patch aligns with his.

An example of #1 is createBlockInMask and createEdgeMask, which are currently invoked from tryToWidenMemory. In the VPlan native path we are only supporting uniform control flow so no masking is necessary at this point. For divergence control flow we plan to introduce a VPlan based predication algorithm in patch series #2 which will conflict with code.

If you agree with the suggestion in #1, I think it would be better to introduce the inner loop support in the VPlan native path in a separate patch. In that way, we would introduce: a) VPInstructionToVPRecipe, b) CG support (Satish) and, finally, c) Basic inner loop support, i.e., same constraints as for outer loops, same naïve approach as in a) and b) and making sure that support for both inner and outer loops are well aligned.

Of course, we also need to work to bring whatever is necessary for SLP-aware loop vectorization. We can have this discussion offline to better understand what would be missing in the VPlan native path.

Please, let me know what you think!
Satish will follow up with more information regarding the CG patch.

Thanks,
Diego

lib/Transforms/Vectorize/LoopVectorize.cpp
6300 ↗	(On Diff #149312)	Maybe it would be useful to keep this debug message for the remaining inner loops that don't hit the previous condition?

In D46827#1121357, @dcaballe wrote:

Thanks for this new version, Florian! Pretty excited to see that some inner loop tests are passing!
Satish is working on the support for outer loops in CG and we’ve been discussing your changes. We have to make sure that this patch aligns well with his next patch and also with the constraints that we stablished for this patch series and subsequent ones. Some comments in this regard:

We think that VPInstructionsToVPRecipes is still too smart for the vplan native path and bringing too much code (from CM and Legal, for example) an recipes that we would prefer to keep away from the native path until they are properly ported. Otherwise, we’d end up having the same development limitations as we have in the inner loop path. What we had in main for the Recipe to VPInstruction transition in the native path is something as simple as the code in https://reviews.llvm.org/D44338?vs=on&id=140081&whitespace=ignore-most, where createRecipesForVPBB is naively creating only VPWidenMemoryInstructionRecipe, VPWidenRecipe and VPWidenPHIRecipe. Satish will give you more details in this regard so that this patch aligns with his.

Sure, we can start with just creating the basic recipes. Should we only support widening for a start and not relying on the cost model at all? I suppose that would make things easier for outer loop code gen.

An example of #1 is createBlockInMask and createEdgeMask, which are currently invoked from tryToWidenMemory. In the VPlan native path we are only supporting uniform control flow so no masking is necessary at this point. For divergence control flow we plan to introduce a VPlan based predication algorithm in patch series #2 which will conflict with code.

If you agree with the suggestion in #1, I think it would be better to introduce the inner loop support in the VPlan native path in a separate patch. In that way, we would introduce: a) VPInstructionToVPRecipe, b) CG support (Satish) and, finally, c) Basic inner loop support, i.e., same constraints as for outer loops, same naïve approach as in a) and b) and making sure that support for both inner and outer loops are well aligned.

Yep that sounds good. I would prefer not to land this change as dead code and if inner loop support is not pending on CG for outer loops, it would be great if we could get it in independently from that. Of course make sure it fits well with CG for outer loops.

Of course, we also need to work to bring whatever is necessary for SLP-aware loop vectorization. We can have this discussion offline to better understand what would be missing in the VPlan native path.

I think to start with, the limited set of recipes is more than enough. Once we have inner loop support I shared an updated set of patches for SLP.

Thanks for your comments! Please let me know what you require for outer loop CG and when you are planning on sharing those patches, so I do not end up doing unnecessary work :)

Sure, we can start with just creating the basic recipes. Should we only support widening for a start and not relying on the cost model at all? I suppose that would make things easier for outer loop code gen.

Hi Florian,

For the initial outer loop vector code generation, we were planning to only support widening and not rely on the cost model at all. Where the current code generation relies on results of cost modeling, we were planning to put changes in place to return a conservative result for the VPlanNativePath. Given the current constraints for the supported outer loops in the VPlanNativePath, this is what we had in mind specifically:

generate gather/scatter for loads/stores until we have proper divergence analysis in place. We may be able to infer unit-stride information using current SCEV based analysis. However, for the initial implementation we can generate gathers/scatters by returning CM_GatherScatter in getWideningDecision for outer loops in VPlanNativePath.

widen non-induction PHIs. The initial implementation will run some part of legality checks to recognize inductions as you pointed out earlier. PHIs that are not in the outer loop header will be widened. Due to back edges from inner loops we may need to generate place-holder PHIs to begin with and fixup such PHIs at the end of code generation. This is due to the fact that vector values corresponding to all the phi-operands may not be available when widening the PHI instruction. This will be similar to the way reductions/recurrences are currently handled.

We will need to either guarantee that the phi operand order matches the parent BBlock successor order or have a mapping between scalar BBlocks and the corresponding vector BBlock.

treat all branch instructions as uniform. During vector CG, the branch operands need to be replaced with the vector equivalents. We will need an approach similar to handling of PHIs i.e. fixup branches and control flow at the end of vector CG.

all other instructions will simply be widened using vector values or generating broadcasts(for loop invariants)

Initially, we will need at the least - widenMemory, widen, and widenPhi recipes. Branch instruction can be part of a widenRecipe and CG needs to handle it appropriately for the VPlanNativePath. If that is not acceptable, we can think about having a new recipe to handle branch instructions.

Please let me know if this matches with what you had in mind. We will try to send you a patch by the end of this week so that you can see what we have.

Satish

In D46827#1121496, @sguggill wrote:

Please let me know if this matches with what you had in mind. We will try to send you a patch by the end of this week so that you can see what we have.

Yep that sounds good. It sounds like I could update this patch to use a simplified recipe creation as Diego mentioned?

In D46827#1121624, @fhahn wrote:

Yep that sounds good. It sounds like I could update this patch to use a simplified recipe creation as Diego mentioned?

I think that makes sense.

Updated to not use VPRecipeBuilder for recipe construction. It now relies relies on the original VPlan to be vectorizable without masking/interleaving. If that makes sense, I'll move the code enabling the CG to a separate patch and add a unit test for the transformation.

fhahn mentioned this in D47595: [VPlan] Move recipe construction to VPRecipeBuilder..Jun 6 2018, 10:26 AM

In D46827#1123771, @fhahn wrote:

Updated to not use VPRecipeBuilder for recipe construction. It now relies relies on the original VPlan to be vectorizable without masking/interleaving. If that makes sense, I'll move the code enabling the CG to a separate patch and add a unit test for the transformation.

Thanks, Florian! Much better now! Some comments below.

If that makes sense, I'll move the code enabling the CG to a separate patch and add a unit test for the transformation.

Please go ahead!

Let's see how this patch aligns with Satish's patch when it's ready.

Thanks,
Diego.

lib/Transforms/Vectorize/LoopVectorize.cpp
6395 ↗	(On Diff #150156)	I think this might not be needed at this point?
6398 ↗	(On Diff #150156)	unused?
lib/Transforms/Vectorize/VPlanHCFGTransforms.cpp
60	Since this transformation happens just before the execution of VPlan, I think it would be better to just modify the existing VPlan (i.e., remove each VPInstruction and add the corresponding recipes) instead of creating a new one. In that way, this code would be even simpler. Does it make sense to you or did you decide to create a new one for some reason?
74	We should always have a TopRegion. dyn_cast -> cast?
80	Please, remove TopRegion check.
95	Shoudn't we add DbgInfoIntrinsic to DeadInstructions then?

fhahn mentioned this in D47942: [SmallSet] Add SmallSetIterator..Jun 8 2018, 6:16 AM

fhahn added a parent revision: D47942: [SmallSet] Add SmallSetIterator..

craig.topper added a subscriber: craig.topper.Jun 8 2018, 8:58 AM

craig.topper added inline comments.

lib/Transforms/Vectorize/VPlanHCFGTransforms.cpp
134	Is the iteration order guaranteed here when the set is small? It's been a while since I've look at how SmallSet manages the vector.

fhahn added inline comments.Jun 8 2018, 9:01 AM

lib/Transforms/Vectorize/VPlanHCFGTransforms.cpp
134	I've just put up a patch adding an iterator for SmallSet: D47942 If the set is small, it just uses a SmallVector, otherwise it uses std::set, which should also give a deterministic iteration order.

craig.topper added inline comments.Jun 8 2018, 9:09 AM

lib/Transforms/Vectorize/VPlanHCFGTransforms.cpp
134	I guess deterministic wasn't the right word. It looks like when the set is small this will print out in the order that the VFs were inserted into the set not a numeric order? Given that this is going into a string is that a good idea?

fhahn added inline comments.Jun 8 2018, 9:11 AM

lib/Transforms/Vectorize/VPlanHCFGTransforms.cpp
134	Yeah, I guess we would need to sort the VFs here. But I'll update the patch to transform the original VPlan in place, so this code should go away.

craig.topper added inline comments.Jun 8 2018, 9:18 AM

lib/Transforms/Vectorize/VPlanHCFGTransforms.cpp
134	What's the range of VFs that you expect to use? Could you use a BitVector or SparseBitVector instead?

Updated to modify the original plan in place, and removed inner loop vectorization code path from VPlan native path for now.

Herald added subscribers: rogfer01, mgorny. · View Herald TranscriptJun 12 2018, 8:37 AM

fhahn added inline comments.Jun 12 2018, 8:39 AM

lib/Transforms/Vectorize/VPlanHCFGTransforms.cpp
80	Done. I think we still need a check to not create recipes for the pre-header and exit blocks?

Fix typo.

fhahn edited parent revisions, added: D48080: [VPlanRecipeBase] Add insertBefore helper., D48081: [VPlanRecipeBase] Add eraseFromParent().; removed: D47942: [SmallSet] Add SmallSetIterator., D47595: [VPlan] Move recipe construction to VPRecipeBuilder., D44338: [LV][VPlan] Build plain CFG with simple VPInstructions for outer loops., D46826: [VPlan] Add VPlan based sinkInstructions utility..Jun 12 2018, 8:44 AM

dcaballe added a reviewer: sguggill.Jun 13 2018, 4:25 PM

Thanks for the rework and for your patience, Florian!
I don't see any major issues. Please, wait for Satish's approval, just in case he has any comments regarding how this would interact with his next patch.
In any case, we could always address any issues in a separate commit.

Diego

lib/Transforms/Vectorize/VPlanHCFGTransforms.cpp
76	Shoudn't we collect DbgInfoIntrinsic as a DeadInstructions instead of having this check here?
80	Yes, you can do that for now. I plan to create clean/empty PH and Exit during construction so that we can safely vectorize or process whatever we add to them other VPlan-to-VPlan transformations.
83	remove { } for single line ifs and elses?
unittests/Transforms/Vectorize/VPlanHCFGTest.cpp
39	Thanks a lot for these tests!

Thanks Diego! I'll address the remaining comments after Satish had a look, so I can address any additional comments in one go

sguggill added inline comments.Jun 14 2018, 6:03 PM

lib/Transforms/Vectorize/VPlanHCFGTransforms.cpp
76	An assert(inst) would be nice here.
94	I think we are diverging from the nonVPlanNativePath case, where a widenRecipe can have > 1 ingredients but this should be fine.
unittests/Transforms/Vectorize/VPlanHCFGTest.cpp
144	VPInstructionsToVPRecipies -> VPInstructionsToVPRecipes I noticed this while trying to test my outerloop vector code generation changes along with your latest changes.

fhahn added subscribers: sdesmalen, huntergr.Jun 15 2018, 3:41 AM

Thanks for having a look. The comments should be addressed now.

dcaballe added inline comments.Jun 15 2018, 9:11 AM

lib/Transforms/Vectorize/VPlanHCFGTransforms.cpp

You mean in line 95? I see. We shouldn't create unnecessary recipes if we can reuse an existing one. Could we do something similar to the code at the end of 'tryToWiden'?:

// Success: widen this instruction. We optimize the common case where
// consecutive instructions can be represented by a single recipe.
if (!VPBB->empty()) {
  VPWidenRecipe *LastWidenRecipe = dyn_cast<VPWidenRecipe>(&VPBB->back());
  if (LastWidenRecipe && LastWidenRecipe->appendInstruction(I))
    return true;
}

VPBB->appendRecipe(new VPWidenRecipe(I));

Thanks, updated the patch to combine widened instructions in single recipe, if last recipe was a VPWidenRecipe.

lib/Transforms/Vectorize/VPlanHCFGTransforms.cpp
76	Switched to cast<>
94	Yep, this code creates a new WidenRecipe for each widened instruction because it is slightly more work to get the previous recipe, because we remove the current instruction. I can change it though if you think that's better.
unittests/Transforms/Vectorize/VPlanHCFGTest.cpp
144	Ah yes, sorry about that. This instance slipped through when I fixed the typo.

The changes look good to me. I have also verified that my outer loop vector code generation changes work with these changes after some minor modifications.

lib/Transforms/Vectorize/VPlanHCFGTransforms.cpp
76	OK.
94	This looks good. Thanks!

Thanks a lot, Florian! LGTM!

This revision is now accepted and ready to land.Jun 15 2018, 11:19 AM

Great thanks for all the comments! I'll commit it early next week, unless there are any more comments

Rebased. Thanks for all the feedback. I'll commit it in a bit

Closed by commit rL334969: [VPlan] Add VPInstruction to VPRecipe transformation. (authored by fhahn). · Explain WhyJun 18 2018, 11:33 AM

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path

Size

lib/

Transforms/

Vectorize/

CMakeLists.txt

1 line

VPlan.h

2 lines

VPlanHCFGTransforms.h

36 lines

VPlanHCFGTransforms.cpp

73 lines

unittests/

Transforms/

Vectorize/

CMakeLists.txt

2 lines

VPlanHCFGTest.cpp

178 lines

Diff 151738

lib/Transforms/Vectorize/CMakeLists.txt

	add_llvm_library(LLVMVectorize			add_llvm_library(LLVMVectorize
	LoadStoreVectorizer.cpp			LoadStoreVectorizer.cpp
	LoopVectorizationLegality.cpp			LoopVectorizationLegality.cpp
	LoopVectorize.cpp			LoopVectorize.cpp
	SLPVectorizer.cpp			SLPVectorizer.cpp
	Vectorize.cpp			Vectorize.cpp
	VPlan.cpp			VPlan.cpp
	VPlanHCFGBuilder.cpp			VPlanHCFGBuilder.cpp
				VPlanHCFGTransforms.cpp
	VPlanVerifier.cpp			VPlanVerifier.cpp

	ADDITIONAL_HEADER_DIRS			ADDITIONAL_HEADER_DIRS
	${LLVM_MAIN_INCLUDE_DIR}/llvm/Transforms			${LLVM_MAIN_INCLUDE_DIR}/llvm/Transforms

	DEPENDS			DEPENDS
	intrinsics_gen			intrinsics_gen
	)			)

lib/Transforms/Vectorize/VPlan.h

Show First 20 Lines • Show All 562 Lines • ▼ Show 20 Lines	public:
iplist<VPRecipeBase>::iterator eraseFromParent();		iplist<VPRecipeBase>::iterator eraseFromParent();
};		};

/// This is a concrete Recipe that models a single VPlan-level instruction.		/// This is a concrete Recipe that models a single VPlan-level instruction.
/// While as any Recipe it may generate a sequence of IR instructions when		/// While as any Recipe it may generate a sequence of IR instructions when
/// executed, these instructions would always form a single-def expression as		/// executed, these instructions would always form a single-def expression as
/// the VPInstruction is also a single def-use vertex.		/// the VPInstruction is also a single def-use vertex.
class VPInstruction : public VPUser, public VPRecipeBase {		class VPInstruction : public VPUser, public VPRecipeBase {
		friend class VPlanHCFGTransforms;

public:		public:
/// VPlan opcodes, extending LLVM IR with idiomatics instructions.		/// VPlan opcodes, extending LLVM IR with idiomatics instructions.
enum { Not = Instruction::OtherOpsEnd + 1 };		enum { Not = Instruction::OtherOpsEnd + 1 };

private:		private:
typedef unsigned char OpcodeTy;		typedef unsigned char OpcodeTy;
OpcodeTy Opcode;		OpcodeTy Opcode;

▲ Show 20 Lines • Show All 745 Lines • Show Last 20 Lines

lib/Transforms/Vectorize/VPlanHCFGTransforms.h

This file was added.

				//===- VPlanHCFGTransforms.h - Utility VPlan to VPlan transforms ----------===//
				//
				// The LLVM Compiler Infrastructure
				//
				// This file is distributed under the University of Illinois Open Source
				// License. See LICENSE.TXT for details.
				//
				//===----------------------------------------------------------------------===//
				///
				/// \file
				/// This file provides utility VPlan to VPlan transformations.
				//===----------------------------------------------------------------------===//

				#ifndef LLVM_TRANSFORMS_VECTORIZE_VPLANHCFGTRANSFORMS_H
				#define LLVM_TRANSFORMS_VECTORIZE_VPLANHCFGTRANSFORMS_H

				#include "VPlan.h"
				#include "llvm/IR/Instruction.h"
				#include "llvm/Transforms/Vectorize/LoopVectorizationLegality.h"

				namespace llvm {

				class VPlanHCFGTransforms {

				public:
				/// Replaces the VPInstructions in \p Plan with corresponding
				/// widen recipes.
				static void VPInstructionsToVPRecipes(
				VPlanPtr &Plan,
				LoopVectorizationLegality::InductionList *Inductions,
				SmallPtrSetImpl<Instruction *> &DeadInstructions);
				};

				} // namespace llvm

				#endif // LLVM_TRANSFORMS_VECTORIZE_VPLANHCFGTRANSFORMS_H

lib/Transforms/Vectorize/VPlanHCFGTransforms.cpp

This file was added.

				//===-- VPlanHCFGTransforms.cpp - Utility VPlan to VPlan transforms -------===//
				//
				// The LLVM Compiler Infrastructure
				//
				// This file is distributed under the University of Illinois Open Source
				// License. See LICENSE.TXT for details.
				//
				//===----------------------------------------------------------------------===//
				///
				/// \file
				/// This file implements a set of utility VPlan to VPlan transformations.
				///
				//===----------------------------------------------------------------------===//

				#include "VPlanHCFGTransforms.h"
				#include "llvm/ADT/PostOrderIterator.h"

				using namespace llvm;

				void VPlanHCFGTransforms::VPInstructionsToVPRecipes(
				VPlanPtr &Plan,
				LoopVectorizationLegality::InductionList *Inductions,
				SmallPtrSetImpl<Instruction *> &DeadInstructions) {

				VPRegionBlock *TopRegion = dyn_cast<VPRegionBlock>(Plan->getEntry());
				ReversePostOrderTraversal<VPBlockBase *> RPOT(TopRegion->getEntry());
				for (VPBlockBase *Base : RPOT) {
				// Do not widen instructions in pre-header and exit blocks.
				if (Base->getNumPredecessors() == 0 \|\| Base->getNumSuccessors() == 0)
				continue;

				VPBasicBlock *VPBB = Base->getEntryBasicBlock();
				VPRecipeBase *LastRecipe = nullptr;
				// Introduce each ingredient into VPlan.
				for (auto I = VPBB->begin(), E = VPBB->end(); I != E;) {
				VPRecipeBase Ingredient = &I++;
				// Can only handle VPInstructions.
				VPInstruction *VPInst = cast<VPInstruction>(Ingredient);
				Instruction *Inst = cast<Instruction>(VPInst->getUnderlyingValue());
				if (DeadInstructions.count(Inst)) {
				Ingredient->eraseFromParent();
				continue;
				}

				VPRecipeBase *NewRecipe = nullptr;
				// Create VPWidenMemoryInstructionRecipe for loads and stores.
				if (isa<LoadInst>(Inst) \|\| isa<StoreInst>(Inst))
				NewRecipe = new VPWidenMemoryInstructionRecipe(Inst, nullptr /Mask*/);
				else if (PHINode *Phi = dyn_cast<PHINode>(Inst)) {
				InductionDescriptor II = Inductions->lookup(Phi);
				if (II.getKind() == InductionDescriptor::IK_IntInduction \|\|
				II.getKind() == InductionDescriptor::IK_FpInduction) {
				NewRecipe = new VPWidenIntOrFpInductionRecipe(Phi);
				} else
				NewRecipe = new VPWidenPHIRecipe(Phi);
				} else {
				// If the last recipe is a VPWidenRecipe, add Inst to it instead of
				// creating a new recipe.
				if (VPWidenRecipe *WidenRecipe =
				dyn_cast_or_null<VPWidenRecipe>(LastRecipe)) {
				dcaballeUnsubmitted Not Done Reply Inline Actions Since this transformation happens just before the execution of VPlan, I think it would be better to just modify the existing VPlan (i.e., remove each VPInstruction and add the corresponding recipes) instead of creating a new one. In that way, this code would be even simpler. Does it make sense to you or did you decide to create a new one for some reason? dcaballe: Since this transformation happens just before the execution of VPlan, I think it would be…
				WidenRecipe->appendInstruction(Inst);
				Ingredient->eraseFromParent();
				continue;
				}
				NewRecipe = new VPWidenRecipe(Inst);
				}

				NewRecipe->insertBefore(Ingredient);
				LastRecipe = NewRecipe;
				Ingredient->eraseFromParent();
				}
				}
				}
				dcaballeUnsubmitted Not Done Reply Inline Actions We should always have a TopRegion. dyn_cast -> cast? dcaballe: We should always have a TopRegion. dyn_cast -> cast?
				dcaballeUnsubmitted Not Done Reply Inline Actions Please, remove TopRegion check. dcaballe: Please, remove TopRegion check.
				fhahnAuthorUnsubmitted Not Done Reply Inline Actions Done. I think we still need a check to not create recipes for the pre-header and exit blocks? fhahn: Done. I think we still need a check to not create recipes for the pre-header and exit blocks?
				dcaballeUnsubmitted Not Done Reply Inline Actions Yes, you can do that for now. I plan to create clean/empty PH and Exit during construction so that we can safely vectorize or process whatever we add to them other VPlan-to-VPlan transformations. dcaballe: Yes, you can do that for now. I plan to create clean/empty PH and Exit during construction so…
				dcaballeUnsubmitted Not Done Reply Inline Actions Shoudn't we add DbgInfoIntrinsic to DeadInstructions then? dcaballe: Shoudn't we add DbgInfoIntrinsic to DeadInstructions then?
				craig.topperUnsubmitted Not Done Reply Inline Actions Is the iteration order guaranteed here when the set is small? It's been a while since I've look at how SmallSet manages the vector. craig.topper: Is the iteration order guaranteed here when the set is small? It's been a while since I've look…
				fhahnAuthorUnsubmitted Not Done Reply Inline Actions I've just put up a patch adding an iterator for SmallSet: D47942 If the set is small, it just uses a SmallVector, otherwise it uses std::set, which should also give a deterministic iteration order. fhahn: I've just put up a patch adding an iterator for SmallSet: D47942 If the set is small, it just…
				craig.topperUnsubmitted Not Done Reply Inline Actions I guess deterministic wasn't the right word. It looks like when the set is small this will print out in the order that the VFs were inserted into the set not a numeric order? Given that this is going into a string is that a good idea? craig.topper: I guess deterministic wasn't the right word. It looks like when the set is small this will…
				fhahnAuthorUnsubmitted Not Done Reply Inline Actions Yeah, I guess we would need to sort the VFs here. But I'll update the patch to transform the original VPlan in place, so this code should go away. fhahn: Yeah, I guess we would need to sort the VFs here. But I'll update the patch to transform the…
				craig.topperUnsubmitted Not Done Reply Inline Actions What's the range of VFs that you expect to use? Could you use a BitVector or SparseBitVector instead? craig.topper: What's the range of VFs that you expect to use? Could you use a BitVector or SparseBitVector…
				dcaballeUnsubmitted Done Reply Inline Actions remove { } for single line ifs and elses? dcaballe: remove { } for single line ifs and elses?
				dcaballeUnsubmitted Done Reply Inline Actions Shoudn't we collect DbgInfoIntrinsic as a DeadInstructions instead of having this check here? dcaballe: Shoudn't we collect DbgInfoIntrinsic as a DeadInstructions instead of having this check here?
				sguggillUnsubmitted Not Done Reply Inline Actions An assert(inst) would be nice here. sguggill: An assert(inst) would be nice here.
				fhahnAuthorUnsubmitted Not Done Reply Inline Actions Switched to cast<> fhahn: Switched to cast<>
				sguggillUnsubmitted Not Done Reply Inline Actions OK. sguggill: OK.
				sguggillUnsubmitted Not Done Reply Inline Actions I think we are diverging from the nonVPlanNativePath case, where a widenRecipe can have > 1 ingredients but this should be fine. sguggill: I think we are diverging from the nonVPlanNativePath case, where a widenRecipe can have > 1…
				fhahnAuthorUnsubmitted Not Done Reply Inline Actions Yep, this code creates a new WidenRecipe for each widened instruction because it is slightly more work to get the previous recipe, because we remove the current instruction. I can change it though if you think that's better. fhahn: Yep, this code creates a new WidenRecipe for each widened instruction because it is slightly…
				sguggillUnsubmitted Not Done Reply Inline Actions This looks good. Thanks! sguggill: This looks good. Thanks!
				dcaballeUnsubmitted Done Reply Inline Actions You mean in line 95? I see. We shouldn't create unnecessary recipes if we can reuse an existing one. Could we do something similar to the code at the end of 'tryToWiden'?: // Success: widen this instruction. We optimize the common case where // consecutive instructions can be represented by a single recipe. if (!VPBB->empty()) { VPWidenRecipe LastWidenRecipe = dyn_cast<VPWidenRecipe>(&VPBB->back()); if (LastWidenRecipe && LastWidenRecipe->appendInstruction(I)) return true; } VPBB->appendRecipe(new VPWidenRecipe(I)); dcaballe:* You mean in line 95? I see. We shouldn't create unnecessary recipes if we can reuse an existing…

unittests/Transforms/Vectorize/CMakeLists.txt

	set(LLVM_LINK_COMPONENTS			set(LLVM_LINK_COMPONENTS
	Vectorize			Vectorize
				AsmParser
	)			)

	add_llvm_unittest(VectorizeTests			add_llvm_unittest(VectorizeTests
	VPlanTest.cpp			VPlanTest.cpp
				VPlanHCFGTest.cpp
	)			)

unittests/Transforms/Vectorize/VPlanHCFGTest.cpp

This file was added.

				//===- llvm/unittest/Transforms/Vectorize/VPlanHCFGTest.cpp ---------------===//
				//
				// The LLVM Compiler Infrastructure
				//
				// This file is distributed under the University of Illinois Open Source
				// License. See LICENSE.TXT for details.
				//
				//===----------------------------------------------------------------------===//

				#include "../lib/Transforms/Vectorize/VPlan.h"
				#include "../lib/Transforms/Vectorize/VPlanHCFGBuilder.h"
				#include "../lib/Transforms/Vectorize/VPlanHCFGTransforms.h"
				#include "llvm/AsmParser/Parser.h"
				#include "llvm/IR/Dominators.h"
				#include "gtest/gtest.h"

				namespace llvm {
				namespace {

				class VPlanHCFGTest : public testing::Test {
				protected:
				std::unique_ptr<DominatorTree> DT;
				std::unique_ptr<LoopInfo> LI;

				VPlanHCFGTest() {}

				VPlanPtr doBuildPlan(BasicBlock *LoopHeader) {
				DT.reset(new DominatorTree(*LoopHeader->getParent()));
				LI.reset(new LoopInfo(*DT));

				auto Plan = llvm::make_unique<VPlan>();
				VPlanHCFGBuilder HCFGBuilder(LI->getLoopFor(LoopHeader), LI.get());
				HCFGBuilder.buildHierarchicalCFG(*Plan.get());
				return Plan;
				}
				};

				TEST_F(VPlanHCFGTest, testBuildHCFGInnerLoop) {
				LLVMContext Ctx;
				dcaballeUnsubmitted Not Done Reply Inline Actions Thanks a lot for these tests! dcaballe: Thanks a lot for these tests!
				const char *ModuleString =
				"define void @f(i32* %A, i64 %N) {\n"
				"entry:\n"
				" br label %for.body\n"
				"for.body:\n"
				" %indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]\n"
				" %arr.idx = getelementptr inbounds i32, i32* %A, i64 %indvars.iv\n"
				" %l1 = load i32, i32* %arr.idx, align 4\n"
				" %res = add i32 %l1, 10\n"
				" store i32 %res, i32* %arr.idx, align 4\n"
				" %indvars.iv.next = add i64 %indvars.iv, 1\n"
				" %exitcond = icmp ne i64 %indvars.iv.next, %N\n"
				" br i1 %exitcond, label %for.body, label %for.end\n"
				"for.end:\n"
				" ret void\n"
				"}\n";

				SMDiagnostic Err;
				std::unique_ptr<Module> M = parseAssemblyString(ModuleString, Err, Ctx);

				Function *F = M->getFunction("f");
				BasicBlock *LoopHeader = F->getEntryBlock().getSingleSuccessor();
				auto Plan = doBuildPlan(LoopHeader);

				VPBlockBase *Entry = Plan->getEntry()->getEntryBasicBlock();
				EXPECT_NE(nullptr, Entry->getSingleSuccessor());
				EXPECT_EQ(0u, Entry->getNumPredecessors());
				EXPECT_EQ(1u, Entry->getNumSuccessors());

				VPBasicBlock *VecBB = Entry->getSingleSuccessor()->getEntryBasicBlock();
				EXPECT_EQ(7u, VecBB->size());
				EXPECT_EQ(2u, VecBB->getNumPredecessors());
				EXPECT_EQ(2u, VecBB->getNumSuccessors());

				auto Iter = VecBB->begin();
				VPInstruction Phi = dyn_cast<VPInstruction>(&Iter++);
				EXPECT_EQ(Instruction::PHI, Phi->getOpcode());

				VPInstruction Idx = dyn_cast<VPInstruction>(&Iter++);
				EXPECT_EQ(Instruction::GetElementPtr, Idx->getOpcode());
				EXPECT_EQ(2u, Idx->getNumOperands());
				EXPECT_EQ(Phi, Idx->getOperand(1));

				VPInstruction Load = dyn_cast<VPInstruction>(&Iter++);
				EXPECT_EQ(Instruction::Load, Load->getOpcode());
				EXPECT_EQ(1u, Load->getNumOperands());
				EXPECT_EQ(Idx, Load->getOperand(0));

				VPInstruction Add = dyn_cast<VPInstruction>(&Iter++);
				EXPECT_EQ(Instruction::Add, Add->getOpcode());
				EXPECT_EQ(2u, Add->getNumOperands());
				EXPECT_EQ(Load, Add->getOperand(0));

				VPInstruction Store = dyn_cast<VPInstruction>(&Iter++);
				EXPECT_EQ(Instruction::Store, Store->getOpcode());
				EXPECT_EQ(2u, Store->getNumOperands());
				EXPECT_EQ(Add, Store->getOperand(0));
				EXPECT_EQ(Idx, Store->getOperand(1));

				VPInstruction IndvarAdd = dyn_cast<VPInstruction>(&Iter++);
				EXPECT_EQ(Instruction::Add, IndvarAdd->getOpcode());
				EXPECT_EQ(2u, IndvarAdd->getNumOperands());
				EXPECT_EQ(Phi, IndvarAdd->getOperand(0));

				VPInstruction ICmp = dyn_cast<VPInstruction>(&Iter++);
				EXPECT_EQ(Instruction::ICmp, ICmp->getOpcode());
				EXPECT_EQ(2u, ICmp->getNumOperands());
				EXPECT_EQ(IndvarAdd, ICmp->getOperand(0));

				LoopVectorizationLegality::InductionList Inductions;
				SmallPtrSet<Instruction *, 1> DeadInstructions;
				VPlanHCFGTransforms::VPInstructionsToVPRecipes(Plan, &Inductions,
				DeadInstructions);
				}

				TEST_F(VPlanHCFGTest, testVPInstructionToVPRecipesInner) {
				LLVMContext Ctx;
				const char *ModuleString =
				"define void @f(i32* %A, i64 %N) {\n"
				"entry:\n"
				" br label %for.body\n"
				"for.body:\n"
				" %indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]\n"
				" %arr.idx = getelementptr inbounds i32, i32* %A, i64 %indvars.iv\n"
				" %l1 = load i32, i32* %arr.idx, align 4\n"
				" %res = add i32 %l1, 10\n"
				" store i32 %res, i32* %arr.idx, align 4\n"
				" %indvars.iv.next = add i64 %indvars.iv, 1\n"
				" %exitcond = icmp ne i64 %indvars.iv.next, %N\n"
				" br i1 %exitcond, label %for.body, label %for.end\n"
				"for.end:\n"
				" ret void\n"
				"}\n";

				SMDiagnostic Err;
				std::unique_ptr<Module> M = parseAssemblyString(ModuleString, Err, Ctx);

				Function *F = M->getFunction("f");
				BasicBlock *LoopHeader = F->getEntryBlock().getSingleSuccessor();
				auto Plan = doBuildPlan(LoopHeader);

				LoopVectorizationLegality::InductionList Inductions;
				SmallPtrSet<Instruction *, 1> DeadInstructions;
				VPlanHCFGTransforms::VPInstructionsToVPRecipes(Plan, &Inductions,
				DeadInstructions);
				sguggillUnsubmitted Done Reply Inline Actions VPInstructionsToVPRecipies -> VPInstructionsToVPRecipes I noticed this while trying to test my outerloop vector code generation changes along with your latest changes. sguggill: VPInstructionsToVPRecipies -> VPInstructionsToVPRecipes I noticed this while trying to test my…
				fhahnAuthorUnsubmitted Not Done Reply Inline Actions Ah yes, sorry about that. This instance slipped through when I fixed the typo. fhahn: Ah yes, sorry about that. This instance slipped through when I fixed the typo.

				VPBlockBase *Entry = Plan->getEntry()->getEntryBasicBlock();
				EXPECT_NE(nullptr, Entry->getSingleSuccessor());
				EXPECT_EQ(0u, Entry->getNumPredecessors());
				EXPECT_EQ(1u, Entry->getNumSuccessors());

				VPBasicBlock *VecBB = Entry->getSingleSuccessor()->getEntryBasicBlock();
				EXPECT_EQ(6u, VecBB->size());
				EXPECT_EQ(2u, VecBB->getNumPredecessors());
				EXPECT_EQ(2u, VecBB->getNumSuccessors());

				auto Iter = VecBB->begin();
				auto Phi = dyn_cast<VPWidenPHIRecipe>(&Iter++);
				EXPECT_NE(nullptr, Phi);

				auto Idx = dyn_cast<VPWidenRecipe>(&Iter++);
				EXPECT_NE(nullptr, Idx);

				auto Load = dyn_cast<VPWidenMemoryInstructionRecipe>(&Iter++);
				EXPECT_NE(nullptr, Load);

				auto Add = dyn_cast<VPWidenRecipe>(&Iter++);
				EXPECT_NE(nullptr, Add);

				auto Store = dyn_cast<VPWidenMemoryInstructionRecipe>(&Iter++);
				EXPECT_NE(nullptr, Store);

				auto LastWiden = dyn_cast<VPWidenRecipe>(&Iter++);
				EXPECT_NE(nullptr, LastWiden);
				EXPECT_EQ(VecBB->end(), Iter);
				}

				} // namespace
				} // namespace llvm

This is an archive of the discontinued LLVM Phabricator instance.

[VPlan] Add VPInstruction to VPRecipe transformation.ClosedPublic

Details

Diff Detail

Event Timeline

Revision Contents

Diff 151738

lib/Transforms/Vectorize/CMakeLists.txt

lib/Transforms/Vectorize/VPlan.h

lib/Transforms/Vectorize/VPlanHCFGTransforms.h

lib/Transforms/Vectorize/VPlanHCFGTransforms.cpp

unittests/Transforms/Vectorize/CMakeLists.txt

unittests/Transforms/Vectorize/VPlanHCFGTest.cpp

[VPlan] Add VPInstruction to VPRecipe transformation.
ClosedPublic