This is an archive of the discontinued LLVM Phabricator instance.


Differential D46827

[VPlan] Add VPInstruction to VPRecipe transformation.
Closed, Public

Authored by fhahn on May 14 2018, 4:03 AM.

Details

Reviewers

dcaballe
hsaito
mssimpso
hfinkel
rengolin
mkuper
javed.absar
sguggill

Commits

rG3385caaafd2c: [VPlan] Add VPInstruction to VPRecipe transformation.
rL334969: [VPlan] Add VPInstruction to VPRecipe transformation.

Summary

This patch introduces a VPInstructionToVPRecipe transformation, which
allows us to generate code for a VPInstruction based VPlan re-using the
existing infrastructure.

Event Timeline

fhahn created this revision. May 14 2018, 4:03 AM

Herald added a reviewer: javed.absar. May 14 2018, 4:03 AM

Herald added a subscriber: bollu.

fhahn added a parent revision: D46826: [VPlan] Add VPlan based sinkInstructions utility. May 14 2018, 4:03 AM

fhahn added a parent revision: D44338: [LV][VPlan] Build plain CFG with simple VPInstructions for outer loops. May 14 2018, 7:27 AM

a.elovikov added a subscriber: a.elovikov. May 14 2018, 7:36 AM

fhahn mentioned this in D44338: [LV][VPlan] Build plain CFG with simple VPInstructions for outer loops. May 14 2018, 7:43 AM

Only add VPValues for branch conditions once for edge masks.

Hi Florian. Thanks for the patch!

This patch just moves things around and makes things clearer, but it modifies
the inner loop vectorizer. I think we should try to use as many parts of VPlan
for inner loop vectorization as well, to make sure we get as much
testing early on. But I am not entirely sure if it would be better to have
a completely separate inner loop vectorization path in the VPlan native path, to
eliminate the risk of breaking things in the inner loop vectorizers.
What do you think?

I agree with you but I think it's too soon to enable some of this code in the inner loop vectorizer. I would wait at least until we have codegen support for outer loops so that we can do a more proper testing before we use it for production. In addition, another big concern is that introducing this code in the inner loop vectorizer too soon will limit the current development flexibility that we have in the VPlan native path. For those reasons, I would prefer this patch to just do the VPInstruction2VPRecipe transformation for the VPlan native path and leave the inner loop vectorizer changes for later. Does it make sense to you?

lib/Transforms/Vectorize/LoopVectorizationPlanner.h
363 ↗	(On Diff #146616)	I think we should try to find a better place for this. The planner shouldn't be responsible for VPlan-to-VPlan transformations but just for doing the planning (orchestrating them). Otherwise, it will turn into a big brown bag of unrelated stuff. Since this is a small and temporary thing and it's kind of related to code generation, maybe we could include it in the ILV class as a "prepare" CG step? Another option would be to create a specific class for it. We could rename VPlanHCFGBuilder.h/.cpp -> VPlanHCFGTransforms.h/.cpp and place all the "small" VPlan-to-VPlan transformations there.

In D46827#1098894, @dcaballe wrote:

I agree with you but I think it's too soon to enable some of this code in the inner loop vectorizer. I would wait at least until we have codegen support for outer loops so that we can do a more proper testing before we use it for production. In addition, another big concern is that introducing this code in the inner loop vectorizer too soon will limit the current development flexibility that we have in the VPlan native path. For those reasons, I would prefer this patch to just do the VPInstruction2VPRecipe transformation for the VPlan native path and leave the inner loop vectorizer changes for later. Does it make sense to you?

Yep, I think it would be better to move it to the VPlan native path, to have more flexibility when it comes to making changes.

The only problem is that the recipe generation depends on the cost model and we probably have to duplicate some code from the inner loop vectorizer in the VPlan native path, until we gradually replace them by VPlan implementations. I think that would give us a stable starting point (the VPlan native path should pass all existing inner loop vectorizer tests) and would help us to work towards VPlan native inner loop vectorization at a steady pace. Does this make sense?

lib/Transforms/Vectorize/LoopVectorizationPlanner.h
363 ↗	(On Diff #146616)	Agreed, I just put it there because I was not sure where best to put it. I will move it to VPlanHCFGTransforms.

In D46827#1099761, @fhahn wrote:

The only problem is that the recipe generation depends on the cost model and we probably have to duplicate some code from the inner loop vectorizer in the VPlan native path, until we gradually replace them by VPlan implementations. I think that would give us a stable starting point (the VPlan native path should pass all existing inner loop vectorizer tests) and would help us to work towards VPlan native inner loop vectorization at a steady pace. Does this make sense?

What is the main dependence on the cost model? Maybe @hsaito could take it into account since he is looking at how to refactor the current cost model.
In any case, I don't think we have to replicate too much code if we start with something simple, even if we generate inefficient code at the beginning. For example, I don't think we initially need the DeadInstructions code, anything related to masking or uniformity (since only uniform branches are allowed in the native path for now) or interleaved accesses. That should simplify the VPInstruction2VPRecipe step a lot.

In D46827#1099973, @dcaballe wrote:

In D46827#1099761, @fhahn wrote:

The only problem is that the recipe generation depends on the cost model and we probably have to duplicate some code from the inner loop vectorizer in the VPlan native path, until we gradually replace them by VPlan implementations. I think that would give us a stable starting point (the VPlan native path should pass all existing inner loop vectorizer tests) and would help us to work towards VPlan native inner loop vectorization at a steady pace. Does this make sense?

What is the main dependence on the cost model? Maybe @hsaito could take it into account since he is looking at how to refactor the current cost model.
In any case, I don't think we have to replicate too much code if we start with something simple, even if we generate inefficient code at the beginning. For example, I don't think we initially need the DeadInstructions code, anything related to masking or uniformity (since only uniform branches are allowed in the native path for now) or interleaved accesses. That should simplify the VPInstruction2VPRecipe step a lot.

I'm working on CostModel starting from the last part, i.e., walking the HCFG and VPRecipes/VPInstructions instead of BBs and Instructions to compute the cost (and going backwards). There are a bunch of decision processes, e.g., whether a memref should be vectorized or serialized, that also need to be "converted" before the entire CostModel becomes VPlan ready, and the last part certainly depends on those decisions. I plan to create IR versions (which is existing code) and VPlan versions (new code), make sure we can compare them side by side, become comfortable switching to the VPlan versions, and then finally retire the IR versions.

dcaballe mentioned this in D46825: [VPlan] Add moveAfter to VPRecipeBase. May 15 2018, 7:56 PM

dcaballe mentioned this in D46826: [VPlan] Add VPlan based sinkInstructions utility. May 15 2018, 8:09 PM

rkruppe added a subscriber: rkruppe. May 17 2018, 7:43 AM

I'll look into restructuring this soon.

I've tried restructuring the patch as suggested now and would appreciate your feedback on a few things before I update the patch:

Location of transformVPInstructionsToVPRecipies: I agree with Diego that LoopVectorizationPlanner is not the ideal place. But it needs access to the tryTo* functions to create VPRecipes, which unfortunately are defined in LoopVectorizationPlanner. I think there are 3 options: 1) keep transformVPInstructionsToVPRecipies in LoopVectorizationPlanner, 2) move the tryTo* functions to a better place (something like VPRecipeBuilder) or 3) pass a LoopVectorizationPlanner to the VPInstructionToVPRecipe transform. I think we should go for 1), if there is a concrete plan to get rid of VPRecipes in the near future or 2) if we need to keep the VPRecipes around for the time being (at least for the non-VPlan native path)

Move VPInstructionToVPRecipe to the VPlan native path: should be fairly straightforward, we already instantiate the legacy cost model in the VPlan native path, all we need to do is to allow inner loops in the VPlan native path, use the cost model to get the MaxVF and hook up VPlan execution, in case we decide to vectorize. We can migrate all those parts to VPlan infrastructure separately. Does that make sense?

Simplify buildVPlans() in the non VPlan native path: if we are pushing development of the VPlan based inner loop vectorization to the VPlan native path, we can avoid creating VPlans for each valid vectorization factor, as we throw them away without using them.

Hi Florian!

Location of transformVPInstructionsToVPRecipies: I agree with Diego that LoopVectorizationPlanner is not the ideal place. But it needs access to the tryTo* functions to create VPRecipes, which unfortunately are defined in LoopVectorizationPlanner. I think there are 3 options: 1) keep transformVPInstructionsToVPRecipies in LoopVectorizationPlanner, 2) move the tryTo* functions to a better place (something like VPRecipeBuilder) or 3) pass a LoopVectorizationPlanner to the VPInstructionToVPRecipe transform. I think we should go for 1), if there is a concrete plan to get rid of VPRecipes in the near future or 2) if we need to keep the VPRecipes around for the time being (at least for the non-VPlan native path)

Good suggestions! I would prefer #2 if it's feasible and it's not too much work. Moving the creation of the recipes to a different class sounds good to me since we'd prevent other developers from adding similar code to the planner. VPRecipeBuilder sounds good to me. I also thought we could add them to VPBuilder since recipes are currently part of the IR representation, but I think it's not a good idea since these interfaces are doing much more than just creating a new recipe.

Move VPInstructionToVPRecipe to the VPlan native path: should be fairly straightforward, we already instantiate the legacy cost model in the VPlan native path, all we need to do is to allow inner loops in the VPlan native path, use the cost model to get the MaxVF and hook up VPlan execution, in case we decide to vectorize. We can migrate all those parts to VPlan infrastructure separately. Does that make sense?

My only concern is how that is going to work for outer loops. We need VPInstructionToVPRecipe working for them and, IMO, we shouldn't create another fork inside the VPlan native path to treat outer and inner loops differently. Reducing the number of recipes used in this step would also help us replace them quickly with VPInstructions in the VPlan native path. For those reasons, I suggested that we should start with a very simple VPInstructionToVPRecipe step that only created basic recipes (i.e., VPWiden*) for now, and leave interleaving and other optimizations for later, at least until we can make them work for outer loops. Do you see any problem with this approach? Is there something specific that you need for your follow-up work? We can think about how to accommodate it.

Simplify buildVPlans() in the non VPlan native path: if we are pushing development of the VPlan based inner loop vectorization to the VPlan native path, we can avoid creating VPlans for each valid vectorization factor, as we throw them away without using them.

I'm not sure I understand this point. In the non VPlan native path we are evaluating the cost for all VFs and choosing the best one. What do you want to change exactly?

Thanks!
Diego

Thanks Diego, I'll get the patches ready ASAP.

I'm not sure I understand this point. In the non VPlan native path we are evaluating the cost for all VFs and choosing the best one. What do you want to change exactly?

I've created D47477 to illustrate what I meant.

fhahn mentioned this in D47477: [VPlan] Move recipe based VPlan generation to separate function. May 29 2018, 2:41 PM

Updated the patch to move inner loop vectorization using the VPInstr2VPRecipe transformation into the VPlan native path. I've updated the description of the revision with more details.

Herald added a subscriber: tschuett. May 31 2018, 10:06 AM

fhahn added a parent revision: D47595: [VPlan] Move recipe construction to VPRecipeBuilder. May 31 2018, 10:07 AM

My only concern is how that is going to work for outer loops. We need VPInstructionToVPRecipe working for them and, IMO, we shouldn't create another fork inside the VPlan native path to treat outer and inner loops differently. Reducing the number of recipes used in this step would also help us replace them quickly with VPInstructions in the VPlan native path. For those reasons, I suggested that we should start with a very simple VPInstructionToVPRecipe step that only created basic recipes (i.e., VPWiden*) for now, and leave interleaving and other optimizations for later, at least until we can make them work for outer loops. Do you see any problem with this approach? Is there something specific that you need for your follow-up work? We can think about how to accommodate it.

I played around a bit with generating code for outer loops. I think there are a few things we need to do:

collect some required information about phis during the legality checks,
set which instructions to widen/scalarize in the outer/inner loops for the user-provided VF (that's all VPInstructionToVPRecipe needs for most instructions),
handle inner loop PHI nodes and control flow between outer & inner loop.

sguggill added a subscriber: sguggill. Jun 4 2018, 9:47 AM

Thanks for this new version, Florian! Pretty excited to see that some inner loop tests are passing!
Satish is working on the support for outer loops in CG and we’ve been discussing your changes. We have to make sure that this patch aligns well with his next patch and also with the constraints that we established for this patch series and subsequent ones. Some comments in this regard:

We think that VPInstructionsToVPRecipes is still too smart for the VPlan native path and is bringing in too much code (from CM and Legal, for example) and recipes that we would prefer to keep away from the native path until they are properly ported. Otherwise, we’d end up having the same development limitations as we have in the inner loop path. What we had in mind for the Recipe to VPInstruction transition in the native path is something as simple as the code in https://reviews.llvm.org/D44338?vs=on&id=140081&whitespace=ignore-most, where createRecipesForVPBB is naively creating only VPWidenMemoryInstructionRecipe, VPWidenRecipe and VPWidenPHIRecipe. Satish will give you more details in this regard so that this patch aligns with his.

An example of #1 is createBlockInMask and createEdgeMask, which are currently invoked from tryToWidenMemory. In the VPlan native path we are only supporting uniform control flow so no masking is necessary at this point. For divergent control flow we plan to introduce a VPlan based predication algorithm in patch series #2, which will conflict with this code.

If you agree with the suggestion in #1, I think it would be better to introduce the inner loop support in the VPlan native path in a separate patch. In that way, we would introduce: a) VPInstructionToVPRecipe, b) CG support (Satish) and, finally, c) Basic inner loop support, i.e., same constraints as for outer loops, same naïve approach as in a) and b) and making sure that support for both inner and outer loops are well aligned.

Of course, we also need to work to bring whatever is necessary for SLP-aware loop vectorization. We can have this discussion offline to better understand what would be missing in the VPlan native path.

Please, let me know what you think!
Satish will follow up with more information regarding the CG patch.

Thanks,
Diego

lib/Transforms/Vectorize/LoopVectorize.cpp
6300	Maybe it would be useful to keep this debug message for the remaining inner loops that don't hit the previous condition?

In D46827#1121357, @dcaballe wrote:

Thanks for this new version, Florian! Pretty excited to see that some inner loop tests are passing!
Satish is working on the support for outer loops in CG and we’ve been discussing your changes. We have to make sure that this patch aligns well with his next patch and also with the constraints that we established for this patch series and subsequent ones. Some comments in this regard:

We think that VPInstructionsToVPRecipes is still too smart for the VPlan native path and is bringing in too much code (from CM and Legal, for example) and recipes that we would prefer to keep away from the native path until they are properly ported. Otherwise, we’d end up having the same development limitations as we have in the inner loop path. What we had in mind for the Recipe to VPInstruction transition in the native path is something as simple as the code in https://reviews.llvm.org/D44338?vs=on&id=140081&whitespace=ignore-most, where createRecipesForVPBB is naively creating only VPWidenMemoryInstructionRecipe, VPWidenRecipe and VPWidenPHIRecipe. Satish will give you more details in this regard so that this patch aligns with his.

Sure, we can start with just creating the basic recipes. Should we only support widening for a start and not relying on the cost model at all? I suppose that would make things easier for outer loop code gen.

An example of #1 is createBlockInMask and createEdgeMask, which are currently invoked from tryToWidenMemory. In the VPlan native path we are only supporting uniform control flow so no masking is necessary at this point. For divergent control flow we plan to introduce a VPlan based predication algorithm in patch series #2, which will conflict with this code.

If you agree with the suggestion in #1, I think it would be better to introduce the inner loop support in the VPlan native path in a separate patch. In that way, we would introduce: a) VPInstructionToVPRecipe, b) CG support (Satish) and, finally, c) Basic inner loop support, i.e., same constraints as for outer loops, same naïve approach as in a) and b) and making sure that support for both inner and outer loops are well aligned.

Yep that sounds good. I would prefer not to land this change as dead code and if inner loop support is not pending on CG for outer loops, it would be great if we could get it in independently from that. Of course make sure it fits well with CG for outer loops.

Of course, we also need to work to bring whatever is necessary for SLP-aware loop vectorization. We can have this discussion offline to better understand what would be missing in the VPlan native path.

I think to start with, the limited set of recipes is more than enough. Once we have inner loop support, I will share an updated set of patches for SLP.

Thanks for your comments! Please let me know what you require for outer loop CG and when you are planning on sharing those patches, so I do not end up doing unnecessary work :)

Sure, we can start with just creating the basic recipes. Should we only support widening for a start and not relying on the cost model at all? I suppose that would make things easier for outer loop code gen.

Hi Florian,

For the initial outer loop vector code generation, we were planning to only support widening and not rely on the cost model at all. Where the current code generation relies on results of cost modeling, we were planning to put changes in place to return a conservative result for the VPlanNativePath. Given the current constraints for the supported outer loops in the VPlanNativePath, this is what we had in mind specifically:

generate gather/scatter for loads/stores until we have proper divergence analysis in place. We may be able to infer unit-stride information using current SCEV based analysis. However, for the initial implementation we can generate gathers/scatters by returning CM_GatherScatter in getWideningDecision for outer loops in VPlanNativePath.

widen non-induction PHIs. The initial implementation will run some part of legality checks to recognize inductions as you pointed out earlier. PHIs that are not in the outer loop header will be widened. Due to back edges from inner loops we may need to generate place-holder PHIs to begin with and fixup such PHIs at the end of code generation. This is due to the fact that vector values corresponding to all the phi-operands may not be available when widening the PHI instruction. This will be similar to the way reductions/recurrences are currently handled.

We will need to either guarantee that the phi operand order matches the parent BBlock successor order or have a mapping between scalar BBlocks and the corresponding vector BBlock.

treat all branch instructions as uniform. During vector CG, the branch operands need to be replaced with the vector equivalents. We will need an approach similar to handling of PHIs i.e. fixup branches and control flow at the end of vector CG.

all other instructions will simply be widened using vector values or by generating broadcasts (for loop invariants).
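
The placeholder-PHI scheme in the list above can be illustrated with a small standalone model. This is only a sketch under stated assumptions, not LLVM code: `ScalarInst`, `VectorInst`, `widenAll`, `lookup` and the `vec.`/`splat.` naming are hypothetical, and induction PHIs, which the plan treats separately, are ignored. It shows why PHI operands have to be filled in after everything else has been widened.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Toy scalar instruction (hypothetical): a name, a PHI flag, operand names.
struct ScalarInst {
  std::string Name;
  bool IsPhi;
  std::vector<std::string> Operands;
};

// Toy widened instruction; operands are names of vector values.
struct VectorInst {
  std::string Name;
  std::vector<std::string> Operands;
};

// Values without a widened counterpart are treated as loop invariants
// and modelled as a broadcast ("splat").
static std::string lookup(const std::map<std::string, std::string> &Map,
                          const std::string &Op) {
  auto It = Map.find(Op);
  return It != Map.end() ? It->second : "splat." + Op;
}

std::vector<VectorInst> widenAll(const std::vector<ScalarInst> &Insts) {
  std::map<std::string, std::string> ScalarToVector;
  std::vector<VectorInst> Out;
  std::vector<std::pair<std::size_t, const ScalarInst *>> PhisToFix;

  // Pass 1: widen in program order. A PHI may use a value that is only
  // defined later (through a back edge), so it starts without operands.
  for (const ScalarInst &I : Insts) {
    VectorInst V{"vec." + I.Name, {}};
    if (I.IsPhi)
      PhisToFix.emplace_back(Out.size(), &I);
    else
      for (const std::string &Op : I.Operands)
        V.Operands.push_back(lookup(ScalarToVector, Op));
    ScalarToVector[I.Name] = V.Name;
    Out.push_back(std::move(V));
  }

  // Pass 2: every value now has a vector counterpart; fix up the PHIs.
  for (const auto &Fix : PhisToFix)
    for (const std::string &Op : Fix.second->Operands)
      Out[Fix.first].Operands.push_back(lookup(ScalarToVector, Op));
  return Out;
}
```

In this model a PHI `p` with operands `init` and `p.next` ends up with operands `splat.init` and `vec.p.next`, even though `p.next` is widened after `p`.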

Initially, we will need at the least - widenMemory, widen, and widenPhi recipes. Branch instruction can be part of a widenRecipe and CG needs to handle it appropriately for the VPlanNativePath. If that is not acceptable, we can think about having a new recipe to handle branch instructions.

Please let me know if this matches with what you had in mind. We will try to send you a patch by the end of this week so that you can see what we have.

Satish

In D46827#1121496, @sguggill wrote:

Please let me know if this matches with what you had in mind. We will try to send you a patch by the end of this week so that you can see what we have.

Yep that sounds good. It sounds like I could update this patch to use a simplified recipe creation as Diego mentioned?

In D46827#1121624, @fhahn wrote:

Yep that sounds good. It sounds like I could update this patch to use a simplified recipe creation as Diego mentioned?

I think that makes sense.

Updated to not use VPRecipeBuilder for recipe construction. It now relies on the original VPlan to be vectorizable without masking/interleaving. If that makes sense, I'll move the code enabling the CG to a separate patch and add a unit test for the transformation.
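
The simplified recipe creation discussed above (only widened memory, widened PHI and generic widen recipes, with no cost-model queries, masking or interleaving) might look roughly like the following standalone model. The `Opcode` and `RecipeKind` enums and both functions are hypothetical names made up for illustration; only the three recipe categories come from the discussion.

```cpp
#include <vector>

// Hypothetical stand-ins for the instruction kinds that matter here.
enum class Opcode { Load, Store, Phi, Branch, Add, Mul };

// The three recipe categories named in the discussion.
enum class RecipeKind { WidenMemory, WidenPhi, Widen };

// Naive selection: no cost model, no masking, no interleaving.
// Loads and stores become widened memory operations, PHIs become widened
// PHIs (induction PHIs are ignored in this toy), and everything else,
// including uniform branches, falls into the generic widen recipe.
RecipeKind selectRecipeKind(Opcode Op) {
  switch (Op) {
  case Opcode::Load:
  case Opcode::Store:
    return RecipeKind::WidenMemory;
  case Opcode::Phi:
    return RecipeKind::WidenPhi;
  default:
    return RecipeKind::Widen;
  }
}

// Map every instruction of a block to a recipe kind, in program order.
std::vector<RecipeKind> selectRecipes(const std::vector<Opcode> &Block) {
  std::vector<RecipeKind> Recipes;
  Recipes.reserve(Block.size());
  for (Opcode Op : Block)
    Recipes.push_back(selectRecipeKind(Op));
  return Recipes;
}
```

Because the choice depends only on the opcode, the same mapping applies to inner and outer loops, which is the alignment the reviewers asked for.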

fhahn mentioned this in D47595: [VPlan] Move recipe construction to VPRecipeBuilder. Jun 6 2018, 10:26 AM

In D46827#1123771, @fhahn wrote:

Updated to not use VPRecipeBuilder for recipe construction. It now relies on the original VPlan to be vectorizable without masking/interleaving. If that makes sense, I'll move the code enabling the CG to a separate patch and add a unit test for the transformation.

Thanks, Florian! Much better now! Some comments below.

If that makes sense, I'll move the code enabling the CG to a separate patch and add a unit test for the transformation.

Please go ahead!

Let's see how this patch aligns with Satish's patch when it's ready.

Thanks,
Diego.

lib/Transforms/Vectorize/LoopVectorize.cpp
6394	I think this might not be needed at this point?
6397	unused?
lib/Transforms/Vectorize/VPlanHCFGTransforms.cpp
60	Since this transformation happens just before the execution of VPlan, I think it would be better to just modify the existing VPlan (i.e., remove each VPInstruction and add the corresponding recipes) instead of creating a new one. In that way, this code would be even simpler. Does it make sense to you or did you decide to create a new one for some reason?
74	We should always have a TopRegion. dyn_cast -> cast?
80	Please, remove TopRegion check.
95	Shouldn't we add DbgInfoIntrinsic to DeadInstructions then?
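
The in-place variant suggested in the comment on line 60 (remove each VPInstruction and add the corresponding recipe to the same plan, rather than building a new one) can be modelled on a plain linked list. This is a hedged sketch, not the patch: `Node` and `lowerInPlace` are hypothetical, and the `widen(...)` naming is purely illustrative.

```cpp
#include <list>
#include <string>

// Toy node (hypothetical): either a not-yet-lowered instruction or a recipe.
struct Node {
  bool IsRecipe;
  std::string Name;
};

// Rewrite the block in place: for every plain instruction, insert a recipe
// in front of it and then erase the instruction. std::list keeps all other
// iterators valid, so the walk continues from the element that followed
// the erased one.
void lowerInPlace(std::list<Node> &Block) {
  for (auto It = Block.begin(); It != Block.end();) {
    if (It->IsRecipe) {
      ++It;
      continue;
    }
    Block.insert(It, Node{true, "widen(" + It->Name + ")"});
    It = Block.erase(It); // iterator to the node after the erased one
  }
}
```

The insert-then-erase order matters: the new recipe is placed while the old node still marks the position, so the relative order of the block is preserved.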

fhahn mentioned this in D47942: [SmallSet] Add SmallSetIterator. Jun 8 2018, 6:16 AM

fhahn added a parent revision: D47942: [SmallSet] Add SmallSetIterator.

craig.topper added a subscriber: craig.topper. Jun 8 2018, 8:58 AM

craig.topper added inline comments.

lib/Transforms/Vectorize/VPlanHCFGTransforms.cpp
134	Is the iteration order guaranteed here when the set is small? It's been a while since I've looked at how SmallSet manages the vector.

fhahn added inline comments. Jun 8 2018, 9:01 AM

lib/Transforms/Vectorize/VPlanHCFGTransforms.cpp
134	I've just put up a patch adding an iterator for SmallSet: D47942. If the set is small, it just uses a SmallVector, otherwise it uses std::set, which should also give a deterministic iteration order.

craig.topper added inline comments. Jun 8 2018, 9:09 AM

lib/Transforms/Vectorize/VPlanHCFGTransforms.cpp
134	I guess deterministic wasn't the right word. It looks like when the set is small this will print out in the order that the VFs were inserted into the set not a numeric order? Given that this is going into a string is that a good idea?

fhahn added inline comments. Jun 8 2018, 9:11 AM

lib/Transforms/Vectorize/VPlanHCFGTransforms.cpp
134	Yeah, I guess we would need to sort the VFs here. But I'll update the patch to transform the original VPlan in place, so this code should go away.

craig.topper added inline comments. Jun 8 2018, 9:18 AM

lib/Transforms/Vectorize/VPlanHCFGTransforms.cpp
134	What's the range of VFs that you expect to use? Could you use a BitVector or SparseBitVector instead?
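
The ordering concern in this thread can be reproduced with a small standalone model. This sketch does not use LLVM's SmallSet or BitVector: `InsertionOrderedSet`, `formatVFs`, `vfsFromMask` and the `VF={...}` string format are hypothetical, and the bitmask variant assumes that every VF is a power of two.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Toy model of a set in its "small" mode: a linearly searched vector, so
// iteration follows insertion order rather than numeric order.
struct InsertionOrderedSet {
  std::vector<unsigned> Items;
  void insert(unsigned V) {
    if (std::find(Items.begin(), Items.end(), V) == Items.end())
      Items.push_back(V);
  }
};

// Join the VFs into a string such as "VF={2,4,8}". Sorting a copy first
// makes the result independent of the insertion order.
std::string formatVFs(const InsertionOrderedSet &VFs, bool Sort) {
  std::vector<unsigned> Copy = VFs.Items;
  if (Sort)
    std::sort(Copy.begin(), Copy.end());
  std::string Out = "VF={";
  for (std::size_t I = 0; I < Copy.size(); ++I) {
    if (I)
      Out += ",";
    Out += std::to_string(Copy[I]);
  }
  return Out + "}";
}

// Alternative: let bit K of a mask stand for VF == 2^K. Walking the bits
// upwards always yields the VFs in ascending order.
std::vector<unsigned> vfsFromMask(std::uint32_t Mask) {
  std::vector<unsigned> VFs;
  for (unsigned K = 0; K < 32; ++K)
    if (Mask & (std::uint32_t{1} << K))
      VFs.push_back(1u << K);
  return VFs;
}
```

Inserting 8, 2, 4 gives `VF={8,2,4}` without sorting and `VF={2,4,8}` with it, while the bitmask form needs no sorting step at all.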

Updated to modify the original plan in place, and removed inner loop vectorization code path from VPlan native path for now.

Herald added subscribers: rogfer01, mgorny. Jun 12 2018, 8:37 AM

fhahn added inline comments. Jun 12 2018, 8:39 AM

lib/Transforms/Vectorize/VPlanHCFGTransforms.cpp
80	Done. I think we still need a check to not create recipes for the pre-header and exit blocks?

Fix typo.

fhahn edited parent revisions, added: D48080: [VPlanRecipeBase] Add insertBefore helper., D48081: [VPlanRecipeBase] Add eraseFromParent().; removed: D47942: [SmallSet] Add SmallSetIterator., D47595: [VPlan] Move recipe construction to VPRecipeBuilder., D44338: [LV][VPlan] Build plain CFG with simple VPInstructions for outer loops., D46826: [VPlan] Add VPlan based sinkInstructions utility. Jun 12 2018, 8:44 AM

dcaballe added a reviewer: sguggill. Jun 13 2018, 4:25 PM

Thanks for the rework and for your patience, Florian!
I don't see any major issues. Please, wait for Satish's approval, just in case he has any comments regarding how this would interact with his next patch.
In any case, we could always address any issues in a separate commit.

Diego

lib/Transforms/Vectorize/VPlanHCFGTransforms.cpp
76	Shouldn't we collect DbgInfoIntrinsic as a DeadInstruction instead of having this check here?
80	Yes, you can do that for now. I plan to create clean/empty PH and Exit during construction so that we can safely vectorize or process whatever we add to them in other VPlan-to-VPlan transformations.
83	remove { } for single line ifs and elses?
unittests/Transforms/Vectorize/VPlanHCFGTest.cpp
38 ↗	(On Diff #150954)	Thanks a lot for these tests!

Thanks Diego! I'll address the remaining comments after Satish has had a look, so I can address any additional comments in one go.

sguggill added inline comments. Jun 14 2018, 6:03 PM

lib/Transforms/Vectorize/VPlanHCFGTransforms.cpp
76	An assert(inst) would be nice here.
94	I think we are diverging from the nonVPlanNativePath case, where a widenRecipe can have > 1 ingredients but this should be fine.
unittests/Transforms/Vectorize/VPlanHCFGTest.cpp
143 ↗	(On Diff #150954)	VPInstructionsToVPRecipies -> VPInstructionsToVPRecipes. I noticed this while trying to test my outer loop vector code generation changes along with your latest changes.

fhahn added subscribers: sdesmalen, huntergr. Jun 15 2018, 3:41 AM

Thanks for having a look. The comments should be addressed now.

dcaballe added inline comments. Jun 15 2018, 9:11 AM

lib/Transforms/Vectorize/VPlanHCFGTransforms.cpp

You mean in line 95? I see. We shouldn't create unnecessary recipes if we can reuse an existing one. Could we do something similar to the code at the end of 'tryToWiden'?:

// Success: widen this instruction. We optimize the common case where
// consecutive instructions can be represented by a single recipe.
if (!VPBB->empty()) {
  VPWidenRecipe *LastWidenRecipe = dyn_cast<VPWidenRecipe>(&VPBB->back());
  if (LastWidenRecipe && LastWidenRecipe->appendInstruction(I))
    return true;
}

VPBB->appendRecipe(new VPWidenRecipe(I));

Thanks, updated the patch to combine widened instructions in a single recipe, if the last recipe was a VPWidenRecipe.
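
The merging behaviour described here can be sketched with a toy model. It is not the LLVM implementation: `Recipe`, `appendWidened` and `appendOther` are hypothetical, and unlike `appendInstruction` in the snippet quoted above, the toy append never fails.

```cpp
#include <string>
#include <vector>

// Toy recipe (hypothetical): a widen flag plus the instructions it covers.
struct Recipe {
  bool IsWiden;
  std::vector<std::string> Ingredients;
};

// Append a widened instruction to the block. If the last recipe is already
// a widen recipe, extend it; otherwise start a new widen recipe.
void appendWidened(std::vector<Recipe> &Block, const std::string &Instr) {
  if (!Block.empty() && Block.back().IsWiden) {
    Block.back().Ingredients.push_back(Instr);
    return;
  }
  Block.push_back(Recipe{true, {Instr}});
}

// Any other recipe kind breaks the run of consecutive widened instructions.
void appendOther(std::vector<Recipe> &Block, const std::string &Instr) {
  Block.push_back(Recipe{false, {Instr}});
}
```

Appending `add`, `mul`, a non-widen `load` and then `sub` yields three recipes: one widen recipe holding `add` and `mul`, the `load` recipe, and a second widen recipe holding `sub`.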

lib/Transforms/Vectorize/VPlanHCFGTransforms.cpp
76	Switched to cast<>
94	Yep, this code creates a new WidenRecipe for each widened instruction because it is slightly more work to get the previous recipe, because we remove the current instruction. I can change it though if you think that's better.
unittests/Transforms/Vectorize/VPlanHCFGTest.cpp
143 ↗	(On Diff #150954)	Ah yes, sorry about that. This instance slipped through when I fixed the typo.

The changes look good to me. I have also verified that my outer loop vector code generation changes work with these changes after some minor modifications.

lib/Transforms/Vectorize/VPlanHCFGTransforms.cpp
76	OK.
94	This looks good. Thanks!

Thanks a lot, Florian! LGTM!

This revision is now accepted and ready to land. Jun 15 2018, 11:19 AM

Great, thanks for all the comments! I'll commit it early next week, unless there are any more comments.

Rebased. Thanks for all the feedback. I'll commit it in a bit.

Closed by commit rL334969: [VPlan] Add VPInstruction to VPRecipe transformation. (authored by fhahn). Jun 18 2018, 11:33 AM

This revision was automatically updated to reflect the committed changes.

Revision Contents

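Path                                                        Size
lib/Transforms/Vectorize/LoopVectorize.cpp                  70 lines
lib/Transforms/Vectorize/VPRecipeBuilder.h                  2 lines
lib/Transforms/Vectorize/VPlanHCFGTransforms.h              11 lines
lib/Transforms/Vectorize/VPlanHCFGTransforms.cpp            93 lines
test/Transforms/LoopVectorize/consecutive-ptr-uniforms.ll   1 line
test/Transforms/LoopVectorize/i8-induction.ll               1 line
test/Transforms/LoopVectorize/if-conversion.ll              1 line
test/Transforms/LoopVectorize/increment.ll                  1 line
test/Transforms/LoopVectorize/induction.ll                  1 line
test/Transforms/LoopVectorize/induction_plus.ll             1 line
test/Transforms/LoopVectorize/loop-scalars.ll               1 line
test/Transforms/LoopVectorize/minmax_reduction.ll           3 lines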

Diff 149312

lib/Transforms/Vectorize/LoopVectorize.cpp

This file is larger than 256 KB, so syntax highlighting is disabled by default.

Show First 20 Lines • Show All 52 Lines • ▼ Show 20 Lines
// Vectorizing Compilers.		// Vectorizing Compilers.
//		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

#include "llvm/Transforms/Vectorize/LoopVectorize.h"		#include "llvm/Transforms/Vectorize/LoopVectorize.h"
#include "LoopVectorizationPlanner.h"		#include "LoopVectorizationPlanner.h"
#include "VPRecipeBuilder.h"		#include "VPRecipeBuilder.h"
#include "VPlanHCFGBuilder.h"		#include "VPlanHCFGBuilder.h"
		#include "VPlanHCFGTransforms.h"
#include "llvm/ADT/APInt.h"		#include "llvm/ADT/APInt.h"
#include "llvm/ADT/ArrayRef.h"		#include "llvm/ADT/ArrayRef.h"
#include "llvm/ADT/DenseMap.h"		#include "llvm/ADT/DenseMap.h"
#include "llvm/ADT/DenseMapInfo.h"		#include "llvm/ADT/DenseMapInfo.h"
#include "llvm/ADT/Hashing.h"		#include "llvm/ADT/Hashing.h"
#include "llvm/ADT/MapVector.h"		#include "llvm/ADT/MapVector.h"
#include "llvm/ADT/None.h"		#include "llvm/ADT/None.h"
#include "llvm/ADT/Optional.h"		#include "llvm/ADT/Optional.h"
[6,220 lines hidden]	if (!OrigLoop->empty()) {
assert(isPowerOf2_32(UserVF) && "VF needs to be a power of two");		assert(isPowerOf2_32(UserVF) && "VF needs to be a power of two");
LLVM_DEBUG(dbgs() << "LV: Using user VF " << UserVF << ".\n");		LLVM_DEBUG(dbgs() << "LV: Using user VF " << UserVF << ".\n");
buildVPlans(UserVF, UserVF);		buildVPlans(UserVF, UserVF);

// For VPlan build stress testing, we bail out after VPlan construction.		// For VPlan build stress testing, we bail out after VPlan construction.
if (VPlanBuildStressTest)		if (VPlanBuildStressTest)
return NoVectorization;		return NoVectorization;

		// No codegen support for outer loop VPlans for now.
		return NoVectorization;
		}

		if (UserVF) {
		LLVM_DEBUG(dbgs() << "LV: Using user VF " << UserVF << ".\n");
		assert(isPowerOf2_32(UserVF) && "VF needs to be a power of two");
		// Collect the instructions (and their associated costs) that will be more
		// profitable to scalarize.
		CM.selectUserVectorizationFactor(UserVF);
		buildVPlans(UserVF, UserVF);
		LLVM_DEBUG(printPlans(dbgs()));
return {UserVF, 0};		return {UserVF, 0};
}		}

LLVM_DEBUG(
dcaballe: Maybe it would be useful to keep this debug message for the remaining inner loops that don't hit the previous condition?
dbgs() << "LV: Not vectorizing. Inner loops aren't supported in the "
"VPlan-native path.\n");
return NoVectorization;		return NoVectorization;
}		}

VectorizationFactor		VectorizationFactor
LoopVectorizationPlanner::plan(bool OptForSize, unsigned UserVF) {		LoopVectorizationPlanner::plan(bool OptForSize, unsigned UserVF) {
assert(OrigLoop->empty() && "Inner loop expected.");		assert(OrigLoop->empty() && "Inner loop expected.");
// Width 1 means no vectorization, cost 0 means uncomputed cost.		// Width 1 means no vectorization, cost 0 means uncomputed cost.
const VectorizationFactor NoVectorization = {1U, 0U};		const VectorizationFactor NoVectorization = {1U, 0U};
[64 lines hidden]	void LoopVectorizationPlanner::executePlan(InnerLoopVectorizer &ILV,
// Notice: any optimization or new instruction that go		// Notice: any optimization or new instruction that go
// into the code below should also be implemented in		// into the code below should also be implemented in
// the cost-model.		// the cost-model.
//		//
//===------------------------------------------------===//		//===------------------------------------------------===//

// 2. Copy and widen instructions from the old loop into the new loop.		// 2. Copy and widen instructions from the old loop into the new loop.
assert(VPlans.size() == 1 && "Not a single VPlan to execute.");		assert(VPlans.size() == 1 && "Not a single VPlan to execute.");
		if (EnableVPlanNativePath) {
		VPlanHCFGTransforms::sinkInstructions(VPlans.front(),
		dcaballe: I think this might not be needed at this point?
		Legal->getSinkAfter());

		VFRange Range = {BestVF, BestVF + 1};
		dcaballe: unused?
		VPRecipeBuilder RecipeBuilder(OrigLoop, TLI, TTI, Legal, CM, Builder);
		SmallPtrSet<Instruction *, 4> DeadInstructions;
		collectTriviallyDeadInstructions(DeadInstructions);

		VPlanPtr Widened = VPlanHCFGTransforms::VPInstructionsToVPRecipies(
		OrigLoop, VPlans.front(), Range, RecipeBuilder, DeadInstructions);

		Widened->execute(&State);
		} else
VPlans.front()->execute(&State);		VPlans.front()->execute(&State);

// 3. Fix the vectorized code: take care of header phi's, live-outs,		// 3. Fix the vectorized code: take care of header phi's, live-outs,
// predication, updating analyses.		// predication, updating analyses.
ILV.fixVectorizedLoop();		ILV.fixVectorizedLoop();
}		}

void LoopVectorizationPlanner::collectTriviallyDeadInstructions(		void LoopVectorizationPlanner::collectTriviallyDeadInstructions(
SmallPtrSetImpl<Instruction *> &DeadInstructions) {		SmallPtrSetImpl<Instruction *> &DeadInstructions) {
[649 lines hidden]	LoopVectorizationPlanner::buildVPlanWithVPRecipes(
RSO.flush();		RSO.flush();
Plan->setName(PlanName);		Plan->setName(PlanName);

return Plan;		return Plan;
}		}

LoopVectorizationPlanner::VPlanPtr		LoopVectorizationPlanner::VPlanPtr
LoopVectorizationPlanner::buildVPlan(VFRange &Range) {		LoopVectorizationPlanner::buildVPlan(VFRange &Range) {
// Outer loop handling: They may require CFG and instruction level
// transformations before even evaluating whether vectorization is profitable.
// Since we cannot modify the incoming IR, we need to build VPlan upfront in		// Since we cannot modify the incoming IR, we need to build VPlan upfront in
// the vectorization pipeline.		// the vectorization pipeline so we can apply CFG and instruction level
assert(!OrigLoop->empty());		// transformations.
assert(EnableVPlanNativePath && "VPlan-native path is not enabled.");		assert(EnableVPlanNativePath && "VPlan-native path is not enabled.");

// Create new empty VPlan		// Create new empty VPlan
auto Plan = llvm::make_unique<VPlan>();		auto Plan = llvm::make_unique<VPlan>();

// Build hierarchical CFG		// Build hierarchical CFG
VPlanHCFGBuilder HCFGBuilder(OrigLoop, LI);		VPlanHCFGBuilder HCFGBuilder(OrigLoop, LI);
HCFGBuilder.buildHierarchicalCFG(*Plan.get());		HCFGBuilder.buildHierarchicalCFG(*Plan.get());

		std::string PlanName;
		raw_string_ostream RSO(PlanName);
		unsigned VF = Range.Start;
		Plan->addVF(VF);
		RSO << "Initial VPlan for VF={" << VF;
		for (VF *= 2; VF < Range.End; VF *= 2) {
		Plan->addVF(VF);
		RSO << "," << VF;
		}
		RSO << "},UF>=1";
		RSO.flush();
		Plan->setName(PlanName);

return Plan;		return Plan;
}		}
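The plan-naming code at the end of buildVPlan can be sketched in isolation. This is a minimal standalone version, assuming the loop is meant to double the VF on each iteration (VF *= 2) and that Range.Start is at least 1. The VFRange struct and buildPlanName function here are hypothetical stand-ins, not the LLVM types.

```cpp
#include <string>

// Hypothetical stand-in for the patch's VFRange: covers VFs in [Start, End).
struct VFRange {
  unsigned Start;
  unsigned End;
};

// Builds a name such as "Initial VPlan for VF={2,4,8},UF>=1" by doubling the
// VF from Range.Start while it stays below Range.End.
// Assumes Range.Start >= 1; a start of 0 would never advance.
std::string buildPlanName(const VFRange &Range) {
  unsigned VF = Range.Start;
  std::string Name = "Initial VPlan for VF={" + std::to_string(VF);
  for (VF *= 2; VF < Range.End; VF *= 2)
    Name += "," + std::to_string(VF);
  Name += "},UF>=1";
  return Name;
}
```

With the single-VF range that executePlan passes in this patch ({BestVF, BestVF + 1}), the loop body never runs and the name lists only BestVF.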

Value* LoopVectorizationPlanner::VPCallbackILV::		Value* LoopVectorizationPlanner::VPCallbackILV::
getOrCreateVectorValues(Value *V, unsigned Part) {		getOrCreateVectorValues(Value *V, unsigned Part) {
return ILV.getOrCreateVectorValue(V, Part);		return ILV.getOrCreateVectorValue(V, Part);
}		}

[185 lines hidden]	static bool processLoopInVPlanNativePath(
unsigned UserVF = Hints.getWidth();		unsigned UserVF = Hints.getWidth();

// Check the function attributes to find out if this function should be		// Check the function attributes to find out if this function should be
// optimized for size.		// optimized for size.
bool OptForSize =		bool OptForSize =
Hints.getForce() != LoopVectorizeHints::FK_Enabled && F->optForSize();		Hints.getForce() != LoopVectorizeHints::FK_Enabled && F->optForSize();

// Plan how to best vectorize, return the best VF and its cost.		// Plan how to best vectorize, return the best VF and its cost.
LVP.planInVPlanNativePath(OptForSize, UserVF);		VectorizationFactor VF = LVP.planInVPlanNativePath(OptForSize, UserVF);

// Returning false. We are currently not generating vector code in the VPlan		if (VF.Width < 2)
// native path.
return false;		return false;

		LVP.setBestPlan(VF.Width, 1);

		// If we decided that it is legal to vectorize the loop, then do it.
		InnerLoopVectorizer LB(L, PSE, LI, DT, TLI, TTI, AC, ORE, VF.Width, 1, LVL,
		&CM);
		LVP.executePlan(LB, DT);
		++LoopsVectorized;

		return true;
}		}

bool LoopVectorizePass::processLoop(Loop *L) {		bool LoopVectorizePass::processLoop(Loop *L) {
assert((EnableVPlanNativePath || L->empty()) &&		assert((EnableVPlanNativePath || L->empty()) &&
"VPlan-native path is not enabled. Only process inner loops.");		"VPlan-native path is not enabled. Only process inner loops.");

#ifndef NDEBUG		#ifndef NDEBUG
const std::string DebugLocStr = getDebugLocString(L);		const std::string DebugLocStr = getDebugLocString(L);
[49 lines hidden]	#endif /* NDEBUG */
bool OptForSize =		bool OptForSize =
Hints.getForce() != LoopVectorizeHints::FK_Enabled && F->optForSize();		Hints.getForce() != LoopVectorizeHints::FK_Enabled && F->optForSize();

// Entrance to the VPlan-native vectorization path. Outer loops are processed		// Entrance to the VPlan-native vectorization path. Outer loops are processed
// here. They may require CFG and instruction level transformations before		// here. They may require CFG and instruction level transformations before
// even evaluating whether vectorization is profitable. Since we cannot modify		// even evaluating whether vectorization is profitable. Since we cannot modify
// the incoming IR, we need to build VPlan upfront in the vectorization		// the incoming IR, we need to build VPlan upfront in the vectorization
// pipeline.		// pipeline.
if (!L->empty())		if (EnableVPlanNativePath)
return processLoopInVPlanNativePath(L, PSE, LI, DT, &LVL, TTI, TLI, DB, AC,		return processLoopInVPlanNativePath(L, PSE, LI, DT, &LVL, TTI, TLI, DB, AC,
ORE, Hints);		ORE, Hints);

assert(L->empty() && "Inner loop expected.");		assert(L->empty() && "Inner loop expected.");
// Check the loop for a trip count threshold: vectorize loops with a tiny trip		// Check the loop for a trip count threshold: vectorize loops with a tiny trip
// count by optimizing for size, to minimize overheads.		// count by optimizing for size, to minimize overheads.
// Prefer constant trip counts over profile data, over upper bound estimate.		// Prefer constant trip counts over profile data, over upper bound estimate.
unsigned ExpectedTC = 0;		unsigned ExpectedTC = 0;
[322 lines hidden]

lib/Transforms/Vectorize/VPRecipeBuilder.h

[114 lines hidden]	VPRecipeBuilder(Loop *OrigLoop, const TargetLibraryInfo *TLI,
: OrigLoop(OrigLoop), TLI(TLI), TTI(TTI), Legal(Legal), CM(CM),		: OrigLoop(OrigLoop), TLI(TLI), TTI(TTI), Legal(Legal), CM(CM),
Builder(Builder) {}		Builder(Builder) {}

/// Check if a recipe can be create for \p I withing the given VF \p Range.		/// Check if a recipe can be create for \p I withing the given VF \p Range.
/// If a recipe can be created, it adds it to \p VPBB.		/// If a recipe can be created, it adds it to \p VPBB.
bool tryToCreateRecipe(Instruction *Instr, VFRange &Range, VPlanPtr &Plan,		bool tryToCreateRecipe(Instruction *Instr, VFRange &Range, VPlanPtr &Plan,
VPBasicBlock *VPBB);		VPBasicBlock *VPBB);

		void setInsertPoint(VPBasicBlock *VPBB) { Builder.setInsertPoint(VPBB); }

/// Build a VPReplicationRecipe for \p I and enclose it within a Region if it		/// Build a VPReplicationRecipe for \p I and enclose it within a Region if it
/// is predicated. \return \p VPBB augmented with this new recipe if \p I is		/// is predicated. \return \p VPBB augmented with this new recipe if \p I is
/// not predicated, otherwise \return a new VPBasicBlock that succeeds the new		/// not predicated, otherwise \return a new VPBasicBlock that succeeds the new
/// Region. Update the packing decision of predicated instructions if they		/// Region. Update the packing decision of predicated instructions if they
/// feed \p I. Range.End may be decreased to ensure same recipe behavior from		/// feed \p I. Range.End may be decreased to ensure same recipe behavior from
/// \p Range.Start to \p Range.End.		/// \p Range.Start to \p Range.End.
VPBasicBlock *handleReplication(		VPBasicBlock *handleReplication(
Instruction *I, VFRange &Range, VPBasicBlock *VPBB,		Instruction *I, VFRange &Range, VPBasicBlock *VPBB,
DenseMap<Instruction *, VPReplicateRecipe *> &PredInst2Recipe,		DenseMap<Instruction *, VPReplicateRecipe *> &PredInst2Recipe,
VPlanPtr &Plan);		VPlanPtr &Plan);
};		};
} // end namespace llvm		} // end namespace llvm

#endif // LLVM_TRANSFORMS_VECTORIZE_VPRECIPEBUILDER_H		#endif // LLVM_TRANSFORMS_VECTORIZE_VPRECIPEBUILDER_H

lib/Transforms/Vectorize/VPlanHCFGTransforms.h

	[12 lines hidden]

	#ifndef LLVM_TRANSFORMS_VECTORIZE_VPLANHCFGTRANSFORMS_H			#ifndef LLVM_TRANSFORMS_VECTORIZE_VPLANHCFGTRANSFORMS_H
	#define LLVM_TRANSFORMS_VECTORIZE_VPLANHCFGTRANSFORMS_H			#define LLVM_TRANSFORMS_VECTORIZE_VPLANHCFGTRANSFORMS_H

	#include "LoopVectorizationPlanner.h"			#include "LoopVectorizationPlanner.h"
	#include "VPlan.h"			#include "VPlan.h"
	#include "llvm/IR/Instruction.h"			#include "llvm/IR/Instruction.h"

				#include "VPRecipeBuilder.h"
	namespace llvm {			namespace llvm {

	class VPlanHCFGTransforms {			class VPlanHCFGTransforms {
	using VPlanPtr = std::unique_ptr<VPlan>;

	public:			public:
	/// Sinks instructions in \p Plan, depending on their underlying values in			/// Sinks instructions in \p Plan, depending on their underlying values in
	/// \p SinkAfter.			/// \p SinkAfter.
	// FIXME: Migrate to using a VPlan based mapping, once
	// LoopVectorizationLegality::getSinkAfter is moved to VPlan.
	static void			static void
	sinkInstructions(VPlanPtr &Plan,			sinkInstructions(VPlanPtr &Plan,
	DenseMap<Instruction *, Instruction *> &SinkAfter);			DenseMap<Instruction *, Instruction *> &SinkAfter);

				/// Creates a new VPlan using VPRecipes from a VPInstruction VPlan
				/// \p OriginalPlan
				static VPlanPtr
				VPInstructionsToVPRecipies(Loop *OrigLoop, VPlanPtr &OriginalPlan,
				VFRange &Range, VPRecipeBuilder &RecipeBuilder,
				SmallPtrSetImpl<Instruction *> &DeadInstructions);
	};			};

	} // namespace llvm			} // namespace llvm

	#endif // LLVM_TRANSFORMS_VECTORIZE_VPLANHCFGTRANSFORMS_H			#endif // LLVM_TRANSFORMS_VECTORIZE_VPLANHCFGTRANSFORMS_H

lib/Transforms/Vectorize/VPlanHCFGTransforms.cpp

//===-- VPlanHCFGTransforms.cpp -------------------------------------------===//		//===-- VPlanHCFGTransforms.cpp -------------------------------------------===//
//		//
// The LLVM Compiler Infrastructure		// The LLVM Compiler Infrastructure
//		//
// This file is distributed under the University of Illinois Open Source		// This file is distributed under the University of Illinois Open Source
// License. See LICENSE.TXT for details.		// License. See LICENSE.TXT for details.
//		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
///		///
/// \file		/// \file
/// This file implements a set of utility VPlan to VPlan transformations.		/// This file implements a set of utility VPlan to VPlan transformations.
///		///
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

#include "VPlanHCFGTransforms.h"		#include "VPlanHCFGTransforms.h"
#include "llvm/ADT/PostOrderIterator.h"		#include "llvm/ADT/PostOrderIterator.h"
		#include "llvm/IR/IntrinsicInst.h"
#include "llvm/Support/Debug.h"		#include "llvm/Support/Debug.h"

using namespace llvm;		using namespace llvm;

#define DEBUG_TYPE "loop-vectorize"		#define DEBUG_TYPE "loop-vectorize"

void VPlanHCFGTransforms::sinkInstructions(		void VPlanHCFGTransforms::sinkInstructions(
VPlanPtr &Plan, DenseMap<Instruction *, Instruction *> &SinkAfter) {		VPlanPtr &Plan, DenseMap<Instruction *, Instruction *> &SinkAfter) {
[25 lines hidden]	for (VPRecipeBase &Ingredient : *OriginalVPBB) {
// Move instructions to handle first-order recurrences, step 2: push the		// Move instructions to handle first-order recurrences, step 2: push the
// instruction to be sunk at its insertion point.		// instruction to be sunk at its insertion point.
auto SAInvIt = SinkAfterInverse.find(Instr);		auto SAInvIt = SinkAfterInverse.find(Instr);
if (SAInvIt != SinkAfterInverse.end())		if (SAInvIt != SinkAfterInverse.end())
SAInvIt->second->moveAfter(VPInst);		SAInvIt->second->moveAfter(VPInst);
}		}
}		}
}		}

		VPlanPtr VPlanHCFGTransforms::VPInstructionsToVPRecipies(
		dcaballe: Since this transformation happens just before the execution of VPlan, I think it would be better to just modify the existing VPlan (i.e., remove each VPInstruction and add the corresponding recipes) instead of creating a new one. In that way, this code would be even simpler. Does it make sense to you or did you decide to create a new one for some reason?
		Loop *OrigLoop, VPlanPtr &OriginalPlan, VFRange &Range,
		VPRecipeBuilder &RecipeBuilder,
		SmallPtrSetImpl<Instruction *> &DeadInstructions) {
		// Hold a mapping from predicated instructions to their recipes, in order to
		// fix their AlsoPack behavior if a user is determined to replicate and use a
		// scalar instead of vector value.
		DenseMap<Instruction *, VPReplicateRecipe *> PredInst2Recipe;

		// Create a dummy pre-entry VPBasicBlock to start building the VPlan.
		VPBasicBlock *VPBB = new VPBasicBlock("Pre-Entry");
		auto Plan = llvm::make_unique<VPlan>(VPBB);

		// Create VPValues used by createEdgeMask.
		auto *Latch = OrigLoop->getLoopLatch();
		dcaballe: We should always have a TopRegion. dyn_cast -> cast?
		SmallPtrSet<Value *, 4> AddedValues;
		for (BasicBlock *BB : OrigLoop->blocks()) {
		dcaballe (done): Shouldn't we collect DbgInfoIntrinsic in DeadInstructions instead of having this check here?
		sguggill: An assert(inst) would be nice here.
		fhahn (author): Switched to cast<>
		sguggill: OK.
		if (BB == Latch)
		continue;
		BranchInst *Branch = dyn_cast<BranchInst>(BB->getTerminator());
		if (Branch && Branch->isConditional() &&
		dcaballe: Please, remove TopRegion check.
		fhahn (author): Done. I think we still need a check to not create recipes for the pre-header and exit blocks?
		dcaballe: Yes, you can do that for now. I plan to create clean/empty PH and Exit during construction so that we can safely vectorize or process whatever we add to them in other VPlan-to-VPlan transformations.
		!AddedValues.count(Branch->getCondition())) {
		Plan->addVPValue(Branch->getCondition());
		AddedValues.insert(Branch->getCondition());
		dcaballe (done): remove { } for single line ifs and elses?
		}
		}

		VPRegionBlock *TopRegion = dyn_cast<VPRegionBlock>(OriginalPlan->getEntry());
		ReversePostOrderTraversal<VPBlockBase *> RPOT(TopRegion->getEntry());
		for (VPBlockBase *Base : RPOT) {
		VPBasicBlock *OriginalVPBB = Base->getEntryBasicBlock();
		// Skip entry and exit nodes for now. Currently the recipes will take
		// care of creating instructions in entry and exit blocks.
		if (TopRegion && (OriginalVPBB == TopRegion->getEntry() ||
		OriginalVPBB == TopRegion->getExit()))
		sguggill: I think we are diverging from the nonVPlanNativePath case, where a widenRecipe can have > 1 ingredients but this should be fine.
		fhahn (author): Yep, this code creates a new WidenRecipe for each widened instruction because it is slightly more work to get the previous recipe, because we remove the current instruction. I can change it though if you think that's better.
		sguggill: This looks good. Thanks!
		dcaballe (done): You mean in line 95? I see. We shouldn't create unnecessary recipes if we can reuse an existing one. Could we do something similar to the code at the end of 'tryToWiden'?
		  // Success: widen this instruction. We optimize the common case where
		  // consecutive instructions can be represented by a single recipe.
		  if (!VPBB->empty()) {
		    VPWidenRecipe *LastWidenRecipe = dyn_cast<VPWidenRecipe>(&VPBB->back());
		    if (LastWidenRecipe && LastWidenRecipe->appendInstruction(I))
		      return true;
		  }
		  VPBB->appendRecipe(new VPWidenRecipe(I));
		continue;
		dcaballe: Shouldn't we add DbgInfoIntrinsic to DeadInstructions then?

		auto *FirstVPBBForBB = new VPBasicBlock(OriginalVPBB->getName());
		VPBlockUtils::insertBlockAfter(FirstVPBBForBB, VPBB);
		VPBB = FirstVPBBForBB;
		RecipeBuilder.setInsertPoint(VPBB);
		unsigned VPBBsForBB = 0;

		std::vector<VPRecipeBase *> Ingredients;

		// Introduce each ingredient into VPlan.
		for (VPRecipeBase &Ingredient : *OriginalVPBB) {
		VPInstruction *VPInst = dyn_cast<VPInstruction>(&Ingredient);
		assert(VPInst && "Can only handle VPInstructions.");
		Instruction *Instr = dyn_cast<Instruction>(VPInst->getUnderlyingValue());
		if (DeadInstructions.count(Instr) || isa<DbgInfoIntrinsic>(Instr))
		continue;

		if (RecipeBuilder.tryToCreateRecipe(Instr, Range, Plan, VPBB))
		continue;

		// Otherwise, if all widening options failed, Instruction is to be
		// replicated. This may create a successor for VPBB.
		VPBasicBlock *NextVPBB = RecipeBuilder.handleReplication(
		Instr, Range, VPBB, PredInst2Recipe, Plan);
		if (NextVPBB != VPBB) {
		VPBB = NextVPBB;
		VPBB->setName(VPBB->getName() + "." + Twine(VPBBsForBB++));
		}
		}
		}

		// Discard empty dummy pre-entry VPBasicBlock. Note that other VPBasicBlocks
		// may also be empty, such as the last one VPBB, reflecting original
		// basic-blocks with no recipes.
		VPBasicBlock *PreEntry = cast<VPBasicBlock>(Plan->getEntry());
		assert(PreEntry->empty() && "Expecting empty pre-entry block.");
		VPBlockBase *Entry = Plan->setEntry(PreEntry->getSingleSuccessor());
		VPBlockUtils::disconnectBlocks(PreEntry, Entry);
		delete PreEntry;
		craig.topper: Is the iteration order guaranteed here when the set is small? It's been a while since I've looked at how SmallSet manages the vector.
		fhahn (author): I've just put up a patch adding an iterator for SmallSet: D47942. If the set is small, it just uses a SmallVector, otherwise it uses std::set, which should also give a deterministic iteration order.
		craig.topper: I guess deterministic wasn't the right word. It looks like when the set is small this will print out in the order that the VFs were inserted into the set, not a numeric order? Given that this is going into a string, is that a good idea?
		fhahn (author): Yeah, I guess we would need to sort the VFs here. But I'll update the patch to transform the original VPlan in place, so this code should go away.
		craig.topper: What's the range of VFs that you expect to use? Could you use a BitVector or SparseBitVector instead?
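The ordering concern in this thread can be illustrated with a small standalone sketch. It assumes the VFs arrive in insertion order (as they would from a small set backed by a vector) and sorts a copy before printing, so the resulting string does not depend on insertion order. The formatVFs helper is hypothetical and is not part of the patch or of LLVM.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Joins the given VFs in ascending numeric order, regardless of the order in
// which they were inserted. Takes the vector by value so the caller's
// container is left untouched.
std::string formatVFs(std::vector<unsigned> VFs) {
  std::sort(VFs.begin(), VFs.end());
  std::string Out = "VF={";
  for (std::size_t I = 0; I < VFs.size(); ++I) {
    if (I != 0)
      Out += ",";
    Out += std::to_string(VFs[I]);
  }
  Out += "}";
  return Out;
}
```

A bit vector indexed by VF, as suggested in the last comment, would give numeric iteration order without the explicit sort.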

		std::string PlanName;
		raw_string_ostream RSO(PlanName);
		unsigned VF = Range.Start;
		Plan->addVF(VF);
		RSO << "Initial VPlan for VF={" << VF;
		for (VF *= 2; VF < Range.End; VF *= 2) {
		Plan->addVF(VF);
		RSO << "," << VF;
		}
		RSO << "},UF>=1";
		RSO.flush();
		Plan->setName(PlanName);

		return Plan;
		}
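The recipe-reuse idea raised in the inline discussion of VPInstructionsToVPRecipies (append a widened instruction to the previous widen recipe instead of creating a new recipe each time) can be sketched without any LLVM dependencies. Everything below is a simplified model under stated assumptions: WidenRecipe, Block, widen, and replicate are hypothetical names, instructions are modeled as strings, and a null entry stands for any non-widen recipe.

```cpp
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-in for a widen recipe: a run of consecutive
// instructions that are widened together.
struct WidenRecipe {
  std::vector<std::string> Instructions;
};

// Hypothetical stand-in for a VPBasicBlock. A null entry models any
// non-widen recipe (for example a replicate recipe).
struct Block {
  std::vector<std::unique_ptr<WidenRecipe>> Recipes;
};

// Appends Instr to the last recipe if that recipe is a widen recipe;
// otherwise starts a new widen recipe holding only Instr.
void widen(Block &BB, const std::string &Instr) {
  if (!BB.Recipes.empty() && BB.Recipes.back()) {
    BB.Recipes.back()->Instructions.push_back(Instr);
    return;
  }
  BB.Recipes.push_back(std::unique_ptr<WidenRecipe>(new WidenRecipe()));
  BB.Recipes.back()->Instructions.push_back(Instr);
}

// Records a non-widen recipe, which ends the current run of widened
// instructions.
void replicate(Block &BB) { BB.Recipes.push_back(nullptr); }
```

In this model, widening "add" and "mul" back to back yields one recipe with two instructions, while a replicate in between forces the next widened instruction into a fresh recipe.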

test/Transforms/LoopVectorize/consecutive-ptr-uniforms.ll

	; REQUIRES: asserts			; REQUIRES: asserts
	; RUN: opt < %s -loop-vectorize -force-vector-width=4 -force-vector-interleave=1 -instcombine -debug-only=loop-vectorize -disable-output -print-after=instcombine 2>&1 | FileCheck %s			; RUN: opt < %s -loop-vectorize -force-vector-width=4 -force-vector-interleave=1 -instcombine -debug-only=loop-vectorize -disable-output -print-after=instcombine 2>&1 | FileCheck %s
				; RUN: opt < %s -loop-vectorize -force-vector-width=4 -force-vector-interleave=1 -enable-vplan-native-path -instcombine -debug-only=loop-vectorize -disable-output -print-after=instcombine 2>&1 | FileCheck %s
	; RUN: opt < %s -loop-vectorize -force-vector-width=4 -force-vector-interleave=1 -enable-interleaved-mem-accesses -instcombine -debug-only=loop-vectorize -disable-output -print-after=instcombine 2>&1 | FileCheck %s --check-prefix=INTER			; RUN: opt < %s -loop-vectorize -force-vector-width=4 -force-vector-interleave=1 -enable-interleaved-mem-accesses -instcombine -debug-only=loop-vectorize -disable-output -print-after=instcombine 2>&1 | FileCheck %s --check-prefix=INTER

	target datalayout = "e-m:e-i64:64-i128:128-n32:64-S128"			target datalayout = "e-m:e-i64:64-i128:128-n32:64-S128"

	%pair = type { i32, i32 }			%pair = type { i32, i32 }

	; CHECK-LABEL: consecutive_ptr_forward			; CHECK-LABEL: consecutive_ptr_forward
	;			;
	[480 lines hidden]

test/Transforms/LoopVectorize/i8-induction.ll

	; RUN: opt < %s -loop-vectorize -force-vector-interleave=1 -force-vector-width=4 -dce -instcombine -S			; RUN: opt < %s -loop-vectorize -force-vector-interleave=1 -force-vector-width=4 -dce -instcombine -S
				; RUN: opt < %s -loop-vectorize -force-vector-interleave=1 -force-vector-width=4 -enable-vplan-native-path -dce -instcombine -S

	target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-n8:16:32:64-S128"			target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-n8:16:32:64-S128"

	@a = common global i8 0, align 1			@a = common global i8 0, align 1
	@b = common global i8 0, align 1			@b = common global i8 0, align 1

	define void @f() nounwind uwtable ssp {			define void @f() nounwind uwtable ssp {
	scalar.ph:			scalar.ph:
	[21 lines hidden]

test/Transforms/LoopVectorize/if-conversion.ll

	; RUN: opt < %s -loop-vectorize -force-vector-interleave=1 -force-vector-width=4 -enable-if-conversion -dce -instcombine -S | FileCheck %s			; RUN: opt < %s -loop-vectorize -force-vector-interleave=1 -force-vector-width=4 -enable-if-conversion -dce -instcombine -S | FileCheck %s
				; RUN: opt < %s -loop-vectorize -force-vector-interleave=1 -force-vector-width=4 -enable-vplan-native-path -enable-if-conversion -dce -instcombine -S | FileCheck %s

	target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-n8:16:32:64-S128"			target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-n8:16:32:64-S128"

	; This is the loop in this example:			; This is the loop in this example:
	;			;
	;int function0(int a, int b, int start, int end) {			;int function0(int a, int b, int start, int end) {
	;			;
	; for (int i=start; i<end; ++i) {			; for (int i=start; i<end; ++i) {
	[188 lines hidden]

test/Transforms/LoopVectorize/increment.ll

				; RUN: opt < %s -loop-vectorize -force-vector-interleave=1 -force-vector-width=4 -enable-vplan-native-path -dce -instcombine -S | FileCheck %s
	; RUN: opt < %s -loop-vectorize -force-vector-interleave=1 -force-vector-width=4 -dce -instcombine -S | FileCheck %s			; RUN: opt < %s -loop-vectorize -force-vector-interleave=1 -force-vector-width=4 -dce -instcombine -S | FileCheck %s

	target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-n8:16:32:64-S128"			target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-n8:16:32:64-S128"

	@a = common global [2048 x i32] zeroinitializer, align 16			@a = common global [2048 x i32] zeroinitializer, align 16

	; This is the loop.			; This is the loop.
	; for (i=0; i<n; i++){			; for (i=0; i<n; i++){
	[57 lines hidden]

test/Transforms/LoopVectorize/induction.ll

	; RUN: opt < %s -loop-vectorize -force-vector-interleave=1 -force-vector-width=2 -S | FileCheck %s			; RUN: opt < %s -loop-vectorize -force-vector-interleave=1 -force-vector-width=2 -S | FileCheck %s
				; RUN: opt < %s -enable-vplan-native-path -loop-vectorize -force-vector-interleave=1 -force-vector-width=2 -S | FileCheck %s
	; RUN: opt < %s -loop-vectorize -force-vector-interleave=1 -force-vector-width=2 -instcombine -S | FileCheck %s --check-prefix=IND			; RUN: opt < %s -loop-vectorize -force-vector-interleave=1 -force-vector-width=2 -instcombine -S | FileCheck %s --check-prefix=IND
	; RUN: opt < %s -loop-vectorize -force-vector-interleave=2 -force-vector-width=2 -instcombine -S | FileCheck %s --check-prefix=UNROLL			; RUN: opt < %s -loop-vectorize -force-vector-interleave=2 -force-vector-width=2 -instcombine -S | FileCheck %s --check-prefix=UNROLL
	; RUN: opt < %s -loop-vectorize -force-vector-interleave=2 -force-vector-width=2 -S | FileCheck %s --check-prefix=UNROLL-NO-IC			; RUN: opt < %s -loop-vectorize -force-vector-interleave=2 -force-vector-width=2 -S | FileCheck %s --check-prefix=UNROLL-NO-IC
	; RUN: opt < %s -loop-vectorize -force-vector-interleave=2 -force-vector-width=4 -enable-interleaved-mem-accesses -instcombine -S | FileCheck %s --check-prefix=INTERLEAVE			; RUN: opt < %s -loop-vectorize -force-vector-interleave=2 -force-vector-width=4 -enable-interleaved-mem-accesses -instcombine -S | FileCheck %s --check-prefix=INTERLEAVE

	target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-n8:16:32:64-S128"			target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-n8:16:32:64-S128"

	; Make sure that we can handle multiple integer induction variables.			; Make sure that we can handle multiple integer induction variables.
	[887 lines hidden]

test/Transforms/LoopVectorize/induction_plus.ll

	; RUN: opt < %s -loop-vectorize -force-vector-interleave=1 -force-vector-width=4 -S | FileCheck %s			; RUN: opt < %s -loop-vectorize -force-vector-interleave=1 -force-vector-width=4 -S | FileCheck %s
				; RUN: opt < %s -enable-vplan-native-path -loop-vectorize -force-vector-interleave=1 -force-vector-width=4 -S | FileCheck %s

	target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-n8:16:32:64-S128"			target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-n8:16:32:64-S128"

	@array = common global [1024 x i32] zeroinitializer, align 16			@array = common global [1024 x i32] zeroinitializer, align 16

	;CHECK-LABEL: @array_at_plus_one(			;CHECK-LABEL: @array_at_plus_one(
	;CHECK: %index = phi i64 [ 0, %vector.ph ], [ %index.next, %vector.body ]			;CHECK: %index = phi i64 [ 0, %vector.ph ], [ %index.next, %vector.body ]
	;CHECK: %vec.ind = phi <4 x i64> [ <i64 0, i64 1, i64 2, i64 3>, %vector.ph ], [ %vec.ind.next, %vector.body ]			;CHECK: %vec.ind = phi <4 x i64> [ <i64 0, i64 1, i64 2, i64 3>, %vector.ph ], [ %vec.ind.next, %vector.body ]

test/Transforms/LoopVectorize/loop-scalars.ll

	; REQUIRES: asserts
	; RUN: opt < %s -loop-vectorize -force-vector-width=2 -force-vector-interleave=1 -instcombine -debug-only=loop-vectorize -disable-output -print-after=instcombine 2>&1 | FileCheck %s
+	; RUN: opt < %s -loop-vectorize -force-vector-width=2 -force-vector-interleave=1 -enable-vplan-native-path -instcombine -debug-only=loop-vectorize -disable-output -print-after=instcombine 2>&1 | FileCheck %s

	target datalayout = "e-m:e-i64:64-i128:128-n32:64-S128"

	; CHECK-LABEL: vector_gep
	; CHECK-NOT: LV: Found scalar instruction: %tmp0 = getelementptr inbounds i32, i32* %b, i64 %i
	; CHECK: vector.body:
	; CHECK-NEXT: [[INDEX:%.*]] = phi i64 [ 0, %vector.ph ], [ [[INDEX_NEXT:%.*]], %vector.body ]
	; CHECK-NEXT: [[VEC_IND:%.*]] = phi <2 x i64> [ <i64 0, i64 1>, %vector.ph ], [ [[VEC_IND_NEXT:%.*]], %vector.body ]

test/Transforms/LoopVectorize/minmax_reduction.ll

	; RUN: opt -S -loop-vectorize -dce -instcombine -force-vector-width=2 -force-vector-interleave=1 < %s | FileCheck %s
+	; RUN: opt -S -loop-vectorize -dce -instcombine -force-vector-width=2 -force-vector-interleave=1 -enable-vplan-native-path < %s | FileCheck %s

	target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-n8:16:32:64-S128"

	@A = common global [1024 x i32] zeroinitializer, align 16
	@fA = common global [1024 x float] zeroinitializer, align 16
	@dA = common global [1024 x double] zeroinitializer, align 16

	; Signed tests.


Revision Contents

Diff 149312
