This is an archive of the discontinued LLVM Phabricator instance.

Paths

- llvm/trunk/docs/Proposals/VectorizationPlan.rst
- llvm/trunk/lib/Transforms/Vectorize/LoopVectorize.cpp
- llvm/trunk/lib/Transforms/Vectorize/VPlan.h
- llvm/trunk/lib/Transforms/Vectorize/VPlan.cpp
- llvm/trunk/lib/Transforms/Vectorize/VPlanBuilder.h
- llvm/trunk/lib/Transforms/Vectorize/VPlanValue.h
- llvm/trunk/test/Transforms/LoopVectorize/if-conversion-nest.ll

Differential D38676

[LV] Model masking in VPlan, introducing VPInstructions
ClosedPublic

Authored by gilr on Oct 8 2017, 12:28 PM.


Details

Reviewers

mkuper
rengolin
mssimpso
aemerson
jdoerfert

Commits

rG8b9d1f3c5b65: [LV] Model masking in VPlan, introducing VPInstructions
rL318645: [LV] Model masking in VPlan, introducing VPInstructions

Summary

This patch adds a new abstraction layer to VPlan and leverages it to model masking in VPlan. Masking is essential to the vectorizer's predication process, but is currently an ad-hoc side-effect of the vectorized IR-generation stage. Modeling masking directly in VPlan facilitates moving predication from the IR-generation stage to the planning stage.

VPlan models the contents of VPBasicBlocks using Recipes, which currently expose neither their planned instructions nor the Def-Use relations between them. Modeling masking in VPlan requires VPlan to model Def-Use relations, specifically among the planned instructions that will generate, manipulate and consume masks.

The new VPValue and VPUser classes model how data flows into, through and out of a VPlan, forming the vertices of a planned Def-Use graph. The new VPInstruction class is a generic single-instruction Recipe that models a planned instruction along with its opcode, operands and users. See VectorizationPlan.rst for more details.
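A minimal, self-contained C++ sketch of the Def-Use bookkeeping described above: a value records the users that consume it, and a user (itself a value) records its operands, so adding an operand updates both ends of the edge. This is an illustration under assumptions, not the patch's VPlanValue.h; the class names follow the summary, while members such as `addOperand`, `getNumUsers` and `getUser` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

class VPUser;

// Sketch of a Def-Use vertex: a value knows which users consume it.
class VPValue {
  std::vector<VPUser *> Users;

public:
  virtual ~VPValue() = default;
  void addUser(VPUser &U) { Users.push_back(&U); }
  std::size_t getNumUsers() const { return Users.size(); }
  VPUser *getUser(std::size_t I) const { return Users[I]; }
};

// A user is itself a value (it may feed further users) and holds operands.
class VPUser : public VPValue {
  std::vector<VPValue *> Operands;

public:
  // Adding an operand records the edge on both endpoints.
  void addOperand(VPValue *Op) {
    Operands.push_back(Op);
    Op->addUser(*this);
  }
  std::size_t getNumOperands() const { return Operands.size(); }
  VPValue *getOperand(std::size_t I) const { return Operands[I]; }
};
```

For example, a NOT user of a condition, plus an AND user of that NOT and of the condition, leaves the condition with two recorded users and the NOT with one.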

With this patch, VPlan models as VPInstructions the planned instructions that manipulate masks (AND, OR, NOT), introduced during predication. Their operands are other VPInstructions (e.g., a NOT feeding an AND), or VPValues coming from other Recipes or from live-in values. Their users are other VPInstructions or VPUsers employed to retrieve the mask during Recipes' execution (e.g., for generating calls to masked load/store intrinsics). More concretely:

createBlockInMask(), createEdgeMask() were moved from ILV to the Planner. Their code is essentially unchanged, except they now generate VPValues instead of Values. They are now called by Planner and passed down to mask-aware Recipes during VPlan construction.

ILV::vectorizeMemoryInstruction() has so far generated the masks on-the-fly using createBlockInMask(). In this patch, VPWidenMemoryInstructionRecipe provides the masks to it as an optional argument.

VPBlendRecipe takes over non-loop phi's. It generates the same sequence of selects, only based on VPlan masks.

VPBranchOnMaskRecipe now takes a VPValue mask.
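The mask logic that moves to the Planner can be sketched in isolation. The sketch below assumes an acyclic loop body whose block 0 is the header, uses a null mask to mean all-true, and builds AND/OR/NOT nodes that stand in for VPInstructions. An edge mask is the source block's in-mask ANDed with the branch condition (negated on the false edge), while an unconditional branch simply forwards the source mask; a block's in-mask is the OR of its incoming edge masks. Both results are cached so the recursion does not re-derive shared sub-masks. `MaskNode`, `Block` and `MaskBuilder` are hypothetical; only the two method names echo the patch.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical mask node standing in for a mask-manipulating VPInstruction.
struct MaskNode {
  enum Kind { Cond, Not, And, Or } K;
  std::vector<const MaskNode *> Ops; // operands (empty for Cond)
  int CondId;                        // branch condition id, used by Cond only
};

// Hypothetical acyclic loop body; block 0 is the loop header.
struct Block {
  std::vector<int> Preds;
  int CondId = -1; // condition of a conditional branch, -1 if unconditional
  int TrueSucc = -1, FalseSucc = -1;
};

class MaskBuilder {
  const std::vector<Block> &CFG;
  std::vector<std::unique_ptr<MaskNode>> Nodes; // owns every created node
  std::map<std::pair<int, int>, const MaskNode *> EdgeMaskCache;
  std::map<int, const MaskNode *> BlockMaskCache;

  const MaskNode *make(MaskNode::Kind K, std::vector<const MaskNode *> Ops,
                       int CondId = -1) {
    Nodes.push_back(std::unique_ptr<MaskNode>(
        new MaskNode{K, std::move(Ops), CondId}));
    return Nodes.back().get();
  }

public:
  explicit MaskBuilder(const std::vector<Block> &G) : CFG(G) {}
  std::size_t numNodes() const { return Nodes.size(); }

  // Returns nullptr for an all-true mask, as for the header.
  const MaskNode *createBlockInMask(int BB) {
    auto It = BlockMaskCache.find(BB);
    if (It != BlockMaskCache.end())
      return It->second;
    const MaskNode *Mask = nullptr;
    if (BB != 0) {
      for (int Pred : CFG[BB].Preds) {
        const MaskNode *EdgeMask = createEdgeMask(Pred, BB);
        if (!EdgeMask) { // an all-true incoming edge makes the block all-true
          Mask = nullptr;
          break;
        }
        Mask = Mask ? make(MaskNode::Or, {Mask, EdgeMask}) : EdgeMask;
      }
    }
    BlockMaskCache[BB] = Mask;
    return Mask;
  }

  const MaskNode *createEdgeMask(int Src, int Dst) {
    auto Key = std::make_pair(Src, Dst);
    auto It = EdgeMaskCache.find(Key);
    if (It != EdgeMaskCache.end())
      return It->second;
    const MaskNode *SrcMask = createBlockInMask(Src);
    const MaskNode *EdgeMask = SrcMask; // unconditional branch: forward it
    const Block &B = CFG[Src];
    if (B.CondId >= 0) {
      EdgeMask = make(MaskNode::Cond, {}, B.CondId);
      if (B.FalseSucc == Dst)
        EdgeMask = make(MaskNode::Not, {EdgeMask});
      if (SrcMask)
        EdgeMask = make(MaskNode::And, {EdgeMask, SrcMask});
    }
    EdgeMaskCache[Key] = EdgeMask;
    return EdgeMask;
  }
};
```

On a diamond whose header branches on one condition, the join block's in-mask comes out as OR(c, NOT c); the sketch performs no simplification of such masks.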

Another aspect to notice is that, until VPValues fully replace the existing scalar-IR-based ValueMap mechanism, two Def-Use graphs effectively co-exist during vectorized code generation. The two graphs are wired together using
(a) Recipes that take VPValues and write to ValueMap, and conversely
(b) VPValues that model Values still handled in ValueMap. The ILVCallback provides the bridge from VPlan’s DataState to ValueMap.

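One way to picture the bridge in (b) is a per-part lookup that answers from VPlan's own state when a VPValue has been set(), and otherwise asks a callback standing in for the ILVCallback/ValueMap path. In the sketch below, `DataStateSketch`, its `set`/`get` members, and the use of strings for generated IR values are all assumptions chosen to keep it self-contained; it is not the patch's VPTransformState.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <utility>
#include <vector>

struct VPValue {}; // stand-in for a VPlan Def-Use vertex

// Stand-in for a generated IR value (an llvm::Value * in real code).
using IRValue = std::string;
using LegacyLookup = std::function<IRValue(const VPValue *, unsigned Part)>;

class DataStateSketch {
  unsigned UF; // unroll factor: number of parts per value
  std::map<const VPValue *, std::vector<IRValue>> PerPartOutput;
  LegacyLookup Callback; // plays the role of the ILVCallback/ValueMap bridge

public:
  DataStateSketch(unsigned UF, LegacyLookup CB)
      : UF(UF), Callback(std::move(CB)) {}

  // Record the IR generated for one part of a VPlan-level Def.
  void set(const VPValue *Def, IRValue V, unsigned Part) {
    assert(Part < UF && "part out of range");
    std::vector<IRValue> &Parts = PerPartOutput[Def];
    Parts.resize(UF);
    Parts[Part] = std::move(V);
  }

  // Pure getter: Defs that were never set() are still produced by the legacy
  // ValueMap machinery, so the request is forwarded to the callback.
  IRValue get(const VPValue *Def, unsigned Part) const {
    auto It = PerPartOutput.find(Def);
    if (It == PerPartOutput.end())
      return Callback(Def, Part);
    return It->second[Part];
  }
};
```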
This abstraction layer facilitates modeling additional Def-Use relations in VPlan, to support bringing additional transformations to the planning stage.

Ayal & Gil

Diff Detail

Repository: rL LLVM

Event Timeline

gilr created this revision. Oct 8 2017, 12:28 PM

I haven't been keeping up with this area as much as I'd have liked, but have some comments. Overall the approach seems ok to me.

lib/Transforms/Vectorize/LoopVectorize.cpp
7950 ↗	(On Diff #118173)	Can we use unique_ptr instead of a raw pointer to avoid repeating this?
8020 ↗	(On Diff #118173)	This can be combined into line below.
lib/Transforms/Vectorize/VPlan.cpp
210 ↗	(On Diff #118173)	Can you instead check if it's a BinaryOperator instead of having to repeat these opcodes?
lib/Transforms/Vectorize/VPlan.h
736 ↗	(On Diff #118173)	Why delete the value from the map but leave the key?
755 ↗	(On Diff #118173)	Just make this take a pointer?
756 ↗	(On Diff #118173)	This seems like it should be an assert instead? Why would a VPValue ever be created twice for a given Value and VPlan pair?
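The BinaryOperator suggestion works because LLVM numbers its binary-operator opcodes in one contiguous range, so `Instruction::isBinaryOp(Opcode)` is a bounds check rather than a list of cases. The self-contained sketch below shows the same idea with a hypothetical, abbreviated opcode enum; its numbering is not LLVM's.

```cpp
#include <cassert>

// Hypothetical, abbreviated opcode numbering: binary operators occupy one
// contiguous block delimited by Begin/End markers.
enum Opcode : unsigned {
  Ret = 1,
  Br,
  BinaryOpsBegin,
  Add = BinaryOpsBegin,
  Sub,
  Mul,
  And,
  Or,
  Xor,
  BinaryOpsEnd,
  Load = BinaryOpsEnd,
  Store,
  Select,
};

// One range check replaces a list of `case Add: case Sub: ...` labels.
constexpr bool isBinaryOp(unsigned Op) {
  return Op >= BinaryOpsBegin && Op < BinaryOpsEnd;
}

static_assert(isBinaryOp(And) && isBinaryOp(Xor), "bitwise ops are binary");
static_assert(!isBinaryOp(Select) && !isBinaryOp(Br), "not binary operators");
```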

aemerson added a reviewer: aemerson. Oct 8 2017, 3:02 PM

dcaballe added a subscriber: dcaballe. Oct 8 2017, 7:57 PM

Thanks, Amara!

lib/Transforms/Vectorize/VPlan.h
736 ↗	(On Diff #118173)	These are Values from the existing IR code - VPlan doesn't own them, just maintains a VPValue to represent them at the VPlan-level Def-Use graph.

Addressed comments.
Fixed bug: tryToWidenMemory() wasn't considering Cost Model's sink-scalar-operands scalarization decisions.

a.elovikov added a subscriber: a.elovikov. Oct 10 2017, 8:23 AM

aemerson added inline comments. Oct 11 2017, 3:39 AM

lib/Transforms/Vectorize/VPlan.h
736 ↗	(On Diff #118173)	I meant: why not erase() the entry from the map completely? Leaving the key in the map with a now-invalid pointer doesn't seem right, unless I'm missing something.

egarcia added a subscriber: egarcia. Oct 11 2017, 9:11 AM

Hi Ayal/Gil,

This is an interesting pattern and I welcome the docs changes with the code. But the patch is quite big and it's hard to understand what the actual change is and what is a prerequisite.

AFAICS, there are two main stages here:

- Groundwork to get the new constructs (VPlanValue/VPlanBuilder/VPlan changes) and necessary changes to LV and ILV.
- New recipes that use them (VPBlendRecipe/VPWidenMemoryInstructionRecipe) and necessary cleanups in LV/ILV as well as VPlan*.

I'd imagine this patch can be split into at least three blocks:

- New constructs + docs + (I)LV cleanups
- VPBlendRecipe + tests
- VPWidenMemoryInstructionRecipe + tests

but I wouldn't be surprised if you needed some generic cleanups before the first patch and after the last one.

I'm also not expecting a lot of new tests, given that this is just doing better what we already do.

Would it be possible to split the patch, so that we can review them in a more concise way?

Thanks!
--renato

In D38676#895867, @rengolin wrote:

Would it be possible to split the patch, so that we can review them in a more concise way?

Hi Renato,

The patch is admittedly quite massive despite our deliberate minimization efforts.
To avoid introducing dead code into LV we're adding the new VPlan constructs along with a single, well-bounded use case.
Splitting the masking code in LoopVectorize.cpp is challenging: once VPlan takes care of the masks it must rewire them to the relevant Recipes (which are few).

Most of the changes/additions to the VPlan* files are rather simple and technical. The changes in LoopVectorize.cpp are the more complex part and deserve a more elaborate explanation, to be appended to the patch summary:

createBlockInMask(), createEdgeMask(): these two methods moved from ILV to the Planner. Their code is essentially unchanged, except they now generate VPValues instead of Values. They are now called by Planner and passed down to mask-aware Recipes during VPlan construction.

ILV::vectorizeMemoryInstruction() was so far called from VPWidenRecipe and generated the masks on-the-fly using createBlockInMask(). In this patch it is called by the new VPWidenMemoryInstructionRecipe, which provides the masks as an optional argument.

VPBlendRecipe takes over non-loop phi's. It generates the same sequence of selects, only based on VPlan masks.

VPBranchOnMaskRecipe now takes a VPValue mask.
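The select sequence that VPBlendRecipe generates can be illustrated lane by lane. The sketch assumes that the masks of the incoming edges are mutually exclusive on the lanes that reach the block, treats the first incoming value as the default, and folds the remaining ones in with selects on their edge masks. `Lanes`, `laneSelect` and `blend` are hypothetical helpers, not LLVM APIs; in the generated IR each step would be a vector select instruction.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Lanes = std::vector<int>;      // one vector value, lane by lane
using MaskLanes = std::vector<bool>; // one mask, lane by lane

// Lane-wise select(M, T, F), the vector select each blend step would emit.
Lanes laneSelect(const MaskLanes &M, const Lanes &T, const Lanes &F) {
  assert(M.size() == T.size() && T.size() == F.size());
  Lanes R(F.size());
  for (std::size_t L = 0; L < R.size(); ++L)
    R[L] = M[L] ? T[L] : F[L];
  return R;
}

// Blend a phi's incoming values In[0..N-1]; EdgeMasks[I] guards In[I].
// In[0] acts as the default, so EdgeMasks[0] is not read:
//   select(M[N-1], In[N-1], ... select(M[1], In[1], In[0]) ...)
Lanes blend(const std::vector<Lanes> &In,
            const std::vector<MaskLanes> &EdgeMasks) {
  assert(!In.empty() && In.size() == EdgeMasks.size());
  Lanes Result = In[0];
  for (std::size_t I = 1; I < In.size(); ++I)
    Result = laneSelect(EdgeMasks[I], In[I], Result);
  return Result;
}
```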

lib/Transforms/Vectorize/VPlan.h
736 ↗	(On Diff #118173)	The invalid entries are destroyed immediately afterwards, on exiting the destructor, but it would indeed be good to add an explicit call to Value2VPValue.clear(), similar to LICM's runOnLoop(). Thanks!

gilr edited the summary of this revision. Oct 14 2017, 12:26 AM

Addressed comment: Added a call to Value2VPValue.clear() on ~VPlan().

A minor nit but LGTM. I don't have an aversion to "dead code" if it's going to be used in the near future, so perhaps to make the patch smaller split it into an initial patch to add VPlanValue.h and then the VPlanBuilder.h as well as the changes to the documentation. Those pieces seem uncontroversial to me, before moving onto the vectorizer. @mkuper does this make sense?

lib/Transforms/Vectorize/VPlan.cpp
33 ↗	(On Diff #119081)	shorter with `const auto*` here
lib/Transforms/Vectorize/VPlan.h
736 ↗	(On Diff #118173)	Ah I'd gotten mixed up when I was looking at this originally. I'm ok with the original code, sorry for noise, although maybe it makes sense to store them as unique_ptrs?
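Following the unique_ptr suggestion, a map that owns its VPValues would release them when the plan is destroyed, with no manual delete loop and no dangling entries left behind. The patch as discussed keeps raw pointers plus an explicit Value2VPValue.clear() in ~VPlan(); the sketch below only shows what the alternative could look like. `PlanSketch`, `addVPValue` and `getOrAddVPValue` are hypothetical names, and the assert reflects the earlier review point that a VPValue should never be created twice for the same Value.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>

struct Value {}; // stand-in for an IR Value that the plan does not own

struct VPValue { // plan-level vertex modelling a live-in Value
  const Value *Underlying;
  explicit VPValue(const Value *V) : Underlying(V) {}
};

class PlanSketch {
  // Keys point at IR owned elsewhere; the mapped VPValues are owned here, so
  // destroying the plan frees them without a manual delete loop.
  std::map<const Value *, std::unique_ptr<VPValue>> Value2VPValue;

public:
  VPValue *addVPValue(const Value *V) {
    assert(!Value2VPValue.count(V) && "VPValue already created for Value");
    std::unique_ptr<VPValue> &Slot = Value2VPValue[V];
    Slot = std::make_unique<VPValue>(V);
    return Slot.get();
  }

  VPValue *getOrAddVPValue(const Value *V) {
    auto It = Value2VPValue.find(V);
    return It != Value2VPValue.end() ? It->second.get() : addVPValue(V);
  }

  std::size_t size() const { return Value2VPValue.size(); }
};
```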

In D38676#898099, @aemerson wrote:

A minor nit but LGTM. I don't have an aversion to "dead code" if it's going to be used in the near future, so perhaps to make the patch smaller split it into an initial patch to add VPlanValue.h and then the VPlanBuilder.h as well as the changes to the documentation. Those pieces seem uncontroversial to me, before moving onto the vectorizer. @mkuper does this make sense?

That was basically my point. :)

In D38676#897413, @gilr wrote:

To avoid introducing dead code into LV we're adding the new VPlan constructs along with a single, well-bounded use case.

Earlier patches in a patch series are not considered dead code. If you submit a 2-patch series, it'd make it easier for us to understand what's needed to create the infrastructure and what its usage is. As it is, it's hard to know which code movement belongs to one or the other.

Splitting the masking code in LoopVectorize.cpp is challenging: once VPlan takes care of the masks it must rewire them to the relevant Recipes (which are few).

This could be part of the first patch.

Most of the changes/additions to the VPlan* files are rather simple and technical. The changes in LoopVectorize.cpp are the more complex part and deserve a more elaborate explanation, to be appended to the patch summary:

These could also be comments. There are a lot of comments in the vectoriser for that reason. :)

createBlockInMask(), createEdgeMask(): these two methods moved from ILV to the Planner. Their code is essentially unchanged, except they now generate VPValues instead of Values. They are now called by Planner and passed down to mask-aware Recipes during VPlan construction.

This looks like it's part of the first patch.

ILV::vectorizeMemoryInstruction() was so far called from VPWidenRecipe and generated the masks on-the-fly using createBlockInMask(). In this patch it is called by the new VPWidenMemoryInstructionRecipe, which provides the masks as an optional argument.

VPBlendRecipe takes over non-loop phi's. It generates the same sequence of selects, only based on VPlan masks.

VPBranchOnMaskRecipe now takes a VPValue mask.

These look like they're part of the second.

--renato

FYI: http://lists.llvm.org/pipermail/llvm-dev/2017-October/118241.html

In D38676#898099, @aemerson wrote:

A minor nit but LGTM. I don't have an aversion to "dead code" if it's going to be used in the near future, so perhaps to make the patch smaller split it into an initial patch to add VPlanValue.h and then the VPlanBuilder.h as well as the changes to the documentation. Those pieces seem uncontroversial to me, before moving onto the vectorizer. @mkuper does this make sense?

My $0.02:
Basically, I agree with @aemerson and @rengolin, at least to the extent that even if either way could work, it would be best to do it the way that ends up with simpler patches and will take less time (overall) to review well.

I don't think there's a problem of dead code, or "it's unclear how the infrastructure will be used", since you already have the code that actually uses it ready for review as well. So, it would probably be best to split it into two dependent patches, and post them for review separately. Then, the second one can be rebased on top of the first one, if it has any significant changes, when it goes in. Admittedly, it's more work, but I think it's worth it.

WDYT?

In D38676#899081, @mkuper wrote:

In D38676#898099, @aemerson wrote:

A minor nit but LGTM. I don't have an aversion to "dead code" if it's going to be used in the near future, so perhaps to make the patch smaller split it into an initial patch to add VPlanValue.h and then the VPlanBuilder.h as well as the changes to the documentation. Those pieces seem uncontroversial to me, before moving onto the vectorizer. @mkuper does this make sense?

My $0.02:
Basically, I agree with @aemerson and @rengolin, at least to the extent that even if either way could work, it would be best to do it the way that ends up with simpler patches and will take less time (overall) to review well.

I don't think there's a problem of dead code, or "it's unclear how the infrastructure will be used", since you already have the code that actually uses it ready for review as well. So, it would probably be best to split it into two dependent patches, and post them for review separately. Then, the second one can be rebased on top of the first one, if it has any significant changes, when it goes in. Admittedly, it's more work, but I think it's worth it.

WDYT?

Yes, small & digestible patches are ideal. So, as you guys are all Ok with introducing the VPInstruction infrastructure w/o its use, let's go with that.
One change we can probably peel off before, though, is to first introduce VPBlendRecipe and VPWidenMemoryInstruction w/o the new masking code. This should later narrow the diff in LoopVectorize.cpp and reflect the masking changes on those Recipes as well.

sguggill added a subscriber: sguggill. Oct 17 2017, 4:45 PM

In D38676#900051, @gilr wrote:

One change we can probably peel off before, though, is to first introduce VPBlendRecipe and VPWidenMemoryInstruction w/o the new masking code. This should later narrow the diff in LoopVectorize.cpp and reflect the masking changes on those Recipes as well.

This is now up for review as D39068.

RKSimon added a subscriber: RKSimon. Oct 19 2017, 7:38 AM

In D38676#898442, @rengolin wrote:

FYI: http://lists.llvm.org/pipermail/llvm-dev/2017-October/118241.html

Regarding the VPValue/VPUser/VPInstruction concept: input from an in-person chat with Hal Finkel and Chris Lattner before/during LLVM Dev Con (more precisely, this is my interpretation of the chat; if in doubt, please verify with them).

Both agreed with me that much of the code we have/write should, in theory, work on a CFG subgraph and on Instructions that aren't strictly attached to the Function (i.e., aren't an integral part of the IR state). Having said that, we all agreed that figuring out, with very high confidence, what works and what does not is a lot of work. As such, given the current limited scope of VPInstruction usage in VPlan, we don't see the point of investing a lot of effort in making much of the code that works on CFGs/Instructions truly usable when the CFG subgraph is detached from (or not yet attached to) the Function.

The design choice I'm advocating is that all of the vectorizer, up to cost modeling, should be implementable as a valid Analysis Pass. That means no IR Instructions left unattached (if we do that, verifiers will complain). As a result, if we were to use IR instructions instead of VPInstructions in VPlan, we would have to either clone the entire Function, or create a new Function and copy the loop (nest) of interest into it. I strongly disagree with such a hack, and hence want to move forward with the VPInstruction direction.

If we all agree to go that way, please continue to watch us in our effort to avoid copying/pasting as much as we can.

Thanks,
Hideki

In D38676#902541, @hsaito wrote:

The design choice I'm advocating is that all of the vectorizer, up to cost modeling, should be implementable as a valid Analysis Pass. That means no IR Instructions left unattached (if we do that, verifiers will complain). As a result, if we were to use IR instructions instead of VPInstructions in VPlan, we would have to either clone the entire Function, or create a new Function and copy the loop (nest) of interest into it. I strongly disagree with such a hack, and hence want to move forward with the VPInstruction direction.

Fully agreed! Thanks for the clarification.

--renato

rengolin added a child revision: D39068: [LV] Introduce VPBlendRecipe, VPWidenMemoryInstructionRecipe. Oct 28 2017, 10:37 AM

bollu added a subscriber: bollu. Oct 30 2017, 12:52 AM

fhahn added a subscriber: fhahn. Oct 31 2017, 2:21 AM

rengolin mentioned this in D39068: [LV] Introduce VPBlendRecipe, VPWidenMemoryInstructionRecipe. Nov 6 2017, 9:19 AM

Rebasing this patch on its preparatory patch D39068.

gilr mentioned this in rL318149: [LV] Introduce VPBlendRecipe, VPWidenMemoryInstructionRecipe. Nov 14 2017, 4:10 AM

rengolin added inline comments. Nov 15 2017, 1:00 PM

lib/Transforms/Vectorize/LoopVectorize.cpp
8374 ↗	(On Diff #121904)	So, `buildVPlans` will call `buildVPlan` for each VF, which means we'll call `new VPlan` multiple times, but now the `Plan` is a class global and will hold the last created plan's pointer. What is the purpose of this pointer?
lib/Transforms/Vectorize/VPlan.h
241 ↗	(On Diff #121904)	How is this saving the value in the `DenseMap PerPartOutput`?

I finished my review, and apart from my two final comments, everything looks fine.

Thanks for the hard work! :)

In D38676#929411, @rengolin wrote:

I finished my review, and apart from my two final comments, everything looks fine.

Thanks for the hard work! :)

Excellent. Will upload a revised version shortly.
Thanks a lot for reviewing this!

lib/Transforms/Vectorize/LoopVectorize.cpp
8374 ↗	(On Diff #121904)	The Plan class-global pointer is used for maintaining the current VPlan being constructed. But this indeed may be confusing - will instead propagate the Plan into the Planner's methods.
lib/Transforms/Vectorize/VPlan.h
241 ↗	(On Diff #121904)	This method is only a getter. Assigning Values to VPValues is done using set(). For Defs that are not yet fully ported to the VPInstruction framework (and have therefore not been set()), the call is delegated to the old ILV ValueMap mechanism. I'll try to improve this method's comments to express this more clearly.

Addressed review comments.

Great, LGTM now, thanks!

lib/Transforms/Vectorize/LoopVectorize.cpp
8374 ↗	(On Diff #121904)	Right, I see, it's the current plan. Let's keep it as an argument for now and try to find a better pattern later. Thanks!
lib/Transforms/Vectorize/VPlan.h
241 ↗	(On Diff #121904)	Ok, I thought this was a direct caching mechanism, now it makes sense.

This revision is now accepted and ready to land. Nov 19 2017, 9:49 AM

Closed by commit rL318645: [LV] Model masking in VPlan, introducing VPInstructions (authored by gilr). Nov 20 2017, 4:01 AM

This revision was automatically updated to reflect the committed changes.

XiaPZ added a subscriber: XiaPZ. Nov 13 2022, 5:53 AM

Herald added a reviewer: jdoerfert. Nov 13 2022, 5:53 AM

Herald added projects: Restricted Project, Restricted Project.

Herald added subscribers: pcwang-thead, tschuett, sstefan1 and 3 others.

Revision Contents (path: size)

- llvm/trunk/docs/Proposals/VectorizationPlan.rst: 69 lines
- llvm/trunk/lib/Transforms/Vectorize/ (five files): 343, 143, 74, 61 and 146 lines
- llvm/trunk/test/Transforms/LoopVectorize/if-conversion-nest.ll: 2 lines

Diff 123560

llvm/trunk/docs/Proposals/VectorizationPlan.rst

@@ 76 unchanged lines hidden @@ 5. Support vectorizing idioms, such as interleaved groups of strided loads or
 a "Recipe", which is responsible for computing its cost and generating its
 code.

 6. Encapsulate Single-Entry Single-Exit regions (SESE). During vectorization
 such regions may need to be, for example, predicated and linearized, or
 replicated VF*UF times to handle scalarized and predicated instructions.
 Innerloops are also modelled as SESE regions.

-Low-level Design
-================
+7. Support instruction-level analysis and transformation, as part of Planning
+Step 2.b: During vectorization instructions may need to be traversed, moved,
+replaced by other instructions or be created. For example, vector idiom
+detection and formation involves searching for and optimizing instruction
+patterns.
+
+Definitions
+===========
 The low-level design of VPlan comprises of the following classes.

 :LoopVectorizationPlanner:
 A LoopVectorizationPlanner is designed to handle the vectorization of a loop
 or a loop nest. It can construct, optimize and discard one or more VPlans,
 each VPlan modelling a distinct way to vectorize the loop or the loop nest.
 Once the best VPlan is determined, including the best VF and UF, this VPlan
 drives the generation of output IR.
@@ 39 unchanged lines hidden @@
 :VPRecipeBase:
 A pure-virtual base class modeling a sequence of one or more output IR
 instructions, possibly based on one or more input IR instructions. These
 input IR instructions are referred to as "Ingredients" of the Recipe. A Recipe
 may specify how its ingredients are to be transformed to produce the output IR
 instructions; e.g., cloned once, replicated multiple times or widened
 according to selected VF.

+:VPValue:
+The base of VPlan's def-use relations class hierarchy. When instantiated, it
+models a constant or a live-in Value in VPlan. It has users, which are of type
+VPUser, but no operands.
+
+:VPUser:
+A VPValue representing a general vertex in the def-use graph of VPlan. It has
+operands which are of type VPValue. When instantiated, it represents a
+live-out Instruction that exists outside VPlan. VPUser is similar in some
+aspects to LLVM's User class.
+
+:VPInstruction:
+A VPInstruction is both a VPRecipe and a VPUser. It models a single
+VPlan-level instruction to be generated if the VPlan is executed, including
+its opcode and possibly additional characteristics. It is the basis for
+writing instruction-level analyses and optimizations in VPlan as creating,
+replacing or moving VPInstructions record both def-use and scheduling
+decisions. VPInstructions also extend LLVM IR's opcodes with idiomatic
+operations that enrich the Vectorizer's semantics.
+
 :VPTransformState:
 Stores information used for generating output IR, passed from
 LoopVectorizationPlanner to its selected VPlan for execution, and used to pass
 additional information down to VPBlocks and VPRecipes.

+The Planning Process and VPlan Roadmap
+======================================
+
+Transforming the Loop Vectorizer to use VPlan follows a staged approach. First,
+VPlan is used to record the final vectorization decisions, and to execute them:
+the Hierarchical CFG models the planned control-flow, and Recipes capture
+decisions taken inside basic-blocks. Next, VPlan will be used also as the basis
+for taking these decisions, effectively turning them into a series of
+VPlan-to-VPlan algorithms. Finally, VPlan will support the planning process
+itself including cost-based analyses for making these decisions, to fully
+support compositional and iterative decision making.
+
+Some decisions are local to an instruction in the loop, such as whether to widen
+it into a vector instruction or replicate it, keeping the generated instructions
+in place. Other decisions, however, involve moving instructions, replacing them
+with other instructions, and/or introducing new instructions. For example, a
+cast may sink past a later instruction and be widened to handle first-order
+recurrence; an interleave group of strided gathers or scatters may effectively
+move to one place where they are replaced with shuffles and a common wide vector
+load or store; new instructions may be introduced to compute masks, shuffle the
+elements of vectors, and pack scalar values into vectors or vice-versa.
+
+In order for VPlan to support making instruction-level decisions and analyses,
+it needs to model the relevant instructions along with their def/use relations.
+This too follows a staged approach: first, the new instructions that compute
+masks are modeled as VPInstructions, along with their induced def/use subgraph.
+This effectively models masks in VPlan, facilitating VPlan-based predication.
+Next, the logic embedded within each Recipe for generating its instructions at
+VPlan execution time, will instead take part in the planning process by modeling
+them as VPInstructions. Finally, only logic that applies to instructions as a
+group will remain in Recipes, such as interleave groups and potentially other
+idiom groups having synergistic cost.
+
 Related LLVM components
 -----------------------
 1. SLP Vectorizer: one can compare the VPlan model with LLVM's existing SLP
 tree, where TSLP [3]_ adds Plan Step 2.b.

 2. RegionInfo: one can compare VPlan's H-CFG with the Region Analysis as used by
 Polly [7]_.

+3. Loop Vectorizer: the Vectorization Plan aims to upgrade the infrastructure of
+the Loop Vectorizer and extend it to handle outer loops [8,9]_.
+
 References
 ----------
 .. [1] "Outer-loop vectorization: revisited for short SIMD architectures", Dorit
 Nuzman and Ayal Zaks, PACT 2008.

 .. [2] "Proposal for function vectorization and loop vectorization with function
 calls", Xinmin Tian, [`cfe-dev
 <http://lists.llvm.org/pipermail/cfe-dev/2016-March/047732.html>`_].,
@@ 12 unchanged lines hidden @@
 .. [6] "Structural analysis: A new approach to flow analysis in optimizing
 compilers", M. Sharir, Journal of Computer Languages, Jan. 1980

 .. [7] "Enabling Polyhedral Optimizations in LLVM", Tobias Grosser, Diploma
 thesis, 2011.

 .. [8] "Introducing VPlan to the Loop Vectorizer", Gil Rapaport and Ayal Zaks,
 European LLVM Developers' Meeting 2017.

+.. [9] "Extending LoopVectorizer: OpenMP4.5 SIMD and Outer Loop
+Auto-Vectorization", Intel Vectorizer Team, LLVM Developers' Meeting 2016.

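The VPInstruction definition in this diff says VPInstructions extend LLVM IR's opcodes with idiomatic operations. One common way to do that without collisions is to number the extra opcodes past the last IR opcode, so a single unsigned field can hold either kind. The sketch below assumes a tiny stand-in IR opcode space (`ir::OtherOpsEnd` and the other names are hypothetical) and adds one VPlan-only opcode, `Not`, of the kind the masking code needs.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for the IR opcode space; real IR has many more
// opcodes, and only the position of the end marker matters here.
namespace ir {
enum : unsigned { And = 1, Or, Xor, OtherOpsEnd };
} // namespace ir

// VPlan-only opcodes are numbered past the last IR opcode, so one unsigned
// opcode field can hold either kind without the numberings colliding.
enum VPOnlyOpcode : unsigned { Not = ir::OtherOpsEnd + 1 };

bool isVPlanOnlyOpcode(unsigned Op) { return Op > ir::OtherOpsEnd; }

std::string opcodeName(unsigned Op) {
  switch (Op) {
  case ir::And:
    return "and";
  case ir::Or:
    return "or";
  case ir::Xor:
    return "xor";
  case Not: // no IR counterpart; lowered when the plan is executed
    return "not";
  default:
    return "unknown";
  }
}
```

When such a plan executes, a NOT has to be lowered to real IR; LLVM's IRBuilder::CreateNot, for instance, emits an xor with an all-ones value.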
llvm/trunk/lib/Transforms/Vectorize/LoopVectorize.cpp

This file is larger than 256 KB, so syntax highlighting is disabled by default.

Show First 20 Lines • Show All 42 Lines • ▼ Show 20 Lines
//		//
// S. Maleki, Y. Gao, M. Garzaran, T. Wong and D. Padua. An Evaluation of		// S. Maleki, Y. Gao, M. Garzaran, T. Wong and D. Padua. An Evaluation of
// Vectorizing Compilers.		// Vectorizing Compilers.
//		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

#include "llvm/Transforms/Vectorize/LoopVectorize.h"		#include "llvm/Transforms/Vectorize/LoopVectorize.h"
#include "VPlan.h"		#include "VPlan.h"
		#include "VPlanBuilder.h"
#include "llvm/ADT/APInt.h"		#include "llvm/ADT/APInt.h"
#include "llvm/ADT/ArrayRef.h"		#include "llvm/ADT/ArrayRef.h"
#include "llvm/ADT/DenseMap.h"		#include "llvm/ADT/DenseMap.h"
#include "llvm/ADT/DenseMapInfo.h"		#include "llvm/ADT/DenseMapInfo.h"
#include "llvm/ADT/Hashing.h"		#include "llvm/ADT/Hashing.h"
#include "llvm/ADT/MapVector.h"		#include "llvm/ADT/MapVector.h"
#include "llvm/ADT/None.h"		#include "llvm/ADT/None.h"
#include "llvm/ADT/Optional.h"		#include "llvm/ADT/Optional.h"
▲ Show 20 Lines • Show All 386 Lines • ▼ Show 20 Lines	public:
// Return true if any runtime check is added.		// Return true if any runtime check is added.
bool areSafetyChecksAdded() { return AddedSafetyChecks; }		bool areSafetyChecksAdded() { return AddedSafetyChecks; }

/// A type for vectorized values in the new loop. Each value from the		/// A type for vectorized values in the new loop. Each value from the
/// original loop, when vectorized, is represented by UF vector values in the		/// original loop, when vectorized, is represented by UF vector values in the
/// new unrolled loop, where UF is the unroll factor.		/// new unrolled loop, where UF is the unroll factor.
using VectorParts = SmallVector<Value *, 2>;		using VectorParts = SmallVector<Value *, 2>;

/// A helper function that computes the predicate of the block BB, assuming
/// that the header block of the loop is set to True. It returns the entry
/// mask for the block BB.
VectorParts createBlockInMask(BasicBlock *BB);

/// A helper function that computes the predicate of the edge between SRC
/// and DST.
VectorParts createEdgeMask(BasicBlock Src, BasicBlock Dst);

/// Vectorize a single PHINode in a block. This method handles the induction		/// Vectorize a single PHINode in a block. This method handles the induction
/// variable canonicalization. It supports both VF = 1 for unrolled loops and		/// variable canonicalization. It supports both VF = 1 for unrolled loops and
/// arbitrary length vectors.		/// arbitrary length vectors.
void widenPHIInstruction(Instruction *PN, unsigned UF, unsigned VF);		void widenPHIInstruction(Instruction *PN, unsigned UF, unsigned VF);

/// A helper function to scalarize a single Instruction in the innermost loop.		/// A helper function to scalarize a single Instruction in the innermost loop.
/// Generates a sequence of scalar instances for each lane between \p MinLane		/// Generates a sequence of scalar instances for each lane between \p MinLane
/// and \p MaxLane, times each part between \p MinPart and \p MaxPart,		/// and \p MaxLane, times each part between \p MinPart and \p MaxPart,
Show All 36 Lines	public:
Value getOrCreateScalarValue(Value V, const VPIteration &Instance);		Value getOrCreateScalarValue(Value V, const VPIteration &Instance);

/// Construct the vector value of a scalarized value \p V one lane at a time.		/// Construct the vector value of a scalarized value \p V one lane at a time.
void packScalarIntoVectorValue(Value *V, const VPIteration &Instance);		void packScalarIntoVectorValue(Value *V, const VPIteration &Instance);

/// Try to vectorize the interleaved access group that \p Instr belongs to.		/// Try to vectorize the interleaved access group that \p Instr belongs to.
void vectorizeInterleaveGroup(Instruction *Instr);		void vectorizeInterleaveGroup(Instruction *Instr);

/// Vectorize Load and Store instructions,		/// Vectorize Load and Store instructions, optionally masking the vector
virtual void vectorizeMemoryInstruction(Instruction *Instr);		/// operations if \p BlockInMask is non-null.
		void vectorizeMemoryInstruction(Instruction *Instr,
		VectorParts *BlockInMask = nullptr);

/// \brief Set the debug location in the builder using the debug location in		/// \brief Set the debug location in the builder using the debug location in
/// the instruction.		/// the instruction.
void setDebugLocFromInst(IRBuilder<> &B, const Value *Ptr);		void setDebugLocFromInst(IRBuilder<> &B, const Value *Ptr);

protected:		protected:
friend class LoopVectorizationPlanner;		friend class LoopVectorizationPlanner;

/// A small list of PHINodes.		/// A small list of PHINodes.
using PhiVector = SmallVector<PHINode *, 4>;		using PhiVector = SmallVector<PHINode *, 4>;

/// A type for scalarized values in the new loop. Each value from the		/// A type for scalarized values in the new loop. Each value from the
/// original loop, when scalarized, is represented by UF x VF scalar values		/// original loop, when scalarized, is represented by UF x VF scalar values
/// in the new unrolled loop, where UF is the unroll factor and VF is the		/// in the new unrolled loop, where UF is the unroll factor and VF is the
/// vectorization factor.		/// vectorization factor.
using ScalarParts = SmallVector<SmallVector<Value *, 4>, 2>;		using ScalarParts = SmallVector<SmallVector<Value *, 4>, 2>;
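The `ScalarParts` comment says that a scalarized value becomes UF x VF scalar copies. The sketch below shows that indexing. It assumes that lane `Lane` of unroll part `Part` stands for original iteration `IV + Part * VF + Lane`, which matches the `Part * VF` pointer offsets used later in `vectorizeMemoryInstruction`. `ScalarPartsTy` and `scalarizeIV` are hypothetical names, and the real container holds `Value *` entries, not integers.

```cpp
#include <cassert>
#include <vector>

// Hypothetical illustration (not LoopVectorize.cpp code) of the UF x VF
// layout: Parts[Part][Lane] holds the scalar copy for original iteration
// IV + Part * VF + Lane, where IV is the first original iteration covered
// by the current vector-loop iteration (assumed convention).
using ScalarPartsTy = std::vector<std::vector<long>>;

ScalarPartsTy scalarizeIV(long IV, unsigned UF, unsigned VF) {
  assert(UF > 0 && VF > 0 && "UF and VF must be positive");
  ScalarPartsTy Parts(UF, std::vector<long>(VF));
  for (unsigned Part = 0; Part < UF; ++Part)
    for (unsigned Lane = 0; Lane < VF; ++Lane)
      Parts[Part][Lane] = IV + static_cast<long>(Part * VF + Lane);
  return Parts;
}
```

For example, with `IV = 8`, `UF = 2` and `VF = 4`, part 1, lane 3 holds iteration 15.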

// When we if-convert we need to create edge masks. We have to cache values
// so that we don't end up with exponential recursion/IR.
using EdgeMaskCacheTy =
DenseMap<std::pair<BasicBlock *, BasicBlock *>, VectorParts>;
using BlockMaskCacheTy = DenseMap<BasicBlock *, VectorParts>;

/// Set up the values of the IVs correctly when exiting the vector loop.		/// Set up the values of the IVs correctly when exiting the vector loop.
void fixupIVUsers(PHINode *OrigPhi, const InductionDescriptor &II,		void fixupIVUsers(PHINode *OrigPhi, const InductionDescriptor &II,
Value *CountRoundDown, Value *EndValue,		Value *CountRoundDown, Value *EndValue,
BasicBlock *MiddleBlock);		BasicBlock *MiddleBlock);

/// Create a new induction variable inside L.		/// Create a new induction variable inside L.
PHINode *createInductionVariable(Loop *L, Value *Start, Value *End,		PHINode *createInductionVariable(Loop *L, Value *Start, Value *End,
Value *Step, Instruction *DL);		Value *Step, Instruction *DL);
// ...
/// vectorized loop. A key value can map to either vector values, scalar		/// vectorized loop. A key value can map to either vector values, scalar
/// values or both kinds of values, depending on whether the key was		/// values or both kinds of values, depending on whether the key was
/// vectorized and scalarized.		/// vectorized and scalarized.
VectorizerValueMap VectorLoopValueMap;		VectorizerValueMap VectorLoopValueMap;

/// Store instructions that were predicated.		/// Store instructions that were predicated.
SmallVector<Instruction *, 4> PredicatedInstructions;		SmallVector<Instruction *, 4> PredicatedInstructions;

EdgeMaskCacheTy EdgeMaskCache;
BlockMaskCacheTy BlockMaskCache;

/// Trip count of the original loop.		/// Trip count of the original loop.
Value *TripCount = nullptr;		Value *TripCount = nullptr;

/// Trip count of the widened loop (TripCount - TripCount % (VF*UF))		/// Trip count of the widened loop (TripCount - TripCount % (VF*UF))
Value *VectorTripCount = nullptr;		Value *VectorTripCount = nullptr;
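The `VectorTripCount` comment gives the formula `TripCount - TripCount % (VF*UF)`. The sketch below applies the same arithmetic to plain integers. `vectorTripCount` is a hypothetical helper; the vectorizer itself emits this computation as IR.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: the widened loop executes whole chunks of VF * UF
// original iterations; the leftover TripCount % (VF * UF) iterations are
// not covered by it.
std::uint64_t vectorTripCount(std::uint64_t TripCount, unsigned VF,
                              unsigned UF) {
  assert(VF > 0 && UF > 0 && "VF and UF must be positive");
  std::uint64_t Step = static_cast<std::uint64_t>(VF) * UF;
  return TripCount - TripCount % Step;
}
```

Take `TripCount = 103`, `VF = 4` and `UF = 2`. Each vector iteration covers 8 original iterations, so the widened loop runs 96 of them and leaves the remaining 7 to the scalar loop.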

/// The legality analysis.		/// The legality analysis.
LoopVectorizationLegality *Legal;		LoopVectorizationLegality *Legal;
// ... (within class LoopVectorizationPlanner)
const TargetTransformInfo *TTI;		const TargetTransformInfo *TTI;

/// The legality analysis.		/// The legality analysis.
LoopVectorizationLegality *Legal;		LoopVectorizationLegality *Legal;

/// The profitablity analysis.		/// The profitablity analysis.
LoopVectorizationCostModel &CM;		LoopVectorizationCostModel &CM;

SmallVector<std::unique_ptr<VPlan>, 4> VPlans;		using VPlanPtr = std::unique_ptr<VPlan>;

		SmallVector<VPlanPtr, 4> VPlans;

		/// This class is used to enable the VPlan to invoke a method of ILV. This is
		/// needed until the method is refactored out of ILV and becomes reusable.
		struct VPCallbackILV : public VPCallback {
		InnerLoopVectorizer &ILV;

		VPCallbackILV(InnerLoopVectorizer &ILV) : ILV(ILV) {}

		Value *getOrCreateVectorValues(Value *V, unsigned Part) override {
		return ILV.getOrCreateVectorValue(V, Part);
		}
		};
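`VPCallbackILV` lets VPlan call a method that still lives in `InnerLoopVectorizer` while depending only on an abstract callback interface. The sketch below is a hypothetical miniature of this adapter pattern. Strings stand in for `Value *`, and every type name is invented for illustration.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>

// Hypothetical callback interface: the "plan" side sees only this.
struct ToyCallback {
  virtual ~ToyCallback() = default;
  virtual std::string getOrCreateVectorValue(const std::string &V,
                                             unsigned Part) = 0;
};

// Hypothetical owner of the real functionality (stands in for the ILV).
struct ToyVectorizer {
  std::map<std::pair<std::string, unsigned>, std::string> Cache;
  std::string getOrCreateVectorValue(const std::string &V, unsigned Part) {
    auto Key = std::make_pair(V, Part);
    auto It = Cache.find(Key);
    if (It != Cache.end())
      return It->second; // reuse the value created earlier
    return Cache[Key] = V + ".vec" + std::to_string(Part);
  }
};

// Adapter that forwards interface calls to the owner, like VPCallbackILV.
struct ToyCallbackAdapter : ToyCallback {
  ToyVectorizer &TV;
  explicit ToyCallbackAdapter(ToyVectorizer &TV) : TV(TV) {}
  std::string getOrCreateVectorValue(const std::string &V,
                                     unsigned Part) override {
    return TV.getOrCreateVectorValue(V, Part);
  }
};

// Code on the "plan" side depends only on the interface.
std::string lookupThroughCallback(ToyCallback &CB, const std::string &V,
                                  unsigned Part) {
  return CB.getOrCreateVectorValue(V, Part);
}
```

With this design, the adapter can be deleted once the method is refactored out of the owner, and the plan-side code does not change.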

		/// A builder used to construct the current plan.
		VPBuilder Builder;

		/// When we if-convert we need to create edge masks. We have to cache values
		/// so that we don't end up with exponential recursion/IR. Note that
		/// if-conversion currently takes place during VPlan-construction, so these
		/// caches are only used at that stage.
		using EdgeMaskCacheTy =
		DenseMap<std::pair<BasicBlock *, BasicBlock *>, VPValue *>;
		using BlockMaskCacheTy = DenseMap<BasicBlock *, VPValue *>;
		EdgeMaskCacheTy EdgeMaskCache;
		BlockMaskCacheTy BlockMaskCache;

unsigned BestVF = 0;		unsigned BestVF = 0;
unsigned BestUF = 0;		unsigned BestUF = 0;

public:		public:
LoopVectorizationPlanner(Loop *L, LoopInfo *LI, const TargetLibraryInfo *TLI,		LoopVectorizationPlanner(Loop *L, LoopInfo *LI, const TargetLibraryInfo *TLI,
const TargetTransformInfo *TTI,		const TargetTransformInfo *TTI,
LoopVectorizationLegality *Legal,		LoopVectorizationLegality *Legal,
// ...
bool getDecisionAndClampRange(const std::function<bool(unsigned)> &Predicate,		bool getDecisionAndClampRange(const std::function<bool(unsigned)> &Predicate,
VFRange &Range);		VFRange &Range);

/// Build VPlans for power-of-2 VF's between \p MinVF and \p MaxVF inclusive,		/// Build VPlans for power-of-2 VF's between \p MinVF and \p MaxVF inclusive,
/// according to the information gathered by Legal when it checked if it is		/// according to the information gathered by Legal when it checked if it is
/// legal to vectorize the loop.		/// legal to vectorize the loop.
void buildVPlans(unsigned MinVF, unsigned MaxVF);		void buildVPlans(unsigned MinVF, unsigned MaxVF);
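The comments here say that a decision is taken at `Range.Start` and that `Range.End` may be decreased so the same decision holds across the range. The sketch below shows that idea as free functions. It mirrors the `getDecisionAndClampRange` signature shown above, but `VFRange` is a simplified stand-in and `partitionVFs` is a hypothetical driver that covers `MinVF..MaxVF` (inclusive, powers of 2) with one sub-range per would-be VPlan.

```cpp
#include <cassert>
#include <functional>
#include <vector>

// Simplified stand-in for VFRange: power-of-2 VFs in [Start, End).
struct VFRange {
  unsigned Start;
  unsigned End;
};

// Decide at Range.Start, then clamp Range.End to the first VF whose
// decision differs, so the returned decision holds across the whole range.
bool getDecisionAndClampRange(const std::function<bool(unsigned)> &Predicate,
                              VFRange &Range) {
  assert(Range.Start > 0 && Range.End > Range.Start && "Invalid range");
  bool DecisionAtStart = Predicate(Range.Start);
  for (unsigned VF = Range.Start * 2; VF < Range.End; VF *= 2)
    if (Predicate(VF) != DecisionAtStart) {
      Range.End = VF;
      break;
    }
  return DecisionAtStart;
}

// Hypothetical driver: split MinVF..MaxVF (inclusive) into sub-ranges on
// which Predicate is constant; each could share a single plan.
std::vector<VFRange> partitionVFs(unsigned MinVF, unsigned MaxVF,
                                  const std::function<bool(unsigned)> &Predicate) {
  std::vector<VFRange> Ranges;
  for (unsigned VF = MinVF; VF < MaxVF + 1;) {
    VFRange Range{VF, MaxVF + 1};
    getDecisionAndClampRange(Predicate, Range);
    Ranges.push_back(Range);
    VF = Range.End; // continue from the first VF not covered
  }
  return Ranges;
}
```

Consider `MinVF = 1`, `MaxVF = 16` and a predicate `VF < 4`. VFs 1 and 2 share one decision and VFs 4, 8 and 16 share the other, giving two sub-ranges.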

private:		private:
		/// A helper function that computes the predicate of the block BB, assuming
		/// that the header block of the loop is set to True. It returns the entry
		/// mask for the block BB.
		VPValue *createBlockInMask(BasicBlock *BB, VPlanPtr &Plan);

		/// A helper function that computes the predicate of the edge between SRC
		/// and DST.
		VPValue *createEdgeMask(BasicBlock *Src, BasicBlock *Dst, VPlanPtr &Plan);
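The helpers above compute block and edge masks, and the caches exist to avoid exponential recursion. The sketch below is a toy model of that computation. The header's mask is all-true. An edge mask is the source block's mask, ANDed with the branch condition (or its negation) when the source ends in a conditional branch. A block's entry mask is the OR of its incoming edge masks. Assumptions: masks are concrete per-lane `std::vector<bool>` values for one vector iteration, successor 0 is the branch's "true" target, and `Block`/`MaskBuilder` are hypothetical types. The planner instead builds NOT/AND/OR VPInstructions and uses a null `VPValue *` for the all-one mask.

```cpp
#include <cassert>
#include <map>
#include <utility>
#include <vector>

using Mask = std::vector<bool>; // one bit per lane

// Hypothetical CFG node: 1 successor (unconditional) or 2 (conditional,
// successor 0 taken where Cond is true).
struct Block {
  std::vector<int> Preds;
  std::vector<int> Succs;
  Mask Cond;
};

struct MaskBuilder {
  std::vector<Block> Blocks;
  int Header;
  unsigned VF;
  std::map<std::pair<int, int>, Mask> EdgeMaskCache; // memoized edge masks
  std::map<int, Mask> BlockMaskCache;                // memoized block masks

  MaskBuilder(std::vector<Block> Bs, int Header, unsigned VF)
      : Blocks(std::move(Bs)), Header(Header), VF(VF) {}

  Mask blockInMask(int BB) {
    auto It = BlockMaskCache.find(BB);
    if (It != BlockMaskCache.end())
      return It->second;
    if (BB == Header) // the header runs for every lane
      return BlockMaskCache[BB] = Mask(VF, true);
    Mask M(VF, false);
    for (int Pred : Blocks[BB].Preds) { // OR of incoming edge masks
      Mask E = edgeMask(Pred, BB);
      for (unsigned L = 0; L < VF; ++L)
        M[L] = M[L] || E[L];
    }
    return BlockMaskCache[BB] = M;
  }

  Mask edgeMask(int Src, int Dst) {
    auto Key = std::make_pair(Src, Dst);
    auto It = EdgeMaskCache.find(Key);
    if (It != EdgeMaskCache.end())
      return It->second;
    Mask M = blockInMask(Src);
    const Block &B = Blocks[Src];
    if (B.Succs.size() == 2) { // AND with Cond, or with NOT Cond
      bool OnTrue = B.Succs[0] == Dst;
      for (unsigned L = 0; L < VF; ++L)
        M[L] = M[L] && (B.Cond[L] == OnTrue);
    }
    return EdgeMaskCache[Key] = M;
  }
};
```

In a diamond (header 0 branching to 1 or 2, both joining at 3), blocks 1 and 2 receive complementary masks. The join block's mask is all-true again.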

/// Check if \I belongs to an Interleave Group within the given VF \p Range,		/// Check if \I belongs to an Interleave Group within the given VF \p Range,
/// \return true in the first returned value if so and false otherwise.		/// \return true in the first returned value if so and false otherwise.
/// Build a new VPInterleaveGroup Recipe if \I is the primary member of an IG		/// Build a new VPInterleaveGroup Recipe if \I is the primary member of an IG
/// for \p Range.Start, and provide it as the second returned value.		/// for \p Range.Start, and provide it as the second returned value.
/// Note that if \I is an adjunct member of an IG for \p Range.Start, the		/// Note that if \I is an adjunct member of an IG for \p Range.Start, the
/// \return value is <true, nullptr>, as it is handled by another recipe.		/// \return value is <true, nullptr>, as it is handled by another recipe.
/// \p Range.End may be decreased to ensure same decision from \p Range.Start		/// \p Range.End may be decreased to ensure same decision from \p Range.Start
/// to \p Range.End.		/// to \p Range.End.
VPInterleaveRecipe *tryToInterleaveMemory(Instruction *I, VFRange &Range);		VPInterleaveRecipe *tryToInterleaveMemory(Instruction *I, VFRange &Range);

// Check if \I is a memory instruction to be widened for \p Range.Start and		// Check if \I is a memory instruction to be widened for \p Range.Start and
// potentially masked.		// potentially masked. Such instructions are handled by a recipe that takes an
		// additional VPInstruction for the mask.
VPWidenMemoryInstructionRecipe *tryToWidenMemory(Instruction *I,		VPWidenMemoryInstructionRecipe *tryToWidenMemory(Instruction *I,
VFRange &Range);		VFRange &Range,
		VPlanPtr &Plan);

/// Check if an induction recipe should be constructed for \I within the given		/// Check if an induction recipe should be constructed for \I within the given
/// VF \p Range. If so build and return it. If not, return null. \p Range.End		/// VF \p Range. If so build and return it. If not, return null. \p Range.End
/// may be decreased to ensure same decision from \p Range.Start to		/// may be decreased to ensure same decision from \p Range.Start to
/// \p Range.End.		/// \p Range.End.
VPWidenIntOrFpInductionRecipe *tryToOptimizeInduction(Instruction *I,		VPWidenIntOrFpInductionRecipe *tryToOptimizeInduction(Instruction *I,
VFRange &Range);		VFRange &Range);

/// Handle non-loop phi nodes. Currently all such phi nodes are turned into		/// Handle non-loop phi nodes. Currently all such phi nodes are turned into
/// a sequence of select instructions as the vectorizer currently performs		/// a sequence of select instructions as the vectorizer currently performs
/// full if-conversion.		/// full if-conversion.
VPBlendRecipe *tryToBlend(Instruction *I);		VPBlendRecipe *tryToBlend(Instruction *I, VPlanPtr &Plan);

/// Check if \p I can be widened within the given VF \p Range. If \p I can be		/// Check if \p I can be widened within the given VF \p Range. If \p I can be
/// widened for \p Range.Start, check if the last recipe of \p VPBB can be		/// widened for \p Range.Start, check if the last recipe of \p VPBB can be
/// extended to include \p I or else build a new VPWidenRecipe for it and		/// extended to include \p I or else build a new VPWidenRecipe for it and
/// append it to \p VPBB. Return true if \p I can be widened for Range.Start,		/// append it to \p VPBB. Return true if \p I can be widened for Range.Start,
/// false otherwise. Range.End may be decreased to ensure same decision from		/// false otherwise. Range.End may be decreased to ensure same decision from
/// \p Range.Start to \p Range.End.		/// \p Range.Start to \p Range.End.
bool tryToWiden(Instruction *I, VPBasicBlock *VPBB, VFRange &Range);		bool tryToWiden(Instruction *I, VPBasicBlock *VPBB, VFRange &Range);

/// Build a VPReplicationRecipe for \p I and enclose it within a Region if it		/// Build a VPReplicationRecipe for \p I and enclose it within a Region if it
/// is predicated. \return \p VPBB augmented with this new recipe if \p I is		/// is predicated. \return \p VPBB augmented with this new recipe if \p I is
/// not predicated, otherwise \return a new VPBasicBlock that succeeds the new		/// not predicated, otherwise \return a new VPBasicBlock that succeeds the new
/// Region. Update the packing decision of predicated instructions if they		/// Region. Update the packing decision of predicated instructions if they
/// feed \p I. Range.End may be decreased to ensure same recipe behavior from		/// feed \p I. Range.End may be decreased to ensure same recipe behavior from
/// \p Range.Start to \p Range.End.		/// \p Range.Start to \p Range.End.
VPBasicBlock *handleReplication(		VPBasicBlock *handleReplication(
Instruction *I, VFRange &Range, VPBasicBlock *VPBB,		Instruction *I, VFRange &Range, VPBasicBlock *VPBB,
DenseMap<Instruction *, VPReplicateRecipe *> &PredInst2Recipe);		DenseMap<Instruction *, VPReplicateRecipe *> &PredInst2Recipe,
		VPlanPtr &Plan);

/// Create a replicating region for instruction \p I that requires		/// Create a replicating region for instruction \p I that requires
/// predication. \p PredRecipe is a VPReplicateRecipe holding \p I.		/// predication. \p PredRecipe is a VPReplicateRecipe holding \p I.
VPRegionBlock *createReplicateRegion(Instruction *I,		VPRegionBlock *createReplicateRegion(Instruction *I, VPRecipeBase *PredRecipe,
VPRecipeBase *PredRecipe);		VPlanPtr &Plan);

/// Build a VPlan according to the information gathered by Legal. \return a		/// Build a VPlan according to the information gathered by Legal. \return a
/// VPlan for vectorization factors \p Range.Start and up to \p Range.End		/// VPlan for vectorization factors \p Range.Start and up to \p Range.End
/// exclusive, possibly decreasing \p Range.End.		/// exclusive, possibly decreasing \p Range.End.
std::unique_ptr<VPlan> buildVPlan(VFRange &Range);		VPlanPtr buildVPlan(VFRange &Range,
		const SmallPtrSetImpl<Value *> &NeedDef);
};		};

} // end namespace llvm		} // end namespace llvm

namespace {		namespace {

/// \brief This holds vectorization requirements that must be verified late in		/// \brief This holds vectorization requirements that must be verified late in
/// the process. The requirements are set by legalize and costmodel. Once		/// the process. The requirements are set by legalize and costmodel. Once
// ...
Value *IVec = Builder.CreateShuffleVector(WideVec, UndefVec, IMask,		Value *IVec = Builder.CreateShuffleVector(WideVec, UndefVec, IMask,
"interleaved.vec");		"interleaved.vec");

Instruction *NewStoreInstr =		Instruction *NewStoreInstr =
Builder.CreateAlignedStore(IVec, NewPtrs[Part], Group->getAlignment());		Builder.CreateAlignedStore(IVec, NewPtrs[Part], Group->getAlignment());
addMetadata(NewStoreInstr, Instr);		addMetadata(NewStoreInstr, Instr);
}		}
}		}

void InnerLoopVectorizer::vectorizeMemoryInstruction(Instruction *Instr) {		void InnerLoopVectorizer::vectorizeMemoryInstruction(Instruction *Instr,
		VectorParts *BlockInMask) {
// Attempt to issue a wide load.		// Attempt to issue a wide load.
LoadInst *LI = dyn_cast<LoadInst>(Instr);		LoadInst *LI = dyn_cast<LoadInst>(Instr);
StoreInst *SI = dyn_cast<StoreInst>(Instr);		StoreInst *SI = dyn_cast<StoreInst>(Instr);

assert((LI || SI) && "Invalid Load/Store instruction");		assert((LI || SI) && "Invalid Load/Store instruction");

LoopVectorizationCostModel::InstWidening Decision =		LoopVectorizationCostModel::InstWidening Decision =
Cost->getWideningDecision(Instr, VF);		Cost->getWideningDecision(Instr, VF);
// ...
// gather/scatter. Otherwise Decision should have been to Scalarize.		// gather/scatter. Otherwise Decision should have been to Scalarize.
assert((ConsecutiveStride || CreateGatherScatter) &&		assert((ConsecutiveStride || CreateGatherScatter) &&
"The instruction should be scalarized");		"The instruction should be scalarized");

// Handle consecutive loads/stores.		// Handle consecutive loads/stores.
if (ConsecutiveStride)		if (ConsecutiveStride)
Ptr = getOrCreateScalarValue(Ptr, {0, 0});		Ptr = getOrCreateScalarValue(Ptr, {0, 0});

VectorParts Mask = createBlockInMask(Instr->getParent());		VectorParts Mask;
		bool isMaskRequired = BlockInMask;
		if (isMaskRequired)
		Mask = *BlockInMask;

// Handle Stores:		// Handle Stores:
if (SI) {		if (SI) {
assert(!Legal->isUniform(SI->getPointerOperand()) &&		assert(!Legal->isUniform(SI->getPointerOperand()) &&
"We do not allow storing to uniform addresses");		"We do not allow storing to uniform addresses");
setDebugLocFromInst(Builder, SI);		setDebugLocFromInst(Builder, SI);

for (unsigned Part = 0; Part < UF; ++Part) {		for (unsigned Part = 0; Part < UF; ++Part) {
Instruction *NewSI = nullptr;		Instruction *NewSI = nullptr;
Value *StoredVal = getOrCreateVectorValue(SI->getValueOperand(), Part);		Value *StoredVal = getOrCreateVectorValue(SI->getValueOperand(), Part);
if (CreateGatherScatter) {		if (CreateGatherScatter) {
Value *MaskPart = Legal->isMaskRequired(SI) ? Mask[Part] : nullptr;		Value *MaskPart = isMaskRequired ? Mask[Part] : nullptr;
Value *VectorGep = getOrCreateVectorValue(Ptr, Part);		Value *VectorGep = getOrCreateVectorValue(Ptr, Part);
NewSI = Builder.CreateMaskedScatter(StoredVal, VectorGep, Alignment,		NewSI = Builder.CreateMaskedScatter(StoredVal, VectorGep, Alignment,
MaskPart);		MaskPart);
} else {		} else {
// Calculate the pointer for the specific unroll-part.		// Calculate the pointer for the specific unroll-part.
Value *PartPtr =		Value *PartPtr =
Builder.CreateGEP(nullptr, Ptr, Builder.getInt32(Part * VF));		Builder.CreateGEP(nullptr, Ptr, Builder.getInt32(Part * VF));

if (Reverse) {		if (Reverse) {
// If we store to reverse consecutive memory locations, then we need		// If we store to reverse consecutive memory locations, then we need
// to reverse the order of elements in the stored value.		// to reverse the order of elements in the stored value.
StoredVal = reverseVector(StoredVal);		StoredVal = reverseVector(StoredVal);
// We don't want to update the value in the map as it might be used in		// We don't want to update the value in the map as it might be used in
// another expression. So don't call resetVectorValue(StoredVal).		// another expression. So don't call resetVectorValue(StoredVal).

// If the address is consecutive but reversed, then the		// If the address is consecutive but reversed, then the
// wide store needs to start at the last vector element.		// wide store needs to start at the last vector element.
PartPtr =		PartPtr =
Builder.CreateGEP(nullptr, Ptr, Builder.getInt32(-Part * VF));		Builder.CreateGEP(nullptr, Ptr, Builder.getInt32(-Part * VF));
PartPtr =		PartPtr =
Builder.CreateGEP(nullptr, PartPtr, Builder.getInt32(1 - VF));		Builder.CreateGEP(nullptr, PartPtr, Builder.getInt32(1 - VF));
if (Mask[Part]) // The reverse of a null all-one mask is a null mask.		if (isMaskRequired) // Reverse of a null all-one mask is a null mask.
Mask[Part] = reverseVector(Mask[Part]);		Mask[Part] = reverseVector(Mask[Part]);
}		}

Value *VecPtr =		Value *VecPtr =
Builder.CreateBitCast(PartPtr, DataTy->getPointerTo(AddressSpace));		Builder.CreateBitCast(PartPtr, DataTy->getPointerTo(AddressSpace));

if (Legal->isMaskRequired(SI) && Mask[Part])		if (isMaskRequired)
NewSI = Builder.CreateMaskedStore(StoredVal, VecPtr, Alignment,		NewSI = Builder.CreateMaskedStore(StoredVal, VecPtr, Alignment,
Mask[Part]);		Mask[Part]);
else		else
NewSI = Builder.CreateAlignedStore(StoredVal, VecPtr, Alignment);		NewSI = Builder.CreateAlignedStore(StoredVal, VecPtr, Alignment);
}		}
addMetadata(NewSI, SI);		addMetadata(NewSI, SI);
}		}
return;		return;
}		}

// Handle loads.		// Handle loads.
assert(LI && "Must have a load instruction");		assert(LI && "Must have a load instruction");
setDebugLocFromInst(Builder, LI);		setDebugLocFromInst(Builder, LI);
for (unsigned Part = 0; Part < UF; ++Part) {		for (unsigned Part = 0; Part < UF; ++Part) {
Value *NewLI;		Value *NewLI;
if (CreateGatherScatter) {		if (CreateGatherScatter) {
Value *MaskPart = Legal->isMaskRequired(LI) ? Mask[Part] : nullptr;		Value *MaskPart = isMaskRequired ? Mask[Part] : nullptr;
Value *VectorGep = getOrCreateVectorValue(Ptr, Part);		Value *VectorGep = getOrCreateVectorValue(Ptr, Part);
NewLI = Builder.CreateMaskedGather(VectorGep, Alignment, MaskPart,		NewLI = Builder.CreateMaskedGather(VectorGep, Alignment, MaskPart,
nullptr, "wide.masked.gather");		nullptr, "wide.masked.gather");
addMetadata(NewLI, LI);		addMetadata(NewLI, LI);
} else {		} else {
// Calculate the pointer for the specific unroll-part.		// Calculate the pointer for the specific unroll-part.
Value *PartPtr =		Value *PartPtr =
Builder.CreateGEP(nullptr, Ptr, Builder.getInt32(Part * VF));		Builder.CreateGEP(nullptr, Ptr, Builder.getInt32(Part * VF));

if (Reverse) {		if (Reverse) {
// If the address is consecutive but reversed, then the		// If the address is consecutive but reversed, then the
// wide load needs to start at the last vector element.		// wide load needs to start at the last vector element.
PartPtr = Builder.CreateGEP(nullptr, Ptr, Builder.getInt32(-Part * VF));		PartPtr = Builder.CreateGEP(nullptr, Ptr, Builder.getInt32(-Part * VF));
PartPtr = Builder.CreateGEP(nullptr, PartPtr, Builder.getInt32(1 - VF));		PartPtr = Builder.CreateGEP(nullptr, PartPtr, Builder.getInt32(1 - VF));
if (Mask[Part]) // The reverse of a null all-one mask is a null mask.		if (isMaskRequired) // Reverse of a null all-one mask is a null mask.
Mask[Part] = reverseVector(Mask[Part]);		Mask[Part] = reverseVector(Mask[Part]);
}		}

Value *VecPtr =		Value *VecPtr =
Builder.CreateBitCast(PartPtr, DataTy->getPointerTo(AddressSpace));		Builder.CreateBitCast(PartPtr, DataTy->getPointerTo(AddressSpace));
if (Legal->isMaskRequired(LI) && Mask[Part])		if (isMaskRequired)
NewLI = Builder.CreateMaskedLoad(VecPtr, Alignment, Mask[Part],		NewLI = Builder.CreateMaskedLoad(VecPtr, Alignment, Mask[Part],
UndefValue::get(DataTy),		UndefValue::get(DataTy),
"wide.masked.load");		"wide.masked.load");
else		else
NewLI = Builder.CreateAlignedLoad(VecPtr, Alignment, "wide.load");		NewLI = Builder.CreateAlignedLoad(VecPtr, Alignment, "wide.load");

// Add metadata to the load, but setVectorValue to the reverse shuffle.		// Add metadata to the load, but setVectorValue to the reverse shuffle.
addMetadata(NewLI, LI);		addMetadata(NewLI, LI);
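In the reverse case above, each unroll part starts at `Ptr - Part * VF + (1 - VF)`, and both the data and the mask are lane-reversed. The new code masks only when `BlockInMask` is non-null. The sketch below models those per-lane semantics. Memory is a `std::vector<int>` and indices replace pointers. `maskedStore`, `maskedLoad`, `reverseLanes` and `reversePartBase` are hypothetical helpers, not LLVM APIs, and `PassThru` merely stands in for the `UndefValue` pass-through operand.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Masked wide store: lane L writes Mem[Base + L] only if Mask[L] is set.
void maskedStore(std::vector<int> &Mem, std::size_t Base,
                 const std::vector<int> &Val, const std::vector<bool> &Mask) {
  assert(Val.size() == Mask.size() && Base + Val.size() <= Mem.size());
  for (std::size_t L = 0; L < Val.size(); ++L)
    if (Mask[L])
      Mem[Base + L] = Val[L];
}

// Masked wide load: disabled lanes yield PassThru instead of memory.
std::vector<int> maskedLoad(const std::vector<int> &Mem, std::size_t Base,
                            const std::vector<bool> &Mask, int PassThru) {
  assert(Base + Mask.size() <= Mem.size());
  std::vector<int> Out(Mask.size(), PassThru);
  for (std::size_t L = 0; L < Mask.size(); ++L)
    if (Mask[L])
      Out[L] = Mem[Base + L];
  return Out;
}

// Lane reversal, applied to both data and mask in the reverse case.
template <typename T> std::vector<T> reverseLanes(const std::vector<T> &V) {
  return std::vector<T>(V.rbegin(), V.rend());
}

// Lowest index touched by unroll part Part of a reverse-consecutive access
// whose lane 0 is at index Ptr: Ptr - Part * VF + (1 - VF).
long reversePartBase(long Ptr, unsigned Part, unsigned VF) {
  return Ptr - static_cast<long>(Part * VF) + (1 - static_cast<long>(VF));
}
```

Consider `Ptr = 7` and `VF = 4`. Part 0 covers indices 4..7 and part 1 covers 0..3. Lane 0 (iteration 7) lands at index 7 only because the value and the mask are both reversed before the wide access.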
// ...
// need to iterate.		// need to iterate.
Changed = true;		Changed = true;
}		}
} while (Changed);		} while (Changed);
}		}

void InnerLoopVectorizer::widenPHIInstruction(Instruction *PN, unsigned UF,		void InnerLoopVectorizer::widenPHIInstruction(Instruction *PN, unsigned UF,
unsigned VF) {		unsigned VF) {
		assert(PN->getParent() == OrigLoop->getHeader() &&
		"Non-header phis should have been handled elsewhere");

PHINode *P = cast<PHINode>(PN);		PHINode *P = cast<PHINode>(PN);
// In order to support recurrences we need to be able to vectorize Phi nodes.		// In order to support recurrences we need to be able to vectorize Phi nodes.
// Phi nodes have cycles, so we need to vectorize them in two stages. This is		// Phi nodes have cycles, so we need to vectorize them in two stages. This is
// stage #1: We create a new vector PHI node with no incoming edges. We'll use		// stage #1: We create a new vector PHI node with no incoming edges. We'll use
// this value when we vectorize all of the instructions that use the PHI.		// this value when we vectorize all of the instructions that use the PHI.
if (Legal->isReductionVariable(P) || Legal->isFirstOrderRecurrence(P)) {		if (Legal->isReductionVariable(P) || Legal->isFirstOrderRecurrence(P)) {
for (unsigned Part = 0; Part < UF; ++Part) {		for (unsigned Part = 0; Part < UF; ++Part) {
// This is phase one of vectorizing PHIs.		// This is phase one of vectorizing PHIs.
// ... (within LoopVectorizationPlanner::plan)
return CM.selectVectorizationFactor(MaxVF);		return CM.selectVectorizationFactor(MaxVF);
}		}

void LoopVectorizationPlanner::setBestPlan(unsigned VF, unsigned UF) {		void LoopVectorizationPlanner::setBestPlan(unsigned VF, unsigned UF) {
DEBUG(dbgs() << "Setting best plan to VF=" << VF << ", UF=" << UF << '\n');		DEBUG(dbgs() << "Setting best plan to VF=" << VF << ", UF=" << UF << '\n');
BestVF = VF;		BestVF = VF;
BestUF = UF;		BestUF = UF;

erase_if(VPlans, [VF](const std::unique_ptr<VPlan> &Plan) {		erase_if(VPlans, [VF](const VPlanPtr &Plan) {
return !Plan->hasVF(VF);		return !Plan->hasVF(VF);
});		});
assert(VPlans.size() == 1 && "Best VF has not a single VPlan.");		assert(VPlans.size() == 1 && "Best VF has not a single VPlan.");
}		}
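`setBestPlan` uses `erase_if` to discard every VPlan that does not cover the chosen VF, then asserts that exactly one remains. `llvm::erase_if` is essentially a wrapper around the erase-remove idiom. The sketch below is the standard-library equivalent, written against a hypothetical `ToyPlan` that only records the VFs it was built for.

```cpp
#include <algorithm>
#include <cassert>
#include <memory>
#include <set>
#include <vector>

// Hypothetical stand-in for VPlan: only the set of VFs it covers.
struct ToyPlan {
  std::set<unsigned> VFs;
  bool hasVF(unsigned VF) const { return VFs.count(VF) != 0; }
};
using ToyPlanPtr = std::unique_ptr<ToyPlan>;

// Erase-remove idiom: move the plans to keep to the front, then erase the
// tail holding every plan that does not cover VF.
void keepPlansForVF(std::vector<ToyPlanPtr> &Plans, unsigned VF) {
  Plans.erase(std::remove_if(Plans.begin(), Plans.end(),
                             [VF](const ToyPlanPtr &P) { return !P->hasVF(VF); }),
              Plans.end());
}
```

Because each VF belongs to exactly one sub-range, and so to one plan, a single plan should survive. That is what the assert in `setBestPlan` checks.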

void LoopVectorizationPlanner::executePlan(InnerLoopVectorizer &ILV,		void LoopVectorizationPlanner::executePlan(InnerLoopVectorizer &ILV,
DominatorTree *DT) {		DominatorTree *DT) {
// Perform the actual loop transformation.		// Perform the actual loop transformation.

// 1. Create a new empty loop. Unlink the old loop and connect the new one.		// 1. Create a new empty loop. Unlink the old loop and connect the new one.
VPTransformState State{		VPCallbackILV CallbackILV(ILV);
BestVF, BestUF, LI, DT, ILV.Builder, ILV.VectorLoopValueMap, &ILV};
		VPTransformState State{BestVF, BestUF, LI,
		DT, ILV.Builder, ILV.VectorLoopValueMap,
		&ILV, CallbackILV};
State.CFG.PrevBB = ILV.createVectorizedLoopSkeleton();		State.CFG.PrevBB = ILV.createVectorizedLoopSkeleton();

//===------------------------------------------------===//		//===------------------------------------------------===//
//		//
// Notice: any optimization or new instruction that go		// Notice: any optimization or new instruction that go
// into the code below should also be implemented in		// into the code below should also be implemented in
// the cost-model.		// the cost-model.
//		//
// ...
};		};

/// A recipe for vectorizing a phi-node as a sequence of mask-based select		/// A recipe for vectorizing a phi-node as a sequence of mask-based select
/// instructions.		/// instructions.
class VPBlendRecipe : public VPRecipeBase {		class VPBlendRecipe : public VPRecipeBase {
private:		private:
PHINode *Phi;		PHINode *Phi;

		/// The blend operation is a User of a mask, if not null.
		std::unique_ptr<VPUser> User;

public:		public:
VPBlendRecipe(PHINode *Phi) : VPRecipeBase(VPBlendSC), Phi(Phi) {}		VPBlendRecipe(PHINode *Phi, ArrayRef<VPValue *> Masks)
		: VPRecipeBase(VPBlendSC), Phi(Phi) {
		assert((Phi->getNumIncomingValues() == 1 ||
		Phi->getNumIncomingValues() == Masks.size()) &&
		"Expected the same number of incoming values and masks");
		if (!Masks.empty())
		User.reset(new VPUser(Masks));
		}

/// Method to support type inquiry through isa, cast, and dyn_cast.		/// Method to support type inquiry through isa, cast, and dyn_cast.
static inline bool classof(const VPRecipeBase *V) {		static inline bool classof(const VPRecipeBase *V) {
return V->getVPRecipeID() == VPRecipeBase::VPBlendSC;		return V->getVPRecipeID() == VPRecipeBase::VPBlendSC;
}		}

/// Generate the phi/select nodes.		/// Generate the phi/select nodes.
void execute(VPTransformState &State) override {		void execute(VPTransformState &State) override {
State.ILV->setDebugLocFromInst(State.Builder, Phi);		State.ILV->setDebugLocFromInst(State.Builder, Phi);
// We know that all PHIs in non-header blocks are converted into		// We know that all PHIs in non-header blocks are converted into
// selects, so we don't have to worry about the insertion order and we		// selects, so we don't have to worry about the insertion order and we
// can just use the builder.		// can just use the builder.
// At this point we generate the predication tree. There may be		// At this point we generate the predication tree. There may be
// duplications since this is a simple recursive scan, but future		// duplications since this is a simple recursive scan, but future
// optimizations will clean it up.		// optimizations will clean it up.

unsigned NumIncoming = Phi->getNumIncomingValues();		unsigned NumIncoming = Phi->getNumIncomingValues();

		assert((User || NumIncoming == 1) &&
		"Multiple predecessors with predecessors having a full mask");
// Generate a sequence of selects of the form:		// Generate a sequence of selects of the form:
// SELECT(Mask3, In3,		// SELECT(Mask3, In3,
// SELECT(Mask2, In2,		// SELECT(Mask2, In2,
// ( ...)))		// ( ...)))
InnerLoopVectorizer::VectorParts Entry(State.UF);		InnerLoopVectorizer::VectorParts Entry(State.UF);
for (unsigned In = 0; In < NumIncoming; In++) {		for (unsigned In = 0; In < NumIncoming; ++In) {
InnerLoopVectorizer::VectorParts Cond =
State.ILV->createEdgeMask(Phi->getIncomingBlock(In), Phi->getParent());

for (unsigned Part = 0; Part < State.UF; ++Part) {		for (unsigned Part = 0; Part < State.UF; ++Part) {
		// We might have single edge PHIs (blocks) - use an identity
		// 'select' for the first PHI operand.
Value *In0 =		Value *In0 =
State.ILV->getOrCreateVectorValue(Phi->getIncomingValue(In), Part);		State.ILV->getOrCreateVectorValue(Phi->getIncomingValue(In), Part);
assert((Cond[Part] || NumIncoming == 1) &&
"Multiple predecessors with one predecessor having a full mask");
if (In == 0)		if (In == 0)
Entry[Part] = In0; // Initialize with the first incoming value.		Entry[Part] = In0; // Initialize with the first incoming value.
else		else {
// Select between the current value and the previous incoming edge		// Select between the current value and the previous incoming edge
// based on the incoming mask.		// based on the incoming mask.
Entry[Part] = State.Builder.CreateSelect(Cond[Part], In0, Entry[Part],		Value *Cond = State.get(User->getOperand(In), Part);
"predphi");		Entry[Part] =
		State.Builder.CreateSelect(Cond, In0, Entry[Part], "predphi");
		}
}		}
}		}
for (unsigned Part = 0; Part < State.UF; ++Part)		for (unsigned Part = 0; Part < State.UF; ++Part)
State.ValueMap.setVectorValue(Phi, Part, Entry[Part]);		State.ValueMap.setVectorValue(Phi, Part, Entry[Part]);
}		}

/// Print the recipe.		/// Print the recipe.
void print(raw_ostream &O, const Twine &Indent) const override {		void print(raw_ostream &O, const Twine &Indent) const override {
O << " +\n" << Indent << "\"BLEND ";		O << " +\n" << Indent << "\"BLEND ";
Phi->printAsOperand(O, false);		Phi->printAsOperand(O, false);
O << " =";		O << " =";
if (Phi->getNumIncomingValues() == 1) {		if (!User) {
// Not a User of any mask: not really blending, this is a		// Not a User of any mask: not really blending, this is a
// single-predecessor phi.		// single-predecessor phi.
O << " ";		O << " ";
Phi->getIncomingValue(0)->printAsOperand(O, false);		Phi->getIncomingValue(0)->printAsOperand(O, false);
} else {		} else {
for (unsigned I = 0, E = Phi->getNumIncomingValues(); I < E; ++I) {		for (unsigned I = 0, E = User->getNumOperands(); I < E; ++I) {
O << " ";		O << " ";
Phi->getIncomingValue(I)->printAsOperand(O, false);		Phi->getIncomingValue(I)->printAsOperand(O, false);
O << "/";		O << "/";
Phi->getIncomingBlock(I)->printAsOperand(O, false);		User->getOperand(I)->printAsOperand(O);
}		}
}		}
O << "\\l\"";		O << "\\l\"";

}		}
};		};
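`VPBlendRecipe::execute` starts from incoming value 0 and then, for each further incoming edge, emits a `select` on that edge's mask. The sketch below is a per-lane model of that chain. `blend` is a hypothetical function that uses integers for values and `std::vector<bool>` for masks. As in the recipe, mask 0 is never consulted, and a single-incoming phi needs no masks at all.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Per-lane model of the blend recipe's select chain (hypothetical helper).
std::vector<int> blend(const std::vector<std::vector<int>> &Incoming,
                       const std::vector<std::vector<bool>> &Masks) {
  assert(!Incoming.empty() &&
         (Incoming.size() == 1 || Incoming.size() == Masks.size()) &&
         "Expected the same number of incoming values and masks");
  std::vector<int> Entry = Incoming[0]; // identity "select" for operand 0
  for (std::size_t In = 1; In < Incoming.size(); ++In)
    for (std::size_t L = 0; L < Entry.size(); ++L)
      // One "predphi" select: take incoming value In where its mask is set.
      Entry[L] = Masks[In][L] ? Incoming[In][L] : Entry[L];
  return Entry;
}
```

For a two-way diamond with complementary edge masks, each lane ends up with the value from the path it actually took.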

/// VPInterleaveRecipe is a recipe for transforming an interleave group of load		/// VPInterleaveRecipe is a recipe for transforming an interleave group of load
/// or stores into one wide load/store and shuffles.		/// or stores into one wide load/store and shuffles.
class VPInterleaveRecipe : public VPRecipeBase {		class VPInterleaveRecipe : public VPRecipeBase {
private:		private:
const InterleaveGroup *IG;		const InterleaveGroup *IG;
// ...
if (AlsoPack)		if (AlsoPack)
O << " (S->V)";		O << " (S->V)";
O << "\\l\"";		O << "\\l\"";
}		}
};		};

/// A recipe for generating conditional branches on the bits of a mask.		/// A recipe for generating conditional branches on the bits of a mask.
class VPBranchOnMaskRecipe : public VPRecipeBase {		class VPBranchOnMaskRecipe : public VPRecipeBase {
private:		private:
/// The input IR basic block used to obtain the mask providing the condition		std::unique_ptr<VPUser> User;
/// bits for the branch.
BasicBlock *MaskedBasicBlock;

public:		public:
VPBranchOnMaskRecipe(BasicBlock *BB)		VPBranchOnMaskRecipe(VPValue *BlockInMask) : VPRecipeBase(VPBranchOnMaskSC) {
: VPRecipeBase(VPBranchOnMaskSC), MaskedBasicBlock(BB) {}		if (BlockInMask) // nullptr means all-one mask.
		User.reset(new VPUser({BlockInMask}));
		}

/// Method to support type inquiry through isa, cast, and dyn_cast.		/// Method to support type inquiry through isa, cast, and dyn_cast.
static inline bool classof(const VPRecipeBase *V) {		static inline bool classof(const VPRecipeBase *V) {
return V->getVPRecipeID() == VPRecipeBase::VPBranchOnMaskSC;		return V->getVPRecipeID() == VPRecipeBase::VPBranchOnMaskSC;
}		}

/// Generate the extraction of the appropriate bit from the block mask and the		/// Generate the extraction of the appropriate bit from the block mask and the
/// conditional branch.		/// conditional branch.
void execute(VPTransformState &State) override;		void execute(VPTransformState &State) override;

/// Print the recipe.		/// Print the recipe.
void print(raw_ostream &O, const Twine &Indent) const override {		void print(raw_ostream &O, const Twine &Indent) const override {
O << " +\n"		O << " +\n" << Indent << "\"BRANCH-ON-MASK ";
<< Indent << "\"BRANCH-ON-MASK-OF " << MaskedBasicBlock->getName()		if (User)
<< "\\l\"";		O << *User->getOperand(0);
		else
		O << " All-One";
		O << "\\l\"";
}		}
};		};

/// VPPredInstPHIRecipe is a recipe for generating the phi nodes needed when		/// VPPredInstPHIRecipe is a recipe for generating the phi nodes needed when
/// control converges back from a Branch-on-Mask. The phi nodes are needed in		/// control converges back from a Branch-on-Mask. The phi nodes are needed in
/// order to merge values that are set under such a branch and feed their uses.		/// order to merge values that are set under such a branch and feed their uses.
/// The phi nodes can be scalar or vector depending on the users of the value.		/// The phi nodes can be scalar or vector depending on the users of the value.
/// This recipe works in concert with VPBranchOnMaskRecipe.		/// This recipe works in concert with VPBranchOnMaskRecipe.
void print(raw_ostream &O, const Twine &Indent) const override {		void print(raw_ostream &O, const Twine &Indent) const override {
O << " +\n"		O << " +\n"
<< Indent << "\"PHI-PREDICATED-INSTRUCTION " << VPlanIngredient(PredInst)		<< Indent << "\"PHI-PREDICATED-INSTRUCTION " << VPlanIngredient(PredInst)
<< "\\l\"";		<< "\\l\"";
}		}
};		};

/// A Recipe for widening load/store operations.		/// A Recipe for widening load/store operations.
		/// TODO: We currently execute only per-part unless a specific instance is
		/// provided.
class VPWidenMemoryInstructionRecipe : public VPRecipeBase {		class VPWidenMemoryInstructionRecipe : public VPRecipeBase {
private:		private:
Instruction &Instr;		Instruction &Instr;
		std::unique_ptr<VPUser> User;

public:		public:
VPWidenMemoryInstructionRecipe(Instruction &Instr)		VPWidenMemoryInstructionRecipe(Instruction &Instr, VPValue *Mask)
: VPRecipeBase(VPWidenMemoryInstructionSC), Instr(Instr) {}		: VPRecipeBase(VPWidenMemoryInstructionSC), Instr(Instr) {
		if (Mask) // Create a VPInstruction to register as a user of the mask.
		User.reset(new VPUser({Mask}));
		}

/// Method to support type inquiry through isa, cast, and dyn_cast.		/// Method to support type inquiry through isa, cast, and dyn_cast.
static inline bool classof(const VPRecipeBase *V) {		static inline bool classof(const VPRecipeBase *V) {
return V->getVPRecipeID() == VPRecipeBase::VPWidenMemoryInstructionSC;		return V->getVPRecipeID() == VPRecipeBase::VPWidenMemoryInstructionSC;
}		}

/// Generate the wide load/store.		/// Generate the wide load/store.
void execute(VPTransformState &State) override {		void execute(VPTransformState &State) override {
State.ILV->vectorizeMemoryInstruction(&Instr);		if (!User)
		return State.ILV->vectorizeMemoryInstruction(&Instr);

		// Last (and currently only) operand is a mask.
		InnerLoopVectorizer::VectorParts MaskValues(State.UF);
		VPValue *Mask = User->getOperand(User->getNumOperands() - 1);
		for (unsigned Part = 0; Part < State.UF; ++Part)
		MaskValues[Part] = State.get(Mask, Part);
		State.ILV->vectorizeMemoryInstruction(&Instr, &MaskValues);
}		}

/// Print the recipe.		/// Print the recipe.
void print(raw_ostream &O, const Twine &Indent) const override {		void print(raw_ostream &O, const Twine &Indent) const override {
O << " +\n" << Indent << "\"WIDEN " << VPlanIngredient(&Instr);		O << " +\n" << Indent << "\"WIDEN " << VPlanIngredient(&Instr);
		if (User) {
		O << ", ";
		User->getOperand(0)->printAsOperand(O);
		}
O << "\\l\"";		O << "\\l\"";
}		}
};		};
} // end anonymous namespace		} // end anonymous namespace

bool LoopVectorizationPlanner::getDecisionAndClampRange(		bool LoopVectorizationPlanner::getDecisionAndClampRange(
const std::function<bool(unsigned)> &Predicate, VFRange &Range) {		const std::function<bool(unsigned)> &Predicate, VFRange &Range) {
assert(Range.End > Range.Start && "Trying to test an empty VF range.");		assert(Range.End > Range.Start && "Trying to test an empty VF range.");
}		}

/// Build VPlans for the full range of feasible VF's = {\p MinVF, 2 * \p MinVF,		/// Build VPlans for the full range of feasible VF's = {\p MinVF, 2 * \p MinVF,
/// 4 * \p MinVF, ..., \p MaxVF} by repeatedly building a VPlan for a sub-range		/// 4 * \p MinVF, ..., \p MaxVF} by repeatedly building a VPlan for a sub-range
/// of VF's starting at a given VF and extending it as much as possible. Each		/// of VF's starting at a given VF and extending it as much as possible. Each
/// vectorization decision can potentially shorten this sub-range during		/// vectorization decision can potentially shorten this sub-range during
/// buildVPlan().		/// buildVPlan().
void LoopVectorizationPlanner::buildVPlans(unsigned MinVF, unsigned MaxVF) {		void LoopVectorizationPlanner::buildVPlans(unsigned MinVF, unsigned MaxVF) {

		// Collect conditions feeding internal conditional branches; they need to be
		// represented in VPlan for it to model masking.
		SmallPtrSet<Value *, 1> NeedDef;

		auto *Latch = OrigLoop->getLoopLatch();
		for (BasicBlock *BB : OrigLoop->blocks()) {
		if (BB == Latch)
		continue;
		BranchInst *Branch = dyn_cast<BranchInst>(BB->getTerminator());
		if (Branch && Branch->isConditional())
		NeedDef.insert(Branch->getCondition());
		}

for (unsigned VF = MinVF; VF < MaxVF + 1;) {		for (unsigned VF = MinVF; VF < MaxVF + 1;) {
VFRange SubRange = {VF, MaxVF + 1};		VFRange SubRange = {VF, MaxVF + 1};
VPlans.push_back(buildVPlan(SubRange));		VPlans.push_back(buildVPlan(SubRange, NeedDef));
VF = SubRange.End;		VF = SubRange.End;
}		}
}		}

InnerLoopVectorizer::VectorParts		VPValue *LoopVectorizationPlanner::createEdgeMask(BasicBlock *Src,
InnerLoopVectorizer::createEdgeMask(BasicBlock *Src, BasicBlock *Dst) {		BasicBlock *Dst,
		VPlanPtr &Plan) {
assert(is_contained(predecessors(Dst), Src) && "Invalid edge");		assert(is_contained(predecessors(Dst), Src) && "Invalid edge");

// Look for cached value.		// Look for cached value.
std::pair<BasicBlock *, BasicBlock *> Edge(Src, Dst);		std::pair<BasicBlock *, BasicBlock *> Edge(Src, Dst);
EdgeMaskCacheTy::iterator ECEntryIt = EdgeMaskCache.find(Edge);		EdgeMaskCacheTy::iterator ECEntryIt = EdgeMaskCache.find(Edge);
if (ECEntryIt != EdgeMaskCache.end())		if (ECEntryIt != EdgeMaskCache.end())
return ECEntryIt->second;		return ECEntryIt->second;

VectorParts SrcMask = createBlockInMask(Src);		VPValue *SrcMask = createBlockInMask(Src, Plan);

// The terminator has to be a branch inst!		// The terminator has to be a branch inst!
BranchInst *BI = dyn_cast<BranchInst>(Src->getTerminator());		BranchInst *BI = dyn_cast<BranchInst>(Src->getTerminator());
assert(BI && "Unexpected terminator found");		assert(BI && "Unexpected terminator found");

if (!BI->isConditional())		if (!BI->isConditional())
return EdgeMaskCache[Edge] = SrcMask;		return EdgeMaskCache[Edge] = SrcMask;

VectorParts EdgeMask(UF);		VPValue *EdgeMask = Plan->getVPValue(BI->getCondition());
for (unsigned Part = 0; Part < UF; ++Part) {		assert(EdgeMask && "No Edge Mask found for condition");
auto *EdgeMaskPart = getOrCreateVectorValue(BI->getCondition(), Part);
if (BI->getSuccessor(0) != Dst)
EdgeMaskPart = Builder.CreateNot(EdgeMaskPart);

if (SrcMask[Part]) // Otherwise block in-mask is all-one, no need to AND.		if (BI->getSuccessor(0) != Dst)
EdgeMaskPart = Builder.CreateAnd(EdgeMaskPart, SrcMask[Part]);		EdgeMask = Builder.createNot(EdgeMask);

EdgeMask[Part] = EdgeMaskPart;		if (SrcMask) // Otherwise block in-mask is all-one, no need to AND.
}		EdgeMask = Builder.createAnd(EdgeMask, SrcMask);

return EdgeMaskCache[Edge] = EdgeMask;		return EdgeMaskCache[Edge] = EdgeMask;
}		}

InnerLoopVectorizer::VectorParts		VPValue *LoopVectorizationPlanner::createBlockInMask(BasicBlock *BB,
InnerLoopVectorizer::createBlockInMask(BasicBlock *BB) {		VPlanPtr &Plan) {
assert(OrigLoop->contains(BB) && "Block is not a part of a loop");		assert(OrigLoop->contains(BB) && "Block is not a part of a loop");

// Look for cached value.		// Look for cached value.
BlockMaskCacheTy::iterator BCEntryIt = BlockMaskCache.find(BB);		BlockMaskCacheTy::iterator BCEntryIt = BlockMaskCache.find(BB);
if (BCEntryIt != BlockMaskCache.end())		if (BCEntryIt != BlockMaskCache.end())
return BCEntryIt->second;		return BCEntryIt->second;

// All-one mask is modelled as no-mask following the convention for masked		// All-one mask is modelled as no-mask following the convention for masked
// load/store/gather/scatter. Initialize BlockMask to no-mask.		// load/store/gather/scatter. Initialize BlockMask to no-mask.
VectorParts BlockMask(UF);		VPValue *BlockMask = nullptr;
for (unsigned Part = 0; Part < UF; ++Part)
BlockMask[Part] = nullptr;

// Loop incoming mask is all-one.		// Loop incoming mask is all-one.
if (OrigLoop->getHeader() == BB)		if (OrigLoop->getHeader() == BB)
return BlockMaskCache[BB] = BlockMask;		return BlockMaskCache[BB] = BlockMask;

// This is the block mask. We OR all incoming edges.		// This is the block mask. We OR all incoming edges.
for (auto *Predecessor : predecessors(BB)) {		for (auto *Predecessor : predecessors(BB)) {
VectorParts EdgeMask = createEdgeMask(Predecessor, BB);		VPValue *EdgeMask = createEdgeMask(Predecessor, BB, Plan);
if (!EdgeMask[0]) // Mask of predecessor is all-one so mask of block is too.		if (!EdgeMask) // Mask of predecessor is all-one so mask of block is too.
return BlockMaskCache[BB] = EdgeMask;		return BlockMaskCache[BB] = EdgeMask;

if (!BlockMask[0]) { // BlockMask has its initialized nullptr value.		if (!BlockMask) { // BlockMask has its initialized nullptr value.
BlockMask = EdgeMask;		BlockMask = EdgeMask;
continue;		continue;
}		}

for (unsigned Part = 0; Part < UF; ++Part)		BlockMask = Builder.createOr(BlockMask, EdgeMask);
BlockMask[Part] = Builder.CreateOr(BlockMask[Part], EdgeMask[Part]);
}		}

return BlockMaskCache[BB] = BlockMask;		return BlockMaskCache[BB] = BlockMask;
}		}

VPInterleaveRecipe *		VPInterleaveRecipe *
LoopVectorizationPlanner::tryToInterleaveMemory(Instruction *I,		LoopVectorizationPlanner::tryToInterleaveMemory(Instruction *I,
VFRange &Range) {		VFRange &Range) {
// Otherwise, it's an adjunct member of the IG, do not construct any Recipe.		// Otherwise, it's an adjunct member of the IG, do not construct any Recipe.
assert(I == IG->getInsertPos() &&		assert(I == IG->getInsertPos() &&
"Generating a recipe for an adjunct member of an interleave group");		"Generating a recipe for an adjunct member of an interleave group");

return new VPInterleaveRecipe(IG);		return new VPInterleaveRecipe(IG);
}		}

VPWidenMemoryInstructionRecipe *		VPWidenMemoryInstructionRecipe *
LoopVectorizationPlanner::tryToWidenMemory(Instruction *I, VFRange &Range) {		LoopVectorizationPlanner::tryToWidenMemory(Instruction *I, VFRange &Range,
		VPlanPtr &Plan) {
if (!isa<LoadInst>(I) && !isa<StoreInst>(I))		if (!isa<LoadInst>(I) && !isa<StoreInst>(I))
return nullptr;		return nullptr;

auto willWiden = [&](unsigned VF) -> bool {		auto willWiden = [&](unsigned VF) -> bool {
if (VF == 1)		if (VF == 1)
return false;		return false;
if (CM.isScalarAfterVectorization(I, VF) ||		if (CM.isScalarAfterVectorization(I, VF) ||
CM.isProfitableToScalarize(I, VF))		CM.isProfitableToScalarize(I, VF))
return false;		return false;
LoopVectorizationCostModel::InstWidening Decision =		LoopVectorizationCostModel::InstWidening Decision =
CM.getWideningDecision(I, VF);		CM.getWideningDecision(I, VF);
assert(Decision != LoopVectorizationCostModel::CM_Unknown &&		assert(Decision != LoopVectorizationCostModel::CM_Unknown &&
"CM decision should be taken at this point.");		"CM decision should be taken at this point.");
assert(Decision != LoopVectorizationCostModel::CM_Interleave &&		assert(Decision != LoopVectorizationCostModel::CM_Interleave &&
"Interleave memory opportunity should be caught earlier.");		"Interleave memory opportunity should be caught earlier.");
return Decision != LoopVectorizationCostModel::CM_Scalarize;		return Decision != LoopVectorizationCostModel::CM_Scalarize;
};		};

if (!getDecisionAndClampRange(willWiden, Range))		if (!getDecisionAndClampRange(willWiden, Range))
return nullptr;		return nullptr;

return new VPWidenMemoryInstructionRecipe(*I);		VPValue *Mask = nullptr;
		if (Legal->isMaskRequired(I))
		Mask = createBlockInMask(I->getParent(), Plan);

		return new VPWidenMemoryInstructionRecipe(*I, Mask);
}		}

VPWidenIntOrFpInductionRecipe *		VPWidenIntOrFpInductionRecipe *
LoopVectorizationPlanner::tryToOptimizeInduction(Instruction *I,		LoopVectorizationPlanner::tryToOptimizeInduction(Instruction *I,
VFRange &Range) {		VFRange &Range) {
if (PHINode *Phi = dyn_cast<PHINode>(I)) {		if (PHINode *Phi = dyn_cast<PHINode>(I)) {
// Check if this is an integer or fp induction. If so, build the recipe that		// Check if this is an integer or fp induction. If so, build the recipe that
// produces its scalar and vector values.		// produces its scalar and vector values.

if (isa<TruncInst>(I) &&		if (isa<TruncInst>(I) &&
getDecisionAndClampRange(isOptimizableIVTruncate(I), Range))		getDecisionAndClampRange(isOptimizableIVTruncate(I), Range))
return new VPWidenIntOrFpInductionRecipe(cast<PHINode>(I->getOperand(0)),		return new VPWidenIntOrFpInductionRecipe(cast<PHINode>(I->getOperand(0)),
cast<TruncInst>(I));		cast<TruncInst>(I));
return nullptr;		return nullptr;
}		}

VPBlendRecipe *LoopVectorizationPlanner::tryToBlend(Instruction *I) {		VPBlendRecipe *
		LoopVectorizationPlanner::tryToBlend(Instruction *I, VPlanPtr &Plan) {
PHINode *Phi = dyn_cast<PHINode>(I);		PHINode *Phi = dyn_cast<PHINode>(I);
if (!Phi || Phi->getParent() == OrigLoop->getHeader())		if (!Phi || Phi->getParent() == OrigLoop->getHeader())
return nullptr;		return nullptr;

return new VPBlendRecipe(Phi);		// We know that all PHIs in non-header blocks are converted into selects, so
		// we don't have to worry about the insertion order and we can just use the
		// builder. At this point we generate the predication tree. There may be
		// duplications since this is a simple recursive scan, but future
		// optimizations will clean it up.

		SmallVector<VPValue *, 2> Masks;
		unsigned NumIncoming = Phi->getNumIncomingValues();
		for (unsigned In = 0; In < NumIncoming; In++) {
		VPValue *EdgeMask =
		createEdgeMask(Phi->getIncomingBlock(In), Phi->getParent(), Plan);
		assert((EdgeMask || NumIncoming == 1) &&
		"Multiple predecessors with one having a full mask");
		if (EdgeMask)
		Masks.push_back(EdgeMask);
		}
		return new VPBlendRecipe(Phi, Masks);
}		}

bool LoopVectorizationPlanner::tryToWiden(Instruction *I, VPBasicBlock *VPBB,		bool LoopVectorizationPlanner::tryToWiden(Instruction *I, VPBasicBlock *VPBB,
VFRange &Range) {		VFRange &Range) {
if (Legal->isScalarWithPredication(I))		if (Legal->isScalarWithPredication(I))
return false;		return false;

auto IsVectorizableOpcode = [](unsigned Opcode) {		auto IsVectorizableOpcode = [](unsigned Opcode) {
// Is it beneficial to perform intrinsic call compared to lib call?		// Is it beneficial to perform intrinsic call compared to lib call?
bool NeedToScalarize;		bool NeedToScalarize;
unsigned CallCost = getVectorCallCost(CI, VF, *TTI, TLI, NeedToScalarize);		unsigned CallCost = getVectorCallCost(CI, VF, *TTI, TLI, NeedToScalarize);
bool UseVectorIntrinsic =		bool UseVectorIntrinsic =
ID && getVectorIntrinsicCost(CI, VF, *TTI, TLI) <= CallCost;		ID && getVectorIntrinsicCost(CI, VF, *TTI, TLI) <= CallCost;
return UseVectorIntrinsic || !NeedToScalarize;		return UseVectorIntrinsic || !NeedToScalarize;
}		}
if (isa<LoadInst>(I) || isa<StoreInst>(I)) {		if (isa<LoadInst>(I) || isa<StoreInst>(I)) {
LoopVectorizationCostModel::InstWidening Decision =		assert(CM.getWideningDecision(I, VF) ==
CM.getWideningDecision(I, VF);		LoopVectorizationCostModel::CM_Scalarize &&
assert(Decision != LoopVectorizationCostModel::CM_Unknown &&		"Memory widening decisions should have been taken care by now");
"CM decision should be taken at this point.");		return false;
assert(Decision != LoopVectorizationCostModel::CM_Interleave &&
"Interleave memory opportunity should be caught earlier.");
return Decision != LoopVectorizationCostModel::CM_Scalarize;
}		}
return true;		return true;
};		};

if (!getDecisionAndClampRange(willWiden, Range))		if (!getDecisionAndClampRange(willWiden, Range))
return false;		return false;

// Success: widen this instruction. We optimize the common case where		// Success: widen this instruction. We optimize the common case where
// consecutive instructions can be represented by a single recipe.		// consecutive instructions can be represented by a single recipe.
if (!VPBB->empty()) {		if (!VPBB->empty()) {
VPWidenRecipe *LastWidenRecipe = dyn_cast<VPWidenRecipe>(&VPBB->back());		VPWidenRecipe *LastWidenRecipe = dyn_cast<VPWidenRecipe>(&VPBB->back());
if (LastWidenRecipe && LastWidenRecipe->appendInstruction(I))		if (LastWidenRecipe && LastWidenRecipe->appendInstruction(I))
return true;		return true;
}		}

VPBB->appendRecipe(new VPWidenRecipe(I));		VPBB->appendRecipe(new VPWidenRecipe(I));
return true;		return true;
}		}

VPBasicBlock *LoopVectorizationPlanner::handleReplication(		VPBasicBlock *LoopVectorizationPlanner::handleReplication(
Instruction *I, VFRange &Range, VPBasicBlock *VPBB,		Instruction *I, VFRange &Range, VPBasicBlock *VPBB,
DenseMap<Instruction *, VPReplicateRecipe *> &PredInst2Recipe) {		DenseMap<Instruction *, VPReplicateRecipe *> &PredInst2Recipe,
		VPlanPtr &Plan) {
bool IsUniform = getDecisionAndClampRange(		bool IsUniform = getDecisionAndClampRange(
[&](unsigned VF) { return CM.isUniformAfterVectorization(I, VF); },		[&](unsigned VF) { return CM.isUniformAfterVectorization(I, VF); },
Range);		Range);

bool IsPredicated = Legal->isScalarWithPredication(I);		bool IsPredicated = Legal->isScalarWithPredication(I);
auto *Recipe = new VPReplicateRecipe(I, IsUniform, IsPredicated);		auto *Recipe = new VPReplicateRecipe(I, IsUniform, IsPredicated);

// Find if I uses a predicated instruction. If so, it will use its scalar		// Find if I uses a predicated instruction. If so, it will use its scalar
VPBB->appendRecipe(Recipe);		VPBB->appendRecipe(Recipe);
return VPBB;		return VPBB;
}		}
DEBUG(dbgs() << "LV: Scalarizing and predicating:" << *I << "\n");		DEBUG(dbgs() << "LV: Scalarizing and predicating:" << *I << "\n");
assert(VPBB->getSuccessors().empty() &&		assert(VPBB->getSuccessors().empty() &&
"VPBB has successors when handling predicated replication.");		"VPBB has successors when handling predicated replication.");
// Record predicated instructions for above packing optimizations.		// Record predicated instructions for above packing optimizations.
PredInst2Recipe[I] = Recipe;		PredInst2Recipe[I] = Recipe;
VPBlockBase *Region = VPBB->setOneSuccessor(createReplicateRegion(I, Recipe));		VPBlockBase *Region =
		VPBB->setOneSuccessor(createReplicateRegion(I, Recipe, Plan));
return cast<VPBasicBlock>(Region->setOneSuccessor(new VPBasicBlock()));		return cast<VPBasicBlock>(Region->setOneSuccessor(new VPBasicBlock()));
}		}

VPRegionBlock *		VPRegionBlock *
LoopVectorizationPlanner::createReplicateRegion(Instruction *Instr,		LoopVectorizationPlanner::createReplicateRegion(Instruction *Instr,
VPRecipeBase *PredRecipe) {		VPRecipeBase *PredRecipe,
		VPlanPtr &Plan) {
// Instructions marked for predication are replicated and placed under an		// Instructions marked for predication are replicated and placed under an
// if-then construct to prevent side-effects.		// if-then construct to prevent side-effects.

		// Generate recipes to compute the block mask for this region.
		VPValue *BlockInMask = createBlockInMask(Instr->getParent(), Plan);

// Build the triangular if-then region.		// Build the triangular if-then region.
std::string RegionName = (Twine("pred.") + Instr->getOpcodeName()).str();		std::string RegionName = (Twine("pred.") + Instr->getOpcodeName()).str();
assert(Instr->getParent() && "Predicated instruction not in any basic block");		assert(Instr->getParent() && "Predicated instruction not in any basic block");
auto *BOMRecipe = new VPBranchOnMaskRecipe(Instr->getParent());		auto *BOMRecipe = new VPBranchOnMaskRecipe(BlockInMask);
auto *Entry = new VPBasicBlock(Twine(RegionName) + ".entry", BOMRecipe);		auto *Entry = new VPBasicBlock(Twine(RegionName) + ".entry", BOMRecipe);
auto *PHIRecipe =		auto *PHIRecipe =
Instr->getType()->isVoidTy() ? nullptr : new VPPredInstPHIRecipe(Instr);		Instr->getType()->isVoidTy() ? nullptr : new VPPredInstPHIRecipe(Instr);
auto *Exit = new VPBasicBlock(Twine(RegionName) + ".continue", PHIRecipe);		auto *Exit = new VPBasicBlock(Twine(RegionName) + ".continue", PHIRecipe);
auto *Pred = new VPBasicBlock(Twine(RegionName) + ".if", PredRecipe);		auto *Pred = new VPBasicBlock(Twine(RegionName) + ".if", PredRecipe);
VPRegionBlock *Region = new VPRegionBlock(Entry, Exit, RegionName, true);		VPRegionBlock *Region = new VPRegionBlock(Entry, Exit, RegionName, true);

// Note: first set Entry as region entry and then connect successors starting		// Note: first set Entry as region entry and then connect successors starting
// from it in order, to propagate the "parent" of each VPBasicBlock.		// from it in order, to propagate the "parent" of each VPBasicBlock.
Entry->setTwoSuccessors(Pred, Exit);		Entry->setTwoSuccessors(Pred, Exit);
Pred->setOneSuccessor(Exit);		Pred->setOneSuccessor(Exit);

return Region;		return Region;
}		}

std::unique_ptr<VPlan> LoopVectorizationPlanner::buildVPlan(VFRange &Range) {		LoopVectorizationPlanner::VPlanPtr
		LoopVectorizationPlanner::buildVPlan(VFRange &Range,
		const SmallPtrSetImpl<Value *> &NeedDef) {
		EdgeMaskCache.clear();
		BlockMaskCache.clear();
DenseMap<Instruction *, Instruction *> &SinkAfter = Legal->getSinkAfter();		DenseMap<Instruction *, Instruction *> &SinkAfter = Legal->getSinkAfter();
DenseMap<Instruction *, Instruction *> SinkAfterInverse;		DenseMap<Instruction *, Instruction *> SinkAfterInverse;

// Collect instructions from the original loop that will become trivially dead		// Collect instructions from the original loop that will become trivially dead
// in the vectorized loop. We don't need to vectorize these instructions. For		// in the vectorized loop. We don't need to vectorize these instructions. For
// example, original induction update instructions can become dead because we		// example, original induction update instructions can become dead because we
// separately emit induction "steps" when generating code for the new loop.		// separately emit induction "steps" when generating code for the new loop.
// Similarly, we create a new latch condition when setting up the structure		// Similarly, we create a new latch condition when setting up the structure
// of the new loop, so the old one can become dead.		// of the new loop, so the old one can become dead.
SmallPtrSet<Instruction *, 4> DeadInstructions;		SmallPtrSet<Instruction *, 4> DeadInstructions;
collectTriviallyDeadInstructions(DeadInstructions);		collectTriviallyDeadInstructions(DeadInstructions);

// Hold a mapping from predicated instructions to their recipes, in order to		// Hold a mapping from predicated instructions to their recipes, in order to
// fix their AlsoPack behavior if a user is determined to replicate and use a		// fix their AlsoPack behavior if a user is determined to replicate and use a
// scalar instead of vector value.		// scalar instead of vector value.
DenseMap<Instruction *, VPReplicateRecipe *> PredInst2Recipe;		DenseMap<Instruction *, VPReplicateRecipe *> PredInst2Recipe;

// Create a dummy pre-entry VPBasicBlock to start building the VPlan.		// Create a dummy pre-entry VPBasicBlock to start building the VPlan.
VPBasicBlock *VPBB = new VPBasicBlock("Pre-Entry");		VPBasicBlock *VPBB = new VPBasicBlock("Pre-Entry");
auto Plan = llvm::make_unique<VPlan>(VPBB);		auto Plan = llvm::make_unique<VPlan>(VPBB);

		// Represent values that will have defs inside VPlan.
		for (Value *V : NeedDef)
		Plan->addVPValue(V);

// Scan the body of the loop in a topological order to visit each basic block		// Scan the body of the loop in a topological order to visit each basic block
// after having visited its predecessor basic blocks.		// after having visited its predecessor basic blocks.
LoopBlocksDFS DFS(OrigLoop);		LoopBlocksDFS DFS(OrigLoop);
DFS.perform(LI);		DFS.perform(LI);

for (BasicBlock *BB : make_range(DFS.beginRPO(), DFS.endRPO())) {		for (BasicBlock *BB : make_range(DFS.beginRPO(), DFS.endRPO())) {
// Relevant instructions from basic block BB will be grouped into VPRecipe		// Relevant instructions from basic block BB will be grouped into VPRecipe
// ingredients and fill a new VPBasicBlock.		// ingredients and fill a new VPBasicBlock.
unsigned VPBBsForBB = 0;		unsigned VPBBsForBB = 0;
auto *FirstVPBBForBB = new VPBasicBlock(BB->getName());		auto *FirstVPBBForBB = new VPBasicBlock(BB->getName());
VPBB->setOneSuccessor(FirstVPBBForBB);		VPBB->setOneSuccessor(FirstVPBBForBB);
VPBB = FirstVPBBForBB;		VPBB = FirstVPBBForBB;
		Builder.setInsertPoint(VPBB);

std::vector<Instruction *> Ingredients;		std::vector<Instruction *> Ingredients;

// Organize the ingredients to vectorize from current basic block in the		// Organize the ingredients to vectorize from current basic block in the
// right order.		// right order.
for (Instruction &I : *BB) {		for (Instruction &I : *BB) {
Instruction *Instr = &I;		Instruction *Instr = &I;

// Check if Instr should belong to an interleave memory recipe, or already		// Check if Instr should belong to an interleave memory recipe, or already
// does. In the latter case Instr is irrelevant.		// does. In the latter case Instr is irrelevant.
if ((Recipe = tryToInterleaveMemory(Instr, Range))) {		if ((Recipe = tryToInterleaveMemory(Instr, Range))) {
VPBB->appendRecipe(Recipe);		VPBB->appendRecipe(Recipe);
continue;		continue;
}		}

// Check if Instr is a memory operation that should be widened.		// Check if Instr is a memory operation that should be widened.
if ((Recipe = tryToWidenMemory(Instr, Range))) {		if ((Recipe = tryToWidenMemory(Instr, Range, Plan))) {
VPBB->appendRecipe(Recipe);		VPBB->appendRecipe(Recipe);
continue;		continue;
}		}

// Check if Instr should form some PHI recipe.		// Check if Instr should form some PHI recipe.
if ((Recipe = tryToOptimizeInduction(Instr, Range))) {		if ((Recipe = tryToOptimizeInduction(Instr, Range))) {
VPBB->appendRecipe(Recipe);		VPBB->appendRecipe(Recipe);
continue;		continue;
}		}
if ((Recipe = tryToBlend(Instr))) {		if ((Recipe = tryToBlend(Instr, Plan))) {
VPBB->appendRecipe(Recipe);		VPBB->appendRecipe(Recipe);
continue;		continue;
}		}
if (PHINode *Phi = dyn_cast<PHINode>(Instr)) {		if (PHINode *Phi = dyn_cast<PHINode>(Instr)) {
VPBB->appendRecipe(new VPWidenPHIRecipe(Phi));		VPBB->appendRecipe(new VPWidenPHIRecipe(Phi));
continue;		continue;
}		}

// Check if Instr is to be widened by a general VPWidenRecipe, after		// Check if Instr is to be widened by a general VPWidenRecipe, after
// having first checked for specific widening recipes that deal with		// having first checked for specific widening recipes that deal with
// Interleave Groups, Inductions and Phi nodes.		// Interleave Groups, Inductions and Phi nodes.
if (tryToWiden(Instr, VPBB, Range))		if (tryToWiden(Instr, VPBB, Range))
continue;		continue;

// Otherwise, if all widening options failed, Instruction is to be		// Otherwise, if all widening options failed, Instruction is to be
// replicated. This may create a successor for VPBB.		// replicated. This may create a successor for VPBB.
VPBasicBlock *NextVPBB =		VPBasicBlock *NextVPBB =
handleReplication(Instr, Range, VPBB, PredInst2Recipe);		handleReplication(Instr, Range, VPBB, PredInst2Recipe, Plan);
if (NextVPBB != VPBB) {		if (NextVPBB != VPBB) {
VPBB = NextVPBB;		VPBB = NextVPBB;
VPBB->setName(BB->hasName() ? BB->getName() + "." + Twine(VPBBsForBB++)		VPBB->setName(BB->hasName() ? BB->getName() + "." + Twine(VPBBsForBB++)
: "");		: "");
}		}
}		}
}		}

}		}

void VPBranchOnMaskRecipe::execute(VPTransformState &State) {		void VPBranchOnMaskRecipe::execute(VPTransformState &State) {
assert(State.Instance && "Branch on Mask works only on single instance.");		assert(State.Instance && "Branch on Mask works only on single instance.");

unsigned Part = State.Instance->Part;		unsigned Part = State.Instance->Part;
unsigned Lane = State.Instance->Lane;		unsigned Lane = State.Instance->Lane;

auto Cond = State.ILV->createBlockInMask(MaskedBasicBlock);		Value *ConditionBit = nullptr;
		if (!User) // Block in mask is all-one.
Value *ConditionBit = Cond[Part];
if (!ConditionBit) // Block in mask is all-one.
ConditionBit = State.Builder.getTrue();		ConditionBit = State.Builder.getTrue();
else if (ConditionBit->getType()->isVectorTy())		else {
		VPValue *BlockInMask = User->getOperand(0);
		ConditionBit = State.get(BlockInMask, Part);
		if (ConditionBit->getType()->isVectorTy())
ConditionBit = State.Builder.CreateExtractElement(		ConditionBit = State.Builder.CreateExtractElement(
ConditionBit, State.Builder.getInt32(Lane));		ConditionBit, State.Builder.getInt32(Lane));
		}

// Replace the temporary unreachable terminator with a new conditional branch,		// Replace the temporary unreachable terminator with a new conditional branch,
// whose two destinations will be set later when they are created.		// whose two destinations will be set later when they are created.
auto *CurrentTerminator = State.CFG.PrevBB->getTerminator();		auto *CurrentTerminator = State.CFG.PrevBB->getTerminator();
assert(isa<UnreachableInst>(CurrentTerminator) &&		assert(isa<UnreachableInst>(CurrentTerminator) &&
"Expected to replace unreachable terminator with conditional branch.");		"Expected to replace unreachable terminator with conditional branch.");
auto *CondBr = BranchInst::Create(State.CFG.PrevBB, nullptr, ConditionBit);		auto *CondBr = BranchInst::Create(State.CFG.PrevBB, nullptr, ConditionBit);
CondBr->setSuccessor(0, nullptr);		CondBr->setSuccessor(0, nullptr);
ReplaceInstWithInst(CurrentTerminator, CondBr);		ReplaceInstWithInst(CurrentTerminator, CondBr);

DEBUG(dbgs() << "\nLV: vectorizing BranchOnMask recipe "
<< MaskedBasicBlock->getName());
}		}

void VPPredInstPHIRecipe::execute(VPTransformState &State) {		void VPPredInstPHIRecipe::execute(VPTransformState &State) {
assert(State.Instance && "Predicated instruction PHI works per instance.");		assert(State.Instance && "Predicated instruction PHI works per instance.");
Instruction *ScalarPredInst = cast<Instruction>(		Instruction *ScalarPredInst = cast<Instruction>(
State.ValueMap.getScalarValue(PredInst, *State.Instance));		State.ValueMap.getScalarValue(PredInst, *State.Instance));
BasicBlock *PredicatedBB = ScalarPredInst->getParent();		BasicBlock *PredicatedBB = ScalarPredInst->getParent();
BasicBlock *PredicatingBB = PredicatedBB->getSinglePredecessor();		BasicBlock *PredicatingBB = PredicatedBB->getSinglePredecessor();
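The mask-composition rules that the LoopVectorize.cpp hunks above move into createEdgeMask and createBlockInMask can be illustrated outside LLVM. What follows is a minimal, self-contained C++ sketch under stated assumptions, not the LLVM implementation. The Block, Mask and MaskBuilder names are hypothetical. Masks are plain strings rather than VPInstructions built through createNot/createAnd/createOr. std::nullopt stands in for the nullptr "all-one" mask convention.

The sketch applies two rules:

- An edge mask is the branch condition, negated when Dst is the false successor. It is then ANDed with the source block's in-mask, unless that in-mask is all-one.
- A block in-mask is the OR of its incoming edge masks. It becomes all-one as soon as any incoming edge is all-one.

Only unconditional and two-way conditional branches are modeled.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical mask value: std::nullopt models the nullptr "all-one" mask;
// otherwise the string spells out the mask expression.
using Mask = std::optional<std::string>;

// Hypothetical mini-CFG node. An empty Cond means an unconditional branch;
// otherwise the block branches on Cond and reaches TrueSucc when Cond holds.
struct Block {
  std::vector<std::string> Preds;
  std::string Cond;
  std::string TrueSucc;
};

struct MaskBuilder {
  std::map<std::string, Block> CFG;
  std::string Header;
  std::map<std::pair<std::string, std::string>, Mask> EdgeMaskCache;
  std::map<std::string, Mask> BlockMaskCache;

  // Mask of the CFG edge Src -> Dst, cached per edge.
  Mask edgeMask(const std::string &Src, const std::string &Dst) {
    assert(CFG.count(Src) && "Unknown source block");
    auto Edge = std::make_pair(Src, Dst);
    auto It = EdgeMaskCache.find(Edge);
    if (It != EdgeMaskCache.end())
      return It->second;

    Mask SrcMask = blockInMask(Src);
    const Block &B = CFG.at(Src);
    if (B.Cond.empty()) // Unconditional branch: the edge inherits SrcMask.
      return EdgeMaskCache[Edge] = SrcMask;

    std::string EdgeMask = B.Cond;
    if (B.TrueSucc != Dst) // Dst is reached on the false side of the branch.
      EdgeMask = "not(" + EdgeMask + ")";
    if (SrcMask) // Otherwise the source in-mask is all-one, no need to AND.
      EdgeMask = "and(" + EdgeMask + ", " + *SrcMask + ")";
    return EdgeMaskCache[Edge] = EdgeMask;
  }

  // In-mask of block BB: the OR of all incoming edge masks, cached per block.
  Mask blockInMask(const std::string &BB) {
    assert(CFG.count(BB) && "Unknown block");
    auto It = BlockMaskCache.find(BB);
    if (It != BlockMaskCache.end())
      return It->second;

    Mask BlockMask; // Starts as all-one (no mask).
    if (BB == Header) // The loop header is entered with an all-one mask.
      return BlockMaskCache[BB] = BlockMask;

    for (const std::string &Pred : CFG.at(BB).Preds) {
      Mask EdgeMask = edgeMask(Pred, BB);
      if (!EdgeMask) // An all-one incoming edge makes the block all-one too.
        return BlockMaskCache[BB] = EdgeMask;
      if (!BlockMask) { // First incoming edge seen so far.
        BlockMask = EdgeMask;
        continue;
      }
      BlockMask = "or(" + *BlockMask + ", " + *EdgeMask + ")";
    }
    return BlockMaskCache[BB] = BlockMask;
  }
};
```

Consider a header H that branches on c0 to A (true) or J, where A branches on c1 to C (true) or J, and C falls through to J. For that CFG the sketch yields or(or(not(c0), and(not(c1), c0)), and(c1, c0)) as the in-mask of J. The patch notes that its simple recursive scan may produce duplicates; likewise, no simplification is attempted here. The caches only ensure that each edge and block mask is built once per plan.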

llvm/trunk/lib/Transforms/Vectorize/VPlan.h

/// \file		/// \file
/// This file contains the declarations of the Vectorization Plan base classes:		/// This file contains the declarations of the Vectorization Plan base classes:
/// 1. VPBasicBlock and VPRegionBlock that inherit from a common pure virtual		/// 1. VPBasicBlock and VPRegionBlock that inherit from a common pure virtual
/// VPBlockBase, together implementing a Hierarchical CFG;		/// VPBlockBase, together implementing a Hierarchical CFG;
/// 2. Specializations of GraphTraits that allow VPBlockBase graphs to be		/// 2. Specializations of GraphTraits that allow VPBlockBase graphs to be
/// treated as proper graphs for generic algorithms;		/// treated as proper graphs for generic algorithms;
/// 3. Pure virtual VPRecipeBase serving as the base class for recipes contained		/// 3. Pure virtual VPRecipeBase serving as the base class for recipes contained
/// within VPBasicBlocks;		/// within VPBasicBlocks;
/// 4. The VPlan class holding a candidate for vectorization;		/// 4. VPInstruction, a concrete Recipe and VPUser modeling a single planned
/// 5. The VPlanPrinter class providing a way to print a plan in dot format.		/// instruction;
		/// 5. The VPlan class holding a candidate for vectorization;
		/// 6. The VPlanPrinter class providing a way to print a plan in dot format;
/// These are documented in docs/VectorizationPlan.rst.		/// These are documented in docs/VectorizationPlan.rst.
//		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

#ifndef LLVM_TRANSFORMS_VECTORIZE_VPLAN_H		#ifndef LLVM_TRANSFORMS_VECTORIZE_VPLAN_H
#define LLVM_TRANSFORMS_VECTORIZE_VPLAN_H		#define LLVM_TRANSFORMS_VECTORIZE_VPLAN_H

		#include "VPlanValue.h"
#include "llvm/ADT/DenseMap.h"		#include "llvm/ADT/DenseMap.h"
#include "llvm/ADT/GraphTraits.h"		#include "llvm/ADT/GraphTraits.h"
#include "llvm/ADT/Optional.h"		#include "llvm/ADT/Optional.h"
#include "llvm/ADT/SmallSet.h"		#include "llvm/ADT/SmallSet.h"
#include "llvm/ADT/SmallVector.h"		#include "llvm/ADT/SmallVector.h"
#include "llvm/ADT/Twine.h"		#include "llvm/ADT/Twine.h"
#include "llvm/ADT/ilist.h"		#include "llvm/ADT/ilist.h"
#include "llvm/ADT/ilist_node.h"		#include "llvm/ADT/ilist_node.h"
#include "llvm/IR/IRBuilder.h"		#include "llvm/IR/IRBuilder.h"
#include <algorithm>		#include <algorithm>
#include <cassert>		#include <cassert>
#include <cstddef>		#include <cstddef>
#include <map>		#include <map>
#include <string>		#include <string>

		// The (re)use of existing LoopVectorize classes is subject to future VPlan
		// refactoring.
		namespace {
		class LoopVectorizationLegality;
		class LoopVectorizationCostModel;
		} // namespace

namespace llvm {		namespace llvm {

class BasicBlock;		class BasicBlock;
class DominatorTree;		class DominatorTree;
class InnerLoopVectorizer;		class InnerLoopVectorizer;
class LoopInfo;		class LoopInfo;
class raw_ostream;		class raw_ostream;
class Value;		class Value;
/// replace an existing one, call resetVectorValue and resetScalarValue. This is		/// replace an existing one, call resetVectorValue and resetScalarValue. This is
/// currently needed to modify the mapped values during "fix-up" operations that		/// currently needed to modify the mapped values during "fix-up" operations that
/// occur once the first phase of widening is complete. These operations include		/// occur once the first phase of widening is complete. These operations include
/// type truncation and the second phase of recurrence widening.		/// type truncation and the second phase of recurrence widening.
///		///
/// Entries from either map can be retrieved using the getVectorValue and		/// Entries from either map can be retrieved using the getVectorValue and
/// getScalarValue functions, which assert that the desired value exists.		/// getScalarValue functions, which assert that the desired value exists.
struct VectorizerValueMap {		struct VectorizerValueMap {
		friend struct VPTransformState;

private:		private:
/// The unroll factor. Each entry in the vector map contains UF vector values.		/// The unroll factor. Each entry in the vector map contains UF vector values.
unsigned UF;		unsigned UF;

/// The vectorization factor. Each entry in the scalar map contains UF x VF		/// The vectorization factor. Each entry in the scalar map contains UF x VF
/// scalar values.		/// scalar values.
unsigned VF;		unsigned VF;

void resetScalarValue(Value *Key, const VPIteration &Instance,		void resetScalarValue(Value *Key, const VPIteration &Instance,
Value *Scalar) {		Value *Scalar) {
assert(hasScalarValue(Key, Instance) &&		assert(hasScalarValue(Key, Instance) &&
"Scalar value not set for part and lane");		"Scalar value not set for part and lane");
ScalarMapStorage[Key][Instance.Part][Instance.Lane] = Scalar;		ScalarMapStorage[Key][Instance.Part][Instance.Lane] = Scalar;
}		}
};		};

		/// This class is used to enable the VPlan to invoke a method of ILV. This is
		/// needed until the method is refactored out of ILV and becomes reusable.
		struct VPCallback {
		virtual ~VPCallback() {}
		virtual Value *getOrCreateVectorValues(Value *V, unsigned Part) = 0;
		};

/// VPTransformState holds information passed down when "executing" a VPlan,		/// VPTransformState holds information passed down when "executing" a VPlan,
/// needed for generating the output IR.		/// needed for generating the output IR.
struct VPTransformState {		struct VPTransformState {
VPTransformState(unsigned VF, unsigned UF, LoopInfo *LI, DominatorTree *DT,		VPTransformState(unsigned VF, unsigned UF, LoopInfo *LI, DominatorTree *DT,
IRBuilder<> &Builder, VectorizerValueMap &ValueMap,		IRBuilder<> &Builder, VectorizerValueMap &ValueMap,
InnerLoopVectorizer *ILV)		InnerLoopVectorizer *ILV, VPCallback &Callback)
: VF(VF), UF(UF), LI(LI), DT(DT), Builder(Builder), ValueMap(ValueMap),		: VF(VF), UF(UF), Instance(), LI(LI), DT(DT), Builder(Builder),
ILV(ILV) {}		ValueMap(ValueMap), ILV(ILV), Callback(Callback) {}

/// The chosen Vectorization and Unroll Factors of the loop being vectorized.		/// The chosen Vectorization and Unroll Factors of the loop being vectorized.
unsigned VF;		unsigned VF;
unsigned UF;		unsigned UF;

/// Hold the indices to generate specific scalar instructions. Null indicates		/// Hold the indices to generate specific scalar instructions. Null indicates
/// that all instances are to be generated, using either scalar or vector		/// that all instances are to be generated, using either scalar or vector
/// instructions.		/// instructions.
Optional<VPIteration> Instance;		Optional<VPIteration> Instance;

		struct DataState {
		/// A type for vectorized values in the new loop. Each value from the
		/// original loop, when vectorized, is represented by UF vector values in
		/// the new unrolled loop, where UF is the unroll factor.
		typedef SmallVector<Value *, 2> PerPartValuesTy;

		DenseMap<VPValue *, PerPartValuesTy> PerPartOutput;
		} Data;

		/// Get the generated Value for a given VPValue and a given Part. Note that
		/// as some Defs are still created by ILV and managed in its ValueMap, this
		/// method will delegate the call to ILV in such cases in order to provide
		/// callers a consistent API.
		/// \see set.
		Value *get(VPValue *Def, unsigned Part) {
		// If Values have been set for this Def return the one relevant for \p Part.
		if (Data.PerPartOutput.count(Def))
		return Data.PerPartOutput[Def][Part];
		// Def is managed by ILV: bring the Values from ValueMap.
		return Callback.getOrCreateVectorValues(VPValue2Value[Def], Part);
		}

		/// Set the generated Value for a given VPValue and a given Part.
		void set(VPValue *Def, Value *V, unsigned Part) {
		if (!Data.PerPartOutput.count(Def)) {
		DataState::PerPartValuesTy Entry(UF);
		Data.PerPartOutput[Def] = Entry;
		}
		Data.PerPartOutput[Def][Part] = V;
		}

/// Hold state information used when constructing the CFG of the output IR,		/// Hold state information used when constructing the CFG of the output IR,
/// traversing the VPBasicBlocks and generating corresponding IR BasicBlocks.		/// traversing the VPBasicBlocks and generating corresponding IR BasicBlocks.
struct CFGState {		struct CFGState {
/// The previous VPBasicBlock visited. Initially set to null.		/// The previous VPBasicBlock visited. Initially set to null.
VPBasicBlock *PrevVPBB = nullptr;		VPBasicBlock *PrevVPBB = nullptr;

/// The previous IR BasicBlock created or used. Initially set to the new		/// The previous IR BasicBlock created or used. Initially set to the new
/// header BasicBlock.		/// header BasicBlock.

/// Hold a reference to the IRBuilder used to generate output IR code.		/// Hold a reference to the IRBuilder used to generate output IR code.
IRBuilder<> &Builder;		IRBuilder<> &Builder;

/// Hold a reference to the Value state information used when generating the		/// Hold a reference to the Value state information used when generating the
/// Values of the output IR.		/// Values of the output IR.
VectorizerValueMap &ValueMap;		VectorizerValueMap &ValueMap;

		/// Hold a reference to a mapping between VPValues in VPlan and original
		/// Values they correspond to.
		VPValue2ValueTy VPValue2Value;

/// Hold a pointer to InnerLoopVectorizer to reuse its IR generation methods.		/// Hold a pointer to InnerLoopVectorizer to reuse its IR generation methods.
InnerLoopVectorizer *ILV;		InnerLoopVectorizer *ILV;

		VPCallback &Callback;
};		};
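The `get`/`set` pair above records per-part outputs for Defs generated inside the plan and delegates everything else to ILV through `VPCallback`. The following is a minimal, LLVM-free sketch of that lookup policy, not the VPlan code itself: it assumes integer ids stand in for `VPValue *` and strings stand in for generated IR `Value *`s, and the names `PerPartState`, `DefId`, `IRValue` and `Fallback` are hypothetical.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-ins: an int id plays the role of VPValue*, and a string
// plays the role of the generated IR Value* for one unrolled part.
using DefId = int;
using IRValue = std::string;

struct PerPartState {
  unsigned UF; // unroll factor: each Def has UF per-part values
  std::map<DefId, std::vector<IRValue>> PerPartOutput;
  // Plays the role of VPCallback::getOrCreateVectorValues for Defs that are
  // still created and owned outside the plan.
  std::function<IRValue(DefId, unsigned)> Fallback;

  // Mirrors VPTransformState::set: allocate UF slots on first use.
  void set(DefId Def, const IRValue &V, unsigned Part) {
    auto It = PerPartOutput.find(Def);
    if (It == PerPartOutput.end())
      It = PerPartOutput.emplace(Def, std::vector<IRValue>(UF)).first;
    assert(Part < UF && "part out of range");
    It->second[Part] = V;
  }

  // Mirrors VPTransformState::get: prefer values set by the plan, otherwise
  // delegate to the fallback.
  IRValue get(DefId Def, unsigned Part) const {
    auto It = PerPartOutput.find(Def);
    if (It != PerPartOutput.end())
      return It->second[Part];
    return Fallback(Def, Part);
  }
};
```

For example, with `UF = 2`, setting both parts of Def 7 and then calling `get(7, 1)` returns the stored part-1 value, while `get(3, 0)` for a Def that was never set goes to the fallback, mirroring how Defs still owned by ILV are resolved.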

/// VPBlockBase is the building block of the Hierarchical Control-Flow Graph.		/// VPBlockBase is the building block of the Hierarchical Control-Flow Graph.
/// A VPBlockBase can be either a VPBasicBlock or a VPRegionBlock.		/// A VPBlockBase can be either a VPBasicBlock or a VPRegionBlock.
class VPBlockBase {		class VPBlockBase {
private:		private:
const unsigned char SubclassID; ///< Subclass identifier (for isa/dyn_cast).		const unsigned char SubclassID; ///< Subclass identifier (for isa/dyn_cast).

public:		public:
/// An enumeration for keeping track of the concrete subclass of VPRecipeBase		/// An enumeration for keeping track of the concrete subclass of VPRecipeBase
/// that is actually instantiated. Values of this enumeration are kept in the		/// that is actually instantiated. Values of this enumeration are kept in the
/// SubclassID field of the VPRecipeBase objects. They are used for concrete		/// SubclassID field of the VPRecipeBase objects. They are used for concrete
/// type identification.		/// type identification.
using VPRecipeTy = enum {		using VPRecipeTy = enum {
VPBlendSC,		VPBlendSC,
VPBranchOnMaskSC,		VPBranchOnMaskSC,
		VPInstructionSC,
VPInterleaveSC,		VPInterleaveSC,
VPPredInstPHISC,		VPPredInstPHISC,
VPReplicateSC,		VPReplicateSC,
VPWidenIntOrFpInductionSC,		VPWidenIntOrFpInductionSC,
VPWidenMemoryInstructionSC,		VPWidenMemoryInstructionSC,
VPWidenPHISC,		VPWidenPHISC,
VPWidenSC,		VPWidenSC,
};		};
/// The method which generates the output IR instructions that correspond to		/// The method which generates the output IR instructions that correspond to
/// this VPRecipe, thereby "executing" the VPlan.		/// this VPRecipe, thereby "executing" the VPlan.
virtual void execute(struct VPTransformState &State) = 0;		virtual void execute(struct VPTransformState &State) = 0;

/// Each recipe prints itself.		/// Each recipe prints itself.
virtual void print(raw_ostream &O, const Twine &Indent) const = 0;		virtual void print(raw_ostream &O, const Twine &Indent) const = 0;
};		};

		/// This is a concrete Recipe that models a single VPlan-level instruction.
		/// Like any Recipe, it may generate a sequence of IR instructions when
		/// executed; these instructions always form a single-def expression, since
		/// the VPInstruction is also a single def-use vertex.
		class VPInstruction : public VPUser, public VPRecipeBase {
		public:
		/// VPlan opcodes, extending LLVM IR with idiomatic instructions.
		enum { Not = Instruction::OtherOpsEnd + 1 };

		private:
		typedef unsigned char OpcodeTy;
		OpcodeTy Opcode;

		/// Utility method serving execute(): generates a single instance of the
		/// modeled instruction.
		void generateInstruction(VPTransformState &State, unsigned Part);

		public:
		VPInstruction(unsigned Opcode, std::initializer_list<VPValue *> Operands)
		: VPUser(VPValue::VPInstructionSC, Operands),
		VPRecipeBase(VPRecipeBase::VPInstructionSC), Opcode(Opcode) {}

		/// Method to support type inquiry through isa, cast, and dyn_cast.
		static inline bool classof(const VPValue *V) {
		return V->getVPValueID() == VPValue::VPInstructionSC;
		}

		/// Method to support type inquiry through isa, cast, and dyn_cast.
		static inline bool classof(const VPRecipeBase *R) {
		return R->getVPRecipeID() == VPRecipeBase::VPInstructionSC;
		}

		unsigned getOpcode() const { return Opcode; }

		/// Generate the instruction.
		/// TODO: We currently execute only per-part unless a specific instance is
		/// provided.
		void execute(VPTransformState &State) override;

		/// Print the Recipe.
		void print(raw_ostream &O, const Twine &Indent) const override;

		/// Print the VPInstruction.
		void print(raw_ostream &O) const;
		};
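`VPInstruction` sits in two class hierarchies at once (`VPUser`/`VPValue` and `VPRecipeBase`), each with its own subclass-ID field, which is why it provides one `classof` overload per base. A reduced sketch of that pattern, assuming a hand-rolled `isa` in place of `llvm::isa` and hypothetical `ValueBase`/`RecipeBase`/`Instr` classes:

```cpp
#include <cassert>

// Minimal LLVM-style type inquiry (hypothetical): isa<T>(p) asks T::classof(p).
template <typename To, typename From> bool isa(const From *F) {
  return To::classof(F);
}

// Hypothetical reduced version of VPValue's side of the hierarchy.
class ValueBase {
  const unsigned char ID;

public:
  enum { ValueSC, UserSC, InstructionSC };
  explicit ValueBase(unsigned char SC) : ID(SC) {}
  unsigned getValueID() const { return ID; }
};

// Hypothetical reduced version of VPRecipeBase's side of the hierarchy.
class RecipeBase {
  const unsigned char ID;

public:
  enum { BlendSC, InstructionSC, WidenSC };
  explicit RecipeBase(unsigned char SC) : ID(SC) {}
  unsigned getRecipeID() const { return ID; }
};

// Like VPInstruction: one object, one ID in each hierarchy, and one classof
// overload per base so isa<> works from either kind of pointer.
class Instr : public ValueBase, public RecipeBase {
  unsigned Opcode;

public:
  explicit Instr(unsigned Op)
      : ValueBase(ValueBase::InstructionSC),
        RecipeBase(RecipeBase::InstructionSC), Opcode(Op) {}
  unsigned getOpcode() const { return Opcode; }
  static bool classof(const ValueBase *V) {
    return V->getValueID() == ValueBase::InstructionSC;
  }
  static bool classof(const RecipeBase *R) {
    return R->getRecipeID() == RecipeBase::InstructionSC;
  }
};
```

Because the check is a plain integer comparison, this style of type inquiry needs neither C++ RTTI nor `dynamic_cast`; overload resolution on the pointer's static type selects the matching `classof`.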

/// VPBasicBlock serves as the leaf of the Hierarchical Control-Flow Graph. It		/// VPBasicBlock serves as the leaf of the Hierarchical Control-Flow Graph. It
/// holds a sequence of zero or more VPRecipes, each representing a sequence of		/// holds a sequence of zero or more VPRecipes, each representing a sequence of
/// output IR instructions.		/// output IR instructions.
class VPBasicBlock : public VPBlockBase {		class VPBasicBlock : public VPBlockBase {
public:		public:
using RecipeListTy = iplist<VPRecipeBase>;		using RecipeListTy = iplist<VPRecipeBase>;

private:		private:
static RecipeListTy VPBasicBlock::*getSublistAccess(VPRecipeBase *) {		static RecipeListTy VPBasicBlock::*getSublistAccess(VPRecipeBase *) {
return &VPBasicBlock::Recipes;		return &VPBasicBlock::Recipes;
}		}

/// Method to support type inquiry through isa, cast, and dyn_cast.		/// Method to support type inquiry through isa, cast, and dyn_cast.
static inline bool classof(const VPBlockBase *V) {		static inline bool classof(const VPBlockBase *V) {
return V->getVPBlockID() == VPBlockBase::VPBasicBlockSC;		return V->getVPBlockID() == VPBlockBase::VPBasicBlockSC;
}		}

/// Augment the existing recipes of a VPBasicBlock with an additional		void insert(VPRecipeBase *Recipe, iterator InsertPt) {
/// \p Recipe as the last recipe.
void appendRecipe(VPRecipeBase *Recipe) {
assert(Recipe && "No recipe to append.");		assert(Recipe && "No recipe to append.");
assert(!Recipe->Parent && "Recipe already in VPlan");		assert(!Recipe->Parent && "Recipe already in VPlan");
Recipe->Parent = this;		Recipe->Parent = this;
return Recipes.push_back(Recipe);		Recipes.insert(InsertPt, Recipe);
}		}

		/// Augment the existing recipes of a VPBasicBlock with an additional
		/// \p Recipe as the last recipe.
		void appendRecipe(VPRecipeBase *Recipe) { insert(Recipe, end()); }

/// The method which generates the output IR instructions that correspond to		/// The method which generates the output IR instructions that correspond to
/// this VPBasicBlock, thereby "executing" the VPlan.		/// this VPBasicBlock, thereby "executing" the VPlan.
void execute(struct VPTransformState *State) override;		void execute(struct VPTransformState *State) override;

private:		private:
/// Create an IR BasicBlock to hold the output instructions generated by this		/// Create an IR BasicBlock to hold the output instructions generated by this
/// VPBasicBlock, and return it. Update the CFGState accordingly.		/// VPBasicBlock, and return it. Update the CFGState accordingly.
BasicBlock *createEmptyBasicBlock(VPTransformState::CFGState &CFG);		BasicBlock *createEmptyBasicBlock(VPTransformState::CFGState &CFG);
};		};

/// VPlan models a candidate for vectorization, encoding various decisions taken		/// VPlan models a candidate for vectorization, encoding various decisions taken
/// to produce efficient output IR, including which branches, basic-blocks and		/// to produce efficient output IR, including which branches, basic-blocks and
/// output IR instructions to generate, and their cost. VPlan holds a		/// output IR instructions to generate, and their cost. VPlan holds a
/// Hierarchical-CFG of VPBasicBlocks and VPRegionBlocks rooted at an Entry		/// Hierarchical-CFG of VPBasicBlocks and VPRegionBlocks rooted at an Entry
/// VPBlock.		/// VPBlock.
class VPlan {		class VPlan {
		friend class VPlanPrinter;

private:		private:
/// Hold the single entry to the Hierarchical CFG of the VPlan.		/// Hold the single entry to the Hierarchical CFG of the VPlan.
VPBlockBase *Entry;		VPBlockBase *Entry;

/// Holds the VFs applicable to this VPlan.		/// Holds the VFs applicable to this VPlan.
SmallSet<unsigned, 2> VFs;		SmallSet<unsigned, 2> VFs;

/// Holds the name of the VPlan, for printing.		/// Holds the name of the VPlan, for printing.
std::string Name;		std::string Name;

		/// Holds a mapping between Values and their corresponding VPValue inside
		/// VPlan.
		Value2VPValueTy Value2VPValue;

public:		public:
VPlan(VPBlockBase *Entry = nullptr) : Entry(Entry) {}		VPlan(VPBlockBase *Entry = nullptr) : Entry(Entry) {}

~VPlan() {		~VPlan() {
if (Entry)		if (Entry)
VPBlockBase::deleteCFG(Entry);		VPBlockBase::deleteCFG(Entry);
		for (auto &MapEntry : Value2VPValue)
		delete MapEntry.second;
}		}

/// Generate the IR code for this VPlan.		/// Generate the IR code for this VPlan.
void execute(struct VPTransformState *State);		void execute(struct VPTransformState *State);

VPBlockBase *getEntry() { return Entry; }		VPBlockBase *getEntry() { return Entry; }
const VPBlockBase *getEntry() const { return Entry; }		const VPBlockBase *getEntry() const { return Entry; }

VPBlockBase *setEntry(VPBlockBase *Block) { return Entry = Block; }		VPBlockBase *setEntry(VPBlockBase *Block) { return Entry = Block; }

void addVF(unsigned VF) { VFs.insert(VF); }		void addVF(unsigned VF) { VFs.insert(VF); }

bool hasVF(unsigned VF) { return VFs.count(VF); }		bool hasVF(unsigned VF) { return VFs.count(VF); }

const std::string &getName() const { return Name; }		const std::string &getName() const { return Name; }

void setName(const Twine &newName) { Name = newName.str(); }		void setName(const Twine &newName) { Name = newName.str(); }

		void addVPValue(Value *V) {
		assert(V && "Trying to add a null Value to VPlan");
		assert(!Value2VPValue.count(V) && "Value already exists in VPlan");
		Value2VPValue[V] = new VPValue();
		}

		VPValue *getVPValue(Value *V) {
		assert(V && "Trying to get the VPValue of a null Value");
		assert(Value2VPValue.count(V) && "Value does not exist in VPlan");
		return Value2VPValue[V];
		}

private:		private:
/// Add to the given dominator tree the header block and every new basic block		/// Add to the given dominator tree the header block and every new basic block
/// that was created between it and the latch block, inclusive.		/// that was created between it and the latch block, inclusive.
static void updateDominatorTree(DominatorTree *DT,		static void updateDominatorTree(DominatorTree *DT,
BasicBlock *LoopPreHeaderBB,		BasicBlock *LoopPreHeaderBB,
BasicBlock *LoopLatchBB);		BasicBlock *LoopLatchBB);
};		};
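`VPlan` owns the live-in `VPValue`s it creates in `addVPValue` and frees them in its destructor, and step 0 of `VPlan::execute` in VPlan.cpp inverts the map so code generation can recover the original IR Values. A sketch of the same bookkeeping, assuming strings stand in for IR `Value *` keys and swapping the manual `delete` loop for `std::unique_ptr` ownership; `Plan`, `LiveIn` and `reverseMap` are hypothetical names.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>

// Hypothetical stand-in: a live-in VPValue is an empty node owned by the plan.
struct LiveIn {};

class Plan {
  // Like VPlan::Value2VPValue (keys are IR value names here), but owning via
  // unique_ptr instead of the manual delete loop in ~VPlan().
  std::map<std::string, std::unique_ptr<LiveIn>> Value2VPValue;

public:
  void addVPValue(const std::string &V) {
    assert(!Value2VPValue.count(V) && "Value already exists in plan");
    Value2VPValue[V] = std::make_unique<LiveIn>();
  }

  LiveIn *getVPValue(const std::string &V) const {
    auto It = Value2VPValue.find(V);
    assert(It != Value2VPValue.end() && "Value does not exist in plan");
    return It->second.get();
  }

  // Like step 0 of VPlan::execute: invert the map for code generation.
  std::map<const LiveIn *, std::string> reverseMap() const {
    std::map<const LiveIn *, std::string> R;
    for (const auto &Entry : Value2VPValue)
      R[Entry.second.get()] = Entry.first;
    return R;
  }
};
```

Repeated lookups of the same IR value return the same node, so every user of a live-in shares one def-use vertex.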


llvm/trunk/lib/Transforms/Vectorize/VPlan.cpp

#include <iterator>		#include <iterator>
#include <string>		#include <string>
#include <vector>		#include <vector>

using namespace llvm;		using namespace llvm;

#define DEBUG_TYPE "vplan"		#define DEBUG_TYPE "vplan"

		raw_ostream &llvm::operator<<(raw_ostream &OS, const VPValue &V) {
		if (const VPInstruction *Instr = dyn_cast<VPInstruction>(&V))
		Instr->print(OS);
		else
		V.printAsOperand(OS);
		return OS;
		}

/// \return the VPBasicBlock that is the entry of Block, possibly indirectly.		/// \return the VPBasicBlock that is the entry of Block, possibly indirectly.
const VPBasicBlock *VPBlockBase::getEntryBasicBlock() const {		const VPBasicBlock *VPBlockBase::getEntryBasicBlock() const {
const VPBlockBase *Block = this;		const VPBlockBase *Block = this;
while (const VPRegionBlock *Region = dyn_cast<VPRegionBlock>(Block))		while (const VPRegionBlock *Region = dyn_cast<VPRegionBlock>(Block))
Block = Region->getEntry();		Block = Region->getEntry();
return cast<VPBasicBlock>(Block);		return cast<VPBasicBlock>(Block);
}		}

}		}
}		}
}		}

// Exit replicating mode.		// Exit replicating mode.
State->Instance.reset();		State->Instance.reset();
}		}

		void VPInstruction::generateInstruction(VPTransformState &State,
		unsigned Part) {
		IRBuilder<> &Builder = State.Builder;

		if (Instruction::isBinaryOp(getOpcode())) {
		Value *A = State.get(getOperand(0), Part);
		Value *B = State.get(getOperand(1), Part);
		Value *V = Builder.CreateBinOp((Instruction::BinaryOps)getOpcode(), A, B);
		State.set(this, V, Part);
		return;
		}

		switch (getOpcode()) {
		case VPInstruction::Not: {
		Value *A = State.get(getOperand(0), Part);
		Value *V = Builder.CreateNot(A);
		State.set(this, V, Part);
		break;
		}
		default:
		llvm_unreachable("Unsupported opcode for instruction");
		}
		}

		void VPInstruction::execute(VPTransformState &State) {
		assert(!State.Instance && "VPInstruction executing an Instance");
		for (unsigned Part = 0; Part < State.UF; ++Part)
		generateInstruction(State, Part);
		}

		void VPInstruction::print(raw_ostream &O, const Twine &Indent) const {
		O << " +\n" << Indent << "\"EMIT ";
		print(O);
		O << "\\l\"";
		}

		void VPInstruction::print(raw_ostream &O) const {
		printAsOperand(O);
		O << " = ";

		switch (getOpcode()) {
		case VPInstruction::Not:
		O << "not";
		break;
		default:
		O << Instruction::getOpcodeName(getOpcode());
		}

		for (const VPValue *Operand : operands()) {
		O << " ";
		Operand->printAsOperand(O);
		}
		}
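`execute` above runs `generateInstruction` once per unrolled part, handling IR binary opcodes first and VPlan-specific opcodes such as `Not` in the switch. The sketch below replays that control flow on toy 4-lane bit masks instead of emitting IR; the opcode values, `Node`, `State`, and the use of `std::bitset` are all assumptions made for illustration.

```cpp
#include <bitset>
#include <cassert>
#include <cstdlib>
#include <map>
#include <vector>

// Hypothetical opcode numbering: And/Or play IR binary opcodes, and Not sits
// past the "IR" range, as VPInstruction::Not = OtherOpsEnd + 1 does.
enum : unsigned { And = 1, Or = 2, OtherOpsEnd = 64, Not = OtherOpsEnd + 1 };

using Mask = std::bitset<4>; // one VF=4 mask per unrolled part

struct Node { // toy VPInstruction; Opcode 0 marks a live-in
  unsigned Opcode;
  std::vector<const Node *> Operands;
};

struct State { // toy per-part value storage
  unsigned UF;
  std::map<const Node *, std::vector<Mask>> PerPart;
  Mask get(const Node *N, unsigned Part) const { return PerPart.at(N)[Part]; }
  void set(const Node *N, Mask M, unsigned Part) {
    auto &Slots = PerPart[N];
    Slots.resize(UF);
    Slots[Part] = M;
  }
};

static bool isBinaryOp(unsigned Op) { return Op == And || Op == Or; }

// Mirrors VPInstruction::generateInstruction for a single part.
void generate(const Node &I, State &S, unsigned Part) {
  if (isBinaryOp(I.Opcode)) {
    Mask A = S.get(I.Operands[0], Part), B = S.get(I.Operands[1], Part);
    S.set(&I, I.Opcode == And ? (A & B) : (A | B), Part);
    return;
  }
  switch (I.Opcode) {
  case Not:
    S.set(&I, ~S.get(I.Operands[0], Part), Part);
    break;
  default:
    std::abort(); // stands in for llvm_unreachable
  }
}

// Mirrors VPInstruction::execute: one instance per unrolled part.
void execute(const Node &I, State &S) {
  for (unsigned Part = 0; Part < S.UF; ++Part)
    generate(I, S, Part);
}
```

Evaluating a NOT that feeds an AND, for instance, yields, separately for each part, the lanes where the incoming mask is set and the condition is clear.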

/// Generate the code inside the body of the vectorized loop. Assumes a single		/// Generate the code inside the body of the vectorized loop. Assumes a single
/// LoopVectorBody basic-block was created for this. Introduce additional		/// LoopVectorBody basic-block was created for this. Introduce additional
/// basic-blocks as needed, and fill them all.		/// basic-blocks as needed, and fill them all.
void VPlan::execute(VPTransformState *State) {		void VPlan::execute(VPTransformState *State) {
		// 0. Set the reverse mapping from VPValues to Values for code generation.
		for (auto &Entry : Value2VPValue)
		State->VPValue2Value[Entry.second] = Entry.first;

BasicBlock *VectorPreHeaderBB = State->CFG.PrevBB;		BasicBlock *VectorPreHeaderBB = State->CFG.PrevBB;
BasicBlock *VectorHeaderBB = VectorPreHeaderBB->getSingleSuccessor();		BasicBlock *VectorHeaderBB = VectorPreHeaderBB->getSingleSuccessor();
assert(VectorHeaderBB && "Loop preheader does not have a single successor.");		assert(VectorHeaderBB && "Loop preheader does not have a single successor.");
BasicBlock *VectorLatchBB = VectorHeaderBB;		BasicBlock *VectorLatchBB = VectorHeaderBB;

// 1. Make room to generate basic-blocks inside loop body if needed.		// 1. Make room to generate basic-blocks inside loop body if needed.
VectorLatchBB = VectorHeaderBB->splitBasicBlock(		VectorLatchBB = VectorHeaderBB->splitBasicBlock(
VectorHeaderBB->getFirstInsertionPt(), "vector.body.latch");		VectorHeaderBB->getFirstInsertionPt(), "vector.body.latch");

void VPlanPrinter::dump() {		void VPlanPrinter::dump() {
Depth = 1;		Depth = 1;
bumpIndent(0);		bumpIndent(0);
OS << "digraph VPlan {\n";		OS << "digraph VPlan {\n";
OS << "graph [labelloc=t, fontsize=30; label=\"Vectorization Plan";		OS << "graph [labelloc=t, fontsize=30; label=\"Vectorization Plan";
if (!Plan.getName().empty())		if (!Plan.getName().empty())
OS << "\\n" << DOT::EscapeString(Plan.getName());		OS << "\\n" << DOT::EscapeString(Plan.getName());
		if (!Plan.Value2VPValue.empty()) {
		OS << ", where:";
		for (auto Entry : Plan.Value2VPValue) {
		OS << "\\n" << *Entry.second;
		OS << DOT::EscapeString(" := ");
		Entry.first->printAsOperand(OS, false);
		}
		}
OS << "\"]\n";		OS << "\"]\n";
OS << "node [shape=rect, fontname=Courier, fontsize=30]\n";		OS << "node [shape=rect, fontname=Courier, fontsize=30]\n";
OS << "edge [fontname=Courier, fontsize=30]\n";		OS << "edge [fontname=Courier, fontsize=30]\n";
OS << "compound=true\n";		OS << "compound=true\n";

for (VPBlockBase *Block : depth_first(Plan.getEntry()))		for (VPBlockBase *Block : depth_first(Plan.getEntry()))
dumpBlock(Block);		dumpBlock(Block);


llvm/trunk/lib/Transforms/Vectorize/VPlanBuilder.h

Property	Old Value	New Value
svn:eol-style	null	native \ No newline at end of property
svn:keywords	null	Author Date Id Rev URL \ No newline at end of property
svn:mime-type	null	text/plain \ No newline at end of property

				//===- VPlanBuilder.h - A VPlan utility for constructing VPInstructions ---===//
				//
				// The LLVM Compiler Infrastructure
				//
				// This file is distributed under the University of Illinois Open Source
				// License. See LICENSE.TXT for details.
				//
				//===----------------------------------------------------------------------===//
				///
				/// \file
				/// This file provides a VPlan-based builder utility analogous to IRBuilder.
				/// It provides an instruction-level API for generating VPInstructions while
				/// abstracting away the Recipe manipulation details.
				//===----------------------------------------------------------------------===//

				#ifndef LLVM_TRANSFORMS_VECTORIZE_VPLAN_BUILDER_H
				#define LLVM_TRANSFORMS_VECTORIZE_VPLAN_BUILDER_H

				#include "VPlan.h"

				namespace llvm {

				class VPBuilder {
				private:
				VPBasicBlock *BB = nullptr;
				VPBasicBlock::iterator InsertPt = VPBasicBlock::iterator();

				VPInstruction *createInstruction(unsigned Opcode,
				std::initializer_list<VPValue *> Operands) {
				VPInstruction *Instr = new VPInstruction(Opcode, Operands);
				BB->insert(Instr, InsertPt);
				return Instr;
				}

				public:
				VPBuilder() {}

				/// \brief This specifies that created VPInstructions should be appended to
				/// the end of the specified block.
				void setInsertPoint(VPBasicBlock *TheBB) {
				assert(TheBB && "Attempting to set a null insert point");
				BB = TheBB;
				InsertPt = BB->end();
				}

				VPValue *createNot(VPValue *Operand) {
				return createInstruction(VPInstruction::Not, {Operand});
				}

				VPValue *createAnd(VPValue *LHS, VPValue *RHS) {
				return createInstruction(Instruction::BinaryOps::And, {LHS, RHS});
				}

				VPValue *createOr(VPValue *LHS, VPValue *RHS) {
				return createInstruction(Instruction::BinaryOps::Or, {LHS, RHS});
				}
				};

				} // namespace llvm

				#endif // LLVM_TRANSFORMS_VECTORIZE_VPLAN_BUILDER_H
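`VPBuilder` inserts each new `VPInstruction` before `InsertPt`, which `setInsertPoint` leaves at `end()`, so successive calls append in creation order. A self-contained sketch of that behavior, assuming a `std::list` of owned nodes stands in for the recipe list; `Val`, `Block` and `Builder` are hypothetical simplifications, not the VPlan classes.

```cpp
#include <cassert>
#include <list>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical reduced model: a value is an opcode name plus operands.
struct Val {
  std::string Op; // "live-in", "not", "and", "or"
  std::vector<Val *> Operands;
};

// Hypothetical stand-in for VPBasicBlock's owned recipe list.
struct Block {
  std::list<std::unique_ptr<Val>> Recipes;
  using iterator = std::list<std::unique_ptr<Val>>::iterator;
  // Like VPBasicBlock::insert: place the new element before InsertPt.
  void insert(std::unique_ptr<Val> V, iterator InsertPt) {
    Recipes.insert(InsertPt, std::move(V));
  }
};

class Builder { // analogous to VPBuilder
  Block *BB = nullptr;
  Block::iterator InsertPt;

  Val *create(const std::string &Op, std::vector<Val *> Ops) {
    assert(BB && "no insert point set");
    auto V = std::make_unique<Val>(Val{Op, std::move(Ops)});
    Val *Raw = V.get();
    BB->insert(std::move(V), InsertPt);
    return Raw;
  }

public:
  // Append to the end of TheBB, like VPBuilder::setInsertPoint.
  void setInsertPoint(Block *TheBB) {
    assert(TheBB && "Attempting to set a null insert point");
    BB = TheBB;
    InsertPt = BB->Recipes.end();
  }
  Val *createNot(Val *A) { return create("not", {A}); }
  Val *createAnd(Val *L, Val *R) { return create("and", {L, R}); }
  Val *createOr(Val *L, Val *R) { return create("or", {L, R}); }
};
```

Building a NOT that feeds an AND, followed by an OR, leaves the three instructions in the block in creation order, with the AND's second operand pointing at the NOT.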

llvm/trunk/lib/Transforms/Vectorize/VPlanValue.h

Property	Old Value	New Value
svn:eol-style	null	native \ No newline at end of property
svn:keywords	null	Author Date Id Rev URL \ No newline at end of property
svn:mime-type	null	text/plain \ No newline at end of property

				//===- VPlanValue.h - Represent Values in Vectorizer Plan -----------------===//
				//
				// The LLVM Compiler Infrastructure
				//
				// This file is distributed under the University of Illinois Open Source
				// License. See LICENSE.TXT for details.
				//
				//===----------------------------------------------------------------------===//
				///
				/// \file
				/// This file contains the declarations of the entities induced by Vectorization
				/// Plans, e.g. the instructions the VPlan intends to generate if executed.
				/// VPlan models the following entities:
				/// VPValue
				///  |-- VPUser
				///  |    |-- VPInstruction
				/// These are documented in docs/VectorizationPlan.rst.
				///
				//===----------------------------------------------------------------------===//

				#ifndef LLVM_TRANSFORMS_VECTORIZE_VPLAN_VALUE_H
				#define LLVM_TRANSFORMS_VECTORIZE_VPLAN_VALUE_H

				#include "llvm/ADT/DenseMap.h"
				#include "llvm/ADT/SmallVector.h"
				#include "llvm/IR/Value.h"
				#include "llvm/Support/Debug.h"
				#include "llvm/Support/raw_ostream.h"

				namespace llvm {

				// Forward declarations.
				class VPUser;

				// This is the base class of the VPlan Def/Use graph, used for modeling the data
				// flow into, within and out of the VPlan. VPValues can stand for live-ins
				// coming from the input IR, instructions which VPlan will generate if executed
				// and live-outs which the VPlan will need to fix accordingly.
				class VPValue {
				private:
				const unsigned char SubclassID; ///< Subclass identifier (for isa/dyn_cast).

				SmallVector<VPUser *, 1> Users;

				protected:
				VPValue(const unsigned char SC) : SubclassID(SC) {}

				public:
				/// An enumeration for keeping track of the concrete subclasses of VPValue
				/// that are actually instantiated. Values of this enumeration are kept in the
				/// SubclassID field of the VPValue objects. They are used for concrete
				/// type identification.
				enum { VPValueSC, VPUserSC, VPInstructionSC };

				VPValue() : SubclassID(VPValueSC) {}
				VPValue(const VPValue &) = delete;
				VPValue &operator=(const VPValue &) = delete;

				/// \return an ID for the concrete type of this object.
				/// This is used to implement the classof checks. This should not be used
				/// for any other purpose, as the values may change as LLVM evolves.
				unsigned getVPValueID() const { return SubclassID; }

				void printAsOperand(raw_ostream &OS) const {
				OS << "%vp" << (unsigned short)(unsigned long long)this;
				}

				unsigned getNumUsers() const { return Users.size(); }
				void addUser(VPUser &User) { Users.push_back(&User); }

				typedef SmallVectorImpl<VPUser *>::iterator user_iterator;
				typedef SmallVectorImpl<VPUser *>::const_iterator const_user_iterator;
				typedef iterator_range<user_iterator> user_range;
				typedef iterator_range<const_user_iterator> const_user_range;

				user_iterator user_begin() { return Users.begin(); }
				const_user_iterator user_begin() const { return Users.begin(); }
				user_iterator user_end() { return Users.end(); }
				const_user_iterator user_end() const { return Users.end(); }
				user_range users() { return user_range(user_begin(), user_end()); }
				const_user_range users() const {
				return const_user_range(user_begin(), user_end());
				}
				};

				typedef DenseMap<Value *, VPValue *> Value2VPValueTy;
				typedef DenseMap<VPValue *, Value *> VPValue2ValueTy;

				raw_ostream &operator<<(raw_ostream &OS, const VPValue &V);

/// This class augments VPValue with operands which provide the inverse def-use
/// edges from VPValue's users to their defs.
class VPUser : public VPValue {
private:
  SmallVector<VPValue *, 2> Operands;

  void addOperand(VPValue *Operand) {
    Operands.push_back(Operand);
    Operand->addUser(*this);
  }

protected:
  VPUser(const unsigned char SC) : VPValue(SC) {}
  VPUser(const unsigned char SC, ArrayRef<VPValue *> Operands) : VPValue(SC) {
    for (VPValue *Operand : Operands)
      addOperand(Operand);
  }

public:
  VPUser() : VPValue(VPValue::VPUserSC) {}
  VPUser(ArrayRef<VPValue *> Operands) : VPUser(VPValue::VPUserSC, Operands) {}
  VPUser(std::initializer_list<VPValue *> Operands)
      : VPUser(ArrayRef<VPValue *>(Operands)) {}
  VPUser(const VPUser &) = delete;
  VPUser &operator=(const VPUser &) = delete;

  /// Method to support type inquiry through isa, cast, and dyn_cast.
  static inline bool classof(const VPValue *V) {
    return V->getVPValueID() >= VPUserSC &&
           V->getVPValueID() <= VPInstructionSC;
  }

  unsigned getNumOperands() const { return Operands.size(); }
  inline VPValue *getOperand(unsigned N) const {
    assert(N < Operands.size() && "Operand index out of bounds");
    return Operands[N];
  }

  typedef SmallVectorImpl<VPValue *>::iterator operand_iterator;
  typedef SmallVectorImpl<VPValue *>::const_iterator const_operand_iterator;
  typedef iterator_range<operand_iterator> operand_range;
  typedef iterator_range<const_operand_iterator> const_operand_range;

  operand_iterator op_begin() { return Operands.begin(); }
  const_operand_iterator op_begin() const { return Operands.begin(); }
  operand_iterator op_end() { return Operands.end(); }
  const_operand_iterator op_end() const { return Operands.end(); }
  operand_range operands() { return operand_range(op_begin(), op_end()); }
  const_operand_range operands() const {
    return const_operand_range(op_begin(), op_end());
  }
};

} // namespace llvm

#endif // LLVM_TRANSFORMS_VECTORIZE_VPLAN_VALUE_H
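The bookkeeping in VPUser keeps both directions of every def-use edge in sync: addOperand appends the use->def edge to Operands and immediately calls Operand->addUser(*this) to record the matching def->use edge. The sketch below reproduces that pattern without LLVM headers so it can be compiled on its own. It is a simplified, hypothetical model, not the VPlan code: MiniVPValue and MiniVPUser are illustrative names, std::vector stands in for SmallVector, and the isa/cast/dyn_cast machinery is reduced to a bare classof that performs the same subclass-ID range check as VPUser::classof.

```cpp
#include <cassert>
#include <initializer_list>
#include <vector>

class MiniVPUser;

// Hypothetical stand-in for VPValue: a def that knows its users.
class MiniVPValue {
public:
  // Subclass IDs ordered so that every "user" kind sits in one contiguous
  // range, which is what makes a two-comparison classof possible.
  enum : unsigned char { VPValueSC, VPUserSC, VPInstructionSC };

  explicit MiniVPValue(unsigned char SC = VPValueSC) : SubclassID(SC) {}
  virtual ~MiniVPValue() = default;
  MiniVPValue(const MiniVPValue &) = delete;
  MiniVPValue &operator=(const MiniVPValue &) = delete;

  unsigned char getVPValueID() const { return SubclassID; }
  unsigned getNumUsers() const { return static_cast<unsigned>(Users.size()); }
  MiniVPUser *getUser(unsigned N) const { return Users[N]; }

private:
  friend class MiniVPUser;
  // Def->use edge; only MiniVPUser::addOperand calls this.
  void addUser(MiniVPUser &User) { Users.push_back(&User); }

  const unsigned char SubclassID;
  std::vector<MiniVPUser *> Users;
};

// Hypothetical stand-in for VPUser: a value that also has operands.
class MiniVPUser : public MiniVPValue {
public:
  MiniVPUser(std::initializer_list<MiniVPValue *> Ops,
             unsigned char SC = VPUserSC)
      : MiniVPValue(SC) {
    for (MiniVPValue *Op : Ops)
      addOperand(Op);
  }

  // Same range check as VPUser::classof: IDs in [VPUserSC, VPInstructionSC].
  static bool classof(const MiniVPValue *V) {
    return V->getVPValueID() >= VPUserSC &&
           V->getVPValueID() <= VPInstructionSC;
  }

  unsigned getNumOperands() const {
    return static_cast<unsigned>(Operands.size());
  }
  MiniVPValue *getOperand(unsigned N) const {
    assert(N < Operands.size() && "Operand index out of bounds");
    return Operands[N];
  }

private:
  // Records the use->def edge and the matching def->use edge together.
  void addOperand(MiniVPValue *Op) {
    Operands.push_back(Op);
    Op->addUser(*this);
  }

  std::vector<MiniVPValue *> Operands;
};
```

For example, building an AND-like user from two mask values A and B, and then a second user that consumes the AND (the role a masked load/store plays in the summary), leaves A and B each listing the AND as their only user and the AND listing the consumer as its only user. Because the IDs are laid out so that all user kinds are contiguous, classof stays a constant-time range test even as new user subclasses are added between the two bounds.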

llvm/trunk/test/Transforms/LoopVectorize/if-conversion-nest.ll

 ; CHECK-NEXT: [[TMP9:%.*]] = getelementptr inbounds i32, i32* [[B]], i64 [[INDEX]]
 ; CHECK-NEXT: [[TMP10:%.*]] = bitcast i32* [[TMP9]] to <4 x i32>*
 ; CHECK-NEXT: [[WIDE_LOAD6:%.*]] = load <4 x i32>, <4 x i32>* [[TMP10]], align 4, !alias.scope !3
 ; CHECK-NEXT: [[TMP11:%.*]] = icmp sgt <4 x i32> [[WIDE_LOAD]], [[WIDE_LOAD6]]
 ; CHECK-NEXT: [[TMP12:%.*]] = icmp sgt <4 x i32> [[WIDE_LOAD]], <i32 19, i32 19, i32 19, i32 19>
 ; CHECK-NEXT: [[TMP13:%.*]] = icmp slt <4 x i32> [[WIDE_LOAD6]], <i32 4, i32 4, i32 4, i32 4>
 ; CHECK-NEXT: [[TMP14:%.*]] = select <4 x i1> [[TMP13]], <4 x i32> <i32 4, i32 4, i32 4, i32 4>, <4 x i32> <i32 5, i32 5, i32 5, i32 5>
 ; CHECK-NEXT: [[TMP15:%.*]] = and <4 x i1> [[TMP12]], [[TMP11]]
-; CHECK-NEXT: [[PREDPHI:%.*]] = select <4 x i1> [[TMP15]], <4 x i32> <i32 3, i32 3, i32 3, i32 3>, <4 x i32> <i32 9, i32 9, i32 9, i32 9>
 ; CHECK-NEXT: [[TMP16:%.*]] = xor <4 x i1> [[TMP12]], <i1 true, i1 true, i1 true, i1 true>
 ; CHECK-NEXT: [[TMP17:%.*]] = and <4 x i1> [[TMP11]], [[TMP16]]
+; CHECK-NEXT: [[PREDPHI:%.*]] = select <4 x i1> [[TMP15]], <4 x i32> <i32 3, i32 3, i32 3, i32 3>, <4 x i32> <i32 9, i32 9, i32 9, i32 9>
 ; CHECK-NEXT: [[PREDPHI7:%.*]] = select <4 x i1> [[TMP17]], <4 x i32> [[TMP14]], <4 x i32> [[PREDPHI]]
 ; CHECK-NEXT: [[TMP18:%.*]] = bitcast i32* [[TMP7]] to <4 x i32>*
 ; CHECK-NEXT: store <4 x i32> [[PREDPHI7]], <4 x i32>* [[TMP18]], align 4, !alias.scope !0, !noalias !3
 ; CHECK-NEXT: [[INDEX_NEXT]] = add i64 [[INDEX]], 4
 ; CHECK-NEXT: [[TMP19:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]
 ; CHECK-NEXT: br i1 [[TMP19]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop !5
 ; CHECK: middle.block:
 ; CHECK-NEXT: [[CMP_N:%.*]] = icmp eq i32 [[TMP6]], 0
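The CHECK lines encode a nested if-conversion: two compares (TMP11, TMP12) are combined into edge masks with AND and NOT (the xor with all-true), and two selects blend the per-path values. The sketch below evaluates those same operations one lane at a time in plain C++ so the mask logic can be checked by hand. It is an illustration under assumptions: predicatedBody and scalarBody are hypothetical names, the lane loop stands in for the single <4 x i32> / <4 x i1> instructions, and scalarBody is a reconstruction of the scalar control flow inferred from the masks, not copied from the test's source.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Four lanes, matching the <4 x i32> / <4 x i1> vectors in the CHECK lines.
constexpr std::size_t VF = 4;
using Vec = std::array<int, VF>;

// Lane-by-lane evaluation of the checked vector body. Each local is named
// after the IR value it models; A holds WIDE_LOAD, B holds WIDE_LOAD6.
Vec predicatedBody(const Vec &A, const Vec &B) {
  Vec Out{};
  for (std::size_t L = 0; L < VF; ++L) {
    bool Tmp11 = A[L] > B[L];         // icmp sgt WIDE_LOAD, WIDE_LOAD6
    bool Tmp12 = A[L] > 19;           // icmp sgt WIDE_LOAD, 19
    bool Tmp13 = B[L] < 4;            // icmp slt WIDE_LOAD6, 4
    int Tmp14 = Tmp13 ? 4 : 5;        // select TMP13, 4, 5
    bool Tmp15 = Tmp12 && Tmp11;      // mask: and TMP12, TMP11
    bool Tmp16 = !Tmp12;              // mask: xor TMP12, true (NOT)
    bool Tmp17 = Tmp11 && Tmp16;      // mask: and TMP11, TMP16
    int PredPhi = Tmp15 ? 3 : 9;      // select TMP15, 3, 9
    Out[L] = Tmp17 ? Tmp14 : PredPhi; // PREDPHI7: select TMP17, TMP14, PREDPHI
  }
  return Out;
}

// Scalar control flow that the masks above encode (a reconstruction).
int scalarBody(int A, int B) {
  int X = 9;
  if (A > B) {
    if (A > 19)
      X = 3;
    else
      X = (B < 4) ? 4 : 5;
  }
  return X;
}
```

With A = {25, 10, 3, 10} and B = {5, 15, 1, 7}, each lane takes a different path through the nested branches and predicatedBody yields {3, 9, 4, 5}, matching scalarBody lane for lane. Since PREDPHI depends only on TMP15, emitting its select after the NOT/AND mask computations, as the updated CHECK order requires, changes instruction order but not the blended values.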