This is an archive of the discontinued LLVM Phabricator instance.

Differential D32871

[LV] Using VPlan to model the vectorized code and drive its transformation
ClosedPublic

Authored by Ayal on May 4 2017, 10:10 AM.

Details

Reviewers

mkuper
mssimpso
rengolin

Commits

rG662788336965: [LV] Using VPlan to model the vectorized code and drive its transformation
rL311077: [LV] Using VPlan to model the vectorized code and drive its transformation

Summary

This patch contains the new VPlan model and uses it to represent the vectorized code and drive the generation of vectorized IR. It follows the breakdown of https://reviews.llvm.org/D28975.

The patch contains

https://reviews.llvm.org/D32200: once that patch is approved and committed, an updated and simplified version of this patch will be uploaded.
VectorizationPlan.rst: documenting the VPlan model, covering this and follow-up patches.
Protected methods made public to enable their reuse are annotated as such in-place to simplify the diff. They will eventually be moved and grouped together.

To recap, the introductory patch of VPlan is designed to

capture in VPlan all current vectorization decisions,
represent the control-flow of the vectorized loop body using VPlan's Hierarchical CFG,
retain current vectorizer output.

In this patch VPlan models the vectorized loop body: the vectorized control-flow is represented using VPlan's Hierarchical CFG, with predication refactored from being a post-vectorization-step into a vectorization planning step modeling if-then VPRegionBlocks, and generating code inline with non-predicated code. The vectorized code within each VPBasicBlock is represented as a sequence of Recipes, each responsible for modelling and generating a sequence of IR instructions. To keep the size of this commit manageable the Recipes in this patch are coarse-grained and capture large chunks of LV’s code-generation logic. The constructed VPlans are dumped under -debug.
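As a rough illustration of the model described above, here is a minimal sketch. It is not the patch's code: `Recipe`, `WidenRecipe`, `BasicBlockNode`, `RegionNode` and `State` are hypothetical stand-ins for the classes in VPlan.h, and the generated "IR" is just a list of strings. The point is only the shape: a region nests blocks, a basic block holds a sequence of recipes, and executing the plan walks that hierarchy.

```cpp
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins for illustration; the real classes live in VPlan.h.
struct State {
  unsigned VF;                     // vectorization factor being generated
  std::vector<std::string> Output; // stands in for the generated IR
};

// A recipe models and "generates" a short sequence of instructions.
struct Recipe {
  virtual ~Recipe() = default;
  virtual void execute(State &S) const = 0;
};

struct WidenRecipe : Recipe {
  std::string Op;
  explicit WidenRecipe(std::string Op) : Op(std::move(Op)) {}
  void execute(State &S) const override {
    S.Output.push_back("<" + std::to_string(S.VF) + " x " + Op + ">");
  }
};

// A node of the hierarchical CFG: either a basic block or a region.
struct Block {
  virtual ~Block() = default;
  virtual void execute(State &S) const = 0;
};

// A basic block executes its recipes in order.
struct BasicBlockNode : Block {
  std::vector<std::unique_ptr<Recipe>> Recipes;
  void execute(State &S) const override {
    for (const auto &R : Recipes)
      R->execute(S);
  }
};

// A region nests other blocks (basic blocks or further regions).
struct RegionNode : Block {
  std::vector<std::unique_ptr<Block>> Body;
  void execute(State &S) const override {
    for (const auto &B : Body)
      B->execute(S);
  }
};

// Builds a one-block region with two recipes and executes it for VF = 4.
std::vector<std::string> runDemo() {
  auto BB = std::make_unique<BasicBlockNode>();
  BB->Recipes.push_back(std::make_unique<WidenRecipe>("load"));
  BB->Recipes.push_back(std::make_unique<WidenRecipe>("add"));
  RegionNode Loop;
  Loop.Body.push_back(std::move(BB));
  State S{4, {}};
  Loop.execute(S);
  return S.Output;
}
```

In this sketch the recipes are coarse on purpose, mirroring the summary's remark that the first patch keeps recipes coarse-grained.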

Gil and Ayal.

Diff Detail

Event Timeline

Ayal created this revision. May 4 2017, 10:10 AM

Herald added a subscriber: mzolotukhin. May 4 2017, 10:10 AM

https://reviews.llvm.org/D32200: once that patch is approved and committed, an updated and simplified version of this patch will be uploaded.

Done, as promised.

Herald added a subscriber: javed.absar. May 14 2017, 4:35 AM

ashutosh.nema added a subscriber: ashutosh.nema. May 14 2017, 10:39 PM

Ping

Hi Ayal, sorry for the delay. Can you commit the plan doc separately?

We have a docs/Proposals sub-directory that is perfect for that kind of document. It should be fine to commit there directly, as none of what's in there is "an official LLVM document" and can change with time.

It would also be good to get a link to that document from the existing vectoriser documents (as an FYI on what we're working), but once VPlans are fully in, we should move the core part of the VPlan doc into the vectorisers (or just move them outside of Proposals), so that they become official.

I'll start looking at the code changes this week.

cheers,
--renato

huntergr added a subscriber: huntergr. May 22 2017, 3:57 AM

Some initial comments...

lib/Transforms/Vectorize/LoopVectorize.cpp
531	The public protected swap is confusing. I'd create a "protected bubble" and move the methods inside it.
543	Not virtual anymore? Should it be marked "final"?

In D32871#760712, @rengolin wrote:

Hi Ayal, sorry for the delay. Can you commit the plan doc separately?

We have a docs/Proposals sub-directory that is perfect for that kind of document. It should be fine to commit there directly, as none of what's in there is "an official LLVM document" and can change with time.

It would also be good to get a link to that document from the existing vectoriser documents (as an FYI on what we're working), but once VPlans are fully in, we should move the core part of the VPlan doc into the vectorisers (or just move them outside of Proposals), so that they become official.

I'll start looking at the code changes this week.

cheers,
--renato

Sure, done (r304161), thanks!

lib/Transforms/Vectorize/LoopVectorize.cpp
531	Sorry for this confusion. The idea was to avoid moving methods around to simplify the diff: Protected methods made public to enable their reuse are annotated as such in-place to simplify the diff. They will eventually be moved and grouped together. An alternative is to use "friend" instead.
543	This should have been part of r297580 originally. In general ILV's methods are marked virtual iff they're overridden by Unroller; otherwise they may be marked "final". This is unrelated to VPlan. Sorry for the confusion.

A few more comments... :)

lib/Transforms/Vectorize/LoopVectorize.cpp
531	Grouping should be fine.
543	Right, a separate fix is better in this case, then.
619	A short comment would be nice, here.
2156	Nit: Can you move the member variables to the top? Makes it easier to know what "VPlans" are, etc.
2158	SmallVectorImpl should destroy its own range, you don't need to do it yourself.
2188	This POD is so small that it's worth passing it by value most of the time. If you need to change it, it's also worth passing it by reference as a whole.

Updated version addressing review comments.

Ayal marked 5 inline comments as done. Jun 4 2017, 2:47 PM

Ayal added inline comments.

lib/Transforms/Vectorize/LoopVectorize.cpp
531	Done. Also desired for "public"-only changes to CostModel methods?
543	Dropping 'virtual' from scalarizeInstruction() was committed separately.
2158	Leaving ~LoopVectorizationPlanner() {} does not seem to deallocate the VPlan objects created by buildVPlan(). Is there another way to prevent them from leaking?
2188	Holding a constant Start plus non-constant End, and passing the struct by reference wherever End may be modified, instead of holding an unsigned Start and an unsigned reference End.

Sorry it's taking me so long to get to this.
I've only looked at parts of the code, more forthcoming in the next few days. :-)

lib/Transforms/Vectorize/VPlan.h
115	I think naming this SubclassID (like Value does) or something similar would be clearer. Same for VRID.
117	This looks not at all thread-safe, and it seems like the only thing that actually uses the UIDs is the printer. Perhaps assign the IDs on the fly in the printer? (Does anything else in LLVM do it this way?)
174	I think this should be ConstVPBlocksTy or something of that sort. (Or, just use const VPBlocksTy?)
189	You can drop the "class" here and everywhere below.
223	The terminology here is somewhat confusing, since it's not immediately clear what the difference between an "ancestor" and a "predecessor" is. Maybe "enclosing region" instead of ancestor?
307	Same as above.
314	are -> is?
317	I'm not sure about the type safety of this. Is this how we do it elsewhere?
390	Do we want the const accessor? IIUC, if you're not modifying it, you should be using the forwarding methods. In any case, it looks like neither of these methods is used right now. Can we remove both?
394	This isn't used anywhere either - and even if it were, the whole thing looks odd.

mkuper added inline comments. Jun 4 2017, 7:01 PM

lib/Transforms/Vectorize/VPlan.cpp
43	Any chance to make the const/non-const versions, here and below, share implementation? It seems like it ought to be possible.
191	This is extremely confusing. You're modifying what looks like a local variable to actually change the State->Instance for the State that will get passed to Block->execute(). Could you rewrite in a way that makes it clear what's going on?
210	Why do we need to save/restore the builder IP?
219	I don't understand this note. :-)
lib/Transforms/Vectorize/VPlan.h
187	Does this strictly need to be public? It looks like it would only be used in the classof of subclasses. Can it be protected?
462	Is it possible not to have an Entry here? It seems like it shouldn't be, so this should probably be an assert.
497	Maybe a SmallSet?
520	Is this functionality we want?
522	I don't think we want a non-const version of this.
532	Should this be static? Or maybe even a free function in the implementation?

rengolin added inline comments. Jun 6 2017, 3:16 AM

lib/Transforms/Vectorize/LoopVectorize.cpp
7658	shouldn't you have some asserts here to make sure the PHI node is what you need it to be? Same for other VPlans.
7870	I know you can re-use MinVF, but it would be more readable if you created a new variable VF to use inside the loop.
7875	This is confusing. Is this RSO just for the plan name? It'd seem more appropriate to have the name be a property of the plans, not some outside property.

(Addressing review comments; To be completed)

lib/Transforms/Vectorize/VPlan.cpp
43	Not sure how, w/o casting away const. We can drop the non-const getEntryBasicBlock() as only its const version is currently used, but both versions of getExitBasicBlock() are used.
191	OK, sure; will do so using an Optional<Instance>.
210	Ah, we don't :-).
219	Some subsequent code dereferences getFirstInsertionPt(), in LV or in code called by LV, w/o checking if it might be the end of a block.
lib/Transforms/Vectorize/VPlan.h
115	ok, sure.
117	Yes, UID is used for printing only, including the Name. Thinking of having the Planner keep track of this ordinal.
174	(Right)
317	Not sure what the type safety concern is?
462	It is conceivable for code motion to completely empty a Region from all its blocks.
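For reference on the const/non-const question raised for VPlan.cpp line 43: the common idiom for sharing one implementation does involve a cast, as the reply suspects. The sketch below uses hypothetical `Node` and `Region` types, not the patch's block classes, and shows the non-const accessor forwarding to the const one.

```cpp
#include <utility>
#include <vector>

// Hypothetical block type, for illustration only.
struct Node {
  int ID;
};

class Region {
  std::vector<Node> Nodes;

public:
  explicit Region(std::vector<Node> N) : Nodes(std::move(N)) {}

  // The const version holds the real logic.
  const Node *getExitNode() const {
    return Nodes.empty() ? nullptr : &Nodes.back();
  }

  // The non-const version forwards to the const one and casts the result
  // back. This is safe because *this is known to be non-const here.
  Node *getExitNode() {
    return const_cast<Node *>(
        static_cast<const Region &>(*this).getExitNode());
  }
};

// Writes through the non-const overload, reads through the const one.
int demoExitID() {
  std::vector<Node> Init = {{1}, {2}, {3}};
  Region R(Init);
  R.getExitNode()->ID = 7; // non-const overload
  const Region &CR = R;
  return CR.getExitNode()->ID; // const overload
}
```

Whether the cast is preferable to two short duplicated bodies is a style call; the thread settles on leaving both versions as they are.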

(Addressing review comments; To be completed)

Done. Will upload updated version shortly.

lib/Transforms/Vectorize/VPlan.cpp
43	Leaving them as is.
lib/Transforms/Vectorize/VPlan.h
117	Assigning the IDs on the fly in the printer, instead.
187	It needs to be public. A method (classof) of a derived class cannot invoke a protected method on an object of the base class (not "this").
223	Added the following definition to clarify the terminology: /// An Ancestor of a block B is any block containing B, including B itself. So it's either an "enclosing region" or the block itself, which in turn may be a region or a basic-block.
390	Yes we can. (May sound old-fashioned ;-)
394	This odd looking thing is part of the ilist_node_with_parent contract, used by getPrevNode() and getNextNode().
497	Initially wanted to iterate over the VFs, but that's not needed now. Changed to SmallSet, and also replaced getVFs() with hasVF(unsigned).
520	Possibly, but not now, removed.
522	We can do w/o both versions, by only checking hasVF(unsigned).
532	Yes, this should be static. It's offloading the last part of VPlan::execute(), so good to keep adjacent.
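The access rule cited for line 187 can be shown in a few lines. In C++, a member function of a derived class may use a protected base member only through an object of the derived type (or a type derived from it), so a static `classof` that receives a base-class pointer needs the ID getter to be public. The class names below are hypothetical miniatures, not the patch's classes.

```cpp
// Hypothetical miniature of an LLVM-style hierarchy with classof().
class BlockBase {
  const unsigned char SubclassID;

public:
  enum Kind { PlainKind, RegionKind };

  // Must be public: the derived classof() below calls it on a BlockBase
  // object, and protected members are reachable only through objects of the
  // derived type.
  unsigned getKind() const { return SubclassID; }

protected:
  explicit BlockBase(unsigned char ID) : SubclassID(ID) {}
};

class PlainBlock : public BlockBase {
public:
  PlainBlock() : BlockBase(PlainKind) {}
  static bool classof(const BlockBase *B) { return B->getKind() == PlainKind; }
};

class RegionBlock : public BlockBase {
public:
  RegionBlock() : BlockBase(RegionKind) {}
  static bool classof(const BlockBase *B) {
    // If getKind() were protected, this call would not compile:
    // B points to a BlockBase, not to a RegionBlock.
    return B->getKind() == RegionKind;
  }
};

// True when classof accepts a region and rejects a plain block.
bool demoClassof() {
  RegionBlock R;
  PlainBlock P;
  return RegionBlock::classof(&R) && !RegionBlock::classof(&P);
}
```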

(Addressing review comments; To be completed)

Done. Will upload updated version shortly.

Done.

mkuper added inline comments. Jun 9 2017, 6:21 PM

lib/Transforms/Vectorize/LoopVectorize.cpp
1825	Why?
2056	As above, please group public/private members together.
2158	A vector of unique_ptr, maybe? Anyway, I'm fine with this as is.
2194	This seems like a somewhat odd API. Why do we only support cutting from the top, and not from the bottom or the middle? Do we expect to only ever prune from the top? Otherwise, I'd expect the range to be represented as a set, rather than an interval, and use a filter over that set. The current state may be fine for simplicity's sake, but i'd like to understand this better. Regardless, please rename the method. It's really surprising that a bool test...() modifies one of its arguments.
4522	Why the split between how we handle int and ptr inductions? To minimize the patch?
4566–4567	Perhaps this should now be named widenInstruction?
7486	I suppose this (and the rest of this code) is constructed this way because we intend to support more than one VPlan per VF/UF pair soon?
7603	Maybe break all of the recipe stuff out into a separate file? Two files, even? (header/implementation)
8095	Relevent -> Relevant
8103	Please add a comment somewhere explaining that the order in which we try the recipes matters. (e.g. tryToInterleave needs to come first.) I'm not sure how we should be handling recipe priority in general, if at all, but that's not for this review.
8135	Shouldn't we be resetting LastWidenRecipe somewhere, if we encountered an instruction that uses a different recipe?
lib/Transforms/Vectorize/VPlan.h
223	I would strongly prefer less confusing terminology (I think ancestor very strongly evokes "direct or indirect predecessor"), but if I'm the only one getting confused by this, feel free to keep it.
test/Transforms/LoopVectorize/if-pred-non-void.ll
213–215	The test changes look benign, but I'm curious about why we have them.

Ayal marked 2 inline comments as done. Jun 12 2017, 2:35 AM

Ayal added inline comments.

lib/Transforms/Vectorize/LoopVectorize.cpp
1825	Good catch; isProfitableToScalarize() should be called only for VF > 1, as it compares scalarizing to vectorizing-for-VF. All original callers to isProfitableToScalarize() first check if VF > 1 before calling it. This patch adds a new caller inside tryToWiden(), which also avoids calling it with VF == 1. Will add an "assert(VF > 1);" instead of this "if (VF == 1) return true;".
2158	Originally D28975 did use "shared_ptr<VPlan>". However using regular pointers to VPlan seems simpler, just need to destroy them correctly at the end.
2194	Indeed a more general implementation of building VPlans could represent arbitrary sets of VF's rather than "Ranges" or intervals of VF's, which this patch uses. VPlans themselves are not confined to represent only ranges. This implementation builds VPlans for the full {1,2,4,8,...,MaxVF} range of feasible VF's by repeatedly building a VPlan starting from a given VF up until the maximum VF possible. Each vectorization decision can potentially reduce this maximum. The two extremes we could end up with are: one VPlan for all VF's, and one VPlan for each VF. Decisions hopefully exhibit this form of continuity, but they certainly don't have to. Will see if the above should be added as a comment to buildVPlans(). Some alternative names we came up with: testAndClampVFRange(), prefixTestVFRange(), VFRangePrefixTest(), testStartAndSetEndVF(). Does any one of these look ok? Better suggestions are welcome.
4522	Yes, we're keeping ILV's current split between its widenIntOrFpInduction() and how it handles ptr inductions and other phi instructions. The former was last addressed and merged by r296145; the latter, appearing below, is simpler and remains inline. In any case, addressing this split is independent of introducing VPlan.
4566–4567	Yup.
7486	Currently each VPlan covers a range of VF's and arbitrary UF's, where every feasible VF is covered by a single VPlan. In the end a single VPlan should remain, which is the one that gets executed. As soon as possible all other VPlans are discharged, as done here.
7603	Yes, this large LoopVectorize.cpp file should be broken into multiple smaller files. ILV in its current form hinders doing so with these recipes. Follow-up patches should help facilitate this desired change.
7658	The asserts are there when execute() is called, e.g., when it calls ILV's widenIntOrFpInduction().
7870	OK, sure.
7875	Yes, this is to concatenate the VF's into the plan name, which is indeed a property of the plan. Will move from buildVPlans() to the end of buildVPlan().
8103	Sure. This ordering reflects the existing logic in LV. Note that in the future, some recipes such as Interleave Groups may be constructed as a VPlan-based transformation, instead of jointly with other recipes, relieving this specific ordering concern here.
8135	We could, as an optimization, because VPWidenRecipe's appendInstruction(Instr) currently succeeds only if Instr is its 'next' ingredient. But it's not necessary - we can simply call appendInstruction(Instr) for the LastWidenRecipe even if other instructions have gone by (and have it fail if so). This way more relaxed forms of appendInstruction() can be supported. OTOH, doing so would clutter each 'continue'.
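The range-building scheme described in the reply for line 2194 can be sketched as follows. This is an illustration under assumptions, not the patch's code: `testAndClampVFRange` borrows one of the names floated above but its body is guessed, `buildPlanRanges` is hypothetical, and a plan is reduced to the [Start, End) pair of VFs it covers. The range holds a constant `Start` and a modifiable `End` and is passed by reference, as discussed for line 2188.

```cpp
#include <functional>
#include <utility>
#include <vector>

// A half-open range [Start, End) of power-of-two VFs. Start is fixed; End
// may be clamped by any decision taken while building a plan.
struct VFRange {
  const unsigned Start;
  unsigned End;
};

// Returns the decision for Range.Start and shrinks Range.End so that every
// VF left in the range shares that decision. Assumes Range.Start >= 1.
bool testAndClampVFRange(const std::function<bool(unsigned)> &Predicate,
                         VFRange &Range) {
  bool AtStart = Predicate(Range.Start);
  for (unsigned VF = Range.Start * 2; VF < Range.End; VF *= 2)
    if (Predicate(VF) != AtStart) {
      Range.End = VF;
      break;
    }
  return AtStart;
}

// Repeatedly builds a "plan" starting at the current VF until all VFs from
// MinVF to MaxVF are covered. A real builder would take many decisions per
// plan; this sketch takes exactly one.
std::vector<std::pair<unsigned, unsigned>>
buildPlanRanges(unsigned MinVF, unsigned MaxVF,
                const std::function<bool(unsigned)> &Decision) {
  std::vector<std::pair<unsigned, unsigned>> Plans;
  for (unsigned VF = MinVF; VF <= MaxVF;) {
    VFRange SubRange = {VF, MaxVF + 1};
    testAndClampVFRange(Decision, SubRange);
    Plans.emplace_back(SubRange.Start, SubRange.End);
    VF = SubRange.End; // the next plan starts where this one was clamped
  }
  return Plans;
}
```

With a decision that flips at VF 4, for example `buildPlanRanges(1, 8, [](unsigned VF) { return VF >= 4; })`, the sketch yields two plans covering [1, 4) and [4, 9). A decision that never changes yields a single plan, and one that changes at every VF yields one plan per VF, matching the two extremes mentioned in the reply.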
lib/Transforms/Vectorize/VPlan.h
223	ok, "Ancestors" and "Predecessors" are admittedly ambiguous terms in English... will use "getEnclosingBlockWithPredecessors()" and "getEnclosingBlockWithSuccessors()" instead, after defining an Enclosing Block of block B to be any block that contains B, including itself.
test/Transforms/LoopVectorize/if-pred-non-void.ll
213–215	These test changes are due to the order in which predicated-and-scalarized basic-blocks are created and filled. In current trunk, when the “udiv” is generated, the “extract” feeding its numerator is also generated and placed before it, courtesy of getScalarValue(). Later the “udiv” is placed in a basic-block of its own w/o its operands. Finally sinkScalarOperands() kicks in, sinking both its operands to appear next to it. In this patch, when the “udiv” is generated, the “extract” feeding its numerator is also generated and placed before it, courtesy of getOrCreateScalar(). But having already created a basic-block for the “udiv”, this “extract” is placed there as well, complying with LV's approach of placing extracts next to their uses. When finally sinkScalarOperands() kicks in, it has only the other operand to sink (the denominator). This discrepancy in the sinking order swaps the final placement of the two operands.

Updated following review comments.

mssimpso added inline comments. Jun 15 2017, 8:57 AM

lib/Transforms/Vectorize/LoopVectorize.cpp
619–657	The interface to the vector and scalar instruction maps seems to be getting more complex and confusing with this patch. Do we actually need all of these additional functions? How do they interact with the existing ones (init* and get*)? Are the existing ones still necessary? I think eventually we want to move this interface out of ILV and into a standalone storage class. But it looks like one thing that changes with this patch is that the maps can be "incomplete" at a given instance (i.e., you call initVector with an Entry that has been initialized with null pointers, then later I assume use the new setters to complete the mapping). Currently, I think we only add a mapping to the storage once all the entry values are completely defined. Is there a reason this needs to change (it's fine if it does, I'm just trying to understand)? An advantage of the current approach would be that if a key exists in the map, we know that all of the entry values should be non-null (I think we've even caught some bugs this way). getVectorValue was made to return a constant reference to enforce the idea that the entry values shouldn't be changed once added to the map (although getVector still had to exist for handling reductions, so this never really fit well, admittedly). Sorry for the rambling comment - I think we need to better document the interface and remove the functions that are no longer necessary.

Ayal added inline comments. Jun 18 2017, 8:46 AM

lib/Transforms/Vectorize/LoopVectorize.cpp
619–657	Yes, the existing interfaces are still needed. This patch introduces the ability for scalarizeInstruction() to generate a single replica when needed for scalarized and predicated instructions, alongside the existing ability of generating all replicas for other scalarized instructions.

Yes, the newly added getOrCreateScalar() supports setting one replica at a time, and as such renders this mapping "incomplete" until the last replica is generated. In the case of generating scalarized and predicated instructions with VPlan, each scalar replica is generated separately as part of replicating the entire region, under the (existing) assumption that all uses of one replica are generated before generating the next replica.

An alternative interface which may be simpler and clearer is to only support setting and getting a single Value per part, or per part-and-lane. Setting a value can assert that no value has already been set; this assert can be overridden when needed using a specialized "resetting" method. This keeps the implementation of MapStorage internal to ValueMap, albeit potentially being less efficient as it avoids batching (if not inlined). If so, it may be better done in a separate patch. Sounds reasonable?

Current interface:
ILV:
const VectorParts &getVectorValue(Value *V)
Value *getScalarValue(Value *V, unsigned Part, unsigned Lane)
ValueMap:
bool hasVector(Value *Key)
bool hasScalar(Value *Key)
const VectorParts &initVector(Value *Key, const VectorParts &Entry)
const ScalarParts &initScalar(Value *Key, const ScalarParts &Entry)
VectorParts &getVector(Value *Key)

Alternative interface:
ILV:
Value *getScalarValue(Value *V, const VPIteration &I) // Get scalarized or extract vectorized.
Value *getVectorValue(Value *V, unsigned Part) // Get vectorized or pack/broadcast scalarized/uniform.
ValueMap:
bool hasScalarValue(Value *V, const VPIteration &I)
bool hasVectorValue(Value *V, unsigned Part)
void setScalarValue(Value *V, const VPIteration &I, Value *Scalar) // Asserts value not already set.
void setVectorValue(Value *V, unsigned Part, Value *Vector) // Asserts value not already set.
void resetScalarValue(Value *V, const VPIteration &I, Value *Scalar) // Can assert value already set.
void resetVectorValue(Value *V, unsigned Part, Value *Vector) // Can assert value already set.

Agreed, this mapping should eventually move out of ILV and into a standalone storage class, similar to the VPBB2IRBB mapping in TransformState.CFGState. We're trying to break down these and other changes into separate patches.

Sorry for the rambling comment - I think we need to better document the interface and remove the functions that are no longer necessary.

Seconded ;-)
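To make the proposed "set once, reset explicitly" behaviour concrete, here is a minimal sketch of the vector half of such a map. It is an assumption-laden illustration, not the interface that was later uploaded: `ValueMapSketch` is a hypothetical class, `ValueRef` stands in for `Value *`, and only per-part vector values are modelled.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Stand-in for llvm::Value *; any pointer-like key would do for the sketch.
using ValueRef = const std::string *;

// Hypothetical map from an original value to one vector value per part.
class ValueMapSketch {
  unsigned UF; // unroll factor, i.e. number of parts per value
  std::map<ValueRef, std::vector<ValueRef>> VectorMap;

public:
  explicit ValueMapSketch(unsigned UF) : UF(UF) {}

  bool hasVectorValue(ValueRef Key, unsigned Part) const {
    assert(Part < UF && "Queried part is out of range");
    auto It = VectorMap.find(Key);
    return It != VectorMap.end() && It->second[Part] != nullptr;
  }

  ValueRef getVectorValue(ValueRef Key, unsigned Part) const {
    assert(hasVectorValue(Key, Part) && "Value not set");
    return VectorMap.at(Key)[Part];
  }

  // Sets one part; asserts that the part has not been set before.
  void setVectorValue(ValueRef Key, unsigned Part, ValueRef Vector) {
    assert(!hasVectorValue(Key, Part) && "Value already set");
    auto &Parts = VectorMap[Key];
    if (Parts.empty())
      Parts.resize(UF, nullptr);
    Parts[Part] = Vector;
  }

  // Overwrites one part; asserts that the part has been set before.
  void resetVectorValue(ValueRef Key, unsigned Part, ValueRef Vector) {
    assert(hasVectorValue(Key, Part) && "Value not set");
    VectorMap[Key][Part] = Vector;
  }
};

// Sets part 0 only, checks that part 1 is still missing, then resets part 0.
bool demoValueMap() {
  static const std::string Orig = "%a", Part0 = "%a.vec0", New0 = "%a.vec0.new";
  ValueMapSketch Map(2);
  Map.setVectorValue(&Orig, 0, &Part0);
  bool Incomplete = !Map.hasVectorValue(&Orig, 1); // part 1 not generated yet
  Map.resetVectorValue(&Orig, 0, &New0);
  return Incomplete && Map.getVectorValue(&Orig, 0) == &New0;
}
```

Because queries are per part, a partially filled entry is an ordinary state rather than a violated invariant, which is the property the one-replica-at-a-time generation relies on.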

Ayal mentioned this in D34473: [LV] Changing the interface of ValueMap. Jun 21 2017, 1:00 PM

An alternative interface which may be simpler and clearer ... it may be better done in a separate patch.

The alternative interface was uploaded as a separate patch D34473. It's not strictly required by this patch of VPlan, but can be committed separately if desired.

Ayal mentioned this in D34760: [LV] Fix PR33613 - retain order of insertelements per part. Jun 28 2017, 9:04 AM

Patch updated to llvm trunk, adapted to the new ValueMap interface of D34473. ValueMap is extracted to a standalone struct VectorizerValueMap.

Ping

Hi Ayal,

All my concerns were addressed. @mssimpso @mkuper are you also happy with the patch?

This is a big intrusive change and we'll probably have a few fine tune patches here and there, but overall, I think this is the right direction to go.

cheers,
--renato

Ayal mentioned this in D35725: [LV] Avoid redundant operations manipulating masks. Jul 21 2017, 2:25 PM

Hi Ayal,

Very sorry for the long delay. I don't have any more comments for this patch. I agree with Renato that although it's large, we should begin iterating on it in tree (especially since we've already branched for the release at this point). But please let Michael have another look at it before committing.

Thanks!

I think the revision summary doesn't fully represent what's going on here at this point. :-)

In any case, I agree with Matt and Renato - this looks good to go, we can continue iterating on it in-tree.

lib/Transforms/Vectorize/LoopVectorize.cpp
2194	Ok, this still sounds odd, but we can iterate over this in-tree in the future. Re naming, testAndClampVFRange() sounds better, I think, but I'm good with anything that indicates this changes the range.
7603	It seems like putting all the recipes in a separate file would be easy to start with (instead of going into an anonymous namespace here.) If it isn't, I'm ok with doing this in follow-ups.

This revision is now accepted and ready to land. Aug 8 2017, 1:45 AM

Uploading the version updated to top of trunk before committing, including merging with SinkAfter patch D33058 by reordering ingredients before constructing recipes for them.

Also changed InnerLoopVectorizer to be under llvm rather than anonymous namespace, because building with gcc 4.8.5 produced the following warning:

[1/3] Building CXX object lib/Transfor.../LLVMVectorize.dir/LoopVectorize.cpp.o
In file included from llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:50:0:
llvm/lib/Transforms/Vectorize/VPlan.h:194:8: warning: 'llvm::VPTransformState' has a field 'llvm::VPTransformState::ILV' whose type uses the anonymous namespace [enabled by default]
struct VPTransformState {

Caught late as building with clang 4.0 did not produce this warning, or any other.

Closed by commit rL311077: [LV] Using VPlan to model the vectorized code and drive its transformation (authored by ayalz). Aug 17 2017, 2:31 AM

This revision was automatically updated to reflect the committed changes.

Previous upload missed newly added VPlan.h and VPlan.cpp, including them here. This is the version that was committed.

Fix PR34248: pack a predicated scalar into a vector only when vectorizing; avoid doing so when only unrolling. Add a test derived from the reproducer of PR34248.

Diffusion mentioned this in rL311849: [LV] Fix PR34248 - recommit D32871 after revert r311304. Aug 27 2017, 5:57 AM

Hi, it seems this patch causes a functional regression. I'm adding comments to https://reviews.llvm.org/rL311849.

I would appreciate if you could help me with some hints to narrow down the root cause.

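Revision Contents

Path	Size
lib/Transforms/Vectorize/CMakeLists.txt	1 line
lib/Transforms/Vectorize/LoopVectorize.cpp	1587 lines
lib/Transforms/Vectorize/VPlan.h	789 lines
lib/Transforms/Vectorize/VPlan.cpp	401 lines
test/Transforms/LoopVectorize/AArch64/aarch64-predication.ll	12 lines
test/Transforms/LoopVectorize/AArch64/predication_costs.ll	22 lines
test/Transforms/LoopVectorize/SystemZ/load-store-scalarization-cost.ll	6 lines
test/Transforms/LoopVectorize/first-order-recurrence.ll	7 lines
test/Transforms/LoopVectorize/if-pred-non-void.ll	36 lines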

Diff 112467

lib/Transforms/Vectorize/CMakeLists.txt

	add_llvm_library(LLVMVectorize			add_llvm_library(LLVMVectorize
	LoadStoreVectorizer.cpp			LoadStoreVectorizer.cpp
	LoopVectorize.cpp			LoopVectorize.cpp
	SLPVectorizer.cpp			SLPVectorizer.cpp
	Vectorize.cpp			Vectorize.cpp
				VPlan.cpp

	ADDITIONAL_HEADER_DIRS			ADDITIONAL_HEADER_DIRS
	${LLVM_MAIN_INCLUDE_DIR}/llvm/Transforms			${LLVM_MAIN_INCLUDE_DIR}/llvm/Transforms

	DEPENDS			DEPENDS
	intrinsics_gen			intrinsics_gen
	)			)

lib/Transforms/Vectorize/LoopVectorize.cpp

Show First 20 Lines • Show All 41 Lines • ▼ Show 20 Lines
// A. Zaks and D. Nuzman. Autovectorization in GCC-two years later.		// A. Zaks and D. Nuzman. Autovectorization in GCC-two years later.
//		//
// S. Maleki, Y. Gao, M. Garzaran, T. Wong and D. Padua. An Evaluation of		// S. Maleki, Y. Gao, M. Garzaran, T. Wong and D. Padua. An Evaluation of
// Vectorizing Compilers.		// Vectorizing Compilers.
//		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

#include "llvm/Transforms/Vectorize/LoopVectorize.h"		#include "llvm/Transforms/Vectorize/LoopVectorize.h"
		#include "VPlan.h"
#include "llvm/ADT/DenseMap.h"		#include "llvm/ADT/DenseMap.h"
#include "llvm/ADT/Hashing.h"		#include "llvm/ADT/Hashing.h"
#include "llvm/ADT/MapVector.h"		#include "llvm/ADT/MapVector.h"
#include "llvm/ADT/Optional.h"		#include "llvm/ADT/Optional.h"
#include "llvm/ADT/SCCIterator.h"		#include "llvm/ADT/SCCIterator.h"
#include "llvm/ADT/SetVector.h"		#include "llvm/ADT/SetVector.h"
#include "llvm/ADT/SmallPtrSet.h"		#include "llvm/ADT/SmallPtrSet.h"
#include "llvm/ADT/SmallSet.h"		#include "llvm/ADT/SmallSet.h"
Show All 35 Lines
#include "llvm/Transforms/Scalar.h"		#include "llvm/Transforms/Scalar.h"
#include "llvm/Transforms/Utils/BasicBlockUtils.h"		#include "llvm/Transforms/Utils/BasicBlockUtils.h"
#include "llvm/Transforms/Utils/Local.h"		#include "llvm/Transforms/Utils/Local.h"
#include "llvm/Transforms/Utils/LoopSimplify.h"		#include "llvm/Transforms/Utils/LoopSimplify.h"
#include "llvm/Transforms/Utils/LoopUtils.h"		#include "llvm/Transforms/Utils/LoopUtils.h"
#include "llvm/Transforms/Utils/LoopVersioning.h"		#include "llvm/Transforms/Utils/LoopVersioning.h"
#include "llvm/Transforms/Vectorize.h"		#include "llvm/Transforms/Vectorize.h"
#include <algorithm>		#include <algorithm>
		#include <functional>
#include <map>		#include <map>
#include <tuple>		#include <tuple>

using namespace llvm;		using namespace llvm;
using namespace llvm::PatternMatch;		using namespace llvm::PatternMatch;

#define LV_NAME "loop-vectorize"		#define LV_NAME "loop-vectorize"
#define DEBUG_TYPE LV_NAME		#define DEBUG_TYPE LV_NAME
▲ Show 20 Lines • Show All 136 Lines • ▼ Show 20 Lines

namespace {		namespace {

// Forward declarations.		// Forward declarations.
class LoopVectorizeHints;		class LoopVectorizeHints;
class LoopVectorizationLegality;		class LoopVectorizationLegality;
class LoopVectorizationCostModel;		class LoopVectorizationCostModel;
class LoopVectorizationRequirements;		class LoopVectorizationRequirements;
		class VPInterleaveRecipe;
		class VPReplicateRecipe;
		class VPWidenIntOrFpInductionRecipe;
		class VPWidenRecipe;

/// Returns true if the given loop body has a cycle, excluding the loop		/// Returns true if the given loop body has a cycle, excluding the loop
/// itself.		/// itself.
static bool hasCyclesInLoopBody(const Loop &L) {		static bool hasCyclesInLoopBody(const Loop &L) {
if (!L.empty())		if (!L.empty())
return true;		return true;

for (const auto &SCC :		for (const auto &SCC :
▲ Show 20 Lines • Show All 96 Lines • ▼ Show 20 Lines

/// A helper function that returns an integer or floating-point constant with		/// A helper function that returns an integer or floating-point constant with
/// value C.		/// value C.
static Constant *getSignedIntOrFpConstant(Type *Ty, int64_t C) {		static Constant *getSignedIntOrFpConstant(Type *Ty, int64_t C) {
return Ty->isIntegerTy() ? ConstantInt::getSigned(Ty, C)		return Ty->isIntegerTy() ? ConstantInt::getSigned(Ty, C)
: ConstantFP::get(Ty, C);		: ConstantFP::get(Ty, C);
}		}

		} // end anonymous namespace

		namespace llvm {
/// InnerLoopVectorizer vectorizes loops which contain only one basic		/// InnerLoopVectorizer vectorizes loops which contain only one basic
/// block to a specified vectorization factor (VF).		/// block to a specified vectorization factor (VF).
/// This class performs the widening of scalars into vectors, or multiple		/// This class performs the widening of scalars into vectors, or multiple
/// scalars. This class also implements the following features:		/// scalars. This class also implements the following features:
/// * It inserts an epilogue loop for handling loops that don't have iteration		/// * It inserts an epilogue loop for handling loops that don't have iteration
/// counts that are known to be a multiple of the vectorization factor.		/// counts that are known to be a multiple of the vectorization factor.
/// * It handles the code generation for reduction variables.		/// * It handles the code generation for reduction variables.
/// * Scalarization (implementation using scalars) of un-vectorizable		/// * Scalarization (implementation using scalars) of un-vectorizable
Show All 15 Lines	InnerLoopVectorizer(Loop *OrigLoop, PredicatedScalarEvolution &PSE,
: OrigLoop(OrigLoop), PSE(PSE), LI(LI), DT(DT), TLI(TLI), TTI(TTI),		: OrigLoop(OrigLoop), PSE(PSE), LI(LI), DT(DT), TLI(TLI), TTI(TTI),
AC(AC), ORE(ORE), VF(VecWidth), UF(UnrollFactor),		AC(AC), ORE(ORE), VF(VecWidth), UF(UnrollFactor),
Builder(PSE.getSE()->getContext()), Induction(nullptr),		Builder(PSE.getSE()->getContext()), Induction(nullptr),
OldInduction(nullptr), VectorLoopValueMap(UnrollFactor, VecWidth),		OldInduction(nullptr), VectorLoopValueMap(UnrollFactor, VecWidth),
TripCount(nullptr), VectorTripCount(nullptr), Legal(LVL), Cost(CM),		TripCount(nullptr), VectorTripCount(nullptr), Legal(LVL), Cost(CM),
AddedSafetyChecks(false) {}		AddedSafetyChecks(false) {}

/// Create a new empty loop. Unlink the old loop and connect the new one.		/// Create a new empty loop. Unlink the old loop and connect the new one.
void createVectorizedLoopSkeleton();		/// Return the pre-header block of the new loop.
		BasicBlock *createVectorizedLoopSkeleton();

/// Vectorize a single instruction within the innermost loop.		/// Widen a single instruction within the innermost loop.
void vectorizeInstruction(Instruction &I);		void widenInstruction(Instruction &I);

/// Fix the vectorized code, taking care of header phi's, live-outs, and more.		/// Fix the vectorized code, taking care of header phi's, live-outs, and more.
void fixVectorizedLoop();		void fixVectorizedLoop();

// Return true if any runtime check is added.		// Return true if any runtime check is added.
bool areSafetyChecksAdded() { return AddedSafetyChecks; }		bool areSafetyChecksAdded() { return AddedSafetyChecks; }

virtual ~InnerLoopVectorizer() {}		virtual ~InnerLoopVectorizer() {}

protected:
/// A small list of PHINodes.
typedef SmallVector<PHINode *, 4> PhiVector;

/// A type for vectorized values in the new loop. Each value from the		/// A type for vectorized values in the new loop. Each value from the
/// original loop, when vectorized, is represented by UF vector values in the		/// original loop, when vectorized, is represented by UF vector values in the
/// new unrolled loop, where UF is the unroll factor.		/// new unrolled loop, where UF is the unroll factor.
typedef SmallVector<Value *, 2> VectorParts;		typedef SmallVector<Value *, 2> VectorParts;

		/// A helper function that computes the predicate of the block BB, assuming
		/// that the header block of the loop is set to True. It returns the entry
		/// mask for the block BB.
		VectorParts createBlockInMask(BasicBlock *BB);

		/// Vectorize a single PHINode in a block. This method handles the induction
		/// variable canonicalization. It supports both VF = 1 for unrolled loops and
		/// arbitrary length vectors.
		void widenPHIInstruction(Instruction *PN, unsigned UF, unsigned VF);

		/// A helper function to scalarize a single Instruction in the innermost loop.
		/// Generates a sequence of scalar instances for each lane between \p MinLane
		/// and \p MaxLane, times each part between \p MinPart and \p MaxPart,
		/// inclusive.
		void scalarizeInstruction(Instruction *Instr, const VPIteration &Instance,
		bool IfPredicateInstr);

		/// Widen an integer or floating-point induction variable \p IV. If \p Trunc
		/// is provided, the integer induction variable will first be truncated to
		/// the corresponding type.
		void widenIntOrFpInduction(PHINode *IV, TruncInst *Trunc = nullptr);

		/// getOrCreateVectorValue and getOrCreateScalarValue coordinate to generate a
		/// vector or scalar value on-demand if one is not yet available. When
		/// vectorizing a loop, we visit the definition of an instruction before its
		/// uses. When visiting the definition, we either vectorize or scalarize the
		/// instruction, creating an entry for it in the corresponding map. (In some
		/// cases, such as induction variables, we will create both vector and scalar
		/// entries.) Then, as we encounter uses of the definition, we derive values
		/// for each scalar or vector use unless such a value is already available.
		/// For example, if we scalarize a definition and one of its uses is vector,
		/// we build the required vector on-demand with an insertelement sequence
		/// when visiting the use. Otherwise, if the use is scalar, we can use the
		/// existing scalar definition.
		///
		/// Return a value in the new loop corresponding to \p V from the original
		/// loop at unroll index \p Part. If the value has already been vectorized,
		/// the corresponding vector entry in VectorLoopValueMap is returned. If,
		/// however, the value has a scalar entry in VectorLoopValueMap, we construct
		/// a new vector value on-demand by inserting the scalar values into a vector
		/// with an insertelement sequence. If the value has been neither vectorized
		/// nor scalarized, it must be loop invariant, so we simply broadcast the
		/// value into a vector.
		Value *getOrCreateVectorValue(Value *V, unsigned Part);

		/// Return a value in the new loop corresponding to \p V from the original
		/// loop at unroll and vector indices \p Instance. If the value has been
		/// vectorized but not scalarized, the necessary extractelement instruction
		/// will be generated.
		Value *getOrCreateScalarValue(Value *V, const VPIteration &Instance);

		/// Construct the vector value of a scalarized value \p V one lane at a time.
		void packScalarIntoVectorValue(Value *V, const VPIteration &Instance);

		/// Try to vectorize the interleaved access group that \p Instr belongs to.
		void vectorizeInterleaveGroup(Instruction *Instr);

		protected:
		/// A small list of PHINodes.
		typedef SmallVector<PHINode *, 4> PhiVector;

/// A type for scalarized values in the new loop. Each value from the		/// A type for scalarized values in the new loop. Each value from the
/// original loop, when scalarized, is represented by UF x VF scalar values		/// original loop, when scalarized, is represented by UF x VF scalar values
/// in the new unrolled loop, where UF is the unroll factor and VF is the		/// in the new unrolled loop, where UF is the unroll factor and VF is the
/// vectorization factor.		/// vectorization factor.
typedef SmallVector<SmallVector<Value *, 4>, 2> ScalarParts;		typedef SmallVector<SmallVector<Value *, 4>, 2> ScalarParts;

// When we if-convert we need to create edge masks. We have to cache values		// When we if-convert we need to create edge masks. We have to cache values
// so that we don't end up with exponential recursion/IR.		// so that we don't end up with exponential recursion/IR.
[... 26 lines not shown ...]	protected:
/// that were defined inside the loop and we should have one value for		/// that were defined inside the loop and we should have one value for
/// each predecessor of its parent basic block. See PR14725.		/// each predecessor of its parent basic block. See PR14725.
void fixLCSSAPHIs();		void fixLCSSAPHIs();

/// Iteratively sink the scalarized operands of a predicated instruction into		/// Iteratively sink the scalarized operands of a predicated instruction into
/// the block that was created for it.		/// the block that was created for it.
void sinkScalarOperands(Instruction *PredInst);		void sinkScalarOperands(Instruction *PredInst);

/// Predicate conditional instructions that require predication on their
/// respective conditions.
void predicateInstructions();

/// Shrinks vector element sizes to the smallest bitwidth they can be legally		/// Shrinks vector element sizes to the smallest bitwidth they can be legally
/// represented as.		/// represented as.
void truncateToMinimalBitwidths();		void truncateToMinimalBitwidths();

/// A helper function that computes the predicate of the block BB, assuming
/// that the header block of the loop is set to True. It returns the entry
/// mask for the block BB.
VectorParts createBlockInMask(BasicBlock *BB);
/// A helper function that computes the predicate of the edge between SRC		/// A helper function that computes the predicate of the edge between SRC
		rengolin: The public protected swap is confusing. I'd create a "protected bubble" and move the methods inside it.
		Ayal (author): Sorry for this confusion. The idea was to avoid moving methods around to simplify the diff:
		> Protected methods made public to enable their reuse are annotated as such in-place to simplify the diff. They will eventually be moved and grouped together.
		An alternative is to use "friend" instead.
		rengolin: Grouping should be fine.
		Ayal (author): Done. Also desired for "public"-only changes to CostModel methods?
/// and DST.		/// and DST.
VectorParts createEdgeMask(BasicBlock *Src, BasicBlock *Dst);		VectorParts createEdgeMask(BasicBlock *Src, BasicBlock *Dst);

/// Vectorize a single PHINode in a block. This method handles the induction
/// variable canonicalization. It supports both VF = 1 for unrolled loops and
/// arbitrary length vectors.
void widenPHIInstruction(Instruction *PN, unsigned UF, unsigned VF);

/// Insert the new loop to the loop hierarchy and pass manager		/// Insert the new loop to the loop hierarchy and pass manager
/// and update the analysis passes.		/// and update the analysis passes.
void updateAnalysis();		void updateAnalysis();

/// This instruction is un-vectorizable. Implement it as a sequence
/// of scalars. If \p IfPredicateInstr is true we need to 'hide' each
/// scalarized instruction behind an if block predicated on the control
/// dependence of the instruction.
void scalarizeInstruction(Instruction *Instr, bool IfPredicateInstr = false);

/// Vectorize Load and Store instructions.		/// Vectorize Load and Store instructions.
virtual void vectorizeMemoryInstruction(Instruction *Instr);		virtual void vectorizeMemoryInstruction(Instruction *Instr);

/// Create a broadcast instruction. This method generates a broadcast		/// Create a broadcast instruction. This method generates a broadcast
/// instruction (shuffle) for loop invariant values and for the induction		/// instruction (shuffle) for loop invariant values and for the induction
		rengolin: Not virtual anymore? Should it be marked "final"?
		Ayal (author): This should have been part of r297580 originally. In general ILV's methods are marked virtual iff they're overridden by Unroller; otherwise they may be marked "final". This is unrelated to VPlan. Sorry for the confusion.
		rengolin: Right, a separate fix is better in this case, then.
		Ayal (author): Dropping 'virtual' from scalarizeInstruction() was committed separately.
/// value. If this is the induction variable then we extend it to N, N+1, ...		/// value. If this is the induction variable then we extend it to N, N+1, ...
/// this is needed because each iteration in the loop corresponds to a SIMD		/// this is needed because each iteration in the loop corresponds to a SIMD
/// element.		/// element.
virtual Value *getBroadcastInstrs(Value *V);		virtual Value *getBroadcastInstrs(Value *V);

/// This function adds (StartIdx, StartIdx + Step, StartIdx + 2*Step, ...)		/// This function adds (StartIdx, StartIdx + Step, StartIdx + 2*Step, ...)
/// to each vector element of Val. The sequence starts at StartIndex.		/// to each vector element of Val. The sequence starts at StartIndex.
/// \p Opcode is relevant for FP induction variable.		/// \p Opcode is relevant for FP induction variable.
[... 12 lines not shown ...]	protected:
/// Create a vector induction phi node based on an existing scalar one. \p		/// Create a vector induction phi node based on an existing scalar one. \p
/// EntryVal is the value from the original loop that maps to the vector phi		/// EntryVal is the value from the original loop that maps to the vector phi
/// node, and \p Step is the loop-invariant step. If \p EntryVal is a		/// node, and \p Step is the loop-invariant step. If \p EntryVal is a
/// truncate instruction, instead of widening the original IV, we widen a		/// truncate instruction, instead of widening the original IV, we widen a
/// version of the IV truncated to \p EntryVal's type.		/// version of the IV truncated to \p EntryVal's type.
void createVectorIntOrFpInductionPHI(const InductionDescriptor &II,		void createVectorIntOrFpInductionPHI(const InductionDescriptor &II,
Value *Step, Instruction *EntryVal);		Value *Step, Instruction *EntryVal);

/// Widen an integer or floating-point induction variable \p IV. If \p Trunc
/// is provided, the integer induction variable will first be truncated to
/// the corresponding type.
void widenIntOrFpInduction(PHINode *IV, TruncInst *Trunc = nullptr);

/// Returns true if an instruction \p I should be scalarized instead of		/// Returns true if an instruction \p I should be scalarized instead of
/// vectorized for the chosen vectorization factor.		/// vectorized for the chosen vectorization factor.
bool shouldScalarizeInstruction(Instruction *I) const;		bool shouldScalarizeInstruction(Instruction *I) const;

/// Returns true if we should generate a scalar version of \p IV.		/// Returns true if we should generate a scalar version of \p IV.
bool needsScalarInduction(Instruction *IV) const;		bool needsScalarInduction(Instruction *IV) const;

/// getOrCreateVectorValue and getOrCreateScalarValue coordinate to generate a
/// vector or scalar value on-demand if one is not yet available. When
/// vectorizing a loop, we visit the definition of an instruction before its
/// uses. When visiting the definition, we either vectorize or scalarize the
/// instruction, creating an entry for it in the corresponding map. (In some
/// cases, such as induction variables, we will create both vector and scalar
/// entries.) Then, as we encounter uses of the definition, we derive values
/// for each scalar or vector use unless such a value is already available.
/// For example, if we scalarize a definition and one of its uses is vector,
/// we build the required vector on-demand with an insertelement sequence
/// when visiting the use. Otherwise, if the use is scalar, we can use the
/// existing scalar definition.
///
/// Return a value in the new loop corresponding to \p V from the original
/// loop at unroll index \p Part. If the value has already been vectorized,
/// the corresponding vector entry in VectorLoopValueMap is returned. If,
/// however, the value has a scalar entry in VectorLoopValueMap, we construct
/// a new vector value on-demand by inserting the scalar values into a vector
/// with an insertelement sequence. If the value has been neither vectorized
/// nor scalarized, it must be loop invariant, so we simply broadcast the
/// value into a vector.
Value *getOrCreateVectorValue(Value *V, unsigned Part);

/// Return a value in the new loop corresponding to \p V from the original
/// loop at unroll index \p Part and vector index \p Lane. If the value has
/// been vectorized but not scalarized, the necessary extractelement
/// instruction will be generated.
Value *getOrCreateScalarValue(Value *V, unsigned Part, unsigned Lane);

/// Try to vectorize the interleaved access group that \p Instr belongs to.
void vectorizeInterleaveGroup(Instruction *Instr);

/// Generate a shuffle sequence that will reverse the vector Vec.		/// Generate a shuffle sequence that will reverse the vector Vec.
virtual Value *reverseVector(Value *Vec);		virtual Value *reverseVector(Value *Vec);

/// Returns (and creates if needed) the original loop trip count.		/// Returns (and creates if needed) the original loop trip count.
Value *getOrCreateTripCount(Loop *NewLoop);		Value *getOrCreateTripCount(Loop *NewLoop);

/// Returns (and creates if needed) the trip count of the widened loop.		/// Returns (and creates if needed) the trip count of the widened loop.
Value *getOrCreateVectorTripCount(Loop *NewLoop);		Value *getOrCreateVectorTripCount(Loop *NewLoop);
[... 24 lines not shown ...]	protected:
/// \brief Similar to the previous function but it adds the metadata to a		/// \brief Similar to the previous function but it adds the metadata to a
/// vector of instructions.		/// vector of instructions.
void addMetadata(ArrayRef<Value *> To, Instruction *From);		void addMetadata(ArrayRef<Value *> To, Instruction *From);

/// \brief Set the debug location in the builder using the debug location in		/// \brief Set the debug location in the builder using the debug location in
/// the instruction.		/// the instruction.
void setDebugLocFromInst(IRBuilder<> &B, const Value *Ptr);		void setDebugLocFromInst(IRBuilder<> &B, const Value *Ptr);

/// This is a helper class for maintaining vectorization state. It's used for
/// mapping values from the original loop to their corresponding values in
/// the new loop. Two mappings are maintained: one for vectorized values and
/// one for scalarized values. Vectorized values are represented with UF
/// vector values in the new loop, and scalarized values are represented with
/// UF x VF scalar values in the new loop. UF and VF are the unroll and
/// vectorization factors, respectively.
///
/// Entries can be added to either map with setVectorValue and setScalarValue,
/// which assert that an entry was not already added before. If an entry is to
/// replace an existing one, call resetVectorValue. This is currently needed
/// to modify the mapped values during "fix-up" operations that occur once the
/// first phase of widening is complete. These operations include type
/// truncation and the second phase of recurrence widening.
///
/// Entries from either map can be retrieved using the getVectorValue and
/// getScalarValue functions, which assert that the desired value exists.

struct ValueMap {

/// Construct an empty map with the given unroll and vectorization factors.
ValueMap(unsigned UF, unsigned VF) : UF(UF), VF(VF) {}

/// \return True if the map has any vector entry for \p Key.
bool hasAnyVectorValue(Value *Key) const {
return VectorMapStorage.count(Key);
}

/// \return True if the map has a vector entry for \p Key and \p Part.
bool hasVectorValue(Value *Key, unsigned Part) const {
assert(Part < UF && "Queried Vector Part is too large.");
if (!hasAnyVectorValue(Key))
return false;
const VectorParts &Entry = VectorMapStorage.find(Key)->second;
assert(Entry.size() == UF && "VectorParts has wrong dimensions.");
return Entry[Part] != nullptr;
}

/// \return True if the map has any scalar entry for \p Key.
bool hasAnyScalarValue(Value *Key) const {
return ScalarMapStorage.count(Key);
}

/// \return True if the map has a scalar entry for \p Key, \p Part and
/// \p Lane.
bool hasScalarValue(Value *Key, unsigned Part, unsigned Lane) const {
assert(Part < UF && "Queried Scalar Part is too large.");
assert(Lane < VF && "Queried Scalar Lane is too large.");
if (!hasAnyScalarValue(Key))
return false;
const ScalarParts &Entry = ScalarMapStorage.find(Key)->second;
assert(Entry.size() == UF && "ScalarParts has wrong dimensions.");
assert(Entry[Part].size() == VF && "ScalarParts has wrong dimensions.");
return Entry[Part][Lane] != nullptr;
}

/// Retrieve the existing vector value that corresponds to \p Key and
/// \p Part.
Value *getVectorValue(Value *Key, unsigned Part) {
assert(hasVectorValue(Key, Part) && "Getting non-existent value.");
return VectorMapStorage[Key][Part];
}

/// Retrieve the existing scalar value that corresponds to \p Key, \p Part
/// and \p Lane.
Value *getScalarValue(Value *Key, unsigned Part, unsigned Lane) {
assert(hasScalarValue(Key, Part, Lane) && "Getting non-existent value.");
return ScalarMapStorage[Key][Part][Lane];
}

/// Set a vector value associated with \p Key and \p Part. Assumes such a
/// value is not already set. If it is, use resetVectorValue() instead.
void setVectorValue(Value *Key, unsigned Part, Value *Vector) {
assert(!hasVectorValue(Key, Part) && "Vector value already set for part");
if (!VectorMapStorage.count(Key)) {
VectorParts Entry(UF);
VectorMapStorage[Key] = Entry;
}
VectorMapStorage[Key][Part] = Vector;
}

/// Set a scalar value associated with \p Key for \p Part and \p Lane.
/// Assumes such a value is not already set.
void setScalarValue(Value *Key, unsigned Part, unsigned Lane,
Value *Scalar) {
assert(!hasScalarValue(Key, Part, Lane) && "Scalar value already set");
if (!ScalarMapStorage.count(Key)) {
ScalarParts Entry(UF);
for (unsigned Part = 0; Part < UF; ++Part)
Entry[Part].resize(VF, nullptr);
// TODO: Consider storing uniform values only per-part, as they occupy
// lane 0 only, keeping the other VF-1 redundant entries null.
ScalarMapStorage[Key] = Entry;
}
ScalarMapStorage[Key][Part][Lane] = Scalar;
}

/// Reset the vector value associated with \p Key for the given \p Part.
/// This function can be used to update values that have already been
/// vectorized. This is the case for "fix-up" operations including type
/// truncation and the second phase of recurrence vectorization.
void resetVectorValue(Value *Key, unsigned Part, Value *Vector) {
assert(hasVectorValue(Key, Part) && "Vector value not set for part");
VectorMapStorage[Key][Part] = Vector;
}

private:
/// The unroll factor. Each entry in the vector map contains UF vector
/// values.
unsigned UF;

/// The vectorization factor. Each entry in the scalar map contains UF x VF
/// scalar values.
unsigned VF;

/// The vector and scalar map storage. We use std::map and not DenseMap
/// because insertions to DenseMap invalidate its iterators.
std::map<Value *, VectorParts> VectorMapStorage;
std::map<Value *, ScalarParts> ScalarMapStorage;
};

/// The original loop.		/// The original loop.
		rengolin: A short comment would be nice, here.
Loop *OrigLoop;		Loop *OrigLoop;
/// A wrapper around ScalarEvolution used to add runtime SCEV checks. Applies		/// A wrapper around ScalarEvolution used to add runtime SCEV checks. Applies
/// dynamic knowledge to simplify SCEV expressions and converts them to a		/// dynamic knowledge to simplify SCEV expressions and converts them to a
/// more usable form.		/// more usable form.
PredicatedScalarEvolution &PSE;		PredicatedScalarEvolution &PSE;
/// Loop Info.		/// Loop Info.
LoopInfo *LI;		LoopInfo *LI;
/// Dominator Tree.		/// Dominator Tree.
DominatorTree *DT;		DominatorTree *DT;
/// Alias Analysis.		/// Alias Analysis.
AliasAnalysis *AA;		AliasAnalysis *AA;
/// Target Library Info.		/// Target Library Info.
const TargetLibraryInfo *TLI;		const TargetLibraryInfo *TLI;
/// Target Transform Info.		/// Target Transform Info.
const TargetTransformInfo *TTI;		const TargetTransformInfo *TTI;
/// Assumption Cache.		/// Assumption Cache.
AssumptionCache *AC;		AssumptionCache *AC;
/// Interface to emit optimization remarks.		/// Interface to emit optimization remarks.
OptimizationRemarkEmitter *ORE;		OptimizationRemarkEmitter *ORE;

/// \brief LoopVersioning. It's only set up (non-null) if memchecks were		/// \brief LoopVersioning. It's only set up (non-null) if memchecks were
/// used.		/// used.
///		///
/// This is currently only used to add no-alias metadata based on the		/// This is currently only used to add no-alias metadata based on the
/// memchecks. The actual versioning is performed manually.		/// memchecks. The actual versioning is performed manually.
std::unique_ptr<LoopVersioning> LVer;		std::unique_ptr<LoopVersioning> LVer;

/// The vectorization SIMD factor to use. Each vector will have this many		/// The vectorization SIMD factor to use. Each vector will have this many
/// vector elements.		/// vector elements.
unsigned VF;		unsigned VF;

protected:
/// The vectorization unroll factor to use. Each scalar is vectorized to this		/// The vectorization unroll factor to use. Each scalar is vectorized to this
/// many different vector instructions.		/// many different vector instructions.
unsigned UF;		unsigned UF;

/// The builder that we use		/// The builder that we use
IRBuilder<> Builder;		IRBuilder<> Builder;

		mssimpso: The interface to the vector and scalar instruction maps seems to be getting more complex and confusing with this patch. Do we actually need all of these additional functions? How do they interact with the existing ones (init* and get)? Are the existing ones still necessary?
		I think eventually we want to move this interface out of ILV and into a standalone storage class. But it looks like one thing that changes with this patch is that the maps can be "incomplete" at a given instance (i.e., you call initVector with an Entry that has been initialized with null pointers, then later I assume use the new setters to complete the mapping). Currently, I think we only add a mapping to the storage once all the entry values are completely defined. Is there a reason this needs to change (it's fine if it does, I'm just trying to understand)? An advantage of the current approach would be that if a key exists in the map, we know that all of the entry values should be non-null (I think we've even caught some bugs this way). getVectorValue was made to return a constant reference to enforce the idea that the entry values shouldn't be changed once added to the map (although getVector still had to exist for handling reductions, so this never really fit well, admittedly).
		Sorry for the rambling comment - I think we need to better document the interface and remove the functions that are no longer necessary.
		Ayal (author): Yes, the existing interfaces are still needed. This patch introduces the ability for scalarizeInstruction() to generate a single replica when needed for scalarized and predicated instructions, alongside the existing ability of generating all replica's for other scalarized instructions.
		Yes, the newly added getOrCreateScalar() supports setting one replica at a time, and as such renders this mapping "incomplete" until the last replica is generated. In the case of generating scalarized and predicated instructions with VPlan, each scalar replica is generated separately as part of replicating the entire region, under the (existing) assumption that all uses of one replica are generated before generating the next replica.
		An alternative interface which may be simpler and clearer is to only support setting and getting a single Value per part, or per part-and-lane. Setting a value can assert that no value has already been set; this assert can be overridden when needed using a specialized "resetting" method. This keeps the implementation of MapStorage internal to ValueMap, albeit potentially being less efficient as it avoids batching (if not inlined). If so, it may be better done in a separate patch. Sounds reasonable?
		Current interface:
		ILV:
		const VectorParts &getVectorValue(Value *V)
		Value *getScalarValue(Value *V, unsigned Part, unsigned Lane)
		ValueMap:
		bool hasVector(Value *Key)
		bool hasScalar(Value *Key)
		const VectorParts &initVector(Value *Key, const VectorParts &Entry)
		const ScalarParts &initScalar(Value *Key, const ScalarParts &Entry)
		VectorParts &getVector(Value *Key)
		Alternative interface:
		ILV:
		Value *getScalarValue(Value *V, const VPIteration &I) // Get scalarized or extract vectorized.
		Value *getVectorValue(Value *V, unsigned Part) // Get vectorized or pack/broadcast scalarized/uniform.
		ValueMap:
		bool hasScalarValue(Value *V, const VPIteration &I)
		bool hasVectorValue(Value *V, unsigned Part)
		void setScalarValue(Value *V, const VPIteration &I, Value *Scalar) // Asserts value not already set.
		void setVectorValue(Value *V, unsigned Part, Value *Vector) // Asserts value not already set.
		void resetScalarValue(Value *V, const VPIteration &I, Value *Scalar) // Can assert value already set.
		void resetVectorValue(Value *V, unsigned Part, Value *Vector) // Can assert value already set.
		Agreed, this mapping should eventually move out of ILV and into a standalone storage class, similar to the VPBB2IRBB mapping in TransformState.CFGState. We're trying to breakdown these and other changes into separate patches.
		> Sorry for the rambling comment - I think we need to better document the interface and remove the functions that are no longer necessary.
		Seconded ;-)
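		The alternative interface proposed in the reply above (one value per part or per part-and-lane, with asserting setters and explicit "reset" methods) can be sketched as a small standalone class. This is only an illustration under stated assumptions, not code from the patch: SketchValue, SketchIteration and SketchValueMap are hypothetical stand-ins for llvm::Value, VPIteration and the vectorizer's value map, and the storage is plain std::map over std::vector.

```cpp
#include <cassert>
#include <map>
#include <vector>

// Hypothetical stand-in for llvm::Value; only its address matters here.
struct SketchValue {};

// Hypothetical stand-in for VPIteration: an unroll part and a vector lane.
struct SketchIteration {
  unsigned Part;
  unsigned Lane;
};

class SketchValueMap {
  using Parts = std::vector<SketchValue *>;

public:
  SketchValueMap(unsigned UF, unsigned VF) : UF(UF), VF(VF) {}

  bool hasVectorValue(const SketchValue *Key, unsigned Part) const {
    assert(Part < UF && "Part out of range");
    auto It = Vectors.find(Key);
    return It != Vectors.end() && It->second[Part] != nullptr;
  }

  bool hasScalarValue(const SketchValue *Key, SketchIteration I) const {
    assert(I.Part < UF && I.Lane < VF && "Instance out of range");
    auto It = Scalars.find(Key);
    return It != Scalars.end() && It->second[I.Part][I.Lane] != nullptr;
  }

  SketchValue *getVectorValue(const SketchValue *Key, unsigned Part) const {
    assert(hasVectorValue(Key, Part) && "Getting non-existent value");
    return Vectors.find(Key)->second[Part];
  }

  SketchValue *getScalarValue(const SketchValue *Key, SketchIteration I) const {
    assert(hasScalarValue(Key, I) && "Getting non-existent value");
    return Scalars.find(Key)->second[I.Part][I.Lane];
  }

  // Setters assert that the slot is still empty; the entry for a key may
  // therefore be "incomplete" until every part (or lane) has been set.
  void setVectorValue(const SketchValue *Key, unsigned Part, SketchValue *V) {
    assert(!hasVectorValue(Key, Part) && "Vector value already set");
    // emplace leaves an existing entry untouched and returns it.
    auto It = Vectors.emplace(Key, Parts(UF, nullptr)).first;
    It->second[Part] = V;
  }

  void setScalarValue(const SketchValue *Key, SketchIteration I,
                      SketchValue *V) {
    assert(!hasScalarValue(Key, I) && "Scalar value already set");
    auto It =
        Scalars.emplace(Key, std::vector<Parts>(UF, Parts(VF, nullptr))).first;
    It->second[I.Part][I.Lane] = V;
  }

  // Resetting is the explicit override for fix-up phases; it asserts that a
  // value was set before.
  void resetVectorValue(const SketchValue *Key, unsigned Part,
                        SketchValue *V) {
    assert(hasVectorValue(Key, Part) && "Vector value not set");
    Vectors.find(Key)->second[Part] = V;
  }

private:
  unsigned UF; // Unroll factor: number of parts per key.
  unsigned VF; // Vectorization factor: number of lanes per part.
  std::map<const SketchValue *, Parts> Vectors;
  std::map<const SketchValue *, std::vector<Parts>> Scalars;
};
```

		With this shape, callers never see the underlying VectorParts or ScalarParts containers, which is the encapsulation benefit mentioned in the reply; the cost is one lookup per part or lane instead of one per key.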
// --- Vectorization state ---		// --- Vectorization state ---

/// The vector-loop preheader.		/// The vector-loop preheader.
BasicBlock *LoopVectorPreHeader;		BasicBlock *LoopVectorPreHeader;
/// The scalar-loop preheader.		/// The scalar-loop preheader.
BasicBlock *LoopScalarPreHeader;		BasicBlock *LoopScalarPreHeader;
/// Middle Block between the vector and the scalar.		/// Middle Block between the vector and the scalar.
BasicBlock *LoopMiddleBlock;		BasicBlock *LoopMiddleBlock;
[... 10 lines not shown ...]	protected:
PHINode *Induction;		PHINode *Induction;
/// The induction variable of the old basic block.		/// The induction variable of the old basic block.
PHINode *OldInduction;		PHINode *OldInduction;

/// Maps values from the original loop to their corresponding values in the		/// Maps values from the original loop to their corresponding values in the
/// vectorized loop. A key value can map to either vector values, scalar		/// vectorized loop. A key value can map to either vector values, scalar
/// values or both kinds of values, depending on whether the key was		/// values or both kinds of values, depending on whether the key was
/// vectorized and scalarized.		/// vectorized and scalarized.
ValueMap VectorLoopValueMap;		VectorizerValueMap VectorLoopValueMap;

/// Store instructions that should be predicated, as a pair		/// Store instructions that were predicated.
/// <StoreInst, Predicate>		SmallVector<Instruction *, 4> PredicatedInstructions;
SmallVector<std::pair<Instruction *, Value *>, 4> PredicatedInstructions;
EdgeMaskCacheTy EdgeMaskCache;		EdgeMaskCacheTy EdgeMaskCache;
BlockMaskCacheTy BlockMaskCache;		BlockMaskCacheTy BlockMaskCache;
/// Trip count of the original loop.		/// Trip count of the original loop.
Value *TripCount;		Value *TripCount;
/// Trip count of the widened loop (TripCount - TripCount % (VF*UF))		/// Trip count of the widened loop (TripCount - TripCount % (VF*UF))
Value *VectorTripCount;		Value *VectorTripCount;
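		The formula in the comment above, TripCount - TripCount % (VF*UF), can be checked with concrete numbers. A minimal sketch, assuming a hypothetical free function vectorTripCount that is not part of the patch and that ignores any further adjustment the vectorizer may apply elsewhere:

```cpp
#include <cstdint>

// Hypothetical helper: the number of original iterations covered by the
// widened loop, i.e. the trip count rounded down to a multiple of VF * UF.
// Assumes VF and UF are both non-zero.
constexpr std::uint64_t vectorTripCount(std::uint64_t TripCount, unsigned VF,
                                        unsigned UF) {
  const std::uint64_t Step = static_cast<std::uint64_t>(VF) * UF;
  return TripCount - TripCount % Step;
}
```

		For example, with TripCount = 103, VF = 4 and UF = 2, the widened loop covers 96 iterations and the remaining 7 are left for the scalar epilogue loop.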

/// The legality analysis.		/// The legality analysis.
LoopVectorizationLegality *Legal;		LoopVectorizationLegality *Legal;

/// The profitability analysis.		/// The profitability analysis.
LoopVectorizationCostModel *Cost;		LoopVectorizationCostModel *Cost;

// Record whether runtime checks are added.		// Record whether runtime checks are added.
bool AddedSafetyChecks;		bool AddedSafetyChecks;

// Holds the end values for each induction variable. We save the end values		// Holds the end values for each induction variable. We save the end values
// so we can later fix-up the external users of the induction variables.		// so we can later fix-up the external users of the induction variables.
DenseMap<PHINode *, Value *> IVEndValues;		DenseMap<PHINode *, Value *> IVEndValues;

		friend class LoopVectorizationPlanner;
};		};

class InnerLoopUnroller : public InnerLoopVectorizer {		class InnerLoopUnroller : public InnerLoopVectorizer {
public:		public:
InnerLoopUnroller(Loop *OrigLoop, PredicatedScalarEvolution &PSE,		InnerLoopUnroller(Loop *OrigLoop, PredicatedScalarEvolution &PSE,
LoopInfo *LI, DominatorTree *DT,		LoopInfo *LI, DominatorTree *DT,
const TargetLibraryInfo *TLI,		const TargetLibraryInfo *TLI,
const TargetTransformInfo *TTI, AssumptionCache *AC,		const TargetTransformInfo *TTI, AssumptionCache *AC,
OptimizationRemarkEmitter *ORE, unsigned UnrollFactor,		OptimizationRemarkEmitter *ORE, unsigned UnrollFactor,
LoopVectorizationLegality *LVL,		LoopVectorizationLegality *LVL,
LoopVectorizationCostModel *CM)		LoopVectorizationCostModel *CM)
: InnerLoopVectorizer(OrigLoop, PSE, LI, DT, TLI, TTI, AC, ORE, 1,		: InnerLoopVectorizer(OrigLoop, PSE, LI, DT, TLI, TTI, AC, ORE, 1,
UnrollFactor, LVL, CM) {}		UnrollFactor, LVL, CM) {}

private:		private:
void vectorizeMemoryInstruction(Instruction *Instr) override;
Value *getBroadcastInstrs(Value *V) override;		Value *getBroadcastInstrs(Value *V) override;
Value *getStepVector(Value *Val, int StartIdx, Value *Step,		Value *getStepVector(Value *Val, int StartIdx, Value *Step,
Instruction::BinaryOps Opcode =		Instruction::BinaryOps Opcode =
Instruction::BinaryOpsEnd) override;		Instruction::BinaryOpsEnd) override;
Value *reverseVector(Value *Vec) override;		Value *reverseVector(Value *Vec) override;
};		};

/// \brief Look for a meaningful debug location on the instruction or it's		/// \brief Look for a meaningful debug location on the instruction or it's
[... 60 lines not shown ...]
void InnerLoopVectorizer::addMetadata(ArrayRef<Value *> To,		void InnerLoopVectorizer::addMetadata(ArrayRef<Value *> To,
Instruction *From) {		Instruction *From) {
for (Value *V : To) {		for (Value *V : To) {
if (Instruction *I = dyn_cast<Instruction>(V))		if (Instruction *I = dyn_cast<Instruction>(V))
addMetadata(I, From);		addMetadata(I, From);
}		}
}		}

		} // namespace llvm

		namespace {

/// \brief The group of interleaved loads/stores sharing the same stride and		/// \brief The group of interleaved loads/stores sharing the same stride and
/// close to each other.		/// close to each other.
///		///
/// Each member in this group has an index starting from 0, and the largest		/// Each member in this group has an index starting from 0, and the largest
/// index should be less than interleaved factor, which is equal to the absolute		/// index should be less than interleaved factor, which is equal to the absolute
/// value of the access's stride.		/// value of the access's stride.
///		///
/// E.g. An interleaved load group of factor 4:		/// E.g. An interleaved load group of factor 4:
[... 1,005 lines not shown ...]	public:
/// type.		/// type.
const MapVector<Instruction *, uint64_t> &getMinimalBitwidths() const {		const MapVector<Instruction *, uint64_t> &getMinimalBitwidths() const {
return MinBWs;		return MinBWs;
}		}

/// \returns True if it is more profitable to scalarize instruction \p I for		/// \returns True if it is more profitable to scalarize instruction \p I for
/// vectorization factor \p VF.		/// vectorization factor \p VF.
bool isProfitableToScalarize(Instruction *I, unsigned VF) const {		bool isProfitableToScalarize(Instruction *I, unsigned VF) const {
		assert(VF > 1 && "Profitable to scalarize relevant only for VF > 1.");
		mkuper: Why?
		Ayal: Good catch; isProfitableToScalarize() should be called only for VF > 1, as it compares scalarizing to vectorizing-for-VF. All original callers to isProfitableToScalarize() first check if VF > 1 before calling it. This patch adds a new caller inside tryToWiden(), which also avoids calling it with VF == 1. Will add an "assert(VF > 1);" instead of this "if (VF == 1) return true;".
auto Scalars = InstsToScalarize.find(VF);		auto Scalars = InstsToScalarize.find(VF);
assert(Scalars != InstsToScalarize.end() &&		assert(Scalars != InstsToScalarize.end() &&
"VF not yet analyzed for scalarization profitability");		"VF not yet analyzed for scalarization profitability");
return Scalars->second.count(I);		return Scalars->second.count(I);
}		}

/// Returns true if \p I is known to be uniform after vectorization.		/// Returns true if \p I is known to be uniform after vectorization.
bool isUniformAfterVectorization(Instruction *I, unsigned VF) const {		bool isUniformAfterVectorization(Instruction *I, unsigned VF) const {
[... 97 lines not shown ...]	bool isOptimizableIVTruncate(Instruction *I, unsigned VF) {
Value *Op = Trunc->getOperand(0);		Value *Op = Trunc->getOperand(0);
if (Op != Legal->getPrimaryInduction() && TTI.isTruncateFree(SrcTy, DestTy))		if (Op != Legal->getPrimaryInduction() && TTI.isTruncateFree(SrcTy, DestTy))
return false;		return false;

// If the truncated value is not an induction variable, return false.		// If the truncated value is not an induction variable, return false.
return Legal->isInductionVariable(Op);		return Legal->isInductionVariable(Op);
}		}

		/// Collects the instructions to scalarize for each predicated instruction in
		/// the loop.
		void collectInstsToScalarize(unsigned VF);

		/// Collect Uniform and Scalar values for the given \p VF.
		/// The sets depend on CM decision for Load/Store instructions
		/// that may be vectorized as interleave, gather-scatter or scalarized.
		void collectUniformsAndScalars(unsigned VF) {
		// Do the analysis once.
		if (VF == 1 || Uniforms.count(VF))
		return;
		setCostBasedWideningDecision(VF);
		collectLoopUniforms(VF);
		collectLoopScalars(VF);
		}

private:		private:
/// \return An upper bound for the vectorization factor, larger than zero.		/// \return An upper bound for the vectorization factor, larger than zero.
/// One is returned if vectorization should best be avoided due to cost.		/// One is returned if vectorization should best be avoided due to cost.
unsigned computeFeasibleMaxVF(bool OptForSize);		unsigned computeFeasibleMaxVF(bool OptForSize);

/// The vectorization cost is a combination of the cost itself and a boolean		/// The vectorization cost is a combination of the cost itself and a boolean
/// indicating whether any of the contributing operations will actually		/// indicating whether any of the contributing operations will actually
/// operate on		/// operate on
[... 85 lines not shown ...]	private:
/// Returns the expected difference in cost from scalarizing the expression		/// Returns the expected difference in cost from scalarizing the expression
/// feeding a predicated instruction \p PredInst. The instructions to		/// feeding a predicated instruction \p PredInst. The instructions to
/// scalarize and their scalar costs are collected in \p ScalarCosts. A		/// scalarize and their scalar costs are collected in \p ScalarCosts. A
/// non-negative return value implies the expression will be scalarized.		/// non-negative return value implies the expression will be scalarized.
/// Currently, only single-use chains are considered for scalarization.		/// Currently, only single-use chains are considered for scalarization.
int computePredInstDiscount(Instruction *PredInst, ScalarCostsTy &ScalarCosts,		int computePredInstDiscount(Instruction *PredInst, ScalarCostsTy &ScalarCosts,
unsigned VF);		unsigned VF);

/// Collects the instructions to scalarize for each predicated instruction in
/// the loop.
void collectInstsToScalarize(unsigned VF);

/// Collect the instructions that are uniform after vectorization. An		/// Collect the instructions that are uniform after vectorization. An
		mkuper: As above, please group public/private members together.
/// instruction is uniform if we represent it with a single scalar value in		/// instruction is uniform if we represent it with a single scalar value in
/// the vectorized loop corresponding to each vector iteration. Examples of		/// the vectorized loop corresponding to each vector iteration. Examples of
/// uniform instructions include pointer operands of consecutive or		/// uniform instructions include pointer operands of consecutive or
/// interleaved memory accesses. Note that although uniformity implies an		/// interleaved memory accesses. Note that although uniformity implies an
/// instruction will be scalar, the reverse is not true. In general, a		/// instruction will be scalar, the reverse is not true. In general, a
/// scalarized instruction will be represented by VF scalar values in the		/// scalarized instruction will be represented by VF scalar values in the
/// vectorized loop, each corresponding to an iteration of the original		/// vectorized loop, each corresponding to an iteration of the original
/// scalar loop.		/// scalar loop.
void collectLoopUniforms(unsigned VF);		void collectLoopUniforms(unsigned VF);

/// Collect the instructions that are scalar after vectorization. An		/// Collect the instructions that are scalar after vectorization. An
/// instruction is scalar if it is known to be uniform or will be scalarized		/// instruction is scalar if it is known to be uniform or will be scalarized
/// during vectorization. Non-uniform scalarized instructions will be		/// during vectorization. Non-uniform scalarized instructions will be
/// represented by VF values in the vectorized loop, each corresponding to an		/// represented by VF values in the vectorized loop, each corresponding to an
/// iteration of the original scalar loop.		/// iteration of the original scalar loop.
void collectLoopScalars(unsigned VF);		void collectLoopScalars(unsigned VF);

/// Collect Uniform and Scalar values for the given \p VF.
/// The sets depend on CM decision for Load/Store instructions
/// that may be vectorized as interleave, gather-scatter or scalarized.
void collectUniformsAndScalars(unsigned VF) {
// Do the analysis once.
if (VF == 1 || Uniforms.count(VF))
return;
setCostBasedWideningDecision(VF);
collectLoopUniforms(VF);
collectLoopScalars(VF);
}

/// Keeps cost model vectorization decision and cost for instructions.		/// Keeps cost model vectorization decision and cost for instructions.
/// Right now it is used for memory instructions only.		/// Right now it is used for memory instructions only.
typedef DenseMap<std::pair<Instruction *, unsigned>,		typedef DenseMap<std::pair<Instruction *, unsigned>,
std::pair<InstWidening, unsigned>>		std::pair<InstWidening, unsigned>>
DecisionList;		DecisionList;

DecisionList WideningDecisions;		DecisionList WideningDecisions;

[... 21 lines not shown ...]	public:
/// Loop Vectorize Hint.		/// Loop Vectorize Hint.
const LoopVectorizeHints *Hints;		const LoopVectorizeHints *Hints;
/// Values to ignore in the cost model.		/// Values to ignore in the cost model.
SmallPtrSet<const Value *, 16> ValuesToIgnore;		SmallPtrSet<const Value *, 16> ValuesToIgnore;
/// Values to ignore in the cost model when VF > 1.		/// Values to ignore in the cost model when VF > 1.
SmallPtrSet<const Value *, 16> VecValuesToIgnore;		SmallPtrSet<const Value *, 16> VecValuesToIgnore;
};		};

		} // end anonymous namespace

		namespace llvm {
		/// InnerLoopVectorizer vectorizes loops which contain only one basic
/// LoopVectorizationPlanner - drives the vectorization process after having		/// LoopVectorizationPlanner - drives the vectorization process after having
/// passed Legality checks.		/// passed Legality checks.
		/// The planner builds and optimizes the Vectorization Plans which record the
		/// decisions how to vectorize the given loop. In particular, represent the
		/// control-flow of the vectorized version, the replication of instructions that
		/// are to be scalarized, and interleave access groups.
class LoopVectorizationPlanner {		class LoopVectorizationPlanner {
		/// The loop that we evaluate.
		Loop *OrigLoop;

		/// Loop Info analysis.
		LoopInfo *LI;

		/// Target Library Info.
		const TargetLibraryInfo *TLI;

		/// Target Transform Info.
		const TargetTransformInfo *TTI;

		/// The legality analysis.
		LoopVectorizationLegality *Legal;

		/// The profitablity analysis.
		LoopVectorizationCostModel &CM;

		SmallVector<VPlan *, 4> VPlans;

		unsigned BestVF;
		unsigned BestUF;

public:		public:
LoopVectorizationPlanner(Loop *OrigLoop, LoopInfo *LI,		LoopVectorizationPlanner(Loop *L, LoopInfo *LI, const TargetLibraryInfo *TLI,
		const TargetTransformInfo *TTI,
LoopVectorizationLegality *Legal,		LoopVectorizationLegality *Legal,
LoopVectorizationCostModel &CM)		LoopVectorizationCostModel &CM)
: OrigLoop(OrigLoop), LI(LI), Legal(Legal), CM(CM) {}		: OrigLoop(L), LI(LI), TLI(TLI), TTI(TTI), Legal(Legal), CM(CM),
		BestVF(0), BestUF(0) {}

~LoopVectorizationPlanner() {}		~LoopVectorizationPlanner() {
		while (!VPlans.empty()) {
		VPlan *Plan = VPlans.back();
		VPlans.pop_back();
		rengolin: Nit: Can you move the member variables to the top? Makes it easier to know what "VPlans" are, etc.
		delete Plan;
		}
		rengolin: SmallVectorImpl should destroy its own range, you don't need to do it yourself.
		Ayal: Leaving ~LoopVectorizationPlanner() {} does not seem to deallocate the VPlan objects created by buildVPlan(). Is there another way to prevent them from leaking?
		mkuper: A vector of unique_ptr, maybe? Anyway, I'm fine with this as is.
		Ayal: Originally D28975 did use "shared_ptr<VPlan>". However using regular pointers to VPlan seems simpler, just need to destroy them correctly at the end.
		}
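Illustrative sketch, not part of the patch: the thread above weighs raw VPlan pointers against a vector of unique_ptr. Assuming a hypothetical `Plan` type standing in for VPlan and a hypothetical `Planner` class standing in for LoopVectorizationPlanner, owning the plans through `std::unique_ptr` would let the implicitly generated destructor release them. The `LiveCount` counter is an assumption added only to make the deallocation observable.

```cpp
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical stand-in for VPlan; LiveCount tracks how many are alive.
struct Plan {
  static int LiveCount;
  Plan() { ++LiveCount; }
  ~Plan() { --LiveCount; }
};
int Plan::LiveCount = 0;

// Hypothetical stand-in for the planner. No user-written destructor is
// needed: destroying the vector destroys each unique_ptr, which deletes
// the Plan it owns.
class Planner {
  std::vector<std::unique_ptr<Plan>> Plans;

public:
  void buildPlan() { Plans.push_back(std::unique_ptr<Plan>(new Plan())); }
  std::size_t numPlans() const { return Plans.size(); }
};
```

With raw pointers, as in the patch, the same guarantee has to come from the explicit delete loop in the destructor.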

/// Plan how to best vectorize, return the best VF and its cost.		/// Plan how to best vectorize, return the best VF and its cost.
LoopVectorizationCostModel::VectorizationFactor plan(bool OptForSize,		LoopVectorizationCostModel::VectorizationFactor plan(bool OptForSize,
unsigned UserVF);		unsigned UserVF);

/// Generate the IR code for the vectorized loop.		/// Finalize the best decision and dispose of all other VPlans.
void executePlan(InnerLoopVectorizer &ILV);		void setBestPlan(unsigned VF, unsigned UF);

		/// Generate the IR code for the body of the vectorized loop according to the
		/// best selected VPlan.
		void executePlan(InnerLoopVectorizer &LB, DominatorTree *DT);

		void printPlans(raw_ostream &O) {
		for (VPlan *Plan : VPlans)
		O << *Plan;
		}

protected:		protected:
/// Collect the instructions from the original loop that would be trivially		/// Collect the instructions from the original loop that would be trivially
/// dead in the vectorized loop if generated.		/// dead in the vectorized loop if generated.
void collectTriviallyDeadInstructions(		void collectTriviallyDeadInstructions(
SmallPtrSetImpl<Instruction *> &DeadInstructions);		SmallPtrSetImpl<Instruction *> &DeadInstructions);

private:		/// A range of powers-of-2 vectorization factors with fixed start and
/// The loop that we evaluate.		/// adjustable end. The range includes start and excludes end, e.g.,:
Loop *OrigLoop;		/// [1, 9) = {1, 2, 4, 8}
		struct VFRange {
/// Loop Info analysis.		const unsigned Start; // A power of 2.
LoopInfo *LI;		unsigned End; // Need not be a power of 2. If End <= Start range is empty.
		rengolin: This POD is so small that it's worth passing it by value most of the time. If you need to change it, it's also worth passing it by reference as a whole.
		Ayal: Holding a constant Start plus non-constant End, and passing the struct by reference wherever End may be modified, instead of holding an unsigned Start and an unsigned reference End.
		};

/// The legality analysis.		/// Test a \p Predicate on a \p Range of VF's. Return the value of applying
LoopVectorizationLegality *Legal;		/// \p Predicate on Range.Start, possibly decreasing Range.End such that the
		/// returned value holds for the entire \p Range.
		bool getDecisionAndClampRange(const std::function<bool(unsigned)> &Predicate,
		mkuper: This seems like a somewhat odd API. Why do we only support cutting from the top, and not from the bottom or the middle? Do we expect to only ever prune from the top? Otherwise, I'd expect the range to be represented as a set, rather than an interval, and use a filter over that set. The current state may be fine for simplicity's sake, but I'd like to understand this better. Regardless, please rename the method. It's really surprising that a bool test...() modifies one of its arguments.
		Ayal: Indeed a more general implementation of building VPlans could represent arbitrary sets of VF's rather than "Ranges" or intervals of VF's, which this patch uses. VPlans themselves are not confined to represent only ranges. This implementation builds VPlans for the full {1,2,4,8,...,MaxVF} range of feasible VF's by repeatedly building a VPlan starting from a given VF up until the maximum VF possible. Each vectorization decision can potentially reduce this maximum. The two extremes we could end up with are: one VPlan for all VF's, and one VPlan for each VF. Decisions hopefully exhibit this form of continuity, but they certainly don't have to. Will see if the above should be added as a comment to buildVPlans(). Some alternative names we came up with: testAndClampVFRange(), prefixTestVFRange(), VFRangePrefixTest(), testStartAndSetEndVF(). Does any one of these look OK? Better suggestions are welcome.
		mkuper: Ok, this still sounds odd, but we can iterate over this in-tree in the future. Re naming, testAndClampVFRange() sounds better, I think, but I'm good with anything that indicates this changes the range.
		VFRange &Range);

		/// Build VPlans for power-of-2 VF's between \p MinVF and \p MaxVF inclusive,
		/// according to the information gathered by Legal when it checked if it is
		/// legal to vectorize the loop.
		void buildVPlans(unsigned MinVF, unsigned MaxVF);
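Illustrative sketch, not the patch's implementation: one way the clamping described in the thread above could behave, written as standalone C++ without LLVM types. `VFRange` and `getDecisionAndClampRange` mirror the declarations shown in the diff; `splitIntoSubRanges` is a hypothetical helper that records the sub-range each VPlan would cover instead of building VPlans. It assumes MinVF is a power of 2 and at least 1.

```cpp
#include <functional>
#include <utility>
#include <vector>

// Mirrors the struct in the diff: [Start, End), Start a power of 2.
struct VFRange {
  const unsigned Start;
  unsigned End;
};

// Evaluate Predicate at Range.Start, then shrink Range.End to the first
// power-of-2 VF whose answer differs, so the returned decision holds for
// every VF left in the range.
bool getDecisionAndClampRange(const std::function<bool(unsigned)> &Predicate,
                              VFRange &Range) {
  bool PredicateAtStart = Predicate(Range.Start);
  for (unsigned VF = Range.Start * 2; VF < Range.End; VF *= 2)
    if (Predicate(VF) != PredicateAtStart) {
      Range.End = VF;
      break;
    }
  return PredicateAtStart;
}

// Hypothetical helper: walk MinVF..MaxVF inclusive, starting a new
// sub-range wherever the previous one was clamped.
std::vector<std::pair<unsigned, unsigned>>
splitIntoSubRanges(unsigned MinVF, unsigned MaxVF,
                   const std::function<bool(unsigned)> &Decision) {
  std::vector<std::pair<unsigned, unsigned>> SubRanges;
  for (unsigned VF = MinVF; VF < MaxVF + 1;) {
    VFRange SubRange = {VF, MaxVF + 1};
    getDecisionAndClampRange(Decision, SubRange);
    SubRanges.emplace_back(SubRange.Start, SubRange.End);
    VF = SubRange.End;
  }
  return SubRanges;
}
```

A decision that never changes yields a single sub-range for all VF's; a decision that flips at every VF yields one sub-range per VF, matching the two extremes mentioned in the discussion.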

/// The profitablity analysis.		private:
LoopVectorizationCostModel &CM;		/// Check if \I belongs to an Interleave Group within the given VF \p Range,
		/// \return true in the first returned value if so and false otherwise.
		/// Build a new VPInterleaveGroup Recipe if \I is the primary member of an IG
		/// for \p Range.Start, and provide it as the second returned value.
		/// Note that if \I is an adjunct member of an IG for \p Range.Start, the
		/// \return value is <true, nullptr>, as it is handled by another recipe.
		/// \p Range.End may be decreased to ensure same decision from \p Range.Start
		/// to \p Range.End.
		VPInterleaveRecipe *tryToInterleaveMemory(Instruction *I, VFRange &Range);

		/// Check if an induction recipe should be constructed for \I within the given
		/// VF \p Range. If so build and return it. If not, return null. \p Range.End
		/// may be decreased to ensure same decision from \p Range.Start to
		/// \p Range.End.
		VPWidenIntOrFpInductionRecipe *tryToOptimizeInduction(Instruction *I,
		VFRange &Range);

		/// Check if \I can be widened within the given VF \p Range. If \I can be
		/// widened for Range.Start, extend \p LastWidenRecipe to include \p I if
		/// possible or else build a new VPWidenRecipe for it, and return the
		/// VPWidenRecipe that includes \p I. If \p I cannot be widened for
		/// Range.Start \return null. Range.End may be decreased to ensure same
		/// decision from \p Range.Start to \p Range.End.
		VPWidenRecipe *tryToWiden(Instruction *I, VPWidenRecipe *LastWidenRecipe,
		VFRange &Range);

		/// Build a VPReplicationRecipe for \p I and enclose it within a Region if it
		/// is predicated. \return \p VPBB augmented with this new recipe if \p I is
		/// not predicated, otherwise \return a new VPBasicBlock that succeeds the new
		/// Region. Update the packing decision of predicated instructions if they
		/// feed \p I. Range.End may be decreased to ensure same recipe behavior from
		/// \p Range.Start to \p Range.End.
		VPBasicBlock *handleReplication(
		Instruction *I, VFRange &Range, VPBasicBlock *VPBB,
		DenseMap<Instruction *, VPReplicateRecipe *> &PredInst2Recipe);

		/// Create a replicating region for instruction \p I that requires
		/// predication. \p PredRecipe is a VPReplicateRecipe holding \p I.
		VPRegionBlock *createReplicateRegion(Instruction *I,
		VPRecipeBase *PredRecipe);

		/// Build a VPlan according to the information gathered by Legal. \return a
		/// VPlan for vectorization factors \p Range.Start and up to \p Range.End
		/// exclusive, possibly decreasing \p Range.End.
		VPlan *buildVPlan(VFRange &Range);
};		};

		} // namespace llvm

		namespace {

/// \brief This holds vectorization requirements that must be verified late in		/// \brief This holds vectorization requirements that must be verified late in
/// the process. The requirements are set by legalize and costmodel. Once		/// the process. The requirements are set by legalize and costmodel. Once
/// vectorization has been determined to be possible and profitable the		/// vectorization has been determined to be possible and profitable the
/// requirements can be verified by looking for metadata or compiler options.		/// requirements can be verified by looking for metadata or compiler options.
/// For example, some loops require FP commutativity which is only allowed if		/// For example, some loops require FP commutativity which is only allowed if
/// vectorization is explicitly specified or if the fast-math compiler option		/// vectorization is explicitly specified or if the fast-math compiler option
/// has been provided.		/// has been provided.
/// Late evaluation of these requirements allows helpful diagnostics to be		/// Late evaluation of these requirements allows helpful diagnostics to be
[... 404 lines not shown ...]	if (ScalarIVTy->isIntegerTy()) {
AddOp = ID.getInductionOpcode();		AddOp = ID.getInductionOpcode();
MulOp = Instruction::FMul;		MulOp = Instruction::FMul;
}		}

// Determine the number of scalars we need to generate for each unroll		// Determine the number of scalars we need to generate for each unroll
// iteration. If EntryVal is uniform, we only need to generate the first		// iteration. If EntryVal is uniform, we only need to generate the first
// lane. Otherwise, we generate all VF values.		// lane. Otherwise, we generate all VF values.
unsigned Lanes =		unsigned Lanes =
Cost->isUniformAfterVectorization(cast<Instruction>(EntryVal), VF) ? 1 : VF;		Cost->isUniformAfterVectorization(cast<Instruction>(EntryVal), VF) ? 1
		: VF;
// Compute the scalar steps and save the results in VectorLoopValueMap.		// Compute the scalar steps and save the results in VectorLoopValueMap.
for (unsigned Part = 0; Part < UF; ++Part) {		for (unsigned Part = 0; Part < UF; ++Part) {
for (unsigned Lane = 0; Lane < Lanes; ++Lane) {		for (unsigned Lane = 0; Lane < Lanes; ++Lane) {
auto StartIdx = getSignedIntOrFpConstant(ScalarIVTy, VF * Part + Lane);		auto StartIdx = getSignedIntOrFpConstant(ScalarIVTy, VF * Part + Lane);
auto *Mul = addFastMathFlag(Builder.CreateBinOp(MulOp, StartIdx, Step));		auto *Mul = addFastMathFlag(Builder.CreateBinOp(MulOp, StartIdx, Step));
auto *Add = addFastMathFlag(Builder.CreateBinOp(AddOp, ScalarIV, Mul));		auto *Add = addFastMathFlag(Builder.CreateBinOp(AddOp, ScalarIV, Mul));
VectorLoopValueMap.setScalarValue(EntryVal, Part, Lane, Add);		VectorLoopValueMap.setScalarValue(EntryVal, {Part, Lane}, Add);
}		}
}		}
}		}

int LoopVectorizationLegality::isConsecutivePtr(Value *Ptr) {		int LoopVectorizationLegality::isConsecutivePtr(Value *Ptr) {

const ValueToValueMap &Strides = getSymbolicStrides() ? *getSymbolicStrides() :		const ValueToValueMap &Strides = getSymbolicStrides() ? *getSymbolicStrides() :
ValueToValueMap();		ValueToValueMap();
[... 21 lines not shown ...]	Value *InnerLoopVectorizer::getOrCreateVectorValue(Value *V, unsigned Part) {
if (VectorLoopValueMap.hasVectorValue(V, Part))		if (VectorLoopValueMap.hasVectorValue(V, Part))
return VectorLoopValueMap.getVectorValue(V, Part);		return VectorLoopValueMap.getVectorValue(V, Part);

// If the value has not been vectorized, check if it has been scalarized		// If the value has not been vectorized, check if it has been scalarized
// instead. If it has been scalarized, and we actually need the value in		// instead. If it has been scalarized, and we actually need the value in
// vector form, we will construct the vector values on demand.		// vector form, we will construct the vector values on demand.
if (VectorLoopValueMap.hasAnyScalarValue(V)) {		if (VectorLoopValueMap.hasAnyScalarValue(V)) {

Value *ScalarValue = VectorLoopValueMap.getScalarValue(V, Part, 0);		Value *ScalarValue = VectorLoopValueMap.getScalarValue(V, {Part, 0});

// If we've scalarized a value, that value should be an instruction.		// If we've scalarized a value, that value should be an instruction.
auto *I = cast<Instruction>(V);		auto *I = cast<Instruction>(V);

// If we aren't vectorizing, we can just copy the scalar map values over to		// If we aren't vectorizing, we can just copy the scalar map values over to
// the vector map.		// the vector map.
if (VF == 1) {		if (VF == 1) {
VectorLoopValueMap.setVectorValue(V, Part, ScalarValue);		VectorLoopValueMap.setVectorValue(V, Part, ScalarValue);
return ScalarValue;		return ScalarValue;
}		}

// Get the last scalar instruction we generated for V and Part. If the value		// Get the last scalar instruction we generated for V and Part. If the value
// is known to be uniform after vectorization, this corresponds to lane zero		// is known to be uniform after vectorization, this corresponds to lane zero
// of the Part unroll iteration. Otherwise, the last instruction is the one		// of the Part unroll iteration. Otherwise, the last instruction is the one
// we created for the last vector lane of the Part unroll iteration.		// we created for the last vector lane of the Part unroll iteration.
unsigned LastLane = Cost->isUniformAfterVectorization(I, VF) ? 0 : VF - 1;		unsigned LastLane = Cost->isUniformAfterVectorization(I, VF) ? 0 : VF - 1;
auto *LastInst =		auto *LastInst = cast<Instruction>(
cast<Instruction>(VectorLoopValueMap.getScalarValue(V, Part, LastLane));		VectorLoopValueMap.getScalarValue(V, {Part, LastLane}));

// Set the insert point after the last scalarized instruction. This ensures		// Set the insert point after the last scalarized instruction. This ensures
// the insertelement sequence will directly follow the scalar definitions.		// the insertelement sequence will directly follow the scalar definitions.
auto OldIP = Builder.saveIP();		auto OldIP = Builder.saveIP();
auto NewIP = std::next(BasicBlock::iterator(LastInst));		auto NewIP = std::next(BasicBlock::iterator(LastInst));
Builder.SetInsertPoint(&*NewIP);		Builder.SetInsertPoint(&*NewIP);

// However, if we are vectorizing, we need to construct the vector values.		// However, if we are vectorizing, we need to construct the vector values.
// If the value is known to be uniform after vectorization, we can just		// If the value is known to be uniform after vectorization, we can just
// broadcast the scalar value corresponding to lane zero for each unroll		// broadcast the scalar value corresponding to lane zero for each unroll
// iteration. Otherwise, we construct the vector values using insertelement		// iteration. Otherwise, we construct the vector values using insertelement
// instructions. Since the resulting vectors are stored in		// instructions. Since the resulting vectors are stored in
// VectorLoopValueMap, we will only generate the insertelements once.		// VectorLoopValueMap, we will only generate the insertelements once.
Value *VectorValue = nullptr;		Value *VectorValue = nullptr;
if (Cost->isUniformAfterVectorization(I, VF)) {		if (Cost->isUniformAfterVectorization(I, VF)) {
VectorValue = getBroadcastInstrs(ScalarValue);		VectorValue = getBroadcastInstrs(ScalarValue);
		VectorLoopValueMap.setVectorValue(V, Part, VectorValue);
} else {		} else {
VectorValue = UndefValue::get(VectorType::get(V->getType(), VF));		// Initialize packing with insertelements to start from undef.
		Value *Undef = UndefValue::get(VectorType::get(V->getType(), VF));
		VectorLoopValueMap.setVectorValue(V, Part, Undef);
for (unsigned Lane = 0; Lane < VF; ++Lane)		for (unsigned Lane = 0; Lane < VF; ++Lane)
VectorValue = Builder.CreateInsertElement(		packScalarIntoVectorValue(V, {Part, Lane});
VectorValue, getOrCreateScalarValue(V, Part, Lane),		VectorValue = VectorLoopValueMap.getVectorValue(V, Part);
Builder.getInt32(Lane));
}		}
VectorLoopValueMap.setVectorValue(V, Part, VectorValue);
Builder.restoreIP(OldIP);		Builder.restoreIP(OldIP);
return VectorValue;		return VectorValue;
}		}

// If this scalar is unknown, assume that it is a constant or that it is		// If this scalar is unknown, assume that it is a constant or that it is
// loop invariant. Broadcast V and save the value for future uses.		// loop invariant. Broadcast V and save the value for future uses.
Value *B = getBroadcastInstrs(V);		Value *B = getBroadcastInstrs(V);
VectorLoopValueMap.setVectorValue(V, Part, B);		VectorLoopValueMap.setVectorValue(V, Part, B);
return B;		return B;
}		}

Value *InnerLoopVectorizer::getOrCreateScalarValue(Value *V, unsigned Part,		Value *
unsigned Lane) {		InnerLoopVectorizer::getOrCreateScalarValue(Value *V,
		const VPIteration &Instance) {
// If the value is not an instruction contained in the loop, it should		// If the value is not an instruction contained in the loop, it should
// already be scalar.		// already be scalar.
if (OrigLoop->isLoopInvariant(V))		if (OrigLoop->isLoopInvariant(V))
return V;		return V;

assert(Lane > 0 ? !Cost->isUniformAfterVectorization(cast<Instruction>(V), VF)		assert(Instance.Lane > 0
		? !Cost->isUniformAfterVectorization(cast<Instruction>(V), VF)
: true && "Uniform values only have lane zero");		: true && "Uniform values only have lane zero");

// If the value from the original loop has not been vectorized, it is		// If the value from the original loop has not been vectorized, it is
// represented by UF x VF scalar values in the new loop. Return the requested		// represented by UF x VF scalar values in the new loop. Return the requested
// scalar value.		// scalar value.
if (VectorLoopValueMap.hasScalarValue(V, Part, Lane))		if (VectorLoopValueMap.hasScalarValue(V, Instance))
return VectorLoopValueMap.getScalarValue(V, Part, Lane);		return VectorLoopValueMap.getScalarValue(V, Instance);

// If the value has not been scalarized, get its entry in VectorLoopValueMap		// If the value has not been scalarized, get its entry in VectorLoopValueMap
// for the given unroll part. If this entry is not a vector type (i.e., the		// for the given unroll part. If this entry is not a vector type (i.e., the
// vectorization factor is one), there is no need to generate an		// vectorization factor is one), there is no need to generate an
// extractelement instruction.		// extractelement instruction.
auto *U = getOrCreateVectorValue(V, Part);		auto *U = getOrCreateVectorValue(V, Instance.Part);
if (!U->getType()->isVectorTy()) {		if (!U->getType()->isVectorTy()) {
assert(VF == 1 && "Value not scalarized has non-vector type");		assert(VF == 1 && "Value not scalarized has non-vector type");
return U;		return U;
}		}

// Otherwise, the value from the original loop has been vectorized and is		// Otherwise, the value from the original loop has been vectorized and is
// represented by UF vector values. Extract and return the requested scalar		// represented by UF vector values. Extract and return the requested scalar
// value from the appropriate vector lane.		// value from the appropriate vector lane.
return Builder.CreateExtractElement(U, Builder.getInt32(Lane));		return Builder.CreateExtractElement(U, Builder.getInt32(Instance.Lane));
		}

		void InnerLoopVectorizer::packScalarIntoVectorValue(
		Value *V, const VPIteration &Instance) {
		assert(V != Induction && "The new induction variable should not be used.");
		assert(!V->getType()->isVectorTy() && "Can't pack a vector");
		assert(!V->getType()->isVoidTy() && "Type does not produce a value");

		Value *ScalarInst = VectorLoopValueMap.getScalarValue(V, Instance);
		Value *VectorValue = VectorLoopValueMap.getVectorValue(V, Instance.Part);
		VectorValue = Builder.CreateInsertElement(VectorValue, ScalarInst,
		Builder.getInt32(Instance.Lane));
		VectorLoopValueMap.resetVectorValue(V, Instance.Part, VectorValue);
}		}
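
Note on the hunk above: the separate `Part` and `Lane` arguments are replaced by a single `VPIteration` instance, and `packScalarIntoVectorValue` inserts one scalar into the vector for its unroll part. The following is a minimal, self-contained sketch of that keying scheme, not the patch's actual `VectorLoopValueMap`. `Iteration`, `ToyValueMap`, and the use of `int` values (with 0 standing in for `undef`) are hypothetical simplifications.

```cpp
#include <cassert>
#include <map>
#include <utility>
#include <vector>

// Hypothetical stand-in for VPIteration: one (Part, Lane) pair.
struct Iteration {
  unsigned Part; // unroll part, in [0, UF)
  unsigned Lane; // vector lane, in [0, VF)
};

// Toy value map: scalars are keyed by (Part, Lane), vectors by Part only.
class ToyValueMap {
  unsigned VF;
  std::map<std::pair<unsigned, unsigned>, int> Scalars;
  std::map<unsigned, std::vector<int>> Vectors;

  static std::pair<unsigned, unsigned> key(const Iteration &I) {
    return std::make_pair(I.Part, I.Lane);
  }

public:
  explicit ToyValueMap(unsigned VF) : VF(VF) {}

  bool hasScalarValue(const Iteration &I) const {
    return Scalars.count(key(I)) != 0;
  }

  void setScalarValue(const Iteration &I, int V) { Scalars[key(I)] = V; }

  int getScalarValue(const Iteration &I) const {
    assert(hasScalarValue(I) && "No scalar recorded for this instance");
    return Scalars.at(key(I));
  }

  // Analogue of packScalarIntoVectorValue: insert the scalar for this
  // instance into lane I.Lane of the vector for part I.Part.
  void packScalarIntoVector(const Iteration &I) {
    std::vector<int> &Vec = Vectors[I.Part];
    if (Vec.empty())
      Vec.assign(VF, 0); // 0 stands in for undef lanes
    Vec[I.Lane] = getScalarValue(I);
  }

  const std::vector<int> &getVectorValue(unsigned Part) const {
    return Vectors.at(Part);
  }
};
```

In this sketch a caller would record a scalar with `setScalarValue({Part, Lane}, V)` and later call `packScalarIntoVector({Part, Lane})`, mirroring the `{Part, 0}` brace-initialized call sites in the diff.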

Value *InnerLoopVectorizer::reverseVector(Value *Vec) {		Value *InnerLoopVectorizer::reverseVector(Value *Vec) {
assert(Vec->getType()->isVectorTy() && "Invalid type");		assert(Vec->getType()->isVectorTy() && "Invalid type");
SmallVector<Constant *, 8> ShuffleMask;		SmallVector<Constant *, 8> ShuffleMask;
for (unsigned i = 0; i < VF; ++i)		for (unsigned i = 0; i < VF; ++i)
ShuffleMask.push_back(Builder.getInt32(VF - i - 1));		ShuffleMask.push_back(Builder.getInt32(VF - i - 1));

[... 56 lines not shown ...]	void InnerLoopVectorizer::vectorizeInterleaveGroup(Instruction *Instr) {
// rather than directly getting the pointer for lane VF - 1, because the		// rather than directly getting the pointer for lane VF - 1, because the
// pointer operand of the interleaved access is supposed to be uniform. For		// pointer operand of the interleaved access is supposed to be uniform. For
// uniform instructions, we're only required to generate a value for the		// uniform instructions, we're only required to generate a value for the
// first vector lane in each unroll iteration.		// first vector lane in each unroll iteration.
if (Group->isReverse())		if (Group->isReverse())
Index += (VF - 1) * Group->getFactor();		Index += (VF - 1) * Group->getFactor();

for (unsigned Part = 0; Part < UF; Part++) {		for (unsigned Part = 0; Part < UF; Part++) {
Value *NewPtr = getOrCreateScalarValue(Ptr, Part, 0);		Value *NewPtr = getOrCreateScalarValue(Ptr, {Part, 0});

// Notice current instruction could be any index. Need to adjust the address		// Notice current instruction could be any index. Need to adjust the address
// to the member of index 0.		// to the member of index 0.
//		//
// E.g. a = A[i+1]; // Member of index 1 (Current instruction)		// E.g. a = A[i+1]; // Member of index 1 (Current instruction)
// b = A[i]; // Member of index 0		// b = A[i]; // Member of index 0
// Current pointer is pointed to A[i+1], adjust it to A[i].		// Current pointer is pointed to A[i+1], adjust it to A[i].
//		//
[... 109 lines not shown ...]	void InnerLoopVectorizer::vectorizeMemoryInstruction(Instruction *Instr) {
unsigned Alignment = getMemInstAlignment(Instr);		unsigned Alignment = getMemInstAlignment(Instr);
// An alignment of 0 means target abi alignment. We need to use the scalar's		// An alignment of 0 means target abi alignment. We need to use the scalar's
// target abi alignment in such a case.		// target abi alignment in such a case.
const DataLayout &DL = Instr->getModule()->getDataLayout();		const DataLayout &DL = Instr->getModule()->getDataLayout();
if (!Alignment)		if (!Alignment)
Alignment = DL.getABITypeAlignment(ScalarDataTy);		Alignment = DL.getABITypeAlignment(ScalarDataTy);
unsigned AddressSpace = getMemInstAddressSpace(Instr);		unsigned AddressSpace = getMemInstAddressSpace(Instr);

// Scalarize the memory instruction if necessary.
if (Decision == LoopVectorizationCostModel::CM_Scalarize)
return scalarizeInstruction(Instr, Legal->isScalarWithPredication(Instr));

// Determine if the pointer operand of the access is either consecutive or		// Determine if the pointer operand of the access is either consecutive or
// reverse consecutive.		// reverse consecutive.
int ConsecutiveStride = Legal->isConsecutivePtr(Ptr);		int ConsecutiveStride = Legal->isConsecutivePtr(Ptr);
bool Reverse = ConsecutiveStride < 0;		bool Reverse = ConsecutiveStride < 0;
bool CreateGatherScatter =		bool CreateGatherScatter =
(Decision == LoopVectorizationCostModel::CM_GatherScatter);		(Decision == LoopVectorizationCostModel::CM_GatherScatter);

// Either Ptr feeds a vector load/store, or a vector GEP should feed a vector		// Either Ptr feeds a vector load/store, or a vector GEP should feed a vector
// gather/scatter. Otherwise Decision should have been to Scalarize.		// gather/scatter. Otherwise Decision should have been to Scalarize.
assert((ConsecutiveStride || CreateGatherScatter) &&		assert((ConsecutiveStride || CreateGatherScatter) &&
"The instruction should be scalarized");		"The instruction should be scalarized");

// Handle consecutive loads/stores.		// Handle consecutive loads/stores.
if (ConsecutiveStride)		if (ConsecutiveStride)
Ptr = getOrCreateScalarValue(Ptr, 0, 0);		Ptr = getOrCreateScalarValue(Ptr, {0, 0});

VectorParts Mask = createBlockInMask(Instr->getParent());		VectorParts Mask = createBlockInMask(Instr->getParent());
// Handle Stores:		// Handle Stores:
if (SI) {		if (SI) {
assert(!Legal->isUniform(SI->getPointerOperand()) &&		assert(!Legal->isUniform(SI->getPointerOperand()) &&
"We do not allow storing to uniform addresses");		"We do not allow storing to uniform addresses");
setDebugLocFromInst(Builder, SI);		setDebugLocFromInst(Builder, SI);

[... 80 lines not shown ...]	if (CreateGatherScatter) {
if (Reverse)		if (Reverse)
NewLI = reverseVector(NewLI);		NewLI = reverseVector(NewLI);
}		}
VectorLoopValueMap.setVectorValue(Instr, Part, NewLI);		VectorLoopValueMap.setVectorValue(Instr, Part, NewLI);
}		}
}		}

void InnerLoopVectorizer::scalarizeInstruction(Instruction *Instr,		void InnerLoopVectorizer::scalarizeInstruction(Instruction *Instr,
		const VPIteration &Instance,
bool IfPredicateInstr) {		bool IfPredicateInstr) {
assert(!Instr->getType()->isAggregateType() && "Can't handle vectors");		assert(!Instr->getType()->isAggregateType() && "Can't handle vectors");
DEBUG(dbgs() << "LV: Scalarizing"
<< (IfPredicateInstr ? " and predicating:" : ":") << *Instr
<< '\n');
// Holds vector parameters or scalars, in case of uniform vals.
SmallVector<VectorParts, 4> Params;

setDebugLocFromInst(Builder, Instr);		setDebugLocFromInst(Builder, Instr);

// Does this instruction return a value ?		// Does this instruction return a value ?
bool IsVoidRetTy = Instr->getType()->isVoidTy();		bool IsVoidRetTy = Instr->getType()->isVoidTy();

VectorParts Cond;
if (IfPredicateInstr)
Cond = createBlockInMask(Instr->getParent());

// Determine the number of scalars we need to generate for each unroll
// iteration. If the instruction is uniform, we only need to generate the
// first lane. Otherwise, we generate all VF values.
unsigned Lanes = Cost->isUniformAfterVectorization(Instr, VF) ? 1 : VF;

// For each vector unroll 'part':
for (unsigned Part = 0; Part < UF; ++Part) {
// For each scalar that we create:
for (unsigned Lane = 0; Lane < Lanes; ++Lane) {

// Start if-block.
Value *Cmp = nullptr;
if (IfPredicateInstr) {
Cmp = Cond[Part];
if (!Cmp) // Block in mask is all-one.
Cmp = Builder.getTrue();
else if (Cmp->getType()->isVectorTy())
Cmp = Builder.CreateExtractElement(Cmp, Builder.getInt32(Lane));
}

Instruction *Cloned = Instr->clone();		Instruction *Cloned = Instr->clone();
if (!IsVoidRetTy)		if (!IsVoidRetTy)
Cloned->setName(Instr->getName() + ".cloned");		Cloned->setName(Instr->getName() + ".cloned");

// Replace the operands of the cloned instructions with their scalar		// Replace the operands of the cloned instructions with their scalar
// equivalents in the new loop.		// equivalents in the new loop.
for (unsigned op = 0, e = Instr->getNumOperands(); op != e; ++op) {		for (unsigned op = 0, e = Instr->getNumOperands(); op != e; ++op) {
auto *NewOp = getOrCreateScalarValue(Instr->getOperand(op), Part, Lane);		auto *NewOp = getOrCreateScalarValue(Instr->getOperand(op), Instance);
Cloned->setOperand(op, NewOp);		Cloned->setOperand(op, NewOp);
}		}
addNewMetadata(Cloned, Instr);		addNewMetadata(Cloned, Instr);

// Place the cloned scalar in the new loop.		// Place the cloned scalar in the new loop.
Builder.Insert(Cloned);		Builder.Insert(Cloned);

// Add the cloned scalar to the scalar map entry.		// Add the cloned scalar to the scalar map entry.
VectorLoopValueMap.setScalarValue(Instr, Part, Lane, Cloned);		VectorLoopValueMap.setScalarValue(Instr, Instance, Cloned);

// If we just cloned a new assumption, add it the assumption cache.		// If we just cloned a new assumption, add it the assumption cache.
if (auto *II = dyn_cast<IntrinsicInst>(Cloned))		if (auto *II = dyn_cast<IntrinsicInst>(Cloned))
if (II->getIntrinsicID() == Intrinsic::assume)		if (II->getIntrinsicID() == Intrinsic::assume)
AC->registerAssumption(II);		AC->registerAssumption(II);

// End if-block.		// End if-block.
if (IfPredicateInstr)		if (IfPredicateInstr)
PredicatedInstructions.push_back(std::make_pair(Cloned, Cmp));		PredicatedInstructions.push_back(Cloned);
}
}
}		}

PHINode *InnerLoopVectorizer::createInductionVariable(Loop *L, Value *Start,		PHINode *InnerLoopVectorizer::createInductionVariable(Loop *L, Value *Start,
Value *End, Value *Step,		Value *End, Value *Step,
Instruction *DL) {		Instruction *DL) {
BasicBlock *Header = L->getHeader();		BasicBlock *Header = L->getHeader();
BasicBlock *Latch = L->getLoopLatch();		BasicBlock *Latch = L->getLoopLatch();
// As we're just creating this loop, it's possible no latch exists		// As we're just creating this loop, it's possible no latch exists
[... 187 lines not shown ...]	void InnerLoopVectorizer::emitMemRuntimeChecks(Loop *L, BasicBlock *Bypass) {

// We currently don't use LoopVersioning for the actual loop cloning but we		// We currently don't use LoopVersioning for the actual loop cloning but we
// still use it to add the noalias metadata.		// still use it to add the noalias metadata.
LVer = llvm::make_unique<LoopVersioning>(*Legal->getLAI(), OrigLoop, LI, DT,		LVer = llvm::make_unique<LoopVersioning>(*Legal->getLAI(), OrigLoop, LI, DT,
PSE.getSE());		PSE.getSE());
LVer->prepareNoAliasMetadata();		LVer->prepareNoAliasMetadata();
}		}

void InnerLoopVectorizer::createVectorizedLoopSkeleton() {		BasicBlock *InnerLoopVectorizer::createVectorizedLoopSkeleton() {
/*		/*
In this function we generate a new loop. The new loop will contain		In this function we generate a new loop. The new loop will contain
the vectorized instructions while the old loop will continue to run the		the vectorized instructions while the old loop will continue to run the
scalar remainder.		scalar remainder.

[ ] <-- loop iteration number check.		[ ] <-- loop iteration number check.
/ |		/ |
/ v		/ v
[... 164 lines not shown ...]	BasicBlock *InnerLoopVectorizer::createVectorizedLoopSkeleton() {

// Keep all loop hints from the original loop on the vector loop (we'll		// Keep all loop hints from the original loop on the vector loop (we'll
// replace the vectorizer-specific hints below).		// replace the vectorizer-specific hints below).
if (MDNode *LID = OrigLoop->getLoopID())		if (MDNode *LID = OrigLoop->getLoopID())
Lp->setLoopID(LID);		Lp->setLoopID(LID);

LoopVectorizeHints Hints(Lp, true, *ORE);		LoopVectorizeHints Hints(Lp, true, *ORE);
Hints.setAlreadyVectorized();		Hints.setAlreadyVectorized();

		return LoopVectorPreHeader;
}		}

// Fix up external users of the induction variable. At this point, we are		// Fix up external users of the induction variable. At this point, we are
// in LCSSA form, with all external PHIs that use the IV having one input value,		// in LCSSA form, with all external PHIs that use the IV having one input value,
// coming from the remainder loop. We need those PHIs to also have a correct		// coming from the remainder loop. We need those PHIs to also have a correct
// value for the IV when arriving directly from the middle block.		// value for the IV when arriving directly from the middle block.
void InnerLoopVectorizer::fixupIVUsers(PHINode *OrigPhi,		void InnerLoopVectorizer::fixupIVUsers(PHINode *OrigPhi,
const InductionDescriptor &II,		const InductionDescriptor &II,
[... 357 lines not shown ...]	void InnerLoopVectorizer::fixVectorizedLoop() {

// Fix-up external users of the induction variables.		// Fix-up external users of the induction variables.
for (auto &Entry : *Legal->getInductionVars())		for (auto &Entry : *Legal->getInductionVars())
fixupIVUsers(Entry.first, Entry.second,		fixupIVUsers(Entry.first, Entry.second,
getOrCreateVectorTripCount(LI->getLoopFor(LoopVectorBody)),		getOrCreateVectorTripCount(LI->getLoopFor(LoopVectorBody)),
IVEndValues[Entry.first], LoopMiddleBlock);		IVEndValues[Entry.first], LoopMiddleBlock);

fixLCSSAPHIs();		fixLCSSAPHIs();
predicateInstructions();		for (Instruction *PI : PredicatedInstructions)
		sinkScalarOperands(&*PI);

// Remove redundant induction instructions.		// Remove redundant induction instructions.
cse(LoopVectorBody);		cse(LoopVectorBody);
}		}

void InnerLoopVectorizer::fixCrossIterationPHIs() {		void InnerLoopVectorizer::fixCrossIterationPHIs() {
// In order to support recurrences we need to be able to vectorize Phi nodes.		// In order to support recurrences we need to be able to vectorize Phi nodes.
// Phi nodes have cycles, so we need to vectorize them in two stages. This is		// Phi nodes have cycles, so we need to vectorize them in two stages. This is
[... 438 lines not shown ...]	while (!Worklist.empty()) {

// The sinking may have enabled other instructions to be sunk, so we will		// The sinking may have enabled other instructions to be sunk, so we will
// need to iterate.		// need to iterate.
Changed = true;		Changed = true;
}		}
} while (Changed);		} while (Changed);
}		}

void InnerLoopVectorizer::predicateInstructions() {

// For each instruction I marked for predication on value C, split I into its
// own basic block to form an if-then construct over C. Since I may be fed by
// an extractelement instruction or other scalar operand, we try to
// iteratively sink its scalar operands into the predicated block. If I feeds
// an insertelement instruction, we try to move this instruction into the
// predicated block as well. For non-void types, a phi node will be created
// for the resulting value (either vector or scalar).
//
// So for some predicated instruction, e.g. the conditional sdiv in:
//
// for.body:
// ...
// %add = add nsw i32 %mul, %0
// %cmp5 = icmp sgt i32 %2, 7
// br i1 %cmp5, label %if.then, label %if.end
//
// if.then:
// %div = sdiv i32 %0, %1
// br label %if.end
//
// if.end:
// %x.0 = phi i32 [ %div, %if.then ], [ %add, %for.body ]
//
// the sdiv at this point is scalarized and if-converted using a select.
// The inactive elements in the vector are not used, but the predicated
// instruction is still executed for all vector elements, essentially:
//
// vector.body:
// ...
// %17 = add nsw <2 x i32> %16, %wide.load
// %29 = extractelement <2 x i32> %wide.load, i32 0
// %30 = extractelement <2 x i32> %wide.load51, i32 0
// %31 = sdiv i32 %29, %30
// %32 = insertelement <2 x i32> undef, i32 %31, i32 0
// %35 = extractelement <2 x i32> %wide.load, i32 1
// %36 = extractelement <2 x i32> %wide.load51, i32 1
// %37 = sdiv i32 %35, %36
// %38 = insertelement <2 x i32> %32, i32 %37, i32 1
// %predphi = select <2 x i1> %26, <2 x i32> %38, <2 x i32> %17
//
// Predication will now re-introduce the original control flow to avoid false
// side-effects by the sdiv instructions on the inactive elements, yielding
// (after cleanup):
//
// vector.body:
// ...
// %5 = add nsw <2 x i32> %4, %wide.load
// %8 = icmp sgt <2 x i32> %wide.load52, <i32 7, i32 7>
// %9 = extractelement <2 x i1> %8, i32 0
// br i1 %9, label %pred.sdiv.if, label %pred.sdiv.continue
//
// pred.sdiv.if:
// %10 = extractelement <2 x i32> %wide.load, i32 0
// %11 = extractelement <2 x i32> %wide.load51, i32 0
// %12 = sdiv i32 %10, %11
// %13 = insertelement <2 x i32> undef, i32 %12, i32 0
// br label %pred.sdiv.continue
//
// pred.sdiv.continue:
// %14 = phi <2 x i32> [ undef, %vector.body ], [ %13, %pred.sdiv.if ]
// %15 = extractelement <2 x i1> %8, i32 1
// br i1 %15, label %pred.sdiv.if54, label %pred.sdiv.continue55
//
// pred.sdiv.if54:
// %16 = extractelement <2 x i32> %wide.load, i32 1
// %17 = extractelement <2 x i32> %wide.load51, i32 1
// %18 = sdiv i32 %16, %17
// %19 = insertelement <2 x i32> %14, i32 %18, i32 1
// br label %pred.sdiv.continue55
//
// pred.sdiv.continue55:
// %20 = phi <2 x i32> [ %14, %pred.sdiv.continue ], [ %19, %pred.sdiv.if54 ]
// %predphi = select <2 x i1> %8, <2 x i32> %20, <2 x i32> %5

for (auto KV : PredicatedInstructions) {
BasicBlock::iterator I(KV.first);
BasicBlock *Head = I->getParent();
auto *T = SplitBlockAndInsertIfThen(KV.second, &*I, /*Unreachable=*/false,
/*BranchWeights=*/nullptr, DT, LI);
I->moveBefore(T);
sinkScalarOperands(&*I);

BasicBlock *PredicatedBlock = I->getParent();
Twine BBNamePrefix = Twine("pred.") + I->getOpcodeName();
PredicatedBlock->setName(BBNamePrefix + ".if");
PredicatedBlock->getSingleSuccessor()->setName(BBNamePrefix + ".continue");

// If the instruction is non-void create a Phi node at reconvergence point.
if (!I->getType()->isVoidTy()) {
Value *IncomingTrue = nullptr;
Value *IncomingFalse = nullptr;

if (I->hasOneUse() && isa<InsertElementInst>(*I->user_begin())) {
// If the predicated instruction is feeding an insert-element, move it
// into the Then block; Phi node will be created for the vector.
InsertElementInst *IEI = cast<InsertElementInst>(*I->user_begin());
IEI->moveBefore(T);
IncomingTrue = IEI; // the new vector with the inserted element.
IncomingFalse = IEI->getOperand(0); // the unmodified vector
} else {
// Phi node will be created for the scalar predicated instruction.
IncomingTrue = &*I;
IncomingFalse = UndefValue::get(I->getType());
}

BasicBlock *PostDom = I->getParent()->getSingleSuccessor();
assert(PostDom && "Then block has multiple successors");
PHINode *Phi =
PHINode::Create(IncomingTrue->getType(), 2, "", &PostDom->front());
IncomingTrue->replaceAllUsesWith(Phi);
Phi->addIncoming(IncomingFalse, Head);
Phi->addIncoming(IncomingTrue, I->getParent());
}
}

DEBUG(DT->verifyDomTree());
}
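
Note on the removed `predicateInstructions()` comment above: it describes how a scalarized, predicated `sdiv` runs only for lanes whose mask bit is set, with the result packed back into a vector and merged by a `select`. The following is a rough sketch of that per-lane behaviour on plain integers, assuming hypothetical names (`predicatedDiv`, `Else`) and using 0 in place of `undef`. It models the semantics only and does not build any IR.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Model of scalarize-and-predicate for a division: the divide executes only
// for active lanes, so inactive lanes never see a possibly-zero divisor.
std::vector<int> predicatedDiv(const std::vector<int> &Num,
                               const std::vector<int> &Den,
                               const std::vector<bool> &Mask,
                               const std::vector<int> &Else) {
  assert(Num.size() == Den.size() && Num.size() == Mask.size() &&
         Num.size() == Else.size() && "All operands must have VF lanes");

  // Corresponds to the chain of pred.sdiv.if / pred.sdiv.continue blocks:
  // each lane is guarded by its own extracted mask bit.
  std::vector<int> Packed(Num.size(), 0); // 0 stands in for undef
  for (std::size_t Lane = 0; Lane < Num.size(); ++Lane)
    if (Mask[Lane])
      Packed[Lane] = Num[Lane] / Den[Lane]; // sdiv + insertelement

  // Corresponds to the final %predphi select between the packed vector and
  // the value arriving from the non-predicated path.
  std::vector<int> Result(Num.size());
  for (std::size_t Lane = 0; Lane < Num.size(); ++Lane)
    Result[Lane] = Mask[Lane] ? Packed[Lane] : Else[Lane];
  return Result;
}
```

With `VF = 2`, a mask of `{true, false}` and a zero divisor in lane 1, only lane 0 is divided and lane 1 takes the `Else` value. This is the "false side-effects" case the comment says predication avoids.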

InnerLoopVectorizer::VectorParts		InnerLoopVectorizer::VectorParts
InnerLoopVectorizer::createEdgeMask(BasicBlock *Src, BasicBlock *Dst) {		InnerLoopVectorizer::createEdgeMask(BasicBlock *Src, BasicBlock *Dst) {
assert(is_contained(predecessors(Dst), Src) && "Invalid edge");		assert(is_contained(predecessors(Dst), Src) && "Invalid edge");

// Look for cached value.		// Look for cached value.
std::pair<BasicBlock *, BasicBlock *> Edge(Src, Dst);		std::pair<BasicBlock *, BasicBlock *> Edge(Src, Dst);
EdgeMaskCacheTy::iterator ECEntryIt = EdgeMaskCache.find(Edge);		EdgeMaskCacheTy::iterator ECEntryIt = EdgeMaskCache.find(Edge);
if (ECEntryIt != EdgeMaskCache.end())		if (ECEntryIt != EdgeMaskCache.end())
[... 127 lines not shown ...]	void InnerLoopVectorizer::widenPHIInstruction(Instruction *PN, unsigned UF,

// FIXME: The newly created binary instructions should contain nsw/nuw flags,		// FIXME: The newly created binary instructions should contain nsw/nuw flags,
// which can be found from the original scalar operations.		// which can be found from the original scalar operations.
switch (II.getKind()) {		switch (II.getKind()) {
case InductionDescriptor::IK_NoInduction:		case InductionDescriptor::IK_NoInduction:
llvm_unreachable("Unknown induction");		llvm_unreachable("Unknown induction");
case InductionDescriptor::IK_IntInduction:		case InductionDescriptor::IK_IntInduction:
case InductionDescriptor::IK_FpInduction:		case InductionDescriptor::IK_FpInduction:
return widenIntOrFpInduction(P);		llvm_unreachable("Integer/fp induction is handled elsewhere.");
		mkuper: Why the split between how we handle int and ptr inductions? To minimize the patch?
		Ayal (author): Yes, we're keeping ILV's current split between its widenIntOrFpInduction() and how it handles ptr inductions and other phi instructions. The former was last addressed and merged by r296145; the latter, appearing below, is simpler and remains inline. In any case, addressing this split is independent of introducing VPlan.
case InductionDescriptor::IK_PtrInduction: {		case InductionDescriptor::IK_PtrInduction: {
// Handle the pointer induction variable case.		// Handle the pointer induction variable case.
assert(P->getType()->isPointerTy() && "Unexpected type.");		assert(P->getType()->isPointerTy() && "Unexpected type.");
// This is the normalized GEP that starts counting at zero.		// This is the normalized GEP that starts counting at zero.
Value *PtrInd = Induction;		Value *PtrInd = Induction;
PtrInd = Builder.CreateSExtOrTrunc(PtrInd, II.getStep()->getType());		PtrInd = Builder.CreateSExtOrTrunc(PtrInd, II.getStep()->getType());
// Determine the number of scalars we need to generate for each unroll		// Determine the number of scalars we need to generate for each unroll
// iteration. If the instruction is uniform, we only need to generate the		// iteration. If the instruction is uniform, we only need to generate the
// first lane. Otherwise, we generate all VF values.		// first lane. Otherwise, we generate all VF values.
unsigned Lanes = Cost->isUniformAfterVectorization(P, VF) ? 1 : VF;		unsigned Lanes = Cost->isUniformAfterVectorization(P, VF) ? 1 : VF;
// These are the scalar results. Notice that we don't generate vector GEPs		// These are the scalar results. Notice that we don't generate vector GEPs
// because scalar GEPs result in better code.		// because scalar GEPs result in better code.
for (unsigned Part = 0; Part < UF; ++Part) {		for (unsigned Part = 0; Part < UF; ++Part) {
for (unsigned Lane = 0; Lane < Lanes; ++Lane) {		for (unsigned Lane = 0; Lane < Lanes; ++Lane) {
Constant *Idx = ConstantInt::get(PtrInd->getType(), Lane + Part * VF);		Constant *Idx = ConstantInt::get(PtrInd->getType(), Lane + Part * VF);
Value *GlobalIdx = Builder.CreateAdd(PtrInd, Idx);		Value *GlobalIdx = Builder.CreateAdd(PtrInd, Idx);
Value *SclrGep = II.transform(Builder, GlobalIdx, PSE.getSE(), DL);		Value *SclrGep = II.transform(Builder, GlobalIdx, PSE.getSE(), DL);
SclrGep->setName("next.gep");		SclrGep->setName("next.gep");
VectorLoopValueMap.setScalarValue(P, Part, Lane, SclrGep);		VectorLoopValueMap.setScalarValue(P, {Part, Lane}, SclrGep);
}		}
}		}
return;		return;
}		}
}		}
}		}
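
Note on the pointer-induction case above: each scalar GEP is offset from the normalized induction by `Lane + Part * VF`, and only lane 0 is generated when the value is uniform after vectorization. The following is a small sketch of the resulting offsets, assuming a hypothetical helper name `scalarGepOffsets`. It enumerates the indices only and does not create GEPs.

```cpp
#include <cassert>
#include <vector>

// Enumerate the per-(Part, Lane) offsets added to the normalized induction.
// Uniform values need only lane 0 of each unroll part.
std::vector<unsigned> scalarGepOffsets(unsigned UF, unsigned VF, bool Uniform) {
  assert(UF > 0 && VF > 0 && "Expected at least one part and one lane");
  unsigned Lanes = Uniform ? 1 : VF;
  std::vector<unsigned> Offsets;
  for (unsigned Part = 0; Part < UF; ++Part)
    for (unsigned Lane = 0; Lane < Lanes; ++Lane)
      Offsets.push_back(Lane + Part * VF);
  return Offsets;
}
```

For `UF = 2` and `VF = 4` this gives offsets 0 through 7 in the non-uniform case, and 0 and 4 in the uniform case.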

/// A helper function for checking whether an integer division-related		/// A helper function for checking whether an integer division-related
/// instruction may divide by zero (in which case it must be predicated if		/// instruction may divide by zero (in which case it must be predicated if
/// executed conditionally in the scalar code).		/// executed conditionally in the scalar code).
/// TODO: It may be worthwhile to generalize and check isKnownNonZero().		/// TODO: It may be worthwhile to generalize and check isKnownNonZero().
/// Non-zero divisors that are non compile-time constants will not be		/// Non-zero divisors that are non compile-time constants will not be
/// converted into multiplication, so we will still end up scalarizing		/// converted into multiplication, so we will still end up scalarizing
/// the division, but can do so w/o predication.		/// the division, but can do so w/o predication.
static bool mayDivideByZero(Instruction &I) {		static bool mayDivideByZero(Instruction &I) {
assert((I.getOpcode() == Instruction::UDiv ||		assert((I.getOpcode() == Instruction::UDiv ||
I.getOpcode() == Instruction::SDiv ||		I.getOpcode() == Instruction::SDiv ||
I.getOpcode() == Instruction::URem ||		I.getOpcode() == Instruction::URem ||
I.getOpcode() == Instruction::SRem) &&		I.getOpcode() == Instruction::SRem) &&
"Unexpected instruction");		"Unexpected instruction");
Value *Divisor = I.getOperand(1);		Value *Divisor = I.getOperand(1);
auto *CInt = dyn_cast<ConstantInt>(Divisor);		auto *CInt = dyn_cast<ConstantInt>(Divisor);
return !CInt || CInt->isZero();		return !CInt || CInt->isZero();
}		}
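
Note on `mayDivideByZero` above: the helper is conservative. It returns true unless the divisor is a compile-time constant known to be non-zero. The following is a tiny model of those three cases, assuming a hypothetical signature in which a null pointer plays the role of a failed `dyn_cast<ConstantInt>`. The name `mayDivideByZeroModel` is illustrative only.

```cpp
#include <cassert>
#include <cstdint>

// ConstDivisor is null when the divisor is not a compile-time constant,
// mirroring dyn_cast<ConstantInt> returning null in the real helper.
bool mayDivideByZeroModel(const std::int64_t *ConstDivisor) {
  return !ConstDivisor || *ConstDivisor == 0;
}
```

A non-constant divisor and a constant zero both require predication when the block executes conditionally. Only a non-zero constant is treated as safe.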

void InnerLoopVectorizer::vectorizeInstruction(Instruction &I) {		void InnerLoopVectorizer::widenInstruction(Instruction &I) {
		mkuper: Perhaps this should now be named widenInstruction?
		Ayal (author): Yup.
// Scalarize instructions that should remain scalar after vectorization.
if (VF > 1 &&
!(isa<BranchInst>(&I) || isa<PHINode>(&I) || isa<DbgInfoIntrinsic>(&I)) &&
shouldScalarizeInstruction(&I)) {
scalarizeInstruction(&I, Legal->isScalarWithPredication(&I));
return;
}

switch (I.getOpcode()) {		switch (I.getOpcode()) {
case Instruction::Br:		case Instruction::Br:
// Nothing to do for PHIs and BR, since we already took care of the		case Instruction::PHI:
// loop control flow instructions.		llvm_unreachable("This instruction is handled by a different recipe.");
break;
case Instruction::PHI: {
// Vectorize PHINodes.
widenPHIInstruction(&I, UF, VF);
break;
} // End of PHI.
case Instruction::GetElementPtr: {		case Instruction::GetElementPtr: {
// Construct a vector GEP by widening the operands of the scalar GEP as		// Construct a vector GEP by widening the operands of the scalar GEP as
// necessary. We mark the vector GEP 'inbounds' if appropriate. A GEP		// necessary. We mark the vector GEP 'inbounds' if appropriate. A GEP
// results in a vector of pointers when at least one operand of the GEP		// results in a vector of pointers when at least one operand of the GEP
// is vector-typed. Thus, to keep the representation compact, we only use		// is vector-typed. Thus, to keep the representation compact, we only use
// vector-typed operands for loop-varying values.		// vector-typed operands for loop-varying values.
auto *GEP = cast<GetElementPtrInst>(&I);		auto *GEP = cast<GetElementPtrInst>(&I);

[... 56 lines not shown ...]	case Instruction::GetElementPtr: {
}		}

break;		break;
}		}
case Instruction::UDiv:		case Instruction::UDiv:
case Instruction::SDiv:		case Instruction::SDiv:
case Instruction::SRem:		case Instruction::SRem:
case Instruction::URem:		case Instruction::URem:
// Scalarize with predication if this instruction may divide by zero and
// block execution is conditional, otherwise fallthrough.
if (Legal->isScalarWithPredication(&I)) {
scalarizeInstruction(&I, true);
break;
}
LLVM_FALLTHROUGH;
case Instruction::Add:		case Instruction::Add:
case Instruction::FAdd:		case Instruction::FAdd:
case Instruction::Sub:		case Instruction::Sub:
case Instruction::FSub:		case Instruction::FSub:
case Instruction::Mul:		case Instruction::Mul:
case Instruction::FMul:		case Instruction::FMul:
case Instruction::FDiv:		case Instruction::FDiv:
case Instruction::FRem:		case Instruction::FRem:
[... 31 lines not shown ...]	bool InvariantCond =
SE->isLoopInvariant(PSE.getSCEV(I.getOperand(0)), OrigLoop);		SE->isLoopInvariant(PSE.getSCEV(I.getOperand(0)), OrigLoop);
setDebugLocFromInst(Builder, &I);		setDebugLocFromInst(Builder, &I);

// The condition can be loop invariant but still defined inside the		// The condition can be loop invariant but still defined inside the
// loop. This means that we can't just use the original 'cond' value.		// loop. This means that we can't just use the original 'cond' value.
// We have to take the 'vectorized' value and pick the first lane.		// We have to take the 'vectorized' value and pick the first lane.
// Instcombine will make this a no-op.		// Instcombine will make this a no-op.

auto *ScalarCond = getOrCreateScalarValue(I.getOperand(0), 0, 0);		auto *ScalarCond = getOrCreateScalarValue(I.getOperand(0), {0, 0});

for (unsigned Part = 0; Part < UF; ++Part) {		for (unsigned Part = 0; Part < UF; ++Part) {
Value *Cond = getOrCreateVectorValue(I.getOperand(0), Part);		Value *Cond = getOrCreateVectorValue(I.getOperand(0), Part);
Value *Op0 = getOrCreateVectorValue(I.getOperand(1), Part);		Value *Op0 = getOrCreateVectorValue(I.getOperand(1), Part);
Value *Op1 = getOrCreateVectorValue(I.getOperand(2), Part);		Value *Op1 = getOrCreateVectorValue(I.getOperand(2), Part);
Value *Sel =		Value *Sel =
Builder.CreateSelect(InvariantCond ? ScalarCond : Cond, Op0, Op1);		Builder.CreateSelect(InvariantCond ? ScalarCond : Cond, Op0, Op1);
VectorLoopValueMap.setVectorValue(&I, Part, Sel);		VectorLoopValueMap.setVectorValue(&I, Part, Sel);
[... 42 lines not shown ...]	void InnerLoopVectorizer::widenInstruction(Instruction &I) {
case Instruction::SIToFP:		case Instruction::SIToFP:
case Instruction::UIToFP:		case Instruction::UIToFP:
case Instruction::Trunc:		case Instruction::Trunc:
case Instruction::FPTrunc:		case Instruction::FPTrunc:
case Instruction::BitCast: {		case Instruction::BitCast: {
auto *CI = dyn_cast<CastInst>(&I);		auto *CI = dyn_cast<CastInst>(&I);
setDebugLocFromInst(Builder, CI);		setDebugLocFromInst(Builder, CI);

// Optimize the special case where the source is a constant integer
// induction variable. Notice that we can only optimize the 'trunc' case
// because (a) FP conversions lose precision, (b) sext/zext may wrap, and
// (c) other casts depend on pointer size.
if (Cost->isOptimizableIVTruncate(CI, VF)) {
widenIntOrFpInduction(cast<PHINode>(CI->getOperand(0)),
cast<TruncInst>(CI));
break;
}

/// Vectorize casts.		/// Vectorize casts.
Type *DestTy =		Type *DestTy =
(VF == 1) ? CI->getType() : VectorType::get(CI->getType(), VF);		(VF == 1) ? CI->getType() : VectorType::get(CI->getType(), VF);

for (unsigned Part = 0; Part < UF; ++Part) {		for (unsigned Part = 0; Part < UF; ++Part) {
Value *A = getOrCreateVectorValue(CI->getOperand(0), Part);		Value *A = getOrCreateVectorValue(CI->getOperand(0), Part);
Value *Cast = Builder.CreateCast(CI->getOpcode(), A, DestTy);		Value *Cast = Builder.CreateCast(CI->getOpcode(), A, DestTy);
VectorLoopValueMap.setVectorValue(&I, Part, Cast);		VectorLoopValueMap.setVectorValue(&I, Part, Cast);
[... 14 lines not shown ...]	case Instruction::Call: {
StringRef FnName = CI->getCalledFunction()->getName();		StringRef FnName = CI->getCalledFunction()->getName();
Function *F = CI->getCalledFunction();		Function *F = CI->getCalledFunction();
Type *RetTy = ToVectorTy(CI->getType(), VF);		Type *RetTy = ToVectorTy(CI->getType(), VF);
SmallVector<Type *, 4> Tys;		SmallVector<Type *, 4> Tys;
for (Value *ArgOperand : CI->arg_operands())		for (Value *ArgOperand : CI->arg_operands())
Tys.push_back(ToVectorTy(ArgOperand->getType(), VF));		Tys.push_back(ToVectorTy(ArgOperand->getType(), VF));

Intrinsic::ID ID = getVectorIntrinsicIDForCall(CI, TLI);		Intrinsic::ID ID = getVectorIntrinsicIDForCall(CI, TLI);
if (ID && (ID == Intrinsic::assume || ID == Intrinsic::lifetime_end ||
ID == Intrinsic::lifetime_start)) {
scalarizeInstruction(&I);
break;
}
// The flag shows whether we use Intrinsic or a usual Call for vectorized		// The flag shows whether we use Intrinsic or a usual Call for vectorized
// version of the instruction.		// version of the instruction.
// Is it beneficial to perform intrinsic call compared to lib call?		// Is it beneficial to perform intrinsic call compared to lib call?
bool NeedToScalarize;		bool NeedToScalarize;
unsigned CallCost = getVectorCallCost(CI, VF, *TTI, TLI, NeedToScalarize);		unsigned CallCost = getVectorCallCost(CI, VF, *TTI, TLI, NeedToScalarize);
bool UseVectorIntrinsic =		bool UseVectorIntrinsic =
ID && getVectorIntrinsicCost(CI, VF, *TTI, TLI) <= CallCost;		ID && getVectorIntrinsicCost(CI, VF, *TTI, TLI) <= CallCost;
if (!UseVectorIntrinsic && NeedToScalarize) {		assert((UseVectorIntrinsic || !NeedToScalarize) &&
scalarizeInstruction(&I);		"Instruction should be scalarized elsewhere.");
break;
}

for (unsigned Part = 0; Part < UF; ++Part) {		for (unsigned Part = 0; Part < UF; ++Part) {
SmallVector<Value *, 4> Args;		SmallVector<Value *, 4> Args;
for (unsigned i = 0, ie = CI->getNumArgOperands(); i != ie; ++i) {		for (unsigned i = 0, ie = CI->getNumArgOperands(); i != ie; ++i) {
Value *Arg = CI->getArgOperand(i);		Value *Arg = CI->getArgOperand(i);
// Some intrinsics have a scalar argument - don't replace it with a		// Some intrinsics have a scalar argument - don't replace it with a
// vector.		// vector.
if (!UseVectorIntrinsic || !hasVectorInstrinsicScalarOpd(ID, i))		if (!UseVectorIntrinsic || !hasVectorInstrinsicScalarOpd(ID, i))
... (33 lines not shown) ...	for (unsigned Part = 0; Part < UF; ++Part) {
VectorLoopValueMap.setVectorValue(&I, Part, V);		VectorLoopValueMap.setVectorValue(&I, Part, V);
addMetadata(V, &I);		addMetadata(V, &I);
}		}

break;		break;
}		}

default:		default:
// All other instructions are unsupported. Scalarize them.		// All other instructions are scalarized.
scalarizeInstruction(&I);		DEBUG(dbgs() << "LV: Found an unhandled instruction: " << I);
break;		llvm_unreachable("Unhandled instruction!");
} // end of switch.		} // end of switch.
}		}

void InnerLoopVectorizer::updateAnalysis() {		void InnerLoopVectorizer::updateAnalysis() {
// Forget the original basic block.		// Forget the original basic block.
PSE.getSE()->forgetLoop(OrigLoop);		PSE.getSE()->forgetLoop(OrigLoop);

// Update the dominator tree information.		// Update the dominator tree information.
assert(DT->properlyDominates(LoopBypassBlocks.front(), LoopExitBlock) &&		assert(DT->properlyDominates(LoopBypassBlocks.front(), LoopExitBlock) &&
"Entry does not dominate exit.");		"Entry does not dominate exit.");

DT->addNewBlock(LI->getLoopFor(LoopVectorBody)->getHeader(),
LoopVectorPreHeader);
DT->addNewBlock(LoopMiddleBlock,		DT->addNewBlock(LoopMiddleBlock,
LI->getLoopFor(LoopVectorBody)->getLoopLatch());		LI->getLoopFor(LoopVectorBody)->getLoopLatch());
DT->addNewBlock(LoopScalarPreHeader, LoopBypassBlocks[0]);		DT->addNewBlock(LoopScalarPreHeader, LoopBypassBlocks[0]);
DT->changeImmediateDominator(LoopScalarBody, LoopScalarPreHeader);		DT->changeImmediateDominator(LoopScalarBody, LoopScalarPreHeader);
DT->changeImmediateDominator(LoopExitBlock, LoopBypassBlocks[0]);		DT->changeImmediateDominator(LoopExitBlock, LoopBypassBlocks[0]);

DEBUG(DT->verifyDomTree());		DEBUG(DT->verifyDomTree());
}		}

/// \brief Check whether it is safe to if-convert this phi node.		/// \brief Check whether it is safe to if-convert this phi node.
///		///
/// Phi nodes with constant expressions that can trap are not safe to if		/// Phi nodes with constant expressions that can trap are not safe to if
/// convert.		/// convert.
static bool canIfConvertPHINodes(BasicBlock *BB) {		static bool canIfConvertPHINodes(BasicBlock *BB) {
... (1,923 lines not shown) ...	int LoopVectorizationCostModel::computePredInstDiscount(

return Discount;		return Discount;
}		}

LoopVectorizationCostModel::VectorizationCostTy		LoopVectorizationCostModel::VectorizationCostTy
LoopVectorizationCostModel::expectedCost(unsigned VF) {		LoopVectorizationCostModel::expectedCost(unsigned VF) {
VectorizationCostTy Cost;		VectorizationCostTy Cost;

// Collect Uniform and Scalar instructions after vectorization with VF.
collectUniformsAndScalars(VF);

// Collect the instructions (and their associated costs) that will be more
// profitable to scalarize.
collectInstsToScalarize(VF);

// For each block.		// For each block.
for (BasicBlock *BB : TheLoop->blocks()) {		for (BasicBlock *BB : TheLoop->blocks()) {
VectorizationCostTy BlockCost;		VectorizationCostTy BlockCost;

// For each instruction in the old loop.		// For each instruction in the old loop.
for (Instruction &I : *BB) {		for (Instruction &I : *BB) {
// Skip dbg intrinsics.		// Skip dbg intrinsics.
if (isa<DbgInfoIntrinsic>(I))		if (isa<DbgInfoIntrinsic>(I))
... (641 lines not shown) ...	if (!MaybeMaxVF.hasValue()) // Cases considered too costly to vectorize.
return NoVectorization;		return NoVectorization;

if (UserVF) {		if (UserVF) {
DEBUG(dbgs() << "LV: Using user VF " << UserVF << ".\n");		DEBUG(dbgs() << "LV: Using user VF " << UserVF << ".\n");
assert(isPowerOf2_32(UserVF) && "VF needs to be a power of two");		assert(isPowerOf2_32(UserVF) && "VF needs to be a power of two");
// Collect the instructions (and their associated costs) that will be more		// Collect the instructions (and their associated costs) that will be more
// profitable to scalarize.		// profitable to scalarize.
CM.selectUserVectorizationFactor(UserVF);		CM.selectUserVectorizationFactor(UserVF);
		buildVPlans(UserVF, UserVF);
		DEBUG(printPlans(dbgs()));
return {UserVF, 0};		return {UserVF, 0};
}		}

unsigned MaxVF = MaybeMaxVF.getValue();		unsigned MaxVF = MaybeMaxVF.getValue();
assert(MaxVF != 0 && "MaxVF is zero.");		assert(MaxVF != 0 && "MaxVF is zero.");

		for (unsigned VF = 1; VF <= MaxVF; VF *= 2) {
		// Collect Uniform and Scalar instructions after vectorization with VF.
		CM.collectUniformsAndScalars(VF);

		// Collect the instructions (and their associated costs) that will be more
		// profitable to scalarize.
		if (VF > 1)
		CM.collectInstsToScalarize(VF);
		}

		buildVPlans(1, MaxVF);
		DEBUG(printPlans(dbgs()));
if (MaxVF == 1)		if (MaxVF == 1)
return NoVectorization;		return NoVectorization;

// Select the optimal vectorization factor.		// Select the optimal vectorization factor.
return CM.selectVectorizationFactor(MaxVF);		return CM.selectVectorizationFactor(MaxVF);
}		}

void LoopVectorizationPlanner::executePlan(InnerLoopVectorizer &ILV) {		void LoopVectorizationPlanner::setBestPlan(unsigned VF, unsigned UF) {
		DEBUG(dbgs() << "Setting best plan to VF=" << VF << ", UF=" << UF << '\n');
		BestVF = VF;
		BestUF = UF;

		for (auto *VPlanIter = VPlans.begin(); VPlanIter != VPlans.end();) {
		mkuper: I suppose this (and the rest of this code) is constructed this way because we intend to support more than one VPlan per VF/UF pair soon?
		Ayal (author): Currently each VPlan covers a range of VF's and arbitrary UF's, where every feasible VF is covered by a single VPlan. In the end a single VPlan should remain, which is the one that gets executed. As soon as possible all other VPlans are discharged, as done here.
		VPlan *Plan = *VPlanIter;
		if (Plan->hasVF(VF))
		++VPlanIter;
		else {
		VPlanIter = VPlans.erase(VPlanIter);
		delete Plan;
		}
		}
		assert(VPlans.size() == 1 && "Best VF has not a single VPlan.");
		}
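
Note: the pruning loop in setBestPlan() can be sketched outside of LLVM as follows. This is only a minimal illustration under stated assumptions: `ToyPlan` and `keepPlansFor` are hypothetical names, and plans are stored by value rather than as owned raw pointers, so no `delete` is needed. What it is meant to show is the erase-while-iterating idiom, where the iterator advances only when the current plan is kept.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for VPlan: it only records which VFs it covers.
struct ToyPlan {
  std::vector<unsigned> VFs;

  bool hasVF(unsigned VF) const {
    for (unsigned V : VFs)
      if (V == VF)
        return true;
    return false;
  }
};

// Discard every plan that does not cover VF. erase() returns the iterator
// following the removed element, so the loop must not increment in that case.
void keepPlansFor(std::vector<ToyPlan> &Plans, unsigned VF) {
  for (auto It = Plans.begin(); It != Plans.end();) {
    if (It->hasVF(VF))
      ++It;
    else
      It = Plans.erase(It);
  }
}
```

Given one plan covering VFs {1, 2} and another covering {4, 8, 16}, calling `keepPlansFor(Plans, 8)` should leave only the second plan, which matches the single-plan assertion at the end of setBestPlan().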

		void LoopVectorizationPlanner::executePlan(InnerLoopVectorizer &ILV,
		DominatorTree *DT) {
// Perform the actual loop transformation.		// Perform the actual loop transformation.

// 1. Create a new empty loop. Unlink the old loop and connect the new one.		// 1. Create a new empty loop. Unlink the old loop and connect the new one.
ILV.createVectorizedLoopSkeleton();		VPTransformState State{
		BestVF, BestUF, LI, DT, ILV.Builder, ILV.VectorLoopValueMap, &ILV};
		State.CFG.PrevBB = ILV.createVectorizedLoopSkeleton();

//===------------------------------------------------===//		//===------------------------------------------------===//
//		//
// Notice: any optimization or new instruction that go		// Notice: any optimization or new instruction that go
// into the code below should also be implemented in		// into the code below should also be implemented in
// the cost-model.		// the cost-model.
//		//
//===------------------------------------------------===//		//===------------------------------------------------===//

// 2. Copy and widen instructions from the old loop into the new loop.		// 2. Copy and widen instructions from the old loop into the new loop.
		assert(VPlans.size() == 1 && "Not a single VPlan to execute.");
// Move instructions to handle first-order recurrences.		VPlan *Plan = *VPlans.begin();
DenseMap<Instruction *, Instruction *> &SinkAfter = Legal->getSinkAfter();		Plan->execute(&State);
for (auto &Entry : SinkAfter) {
Entry.first->removeFromParent();
Entry.first->insertAfter(Entry.second);
DEBUG(dbgs() << "Sinking" << *Entry.first << " after" << *Entry.second
<< " to vectorize a 1st order recurrence.\n");
}

// Collect instructions from the original loop that will become trivially dead
// in the vectorized loop. We don't need to vectorize these instructions. For
// example, original induction update instructions can become dead because we
// separately emit induction "steps" when generating code for the new loop.
// Similarly, we create a new latch condition when setting up the structure
// of the new loop, so the old one can become dead.
SmallPtrSet<Instruction *, 4> DeadInstructions;
collectTriviallyDeadInstructions(DeadInstructions);

// Scan the loop in a topological order to ensure that defs are vectorized
// before users.
LoopBlocksDFS DFS(OrigLoop);
DFS.perform(LI);

// Vectorize all instructions in the original loop that will not become
// trivially dead when vectorized.
for (BasicBlock *BB : make_range(DFS.beginRPO(), DFS.endRPO()))
for (Instruction &I : *BB)
if (!DeadInstructions.count(&I))
ILV.vectorizeInstruction(I);

// 3. Fix the vectorized code: take care of header phi's, live-outs,		// 3. Fix the vectorized code: take care of header phi's, live-outs,
// predication, updating analyses.		// predication, updating analyses.
ILV.fixVectorizedLoop();		ILV.fixVectorizedLoop();
}		}

void LoopVectorizationPlanner::collectTriviallyDeadInstructions(		void LoopVectorizationPlanner::collectTriviallyDeadInstructions(
SmallPtrSetImpl<Instruction *> &DeadInstructions) {		SmallPtrSetImpl<Instruction *> &DeadInstructions) {
... (13 lines not shown) ...	for (auto &Induction : *Legal->getInductionVars()) {
PHINode *Ind = Induction.first;		PHINode *Ind = Induction.first;
auto *IndUpdate = cast<Instruction>(Ind->getIncomingValueForBlock(Latch));		auto *IndUpdate = cast<Instruction>(Ind->getIncomingValueForBlock(Latch));
if (all_of(IndUpdate->users(), [&](User *U) -> bool {		if (all_of(IndUpdate->users(), [&](User *U) -> bool {
return U == Ind || DeadInstructions.count(cast<Instruction>(U));		return U == Ind || DeadInstructions.count(cast<Instruction>(U));
}))		}))
DeadInstructions.insert(IndUpdate);		DeadInstructions.insert(IndUpdate);
}		}
}		}

void InnerLoopUnroller::vectorizeMemoryInstruction(Instruction *Instr) {
auto *SI = dyn_cast<StoreInst>(Instr);
bool IfPredicateInstr = (SI && Legal->blockNeedsPredication(SI->getParent()));

return scalarizeInstruction(Instr, IfPredicateInstr);
}

Value *InnerLoopUnroller::reverseVector(Value *Vec) { return Vec; }		Value *InnerLoopUnroller::reverseVector(Value *Vec) { return Vec; }

Value *InnerLoopUnroller::getBroadcastInstrs(Value *V) { return V; }		Value *InnerLoopUnroller::getBroadcastInstrs(Value *V) { return V; }

Value *InnerLoopUnroller::getStepVector(Value *Val, int StartIdx, Value *Step,		Value *InnerLoopUnroller::getStepVector(Value *Val, int StartIdx, Value *Step,
Instruction::BinaryOps BinOp) {		Instruction::BinaryOps BinOp) {
// When unrolling and the VF is 1, we only need to add a simple scalar.		// When unrolling and the VF is 1, we only need to add a simple scalar.
Type *Ty = Val->getType();		Type *Ty = Val->getType();
... (39 lines not shown) ...	if (!IsUnrollMetadata) {
MDs.push_back(DisableNode);		MDs.push_back(DisableNode);
MDNode *NewLoopID = MDNode::get(Context, MDs);		MDNode *NewLoopID = MDNode::get(Context, MDs);
// Set operand 0 to refer to the loop id itself.		// Set operand 0 to refer to the loop id itself.
NewLoopID->replaceOperandWith(0, NewLoopID);		NewLoopID->replaceOperandWith(0, NewLoopID);
L->setLoopID(NewLoopID);		L->setLoopID(NewLoopID);
}		}
}		}

		namespace {
		mkuper: Maybe break all of the recipe stuff out into a separate file? Two files, even? (header/implementation)
		Ayal (author): Yes, this large LoopVectorizer.cpp file should be broken into multiple smaller files. ILV in its current form hinders doing so with these recipes. Follow-up patches should help facilitate this desired change.
		mkuper: It seems like putting all the recipes in a separate file would be easy to start with (instead of going into an anonymous namespace here.) If it isn't, I'm ok with doing this in follow-ups.
		/// VPWidenRecipe is a recipe for producing a copy of vector type for each
		/// Instruction in its ingredients independently, in order. This recipe covers
		/// most of the traditional vectorization cases where each ingredient transforms
		/// into a vectorized version of itself.
		class VPWidenRecipe : public VPRecipeBase {
		private:
		/// Hold the ingredients by pointing to their original BasicBlock location.
		BasicBlock::iterator Begin;
		BasicBlock::iterator End;

		public:
		VPWidenRecipe(Instruction *I) : VPRecipeBase(VPWidenSC) {
		End = I->getIterator();
		Begin = End++;
		}

		~VPWidenRecipe() {}

		/// Method to support type inquiry through isa, cast, and dyn_cast.
		static inline bool classof(const VPRecipeBase *V) {
		return V->getVPRecipeID() == VPRecipeBase::VPWidenSC;
		}

		/// Produce widened copies of all Ingredients.
		void execute(VPTransformState &State) override {
		for (auto &Instr : make_range(Begin, End))
		State.ILV->widenInstruction(Instr);
		}

		/// Augment the recipe to include Instr, if it lies at its End.
		bool appendInstruction(Instruction *Instr) {
		if (End != Instr->getIterator())
		return false;
		End++;
		return true;
		}

		/// Print the recipe.
		void print(raw_ostream &O, const Twine &Indent) const override {
		O << " +\n" << Indent << "\"WIDEN\\l\"";
		for (auto &Instr : make_range(Begin, End))
		O << " +\n" << Indent << "\" " << VPlanIngredient(&Instr) << "\\l\"";
		}
		};

		/// A recipe for handling phi nodes of integer and floating-point inductions,
		/// producing their vector and scalar values.
		class VPWidenIntOrFpInductionRecipe : public VPRecipeBase {
		private:
		PHINode *IV;
		TruncInst *Trunc;

		public:
		VPWidenIntOrFpInductionRecipe(PHINode *IV, TruncInst *Trunc = nullptr)
		: VPRecipeBase(VPWidenIntOrFpInductionSC), IV(IV), Trunc(Trunc) {}
		rengolin: Shouldn't you have some asserts here to make sure the PHI node is what you need it to be? Same for other VPlans.
		Ayal (author): The asserts are there when execute() is called, e.g., when it calls ILV's widenIntOrFpInduction().

		~VPWidenIntOrFpInductionRecipe() {}

		/// Method to support type inquiry through isa, cast, and dyn_cast.
		static inline bool classof(const VPRecipeBase *V) {
		return V->getVPRecipeID() == VPRecipeBase::VPWidenIntOrFpInductionSC;
		}

		/// Generate the vectorized and scalarized versions of the phi node as
		/// needed by their users.
		void execute(VPTransformState &State) override {
		assert(!State.Instance && "Int or FP induction being replicated.");
		State.ILV->widenIntOrFpInduction(IV, Trunc);
		}

		/// Print the recipe.
		void print(raw_ostream &O, const Twine &Indent) const override {
		O << " +\n" << Indent << "\"WIDEN-INDUCTION";
		if (Trunc) {
		O << "\\l\"";
		O << " +\n" << Indent << "\" " << VPlanIngredient(IV) << "\\l\"";
		O << " +\n" << Indent << "\" " << VPlanIngredient(Trunc) << "\\l\"";
		} else
		O << " " << VPlanIngredient(IV) << "\\l\"";
		}
		};

		/// A recipe for handling all phi nodes except for integer and FP inductions.
		class VPWidenPHIRecipe : public VPRecipeBase {
		private:
		PHINode *Phi;

		public:
		VPWidenPHIRecipe(PHINode *Phi) : VPRecipeBase(VPWidenPHISC), Phi(Phi) {}

		~VPWidenPHIRecipe() {}

		/// Method to support type inquiry through isa, cast, and dyn_cast.
		static inline bool classof(const VPRecipeBase *V) {
		return V->getVPRecipeID() == VPRecipeBase::VPWidenPHISC;
		}

		/// Generate the phi/select nodes.
		void execute(VPTransformState &State) override {
		State.ILV->widenPHIInstruction(Phi, State.UF, State.VF);
		}

		/// Print the recipe.
		void print(raw_ostream &O, const Twine &Indent) const override {
		O << " +\n" << Indent << "\"WIDEN-PHI " << VPlanIngredient(Phi) << "\\l\"";
		}
		};

		/// VPInterleaveRecipe is a recipe for transforming an interleave group of load
		/// or stores into one wide load/store and shuffles.
		class VPInterleaveRecipe : public VPRecipeBase {
		private:
		const InterleaveGroup *IG;

		public:
		VPInterleaveRecipe(const InterleaveGroup *IG)
		: VPRecipeBase(VPInterleaveSC), IG(IG) {}

		~VPInterleaveRecipe() {}

		/// Method to support type inquiry through isa, cast, and dyn_cast.
		static inline bool classof(const VPRecipeBase *V) {
		return V->getVPRecipeID() == VPRecipeBase::VPInterleaveSC;
		}

		/// Generate the wide load or store, and shuffles.
		void execute(VPTransformState &State) override {
		assert(!State.Instance && "Interleave group being replicated.");
		State.ILV->vectorizeInterleaveGroup(IG->getInsertPos());
		}

		/// Print the recipe.
		void print(raw_ostream &O, const Twine &Indent) const override;

		const InterleaveGroup *getInterleaveGroup() { return IG; }
		};

		/// VPReplicateRecipe replicates a given instruction producing multiple scalar
		/// copies of the original scalar type, one per lane, instead of producing a
		/// single copy of widened type for all lanes. If the instruction is known to be
		/// uniform only one copy, per lane zero, will be generated.
		class VPReplicateRecipe : public VPRecipeBase {
		private:
		/// The instruction being replicated.
		Instruction *Ingredient;

		/// Indicator if only a single replica per lane is needed.
		bool IsUniform;

		/// Indicator if the replicas are also predicated.
		bool IsPredicated;

		/// Indicator if the scalar values should also be packed into a vector.
		bool AlsoPack;

		public:
		VPReplicateRecipe(Instruction *I, bool IsUniform, bool IsPredicated = false)
		: VPRecipeBase(VPReplicateSC), Ingredient(I), IsUniform(IsUniform),
		IsPredicated(IsPredicated) {
		// Retain the previous behavior of predicateInstructions(), where an
		// insert-element of a predicated instruction got hoisted into the
		// predicated basic block iff it was its only user. This is achieved by
		// having predicated instructions also pack their values into a vector by
		// default unless they have a replicated user which uses their scalar value.
		AlsoPack = IsPredicated && !I->use_empty();
		}

		~VPReplicateRecipe() {}

		/// Method to support type inquiry through isa, cast, and dyn_cast.
		static inline bool classof(const VPRecipeBase *V) {
		return V->getVPRecipeID() == VPRecipeBase::VPReplicateSC;
		}

		/// Generate replicas of the desired Ingredient. Replicas will be generated
		/// for all parts and lanes unless a specific part and lane are specified in
		/// the \p State.
		void execute(VPTransformState &State) override;

		void setAlsoPack(bool Pack) { AlsoPack = Pack; }

		/// Print the recipe.
		void print(raw_ostream &O, const Twine &Indent) const override {
		O << " +\n"
		<< Indent << "\"" << (IsUniform ? "CLONE " : "REPLICATE ")
		<< VPlanIngredient(Ingredient);
		if (AlsoPack)
		O << " (S->V)";
		O << "\\l\"";
		}
		};

		/// A recipe for generating conditional branches on the bits of a mask.
		class VPBranchOnMaskRecipe : public VPRecipeBase {
		private:
		/// The input IR basic block used to obtain the mask providing the condition
		/// bits for the branch.
		BasicBlock *MaskedBasicBlock;

		public:
		VPBranchOnMaskRecipe(BasicBlock *BB)
		: VPRecipeBase(VPBranchOnMaskSC), MaskedBasicBlock(BB) {}

		/// Method to support type inquiry through isa, cast, and dyn_cast.
		static inline bool classof(const VPRecipeBase *V) {
		return V->getVPRecipeID() == VPRecipeBase::VPBranchOnMaskSC;
		}

		/// Generate the extraction of the appropriate bit from the block mask and the
		/// conditional branch.
		void execute(VPTransformState &State) override;

		/// Print the recipe.
		void print(raw_ostream &O, const Twine &Indent) const override {
		O << " +\n"
		<< Indent << "\"BRANCH-ON-MASK-OF " << MaskedBasicBlock->getName()
		<< "\\l\"";
		}
		};

		/// VPPredInstPHIRecipe is a recipe for generating the phi nodes needed when
		/// control converges back from a Branch-on-Mask. The phi nodes are needed in
		/// order to merge values that are set under such a branch and feed their uses.
		/// The phi nodes can be scalar or vector depending on the users of the value.
		/// This recipe works in concert with VPBranchOnMaskRecipe.
		class VPPredInstPHIRecipe : public VPRecipeBase {
		private:
		Instruction *PredInst;

		public:
		/// Construct a VPPredInstPHIRecipe given \p PredInst whose value needs phi
		/// nodes after merging back from a Branch-on-Mask.
		VPPredInstPHIRecipe(Instruction *PredInst)
		: VPRecipeBase(VPPredInstPHISC), PredInst(PredInst) {}

		~VPPredInstPHIRecipe() {}

		/// Method to support type inquiry through isa, cast, and dyn_cast.
		static inline bool classof(const VPRecipeBase *V) {
		return V->getVPRecipeID() == VPRecipeBase::VPPredInstPHISC;
		}

		/// Generates phi nodes for live-outs as needed to retain SSA form.
		void execute(VPTransformState &State) override;

		/// Print the recipe.
		void print(raw_ostream &O, const Twine &Indent) const override {
		O << " +\n"
		<< Indent << "\"PHI-PREDICATED-INSTRUCTION " << VPlanIngredient(PredInst)
		<< "\\l\"";
		}
		};
		} // end anonymous namespace

		bool LoopVectorizationPlanner::getDecisionAndClampRange(
		const std::function<bool(unsigned)> &Predicate, VFRange &Range) {
		assert(Range.End > Range.Start && "Trying to test an empty VF range.");
		bool PredicateAtRangeStart = Predicate(Range.Start);

		for (unsigned TmpVF = Range.Start * 2; TmpVF < Range.End; TmpVF *= 2)
		if (Predicate(TmpVF) != PredicateAtRangeStart) {
		Range.End = TmpVF;
		break;
		}

		return PredicateAtRangeStart;
		}
		rengolin: I know you can re-use MinVF, but it would be more readable if you created a new variable VF to use inside the loop.
		Ayal (author): OK, sure.

		/// Build VPlans for the full range of feasible VF's = {\p MinVF, 2 * \p MinVF,
		/// 4 * \p MinVF, ..., \p MaxVF} by repeatedly building a VPlan for a sub-range
		/// of VF's starting at a given VF and extending it as much as possible. Each
		/// vectorization decision can potentially shorten this sub-range during
		rengolin: This is confusing. Is this RSO just for the plan name? It'd seem more appropriate to have the name be a property of the plans, not some outside property.
		Ayal (author): Yes, this is to concatenate the VF's into the plan name, which is indeed a property of the plan. Will move from buildVPlans() to the end of buildVPlan().
		/// buildVPlan().
		void LoopVectorizationPlanner::buildVPlans(unsigned MinVF, unsigned MaxVF) {
		for (unsigned VF = MinVF; VF < MaxVF + 1;) {
		VFRange SubRange = {VF, MaxVF + 1};
		VPlan *Plan = buildVPlan(SubRange);
		VPlans.push_back(Plan);
		VF = SubRange.End;
		}
		}
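
Note: how getDecisionAndClampRange() and buildVPlans() cooperate to split the feasible VFs into sub-ranges can be sketched standalone. This is a minimal illustration, not the patch's code: `VFRange` is re-declared locally as an assumed half-open range, and `partitionVFs` is a hypothetical stand-in for buildVPlans() that takes a single decision predicate in place of buildVPlan() and returns the sub-ranges instead of VPlans.

```cpp
#include <cassert>
#include <functional>
#include <vector>

// Assumed shape of VFRange: a half-open range [Start, End) of
// power-of-two vectorization factors.
struct VFRange {
  unsigned Start;
  unsigned End;
};

// Evaluate Predicate at Range.Start, then clamp Range.End to the first VF
// (doubling from Start) at which the predicate gives a different answer.
bool getDecisionAndClampRange(const std::function<bool(unsigned)> &Predicate,
                              VFRange &Range) {
  assert(Range.End > Range.Start && "Trying to test an empty VF range.");
  bool PredicateAtRangeStart = Predicate(Range.Start);

  for (unsigned TmpVF = Range.Start * 2; TmpVF < Range.End; TmpVF *= 2)
    if (Predicate(TmpVF) != PredicateAtRangeStart) {
      Range.End = TmpVF;
      break;
    }

  return PredicateAtRangeStart;
}

// Hypothetical stand-in for buildVPlans(): each iteration starts a new
// sub-range at VF, lets the decision clamp its end, and resumes from there.
std::vector<VFRange>
partitionVFs(unsigned MinVF, unsigned MaxVF,
             const std::function<bool(unsigned)> &Decision) {
  std::vector<VFRange> SubRanges;
  for (unsigned VF = MinVF; VF < MaxVF + 1;) {
    VFRange SubRange = {VF, MaxVF + 1};
    getDecisionAndClampRange(Decision, SubRange); // stands in for buildVPlan()
    SubRanges.push_back(SubRange);
    VF = SubRange.End;
  }
  return SubRanges;
}
```

With MinVF = 1, MaxVF = 16 and a decision that flips at VF = 4, this should yield the sub-ranges [1, 4) and [4, 17), i.e. one plan for VFs {1, 2} and one for {4, 8, 16}. Every feasible VF ends up covered by exactly one sub-range.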

		VPInterleaveRecipe *
		LoopVectorizationPlanner::tryToInterleaveMemory(Instruction *I,
		VFRange &Range) {
		const InterleaveGroup *IG = Legal->getInterleavedAccessGroup(I);
		if (!IG)
		return nullptr;

		// Now check if IG is relevant for VF's in the given range.
		auto isIGMember = [&](Instruction *I) -> std::function<bool(unsigned)> {
		return [=](unsigned VF) -> bool {
		return (VF >= 2 && // Query is illegal for VF == 1
		CM.getWideningDecision(I, VF) ==
		LoopVectorizationCostModel::CM_Interleave);
		};
		};
		if (!getDecisionAndClampRange(isIGMember(I), Range))
		return nullptr;

		// I is a member of an InterleaveGroup for VF's in the (possibly trimmed)
		// range. If it's the primary member of the IG construct a VPInterleaveRecipe.
		// Otherwise, it's an adjunct member of the IG, do not construct any Recipe.
		assert(I == IG->getInsertPos() &&
		"Generating a recipe for an adjunct member of an interleave group");

		return new VPInterleaveRecipe(IG);
		}

		VPWidenIntOrFpInductionRecipe *
		LoopVectorizationPlanner::tryToOptimizeInduction(Instruction *I,
		VFRange &Range) {
		if (PHINode *Phi = dyn_cast<PHINode>(I)) {
		// Check if this is an integer or fp induction. If so, build the recipe that
		// produces its scalar and vector values.
		InductionDescriptor II = Legal->getInductionVars()->lookup(Phi);
		if (II.getKind() == InductionDescriptor::IK_IntInduction ||
		II.getKind() == InductionDescriptor::IK_FpInduction)
		return new VPWidenIntOrFpInductionRecipe(Phi);

		return nullptr;
		}

		// Optimize the special case where the source is a constant integer
		// induction variable. Notice that we can only optimize the 'trunc' case
		// because (a) FP conversions lose precision, (b) sext/zext may wrap, and
		// (c) other casts depend on pointer size.

		// Determine whether \p K is a truncation based on an induction variable that
		// can be optimized.
		auto isOptimizableIVTruncate =
		[&](Instruction *K) -> std::function<bool(unsigned)> {
		return
		[=](unsigned VF) -> bool { return CM.isOptimizableIVTruncate(K, VF); };
		};

		if (isa<TruncInst>(I) &&
		getDecisionAndClampRange(isOptimizableIVTruncate(I), Range))
		return new VPWidenIntOrFpInductionRecipe(cast<PHINode>(I->getOperand(0)),
		cast<TruncInst>(I));
		return nullptr;
		}

		VPWidenRecipe *LoopVectorizationPlanner::tryToWiden(
		Instruction *I, VPWidenRecipe *LastWidenRecipe, VFRange &Range) {

		if (Legal->isScalarWithPredication(I))
		return nullptr;

		static DenseSet<unsigned> VectorizableOpcodes = {
		Instruction::Br, Instruction::PHI, Instruction::GetElementPtr,
		Instruction::UDiv, Instruction::SDiv, Instruction::SRem,
		Instruction::URem, Instruction::Add, Instruction::FAdd,
		Instruction::Sub, Instruction::FSub, Instruction::Mul,
		Instruction::FMul, Instruction::FDiv, Instruction::FRem,
		Instruction::Shl, Instruction::LShr, Instruction::AShr,
		Instruction::And, Instruction::Or, Instruction::Xor,
		Instruction::Select, Instruction::ICmp, Instruction::FCmp,
		Instruction::Store, Instruction::Load, Instruction::ZExt,
		Instruction::SExt, Instruction::FPToUI, Instruction::FPToSI,
		Instruction::FPExt, Instruction::PtrToInt, Instruction::IntToPtr,
		Instruction::SIToFP, Instruction::UIToFP, Instruction::Trunc,
		Instruction::FPTrunc, Instruction::BitCast, Instruction::Call};

		if (!VectorizableOpcodes.count(I->getOpcode()))
		return nullptr;

		if (CallInst *CI = dyn_cast<CallInst>(I)) {
		Intrinsic::ID ID = getVectorIntrinsicIDForCall(CI, TLI);
		if (ID && (ID == Intrinsic::assume || ID == Intrinsic::lifetime_end ||
		ID == Intrinsic::lifetime_start))
		return nullptr;
		}

		auto willWiden = [&](unsigned VF) -> bool {
		if (!isa<PHINode>(I) && (CM.isScalarAfterVectorization(I, VF) ||
		CM.isProfitableToScalarize(I, VF)))
		return false;
		if (CallInst *CI = dyn_cast<CallInst>(I)) {
		Intrinsic::ID ID = getVectorIntrinsicIDForCall(CI, TLI);
		// The following case may be scalarized depending on the VF.
		// The flag shows whether we use Intrinsic or a usual Call for vectorized
		// version of the instruction.
		// Is it beneficial to perform intrinsic call compared to lib call?
		bool NeedToScalarize;
		unsigned CallCost = getVectorCallCost(CI, VF, *TTI, TLI, NeedToScalarize);
		bool UseVectorIntrinsic =
		ID && getVectorIntrinsicCost(CI, VF, *TTI, TLI) <= CallCost;
		return UseVectorIntrinsic || !NeedToScalarize;
		}
		if (isa<LoadInst>(I) \|\| isa<StoreInst>(I)) {
		LoopVectorizationCostModel::InstWidening Decision =
		CM.getWideningDecision(I, VF);
		assert(Decision != LoopVectorizationCostModel::CM_Unknown &&
		"CM decision should be taken at this point.");
		assert(Decision != LoopVectorizationCostModel::CM_Interleave &&
		"Interleave memory opportunity should be caught earlier.");
		return Decision != LoopVectorizationCostModel::CM_Scalarize;
		}
		return true;
		};

		if (!getDecisionAndClampRange(willWiden, Range))
		return nullptr;

		// Success: widen this instruction. We optimize the common case where
		// consecutive instructions can be represented by a single recipe.
		if (LastWidenRecipe && LastWidenRecipe->appendInstruction(I))
		return LastWidenRecipe;
		return new VPWidenRecipe(I);
		}

		VPBasicBlock *LoopVectorizationPlanner::handleReplication(
		Instruction *I, VFRange &Range, VPBasicBlock *VPBB,
		DenseMap<Instruction *, VPReplicateRecipe *> &PredInst2Recipe) {

		bool IsUniform = getDecisionAndClampRange(
		[&](unsigned VF) { return CM.isUniformAfterVectorization(I, VF); },
		Range);

		bool IsPredicated = Legal->isScalarWithPredication(I);
		auto *Recipe = new VPReplicateRecipe(I, IsUniform, IsPredicated);

		// Find if I uses a predicated instruction. If so, it will use its scalar
		// value. Avoid hoisting the insert-element which packs the scalar value into
		// a vector value, as that happens iff all users use the vector value.
		for (auto &Op : I->operands())
		if (auto *PredInst = dyn_cast<Instruction>(Op))
		if (PredInst2Recipe.find(PredInst) != PredInst2Recipe.end())
		PredInst2Recipe[PredInst]->setAlsoPack(false);

		// Finalize the recipe for Instr, first if it is not predicated.
		if (!IsPredicated) {
		DEBUG(dbgs() << "LV: Scalarizing:" << *I << "\n");
		VPBB->appendRecipe(Recipe);
		return VPBB;
		}
		DEBUG(dbgs() << "LV: Scalarizing and predicating:" << *I << "\n");
		assert(VPBB->getSuccessors().empty() &&
		"VPBB has successors when handling predicated replication.");
		// Record predicated instructions for above packing optimizations.
		PredInst2Recipe[I] = Recipe;
		VPBlockBase *Region = VPBB->setOneSuccessor(createReplicateRegion(I, Recipe));
		return cast<VPBasicBlock>(Region->setOneSuccessor(new VPBasicBlock()));
		}

		VPRegionBlock *
		LoopVectorizationPlanner::createReplicateRegion(Instruction *Instr,
		VPRecipeBase *PredRecipe) {
		// Instructions marked for predication are replicated and placed under an
		// if-then construct to prevent side-effects.

		// Build the triangular if-then region.
		std::string RegionName = (Twine("pred.") + Instr->getOpcodeName()).str();
		assert(Instr->getParent() && "Predicated instruction not in any basic block");
		auto *BOMRecipe = new VPBranchOnMaskRecipe(Instr->getParent());
		auto *Entry = new VPBasicBlock(Twine(RegionName) + ".entry", BOMRecipe);
		auto *PHIRecipe =
		Instr->getType()->isVoidTy() ? nullptr : new VPPredInstPHIRecipe(Instr);
		auto *Exit = new VPBasicBlock(Twine(RegionName) + ".continue", PHIRecipe);
		auto *Pred = new VPBasicBlock(Twine(RegionName) + ".if", PredRecipe);
		VPRegionBlock *Region = new VPRegionBlock(Entry, Exit, RegionName, true);

		// Note: first set Entry as region entry and then connect successors starting
		// from it in order, to propagate the "parent" of each VPBasicBlock.
		Entry->setTwoSuccessors(Pred, Exit);
		Pred->setOneSuccessor(Exit);

		return Region;
		}
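The wiring performed by createReplicateRegion() can be sketched in isolation. The following is a minimal, standalone illustration and not the VPlan implementation: ToyBlock and ToyTriangle are hypothetical stand-ins for VPBasicBlock and VPRegionBlock, and only the successor/predecessor bookkeeping of the triangular if-then shape is modelled.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a VPlan block; not the VPBasicBlock class itself.
struct ToyBlock {
  std::string Name;
  std::vector<ToyBlock *> Successors;
  std::vector<ToyBlock *> Predecessors;

  explicit ToyBlock(std::string N) : Name(std::move(N)) {}

  // Connect this block to a single successor.
  void setOneSuccessor(ToyBlock *S) {
    assert(Successors.empty() && "Successors already set.");
    Successors.push_back(S);
    S->Predecessors.push_back(this);
  }

  // Connect this block to two successors: the taken and the skipped path.
  void setTwoSuccessors(ToyBlock *IfTrue, ToyBlock *IfFalse) {
    assert(Successors.empty() && "Successors already set.");
    Successors.push_back(IfTrue);
    Successors.push_back(IfFalse);
    IfTrue->Predecessors.push_back(this);
    IfFalse->Predecessors.push_back(this);
  }
};

// Owns the three blocks of a triangular if-then region. The blocks point at
// each other, so a ToyTriangle is meant to be constructed in place, not copied.
struct ToyTriangle {
  ToyBlock Entry, Pred, Exit;

  explicit ToyTriangle(const std::string &RegionName)
      : Entry(RegionName + ".entry"), Pred(RegionName + ".if"),
        Exit(RegionName + ".continue") {
    // Entry branches on the mask: either into the predicated block or
    // straight to the exit. The predicated block always falls through.
    Entry.setTwoSuccessors(&Pred, &Exit);
    Pred.setOneSuccessor(&Exit);
  }
};
```

Constructing `ToyTriangle T("pred.store")` yields an entry block with two successors and an exit block with two predecessors, which is the triangle the comment in the patch refers to.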

		VPlan *LoopVectorizationPlanner::buildVPlan(VFRange &Range) {

		DenseMap<Instruction *, Instruction *> &SinkAfter = Legal->getSinkAfter();
		DenseMap<Instruction *, Instruction *> SinkAfterInverse;

		// Collect instructions from the original loop that will become trivially dead
		// in the vectorized loop. We don't need to vectorize these instructions. For
		// example, original induction update instructions can become dead because we
		// separately emit induction "steps" when generating code for the new loop.
		// Similarly, we create a new latch condition when setting up the structure
		// of the new loop, so the old one can become dead.
		SmallPtrSet<Instruction *, 4> DeadInstructions;
		collectTriviallyDeadInstructions(DeadInstructions);

		// Hold a mapping from predicated instructions to their recipes, in order to
		// fix their AlsoPack behavior if a user is determined to replicate and use a
		// scalar instead of vector value.
		DenseMap<Instruction *, VPReplicateRecipe *> PredInst2Recipe;

		// Create a dummy pre-entry VPBasicBlock to start building the VPlan.
		VPBasicBlock *VPBB = new VPBasicBlock("Pre-Entry");
		mkuper: Relevent -> Relevant
		VPlan *Plan = new VPlan(VPBB);

		// Scan the body of the loop in a topological order to visit each basic block
		// after having visited its predecessor basic blocks.
		LoopBlocksDFS DFS(OrigLoop);
		DFS.perform(LI);

		for (BasicBlock *BB : make_range(DFS.beginRPO(), DFS.endRPO())) {
		mkuper: Please add a comment somewhere explaining that the order in which we try the recipes matters (e.g. tryToInterleave needs to come first). I'm not sure how we should be handling recipe priority in general, if at all, but that's not for this review.
		Ayal: Sure. This ordering reflects the existing logic in LV. Note that in the future, some recipes such as Interleave Groups may be constructed as a VPlan-based transformation, instead of jointly with other recipes, relieving this specific ordering concern here.
		// Relevant instructions from basic block BB will be grouped into VPRecipe
		// ingredients and fill a new VPBasicBlock.
		unsigned VPBBsForBB = 0;
		auto *FirstVPBBForBB = new VPBasicBlock(BB->getName());
		VPBB->setOneSuccessor(FirstVPBBForBB);
		VPBB = FirstVPBBForBB;
		VPWidenRecipe *LastWidenRecipe = nullptr;

		std::vector<Instruction *> Ingredients;

		// Organize the ingredients to vectorize from current basic block in the
		// right order.
		for (Instruction &I : *BB) {
		Instruction *Instr = &I;

		// First filter out irrelevant instructions, to ensure no recipes are
		// built for them.
		if (isa<BranchInst>(Instr) || isa<DbgInfoIntrinsic>(Instr) ||
		DeadInstructions.count(Instr))
		continue;

		// I is a member of an InterleaveGroup for Range.Start. If it's an adjunct
		// member of the IG, do not construct any Recipe for it.
		const InterleaveGroup *IG = Legal->getInterleavedAccessGroup(Instr);
		if (IG && Instr != IG->getInsertPos() &&
		Range.Start >= 2 && // Query is illegal for VF == 1
		CM.getWideningDecision(Instr, Range.Start) ==
		LoopVectorizationCostModel::CM_Interleave)
		continue;

		// Move instructions to handle first-order recurrences, step 1: avoid
		// handling this instruction until after we've handled the instruction it
		mkuper: Shouldn't we be resetting LastWidenRecipe somewhere, if we encountered an instruction that uses a different recipe?
		Ayal: We could, as an optimization, because VPWidenRecipe's appendInstruction(Instr) currently succeeds only if Instr is its 'next' ingredient. But it's not necessary - we can simply call appendInstruction(Instr) for the LastWidenRecipe even if other instructions have gone by (and have it fail if so). This way more relaxed forms of appendInstruction() can be supported. OTOH, doing so would clutter each 'continue'.
		// should follow.
		auto SAIt = SinkAfter.find(Instr);
		if (SAIt != SinkAfter.end()) {
		DEBUG(dbgs() << "Sinking" << *SAIt->first << " after" << *SAIt->second
		<< " to vectorize a 1st order recurrence.\n");
		SinkAfterInverse[SAIt->second] = Instr;
		continue;
		}

		Ingredients.push_back(Instr);

		// Move instructions to handle first-order recurrences, step 2: push the
		// instruction to be sunk at its insertion point.
		auto SAInvIt = SinkAfterInverse.find(Instr);
		if (SAInvIt != SinkAfterInverse.end())
		Ingredients.push_back(SAInvIt->second);
		}

		// Introduce each ingredient into VPlan.
		for (Instruction *Instr : Ingredients) {
		VPRecipeBase *Recipe = nullptr;

		// Check if Instr should belong to an interleave memory recipe, or already
		// does. In the latter case Instr is irrelevant.
		if ((Recipe = tryToInterleaveMemory(Instr, Range))) {
		VPBB->appendRecipe(Recipe);
		continue;
		}

		// Check if Instr should form some PHI recipe.
		if ((Recipe = tryToOptimizeInduction(Instr, Range))) {
		VPBB->appendRecipe(Recipe);
		continue;
		}
		if (PHINode *Phi = dyn_cast<PHINode>(Instr)) {
		VPBB->appendRecipe(new VPWidenPHIRecipe(Phi));
		continue;
		}

		// Check if Instr is to be widened by a general VPWidenRecipe, after
		// having first checked for specific widening recipes that deal with
		// Interleave Groups, Inductions and Phi nodes.
		if ((Recipe = tryToWiden(Instr, LastWidenRecipe, Range))) {
		if (Recipe != LastWidenRecipe)
		VPBB->appendRecipe(Recipe);
		LastWidenRecipe = cast<VPWidenRecipe>(Recipe);
		continue;
		}

		// Otherwise, if all widening options failed, Instruction is to be
		// replicated. This may create a successor for VPBB.
		VPBasicBlock *NextVPBB =
		handleReplication(Instr, Range, VPBB, PredInst2Recipe);
		if (NextVPBB != VPBB) {
		VPBB = NextVPBB;
		VPBB->setName(BB->hasName() ? BB->getName() + "." + Twine(VPBBsForBB++)
		: "");
		}
		}
		}

		// Discard empty dummy pre-entry VPBasicBlock. Note that other VPBasicBlocks
		// may also be empty, such as the last one VPBB, reflecting original
		// basic-blocks with no recipes.
		VPBasicBlock *PreEntry = cast<VPBasicBlock>(Plan->getEntry());
		assert(PreEntry->empty() && "Expecting empty pre-entry block.");
		VPBlockBase *Entry = Plan->setEntry(PreEntry->getSingleSuccessor());
		PreEntry->disconnectSuccessor(Entry);
		delete PreEntry;

		std::string PlanName;
		raw_string_ostream RSO(PlanName);
		unsigned VF = Range.Start;
		Plan->addVF(VF);
		RSO << "Initial VPlan for VF={" << VF;
		for (VF *= 2; VF < Range.End; VF *= 2) {
		Plan->addVF(VF);
		RSO << "," << VF;
		}
		RSO << "},UF>=1";
		RSO.flush();
		Plan->setName(PlanName);

		return Plan;
		}
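The two-step "sink after" reordering that buildVPlan() applies while collecting ingredients can be sketched on its own. This is a minimal illustration under stated assumptions, not the patch's code: orderIngredients is a hypothetical helper, strings stand in for Instruction pointers, and std::map stands in for DenseMap. As in the loop above, it assumes the instruction to be sunk appears in the block before the instruction it must follow.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Reorder instruction names so that every key of SinkAfter is emitted
// immediately after the instruction it is mapped to.
std::vector<std::string>
orderIngredients(const std::vector<std::string> &Block,
                 const std::map<std::string, std::string> &SinkAfter) {
  std::map<std::string, std::string> SinkAfterInverse;
  std::vector<std::string> Ingredients;

  for (const std::string &Instr : Block) {
    // Step 1: postpone an instruction that must be sunk, remembering the
    // instruction it should follow.
    auto SAIt = SinkAfter.find(Instr);
    if (SAIt != SinkAfter.end()) {
      SinkAfterInverse[SAIt->second] = Instr;
      continue;
    }

    Ingredients.push_back(Instr);

    // Step 2: having just emitted a target, emit the postponed instruction
    // right behind it.
    auto SAInvIt = SinkAfterInverse.find(Instr);
    if (SAInvIt != SinkAfterInverse.end())
      Ingredients.push_back(SAInvIt->second);
  }
  return Ingredients;
}
```

For a block listed as phi, use, prev, latch with "use" marked to sink after "prev", the helper returns phi, prev, use, latch.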

		void VPInterleaveRecipe::print(raw_ostream &O, const Twine &Indent) const {
		O << " +\n"
		<< Indent << "\"INTERLEAVE-GROUP with factor " << IG->getFactor() << " at ";
		IG->getInsertPos()->printAsOperand(O, false);
		O << "\\l\"";
		for (unsigned i = 0; i < IG->getFactor(); ++i)
		if (Instruction *I = IG->getMember(i))
		O << " +\n"
		<< Indent << "\" " << VPlanIngredient(I) << " " << i << "\\l\"";
		}

		void VPReplicateRecipe::execute(VPTransformState &State) {

		if (State.Instance) { // Generate a single instance.
		State.ILV->scalarizeInstruction(Ingredient, *State.Instance, IsPredicated);
		// Insert scalar instance packing it into a vector.
		if (AlsoPack && State.VF > 1) {
		// If we're constructing lane 0, initialize to start from undef.
		if (State.Instance->Lane == 0) {
		Value *Undef =
		UndefValue::get(VectorType::get(Ingredient->getType(), State.VF));
		State.ValueMap.setVectorValue(Ingredient, State.Instance->Part, Undef);
		}
		State.ILV->packScalarIntoVectorValue(Ingredient, *State.Instance);
		}
		return;
		}

		// Generate scalar instances for all VF lanes of all UF parts, unless the
		// instruction is uniform, in which case generate only the first lane for each
		// of the UF parts.
		unsigned EndLane = IsUniform ? 1 : State.VF;
		for (unsigned Part = 0; Part < State.UF; ++Part)
		for (unsigned Lane = 0; Lane < EndLane; ++Lane)
		State.ILV->scalarizeInstruction(Ingredient, {Part, Lane}, IsPredicated);
		}
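The set of scalar instances that VPReplicateRecipe::execute() generates when no single instance is requested can be enumerated with a small standalone sketch. ToyInstance and replicatedInstances are hypothetical names introduced here for illustration; the sketch only lists (Part, Lane) pairs and does not generate IR.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical mirror of VPIteration: one (Part, Lane) point.
using ToyInstance = std::pair<unsigned, unsigned>;

// List the instances produced for the given unroll factor UF and
// vectorization factor VF. A uniform instruction needs lane 0 only.
std::vector<ToyInstance> replicatedInstances(unsigned UF, unsigned VF,
                                             bool IsUniform) {
  assert(UF >= 1 && VF >= 1 && "Factors must be positive.");
  unsigned EndLane = IsUniform ? 1 : VF;
  std::vector<ToyInstance> Instances;
  for (unsigned Part = 0; Part < UF; ++Part)
    for (unsigned Lane = 0; Lane < EndLane; ++Lane)
      Instances.emplace_back(Part, Lane);
  return Instances;
}
```

With UF = 2 and VF = 4, a non-uniform instruction yields 8 instances, while a uniform one yields 2, one per part.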

		void VPBranchOnMaskRecipe::execute(VPTransformState &State) {
		assert(State.Instance && "Branch on Mask works only on single instance.");

		unsigned Part = State.Instance->Part;
		unsigned Lane = State.Instance->Lane;

		auto Cond = State.ILV->createBlockInMask(MaskedBasicBlock);

		Value *ConditionBit = Cond[Part];
		if (!ConditionBit) // Block in mask is all-one.
		ConditionBit = State.Builder.getTrue();
		else if (ConditionBit->getType()->isVectorTy())
		ConditionBit = State.Builder.CreateExtractElement(
		ConditionBit, State.Builder.getInt32(Lane));

		// Replace the temporary unreachable terminator with a new conditional branch,
		// whose two destinations will be set later when they are created.
		auto *CurrentTerminator = State.CFG.PrevBB->getTerminator();
		assert(isa<UnreachableInst>(CurrentTerminator) &&
		"Expected to replace unreachable terminator with conditional branch.");
		auto *CondBr = BranchInst::Create(State.CFG.PrevBB, nullptr, ConditionBit);
		CondBr->setSuccessor(0, nullptr);
		ReplaceInstWithInst(CurrentTerminator, CondBr);

		DEBUG(dbgs() << "\nLV: vectorizing BranchOnMask recipe "
		<< MaskedBasicBlock->getName());
		}
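How VPBranchOnMaskRecipe::execute() picks the branch condition for one lane can be sketched without IR. This is a simplified model under assumptions: conditionBit is a hypothetical helper, a null pointer plays the role of the missing ("all-one") block-in mask, and a std::vector<bool> plays the role of the per-part mask value. The scalar-mask case of the original (a non-vector condition used as is) corresponds here to a one-element mask queried at lane 0.

```cpp
#include <cassert>
#include <vector>

// Return the condition guarding one lane of a predicated block.
// Mask == nullptr models a block-in mask that is all-one.
bool conditionBit(const std::vector<bool> *Mask, unsigned Lane) {
  if (!Mask)
    return true; // No mask: the block is executed unconditionally.
  assert(Lane < Mask->size() && "Lane out of range for mask.");
  return (*Mask)[Lane]; // Models the extract-element on the vector mask.
}
```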

		void VPPredInstPHIRecipe::execute(VPTransformState &State) {
		assert(State.Instance && "Predicated instruction PHI works per instance.");
		Instruction *ScalarPredInst = cast<Instruction>(
		State.ValueMap.getScalarValue(PredInst, *State.Instance));
		BasicBlock *PredicatedBB = ScalarPredInst->getParent();
		BasicBlock *PredicatingBB = PredicatedBB->getSinglePredecessor();
		assert(PredicatingBB && "Predicated block has no single predecessor.");

		// By current pack/unpack logic we need to generate only a single phi node: if
		// a vector value for the predicated instruction exists at this point it means
		// the instruction has vector users only, and a phi for the vector value is
		// needed. In this case the recipe of the predicated instruction is marked to
		// also do that packing, thereby "hoisting" the insert-element sequence.
		// Otherwise, a phi node for the scalar value is needed.
		unsigned Part = State.Instance->Part;
		if (State.ValueMap.hasVectorValue(PredInst, Part)) {
		Value *VectorValue = State.ValueMap.getVectorValue(PredInst, Part);
		InsertElementInst *IEI = cast<InsertElementInst>(VectorValue);
		PHINode *VPhi = State.Builder.CreatePHI(IEI->getType(), 2);
		VPhi->addIncoming(IEI->getOperand(0), PredicatingBB); // Unmodified vector.
		VPhi->addIncoming(IEI, PredicatedBB); // New vector with inserted element.
		State.ValueMap.resetVectorValue(PredInst, Part, VPhi); // Update cache.
		} else {
		Type *PredInstType = PredInst->getType();
		PHINode *Phi = State.Builder.CreatePHI(PredInstType, 2);
		Phi->addIncoming(UndefValue::get(ScalarPredInst->getType()), PredicatingBB);
		Phi->addIncoming(ScalarPredInst, PredicatedBB);
		State.ValueMap.resetScalarValue(PredInst, *State.Instance, Phi);
		}
		}

bool LoopVectorizePass::processLoop(Loop *L) {		bool LoopVectorizePass::processLoop(Loop *L) {
assert(L->empty() && "Only process inner loops.");		assert(L->empty() && "Only process inner loops.");

#ifndef NDEBUG		#ifndef NDEBUG
const std::string DebugLocStr = getDebugLocString(L);		const std::string DebugLocStr = getDebugLocString(L);
#endif /* NDEBUG */		#endif /* NDEBUG */

DEBUG(dbgs() << "\nLV: Checking a loop in \""		DEBUG(dbgs() << "\nLV: Checking a loop in \""
[... 102 lines elided ...]	#endif /* NDEBUG */
}		}

// Use the cost model.		// Use the cost model.
LoopVectorizationCostModel CM(L, PSE, LI, &LVL, *TTI, TLI, DB, AC, ORE, F,		LoopVectorizationCostModel CM(L, PSE, LI, &LVL, *TTI, TLI, DB, AC, ORE, F,
&Hints);		&Hints);
CM.collectValuesToIgnore();		CM.collectValuesToIgnore();

// Use the planner for vectorization.		// Use the planner for vectorization.
LoopVectorizationPlanner LVP(L, LI, &LVL, CM);		LoopVectorizationPlanner LVP(L, LI, TLI, TTI, &LVL, CM);

// Get user vectorization factor.		// Get user vectorization factor.
unsigned UserVF = Hints.getWidth();		unsigned UserVF = Hints.getWidth();

// Plan how to best vectorize, return the best VF and its cost.		// Plan how to best vectorize, return the best VF and its cost.
LoopVectorizationCostModel::VectorizationFactor VF =		LoopVectorizationCostModel::VectorizationFactor VF =
LVP.plan(OptForSize, UserVF);		LVP.plan(OptForSize, UserVF);

[... 70 lines elided ...]	ORE->emit(OptimizationRemarkAnalysis(LV_NAME, IntDiagMsg.first,
L->getStartLoc(), L->getHeader())		L->getStartLoc(), L->getHeader())
<< IntDiagMsg.second);		<< IntDiagMsg.second);
} else if (VectorizeLoop && InterleaveLoop) {		} else if (VectorizeLoop && InterleaveLoop) {
DEBUG(dbgs() << "LV: Found a vectorizable loop (" << VF.Width << ") in "		DEBUG(dbgs() << "LV: Found a vectorizable loop (" << VF.Width << ") in "
<< DebugLocStr << '\n');		<< DebugLocStr << '\n');
DEBUG(dbgs() << "LV: Interleave Count is " << IC << '\n');		DEBUG(dbgs() << "LV: Interleave Count is " << IC << '\n');
}		}

		LVP.setBestPlan(VF.Width, IC);

using namespace ore;		using namespace ore;
if (!VectorizeLoop) {		if (!VectorizeLoop) {
assert(IC > 1 && "interleave count should not be 1 or 0");		assert(IC > 1 && "interleave count should not be 1 or 0");
// If we decided that it is not legal to vectorize the loop, then		// If we decided that it is not legal to vectorize the loop, then
// interleave it.		// interleave it.
InnerLoopUnroller Unroller(L, PSE, LI, DT, TLI, TTI, AC, ORE, IC, &LVL,		InnerLoopUnroller Unroller(L, PSE, LI, DT, TLI, TTI, AC, ORE, IC, &LVL,
&CM);		&CM);
LVP.executePlan(Unroller);		LVP.executePlan(Unroller, DT);

ORE->emit(OptimizationRemark(LV_NAME, "Interleaved", L->getStartLoc(),		ORE->emit(OptimizationRemark(LV_NAME, "Interleaved", L->getStartLoc(),
L->getHeader())		L->getHeader())
<< "interleaved loop (interleaved count: "		<< "interleaved loop (interleaved count: "
<< NV("InterleaveCount", IC) << ")");		<< NV("InterleaveCount", IC) << ")");
} else {		} else {
// If we decided that it is legal to vectorize the loop, then do it.		// If we decided that it is legal to vectorize the loop, then do it.
InnerLoopVectorizer LB(L, PSE, LI, DT, TLI, TTI, AC, ORE, VF.Width, IC,		InnerLoopVectorizer LB(L, PSE, LI, DT, TLI, TTI, AC, ORE, VF.Width, IC,
&LVL, &CM);		&LVL, &CM);
LVP.executePlan(LB);		LVP.executePlan(LB, DT);
++LoopsVectorized;		++LoopsVectorized;

// Add metadata to disable runtime unrolling a scalar loop when there are		// Add metadata to disable runtime unrolling a scalar loop when there are
// no runtime checks about strides and memory. A scalar loop that is		// no runtime checks about strides and memory. A scalar loop that is
// rarely used is not worth unrolling.		// rarely used is not worth unrolling.
if (!LB.areSafetyChecksAdded())		if (!LB.areSafetyChecksAdded())
AddRuntimeUnrollDisableMetaData(L);		AddRuntimeUnrollDisableMetaData(L);

[... 111 lines elided ...]

lib/Transforms/Vectorize/VPlan.h

				//===- VPlan.h - Represent A Vectorizer Plan ------------------------------===//
				//
				// The LLVM Compiler Infrastructure
				//
				// This file is distributed under the University of Illinois Open Source
				// License. See LICENSE.TXT for details.
				//
				//===----------------------------------------------------------------------===//
				///
				/// \file
				/// This file contains the declarations of the Vectorization Plan base classes:
				/// 1. VPBasicBlock and VPRegionBlock that inherit from a common pure virtual
				/// VPBlockBase, together implementing a Hierarchical CFG;
				/// 2. Specializations of GraphTraits that allow VPBlockBase graphs to be
				/// treated as proper graphs for generic algorithms;
				/// 3. Pure virtual VPRecipeBase serving as the base class for recipes contained
				/// within VPBasicBlocks;
				/// 4. The VPlan class holding a candidate for vectorization;
				/// 5. The VPlanPrinter class providing a way to print a plan in dot format.
				/// These are documented in docs/VectorizationPlan.rst.
				///
				//===----------------------------------------------------------------------===//

				#ifndef LLVM_TRANSFORMS_VECTORIZE_VPLAN_H
				#define LLVM_TRANSFORMS_VECTORIZE_VPLAN_H

				#include "llvm/ADT/GraphTraits.h"
				#include "llvm/ADT/SmallSet.h"
				#include "llvm/ADT/ilist.h"
				#include "llvm/ADT/ilist_node.h"
				#include "llvm/IR/IRBuilder.h"
				#include "llvm/Support/raw_ostream.h"

				// The (re)use of existing LoopVectorize classes is subject to future VPlan
				// refactoring.
				namespace {
				// Forward declarations.
				//class InnerLoopVectorizer;
				class LoopVectorizationLegality;
				class LoopVectorizationCostModel;
				} // namespace

				namespace llvm {

				// Forward declarations.
				class BasicBlock;
				class InnerLoopVectorizer;
				class VPBasicBlock;

				/// In what follows, the term "input IR" refers to code that is fed into the
				/// vectorizer whereas the term "output IR" refers to code that is generated by
				/// the vectorizer.

				/// VPIteration represents a single point in the iteration space of the output
				/// (vectorized and/or unrolled) IR loop.
				struct VPIteration {
				unsigned Part; ///< in [0..UF)
				unsigned Lane; ///< in [0..VF)
				};

				/// This is a helper struct for maintaining vectorization state. It's used for
				/// mapping values from the original loop to their corresponding values in
				/// the new loop. Two mappings are maintained: one for vectorized values and
				/// one for scalarized values. Vectorized values are represented with UF
				/// vector values in the new loop, and scalarized values are represented with
				/// UF x VF scalar values in the new loop. UF and VF are the unroll and
				/// vectorization factors, respectively.
				///
				/// Entries can be added to either map with setVectorValue and setScalarValue,
				/// which assert that an entry was not already added before. If an entry is to
				/// replace an existing one, call resetVectorValue and resetScalarValue. This is
				/// currently needed to modify the mapped values during "fix-up" operations that
				/// occur once the first phase of widening is complete. These operations include
				/// type truncation and the second phase of recurrence widening.
				///
				/// Entries from either map can be retrieved using the getVectorValue and
				/// getScalarValue functions, which assert that the desired value exists.

				struct VectorizerValueMap {
				private:
				/// The unroll factor. Each entry in the vector map contains UF vector values.
				unsigned UF;

				/// The vectorization factor. Each entry in the scalar map contains UF x VF
				/// scalar values.
				unsigned VF;

				/// The vector and scalar map storage. We use std::map and not DenseMap
				/// because insertions to DenseMap invalidate its iterators.
				typedef SmallVector<Value *, 2> VectorParts;
				typedef SmallVector<SmallVector<Value *, 4>, 2> ScalarParts;
				std::map<Value *, VectorParts> VectorMapStorage;
				std::map<Value *, ScalarParts> ScalarMapStorage;

				public:
				/// Construct an empty map with the given unroll and vectorization factors.
				VectorizerValueMap(unsigned UF, unsigned VF) : UF(UF), VF(VF) {}

				/// \return True if the map has any vector entry for \p Key.
				bool hasAnyVectorValue(Value *Key) const {
				return VectorMapStorage.count(Key);
				}

				/// \return True if the map has a vector entry for \p Key and \p Part.
				bool hasVectorValue(Value *Key, unsigned Part) const {
				assert(Part < UF && "Queried Vector Part is too large.");
				if (!hasAnyVectorValue(Key))
				return false;
				const VectorParts &Entry = VectorMapStorage.find(Key)->second;
				assert(Entry.size() == UF && "VectorParts has wrong dimensions.");
				return Entry[Part] != nullptr;
				}

				/// \return True if the map has any scalar entry for \p Key.
				bool hasAnyScalarValue(Value *Key) const {
				mkuper: I think naming this SubclassID (like Value does) or something similar would be clearer. Same for VRID.
				Ayal: ok, sure.
				return ScalarMapStorage.count(Key);
				}
				mkuper: This looks not at all thread-safe, and it seems like the only thing that actually uses the UIDs is the printer. Perhaps assign the IDs on the fly in the printer? (Does anything else in LLVM do it this way?)
				Ayal: Yes, UID is used for printing only, including the Name. Thinking of having the Planner keep track of this ordinal.
				Ayal: Assigning the IDs on the fly in the printer, instead.

				/// \return True if the map has a scalar entry for \p Key and \p Instance.
				bool hasScalarValue(Value *Key, const VPIteration &Instance) const {
				assert(Instance.Part < UF && "Queried Scalar Part is too large.");
				assert(Instance.Lane < VF && "Queried Scalar Lane is too large.");
				if (!hasAnyScalarValue(Key))
				return false;
				const ScalarParts &Entry = ScalarMapStorage.find(Key)->second;
				assert(Entry.size() == UF && "ScalarParts has wrong dimensions.");
				assert(Entry[Instance.Part].size() == VF &&
				"ScalarParts has wrong dimensions.");
				return Entry[Instance.Part][Instance.Lane] != nullptr;
				}

				/// Retrieve the existing vector value that corresponds to \p Key and
				/// \p Part.
				Value *getVectorValue(Value *Key, unsigned Part) {
				assert(hasVectorValue(Key, Part) && "Getting non-existent value.");
				return VectorMapStorage[Key][Part];
				}

				/// Retrieve the existing scalar value that corresponds to \p Key and
				/// \p Instance.
				Value *getScalarValue(Value *Key, const VPIteration &Instance) {
				assert(hasScalarValue(Key, Instance) && "Getting non-existent value.");
				return ScalarMapStorage[Key][Instance.Part][Instance.Lane];
				}

				/// Set a vector value associated with \p Key and \p Part. Assumes such a
				/// value is not already set. If it is, use resetVectorValue() instead.
				void setVectorValue(Value *Key, unsigned Part, Value *Vector) {
				assert(!hasVectorValue(Key, Part) && "Vector value already set for part");
				if (!VectorMapStorage.count(Key)) {
				VectorParts Entry(UF);
				VectorMapStorage[Key] = Entry;
				}
				VectorMapStorage[Key][Part] = Vector;
				}

				/// Set a scalar value associated with \p Key and \p Instance. Assumes such a
				/// value is not already set.
				void setScalarValue(Value *Key, const VPIteration &Instance, Value *Scalar) {
				assert(!hasScalarValue(Key, Instance) && "Scalar value already set");
				if (!ScalarMapStorage.count(Key)) {
				ScalarParts Entry(UF);
				// TODO: Consider storing uniform values only per-part, as they occupy
				// lane 0 only, keeping the other VF-1 redundant entries null.
				for (unsigned Part = 0; Part < UF; ++Part)
				Entry[Part].resize(VF, nullptr);
				ScalarMapStorage[Key] = Entry;
				}
				ScalarMapStorage[Key][Instance.Part][Instance.Lane] = Scalar;
				}

				/// Reset the vector value associated with \p Key for the given \p Part.
				/// This function can be used to update values that have already been
				/// vectorized. This is the case for "fix-up" operations including type
				mkuper: I think this should be ConstVPBlocksTy or something of that sort. (Or, just use const VPBlocksTy?)
				Ayal: (Right)
				/// truncation and the second phase of recurrence vectorization.
				void resetVectorValue(Value *Key, unsigned Part, Value *Vector) {
				assert(hasVectorValue(Key, Part) && "Vector value not set for part");
				VectorMapStorage[Key][Part] = Vector;
				}

				/// Reset the scalar value associated with \p Key for the given \p Instance.
				/// This function can be used to update values that have already been
				/// scalarized. This is the case for "fix-up" operations including scalar phi
				/// nodes for scalarized and predicated instructions.
				void resetScalarValue(Value *Key, const VPIteration &Instance,
				Value *Scalar) {
				assert(hasScalarValue(Key, Instance) &&
				mkuper: Does this strictly need to be public? It looks like it would only be used in the classof of subclasses. Can it be protected?
				Ayal: It needs to be public. A method (classof) of a derived class cannot invoke a protected method on an object of the base class (not "this").
				"Scalar value not set for part and lane");
				ScalarMapStorage[Key][Instance.Part][Instance.Lane] = Scalar;
				mkuper: You can drop the "class" here and everywhere below.
				}
				};
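The set/reset contract described for VectorizerValueMap can be illustrated with a reduced, standalone analogue of its scalar half. ToyScalarMap is a hypothetical class written for this sketch: keys and values are strings instead of Value pointers, and an empty string (rather than nullptr) marks an unset slot. It mirrors the lazily allocated UF x VF grid and the rule that set() is for first-time entries while reset() replaces an existing one.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

class ToyScalarMap {
  unsigned UF, VF;
  // Per key: UF parts, each holding VF lanes.
  std::map<std::string, std::vector<std::vector<std::string>>> Storage;

public:
  ToyScalarMap(unsigned UF, unsigned VF) : UF(UF), VF(VF) {}

  // True if a scalar was recorded for Key at (Part, Lane).
  bool has(const std::string &Key, unsigned Part, unsigned Lane) const {
    assert(Part < UF && Lane < VF && "Queried instance is out of range.");
    auto It = Storage.find(Key);
    return It != Storage.end() && !It->second[Part][Lane].empty();
  }

  // First-time insertion; allocates the UF x VF grid for Key on demand.
  void set(const std::string &Key, unsigned Part, unsigned Lane,
           const std::string &Scalar) {
    assert(!Scalar.empty() && "Empty string is reserved for unset slots.");
    assert(!has(Key, Part, Lane) && "Scalar value already set");
    auto &Entry = Storage[Key];
    if (Entry.empty())
      Entry.assign(UF, std::vector<std::string>(VF));
    Entry[Part][Lane] = Scalar;
  }

  // Replace an existing entry, as done by "fix-up" steps such as creating
  // a phi for a predicated scalar.
  void reset(const std::string &Key, unsigned Part, unsigned Lane,
             const std::string &Scalar) {
    assert(has(Key, Part, Lane) && "Scalar value not set for part and lane");
    Storage[Key][Part][Lane] = Scalar;
  }

  // Retrieve an existing entry.
  const std::string &get(const std::string &Key, unsigned Part,
                         unsigned Lane) const {
    assert(has(Key, Part, Lane) && "Getting non-existent value.");
    return Storage.find(Key)->second[Part][Lane];
  }
};
```

A caller would first `set("%x", 1, 3, "%x.scalar")` when scalarizing and later `reset("%x", 1, 3, "%x.phi")` when the predicated-instruction phi is created; calling set() twice for the same slot trips the assertion in a debug build.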

				/// VPTransformState holds information passed down when "executing" a VPlan,
				/// needed for generating the output IR.
				struct VPTransformState {

				VPTransformState(unsigned VF, unsigned UF, class LoopInfo *LI,
				class DominatorTree *DT, IRBuilder<> &Builder,
				VectorizerValueMap &ValueMap, InnerLoopVectorizer *ILV)
				: VF(VF), UF(UF), Instance(), LI(LI), DT(DT), Builder(Builder),
				ValueMap(ValueMap), ILV(ILV) {}

				/// The chosen Vectorization and Unroll Factors of the loop being vectorized.
				unsigned VF;
				unsigned UF;

				/// Hold the indices to generate specific scalar instructions. Null indicates
				/// that all instances are to be generated, using either scalar or vector
				/// instructions.
				Optional<VPIteration> Instance;

				/// Hold state information used when constructing the CFG of the output IR,
				/// traversing the VPBasicBlocks and generating corresponding IR BasicBlocks.
				struct CFGState {
				/// The previous VPBasicBlock visited. Initially set to null.
				VPBasicBlock *PrevVPBB;
				/// The previous IR BasicBlock created or used. Initially set to the new
				/// header BasicBlock.
				BasicBlock *PrevBB;
				/// The last IR BasicBlock in the output IR. Set to the new latch
				/// BasicBlock, used for placing the newly created BasicBlocks.
				BasicBlock *LastBB;
				/// A mapping of each VPBasicBlock to the corresponding BasicBlock. In case
				mkuper: The terminology here is somewhat confusing, since it's not immediately clear what the difference between an "ancestor" and a "predecessor" is. Maybe "enclosing region" instead of ancestor?
				Ayal: Added the following definition to clarify the terminology: /// An Ancestor of a block B is any block containing B, including B itself. So it's either an "enclosing region" or the block itself, which in turn may be a region or a basic-block.
				mkuper: I would strongly prefer less confusing terminology (I think ancestor very strongly evokes "direct or indirect predecessor"), but if I'm the only one getting confused by this, feel free to keep it.
				Ayal: ok, "Ancestors" and "Predecessors" are admittedly ambiguous terms in English... will use "getEnclosingBlockWithPredecessors()" and "getEnclosingBlockWithSuccessors()" instead, after defining an Enclosing Block of block B to be any block that contains B, including itself.
				/// of replication, maps the BasicBlock of the last replica created.
				SmallDenseMap<VPBasicBlock *, BasicBlock *> VPBB2IRBB;

				CFGState() : PrevVPBB(nullptr), PrevBB(nullptr), LastBB(nullptr) {}
				} CFG;

				/// Hold a pointer to LoopInfo to register new basic blocks in the loop.
				class LoopInfo *LI;

				/// Hold a pointer to Dominator Tree to register new basic blocks in the loop.
				class DominatorTree *DT;

				/// Hold a reference to the IRBuilder used to generate output IR code.
				IRBuilder<> &Builder;

				/// Hold a reference to the Value state information used when generating the
				/// Values of the output IR.
				VectorizerValueMap &ValueMap;

				/// Hold a pointer to InnerLoopVectorizer to reuse its IR generation methods.
				class InnerLoopVectorizer *ILV;
				};

				/// VPBlockBase is the building block of the Hierarchical Control-Flow Graph.
				/// A VPBlockBase can be either a VPBasicBlock or a VPRegionBlock.
				class VPBlockBase {
				private:
				const unsigned char SubclassID; ///< Subclass identifier (for isa/dyn_cast).

				/// An optional name for the block.
				std::string Name;

				/// The immediate VPRegionBlock which this VPBlockBase belongs to, or null if
				/// it is a topmost VPBlockBase.
				class VPRegionBlock *Parent;

				/// List of predecessor blocks.
				SmallVector<VPBlockBase *, 1> Predecessors;

				/// List of successor blocks.
				SmallVector<VPBlockBase *, 1> Successors;

				/// Add \p Successor as the last successor to this block.
				void appendSuccessor(VPBlockBase *Successor) {
				assert(Successor && "Cannot add nullptr successor!");
				Successors.push_back(Successor);
				}

				/// Add \p Predecessor as the last predecessor to this block.
				void appendPredecessor(VPBlockBase *Predecessor) {
				assert(Predecessor && "Cannot add nullptr predecessor!");
				Predecessors.push_back(Predecessor);
				}

				/// Remove \p Predecessor from the predecessors of this block.
				void removePredecessor(VPBlockBase *Predecessor) {
				auto Pos = std::find(Predecessors.begin(), Predecessors.end(), Predecessor);
				assert(Pos != Predecessors.end() && "Predecessor does not exist");
				Predecessors.erase(Pos);
				}

				/// Remove \p Successor from the successors of this block.
				void removeSuccessor(VPBlockBase *Successor) {
				auto Pos = std::find(Successors.begin(), Successors.end(), Successor);
				assert(Pos != Successors.end() && "Successor does not exist");
				Successors.erase(Pos);
				}

				protected:
				VPBlockBase(const unsigned char SC, const std::string &N)
				: SubclassID(SC), Name(N), Parent(nullptr) {}

				public:
				/// An enumeration for keeping track of the concrete subclass of VPBlockBase
				/// that are actually instantiated. Values of this enumeration are kept in the
				/// SubclassID field of the VPBlockBase objects. They are used for concrete
				/// type identification.
				typedef enum { VPBasicBlockSC, VPRegionBlockSC } VPBlockTy;

				typedef SmallVectorImpl<VPBlockBase *> VPBlocksTy;

				virtual ~VPBlockBase() {}

				const std::string &getName() const { return Name; }
				mkuper (Done): Same as above.

				void setName(const Twine &newName) { Name = newName.str(); }

				/// \return an ID for the concrete type of this object.
				/// This is used to implement the classof checks. This should not be used
				/// for any other purpose, as the values may change as LLVM evolves.
				unsigned getVPBlockID() const { return SubclassID; }
				mkuper (Done): are -> is?

				const VPRegionBlock *getParent() const { return Parent; }

				mkuper (Not Done): I'm not sure about the type safety of this. Is this how we do it elsewhere?
				Ayal (Author, Not Done): Not sure what the type safety concern is?
				void setParent(VPRegionBlock *P) { Parent = P; }

				/// \return the VPBasicBlock that is the entry of this VPBlockBase,
				/// recursively, if the latter is a VPRegionBlock. Otherwise, if this
				/// VPBlockBase is a VPBasicBlock, it is returned.
				const VPBasicBlock *getEntryBasicBlock() const;
				VPBasicBlock *getEntryBasicBlock();

				/// \return the VPBasicBlock that is the exit of this VPBlockBase,
				/// recursively, if the latter is a VPRegionBlock. Otherwise, if this
				/// VPBlockBase is a VPBasicBlock, it is returned.
				const VPBasicBlock *getExitBasicBlock() const;
				VPBasicBlock *getExitBasicBlock();

				const VPBlocksTy &getSuccessors() const { return Successors; }
				VPBlocksTy &getSuccessors() { return Successors; }

				const VPBlocksTy &getPredecessors() const { return Predecessors; }
				VPBlocksTy &getPredecessors() { return Predecessors; }

				/// \return the successor of this VPBlockBase if it has a single successor.
				/// Otherwise return a null pointer.
				VPBlockBase *getSingleSuccessor() const {
				return (Successors.size() == 1 ? *Successors.begin() : nullptr);
				}

				/// \return the predecessor of this VPBlockBase if it has a single
				/// predecessor. Otherwise return a null pointer.
				VPBlockBase *getSinglePredecessor() const {
				return (Predecessors.size() == 1 ? *Predecessors.begin() : nullptr);
				}

				/// An Enclosing Block of a block B is any block containing B, including B
				/// itself. \return the closest enclosing block starting from "this", which
				/// has successors. \return the root enclosing block if all enclosing blocks
				/// have no successors.
				VPBlockBase *getEnclosingBlockWithSuccessors();

				/// \return the closest enclosing block starting from "this", which has
				/// predecessors. \return the root enclosing block if all enclosing blocks
				/// have no predecessors.
				VPBlockBase *getEnclosingBlockWithPredecessors();

				/// \return the successors either attached directly to this VPBlockBase or, if
				/// this VPBlockBase is the exit block of a VPRegionBlock and has no
				/// successors of its own, search recursively for the first enclosing
				/// VPRegionBlock that has successors and return them. If no such
				/// VPRegionBlock exists, return the (empty) successors of the topmost
				/// VPBlockBase reached.
				const VPBlocksTy &getHierarchicalSuccessors() {
				return getEnclosingBlockWithSuccessors()->getSuccessors();
				}

				/// \return the hierarchical successor of this VPBlockBase if it has a single
				/// hierarchical successor. Otherwise return a null pointer.
				VPBlockBase *getSingleHierarchicalSuccessor() {
				return getEnclosingBlockWithSuccessors()->getSingleSuccessor();
				}

				/// \return the predecessors either attached directly to this VPBlockBase or,
				/// if this VPBlockBase is the entry block of a VPRegionBlock and has no
				/// predecessors of its own, search recursively for the first enclosing
				/// VPRegionBlock that has predecessors and return them. If no such
				/// VPRegionBlock exists, return the (empty) predecessors of the topmost
				/// VPBlockBase reached.
				const VPBlocksTy &getHierarchicalPredecessors() {
				return getEnclosingBlockWithPredecessors()->getPredecessors();
				}

				/// \return the hierarchical predecessor of this VPBlockBase if it has a
				/// single hierarchical predecessor. Otherwise return a null pointer.
				VPBlockBase *getSingleHierarchicalPredecessor() {
				return getEnclosingBlockWithPredecessors()->getSinglePredecessor();
				mkuper (Not Done): Do we want the const accessor? IIUC, if you're not modifying it, you should be using the forwarding methods. In any case, it looks like neither of these methods is used right now. Can we remove both?
				Ayal (Author, Not Done): Yes we can. (May sound old-fashioned ;-)
				}

				/// Sets a given VPBlockBase \p Successor as the single successor and \return
				/// \p Successor. The parent of this Block is copied to be the parent of
				mkuper (Not Done): This isn't used anywhere either - and even if it were, the whole thing looks odd.
				Ayal (Author, Not Done): This odd looking thing is part of the ilist_node_with_parent contract, used by getPrevNode() and getNextNode().
				/// \p Successor.
				VPBlockBase *setOneSuccessor(VPBlockBase *Successor) {
				assert(Successors.empty() && "Setting one successor when others exist.");
				appendSuccessor(Successor);
				Successor->appendPredecessor(this);
				Successor->Parent = Parent;
				return Successor;
				}

				/// Sets two given VPBlockBases \p IfTrue and \p IfFalse to be the two
				/// successors. The parent of this Block is copied to be the parent of both
				/// \p IfTrue and \p IfFalse.
				void setTwoSuccessors(VPBlockBase *IfTrue, VPBlockBase *IfFalse) {
				assert(Successors.empty() && "Setting two successors when others exist.");
				appendSuccessor(IfTrue);
				appendSuccessor(IfFalse);
				IfTrue->appendPredecessor(this);
				IfFalse->appendPredecessor(this);
				IfTrue->Parent = Parent;
				IfFalse->Parent = Parent;
				}

				void disconnectSuccessor(VPBlockBase *Successor) {
				assert(Successor && "Successor to disconnect is null.");
				removeSuccessor(Successor);
				Successor->removePredecessor(this);
				}

				/// The method which generates the output IR that corresponds to this
				/// VPBlockBase, thereby "executing" the VPlan.
				virtual void execute(struct VPTransformState *State) = 0;

				/// Delete all blocks reachable from a given VPBlockBase, inclusive.
				static void deleteCFG(VPBlockBase *Entry);
				};
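The hierarchical successor lookup documented above can be illustrated with a minimal sketch. It is not the patch's code: `Block` is a hypothetical, LLVM-free stand-in for VPBlockBase, and the method names are simplified. It only shows the idea of walking up through enclosing regions until a block with successors (or the root) is reached.

```cpp
#include <cassert>
#include <vector>

// Hypothetical, simplified stand-in for VPBlockBase (assumption: no
// subclasses and no LLVM containers; std::vector replaces SmallVector).
struct Block {
  Block *Parent = nullptr;         // Enclosing region, or null if topmost.
  std::vector<Block *> Successors; // Successors attached directly.

  // Walk up the enclosing regions until a block with successors is found,
  // or return the root if no enclosing block has any.
  Block *enclosingBlockWithSuccessors() {
    Block *B = this;
    while (B->Successors.empty() && B->Parent)
      B = B->Parent;
    return B;
  }

  // Successors of this block, or of the closest enclosing block that has some.
  const std::vector<Block *> &hierarchicalSuccessors() {
    return enclosingBlockWithSuccessors()->Successors;
  }
};
```

In this sketch, an exit block nested in a region with no successors of its own reports the region's successors, which is the behaviour the comments on getHierarchicalSuccessors() describe.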

				/// VPRecipeBase is a base class modeling a sequence of one or more output IR
				/// instructions.
				class VPRecipeBase : public ilist_node_with_parent<VPRecipeBase, VPBasicBlock> {
				friend VPBasicBlock;

				private:
				const unsigned char SubclassID; ///< Subclass identifier (for isa/dyn_cast).

				/// Each VPRecipe belongs to a single VPBasicBlock.
				VPBasicBlock *Parent;

				public:
				/// An enumeration for keeping track of the concrete subclass of VPRecipeBase
				/// that is actually instantiated. Values of this enumeration are kept in the
				/// SubclassID field of the VPRecipeBase objects. They are used for concrete
				/// type identification.
				typedef enum {
				VPBranchOnMaskSC,
				VPInterleaveSC,
				VPPredInstPHISC,
				VPReplicateSC,
				VPWidenIntOrFpInductionSC,
				VPWidenPHISC,
				VPWidenSC,
				} VPRecipeTy;

				VPRecipeBase(const unsigned char SC) : SubclassID(SC), Parent(nullptr) {}

				virtual ~VPRecipeBase() {}

				/// \return an ID for the concrete type of this object.
				/// This is used to implement the classof checks. This should not be used
				mkuper (Not Done): Is it possible not to have an Entry here? It seems like it shouldn't be, so this should probably be an assert.
				Ayal (Author, Not Done): It is conceivable for code motion to completely empty a Region from all its blocks.
				/// for any other purpose, as the values may change as LLVM evolves.
				unsigned getVPRecipeID() const { return SubclassID; }

				/// \return the VPBasicBlock which this VPRecipe belongs to.
				VPBasicBlock *getParent() { return Parent; }
				const VPBasicBlock *getParent() const { return Parent; }

				/// The method which generates the output IR instructions that correspond to
				/// this VPRecipe, thereby "executing" the VPlan.
				virtual void execute(struct VPTransformState &State) = 0;

				/// Each recipe prints itself.
				virtual void print(raw_ostream &O, const Twine &Indent) const = 0;
				};

				/// VPBasicBlock serves as the leaf of the Hierarchical Control-Flow Graph. It
				/// holds a sequence of zero or more VPRecipe's each representing a sequence of
				/// output IR instructions.
				class VPBasicBlock : public VPBlockBase {
				public:
				typedef iplist<VPRecipeBase> RecipeListTy;

				private:
				/// The VPRecipes held in the order of output instructions to generate.
				RecipeListTy Recipes;

				public:
				/// Instruction iterators...
				typedef RecipeListTy::iterator iterator;
				typedef RecipeListTy::const_iterator const_iterator;
				typedef RecipeListTy::reverse_iterator reverse_iterator;
				typedef RecipeListTy::const_reverse_iterator const_reverse_iterator;

				//===--------------------------------------------------------------------===//
				/// Recipe iterator methods
				mkuper (Not Done): Maybe a SmallSet?
				Ayal (Author, Not Done): Initially wanted to iterate over the VFs, but that's not needed now. Changed to SmallSet, and also replaced getVFs() with hasVF(unsigned).
				///
				inline iterator begin() { return Recipes.begin(); }
				inline const_iterator begin() const { return Recipes.begin(); }
				inline iterator end() { return Recipes.end(); }
				inline const_iterator end() const { return Recipes.end(); }

				inline reverse_iterator rbegin() { return Recipes.rbegin(); }
				inline const_reverse_iterator rbegin() const { return Recipes.rbegin(); }
				inline reverse_iterator rend() { return Recipes.rend(); }
				inline const_reverse_iterator rend() const { return Recipes.rend(); }

				inline size_t size() const { return Recipes.size(); }
				inline bool empty() const { return Recipes.empty(); }
				inline const VPRecipeBase &front() const { return Recipes.front(); }
				inline VPRecipeBase &front() { return Recipes.front(); }
				inline const VPRecipeBase &back() const { return Recipes.back(); }
				inline VPRecipeBase &back() { return Recipes.back(); }

				/// \brief Returns a pointer to a member of the recipe list.
				static RecipeListTy VPBasicBlock::*getSublistAccess(VPRecipeBase *) {
				return &VPBasicBlock::Recipes;
				}

				mkuper (Not Done): Is this functionality we want?
				Ayal (Author, Not Done): Possibly, but not now, removed.
				VPBasicBlock(const Twine &Name = "", VPRecipeBase *Recipe = nullptr)
				: VPBlockBase(VPBasicBlockSC, Name.str()) {
				mkuper (Not Done): I don't think we want a non-const version of this.
				Ayal (Author, Not Done): We can do w/o both versions, by only checking hasVF(unsigned).
				if (Recipe)
				appendRecipe(Recipe);
				}

				~VPBasicBlock() { Recipes.clear(); }

				/// Method to support type inquiry through isa, cast, and dyn_cast.
				static inline bool classof(const VPBlockBase *V) {
				return V->getVPBlockID() == VPBlockBase::VPBasicBlockSC;
				}
				mkuper (Not Done): Should this be static? Or maybe even a free function in the implementation?
				Ayal (Author, Not Done): Yes, this should be static. It's offloading the last part of VPlan::execute(), so good to keep adjacent.

				/// Augment the existing recipes of a VPBasicBlock with an additional
				/// \p Recipe as the last recipe.
				void appendRecipe(VPRecipeBase *Recipe) {
				assert(Recipe && "No recipe to append.");
				assert(!Recipe->Parent && "Recipe already in VPlan");
				Recipe->Parent = this;
				return Recipes.push_back(Recipe);
				}

				/// The method which generates the output IR instructions that correspond to
				/// this VPBasicBlock, thereby "executing" the VPlan.
				void execute(struct VPTransformState *State) override;

				private:
				/// Create an IR BasicBlock to hold the output instructions generated by this
				/// VPBasicBlock, and return it. Update the CFGState accordingly.
				BasicBlock *createEmptyBasicBlock(VPTransformState::CFGState &CFG);
				};

				/// VPRegionBlock represents a collection of VPBasicBlocks and VPRegionBlocks
				/// which form a Single-Entry-Single-Exit subgraph of the output IR CFG.
				/// A VPRegionBlock may indicate that its contents are to be replicated several
				/// times. This is designed to support predicated scalarization, in which a
				/// scalar if-then code structure needs to be generated VF * UF times. Having
				/// this replication indicator helps to keep a single model for multiple
				/// candidate VF's. The actual replication takes place only once the desired VF
				/// and UF have been determined.
				class VPRegionBlock : public VPBlockBase {
				private:
				/// Hold the Single Entry of the SESE region modelled by the VPRegionBlock.
				VPBlockBase *Entry;

				/// Hold the Single Exit of the SESE region modelled by the VPRegionBlock.
				VPBlockBase *Exit;

				/// An indicator whether this region is to generate multiple replicated
				/// instances of output IR corresponding to its VPBlockBases.
				bool IsReplicator;

				public:
				VPRegionBlock(VPBlockBase *Entry, VPBlockBase *Exit,
				const std::string &Name = "", bool IsReplicator = false)
				: VPBlockBase(VPRegionBlockSC, Name), Entry(Entry), Exit(Exit),
				IsReplicator(IsReplicator) {
				assert(Entry->getPredecessors().empty() && "Entry block has predecessors.");
				assert(Exit->getSuccessors().empty() && "Exit block has successors.");
				Entry->setParent(this);
				Exit->setParent(this);
				}

				~VPRegionBlock() {
				if (Entry)
				deleteCFG(Entry);
				}

				/// Method to support type inquiry through isa, cast, and dyn_cast.
				static inline bool classof(const VPBlockBase *V) {
				return V->getVPBlockID() == VPBlockBase::VPRegionBlockSC;
				}

				const VPBlockBase *getEntry() const { return Entry; }
				VPBlockBase *getEntry() { return Entry; }

				const VPBlockBase *getExit() const { return Exit; }
				VPBlockBase *getExit() { return Exit; }

				/// An indicator whether this region is to generate multiple replicated
				/// instances of output IR corresponding to its VPBlockBases.
				bool isReplicator() const { return IsReplicator; }

				/// The method which generates the output IR instructions that correspond to
				/// this VPRegionBlock, thereby "executing" the VPlan.
				void execute(struct VPTransformState *State) override;
				};

				/// VPlan models a candidate for vectorization, encoding various decisions taken
				/// to produce efficient output IR, including which branches, basic-blocks and
				/// output IR instructions to generate, and their cost. VPlan holds a
				/// Hierarchical-CFG of VPBasicBlocks and VPRegionBlocks rooted at an Entry
				/// VPBlock.
				class VPlan {
				private:
				/// Hold the single entry to the Hierarchical CFG of the VPlan.
				VPBlockBase *Entry;

				/// Holds the VFs applicable to this VPlan.
				SmallSet<unsigned, 2> VFs;

				/// Holds the name of the VPlan, for printing.
				std::string Name;

				public:
				VPlan(VPBlockBase *Entry = nullptr) : Entry(Entry) {}

				~VPlan() {
				if (Entry)
				VPBlockBase::deleteCFG(Entry);
				}

				/// Generate the IR code for this VPlan.
				void execute(struct VPTransformState *State);

				VPBlockBase *getEntry() { return Entry; }
				const VPBlockBase *getEntry() const { return Entry; }

				VPBlockBase *setEntry(VPBlockBase *Block) { return Entry = Block; }

				void addVF(unsigned VF) { VFs.insert(VF); }

				bool hasVF(unsigned VF) { return VFs.count(VF); }

				const std::string &getName() const { return Name; }

				void setName(const Twine &newName) { Name = newName.str(); }

				private:
				/// Add to the given dominator tree the header block and every new basic block
				/// that was created between it and the latch block, inclusive.
				static void updateDominatorTree(class DominatorTree *DT,
				BasicBlock *LoopPreHeaderBB,
				BasicBlock *LoopLatchBB);
				};

				/// VPlanPrinter prints a given VPlan to a given output stream. The printing is
				/// indented and follows the dot format.
				class VPlanPrinter {
				friend inline raw_ostream &operator<<(raw_ostream &OS, VPlan &Plan);
				friend inline raw_ostream &operator<<(raw_ostream &OS,
				const struct VPlanIngredient &I);

				private:
				raw_ostream &OS;
				VPlan &Plan;
				unsigned Depth;
				unsigned TabWidth = 2;
				std::string Indent;

				unsigned BID = 0;

				SmallDenseMap<const VPBlockBase *, unsigned> BlockID;

				/// Handle indentation.
				void bumpIndent(int b) { Indent = std::string((Depth += b) * TabWidth, ' '); }

				/// Print a given \p Block of the Plan.
				void dumpBlock(const VPBlockBase *Block);

				/// Print the information related to the CFG edges going out of a given
				/// \p Block, followed by printing the successor blocks themselves.
				void dumpEdges(const VPBlockBase *Block);

				/// Print a given \p BasicBlock, including its VPRecipes, followed by printing
				/// its successor blocks.
				void dumpBasicBlock(const VPBasicBlock *BasicBlock);

				/// Print a given \p Region of the Plan.
				void dumpRegion(const VPRegionBlock *Region);

				unsigned getOrCreateBID(const VPBlockBase *Block) {
				return BlockID.count(Block) ? BlockID[Block] : BlockID[Block] = BID++;
				}

				const Twine getOrCreateName(const VPBlockBase *Block);

				const Twine getUID(const VPBlockBase *Block);

				/// Print the information related to a CFG edge between two VPBlockBases.
				void drawEdge(const VPBlockBase *From, const VPBlockBase *To, bool Hidden,
				const Twine &Label);

				VPlanPrinter(raw_ostream &O, VPlan &P) : OS(O), Plan(P) {}

				void dump();

				static void printAsIngredient(raw_ostream &O, Value *V);
				};

				struct VPlanIngredient {
				Value *V;
				VPlanIngredient(Value *V) : V(V) {}
				};

				inline raw_ostream &operator<<(raw_ostream &OS, const VPlanIngredient &I) {
				VPlanPrinter::printAsIngredient(OS, I.V);
				return OS;
				}

				inline raw_ostream &operator<<(raw_ostream &OS, VPlan &Plan) {
				VPlanPrinter Printer(OS, Plan);
				Printer.dump();
				return OS;
				}

				//===--------------------------------------------------------------------===//
				// GraphTraits specializations for VPlan/VPRegionBlock Control-Flow Graphs //
				//===--------------------------------------------------------------------===//

				// Provide specializations of GraphTraits to be able to treat a VPBlockBase as a
				// graph of VPBlockBase nodes...

				template <> struct GraphTraits<VPBlockBase *> {
				typedef VPBlockBase *NodeRef;
				typedef SmallVectorImpl<VPBlockBase *>::iterator ChildIteratorType;

				static NodeRef getEntryNode(NodeRef N) { return N; }

				static inline ChildIteratorType child_begin(NodeRef N) {
				return N->getSuccessors().begin();
				}

				static inline ChildIteratorType child_end(NodeRef N) {
				return N->getSuccessors().end();
				}
				};

				template <> struct GraphTraits<const VPBlockBase *> {
				typedef const VPBlockBase *NodeRef;
				typedef SmallVectorImpl<VPBlockBase *>::const_iterator ChildIteratorType;

				static NodeRef getEntryNode(NodeRef N) { return N; }

				static inline ChildIteratorType child_begin(NodeRef N) {
				return N->getSuccessors().begin();
				}

				static inline ChildIteratorType child_end(NodeRef N) {
				return N->getSuccessors().end();
				}
				};

				// Provide specializations of GraphTraits to be able to treat a VPBlockBase as a
				// graph of VPBlockBase nodes... and to walk it in inverse order. Inverse order
				// for a VPBlockBase is considered to be when traversing the predecessors of a
				// VPBlockBase instead of its successors.
				//

				template <> struct GraphTraits<Inverse<VPBlockBase *>> {
				typedef VPBlockBase *NodeRef;
				typedef SmallVectorImpl<VPBlockBase *>::iterator ChildIteratorType;

				static Inverse<VPBlockBase *> getEntryNode(Inverse<VPBlockBase *> B) {
				return B;
				}

				static inline ChildIteratorType child_begin(NodeRef N) {
				return N->getPredecessors().begin();
				}

				static inline ChildIteratorType child_end(NodeRef N) {
				return N->getPredecessors().end();
				}
				};

				} // namespace llvm

				#endif // LLVM_TRANSFORMS_VECTORIZE_VPLAN_H

lib/Transforms/Vectorize/VPlan.cpp

				//===- VPlan.cpp - Vectorizer Plan ----------------------------------------===//
				//
				// The LLVM Compiler Infrastructure
				//
				// This file is distributed under the University of Illinois Open Source
				// License. See LICENSE.TXT for details.
				//
				//===----------------------------------------------------------------------===//
				///
				/// \file
				/// This is the LLVM vectorization plan. It represents a candidate for
				/// vectorization, allowing to plan and optimize how to vectorize a given loop
				/// before generating LLVM-IR.
				/// The vectorizer uses vectorization plans to estimate the costs of potential
				/// candidates and if profitable to execute the desired plan, generating vector
				/// LLVM-IR code.
				///
				//===----------------------------------------------------------------------===//

				#include "VPlan.h"
				#include "llvm/ADT/PostOrderIterator.h"
				#include "llvm/Analysis/LoopInfo.h"
				#include "llvm/IR/BasicBlock.h"
				#include "llvm/IR/Dominators.h"
				#include "llvm/Support/GraphWriter.h"
				#include "llvm/Transforms/Utils/BasicBlockUtils.h"

				using namespace llvm;

				#define DEBUG_TYPE "vplan"

				/// \return the VPBasicBlock that is the entry of Block, possibly indirectly.
				const VPBasicBlock *VPBlockBase::getEntryBasicBlock() const {
				const VPBlockBase *Block = this;
				while (const VPRegionBlock *Region = dyn_cast<VPRegionBlock>(Block))
				Block = Region->getEntry();
				return cast<VPBasicBlock>(Block);
				}

				VPBasicBlock *VPBlockBase::getEntryBasicBlock() {
				VPBlockBase *Block = this;
				while (VPRegionBlock *Region = dyn_cast<VPRegionBlock>(Block))
				Block = Region->getEntry();
				mkuper (Not Done): Any chance to make the const/non-const versions, here and below, share implementation? It seems like it ought to be possible.
				Ayal (Author, Not Done): Not sure how, w/o casting away const. We can drop the non-const getEntryBasicBlock() as only its const version is currently used, but both versions of getExitBasicBlock() are used.
				Ayal (Author, Not Done): Leaving them as is.
				return cast<VPBasicBlock>(Block);
				}

				/// \return the VPBasicBlock that is the exit of Block, possibly indirectly.
				const VPBasicBlock *VPBlockBase::getExitBasicBlock() const {
				const VPBlockBase *Block = this;
				while (const VPRegionBlock *Region = dyn_cast<VPRegionBlock>(Block))
				Block = Region->getExit();
				return cast<VPBasicBlock>(Block);
				}

				VPBasicBlock *VPBlockBase::getExitBasicBlock() {
				VPBlockBase *Block = this;
				while (VPRegionBlock *Region = dyn_cast<VPRegionBlock>(Block))
				Block = Region->getExit();
				return cast<VPBasicBlock>(Block);
				}

				VPBlockBase *VPBlockBase::getEnclosingBlockWithSuccessors() {
				if (!Successors.empty() || !Parent)
				return this;
				assert(Parent->getExit() == this &&
				"Block w/o successors not the exit of its parent.");
				return Parent->getEnclosingBlockWithSuccessors();
				}

				VPBlockBase *VPBlockBase::getEnclosingBlockWithPredecessors() {
				if (!Predecessors.empty() || !Parent)
				return this;
				assert(Parent->getEntry() == this &&
				"Block w/o predecessors not the entry of its parent.");
				return Parent->getEnclosingBlockWithPredecessors();
				}

				void VPBlockBase::deleteCFG(VPBlockBase *Entry) {
				SmallVector<VPBlockBase *, 8> Blocks;
				for (VPBlockBase *Block : depth_first(Entry))
				Blocks.push_back(Block);

				for (VPBlockBase *Block : Blocks)
				delete Block;
				}
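deleteCFG() first collects all reachable blocks and only then deletes them, so the traversal never follows successor pointers of a block that has already been freed. The following is a minimal sketch of that collect-then-delete pattern, assuming a hypothetical `Node` type and an explicit worklist in place of LLVM's depth_first(); the `Live` counter exists only to make the effect observable.

```cpp
#include <cassert>
#include <set>
#include <vector>

// Hypothetical graph node (assumption: stands in for VPBlockBase).
struct Node {
  static int Live; // Number of nodes currently allocated.
  std::vector<Node *> Successors;
  Node() { ++Live; }
  ~Node() { --Live; }
};
int Node::Live = 0;

// Collect every node reachable from Entry exactly once, then delete them all.
// Deleting during the walk would read Successors of freed nodes, and a node
// with two predecessors (a diamond) would be deleted twice.
void deleteGraph(Node *Entry) {
  std::set<Node *> Visited;
  std::vector<Node *> Worklist{Entry};
  std::vector<Node *> Blocks;
  while (!Worklist.empty()) {
    Node *N = Worklist.back();
    Worklist.pop_back();
    if (!Visited.insert(N).second)
      continue; // Already collected via another path.
    Blocks.push_back(N);
    for (Node *S : N->Successors)
      Worklist.push_back(S);
  }
  for (Node *N : Blocks)
    delete N;
}
```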

				BasicBlock *
				VPBasicBlock::createEmptyBasicBlock(VPTransformState::CFGState &CFG) {
				// BB stands for IR BasicBlocks. VPBB stands for VPlan VPBasicBlocks.
				// Pred stands for Predecessor. Prev stands for Previous - last visited/created.
				BasicBlock *PrevBB = CFG.PrevBB;
				BasicBlock *NewBB = BasicBlock::Create(PrevBB->getContext(), getName(),
				PrevBB->getParent(), CFG.LastBB);
				DEBUG(dbgs() << "LV: created " << NewBB->getName() << '\n');

				// Hook up the new basic block to its predecessors.
				for (VPBlockBase *PredVPBlock : getHierarchicalPredecessors()) {
				VPBasicBlock *PredVPBB = PredVPBlock->getExitBasicBlock();
				auto &PredVPSuccessors = PredVPBB->getSuccessors();
				BasicBlock *PredBB = CFG.VPBB2IRBB[PredVPBB];
				assert(PredBB && "Predecessor basic-block not found building successor.");
				auto *PredBBTerminator = PredBB->getTerminator();
				DEBUG(dbgs() << "LV: draw edge from" << PredBB->getName() << '\n');
				if (isa<UnreachableInst>(PredBBTerminator)) {
				assert(PredVPSuccessors.size() == 1 &&
				"Predecessor ending w/o branch must have single successor.");
				PredBBTerminator->eraseFromParent();
				BranchInst::Create(NewBB, PredBB);
				} else {
				assert(PredVPSuccessors.size() == 2 &&
				"Predecessor ending with branch must have two successors.");
				unsigned idx = PredVPSuccessors.front() == this ? 0 : 1;
				assert(!PredBBTerminator->getSuccessor(idx) &&
				"Trying to reset an existing successor block.");
				PredBBTerminator->setSuccessor(idx, NewBB);
				}
				}
				return NewBB;
				}

				void VPBasicBlock::execute(VPTransformState *State) {
				bool Replica = State->Instance &&
				!(State->Instance->Part == 0 && State->Instance->Lane == 0);
				VPBasicBlock *PrevVPBB = State->CFG.PrevVPBB;
				VPBlockBase *SingleHPred = nullptr;
				BasicBlock *NewBB = State->CFG.PrevBB; // Reuse it if possible.

				// 1. Create an IR basic block, or reuse the last one if possible.
				// The last IR basic block is reused, as an optimization, in three cases:
				// A. the first VPBB reuses the loop header BB - when PrevVPBB is null;
				// B. when the current VPBB has a single (hierarchical) predecessor which
				// is PrevVPBB and the latter has a single (hierarchical) successor; and
				// C. when the current VPBB is an entry of a region replica - where PrevVPBB
				// is the exit of this region from a previous instance, or the predecessor
				// of this region.
				if (PrevVPBB && /* A */
				!((SingleHPred = getSingleHierarchicalPredecessor()) &&
				SingleHPred->getExitBasicBlock() == PrevVPBB &&
				PrevVPBB->getSingleHierarchicalSuccessor()) && /* B */
				!(Replica && getPredecessors().empty())) { /* C */

				NewBB = createEmptyBasicBlock(State->CFG);
				State->Builder.SetInsertPoint(NewBB);
				// Temporarily terminate with unreachable until CFG is rewired.
				UnreachableInst *Terminator = State->Builder.CreateUnreachable();
				State->Builder.SetInsertPoint(Terminator);
				// Register NewBB in its loop. In innermost loops it's the same for all BB's.
				Loop *L = State->LI->getLoopFor(State->CFG.LastBB);
				L->addBasicBlockToLoop(NewBB, *State->LI);
				State->CFG.PrevBB = NewBB;
				}

				// 2. Fill the IR basic block with IR instructions.
				DEBUG(dbgs() << "LV: vectorizing VPBB:" << getName()
				<< " in BB:" << NewBB->getName() << '\n');

				State->CFG.VPBB2IRBB[this] = NewBB;
				State->CFG.PrevVPBB = this;

				for (VPRecipeBase &Recipe : Recipes)
				Recipe.execute(*State);

				DEBUG(dbgs() << "LV: filled BB:" << *NewBB);
				}
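The block-reuse test in step 1 of VPBasicBlock::execute() packs cases A, B and C into one condition. As a reading aid, here is a sketch that flattens it into plain booleans. `ReuseQuery` and `needsNewBasicBlock` are hypothetical names, and each field is assumed to hold the result of the corresponding sub-expression in the patch.

```cpp
#include <cassert>

// Hypothetical flattening of the reuse test (assumption: each flag is the
// value of the matching sub-expression in VPBasicBlock::execute()).
struct ReuseQuery {
  bool HasPrevVPBB;          // False only for the first VPBB (case A).
  bool StraightLineFromPrev; // Single hierarchical predecessor exits at
                             // PrevVPBB, which has a single hierarchical
                             // successor (case B).
  bool IsReplicaRegionEntry; // Replica instance with no direct
                             // predecessors (case C).
};

// A new IR basic block is created only when none of the reuse cases applies.
bool needsNewBasicBlock(const ReuseQuery &Q) {
  return Q.HasPrevVPBB && !Q.StraightLineFromPrev && !Q.IsReplicaRegionEntry;
}
```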

				void VPRegionBlock::execute(VPTransformState *State) {
				ReversePostOrderTraversal<VPBlockBase *> RPOT(Entry);

				if (!isReplicator()) {
				// Visit the VPBlocks connected to "this", starting from it.
				for (VPBlockBase *Block : RPOT) {
				DEBUG(dbgs() << "LV: VPBlock in RPO " << Block->getName() << '\n');
				Block->execute(State);
				}
				return;
				}

				assert(!State->Instance && "Replicating a Region with non-null instance.");

				// Enter replicating mode.
				State->Instance = {0, 0};

				for (unsigned Part = 0, UF = State->UF; Part < UF; ++Part) {
				State->Instance->Part = Part;
				for (unsigned Lane = 0, VF = State->VF; Lane < VF; ++Lane) {
				State->Instance->Lane = Lane;
				// Visit the VPBlocks connected to \p this, starting from it.
				for (VPBlockBase *Block : RPOT) {
				DEBUG(dbgs() << "LV: VPBlock in RPO " << Block->getName() << '\n');
				Block->execute(State);
				}
				mkuper (Not Done): This is extremely confusing. You're modifying what looks like a local variable to actually change the State->Instance for the State that will get passed to Block->execute(). Could you rewrite in a way that makes it clear what's going on?
				Ayal (Author, Not Done): OK, sure; will do so using an Optional<Instance>.
				}
				}

				// Exit replicating mode.
				State->Instance.reset();
				}
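For a replicating region, the loop nest above visits the region's blocks once per (Part, Lane) instance, UF * VF times in total, with all lanes of one part generated before the next part begins. A minimal sketch of that visiting order follows; `replicaOrder` is a hypothetical helper that only records the instances and generates no IR.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical helper: lists the (Part, Lane) instances in the order a
// replicating region would visit them, i.e. all lanes of part 0, then all
// lanes of part 1, and so on.
std::vector<std::pair<unsigned, unsigned>> replicaOrder(unsigned UF,
                                                        unsigned VF) {
  std::vector<std::pair<unsigned, unsigned>> Order;
  for (unsigned Part = 0; Part < UF; ++Part)
    for (unsigned Lane = 0; Lane < VF; ++Lane)
      Order.emplace_back(Part, Lane);
  return Order;
}
```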

				/// Generate the code inside the body of the vectorized loop. Assumes a single
				/// LoopVectorBody basic-block was created for this. Introduce additional
				/// basic-blocks as needed, and fill them all.
				void VPlan::execute(VPTransformState *State) {
				BasicBlock *VectorPreHeaderBB = State->CFG.PrevBB;
				BasicBlock *VectorHeaderBB = VectorPreHeaderBB->getSingleSuccessor();
				assert(VectorHeaderBB && "Loop preheader does not have a single successor.");
				BasicBlock *VectorLatchBB = VectorHeaderBB;

				// 1. Make room to generate basic-blocks inside loop body if needed.
				VectorLatchBB = VectorHeaderBB->splitBasicBlock(
				VectorHeaderBB->getFirstInsertionPt(), "vector.body.latch");
				mkuper: Why do we need to save/restore the builder IP?
				Ayal (author): Ah, we don't :-).
				Loop *L = State->LI->getLoopFor(VectorHeaderBB);
				L->addBasicBlockToLoop(VectorLatchBB, *State->LI);
				// Remove the edge between Header and Latch to allow other connections.
				// Temporarily terminate with unreachable until CFG is rewired.
				// Note: this asserts the generated code's assumption that
				// getFirstInsertionPt() can be dereferenced into an Instruction.
				VectorHeaderBB->getTerminator()->eraseFromParent();
				State->Builder.SetInsertPoint(VectorHeaderBB);
				UnreachableInst *Terminator = State->Builder.CreateUnreachable();
				mkuper: I don't understand this note. :-)
				Ayal (author): Some subsequent code dereferences getFirstInsertionPt(), in LV or in code called by LV, without checking if it might be the end of a block.
				State->Builder.SetInsertPoint(Terminator);

				// 2. Generate code in loop body.
				State->CFG.PrevVPBB = nullptr;
				State->CFG.PrevBB = VectorHeaderBB;
				State->CFG.LastBB = VectorLatchBB;

				for (VPBlockBase *Block : depth_first(Entry))
				Block->execute(State);

				// 3. Merge the temporary latch created with the last basic-block filled.
				BasicBlock *LastBB = State->CFG.PrevBB;
				// Connect LastBB to VectorLatchBB to facilitate their merge.
				assert(isa<UnreachableInst>(LastBB->getTerminator()) &&
				"Expected VPlan CFG to terminate with unreachable");
				LastBB->getTerminator()->eraseFromParent();
				BranchInst::Create(VectorLatchBB, LastBB);

				// Merge LastBB with Latch.
				bool Merged = MergeBlockIntoPredecessor(VectorLatchBB, nullptr, State->LI);
				(void)Merged;
				assert(Merged && "Could not merge last basic block with latch.");
				VectorLatchBB = LastBB;

				updateDominatorTree(State->DT, VectorPreHeaderBB, VectorLatchBB);
				}

				void VPlan::updateDominatorTree(DominatorTree *DT, BasicBlock *LoopPreHeaderBB,
				BasicBlock *LoopLatchBB) {
				BasicBlock *LoopHeaderBB = LoopPreHeaderBB->getSingleSuccessor();
				assert(LoopHeaderBB && "Loop preheader does not have a single successor.");
				DT->addNewBlock(LoopHeaderBB, LoopPreHeaderBB);
				// The vector body may be more than a single basic-block by this point.
				// Update the dominator tree information inside the vector body by propagating
				// it from header to latch, expecting only triangular control-flow, if any.
				BasicBlock *PostDomSucc = nullptr;
				for (auto *BB = LoopHeaderBB; BB != LoopLatchBB; BB = PostDomSucc) {
				// Get the list of successors of this block.
				std::vector<BasicBlock *> Succs(succ_begin(BB), succ_end(BB));
				assert(Succs.size() <= 2 &&
				"Basic block in vector loop has more than 2 successors.");
				PostDomSucc = Succs[0];
				if (Succs.size() == 1) {
				assert(PostDomSucc->getSinglePredecessor() &&
				"PostDom successor has more than one predecessor.");
				DT->addNewBlock(PostDomSucc, BB);
				continue;
				}
				BasicBlock *InterimSucc = Succs[1];
				if (PostDomSucc->getSingleSuccessor() == InterimSucc) {
				PostDomSucc = Succs[1];
				InterimSucc = Succs[0];
				}
				assert(InterimSucc->getSingleSuccessor() == PostDomSucc &&
				"One successor of a basic block does not lead to the other.");
				assert(InterimSucc->getSinglePredecessor() &&
				"Interim successor has more than one predecessor.");
				assert(std::distance(pred_begin(PostDomSucc), pred_end(PostDomSucc)) == 2 &&
				"PostDom successor has more than two predecessors.");
				DT->addNewBlock(InterimSucc, BB);
				DT->addNewBlock(PostDomSucc, BB);
				}
				}
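The walk in updateDominatorTree() can be illustrated outside LLVM. The sketch below assumes blocks are plain integers and the CFG is a successor map; the names SuccMapSketch, singleSuccessorSketch and propagateIDomsSketch are hypothetical. It returns an immediate-dominator map instead of updating a DominatorTree, and it omits the predecessor-count assertions of the original.

```cpp
#include <cassert>
#include <map>
#include <utility>
#include <vector>

// Hypothetical CFG representation: block id -> ordered successor ids.
using SuccMapSketch = std::map<int, std::vector<int>>;

// Returns the single successor of BB, or -1 if it has none or several.
inline int singleSuccessorSketch(const SuccMapSketch &Succs, int BB) {
  auto It = Succs.find(BB);
  if (It == Succs.end() || It->second.size() != 1)
    return -1;
  return It->second.front();
}

// Walks from Header to Latch, expecting only straight-line or triangular
// control flow, and records the immediate dominator of each block reached.
inline std::map<int, int> propagateIDomsSketch(const SuccMapSketch &Succs,
                                               int Header, int Latch) {
  std::map<int, int> IDom;
  int BB = Header;
  while (BB != Latch) {
    const std::vector<int> &S = Succs.at(BB);
    assert(!S.empty() && S.size() <= 2 && "Unexpected successor count.");
    int PostDomSucc = S[0];
    if (S.size() == 2) {
      int InterimSucc = S[1];
      // The interim block is the one that falls through to the other.
      if (singleSuccessorSketch(Succs, PostDomSucc) == InterimSucc)
        std::swap(PostDomSucc, InterimSucc);
      assert(singleSuccessorSketch(Succs, InterimSucc) == PostDomSucc &&
             "One successor does not lead to the other.");
      IDom[InterimSucc] = BB;
    }
    // Both arms of a triangle are immediately dominated by its head.
    IDom[PostDomSucc] = BB;
    BB = PostDomSucc;
  }
  return IDom;
}
```

For a triangle 0 -> {1, 2}, 1 -> {2}, followed by 2 -> {3}, the walk treats block 1 as the interim block and block 2 as the join, so both are dominated by block 0.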

				const Twine VPlanPrinter::getUID(const VPBlockBase *Block) {
				return (isa<VPRegionBlock>(Block) ? "cluster_N" : "N") +
				Twine(getOrCreateBID(Block));
				}

				const Twine VPlanPrinter::getOrCreateName(const VPBlockBase *Block) {
				const std::string &Name = Block->getName();
				if (!Name.empty())
				return Name;
				return "VPB" + Twine(getOrCreateBID(Block));
				}

				void VPlanPrinter::dump() {
				Depth = 1;
				bumpIndent(0);
				OS << "digraph VPlan {\n";
				OS << "graph [labelloc=t, fontsize=30; label=\"Vectorization Plan";
				if (!Plan.getName().empty())
				OS << "\\n" << DOT::EscapeString(Plan.getName());
				OS << "\"]\n";
				OS << "node [shape=rect, fontname=Courier, fontsize=30]\n";
				OS << "edge [fontname=Courier, fontsize=30]\n";
				OS << "compound=true\n";

				for (VPBlockBase *Block : depth_first(Plan.getEntry()))
				dumpBlock(Block);

				OS << "}\n";
				}

				void VPlanPrinter::dumpBlock(const VPBlockBase *Block) {
				if (const VPBasicBlock *BasicBlock = dyn_cast<VPBasicBlock>(Block))
				dumpBasicBlock(BasicBlock);
				else if (const VPRegionBlock *Region = dyn_cast<VPRegionBlock>(Block))
				dumpRegion(Region);
				else
				llvm_unreachable("Unsupported kind of VPBlock.");
				}

				void VPlanPrinter::drawEdge(const VPBlockBase *From, const VPBlockBase *To,
				bool Hidden, const Twine &Label) {
				// Due to "dot" we print an edge between two regions as an edge between the
				// exit basic block and the entry basic block of the respective regions.
				const VPBlockBase *Tail = From->getExitBasicBlock();
				const VPBlockBase *Head = To->getEntryBasicBlock();
				OS << Indent << getUID(Tail) << " -> " << getUID(Head);
				OS << " [ label=\"" << Label << '\"';
				if (Tail != From)
				OS << " ltail=" << getUID(From);
				if (Head != To)
				OS << " lhead=" << getUID(To);
				if (Hidden)
				OS << "; splines=none";
				OS << "]\n";
				}
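As a rough illustration of the edge text drawEdge() emits, the sketch below builds one DOT edge as a string. The function name formatDotEdgeSketch and its parameters are hypothetical, and the Hidden flag is left out. In Graphviz, ltail and lhead clip an edge at a cluster boundary and only take effect when the graph sets compound=true, which dump() does.

```cpp
#include <cassert>
#include <string>

// Formats one DOT edge between two node ids. FromCluster / ToCluster are
// non-empty only when the edge logically starts or ends at a region, in
// which case the edge is attached to the region's exit / entry basic block
// and clipped at the cluster boundary via ltail / lhead.
inline std::string formatDotEdgeSketch(const std::string &Tail,
                                       const std::string &Head,
                                       const std::string &Label,
                                       const std::string &FromCluster = "",
                                       const std::string &ToCluster = "") {
  std::string Out = Tail + " -> " + Head + " [ label=\"" + Label + "\"";
  if (!FromCluster.empty())
    Out += " ltail=" + FromCluster;
  if (!ToCluster.empty())
    Out += " lhead=" + ToCluster;
  Out += "]";
  return Out;
}
```

A plain block-to-block edge carries only a label, while an edge between two regions names both clusters in addition to the concrete exit and entry nodes.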

				void VPlanPrinter::dumpEdges(const VPBlockBase *Block) {
				auto &Successors = Block->getSuccessors();
				if (Successors.size() == 1)
				drawEdge(Block, Successors.front(), false, "");
				else if (Successors.size() == 2) {
				drawEdge(Block, Successors.front(), false, "T");
				drawEdge(Block, Successors.back(), false, "F");
				} else {
				unsigned SuccessorNumber = 0;
				for (auto *Successor : Successors)
				drawEdge(Block, Successor, false, Twine(SuccessorNumber++));
				}
				}

				void VPlanPrinter::dumpBasicBlock(const VPBasicBlock *BasicBlock) {
				OS << Indent << getUID(BasicBlock) << " [label =\n";
				bumpIndent(1);
				OS << Indent << "\"" << DOT::EscapeString(BasicBlock->getName()) << ":\\n\"";
				bumpIndent(1);
				for (const VPRecipeBase &Recipe : *BasicBlock)
				Recipe.print(OS, Indent);
				bumpIndent(-2);
				OS << "\n" << Indent << "]\n";
				dumpEdges(BasicBlock);
				}

				void VPlanPrinter::dumpRegion(const VPRegionBlock *Region) {
				OS << Indent << "subgraph " << getUID(Region) << " {\n";
				bumpIndent(1);
				OS << Indent << "fontname=Courier\n"
				<< Indent << "label=\""
				<< DOT::EscapeString(Region->isReplicator() ? "<xVFxUF> " : "<x1> ")
				<< DOT::EscapeString(Region->getName()) << "\"\n";
				// Dump the blocks of the region.
				assert(Region->getEntry() && "Region contains no inner blocks.");
				for (const VPBlockBase *Block : depth_first(Region->getEntry()))
				dumpBlock(Block);
				bumpIndent(-1);
				OS << Indent << "}\n";
				dumpEdges(Region);
				}

				void VPlanPrinter::printAsIngredient(raw_ostream &O, Value *V) {
				std::string IngredientString;
				raw_string_ostream RSO(IngredientString);
				if (auto *Inst = dyn_cast<Instruction>(V)) {
				if (!Inst->getType()->isVoidTy()) {
				Inst->printAsOperand(RSO, false);
				RSO << " = ";
				}
				RSO << Inst->getOpcodeName() << " ";
				unsigned E = Inst->getNumOperands();
				if (E > 0) {
				Inst->getOperand(0)->printAsOperand(RSO, false);
				for (unsigned I = 1; I < E; ++I)
				Inst->getOperand(I)->printAsOperand(RSO << ", ", false);
				}
				} else // !Inst
				V->printAsOperand(RSO, false);
				RSO.flush();
				O << DOT::EscapeString(IngredientString);
				}

test/Transforms/LoopVectorize/AArch64/aarch64-predication.ll

	Show All 20 Lines
	; CHECK-NEXT: [[TMP0:%.*]] = getelementptr inbounds i64, i64* %a, i64 [[INDEX]]			; CHECK-NEXT: [[TMP0:%.*]] = getelementptr inbounds i64, i64* %a, i64 [[INDEX]]
	; CHECK-NEXT: [[TMP1:%.*]] = bitcast i64* [[TMP0]] to <2 x i64>*			; CHECK-NEXT: [[TMP1:%.*]] = bitcast i64* [[TMP0]] to <2 x i64>*
	; CHECK-NEXT: [[WIDE_LOAD:%.*]] = load <2 x i64>, <2 x i64>* [[TMP1]], align 4			; CHECK-NEXT: [[WIDE_LOAD:%.*]] = load <2 x i64>, <2 x i64>* [[TMP1]], align 4
	; CHECK-NEXT: [[TMP2:%.*]] = icmp sgt <2 x i64> [[WIDE_LOAD]], zeroinitializer			; CHECK-NEXT: [[TMP2:%.*]] = icmp sgt <2 x i64> [[WIDE_LOAD]], zeroinitializer
	; CHECK-NEXT: [[TMP3:%.*]] = extractelement <2 x i1> [[TMP2]], i32 0			; CHECK-NEXT: [[TMP3:%.*]] = extractelement <2 x i1> [[TMP2]], i32 0
	; CHECK-NEXT: br i1 [[TMP3]], label %[[PRED_UDIV_IF:.*]], label %[[PRED_UDIV_CONTINUE:.*]]			; CHECK-NEXT: br i1 [[TMP3]], label %[[PRED_UDIV_IF:.*]], label %[[PRED_UDIV_CONTINUE:.*]]
	; CHECK: [[PRED_UDIV_IF]]:			; CHECK: [[PRED_UDIV_IF]]:
	; CHECK-NEXT: [[TMP4:%.*]] = extractelement <2 x i64> [[WIDE_LOAD]], i32 0			; CHECK-NEXT: [[TMP4:%.*]] = extractelement <2 x i64> [[WIDE_LOAD]], i32 0
	; CHECK-NEXT: [[TMP5:%.*]] = extractelement <2 x i64> [[WIDE_LOAD]], i32 0			; CHECK-NEXT: [[TMP5:%.*]] = add nsw i64 [[TMP4]], %x
	; CHECK-NEXT: [[TMP6:%.*]] = add nsw i64 [[TMP5]], %x			; CHECK-NEXT: [[TMP6:%.*]] = extractelement <2 x i64> [[WIDE_LOAD]], i32 0
	; CHECK-NEXT: [[TMP7:%.*]] = udiv i64 [[TMP4]], [[TMP6]]			; CHECK-NEXT: [[TMP7:%.*]] = udiv i64 [[TMP6]], [[TMP5]]
	; CHECK-NEXT: [[TMP8:%.*]] = insertelement <2 x i64> undef, i64 [[TMP7]], i32 0			; CHECK-NEXT: [[TMP8:%.*]] = insertelement <2 x i64> undef, i64 [[TMP7]], i32 0
	; CHECK-NEXT: br label %[[PRED_UDIV_CONTINUE]]			; CHECK-NEXT: br label %[[PRED_UDIV_CONTINUE]]
	; CHECK: [[PRED_UDIV_CONTINUE]]:			; CHECK: [[PRED_UDIV_CONTINUE]]:
	; CHECK-NEXT: [[TMP9:%.*]] = phi <2 x i64> [ undef, %vector.body ], [ [[TMP8]], %[[PRED_UDIV_IF]] ]			; CHECK-NEXT: [[TMP9:%.*]] = phi <2 x i64> [ undef, %vector.body ], [ [[TMP8]], %[[PRED_UDIV_IF]] ]
	; CHECK-NEXT: [[TMP10:%.*]] = extractelement <2 x i1> [[TMP2]], i32 1			; CHECK-NEXT: [[TMP10:%.*]] = extractelement <2 x i1> [[TMP2]], i32 1
	; CHECK-NEXT: br i1 [[TMP10]], label %[[PRED_UDIV_IF1:.*]], label %[[PRED_UDIV_CONTINUE2]]			; CHECK-NEXT: br i1 [[TMP10]], label %[[PRED_UDIV_IF1:.*]], label %[[PRED_UDIV_CONTINUE2]]
	; CHECK: [[PRED_UDIV_IF1]]:			; CHECK: [[PRED_UDIV_IF1]]:
	; CHECK-NEXT: [[TMP11:%.*]] = extractelement <2 x i64> [[WIDE_LOAD]], i32 1			; CHECK-NEXT: [[TMP11:%.*]] = extractelement <2 x i64> [[WIDE_LOAD]], i32 1
	; CHECK-NEXT: [[TMP12:%.*]] = extractelement <2 x i64> [[WIDE_LOAD]], i32 1			; CHECK-NEXT: [[TMP12:%.*]] = add nsw i64 [[TMP11]], %x
	; CHECK-NEXT: [[TMP13:%.*]] = add nsw i64 [[TMP12]], %x			; CHECK-NEXT: [[TMP13:%.*]] = extractelement <2 x i64> [[WIDE_LOAD]], i32 1
	; CHECK-NEXT: [[TMP14:%.*]] = udiv i64 [[TMP11]], [[TMP13]]			; CHECK-NEXT: [[TMP14:%.*]] = udiv i64 [[TMP13]], [[TMP12]]
	; CHECK-NEXT: [[TMP15:%.*]] = insertelement <2 x i64> [[TMP9]], i64 [[TMP14]], i32 1			; CHECK-NEXT: [[TMP15:%.*]] = insertelement <2 x i64> [[TMP9]], i64 [[TMP14]], i32 1
	; CHECK-NEXT: br label %[[PRED_UDIV_CONTINUE2]]			; CHECK-NEXT: br label %[[PRED_UDIV_CONTINUE2]]
	; CHECK: [[PRED_UDIV_CONTINUE2]]:			; CHECK: [[PRED_UDIV_CONTINUE2]]:
	; CHECK-NEXT: [[TMP16:%.*]] = phi <2 x i64> [ [[TMP9]], %[[PRED_UDIV_CONTINUE]] ], [ [[TMP15]], %[[PRED_UDIV_IF1]] ]			; CHECK-NEXT: [[TMP16:%.*]] = phi <2 x i64> [ [[TMP9]], %[[PRED_UDIV_CONTINUE]] ], [ [[TMP15]], %[[PRED_UDIV_IF1]] ]
	; CHECK-NEXT: [[PREDPHI:%.*]] = select <2 x i1> [[TMP2]], <2 x i64> [[TMP16]], <2 x i64> [[WIDE_LOAD]]			; CHECK-NEXT: [[PREDPHI:%.*]] = select <2 x i1> [[TMP2]], <2 x i64> [[TMP16]], <2 x i64> [[WIDE_LOAD]]
	; CHECK-NEXT: [[TMP17]] = add <2 x i64> [[VEC_PHI]], [[PREDPHI]]			; CHECK-NEXT: [[TMP17]] = add <2 x i64> [[VEC_PHI]], [[PREDPHI]]
	; CHECK-NEXT: [[INDEX_NEXT]] = add i64 [[INDEX]], 2			; CHECK-NEXT: [[INDEX_NEXT]] = add i64 [[INDEX]], 2
	; CHECK: br i1 {{.*}}, label %middle.block, label %vector.body			; CHECK: br i1 {{.*}}, label %middle.block, label %vector.body
	Show All 29 Lines

test/Transforms/LoopVectorize/AArch64/predication_costs.ll

	Show All 12 Lines
	;			;
	; This test checks that we correctly compute the cost of the predicated udiv			; This test checks that we correctly compute the cost of the predicated udiv
	; instruction. If we assume the block probability is 50%, we compute the cost			; instruction. If we assume the block probability is 50%, we compute the cost
	; as:			; as:
	;			;
	; Cost of udiv:			; Cost of udiv:
	; (udiv(2) + extractelement(6) + insertelement(3)) / 2 = 5			; (udiv(2) + extractelement(6) + insertelement(3)) / 2 = 5
	;			;
	; CHECK: Found an estimated cost of 5 for VF 2 For instruction: %tmp4 = udiv i32 %tmp2, %tmp3
	; CHECK: Scalarizing and predicating: %tmp4 = udiv i32 %tmp2, %tmp3			; CHECK: Scalarizing and predicating: %tmp4 = udiv i32 %tmp2, %tmp3
				; CHECK: Found an estimated cost of 5 for VF 2 For instruction: %tmp4 = udiv i32 %tmp2, %tmp3
	;			;
	define i32 @predicated_udiv(i32* %a, i32* %b, i1 %c, i64 %n) {			define i32 @predicated_udiv(i32* %a, i32* %b, i1 %c, i64 %n) {
	entry:			entry:
	br label %for.body			br label %for.body

	for.body:			for.body:
	%i = phi i64 [ 0, %entry ], [ %i.next, %for.inc ]			%i = phi i64 [ 0, %entry ], [ %i.next, %for.inc ]
	%r = phi i32 [ 0, %entry ], [ %tmp6, %for.inc ]			%r = phi i32 [ 0, %entry ], [ %tmp6, %for.inc ]
	Show All 23 Lines
	;			;
	; This test checks that we correctly compute the cost of the predicated store			; This test checks that we correctly compute the cost of the predicated store
	; instruction. If we assume the block probability is 50%, we compute the cost			; instruction. If we assume the block probability is 50%, we compute the cost
	; as:			; as:
	;			;
	; Cost of store:			; Cost of store:
	; (store(4) + extractelement(3)) / 2 = 3			; (store(4) + extractelement(3)) / 2 = 3
	;			;
	; CHECK: Found an estimated cost of 3 for VF 2 For instruction: store i32 %tmp2, i32* %tmp0, align 4
	; CHECK: Scalarizing and predicating: store i32 %tmp2, i32* %tmp0, align 4			; CHECK: Scalarizing and predicating: store i32 %tmp2, i32* %tmp0, align 4
				; CHECK: Found an estimated cost of 3 for VF 2 For instruction: store i32 %tmp2, i32* %tmp0, align 4
	;			;
	define void @predicated_store(i32* %a, i1 %c, i32 %x, i64 %n) {			define void @predicated_store(i32* %a, i1 %c, i32 %x, i64 %n) {
	entry:			entry:
	br label %for.body			br label %for.body

	for.body:			for.body:
	%i = phi i64 [ 0, %entry ], [ %i.next, %for.inc ]			%i = phi i64 [ 0, %entry ], [ %i.next, %for.inc ]
	%tmp0 = getelementptr inbounds i32, i32* %a, i64 %i			%tmp0 = getelementptr inbounds i32, i32* %a, i64 %i
	Show All 21 Lines
	; inside the predicated block. If we assume the block probability is 50%, we			; inside the predicated block. If we assume the block probability is 50%, we
	; compute the cost as:			; compute the cost as:
	;			;
	; Cost of add:			; Cost of add:
	; (add(2) + extractelement(3)) / 2 = 2			; (add(2) + extractelement(3)) / 2 = 2
	; Cost of udiv:			; Cost of udiv:
	; (udiv(2) + extractelement(3) + insertelement(3)) / 2 = 4			; (udiv(2) + extractelement(3) + insertelement(3)) / 2 = 4
	;			;
	; CHECK: Found an estimated cost of 2 for VF 2 For instruction: %tmp3 = add nsw i32 %tmp2, %x
	; CHECK: Found an estimated cost of 4 for VF 2 For instruction: %tmp4 = udiv i32 %tmp2, %tmp3
	; CHECK: Scalarizing: %tmp3 = add nsw i32 %tmp2, %x			; CHECK: Scalarizing: %tmp3 = add nsw i32 %tmp2, %x
	; CHECK: Scalarizing and predicating: %tmp4 = udiv i32 %tmp2, %tmp3			; CHECK: Scalarizing and predicating: %tmp4 = udiv i32 %tmp2, %tmp3
				; CHECK: Found an estimated cost of 2 for VF 2 For instruction: %tmp3 = add nsw i32 %tmp2, %x
				; CHECK: Found an estimated cost of 4 for VF 2 For instruction: %tmp4 = udiv i32 %tmp2, %tmp3
	;			;
	define i32 @predicated_udiv_scalarized_operand(i32* %a, i1 %c, i32 %x, i64 %n) {			define i32 @predicated_udiv_scalarized_operand(i32* %a, i1 %c, i32 %x, i64 %n) {
	entry:			entry:
	br label %for.body			br label %for.body

	for.body:			for.body:
	%i = phi i64 [ 0, %entry ], [ %i.next, %for.inc ]			%i = phi i64 [ 0, %entry ], [ %i.next, %for.inc ]
	%r = phi i32 [ 0, %entry ], [ %tmp6, %for.inc ]			%r = phi i32 [ 0, %entry ], [ %tmp6, %for.inc ]
	Show All 25 Lines
	; inside the predicated block. If we assume the block probability is 50%, we			; inside the predicated block. If we assume the block probability is 50%, we
	; compute the cost as:			; compute the cost as:
	;			;
	; Cost of add:			; Cost of add:
	; (add(2) + extractelement(3)) / 2 = 2			; (add(2) + extractelement(3)) / 2 = 2
	; Cost of store:			; Cost of store:
	; store(4) / 2 = 2			; store(4) / 2 = 2
	;			;
	; CHECK: Found an estimated cost of 2 for VF 2 For instruction: %tmp2 = add nsw i32 %tmp1, %x
	; CHECK: Found an estimated cost of 2 for VF 2 For instruction: store i32 %tmp2, i32* %tmp0, align 4
	; CHECK: Scalarizing: %tmp2 = add nsw i32 %tmp1, %x			; CHECK: Scalarizing: %tmp2 = add nsw i32 %tmp1, %x
	; CHECK: Scalarizing and predicating: store i32 %tmp2, i32* %tmp0, align 4			; CHECK: Scalarizing and predicating: store i32 %tmp2, i32* %tmp0, align 4
				; CHECK: Found an estimated cost of 2 for VF 2 For instruction: %tmp2 = add nsw i32 %tmp1, %x
				; CHECK: Found an estimated cost of 2 for VF 2 For instruction: store i32 %tmp2, i32* %tmp0, align 4
	;			;
	define void @predicated_store_scalarized_operand(i32* %a, i1 %c, i32 %x, i64 %n) {			define void @predicated_store_scalarized_operand(i32* %a, i1 %c, i32 %x, i64 %n) {
	entry:			entry:
	br label %for.body			br label %for.body

	for.body:			for.body:
	%i = phi i64 [ 0, %entry ], [ %i.next, %for.inc ]			%i = phi i64 [ 0, %entry ], [ %i.next, %for.inc ]
	%tmp0 = getelementptr inbounds i32, i32* %a, i64 %i			%tmp0 = getelementptr inbounds i32, i32* %a, i64 %i
	Show All 29 Lines
	; (sdiv(2) + extractelement(6) + insertelement(3)) / 2 = 5			; (sdiv(2) + extractelement(6) + insertelement(3)) / 2 = 5
	; Cost of udiv:			; Cost of udiv:
	; (udiv(2) + extractelement(6) + insertelement(3)) / 2 = 5			; (udiv(2) + extractelement(6) + insertelement(3)) / 2 = 5
	; Cost of sub:			; Cost of sub:
	; (sub(2) + extractelement(3)) / 2 = 2			; (sub(2) + extractelement(3)) / 2 = 2
	; Cost of store:			; Cost of store:
	; store(4) / 2 = 2			; store(4) / 2 = 2
	;			;
	; CHECK: Found an estimated cost of 1 for VF 2 For instruction: %tmp2 = add i32 %tmp1, %x
	; CHECK: Found an estimated cost of 5 for VF 2 For instruction: %tmp3 = sdiv i32 %tmp1, %tmp2
	; CHECK: Found an estimated cost of 5 for VF 2 For instruction: %tmp4 = udiv i32 %tmp3, %tmp2
	; CHECK: Found an estimated cost of 2 for VF 2 For instruction: %tmp5 = sub i32 %tmp4, %x
	; CHECK: Found an estimated cost of 2 for VF 2 For instruction: store i32 %tmp5, i32* %tmp0, align 4
	; CHECK-NOT: Scalarizing: %tmp2 = add i32 %tmp1, %x			; CHECK-NOT: Scalarizing: %tmp2 = add i32 %tmp1, %x
	; CHECK: Scalarizing and predicating: %tmp3 = sdiv i32 %tmp1, %tmp2			; CHECK: Scalarizing and predicating: %tmp3 = sdiv i32 %tmp1, %tmp2
	; CHECK: Scalarizing and predicating: %tmp4 = udiv i32 %tmp3, %tmp2			; CHECK: Scalarizing and predicating: %tmp4 = udiv i32 %tmp3, %tmp2
	; CHECK: Scalarizing: %tmp5 = sub i32 %tmp4, %x			; CHECK: Scalarizing: %tmp5 = sub i32 %tmp4, %x
	; CHECK: Scalarizing and predicating: store i32 %tmp5, i32* %tmp0, align 4			; CHECK: Scalarizing and predicating: store i32 %tmp5, i32* %tmp0, align 4
				; CHECK: Found an estimated cost of 1 for VF 2 For instruction: %tmp2 = add i32 %tmp1, %x
				; CHECK: Found an estimated cost of 5 for VF 2 For instruction: %tmp3 = sdiv i32 %tmp1, %tmp2
				; CHECK: Found an estimated cost of 5 for VF 2 For instruction: %tmp4 = udiv i32 %tmp3, %tmp2
				; CHECK: Found an estimated cost of 2 for VF 2 For instruction: %tmp5 = sub i32 %tmp4, %x
				; CHECK: Found an estimated cost of 2 for VF 2 For instruction: store i32 %tmp5, i32* %tmp0, align 4
	;			;
	define void @predication_multi_context(i32* %a, i1 %c, i32 %x, i64 %n) {			define void @predication_multi_context(i32* %a, i1 %c, i32 %x, i64 %n) {
	entry:			entry:
	br label %for.body			br label %for.body

	for.body:			for.body:
	%i = phi i64 [ 0, %entry ], [ %i.next, %for.inc ]			%i = phi i64 [ 0, %entry ], [ %i.next, %for.inc ]
	%tmp0 = getelementptr inbounds i32, i32* %a, i64 %i			%tmp0 = getelementptr inbounds i32, i32* %a, i64 %i
	Show All 19 Lines
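The cost comments in predication_costs.ll above all follow one pattern: sum the component costs, then halve the total for the assumed 50% block probability. The printed figures are consistent with truncating integer division, and the sketch below assumes exactly that. The helper name predicatedCostSketch is hypothetical and is not the cost model's actual interface.

```cpp
#include <cassert>
#include <initializer_list>

// Sums the component costs of a scalarized, predicated instruction and
// scales the total by an assumed 50% block probability, using truncating
// integer division as the figures in the test comments suggest.
inline unsigned predicatedCostSketch(std::initializer_list<unsigned> Parts) {
  unsigned Sum = 0;
  for (unsigned P : Parts)
    Sum += P;
  return Sum / 2;
}
```

With the component costs quoted in the test comments, {2, 6, 3} for the first udiv gives 11 / 2 = 5, and {4, 3} for the predicated store gives 7 / 2 = 3, matching the CHECK lines.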

test/Transforms/LoopVectorize/SystemZ/load-store-scalarization-cost.ll

Show All 18 Lines	for.body:
store i32 %tmp2, i32* %tmp0, align 4		store i32 %tmp2, i32* %tmp0, align 4
%i.next = add nuw nsw i64 %i, 2		%i.next = add nuw nsw i64 %i, 2
%cond = icmp slt i64 %i.next, %n		%cond = icmp slt i64 %i.next, %n
br i1 %cond, label %for.body, label %for.end		br i1 %cond, label %for.body, label %for.end

for.end:		for.end:
ret void		ret void

; CHECK: LV: Found an estimated cost of 4 for VF 4 For instruction: %tmp1 = load i32, i32* %tmp0, align 4
; CHECK: LV: Found an estimated cost of 4 for VF 4 For instruction: store i32 %tmp2, i32* %tmp0, align 4

; CHECK: LV: Scalarizing: %tmp1 = load i32, i32* %tmp0, align 4		; CHECK: LV: Scalarizing: %tmp1 = load i32, i32* %tmp0, align 4
; CHECK: LV: Scalarizing: store i32 %tmp2, i32* %tmp0, align 4		; CHECK: LV: Scalarizing: store i32 %tmp2, i32* %tmp0, align 4

		; CHECK: LV: Found an estimated cost of 4 for VF 4 For instruction: %tmp1 = load i32, i32* %tmp0, align 4
		; CHECK: LV: Found an estimated cost of 4 for VF 4 For instruction: store i32 %tmp2, i32* %tmp0, align 4
}		}

test/Transforms/LoopVectorize/first-order-recurrence.ll

	Show First 20 Lines • Show All 461 Lines • ▼ Show 20 Lines
	; Check that the sext sank after the load in the vector loop.			; Check that the sext sank after the load in the vector loop.
	; SINK-AFTER: vector.body			; SINK-AFTER: vector.body
	; SINK-AFTER: %vector.recur = phi <4 x i16> [ %vector.recur.init, %vector.ph ], [ %wide.load, %vector.body ]			; SINK-AFTER: %vector.recur = phi <4 x i16> [ %vector.recur.init, %vector.ph ], [ %wide.load, %vector.body ]
	; SINK-AFTER: %wide.load = load <4 x i16>			; SINK-AFTER: %wide.load = load <4 x i16>
	; SINK-AFTER: %[[VSHUF:.+]] = shufflevector <4 x i16> %vector.recur, <4 x i16> %wide.load, <4 x i32> <i32 3, i32 4, i32 5, i32 6>			; SINK-AFTER: %[[VSHUF:.+]] = shufflevector <4 x i16> %vector.recur, <4 x i16> %wide.load, <4 x i32> <i32 3, i32 4, i32 5, i32 6>
	; SINK-AFTER: %[[VCONV:.+]] = sext <4 x i16> %[[VSHUF]] to <4 x i32>			; SINK-AFTER: %[[VCONV:.+]] = sext <4 x i16> %[[VSHUF]] to <4 x i32>
	; SINK-AFTER: %[[VCONV3:.+]] = sext <4 x i16> %wide.load to <4 x i32>			; SINK-AFTER: %[[VCONV3:.+]] = sext <4 x i16> %wide.load to <4 x i32>
	; SINK-AFTER: mul nsw <4 x i32> %[[VCONV3]], %[[VCONV]]			; SINK-AFTER: mul nsw <4 x i32> %[[VCONV3]], %[[VCONV]]
	; Check also that the sext sank after the load in the scalar loop.
	; SINK-AFTER: for.body
	; SINK-AFTER: %scalar.recur = phi i16 [ %scalar.recur.init, %scalar.ph ], [ %[[LOAD:.+]], %for.body ]
	; SINK-AFTER: %[[LOAD]] = load i16, i16* %arrayidx2
	; SINK-AFTER: %[[CONV:.+]] = sext i16 %scalar.recur to i32
	; SINK-AFTER: %[[CONV3:.+]] = sext i16 %[[LOAD]] to i32
	; SINK-AFTER: %mul = mul nsw i32 %[[CONV3]], %[[CONV]]
	;			;
	define void @sink_after(i16* %a, i32* %b, i64 %n) {			define void @sink_after(i16* %a, i32* %b, i64 %n) {
	entry:			entry:
	%.pre = load i16, i16* %a			%.pre = load i16, i16* %a
	br label %for.body			br label %for.body

	for.body:			for.body:
	%0 = phi i16 [ %.pre, %entry ], [ %1, %for.body ]			%0 = phi i16 [ %.pre, %entry ], [ %1, %for.body ]
	▲ Show 20 Lines • Show All 48 Lines • Show Last 20 Lines

test/Transforms/LoopVectorize/if-pred-non-void.ll

; RUN: opt -S -force-vector-width=2 -force-vector-interleave=1 -loop-vectorize -verify-loop-info -simplifycfg < %s \| FileCheck %s		; RUN: opt -S -force-vector-width=2 -force-vector-interleave=1 -loop-vectorize -verify-loop-info -simplifycfg < %s \| FileCheck %s
		; RUN: opt -S -force-vector-width=1 -force-vector-interleave=2 -loop-vectorize -verify-loop-info < %s \| FileCheck %s --check-prefix=UNROLL-NO-VF

target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"		target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"

; Test predication of non-void instructions, specifically (i) that these		; Test predication of non-void instructions, specifically (i) that these
; instructions permit vectorization and (ii) the creation of an insertelement		; instructions permit vectorization and (ii) the creation of an insertelement
; and a Phi node. We check the full 2-element sequence for the first		; and a Phi node. We check the full 2-element sequence for the first
; instruction; For the rest we'll just make sure they get predicated based		; instruction; For the rest we'll just make sure they get predicated based
; on the code generated for the first element.		; on the code generated for the first element.
▲ Show 20 Lines • Show All 194 Lines • ▼ Show 20 Lines	entry:
br label %for.body		br label %for.body

; CHECK-LABEL: predicated_udiv_scalarized_operand		; CHECK-LABEL: predicated_udiv_scalarized_operand
; CHECK: vector.body:		; CHECK: vector.body:
; CHECK: %wide.load = load <2 x i32>, <2 x i32>* {{.*}}, align 4		; CHECK: %wide.load = load <2 x i32>, <2 x i32>* {{.*}}, align 4
; CHECK: br i1 {{.*}}, label %[[IF0:.+]], label %[[CONT0:.+]]		; CHECK: br i1 {{.*}}, label %[[IF0:.+]], label %[[CONT0:.+]]
; CHECK: [[IF0]]:		; CHECK: [[IF0]]:
; CHECK: %[[T00:.+]] = extractelement <2 x i32> %wide.load, i32 0		; CHECK: %[[T00:.+]] = extractelement <2 x i32> %wide.load, i32 0
; CHECK: %[[T01:.+]] = extractelement <2 x i32> %wide.load, i32 0		; CHECK: %[[T01:.+]] = add nsw i32 %[[T00]], %x
; CHECK: %[[T02:.+]] = add nsw i32 %[[T01]], %x		; CHECK: %[[T02:.+]] = extractelement <2 x i32> %wide.load, i32 0
; CHECK: %[[T03:.+]] = udiv i32 %[[T00]], %[[T02]]		; CHECK: %[[T03:.+]] = udiv i32 %[[T02]], %[[T01]]
		mkuper: The test changes look benign, but I'm curious about why we have them.
		Ayal (author): These test changes are due to the order in which predicated-and-scalarized basic-blocks are created and filled. In current trunk, when the “udiv” is generated, the “extract” feeding its numerator is also generated and placed before it, courtesy of getScalarValue(). Later the “udiv” is placed in a basic-block of its own without its operands. Finally sinkScalarOperands() kicks in, sinking both its operands to appear next to it. In this patch, when the “udiv” is generated, the “extract” feeding its numerator is also generated and placed before it, courtesy of getOrCreateScalar(). But having already created a basic-block for the “udiv”, this “extract” is placed there as well, complying with LV's approach of placing extracts next to their uses. When sinkScalarOperands() finally kicks in, it has only the other operand to sink (the denominator). This discrepancy in the sinking order swaps the final placement of the two operands.
; CHECK: %[[T04:.+]] = insertelement <2 x i32> undef, i32 %[[T03]], i32 0		; CHECK: %[[T04:.+]] = insertelement <2 x i32> undef, i32 %[[T03]], i32 0
; CHECK: br label %[[CONT0]]		; CHECK: br label %[[CONT0]]
; CHECK: [[CONT0]]:		; CHECK: [[CONT0]]:
; CHECK: %[[T05:.+]] = phi <2 x i32> [ undef, %vector.body ], [ %[[T04]], %[[IF0]] ]		; CHECK: %[[T05:.+]] = phi <2 x i32> [ undef, %vector.body ], [ %[[T04]], %[[IF0]] ]
; CHECK: br i1 {{.*}}, label %[[IF1:.+]], label %[[CONT1:.+]]		; CHECK: br i1 {{.*}}, label %[[IF1:.+]], label %[[CONT1:.+]]
; CHECK: [[IF1]]:		; CHECK: [[IF1]]:
; CHECK: %[[T06:.+]] = extractelement <2 x i32> %wide.load, i32 1		; CHECK: %[[T06:.+]] = extractelement <2 x i32> %wide.load, i32 1
; CHECK: %[[T07:.+]] = extractelement <2 x i32> %wide.load, i32 1		; CHECK: %[[T07:.+]] = add nsw i32 %[[T06]], %x
; CHECK: %[[T08:.+]] = add nsw i32 %[[T07]], %x		; CHECK: %[[T08:.+]] = extractelement <2 x i32> %wide.load, i32 1
; CHECK: %[[T09:.+]] = udiv i32 %[[T06]], %[[T08]]		; CHECK: %[[T09:.+]] = udiv i32 %[[T08]], %[[T07]]
; CHECK: %[[T10:.+]] = insertelement <2 x i32> %[[T05]], i32 %[[T09]], i32 1		; CHECK: %[[T10:.+]] = insertelement <2 x i32> %[[T05]], i32 %[[T09]], i32 1
; CHECK: br label %[[CONT1]]		; CHECK: br label %[[CONT1]]
; CHECK: [[CONT1]]:		; CHECK: [[CONT1]]:
; CHECK: phi <2 x i32> [ %[[T05]], %[[CONT0]] ], [ %[[T10]], %[[IF1]] ]		; CHECK: phi <2 x i32> [ %[[T05]], %[[CONT0]] ], [ %[[T10]], %[[IF1]] ]
; CHECK: br i1 {{.*}}, label %middle.block, label %vector.body		; CHECK: br i1 {{.*}}, label %middle.block, label %vector.body

		; Test predicating an instruction that feeds a vectorizable use, when unrolled
		; but not vectorized. Derived from pr34248 reproducer.
		;
		; UNROLL-NO-VF-LABEL: predicated_udiv_scalarized_operand
		; UNROLL-NO-VF: vector.body:
		; UNROLL-NO-VF: %[[LOAD0:.+]] = load i32, i32*
		; UNROLL-NO-VF: %[[LOAD1:.+]] = load i32, i32*
		; UNROLL-NO-VF: br i1 {{.*}}, label %[[IF0:.+]], label %[[CONT0:.+]]
		; UNROLL-NO-VF: [[IF0]]:
		; UNROLL-NO-VF: %[[ADD0:.+]] = add nsw i32 %[[LOAD0]], %x
		; UNROLL-NO-VF: %[[DIV0:.+]] = udiv i32 %[[LOAD0]], %[[ADD0]]
		; UNROLL-NO-VF: br label %[[CONT0]]
		; UNROLL-NO-VF: [[CONT0]]:
		; UNROLL-NO-VF: phi i32 [ undef, %vector.body ], [ %[[DIV0]], %[[IF0]] ]
		; UNROLL-NO-VF: br i1 {{.*}}, label %[[IF1:.+]], label %[[CONT1:.+]]
		; UNROLL-NO-VF: [[IF1]]:
		; UNROLL-NO-VF: %[[ADD1:.+]] = add nsw i32 %[[LOAD1]], %x
		; UNROLL-NO-VF: %[[DIV1:.+]] = udiv i32 %[[LOAD1]], %[[ADD1]]
		; UNROLL-NO-VF: br label %[[CONT1]]
		; UNROLL-NO-VF: [[CONT1]]:
		; UNROLL-NO-VF: phi i32 [ undef, %[[CONT0]] ], [ %[[DIV1]], %[[IF1]] ]
		; UNROLL-NO-VF: br i1 {{.*}}, label %middle.block, label %vector.body
		;
for.body:		for.body:
%i = phi i64 [ 0, %entry ], [ %i.next, %for.inc ]		%i = phi i64 [ 0, %entry ], [ %i.next, %for.inc ]
%r = phi i32 [ 0, %entry ], [ %tmp6, %for.inc ]		%r = phi i32 [ 0, %entry ], [ %tmp6, %for.inc ]
%tmp0 = getelementptr inbounds i32, i32* %a, i64 %i		%tmp0 = getelementptr inbounds i32, i32* %a, i64 %i
%tmp2 = load i32, i32* %tmp0, align 4		%tmp2 = load i32, i32* %tmp0, align 4
br i1 %c, label %if.then, label %for.inc		br i1 %c, label %if.then, label %for.inc

if.then:		if.then:
Show All 15 Lines