This is an archive of the discontinued LLVM Phabricator instance.

[LV] Vectorize first-order recurrences
ClosedPublic

Authored by mssimpso on Jan 14 2016, 10:02 AM.

Details

Reviewers

anemet
nadav
jmolloy
hfinkel
mcrosier

Commits

rG29c997c1a18a: [LV] Vectorize first-order recurrences
rL261346: [LV] Vectorize first-order recurrences

Summary

This patch enables the vectorization of first-order (non-reduction) recurrences. For example:

for (int i = 0; i < n; ++i)
  b[i] = a[i] - a[i - 1];

In the example above, the load PRE of the GVN pass can often hoist a[i - 1] into the loop preheader. This leaves a phi node inside the loop containing values for the hoisted load and a[i]. Although GVN can create these phi nodes, they can also occur naturally.

In this patch, we add a new recurrence kind for these phi nodes and attempt to vectorize them if possible. Vectorization is performed by shuffling the values for the current and previous iterations. The vectorization cost estimate is updated to account for the added shuffle instruction.
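
To make the recurrence concrete, here is a minimal C++ sketch (not code from the patch) of the summary loop in two forms: the direct form, and the form the loop takes once the load of a[i - 1] has been hoisted and the previous value is carried in a variable that plays the role of the phi node. The `padded` layout (element 0 stands in for a[-1]) and both function names are assumptions made only for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Assumed layout for the sketch: padded[0] plays the role of a[-1],
// and padded[i + 1] plays the role of a[i].

// Direct form: b[i] = a[i] - a[i - 1], with both values loaded every iteration.
std::vector<int> diffDirect(const std::vector<int> &padded) {
  assert(!padded.empty());
  const std::size_t n = padded.size() - 1;
  std::vector<int> b(n);
  for (std::size_t i = 0; i < n; ++i)
    b[i] = padded[i + 1] - padded[i];
  return b;
}

// Recurrence form: the value from the previous iteration is carried in `prev`,
// which corresponds to the phi node described in the summary.
std::vector<int> diffRecurrence(const std::vector<int> &padded) {
  assert(!padded.empty());
  const std::size_t n = padded.size() - 1;
  std::vector<int> b(n);
  int prev = padded[0]; // hoisted load of a[-1], done once before the loop
  for (std::size_t i = 0; i < n; ++i) {
    const int cur = padded[i + 1]; // a[i]
    b[i] = cur - prev;
    prev = cur; // becomes a[i - 1] for the next iteration
  }
  return b;
}
```

Both functions compute the same result; the second one simply exposes the loop-carried value that the vectorizer has to handle.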

Contributed-by: Matthew Simpson and Chad Rosier <mcrosier@codeaurora.org>

Event Timeline

mssimpso updated this revision to Diff 44897. · Jan 14 2016, 10:02 AM

mssimpso retitled this revision to [LV] Vectorize pre-load recurrences.

mssimpso updated this object.

mssimpso added reviewers: mcrosier, jmolloy, hfinkel, anemet, nadav.

mssimpso added a subscriber: llvm-commits.

Herald added a subscriber: sanjoy. · Jan 14 2016, 10:02 AM

bmakam added a subscriber: bmakam. · Jan 14 2016, 10:21 AM

Matt, I've added a few comments below.

lib/Transforms/Vectorize/LoopVectorize.cpp
4573	I don't think checking for store instructions is enough here. Isn't I->mayWriteToMemory() more appropriate to catch e.g. calls?
4579	Same as above. Plus, I'm not sure this is a sufficient check. What if there is a store after the sink point in iteration 'i' that may alias the load from iteration 'i+1'? Perhaps you can piggy back on the check for vectorizability of the original load in the loop? That may not be enough either since the load in question might be known to only write the 0th element (i.e. the value loaded by the sunk load).

One other question: have you explored vectorizing this recurrence as a shuffle+insertelement instead? That would avoid the need for any extra memory dependency checking, and would avoid introducing more loads in the loop.

In D16197#327098, @gberry wrote:

One other question: have you explored vectorizing this recurrence as a shuffle+insertelement instead? That would avoid the need for any extra memory dependency checking, and would avoid introducing more loads in the loop.

I just talked with Geoff in person, and I like the shuffle/insert approach. This will avoid a lot of the memory issues and should also expand the types of phi's we can handle beyond loads. Thanks for the quick feedback! I'll post an updated version soon.

sbaranga added a subscriber: sbaranga. · Jan 15 2016, 4:36 AM

sbaranga added inline comments.

lib/Transforms/Utils/LoopUtils.cpp
562	It might be worth taking a PredicatedScalarEvolution for this function, which I think might help in some cases. At the moment it is possible that the loop was versioned such that AR's step is known to be 1. In this case doing PSE->getSCEV() would give you the more accurate expression. I think you also need to check that AR is an AddRecExpr for TheLoop?

Addressed comments from Geoff and Adam.

Code generation is now done by shuffling the vectors from the current and previous iterations instead of trying to sink loads into the loop. The new approach is simpler and I don't think requires as much legality checking.

Herald added a subscriber: mcrosier. · Jan 20 2016, 11:35 AM

mssimpso added inline comments. · Jan 20 2016, 11:38 AM

lib/Transforms/Utils/LoopUtils.cpp
587	Hi Silviu. Thanks very much for the comments. After changing the code generation approach, I don't think scalar evolution is required any longer.

Ping.

Hi Matt,

I've started reviewing this but I found that I could use an example of this transformation fully spelled out. Particularly, I am not sure I understand why the initial vector value is a splat vector.

Thanks,
Adam

lib/Transforms/Vectorize/LoopVectorize.cpp
367	I guess a definition of first-order recurrence should be added here.
3333–3334	I actually think we need to do this before this patch ;). This is a 200-line loop and we shouldn't pile on more strangeness. I think that all we need to do is to rename RdxPHIsToFix to PHIsToFix in a prequel NFC patch.
3336	I know that we use fix... in other places but that's pretty general. How about vectorize... or something like that.
3574–3577	assert that Previous is loop-invariant at this point?

Hi Adam,

Thanks very much for the feedback. Yes, the initial value doesn't have to be broadcast to all lanes since just an insert will do. I will upload a new version with your suggestions. Let me walk through the simple example I mentioned in the summary. For this loop, the (shorthand) scalar IR looks something like this:

scalar.ph:
  s_init = a[-1]
  br scalar.body

scalar.body:
  s1 = phi [s_init, scalar.ph], [s2, scalar.body]
  i = phi [0, scalar.ph], [i+1, scalar.body]
  s2 = a[i]
  b[i] = s2 - s1
  br cond, scalar.body, ...

Here, s1 is a non-reduction recurrence that we currently give up on. This patch calls it a first-order recurrence (because its value depends on the previous iteration) and we try to vectorize it. The check in isFirstOrderRecurrence basically ensures that s_init is loop-invariant, that s2 is in the loop header (and thus loop-varying), and that every use of s1 is dominated by the definition of s2 (see below). The vectorized IR looks something like this for VF=4:

vector.ph:
  v_init = vector(..., ..., ..., a[-1])
  br vector.body

vector.body:
  v1 = phi [v_init, vector.ph], [v2, vector.body]
  i = phi [0, vector.ph], [i+4, vector.body]
  v2 = a[i, i+1, i+2, i+3]
  v3 = vector(v1(3), v2(0, 1, 2))
  b[i, i+1, i+2, i+3] = v2 - v3
  br cond, vector.body, middle.block

middle.block:
  x = v2(3)
  br scalar.ph

scalar.ph:
  s_init = phi [x, middle.block], [a[-1], otherwise]
  br scalar.body

The dominance requirement is there so that we can shuffle v1 with v2 before this value is needed for the assignment to b. After we leave the vector loop, we extract the next value of the recurrence (x) to use as the initial value when jumping to the scalar portion. I hope that helps.
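
The shorthand IR above can be emulated in plain C++ to check the lane bookkeeping. This is only a sketch under stated assumptions, not the vectorizer's implementation: `Vec`, `spliceLastWithFirstThree`, `diffVectorized`, and the `padded` layout (element 0 stands in for a[-1]) are names invented for the illustration, and the unused lanes of the initial vector are zero-filled here although their contents do not matter.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

constexpr std::size_t VF = 4; // vector width used in the walk-through
using Vec = std::array<int, VF>;

// Emulates v3 = vector(v1(3), v2(0, 1, 2)): the last lane of the previous
// vector followed by the first three lanes of the current one.
Vec spliceLastWithFirstThree(const Vec &v1, const Vec &v2) {
  return Vec{{v1[VF - 1], v2[0], v2[1], v2[2]}};
}

// padded[0] plays the role of a[-1]; padded[i + 1] plays the role of a[i].
std::vector<int> diffVectorized(const std::vector<int> &padded) {
  assert(!padded.empty());
  const std::size_t n = padded.size() - 1;
  std::vector<int> b(n);

  // vector.ph: only the last lane of the initial vector is ever read.
  Vec v1{};
  v1[VF - 1] = padded[0];

  // vector.body: process VF elements per iteration.
  std::size_t i = 0;
  for (; i + VF <= n; i += VF) {
    Vec v2;
    for (std::size_t lane = 0; lane < VF; ++lane)
      v2[lane] = padded[i + 1 + lane]; // v2 = a[i .. i+3]
    const Vec v3 = spliceLastWithFirstThree(v1, v2);
    for (std::size_t lane = 0; lane < VF; ++lane)
      b[i + lane] = v2[lane] - v3[lane];
    v1 = v2; // the phi takes v2 on the back edge
  }

  // middle.block: extract the last lane as the scalar loop's initial value.
  // If the vector loop never ran, this is still a[-1].
  int prev = v1[VF - 1];

  // scalar.body: remaining iterations.
  for (; i < n; ++i) {
    const int cur = padded[i + 1];
    b[i] = cur - prev;
    prev = cur;
  }
  return b;
}
```

With ten elements and VF=4, the emulation runs two vector iterations and then two scalar iterations seeded from the extracted lane.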

lib/Transforms/Vectorize/LoopVectorize.cpp
367	Agreed.
3333–3334	Sure, I'll push a patch that renames RdxPHIsToFix to PHIsToFix first.
3336	vectorizeFirstOrderRecurrence sounds good to me.
3574–3577	Good idea.

mssimpso mentioned this in rL259364: [LV] Rename RdxPHIsToFix to PHIsToFix (NFC). · Feb 1 2016, 8:11 AM

mssimpso marked 4 inline comments as done. · Feb 1 2016, 9:52 AM

mssimpso added inline comments.

lib/Transforms/Vectorize/LoopVectorize.cpp
3574–3577	We actually want the previous value to be loop-varying here. The initial value should be loop-invariant. I've added this assertion as well as a few others.

Addressed Adam's comments

Matt, that's a nice example. Can you please add it as a comment to the code where it's appropriate?

lib/Transforms/Utils/LoopUtils.cpp
532–541	Thinking more about this, isn't it true that whether the phi operand is initial or previous depends on its edge assignment rather than on whether it is loop-variant or loop-invariant?
543–550	How can user of the current value be loop-invariant?
lib/Transforms/Vectorize/LoopVectorize.cpp
368–381	We don't add empty /// lines like that. I actually think that this definition should be before isFirstOrderRecurrence.
3566	Sorry but I forgot that "fix" is an actual "phase" here. We first do the default widening then we fix up the phis. Can you please rename it back, sorry :(
3581–3593	This is duplicated between here and isFirstOrderRecurrence, would be nice to somehow remove this.
3604–3605	"... when initially vectorizing ..."
3609–3610	Outdated comment.
3633–3644	I think that CurrentParts should be a single value, this is the value that's incoming into the iteration (whether the real loop iteration or an unrolled loop iteration). We may also want to use the name VecPhi rather than "current" and add comment clarifying the above. What do you think?
3665–3667	A newline before this block?
3749–3750	It's generally not good practice to mix a multi-line formatting change with a single-line (?) functionality change.

mssimpso added inline comments. · Feb 11 2016, 9:12 AM

lib/Transforms/Utils/LoopUtils.cpp
532–541	When I think more about this, yes, I agree. I'll make the necessary updates. Thanks for catching this.
543–550	Current (the Phi) could be used outside the loop. Reductions are an example where a Phi is used externally. I actually don't think we need this restriction though, because we can handle this case in code generation. The use would just be of the last value of the recurrence, which we already extract in middle.block.
lib/Transforms/Vectorize/LoopVectorize.cpp
368–381	Sure, I'll move it.
3566	No problem!
3581–3593	Sure, I will delete these assertions here.
3633–3644	Yes, I don't think we need to keep track of CurrentParts as a SmallVector since a single value should do. Changing the name is probably a good idea to avoid confusion.
3749–3750	Sure, we can fix the formatting in a separate patch.
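
The point made for lines 3633–3644, that one incoming value is enough even when the loop is unrolled, can be illustrated with a small C++ emulation. This is a hedged sketch rather than the patch's code: `kVF`, `kUF`, `Lanes`, `diffInterleaved`, and the `padded` layout (element 0 stands in for a[-1]) are assumptions for the example. Each unrolled part shuffles the single `incoming` vector with its own loaded vector and then hands that loaded vector to the next part.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

constexpr std::size_t kVF = 4; // vector width (assumed for the sketch)
constexpr std::size_t kUF = 2; // unroll factor (assumed for the sketch)
using Lanes = std::array<int, kVF>;

// padded[0] plays the role of a[-1]; padded[i + 1] plays the role of a[i].
std::vector<int> diffInterleaved(const std::vector<int> &padded) {
  assert(!padded.empty());
  const std::size_t n = padded.size() - 1;
  std::vector<int> b(n);

  // The single value flowing into the next (real or unrolled) iteration.
  Lanes incoming{};
  incoming[kVF - 1] = padded[0];

  std::size_t i = 0;
  for (; i + kVF * kUF <= n; i += kVF * kUF) {
    for (std::size_t part = 0; part < kUF; ++part) {
      const std::size_t base = i + part * kVF;
      Lanes cur;
      for (std::size_t lane = 0; lane < kVF; ++lane)
        cur[lane] = padded[base + 1 + lane];

      // Shuffle: last lane of the incoming vector, then the first
      // kVF - 1 lanes of the current one.
      Lanes shifted;
      shifted[0] = incoming[kVF - 1];
      for (std::size_t lane = 1; lane < kVF; ++lane)
        shifted[lane] = cur[lane - 1];

      for (std::size_t lane = 0; lane < kVF; ++lane)
        b[base + lane] = cur[lane] - shifted[lane];

      incoming = cur; // feeds the next part, or the next loop iteration
    }
  }

  // Scalar remainder, seeded from the last lane of the final vector.
  int prev = incoming[kVF - 1];
  for (; i < n; ++i) {
    const int cur = padded[i + 1];
    b[i] = cur - prev;
    prev = cur;
  }
  return b;
}
```

Only the last part's vector needs to reach the phi on the back edge; the intermediate parts consume each other's values directly.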

anemet added inline comments. · Feb 11 2016, 9:35 AM

lib/Transforms/Vectorize/LoopVectorize.cpp
3581–3593	I didn't mean this only for the asserts. I think it would make sense to factor out the way we get the previous and init for a first-order recurrence somewhere (e.g. RecurrenceDescriptor?). There may also be some common functionality between this new function and RecurrenceDescriptor::isFirstOrderRecurrence that could be shared.

mssimpso added inline comments. · Feb 11 2016, 10:45 AM

lib/Transforms/Vectorize/LoopVectorize.cpp
3581–3593	I see. Thanks for clarifying!

Addressed Adam's comments. Thanks, Adam!

Herald added a subscriber: mzolotukhin. · Feb 11 2016, 1:17 PM

anemet added inline comments. · Feb 17 2016, 11:30 PM

lib/Transforms/Vectorize/LoopVectorize.cpp
3617–3636	Don't we only enter this function if the Phi has passed isFirstOrderRecurrence? Why are we asserting the same properties that were already checked there? This is almost like saying assert(isFirstOrderRecurrence(Phi)), no? I don't think we want these unless I am misunderstanding something.
test/Transforms/LoopVectorize/AArch64/first-order-recurrence.ll
1 ↗	(On Diff #47718)	Please also add a UF>1 test.
1–4 ↗	(On Diff #47718)	I think that it's better practice to avoid the triple and use -force-vector-width to make the test robust.
18 ↗	(On Diff #47718)	Is there any reason we can't fully spell out most of these instructions, e.g. shuffle mask?

Addressed Adam's comments. Thanks again, Adam!

lib/Transforms/Vectorize/LoopVectorize.cpp
3617–3636	Yes, this is my mistake. I thought you had requested these asserts in an earlier review. I've removed the duplication.
test/Transforms/LoopVectorize/AArch64/first-order-recurrence.ll
18 ↗	(On Diff #47718)	We can definitely do this.

LGTM.

This revision is now accepted and ready to land. · Feb 18 2016, 2:47 PM

Thanks, Adam! I really appreciate the multiple rounds of feedback you gave on this one.

You're welcome and thanks for your patience. I also had the EuroLLVM reviews happening in parallel.

Closed by commit rL261346: [LV] Vectorize first-order recurrences (authored by mssimpso). · Feb 19 2016, 10:00 AM

This revision was automatically updated to reflect the committed changes.

Ayal mentioned this in D78210: [LV] Mark first-order recurrences as allowed exit. · Apr 15 2020, 8:11 AM

Ayal mentioned this in rG8e0c5f720058: [LV] Mark first-order recurrences as allowed exits. · Apr 18 2020, 2:01 PM

Revision Contents

Path	Size
include/llvm/Transforms/Utils/LoopUtils.h	5 lines
lib/Transforms/Utils/LoopUtils.cpp	63 lines
lib/Transforms/Vectorize/LoopVectorize.cpp	200 lines
test/Transforms/LoopVectorize/AArch64/pre-load-recurrence.ll	147 lines

Diff 44897

include/llvm/Transforms/Utils/LoopUtils.h

[169 lines not shown; context: static bool AddReductionVar(PHINode *Phi, RecurrenceKind Kind, Loop *TheLoop,]
                                bool HasFunNoNaNAttr,
                                RecurrenceDescriptor &RedDes);

    /// Returns true if Phi is a reduction in TheLoop. The RecurrenceDescriptor is
    /// returned in RedDes.
    static bool isReductionPHI(PHINode *Phi, Loop *TheLoop,
                               RecurrenceDescriptor &RedDes);

+   /// Returns true if Phi is a pre-load recurrence in TheLoop. A pre-load
+   /// recurrence is a phi of loads for the current and previous loop
+   /// iterations.
+   static bool isPreLoadPHI(PHINode *Phi, Loop *TheLoop, ScalarEvolution *SE);
+
    RecurrenceKind getRecurrenceKind() { return Kind; }

    MinMaxRecurrenceKind getMinMaxRecurrenceKind() { return MinMaxKind; }

    TrackingVH<Value> getRecurrenceStartValue() { return StartValue; }

    Instruction *getLoopExitInstr() { return LoopExitInstr; }

[197 lines not shown]

lib/Transforms/Utils/LoopUtils.cpp

[508 lines not shown; context: bool RecurrenceDescriptor::isReductionPHI(PHINode *Phi, Loop *TheLoop,]
    if (AddReductionVar(Phi, RK_FloatMinMax, TheLoop, HasFunNoNaNAttr, RedDes)) {
      DEBUG(dbgs() << "Found an float MINMAX reduction PHI." << *Phi << "\n");
      return true;
    }
    // Not a reduction of known type.
    return false;
  }

+ bool RecurrenceDescriptor::isPreLoadPHI(PHINode *Phi, Loop *TheLoop,
+                                         ScalarEvolution *SE) {
+
+   // Ensure the Phi has two incoming values and is only used once.
+   if (Phi->getNumIncomingValues() != 2 || !Phi->hasOneUse())
+     return false;
+
+   // Ensure the single use of the Phi is in the loop header.
+   auto *User = dyn_cast<Instruction>(*Phi->user_begin());
+   if (!User || User->getParent() != TheLoop->getHeader())
+     return false;
+
+   // Ensure the Phi is an integer or floating point type.
+   Type *Ty = Phi->getType();
+   if (!Ty->isIntegerTy() && !Ty->isFloatingPointTy())
+     return false;
+
+   // Ensure the two incoming blocks are the loop header and preheader.
+   if (Phi->getBasicBlockIndex(TheLoop->getHeader()) < 0 ||
+       !TheLoop->getLoopPreheader() ||
+       Phi->getBasicBlockIndex(TheLoop->getLoopPreheader()) < 0)
+     return false;
+
+   // Ensure the the two incoming values are loads.
+   auto *Load =
+       dyn_cast<LoadInst>(Phi->getIncomingValueForBlock(TheLoop->getHeader()));
+   auto *PreLoad = dyn_cast<LoadInst>(
+       Phi->getIncomingValueForBlock(TheLoop->getLoopPreheader()));
+   if (!PreLoad || !Load)
+     return false;
+
+   // Ensure Phi is in the loop header and PreLoad is in the loop preheader.
+   // This also ensures that Load is in the loop header.
+   if (Phi->getParent() != TheLoop->getHeader() ||
+       PreLoad->getParent() != TheLoop->getLoopPreheader())
+     return false;
+
+   // We will now ensure Preload and Load (on the first iteration of the loop)
+   // are consecutive accesses. First, get the add recurrence for the pointer
+   // operand of the loop-varying load.
+   auto *AR = dyn_cast<SCEVAddRecExpr>(SE->getSCEV(Load->getPointerOperand()));
+   if (!AR)
+     return false;
+
+   // Next, calculate the stride of the pointer operand.
+   auto *Step = AR->getStepRecurrence(*SE);
+   if (!isa<SCEVConstant>(Step))
+     return false;
+
+   // Next, evaluate the add recurrence on the first iteration of the loop.
+   auto *IterationZero =
+       AR->evaluateAtIteration(SE->getConstant(APInt(64, 0)), *SE);
+
+   // Finally, Ensure that the pointer operand of the loop-invariant load plus
+   // the loop step size equals the pointer operand of the loop-varying load on
+   // the first iteration of the loop.
+   if (SE->getAddExpr(SE->getSCEV(PreLoad->getPointerOperand()), Step) !=
+       IterationZero)
+     return false;
+
+   return true;
+ }
+
  /// This function returns the identity element (or neutral element) for
  /// the operation K.
  Constant *RecurrenceDescriptor::getRecurrenceIdentity(RecurrenceKind K,
                                                        Type *Tp) {
    switch (K) {
    case RK_IntegerXor:
    case RK_IntegerAdd:
    case RK_IntegerOr:
      // Adding, Xoring, Oring zero to a number does not change it.
      return ConstantInt::get(Tp, 0);
    case RK_IntegerMult:
      // Multiplying a number by 1 does not change it.
      return ConstantInt::get(Tp, 1);
    case RK_IntegerAnd:
      // AND-ing a number with an all-1 value does not change it.
      return ConstantInt::get(Tp, -1, true);
[197 lines not shown]

lib/Transforms/Vectorize/LoopVectorize.cpp

[358 lines not shown; context: protected:]
    /// Create an empty loop, based on the loop ranges of the old loop.
    void createEmptyLoop();
    /// Create a new induction variable inside L.
    PHINode *createInductionVariable(Loop *L, Value *Start, Value *End,
                                     Value *Step, Instruction *DL);
    /// Copy and widen the instructions from the old loop.
    virtual void vectorizeLoop();

+   /// Turn the scalar loop-invariant load of a pre-load recurrence into a
+   /// vectorized loop-varying load.
+   void fixPreLoadPHI(PHINode *Phi);
+
    /// \brief The Loop exit block may have single value PHI nodes where the
    /// incoming value is 'Undef'. While vectorizing we only handled real values
    /// that were defined inside the loop. Here we fix the 'undef case'.
    /// See PR14725.
    void fixLCSSAPHIs();

    /// Shrinks vector element sizes based on information in "MinBWs".
    void truncateToMinimalBitwidths();

    /// A helper function that computes the predicate of the block BB, assuming
    /// that the header block of the loop is set to True. It returns the entry
    /// mask for the block BB.
    VectorParts createBlockInMask(BasicBlock *BB);
    /// A helper function that computes the predicate of the edge between SRC
    /// and DST.
    VectorParts createEdgeMask(BasicBlock *Src, BasicBlock *Dst);

    /// A helper function to vectorize a single BB within the innermost loop.
    void vectorizeBlockInLoop(BasicBlock *BB, PhiVector *PV);
[810 lines not shown; context: public:]
    /// ReductionList contains the reduction descriptors for all
    /// of the reductions that were found in the loop.
    typedef DenseMap<PHINode *, RecurrenceDescriptor> ReductionList;

    /// InductionList saves induction variables and maps them to the
    /// induction descriptor.
    typedef MapVector<PHINode*, InductionDescriptor> InductionList;

+   /// PreLoadPHISet contains the phi nodes that are pre-load recurrences. A
+   /// pre-load recurrence is a phi of loads for the current and previous loop
+   /// iterations.
+   typedef SmallPtrSet<const PHINode *, 8> PreLoadPHISet;
+
    /// Returns true if it is legal to vectorize this loop.
    /// This does not mean that it is profitable to vectorize this
    /// loop, only that it is legal to do so.
    bool canVectorize();

    /// Returns the Induction variable.
    PHINode *getInduction() { return Induction; }

    /// Returns the reduction variables found in the loop.
    ReductionList *getReductionVars() { return &Reductions; }

    /// Returns the induction variables found in the loop.
    InductionList *getInductionVars() { return &Inductions; }

+   /// Return the pre-load recurrences found in the loop.
+   PreLoadPHISet *getPreLoadPHIs() { return &PreLoadPHIs; }
+
    /// Returns the widest induction type.
    Type *getWidestInductionType() { return WidestIndTy; }

    /// Returns True if V is an induction variable in this loop.
    bool isInductionVariable(const Value *V);

    /// Returns True if PN is a reduction variable in this loop.
    bool isReductionVariable(PHINode *PN) { return Reductions.count(PN); }

+   /// Returns True if Phi is a pre-load recurrence in this loop.
+   bool isPreLoadPHI(const PHINode *Phi);
+
    /// Return true if the block BB needs to be predicated in order for the loop
    /// to be vectorized.
    bool blockNeedsPredication(BasicBlock *BB);

    /// Check if this pointer is consecutive when vectorizing. This happens
    /// when the last index of the GEP is the induction variable, or that the
    /// pointer itself is an induction variable.
    /// This check allows us to vectorize A[idx] into a wide load/store.
[72 lines not shown; context: private:]
    /// legal to vectorize the code, considering only memory constrains.
    /// Returns true if the loop is vectorizable
    bool canVectorizeMemory();

    /// Return true if we can vectorize this loop using the IF-conversion
    /// transformation.
    bool canVectorizeWithIfConvert();

+   /// Return true if we can sink the loop-invariant load of the pre-load
+   /// recurrence Phi into the body of loop L.
+   ///
+   /// TODO: We can make this check much more precise by utilizing a memory
+   /// dependence analysis and by considering more complicated
+   /// control-flow.
+   bool canSinkPreLoadPHI(PHINode *Phi, Loop *L);
+
    /// Collect the variables that need to stay uniform after vectorization.
    void collectLoopUniforms();

    /// Return true if all of the instructions in the block can be speculatively
    /// executed. \p SafePtrs is a list of addresses that are known to be legal
    /// and we know that we can read from them without segfault.
    bool blockCanBePredicated(BasicBlock *BB, SmallPtrSetImpl<Value *> &SafePtrs);

[45 lines not shown; context: private:]
    /// loop.
    PHINode *Induction;
    /// Holds the reduction variables.
    ReductionList Reductions;
    /// Holds all of the induction variables that we found in the loop.
    /// Notice that inductions don't need to start at zero and that induction
    /// variables can be pointers.
    InductionList Inductions;
+   /// Holds the phi nodes that are pre-load recurrences.
+   PreLoadPHISet PreLoadPHIs;
    /// Holds the widest induction type encountered.
    Type *WidestIndTy;

    /// Allowed outside users. This holds the reduction
    /// vars which can be accessed from outside the loop.
    SmallPtrSet<Value*, 4> AllowedExit;
    /// This set holds the variables which are known to be uniform after
    /// vectorization.
[1,913 lines not shown; context: void InnerLoopVectorizer::vectorizeLoop() {]
    // Create the 'reduced' values for each of the induction vars.
    // The reduced values are the vector values that we scalarize and combine
    // after the loop is finished.
    for (PhiVector::iterator it = RdxPHIsToFix.begin(), e = RdxPHIsToFix.end();
         it != e; ++it) {
      PHINode *RdxPhi = *it;
      assert(RdxPhi && "Unable to recover vectorized PHI");

+     // If RdxPhi is a pre-load recurrence, it's not actually a reduction, but
+     // it still needs to be fixed. Vectorize it and continue.
+     //
+     // FIXME: We need to refactor this function since the recurrences in
+     // RdxPHIsToFix are not necessarily reductions.
+     if (Legal->isPreLoadPHI(RdxPhi)) {
+       fixPreLoadPHI(RdxPhi);
+       continue;
+     }
+
      // Find the reduction variable descriptor.
      assert(Legal->isReductionVariable(RdxPhi) &&
             "Unable to find the reduction variable");
      RecurrenceDescriptor RdxDesc = (*Legal->getReductionVars())[RdxPhi];

      RecurrenceDescriptor::RecurrenceKind RK = RdxDesc.getRecurrenceKind();
      TrackingVH<Value> ReductionStartValue = RdxDesc.getRecurrenceStartValue();
      Instruction *LoopExitInst = RdxDesc.getLoopExitInstr();
▲ Show 20 Lines • Show All 210 Lines • ▼ Show 20 Lines	for (auto KV : PredicatedStores) {
I->getParent()->setName("pred.store.if");		I->getParent()->setName("pred.store.if");
BB->setName("pred.store.continue");		BB->setName("pred.store.continue");
}		}
DEBUG(DT->verifyDomTree());		DEBUG(DT->verifyDomTree());
// Remove redundant induction instructions.		// Remove redundant induction instructions.
cse(LoopVectorBody);		cse(LoopVectorBody);
}		}

		void InnerLoopVectorizer::fixPreLoadPHI(PHINode *Phi) {
		anemet: Sorry but I forgot that "fix" is an actual "phase" here. We first do the default widening then we fix up the phis. Can you please rename it back, sorry :(
		mssimpso: No problem!

		// The original loop header and its corresponding loop-varying load.
		auto *Header = OrigLoop->getHeader();
		auto *Load = cast<LoadInst>(Phi->getIncomingValueForBlock(Header));

		// The data layout for this module.
		auto &DL = Header->getModule()->getDataLayout();

		// The original loop preheader and its corresponding loop-invariant load.
		// This load is used on the first iteration of the loop.
		auto *PreHeader = OrigLoop->getLoopPreheader();
		anemet: assert that Previous is loop-invariant at this point?
		mssimpso: Good idea.
		mssimpso: We actually want the previous value to be loop-varying here. The initial value should be loop-invariant. I've added this assertion as well as a few others.
		auto *PreLoad = Phi->getIncomingValueForBlock(PreHeader);

		// Get the negated step size of the original loop. Note that these casts were
		// checked when we originally detected the PreLoadPHI.
		auto *AR = cast<SCEVAddRecExpr>(PSE.getSCEV(Load->getPointerOperand()));
		auto *SR = cast<SCEVConstant>(AR->getStepRecurrence(*PSE.getSE()));
		int LoadSize = DL.getTypeStoreSize(Load->getType());
		int StepSize = -1 * SR->getValue()->getSExtValue() / LoadSize;

		// Get the VectorParts we will need. We ensured the Phi had only one user
		// during legalization.
		auto &LoadParts = getVectorValue(Load);
		auto &PhiParts = getVectorValue(Phi);
		auto &UserParts = getVectorValue(*Phi->user_begin());

		// We are going to sink PreLoad into the vectorized loop. We will do this by
		anemet: This is duplicated between here and isFirstOrderRecurrence, would be nice to somehow remove this.
		mssimpso: Sure, I will delete these assertions here.
		anemet: I didn't mean this only for the asserts. I think it would make sense to factor out the way we get the previous and init for a first-order recurrence somewhere (e.g. RecurrenceDescriptor?). There may also be some common functionality between this new function and RecurrenceDescriptor::isFirstOrderRecurrence that could be shared.
		mssimpso: I see. Thanks for clarifying!
		// duplicating the loop-varying load and subtracting one iteration from its
		// pointer operand. We will map Phi to this new load in the ValueMap.
		// However, because the loop-varying load has already been vectorized, we
		// need to first save its VectorParts before we overwrite them.
		VectorParts TempParts(UF);
		for (unsigned I = 0; I < UF; ++I)
		TempParts[I] = LoadParts[I];

		// Set the insertion point before the vectorized version of the Phi's single
		// user, and create another copy of the loop-varying load.
		Builder.SetInsertPoint(cast<Instruction>(UserParts[0]));
		vectorizeMemoryInstruction(Load);
		anemet: "... when initially vectorizing ..."

		for (unsigned Part = 0; Part < UF; ++Part) {

		// If Phi is a PreLoadPHI, Load should not have been scalarized. Thus, its
		// corresponding vector parts should either be loads (for non-interleaved
		anemet: Outdated comment.
		// accesses) or shuffles (for interleaved accesses). The code below finds
		// the vector load or asserts if one is not found.
		auto *V = LoadParts[Part];
		if (auto *Shuffle = dyn_cast<ShuffleVectorInst>(V))
		V = Shuffle->getOperand(0);
		assert(isa<LoadInst>(V) && "Vector part is neither a load nor shuffle");
		auto *LoadPart = cast<LoadInst>(V);
		Builder.SetInsertPoint(LoadPart);

		// Get the pointer operand of LoadPart and cast it to a scalar type.
		auto *Ptr = LoadPart->getPointerOperand();
		if (auto *VecTy = dyn_cast<VectorType>(LoadPart->getType()))
		Ptr = Builder.CreateBitOrPointerCast(
		Ptr, PointerType::getUnqual(VecTy->getElementType()));

		// Subtract one iteration from the pointer operand.
		auto *GEP = Builder.CreateGEP(nullptr, Ptr, Builder.getInt32(StepSize));

		// Cast the pointer operand back to the vector type.
		if (Ptr != LoadPart->getPointerOperand())
		GEP = Builder.CreateBitOrPointerCast(
		GEP, LoadPart->getPointerOperand()->getType());

		// Update the pointer operand of LoadPart to refer to the new location.
		LoadPart->setOperand(0, GEP);

		anemet: Don't we only enter this function if the Phi has passed isFirstOrderRecurrence? Why are we asserting the same properties that were already checked there? This is almost like saying assert(isFirstOrderRecurrence(Phi)), no? I don't think we want these unless I am misunderstanding something.
		mssimpso: Yes, this is my mistake. I thought you had requested these asserts in an earlier review. I've removed the duplication.
		// Replace the temporary vector values we created for Phi with the new
		// loads. We also restore the original LoadParts since it was just
		// overwritten with the mapping created for the Phi.
		PhiParts[Part]->replaceAllUsesWith(LoadParts[Part]);
		PhiParts[Part] = LoadParts[Part];
		LoadParts[Part] = TempParts[Part];
		}

		anemet: I think that CurrentParts should be a single value, this is the value that's incoming into the iteration (whether the real loop iteration or an unrolled loop iteration). We may also want to use the name VecPhi rather than "current" and add a comment clarifying the above. What do you think?
		mssimpso: Yes, I don't think we need to keep track of CurrentParts as a SmallVector since a single value should do. Changing the name is probably a good idea to avoid confusion.
		// If the vector loop jumps to the scalar version to complete the required
		// iterations, we need to first extract the last vector value. We have to
		// recreate the PreLoadPHI for the scalar loop, ensuring that the first
		// iteration is correct.
		Builder.SetInsertPoint(LoopMiddleBlock->getTerminator());
		auto *Extract =
		Builder.CreateExtractElement(LoadParts[UF - 1], Builder.getInt32(VF - 1));

		// Create a new PHI for the starting value of the scalar PreLoadPHI. This
		// will either be the original PreLoad or the value we extracted from the
		// vector when exiting the vector loop.
		Builder.SetInsertPoint(&*LoopScalarPreHeader->begin());
		auto *PreLoadPHI = Builder.CreatePHI(PreLoad->getType(), 2);
		for (auto *BB : predecessors(LoopScalarPreHeader)) {
		auto *Incoming = BB == LoopMiddleBlock ? Extract : PreLoad;
		PreLoadPHI->addIncoming(Incoming, BB);
		}

		// Finally, update the incoming value of the scalar PreLoadPHI.
		Phi->setIncomingValue(Phi->getBasicBlockIndex(LoopScalarPreHeader),
		PreLoadPHI);
		}
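
The load-sinking scheme in fixPreLoadPHI above can be illustrated outside of LLVM. The following is a minimal standalone sketch, not code from the patch: `diffScalar`, `diffChunked`, and the `mem` layout are hypothetical names chosen for illustration, and plain `std::vector` chunks stand in for vector registers. It assumes the recurrence reads `mem[i]` as the previous value and `mem[i + 1]` as the current one, so the sunk load (current pointer minus one element) stays in bounds.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Scalar form of the recurrence: 'prev' plays the role of the header phi,
// seeded by a load in the preheader and updated with the loop-varying load.
std::vector<int> diffScalar(const std::vector<int> &mem) {
  assert(!mem.empty());
  const std::size_t n = mem.size() - 1;
  std::vector<int> b(n);
  int prev = mem[0]; // preheader load
  for (std::size_t i = 0; i < n; ++i) {
    int cur = mem[i + 1]; // loop-varying load
    b[i] = cur - prev;
    prev = cur;
  }
  return b;
}

// Chunked form with a hypothetical vector factor VF. The phi is replaced by a
// second load whose address is the current load's address minus one element.
std::vector<int> diffChunked(const std::vector<int> &mem, std::size_t VF) {
  assert(!mem.empty() && VF > 0);
  const std::size_t n = mem.size() - 1;
  std::vector<int> b(n);
  int prev = mem[0]; // still needed if the vector loop never runs
  std::size_t i = 0;
  for (; i + VF <= n; i += VF) {
    std::vector<int> cur(VF), sunk(VF);
    for (std::size_t lane = 0; lane < VF; ++lane) {
      cur[lane] = mem[i + 1 + lane]; // widened loop-varying load
      sunk[lane] = mem[i + lane];    // sunk load: one element behind
    }
    for (std::size_t lane = 0; lane < VF; ++lane)
      b[i + lane] = cur[lane] - sunk[lane];
    // Extract the last lane so the scalar remainder starts with the right
    // previous value (the extractelement in the middle block).
    prev = cur[VF - 1];
  }
  for (; i < n; ++i) { // scalar remainder loop
    int cur = mem[i + 1];
    b[i] = cur - prev;
    prev = cur;
  }
  return b;
}
```

With `mem = {3, 1, 4, 1, 5, 9, 2, 6}` and `VF = 4`, one chunk runs in the vector loop and three elements fall to the scalar remainder; both functions should produce the same differences.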

		anemet: A newline before this block?
void InnerLoopVectorizer::fixLCSSAPHIs() {		void InnerLoopVectorizer::fixLCSSAPHIs() {
for (BasicBlock::iterator LEI = LoopExitBlock->begin(),		for (BasicBlock::iterator LEI = LoopExitBlock->begin(),
LEE = LoopExitBlock->end(); LEI != LEE; ++LEI) {		LEE = LoopExitBlock->end(); LEI != LEE; ++LEI) {
PHINode *LCSSAPhi = dyn_cast<PHINode>(LEI);		PHINode *LCSSAPhi = dyn_cast<PHINode>(LEI);
if (!LCSSAPhi) break;		if (!LCSSAPhi) break;
if (LCSSAPhi->getNumIncomingValues() == 1)		if (LCSSAPhi->getNumIncomingValues() == 1)
LCSSAPhi->addIncoming(UndefValue::get(LCSSAPhi->getType()),		LCSSAPhi->addIncoming(UndefValue::get(LCSSAPhi->getType()),
LoopMiddleBlock);		LoopMiddleBlock);
▲ Show 20 Lines • Show All 65 Lines • ▼ Show 20 Lines	void InnerLoopVectorizer::widenPHIInstruction(
PHINode* P = cast<PHINode>(PN);		PHINode* P = cast<PHINode>(PN);
// Handle reduction variables:		// Handle reduction variables:
if (Legal->isReductionVariable(P)) {		if (Legal->isReductionVariable(P)) {
for (unsigned part = 0; part < UF; ++part) {		for (unsigned part = 0; part < UF; ++part) {
// This is phase one of vectorizing PHIs.		// This is phase one of vectorizing PHIs.
Type *VecTy = (VF == 1) ? PN->getType() :		Type *VecTy = (VF == 1) ? PN->getType() :
VectorType::get(PN->getType(), VF);		VectorType::get(PN->getType(), VF);
Entry[part] = PHINode::Create(		Entry[part] = PHINode::Create(
VecTy, 2, "vec.phi", &*LoopVectorBody.back()->getFirstInsertionPt());		VecTy, 2, "vec.phi", &*LoopVectorBody.back()->getFirstInsertionPt());
}		}
		anemet: It's generally not a good practice to mix a multi-line formatting change with a single-line (?) functionality change.
		mssimpso: Sure, we can fix the formatting in a separate patch.
PV->push_back(P);		PV->push_back(P);
return;		return;
}		}

		// Handle pre-load recurrences. We create a temporary load within the loop
		// for the loop-invariant load of the pre-load recurrence. We will fix this
		// load later after all induction variables have been generated.
		if (Legal->isPreLoadPHI(P)) {
		auto *Ty = (VF == 1) ? PN->getType() : VectorType::get(PN->getType(), VF);
		auto *LI =
		cast<LoadInst>(P->getIncomingValueForBlock(OrigLoop->getHeader()));
		for (unsigned part = 0; part < UF; ++part)
		Entry[part] = new LoadInst(
		UndefValue::get(PointerType::get(Ty, LI->getPointerAddressSpace())),
		"vec.pre", &*LoopVectorBody.back()->getFirstInsertionPt());
		PV->push_back(P);
		return;
		}

setDebugLocFromInst(Builder, P);		setDebugLocFromInst(Builder, P);
// Check for PHI nodes that are lowered to vector selects.		// Check for PHI nodes that are lowered to vector selects.
if (P->getParent() != OrigLoop->getHeader()) {		if (P->getParent() != OrigLoop->getHeader()) {
// We know that all PHIs in non-header blocks are converted into		// We know that all PHIs in non-header blocks are converted into
// selects, so we don't have to worry about the insertion order and we		// selects, so we don't have to worry about the insertion order and we
// can just use the builder.		// can just use the builder.
// At this point we generate the predication tree. There may be		// At this point we generate the predication tree. There may be
// duplications since this is a simple recursive scan, but future		// duplications since this is a simple recursive scan, but future
▲ Show 20 Lines • Show All 672 Lines • ▼ Show 20 Lines	for (BasicBlock::iterator it = (bb)->begin(), e = (bb)->end(); it != e;
if (RecurrenceDescriptor::isReductionPHI(Phi, TheLoop, RedDes)) {		if (RecurrenceDescriptor::isReductionPHI(Phi, TheLoop, RedDes)) {
if (RedDes.hasUnsafeAlgebra())		if (RedDes.hasUnsafeAlgebra())
Requirements->addUnsafeAlgebraInst(RedDes.getUnsafeAlgebraInst());		Requirements->addUnsafeAlgebraInst(RedDes.getUnsafeAlgebraInst());
AllowedExit.insert(RedDes.getLoopExitInstr());		AllowedExit.insert(RedDes.getLoopExitInstr());
Reductions[Phi] = RedDes;		Reductions[Phi] = RedDes;
continue;		continue;
}		}

		// If Phi is a pre-load recurrence, we also have to ensure that we can
		// sink its loop-invariant load into the body of the loop.
		if (RecurrenceDescriptor::isPreLoadPHI(Phi, TheLoop, PSE.getSE()))
		if (canSinkPreLoadPHI(Phi, TheLoop)) {
		PreLoadPHIs.insert(Phi);
		continue;
		}

emitAnalysis(VectorizationReport(&*it) <<		emitAnalysis(VectorizationReport(&*it) <<
"value that could not be identified as "		"value that could not be identified as "
"reduction is used outside the loop");		"reduction is used outside the loop");
DEBUG(dbgs() << "LV: Found an unidentified PHI."<< *Phi <<"\n");		DEBUG(dbgs() << "LV: Found an unidentified PHI."<< *Phi <<"\n");
return false;		return false;
}// end of PHI handling		}// end of PHI handling

// We handle calls that:		// We handle calls that:
▲ Show 20 Lines • Show All 74 Lines • ▼ Show 20 Lines	bool LoopVectorizationLegality::canVectorizeInstrs() {
// is the same size. If it's not, unset it here and InnerLoopVectorizer		// is the same size. If it's not, unset it here and InnerLoopVectorizer
// will create another.		// will create another.
if (Induction && WidestIndTy != Induction->getType())		if (Induction && WidestIndTy != Induction->getType())
Induction = nullptr;		Induction = nullptr;

return true;		return true;
}		}

		bool LoopVectorizationLegality::canSinkPreLoadPHI(PHINode *Phi, Loop *L) {

		// The loop header and preheader. We ensured the loop had a preheader when
		// detecting the PreLoadPHI.
		auto *Header = L->getHeader();
		auto *PreHeader = L->getLoopPreheader();

		// The loop-invariant load and the Phi's single user. We checked these casts
		// when detecting the PreLoadPHI.
		auto *PreLoad = cast<LoadInst>(Phi->getIncomingValueForBlock(PreHeader));
		auto *User = dyn_cast<Instruction>(*Phi->user_begin());

		// Ensure there is no store between the loop-invariant load we are going to
		// sink and the end of the loop preheader. We ensured the loop-invariant load
		// is in the preheader when detecting the PreLoadPHI.
		using Iter = BasicBlock::iterator;
		for (auto I = Iter(PreLoad), E = PreHeader->end(); I != E; ++I)
		if (isa<StoreInst>(&*I))
		gberry: I don't think checking for store instructions is enough here. Isn't I->mayWriteToMemory() more appropriate to catch e.g. calls?
		return false;

		// Ensure there is no store between the beginning of the loop header and the
		// location to which we're going to sink the load.
		for (auto I = Header->begin(), E = Iter(User); I != E; ++I)
		if (isa<StoreInst>(&*I))
		gberry: Same as above. Plus, I'm not sure this is a sufficient check. What if there is a store after the sink point in iteration 'i' that may alias the load from iteration 'i+1'? Perhaps you can piggy back on the check for vectorizability of the original load in the loop? That may not be enough either since the load in question might be known to only write the 0th element (i.e. the value loaded by the sunk load).
		return false;

		// If we didn't see any stores, it's safe to sink the loop-invariant load.
		return true;
		}
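
gberry's point about the scan in canSinkPreLoadPHI can be shown with a toy model. The sketch below does not use LLVM: `ToyKind`, `ToyInst`, `toyMayWriteToMemory`, `noStoreInRange`, and `noWriteInRange` are hypothetical stand-ins, and the "call may write unless marked read-only" rule is a simplifying assumption rather than LLVM's actual semantics. It only shows why a store-only scan can accept a range that a may-write scan rejects.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy instruction kinds; a real compiler has many more.
enum class ToyKind { Load, Store, Call, Arith };

struct ToyInst {
  ToyKind Kind;
  bool ReadOnlyCall; // only meaningful when Kind == ToyKind::Call
};

// Assumed rule for this sketch: stores write, and calls write unless they
// are explicitly marked read-only.
bool toyMayWriteToMemory(const ToyInst &I) {
  return I.Kind == ToyKind::Store ||
         (I.Kind == ToyKind::Call && !I.ReadOnlyCall);
}

// Store-only scan over [Begin, End), analogous to the isa<StoreInst> loops.
bool noStoreInRange(const std::vector<ToyInst> &Block, std::size_t Begin,
                    std::size_t End) {
  assert(Begin <= End && End <= Block.size());
  for (std::size_t I = Begin; I < End; ++I)
    if (Block[I].Kind == ToyKind::Store)
      return false;
  return true;
}

// Stricter scan that rejects anything that may write, including calls.
bool noWriteInRange(const std::vector<ToyInst> &Block, std::size_t Begin,
                    std::size_t End) {
  assert(Begin <= End && End <= Block.size());
  for (std::size_t I = Begin; I < End; ++I)
    if (toyMayWriteToMemory(Block[I]))
      return false;
  return true;
}
```

For a block containing a load, a call that may write, and an arithmetic instruction, `noStoreInRange` reports the range as safe while `noWriteInRange` does not. Neither scan addresses the second concern raised in the review, a store after the sink point that may alias the load in the next iteration.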

void LoopVectorizationLegality::collectStridedAccess(Value *MemAccess) {		void LoopVectorizationLegality::collectStridedAccess(Value *MemAccess) {
Value *Ptr = nullptr;		Value *Ptr = nullptr;
if (LoadInst *LI = dyn_cast<LoadInst>(MemAccess))		if (LoadInst *LI = dyn_cast<LoadInst>(MemAccess))
Ptr = LI->getPointerOperand();		Ptr = LI->getPointerOperand();
else if (StoreInst *SI = dyn_cast<StoreInst>(MemAccess))		else if (StoreInst *SI = dyn_cast<StoreInst>(MemAccess))
Ptr = SI->getPointerOperand();		Ptr = SI->getPointerOperand();
else		else
return;		return;
▲ Show 20 Lines • Show All 71 Lines • ▼ Show 20 Lines	bool LoopVectorizationLegality::isInductionVariable(const Value *V) {
Value *In0 = const_cast<Value*>(V);		Value *In0 = const_cast<Value*>(V);
PHINode *PN = dyn_cast_or_null<PHINode>(In0);		PHINode *PN = dyn_cast_or_null<PHINode>(In0);
if (!PN)		if (!PN)
return false;		return false;

return Inductions.count(PN);		return Inductions.count(PN);
}		}

		bool LoopVectorizationLegality::isPreLoadPHI(const PHINode *Phi) {
		return PreLoadPHIs.count(Phi);
		}

bool LoopVectorizationLegality::blockNeedsPredication(BasicBlock *BB) {		bool LoopVectorizationLegality::blockNeedsPredication(BasicBlock *BB) {
return LoopAccessInfo::blockNeedsPredication(BB, TheLoop, DT);		return LoopAccessInfo::blockNeedsPredication(BB, TheLoop, DT);
}		}

bool LoopVectorizationLegality::blockCanBePredicated(BasicBlock *BB,		bool LoopVectorizationLegality::blockCanBePredicated(BasicBlock *BB,
SmallPtrSetImpl<Value *> &SafePtrs) {		SmallPtrSetImpl<Value *> &SafePtrs) {

for (BasicBlock::iterator it = BB->begin(), e = BB->end(); it != e; ++it) {		for (BasicBlock::iterator it = BB->begin(), e = BB->end(); it != e; ++it) {
▲ Show 20 Lines • Show All 831 Lines • ▼ Show 20 Lines

unsigned		unsigned
LoopVectorizationCostModel::getInstructionCost(Instruction *I, unsigned VF) {		LoopVectorizationCostModel::getInstructionCost(Instruction *I, unsigned VF) {
// If we know that this instruction will remain uniform, check the cost of		// If we know that this instruction will remain uniform, check the cost of
// the scalar version.		// the scalar version.
if (Legal->isUniformAfterVectorization(I))		if (Legal->isUniformAfterVectorization(I))
VF = 1;		VF = 1;

		// If the instruction is a pre-load recurrence, it will be replaced by a load
		// within the loop. Update the instruction to account for the load.
		if (auto *Phi = dyn_cast<PHINode>(I))
		if (VF > 1 && Legal->isPreLoadPHI(Phi))
		I = cast<LoadInst>(Phi->getIncomingValueForBlock(TheLoop->getHeader()));

Type *RetTy = I->getType();		Type *RetTy = I->getType();
if (VF > 1 && MinBWs.count(I))		if (VF > 1 && MinBWs.count(I))
RetTy = IntegerType::get(RetTy->getContext(), MinBWs[I]);		RetTy = IntegerType::get(RetTy->getContext(), MinBWs[I]);
Type *VectorTy = ToVectorTy(RetTy, VF);		Type *VectorTy = ToVectorTy(RetTy, VF);
auto SE = PSE.getSE();		auto SE = PSE.getSE();

// TODO: We need to estimate the cost of intrinsic calls.		// TODO: We need to estimate the cost of intrinsic calls.
switch (I->getOpcode()) {		switch (I->getOpcode()) {
▲ Show 20 Lines • Show All 486 Lines • Show Last 20 Lines

test/Transforms/LoopVectorize/AArch64/pre-load-recurrence.ll

This file was added.

				; RUN: opt < %s -loop-vectorize -force-vector-interleave=1 -dce -instcombine -S | FileCheck %s

				target datalayout = "e-m:e-i64:64-i128:128-n32:64-S128"
				target triple = "aarch64--linux-gnu"

				; CHECK-LABEL: @pre_load_1
				;
				; void pre_load_1(int *a, int *b, int n) {
				; for(int i = 0; i < n; i++)
				; b[i] = a[i] + a[i - 1];
				; }
				;
				; CHECK: vector.body:
				; CHECK: [[L:%[a-zA-Z0-9.]+]] = load <4 x i32>
				;
				; CHECK: middle.block:
				; CHECK: extractelement <4 x i32> [[L]], i32 3
				;
				define void @pre_load_1(i32* nocapture readonly %a, i32* nocapture %b, i32 %n) {
				entry:
				br label %for.preheader

				for.preheader:
				%arrayidx.phi.trans.insert = getelementptr inbounds i32, i32* %a, i64 0
				%pre_load = load i32, i32* %arrayidx.phi.trans.insert
				br label %for.body

				for.body:
				%0 = phi i32 [ %pre_load, %for.preheader ], [ %1, %for.body ]
				%indvars.iv = phi i64 [ 0, %for.preheader ], [ %indvars.iv.next, %for.body ]
				%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
				%arrayidx32 = getelementptr inbounds i32, i32* %a, i64 %indvars.iv.next
				%1 = load i32, i32* %arrayidx32
				%arrayidx34 = getelementptr inbounds i32, i32* %b, i64 %indvars.iv
				%add35 = add i32 %1, %0
				store i32 %add35, i32* %arrayidx34
				%lftr.wideiv = trunc i64 %indvars.iv.next to i32
				%exitcond = icmp eq i32 %lftr.wideiv, %n
				br i1 %exitcond, label %for.exit, label %for.body

				for.exit:
				ret void
				}

				; CHECK-LABEL: @pre_load_2
				;
				; int pre_load_2(int *a, int n) {
				; int minmax;
				; for (int i = 0; i < n; ++i)
				; minmax = min(minmax, max(a[i] - a[i-1], 0));
				; return minmax;
				; }
				;
				; CHECK: vector.body:
				; CHECK: [[L:%[a-zA-Z0-9.]+]] = load <4 x i32>
				;
				; CHECK: middle.block:
				; CHECK: extractelement <4 x i32> [[L]], i32 3
				;
				define i32 @pre_load_2(i32* nocapture readonly %a, i32 %n) {
				entry:
				%cmp27 = icmp sgt i32 %n, 0
				br i1 %cmp27, label %for.body.preheader, label %for.cond.cleanup

				for.body.preheader:
				%arrayidx2.phi.trans.insert = getelementptr inbounds i32, i32* %a, i64 -1
				%.pre = load i32, i32* %arrayidx2.phi.trans.insert, align 4
				br label %for.body

				for.cond.cleanup.loopexit:
				%minmax.0.cond.lcssa = phi i32 [ %minmax.0.cond, %for.body ]
				br label %for.cond.cleanup

				for.cond.cleanup:
				%minmax.0.lcssa = phi i32 [ undef, %entry ], [ %minmax.0.cond.lcssa, %for.cond.cleanup.loopexit ]
				ret i32 %minmax.0.lcssa

				for.body:
				%0 = phi i32 [ %.pre, %for.body.preheader ], [ %1, %for.body ]
				%indvars.iv = phi i64 [ 0, %for.body.preheader ], [ %indvars.iv.next, %for.body ]
				%minmax.028 = phi i32 [ undef, %for.body.preheader ], [ %minmax.0.cond, %for.body ]
				%arrayidx = getelementptr inbounds i32, i32* %a, i64 %indvars.iv
				%1 = load i32, i32* %arrayidx, align 4
				%sub3 = sub nsw i32 %1, %0
				%cmp4 = icmp sgt i32 %sub3, 0
				%cond = select i1 %cmp4, i32 %sub3, i32 0
				%cmp5 = icmp slt i32 %minmax.028, %cond
				%minmax.0.cond = select i1 %cmp5, i32 %minmax.028, i32 %cond
				%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
				%lftr.wideiv = trunc i64 %indvars.iv.next to i32
				%exitcond = icmp eq i32 %lftr.wideiv, %n
				br i1 %exitcond, label %for.cond.cleanup.loopexit, label %for.body
				}

				; CHECK-LABEL: @pre_load_3
				;
				; void pre_load_3(short *a, double *b, int n, float f, short p) {
				; b[0] = (double)a[0] - f * (double)p;
				; for (int i = 1; i < n; i++)
				; b[i] = (double)a[i] - f * (double)a[i - 1];
				; }
				;
				; CHECK-NOT: load <{{.*}}>
				;
				; TODO: We currently do not vectorize this case because we do not attempt to
				; determine if we can sink %0 into the loop. Sinking this load requires
				; more sophisticated analysis since it is not in the loop preheader and
				; would cross a store.
				;
				define void @pre_load_3(i16* nocapture readonly %a, double* nocapture %b, i32 %n, float %f, i16 %p) {
				entry:
				%0 = load i16, i16* %a, align 2
				%conv = sitofp i16 %0 to double
				%conv1 = fpext float %f to double
				%conv2 = sitofp i16 %p to double
				%mul = fmul fast double %conv2, %conv1
				%sub = fsub fast double %conv, %mul
				store double %sub, double* %b, align 8
				%cmp25 = icmp sgt i32 %n, 1
				br i1 %cmp25, label %for.body.preheader, label %for.end

				for.body.preheader:
				br label %for.body

				for.body:
				%1 = phi i16 [ %2, %for.body ], [ %0, %for.body.preheader ]
				%advars.iv = phi i64 [ %advars.iv.next, %for.body ], [ 1, %for.body.preheader ]
				%arrayidx5 = getelementptr inbounds i16, i16* %a, i64 %advars.iv
				%2 = load i16, i16* %arrayidx5, align 2
				%conv6 = sitofp i16 %2 to double
				%conv11 = sitofp i16 %1 to double
				%mul12 = fmul fast double %conv11, %conv1
				%sub13 = fsub fast double %conv6, %mul12
				%arrayidx15 = getelementptr inbounds double, double* %b, i64 %advars.iv
				store double %sub13, double* %arrayidx15, align 8
				%advars.iv.next = add nuw nsw i64 %advars.iv, 1
				%lftr.wideiv = trunc i64 %advars.iv.next to i32
				%exitcond = icmp eq i32 %lftr.wideiv, %n
				br i1 %exitcond, label %for.end.loopexit, label %for.body

				for.end.loopexit:
				br label %for.end

				for.end:
				ret void
				}
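
The TODO in @pre_load_3 notes that sinking %0 would cross a store. A minimal sketch of why that matters, using hypothetical helper names (`loadBeforeStore`, `loadAfterStore`) and plain `int` pointers rather than the types in the test: if the load and the store may refer to the same location, moving the load past the store changes the value it observes.

```cpp
#include <cassert>

// Hoisted form: the load executes before the store.
int loadBeforeStore(const int *Src, int *Dst, int NewValue) {
  int Loaded = *Src;
  *Dst = NewValue;
  return Loaded;
}

// Sunk form: the same load, moved to after the store.
int loadAfterStore(const int *Src, int *Dst, int NewValue) {
  *Dst = NewValue;
  return *Src;
}
```

When `Src` and `Dst` point to different objects, both functions return the same value; when they point to the same object, the sunk form returns `NewValue` instead of the original contents. A legality check therefore has to prove the absence of such aliasing before sinking.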

Revision Contents

Diff 44897

include/llvm/Transforms/Utils/LoopUtils.h

lib/Transforms/Utils/LoopUtils.cpp

lib/Transforms/Vectorize/LoopVectorize.cpp

test/Transforms/LoopVectorize/AArch64/pre-load-recurrence.ll
