This is an archive of the discontinued LLVM Phabricator instance.

[LV] Vectorize first-order recurrences
Closed, Public

Authored by mssimpso on Jan 14 2016, 10:02 AM.


Details

Reviewers

anemet
nadav
jmolloy
hfinkel
mcrosier

Commits

rG29c997c1a18a: [LV] Vectorize first-order recurrences
rL261346: [LV] Vectorize first-order recurrences

Summary

This patch enables the vectorization of first-order (non-reduction) recurrences. For example:

for (int i = 0; i < n; ++i)
  b[i] = a[i] - a[i - 1];

In the example above, GVN's load PRE can often eliminate the load of a[i - 1] by reusing the value of a[i] loaded in the previous iteration, hoisting only the initial load (a[-1]) into the loop preheader. This leaves a phi node in the loop header whose incoming values are the hoisted load and a[i]. Although GVN can create these phi nodes, they can also occur naturally.
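To make the recurrence concrete, here is a C++ sketch (not code from the patch) of the example loop before and after the kind of load PRE described above. The function names are hypothetical, and `a` is assumed to point at least one element into a buffer so that `a[-1]` is valid. In the second version, `prev` plays the role of the header phi (s1) and `cur` the value it receives from the latch (s2).

```cpp
#include <cassert>
#include <vector>

// Original form: each iteration loads both a[i] and a[i - 1].
// Assumes `a` points at least one element into a buffer so a[-1] is valid.
void diffOriginal(const int *a, int *b, int n) {
  assert(n >= 0);
  for (int i = 0; i < n; ++i)
    b[i] = a[i] - a[i - 1];
}

// Form after load PRE (illustrative): a[-1] is loaded once before the loop,
// and the value loaded in one iteration is carried into the next.
// `prev` models the header phi (s1); `cur` models the latch value (s2).
void diffRecurrence(const int *a, int *b, int n) {
  assert(n >= 0);
  int prev = a[-1];  // s_init, loaded in the preheader
  for (int i = 0; i < n; ++i) {
    int cur = a[i];     // s2 = a[i]
    b[i] = cur - prev;  // b[i] = s2 - s1
    prev = cur;         // the latch value becomes the next iteration's s1
  }
}
```

Both functions compute the same result; only the second contains a first-order recurrence.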

In this patch, we add a new recurrence kind for these phi nodes and attempt to vectorize them if possible. Vectorization is performed by shuffling the values for the current and previous iterations. The vectorization cost estimate is updated to account for the added shuffle instruction.
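A minimal C++ model of that shuffle, assuming fixed-size `std::array` values stand in for LLVM vector values. The helper names `emulateShuffle` and `recurrenceMask` are hypothetical. The mask layout follows the `ShuffleMask` loop in `fixFirstOrderRecurrence` from the diff, and, as with `shufflevector`, a mask index at or above VF selects from the second operand.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Models shufflevector on two VF-element vectors: a mask index m < VF
// selects lhs[m]; an index m >= VF selects rhs[m - VF].
template <std::size_t VF>
std::array<int, VF> emulateShuffle(const std::array<int, VF> &lhs,
                                   const std::array<int, VF> &rhs,
                                   const std::array<unsigned, VF> &mask) {
  std::array<int, VF> out{};
  for (std::size_t i = 0; i < VF; ++i) {
    assert(mask[i] < 2 * VF && "mask index out of range");
    out[i] = mask[i] < VF ? lhs[mask[i]] : rhs[mask[i] - VF];
  }
  return out;
}

// Builds the mask <VF-1, VF, VF+1, ..., 2*VF-2>: the last lane of the
// previous iteration's vector followed by the first VF-1 lanes of the
// current iteration's vector.
template <std::size_t VF>
std::array<unsigned, VF> recurrenceMask() {
  std::array<unsigned, VF> mask{};
  mask[0] = VF - 1;
  for (unsigned i = 1; i < VF; ++i)
    mask[i] = i + VF - 1;
  return mask;
}
```

For VF = 4 the mask is <3, 4, 5, 6>, so shuffling {10, 11, 12, 13} (previous) with {20, 21, 22, 23} (current) yields {13, 20, 21, 22}.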

Contributed-by: Matthew Simpson and Chad Rosier <mcrosier@codeaurora.org>

Diff Detail

Repository: rL LLVM

Event Timeline

mssimpso updated this revision to Diff 44897. Jan 14 2016, 10:02 AM

mssimpso retitled this revision to [LV] Vectorize pre-load recurrences.

mssimpso updated this object.

mssimpso added reviewers: mcrosier, jmolloy, hfinkel, anemet, nadav.

mssimpso added a subscriber: llvm-commits.

Herald added a subscriber: sanjoy. Jan 14 2016, 10:02 AM

bmakam added a subscriber: bmakam. Jan 14 2016, 10:21 AM

Matt, I've added a few comments below.

lib/Transforms/Vectorize/LoopVectorize.cpp
4573 ↗	(On Diff #44897)	I don't think checking for store instructions is enough here. Isn't I->mayWriteToMemory() more appropriate to catch e.g. calls?
4579 ↗	(On Diff #44897)	Same as above. Plus, I'm not sure this is a sufficient check. What if there is a store after the sink point in iteration 'i' that may alias the load from iteration 'i+1'? Perhaps you can piggyback on the check for vectorizability of the original load in the loop? That may not be enough either, since the store in question might be known to only write the 0th element (i.e. the value loaded by the sunk load).

One other question: have you explored vectorizing this recurrence as a shuffle+insertelement instead? That would avoid the need for any extra memory dependency checking, and would avoid introducing more loads in the loop.

In D16197#327098, @gberry wrote:

One other question: have you explored vectorizing this recurrence as a shuffle+insertelement instead? That would avoid the need for any extra memory dependency checking, and would avoid introducing more loads in the loop.

I just talked with Geoff in person, and I like the shuffle/insert approach. This will avoid a lot of the memory issues and should also expand the types of phis we can handle beyond loads. Thanks for the quick feedback! I'll post an updated version soon.

sbaranga added a subscriber: sbaranga. Jan 15 2016, 4:36 AM

sbaranga added inline comments.

lib/Transforms/Utils/LoopUtils.cpp
562 ↗	(On Diff #44897)	It might be worth taking a PredicatedScalarEvolution for this function, which I think might help in some cases. At the moment it is possible that the loop was versioned such that AR's step is known to be 1. In this case doing PSE->getSCEV() would give you the more accurate expression. I think you also need to check that AR is an AddRecExpr for TheLoop?

Addressed comments from Geoff and Adam.

Code generation is now done by shuffling the vectors from the current and previous iterations instead of trying to sink loads into the loop. The new approach is simpler, and I don't think it requires as much legality checking.

Herald added a subscriber: mcrosier. Jan 20 2016, 11:35 AM

mssimpso added inline comments. Jan 20 2016, 11:38 AM

lib/Transforms/Utils/LoopUtils.cpp
562 ↗	(On Diff #45423)	Hi Silviu. Thanks very much for the comments. After changing the code generation approach, I don't think scalar evolution is required any longer.

Ping.

Hi Matt,

I've started reviewing this but I found that I could use an example of this transformation fully spelled out. Particularly, I am not sure I understand why the initial vector value is a splat vector.

Thanks,
Adam

lib/Transforms/Vectorize/LoopVectorize.cpp
367 ↗	(On Diff #45423)	I guess a definition of first-order recurrence should be added here.
3323–3324 ↗	(On Diff #45423)	I actually think we need to do this before this patch ;). This is a 200-line loop and we shouldn't pile on more strangeness. I think that all we need to do is to rename RdxPHIsToFix to PHIsToFix in a prequel NFC patch.
3326 ↗	(On Diff #45423)	I know that we use fix... in other places but that's pretty general. How about vectorize... or something like that.
3564–3567 ↗	(On Diff #45423)	assert that Previous is loop-invariant at this point?

Hi Adam,

Thanks very much for the feedback. Yes, the initial value doesn't have to be broadcast to all lanes since just an insert will do. I will upload a new version with your suggestions. Let me walk through the simple example I mentioned in the summary. For this loop, the (shorthand) scalar IR looks something like this:

scalar.ph:
  s_init = a[-1]
  br scalar.body

scalar.body:
  s1 = phi [s_init, scalar.ph], [s2, scalar.body]
  i = phi [0, scalar.ph], [i+1, scalar.body]
  s2 = a[i]
  b[i] = s2 - s1
  br cond, scalar.body, ...

Here, s1 is a non-reduction recurrence that we currently give up on. This patch calls it a first-order recurrence (because its value depends on the previous iteration) and tries to vectorize it. The check in isFirstOrderRecurrence basically ensures that s_init is loop-invariant, that s2 is in the loop header (and thus loop-varying), and that every use of s1 is dominated by the definition of s2 (see below). The vectorized IR looks something like this for VF=4:

vector.ph:
  v_init = vector(..., ..., ..., a[-1])
  br vector.body

vector.body:
  v1 = phi [v_init, vector.ph], [v2, vector.body]
  i = phi [0, vector.ph], [i+4, vector.body]
  v2 = a[i, i+1, i+2, i+3]
  v3 = vector(v1(3), v2(0, 1, 2))
  b[i, i+1, i+2, i+3] = v2 - v3
  br cond, vector.body, middle.block

middle.block:
  x = v2(3)
  br scalar.ph

scalar.ph:
  s_init = phi [x, middle.block], [a[-1], otherwise]
  br scalar.body

The dominance requirement is there so that we can shuffle v1 with v2 before this value is needed for the assignment to b. After we leave the vector loop, we extract the next value of the recurrence (x) to use as the initial value when jumping to the scalar portion. I hope that helps.
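The shorthand vector IR above can be simulated in scalar C++ to check that the vector loop, the middle block, and the scalar remainder together agree with the original loop. This is a sketch under stated assumptions: VF = 4 and UF = 1, `std::array` stands in for vector registers, the unused lanes of v_init are simply zero, and `diffVectorized` is a hypothetical name.

```cpp
#include <array>
#include <cassert>
#include <vector>

// Simulates the shorthand vector IR for b[i] = a[i] - a[i - 1] with VF = 4.
// Assumes `a` points at least one element into a buffer so a[-1] is valid.
void diffVectorized(const int *a, int *b, int n) {
  assert(n >= 0);
  constexpr int VF = 4;
  // vector.ph: v_init holds the scalar initial value a[-1] in its last lane.
  std::array<int, VF> v1{};  // the vector phi; other lanes are never read
  v1[VF - 1] = a[-1];
  const int vecEnd = n - n % VF;  // iterations covered by the vector loop
  int i = 0;
  for (; i < vecEnd; i += VF) {
    std::array<int, VF> v2;  // v2 = a[i, i+1, i+2, i+3]
    for (int l = 0; l < VF; ++l)
      v2[l] = a[i + l];
    std::array<int, VF> v3;  // v3 = vector(v1(3), v2(0, 1, 2))
    v3[0] = v1[VF - 1];
    for (int l = 1; l < VF; ++l)
      v3[l] = v2[l - 1];
    for (int l = 0; l < VF; ++l)  // b[i, ..., i+3] = v2 - v3
      b[i + l] = v2[l] - v3[l];
    v1 = v2;  // latch edge of the vector phi
  }
  // middle.block + scalar.ph: resume from x = v2(3) if the vector loop ran,
  // otherwise from the original initial value a[-1].
  int s1 = vecEnd > 0 ? v1[VF - 1] : a[-1];
  for (; i < n; ++i) {  // scalar.body runs the remaining iterations
    int s2 = a[i];
    b[i] = s2 - s1;
    s1 = s2;
  }
}
```

Because the shuffle only ever reads lane 3 of v_init, the contents of its other lanes do not matter, which is why inserting the initial value is enough and no broadcast is needed.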

lib/Transforms/Vectorize/LoopVectorize.cpp
367 ↗	(On Diff #45423)	Agreed.
3323–3324 ↗	(On Diff #45423)	Sure, I'll push a patch that renames RdxPHIsToFix to PHIsToFix first.
3326 ↗	(On Diff #45423)	vectorizeFirstOrderRecurrence sounds good to me.
3564–3567 ↗	(On Diff #45423)	Good idea.

mssimpso mentioned this in rL259364: [LV] Rename RdxPHIsToFix to PHIsToFix (NFC). Feb 1 2016, 8:11 AM

mssimpso marked 4 inline comments as done. Feb 1 2016, 9:52 AM

mssimpso added inline comments.

lib/Transforms/Vectorize/LoopVectorize.cpp
3564–3567 ↗	(On Diff #45423)	We actually want the previous value to be loop-varying here. The initial value should be loop-invariant. I've added this assertion as well as a few others.

Addressed Adam's comments

Matt, that's a nice example. Can you please add it as a comment to the code where it's appropriate.

lib/Transforms/Utils/LoopUtils.cpp
532–541 ↗	(On Diff #46551)	Thinking more about this, isn't it true that whether a phi operand is the initial or the previous value depends on its incoming edge rather than on whether it is loop-variant or loop-invariant?
543–550 ↗	(On Diff #46551)	How can a user of the current value be loop-invariant?
lib/Transforms/Vectorize/LoopVectorize.cpp
368–381 ↗	(On Diff #46551)	We don't add empty /// lines like that. I actually think that this definition should come before isFirstOrderRecurrence.
3558 ↗	(On Diff #46551)	Sorry but I forgot that "fix" is an actual "phase" here. We first do the default widening then we fix up the phis. Can you please rename it back, sorry :(
3573–3585 ↗	(On Diff #46551)	This is duplicated between here and isFirstOrderRecurrence, would be nice to somehow remove this.
3596–3597 ↗	(On Diff #46551)	"... when initially vectorizing ..."
3601–3602 ↗	(On Diff #46551)	Outdated comment.
3625–3636 ↗	(On Diff #46551)	I think that CurrentParts should be a single value, this is the value that's incoming into the iteration (whether the real loop iteration or an unrolled loop iteration). We may also want to use the name VecPhi rather than "current" and add comment clarifying the above. What do you think?
3657–3659 ↗	(On Diff #46551)	A newline before this block?
3735–3742 ↗	(On Diff #46551)	It's generally not good practice to mix a multi-line formatting change with a single-line (?) functionality change.
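The suggestion above to track a single incoming value per iteration (real or unrolled) can be sketched for UF > 1 as follows. This is an illustration only: `Vec`, `recurShuffle`, and `unrolledIteration` are hypothetical names, VF is fixed at 4, and each element of `previousParts` stands in for one unrolled part of the widened previous value.

```cpp
#include <array>
#include <cassert>
#include <vector>

constexpr int VF = 4;
using Vec = std::array<int, VF>;

// Shuffle with the recurrence mask <VF-1, VF, ..., 2*VF-2>: the last lane of
// lhs followed by the first VF-1 lanes of rhs.
Vec recurShuffle(const Vec &lhs, const Vec &rhs) {
  Vec out{};
  out[0] = lhs[VF - 1];
  for (int l = 1; l < VF; ++l)
    out[l] = rhs[l - 1];
  return out;
}

// One vector-loop iteration with UF = previousParts.size(). Part 0 combines
// the single vector phi with previousParts[0]; each later part combines the
// preceding part with its own. The latch value is previousParts[UF - 1].
std::vector<Vec> unrolledIteration(Vec &vecPhi,
                                   const std::vector<Vec> &previousParts) {
  assert(!previousParts.empty() && "UF must be at least 1");
  std::vector<Vec> phiParts;
  Vec incoming = vecPhi;  // the value incoming into the current part
  for (const Vec &prev : previousParts) {
    phiParts.push_back(recurShuffle(incoming, prev));
    incoming = prev;
  }
  vecPhi = incoming;  // models adding the latch edge to the vector phi
  return phiParts;
}
```

With UF = 2, only one phi is needed per vector iteration; the second part never reads the phi at all.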

mssimpso added inline comments. Feb 11 2016, 9:12 AM

lib/Transforms/Utils/LoopUtils.cpp
532–541 ↗	(On Diff #46551)	When I think more about this, yes, I agree. I'll make the necessary updates. Thanks for catching this.
543–550 ↗	(On Diff #46551)	Current (the Phi) could be used outside the loop. Reductions are an example where a Phi is used externally. I actually don't think we need this restriction though, because we can handle this case in code generation. The use would just be of the last value of the recurrence, which we already extract in middle.block.
lib/Transforms/Vectorize/LoopVectorize.cpp
368–381 ↗	(On Diff #46551)	Sure, I'll move it.
3558 ↗	(On Diff #46551)	No problem!
3573–3585 ↗	(On Diff #46551)	Sure, I will delete these assertions here.
3625–3636 ↗	(On Diff #46551)	Yes, I don't think we need to keep track of CurrentParts as a SmallVector since a single value should do. Changing the name is probably a good idea to avoid confusion.
3735–3742 ↗	(On Diff #46551)	Sure, we can fix the formatting in a separate patch.

anemet added inline comments. Feb 11 2016, 9:35 AM

lib/Transforms/Vectorize/LoopVectorize.cpp
3573–3585 ↗	(On Diff #46551)	I didn't mean this only for the asserts. I think it would make sense to factor out the way we get the previous and init for a first-order recurrence somewhere (e.g. RecurrenceDescriptor?). There may also be some common functionality between this new function and RecurrenceDescriptor::isFirstOrderRecurrence that could be shared.

mssimpso added inline comments. Feb 11 2016, 10:45 AM

lib/Transforms/Vectorize/LoopVectorize.cpp
3573–3585 ↗	(On Diff #46551)	I see. Thanks for clarifying!

Addressed Adam's comments. Thanks, Adam!

Herald added a subscriber: mzolotukhin. Feb 11 2016, 1:17 PM

anemet added inline comments. Feb 17 2016, 11:30 PM

lib/Transforms/Vectorize/LoopVectorize.cpp
3613–3632 ↗	(On Diff #47718)	Don't we only enter this function if the Phi has passed isFirstOrderRecurrence? Why are we asserting the same properties that were already checked there? This is almost like saying assert(isFirstOrderRecurrence(Phi)), no? I don't think we want these unless I am misunderstanding something.
test/Transforms/LoopVectorize/AArch64/first-order-recurrence.ll
1 ↗	(On Diff #47718)	Please also add a UF>1 test.
1–4 ↗	(On Diff #47718)	I think that it's better practice to avoid the triple and use -force-vector-width to make the test robust.
18 ↗	(On Diff #47718)	Is there any reason we can't fully spell out most of these instructions, e.g. shuffle mask?

Addressed Adam's comments. Thanks again, Adam!

lib/Transforms/Vectorize/LoopVectorize.cpp
3613–3632 ↗	(On Diff #47718)	Yes, this is my mistake. I thought you had requested these asserts in an earlier review. I've removed the duplication.
test/Transforms/LoopVectorize/AArch64/first-order-recurrence.ll
18 ↗	(On Diff #47718)	We can definitely do this.

LGTM.

This revision is now accepted and ready to land. Feb 18 2016, 2:47 PM

Thanks, Adam! I really appreciate the multiple rounds of feedback you gave on this one.

You're welcome and thanks for your patience. I also had the EuroLLVM reviews happening in parallel.

Closed by commit rL261346: [LV] Vectorize first-order recurrences (authored by mssimpso). Feb 19 2016, 10:00 AM

This revision was automatically updated to reflect the committed changes.

Ayal mentioned this in D78210: [LV] Mark first-order recurrences as allowed exit. Apr 15 2020, 8:11 AM

Ayal mentioned this in rG8e0c5f720058: [LV] Mark first-order recurrences as allowed exits. Apr 18 2020, 2:01 PM

Revision Contents

Path (Size)
llvm/trunk/include/llvm/Transforms/Utils/LoopUtils.h (7 lines)
llvm/trunk/lib/Transforms/Utils/LoopUtils.cpp (37 lines)
llvm/trunk/lib/Transforms/Vectorize/LoopVectorize.cpp (203 lines)
llvm/trunk/test/Transforms/LoopVectorize/AArch64/first-order-recurrence.ll (209 lines)

Diff 48508

llvm/trunk/include/llvm/Transforms/Utils/LoopUtils.h

@@ 169 lines hidden @@ static bool AddReductionVar(PHINode *Phi, RecurrenceKind Kind, Loop *TheLoop,
                               bool HasFunNoNaNAttr,
                               RecurrenceDescriptor &RedDes);
 
   /// Returns true if Phi is a reduction in TheLoop. The RecurrenceDescriptor is
   /// returned in RedDes.
   static bool isReductionPHI(PHINode *Phi, Loop *TheLoop,
                              RecurrenceDescriptor &RedDes);
 
+  /// Returns true if Phi is a first-order recurrence. A first-order recurrence
+  /// is a non-reduction recurrence relation in which the value of the
+  /// recurrence in the current loop iteration equals a value defined in the
+  /// previous iteration.
+  static bool isFirstOrderRecurrence(PHINode *Phi, Loop *TheLoop,
+                                     DominatorTree *DT);
+
   RecurrenceKind getRecurrenceKind() { return Kind; }
 
   MinMaxRecurrenceKind getMinMaxRecurrenceKind() { return MinMaxKind; }
 
   TrackingVH<Value> getRecurrenceStartValue() { return StartValue; }
 
   Instruction *getLoopExitInstr() { return LoopExitInstr; }
 
@@ 213 lines hidden @@

llvm/trunk/lib/Transforms/Utils/LoopUtils.cpp

@@ 514 lines hidden @@ bool RecurrenceDescriptor::isReductionPHI(PHINode *Phi, Loop *TheLoop,
   if (AddReductionVar(Phi, RK_FloatMinMax, TheLoop, HasFunNoNaNAttr, RedDes)) {
     DEBUG(dbgs() << "Found an float MINMAX reduction PHI." << *Phi << "\n");
     return true;
   }
   // Not a reduction of known type.
   return false;
 }
 
+bool RecurrenceDescriptor::isFirstOrderRecurrence(PHINode *Phi, Loop *TheLoop,
+                                                  DominatorTree *DT) {
+
+  // Ensure the phi node is in the loop header and has two incoming values.
+  if (Phi->getParent() != TheLoop->getHeader() ||
+      Phi->getNumIncomingValues() != 2)
+    return false;
+
+  // Ensure the loop has a preheader and a single latch block. The loop
+  // vectorizer will need the latch to set up the next iteration of the loop.
+  auto *Preheader = TheLoop->getLoopPreheader();
+  auto *Latch = TheLoop->getLoopLatch();
+  if (!Preheader || !Latch)
+    return false;
+
+  // Ensure the phi node's incoming blocks are the loop preheader and latch.
+  if (Phi->getBasicBlockIndex(Preheader) < 0 ||
+      Phi->getBasicBlockIndex(Latch) < 0)
+    return false;
+
+  // Get the previous value. The previous value comes from the latch edge while
+  // the initial value comes from the preheader edge.
+  auto *Previous = dyn_cast<Instruction>(Phi->getIncomingValueForBlock(Latch));
+  if (!Previous)
+    return false;
+
+  // Ensure every user of the phi node is dominated by the previous value. The
+  // dominance requirement ensures the loop vectorizer will not need to
+  // vectorize the initial value prior to the first iteration of the loop.
+  for (User *U : Phi->users())
+    if (auto *I = dyn_cast<Instruction>(U))
+      if (!DT->dominates(Previous, I))
+        return false;
+
+  return true;
+}
+
 /// This function returns the identity element (or neutral element) for
 /// the operation K.
 Constant *RecurrenceDescriptor::getRecurrenceIdentity(RecurrenceKind K,
                                                       Type *Tp) {
   switch (K) {
   case RK_IntegerXor:
   case RK_IntegerAdd:
   case RK_IntegerOr:
@@ 259 lines hidden @@
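The legality conditions checked by isFirstOrderRecurrence in the LoopUtils.cpp hunk can be restated on a toy model, in the same order. Everything here is hypothetical and far simpler than LLVM: the loop body is a single block, blocks are named by strings, and "Previous dominates a user" is approximated as "Previous appears strictly earlier in the block", so users outside the loop are not modeled.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical toy description of a header phi (not LLVM's API).
struct ToyPhi {
  bool inHeader;                            // phi is in the loop header
  std::vector<std::string> incomingBlocks;  // e.g. {"preheader", "latch"}
  int previousPos;                 // position of the latch-edge value in the
                                   // loop body, or -1 if not an instruction
  std::vector<int> userPositions;  // positions of the phi's in-loop users
};

bool isFirstOrderRecurrenceToy(const ToyPhi &phi, bool hasPreheader,
                               bool hasSingleLatch) {
  // 1. The phi is in the loop header and has exactly two incoming values.
  if (!phi.inHeader || phi.incomingBlocks.size() != 2)
    return false;
  // 2. The loop has a preheader and a single latch.
  if (!hasPreheader || !hasSingleLatch)
    return false;
  // 3. The incoming blocks are the preheader and the latch.
  auto hasIncoming = [&phi](const std::string &name) {
    for (const std::string &bb : phi.incomingBlocks)
      if (bb == name)
        return true;
    return false;
  };
  if (!hasIncoming("preheader") || !hasIncoming("latch"))
    return false;
  // 4. The latch-edge value ("Previous") is an instruction.
  if (phi.previousPos < 0)
    return false;
  // 5. Previous dominates every user of the phi (here: appears earlier).
  for (int user : phi.userPositions)
    if (phi.previousPos >= user)
      return false;
  return true;
}
```

In the running example, s2 = a[i] sits before b[i] = s2 - s1 in the loop body, so the check passes; if the subtraction came first, condition 5 would reject the phi.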

llvm/trunk/lib/Transforms/Vectorize/LoopVectorize.cpp

@@ 358 lines hidden @@ protected:
   /// Create an empty loop, based on the loop ranges of the old loop.
   void createEmptyLoop();
   /// Create a new induction variable inside L.
   PHINode *createInductionVariable(Loop *L, Value *Start, Value *End,
                                    Value *Step, Instruction *DL);
   /// Copy and widen the instructions from the old loop.
   virtual void vectorizeLoop();
 
+  /// Fix a first-order recurrence. This is the second phase of vectorizing
+  /// this phi node.
+  void fixFirstOrderRecurrence(PHINode *Phi);
+
   /// \brief The Loop exit block may have single value PHI nodes where the
   /// incoming value is 'Undef'. While vectorizing we only handled real values
   /// that were defined inside the loop. Here we fix the 'undef case'.
   /// See PR14725.
   void fixLCSSAPHIs();
 
   /// Shrinks vector element sizes based on information in "MinBWs".
   void truncateToMinimalBitwidths();
@@ 821 lines hidden @@ public:
   /// ReductionList contains the reduction descriptors for all
   /// of the reductions that were found in the loop.
   typedef DenseMap<PHINode *, RecurrenceDescriptor> ReductionList;
 
   /// InductionList saves induction variables and maps them to the
   /// induction descriptor.
   typedef MapVector<PHINode*, InductionDescriptor> InductionList;
 
+  /// RecurrenceSet contains the phi nodes that are recurrences other than
+  /// inductions and reductions.
+  typedef SmallPtrSet<const PHINode *, 8> RecurrenceSet;
+
   /// Returns true if it is legal to vectorize this loop.
   /// This does not mean that it is profitable to vectorize this
   /// loop, only that it is legal to do so.
   bool canVectorize();
 
   /// Returns the Induction variable.
   PHINode *getInduction() { return Induction; }
 
   /// Returns the reduction variables found in the loop.
   ReductionList *getReductionVars() { return &Reductions; }
 
   /// Returns the induction variables found in the loop.
   InductionList *getInductionVars() { return &Inductions; }
 
+  /// Return the first-order recurrences found in the loop.
+  RecurrenceSet *getFirstOrderRecurrences() { return &FirstOrderRecurrences; }
+
   /// Returns the widest induction type.
   Type *getWidestInductionType() { return WidestIndTy; }
 
   /// Returns True if V is an induction variable in this loop.
   bool isInductionVariable(const Value *V);
 
   /// Returns True if PN is a reduction variable in this loop.
   bool isReductionVariable(PHINode *PN) { return Reductions.count(PN); }
 
+  /// Returns True if Phi is a first-order recurrence in this loop.
+  bool isFirstOrderRecurrence(const PHINode *Phi);
+
   /// Return true if the block BB needs to be predicated in order for the loop
   /// to be vectorized.
   bool blockNeedsPredication(BasicBlock *BB);
 
   /// Check if this pointer is consecutive when vectorizing. This happens
   /// when the last index of the GEP is the induction variable, or that the
   /// pointer itself is an induction variable.
   /// This check allows us to vectorize A[idx] into a wide load/store.
@@ 144 lines hidden @@ private:
   /// loop.
   PHINode *Induction;
   /// Holds the reduction variables.
   ReductionList Reductions;
   /// Holds all of the induction variables that we found in the loop.
   /// Notice that inductions don't need to start at zero and that induction
   /// variables can be pointers.
   InductionList Inductions;
+  /// Holds the phi nodes that are first-order recurrences.
+  RecurrenceSet FirstOrderRecurrences;
   /// Holds the widest induction type encountered.
   Type *WidestIndTy;
 
   /// Allowed outside users. This holds the reduction
   /// vars which can be accessed from outside the loop.
   SmallPtrSet<Value*, 4> AllowedExit;
   /// This set holds the variables which are known to be uniform after
   /// vectorization.
@@ 1,986 lines hidden @@ void InnerLoopVectorizer::vectorizeLoop() {
 
   // At this point every instruction in the original loop is widened to a
   // vector form. Now we need to fix the recurrences in PHIsToFix. These PHI
   // nodes are currently empty because we did not want to introduce cycles.
   // This is the second stage of vectorizing recurrences.
   for (PHINode *Phi : PHIsToFix) {
     assert(Phi && "Unable to recover vectorized PHI");
 
-    // We currently only handle reductions. Ensure the PHI node to be fixed is
-    // a reduction, and get its reduction variable descriptor.
+    // Handle first-order recurrences that need to be fixed.
+    if (Legal->isFirstOrderRecurrence(Phi)) {
+      fixFirstOrderRecurrence(Phi);
+      continue;
+    }
+
+    // If the phi node is not a first-order recurrence, it must be a reduction.
+    // Get its reduction variable descriptor.
     assert(Legal->isReductionVariable(Phi) &&
            "Unable to find the reduction variable");
     RecurrenceDescriptor RdxDesc = (*Legal->getReductionVars())[Phi];
 
     RecurrenceDescriptor::RecurrenceKind RK = RdxDesc.getRecurrenceKind();
     TrackingVH<Value> ReductionStartValue = RdxDesc.getRecurrenceStartValue();
     Instruction *LoopExitInst = RdxDesc.getLoopExitInstr();
     RecurrenceDescriptor::MinMaxRecurrenceKind MinMaxKind =
@@ 209 lines hidden @@ for (auto KV : PredicatedStores) {
     I->getParent()->setName("pred.store.if");
     BB->setName("pred.store.continue");
   }
   DEBUG(DT->verifyDomTree());
   // Remove redundant induction instructions.
   cse(LoopVectorBody);
 }
 
+void InnerLoopVectorizer::fixFirstOrderRecurrence(PHINode *Phi) {
+
+  // This is the second phase of vectorizing first-order recurrences. An
+  // overview of the transformation is described below. Suppose we have the
+  // following loop.
+  //
+  //   for (int i = 0; i < n; ++i)
+  //     b[i] = a[i] - a[i - 1];
+  //
+  // There is a first-order recurrence on "a". For this loop, the shorthand
+  // scalar IR looks like:
+  //
+  //   scalar.ph:
+  //     s_init = a[-1]
+  //     br scalar.body
+  //
+  //   scalar.body:
+  //     i = phi [0, scalar.ph], [i+1, scalar.body]
+  //     s1 = phi [s_init, scalar.ph], [s2, scalar.body]
+  //     s2 = a[i]
+  //     b[i] = s2 - s1
+  //     br cond, scalar.body, ...
+  //
+  // In this example, s1 is a recurrence because its value depends on the
+  // previous iteration. In the first phase of vectorization, we created a
+  // temporary value for s1. We now complete the vectorization and produce the
+  // shorthand vector IR shown below (for VF = 4, UF = 1).
+  //
+  //   vector.ph:
+  //     v_init = vector(..., ..., ..., a[-1])
+  //     br vector.body
+  //
+  //   vector.body:
+  //     i = phi [0, vector.ph], [i+4, vector.body]
+  //     v1 = phi [v_init, vector.ph], [v2, vector.body]
+  //     v2 = a[i, i+1, i+2, i+3]
+  //     v3 = vector(v1(3), v2(0, 1, 2))
+  //     b[i, i+1, i+2, i+3] = v2 - v3
+  //     br cond, vector.body, middle.block
+  //
+  //   middle.block:
+  //     x = v2(3)
+  //     br scalar.ph
+  //
+  //   scalar.ph:
+  //     s_init = phi [x, middle.block], [a[-1], otherwise]
+  //     br scalar.body
+  //
+  // After the vector loop finishes executing, we extract the next value of
+  // the recurrence (x) to use as the initial value in the scalar loop.
+
+  // Get the original loop preheader and single loop latch.
+  auto *Preheader = OrigLoop->getLoopPreheader();
+  auto *Latch = OrigLoop->getLoopLatch();
+
+  // Get the initial and previous values of the scalar recurrence.
+  auto *ScalarInit = Phi->getIncomingValueForBlock(Preheader);
+  auto *Previous = Phi->getIncomingValueForBlock(Latch);
+
+  // Create a vector from the initial value.
+  auto *VectorInit = ScalarInit;
+  if (VF > 1) {
+    Builder.SetInsertPoint(LoopVectorPreHeader->getTerminator());
+    VectorInit = Builder.CreateInsertElement(
+        UndefValue::get(VectorType::get(VectorInit->getType(), VF)), VectorInit,
+        Builder.getInt32(VF - 1), "vector.recur.init");
+  }
+
+  // We constructed a temporary phi node in the first phase of vectorization.
+  // This phi node will eventually be deleted.
+  auto &PhiParts = getVectorValue(Phi);
+  Builder.SetInsertPoint(cast<Instruction>(PhiParts[0]));
+
+  // Create a phi node for the new recurrence. The current value will either be
+  // the initial value inserted into a vector or loop-varying vector value.
+  auto *VecPhi = Builder.CreatePHI(VectorInit->getType(), 2, "vector.recur");
+  VecPhi->addIncoming(VectorInit, LoopVectorPreHeader);
+
+  // Get the vectorized previous value. We ensured the previous value was an
+  // instruction when detecting the recurrence.
+  auto &PreviousParts = getVectorValue(Previous);
+
+  // Set the insertion point to be after this instruction. We ensured the
+  // previous value dominated all uses of the phi when detecting the
+  // recurrence.
+  Builder.SetInsertPoint(
+      &*++BasicBlock::iterator(cast<Instruction>(PreviousParts[UF - 1])));
+
+  // We will construct a vector for the recurrence by combining the values for
+  // the current and previous iterations. This is the required shuffle mask.
+  SmallVector<Constant *, 8> ShuffleMask(VF);
+  ShuffleMask[0] = Builder.getInt32(VF - 1);
+  for (unsigned I = 1; I < VF; ++I)
+    ShuffleMask[I] = Builder.getInt32(I + VF - 1);
+
+  // The vector from which to take the initial value for the current iteration
+  // (actual or unrolled). Initially, this is the vector phi node.
+  Value *Incoming = VecPhi;
+
+  // Shuffle the current and previous vector and update the vector parts.
+  for (unsigned Part = 0; Part < UF; ++Part) {
+    auto *Shuffle =
+        VF > 1
+            ? Builder.CreateShuffleVector(Incoming, PreviousParts[Part],
+                                          ConstantVector::get(ShuffleMask))
+            : Incoming;
+    PhiParts[Part]->replaceAllUsesWith(Shuffle);
+    cast<Instruction>(PhiParts[Part])->eraseFromParent();
+    PhiParts[Part] = Shuffle;
+    Incoming = PreviousParts[Part];
+  }
+
+  // Fix the latch value of the new recurrence in the vector loop.
+  VecPhi->addIncoming(Incoming,
+                      LI->getLoopFor(LoopVectorBody[0])->getLoopLatch());
+
+  // Extract the last vector element in the middle block. This will be the
+  // initial value for the recurrence when jumping to the scalar loop.
+  auto *Extract = Incoming;
+  if (VF > 1) {
+    Builder.SetInsertPoint(LoopMiddleBlock->getTerminator());
+    Extract = Builder.CreateExtractElement(Extract, Builder.getInt32(VF - 1),
+                                           "vector.recur.extract");
+  }
+
+  // Fix the initial value of the original recurrence in the scalar loop.
+  Builder.SetInsertPoint(&*LoopScalarPreHeader->begin());
+  auto *Start = Builder.CreatePHI(Phi->getType(), 2, "scalar.recur.init");
+  for (auto *BB : predecessors(LoopScalarPreHeader)) {
+    auto *Incoming = BB == LoopMiddleBlock ? Extract : ScalarInit;
+    Start->addIncoming(Incoming, BB);
+  }
+
+  Phi->setIncomingValue(Phi->getBasicBlockIndex(LoopScalarPreHeader), Start);
+  Phi->setName("scalar.recur");
+
+  // Finally, fix users of the recurrence outside the loop. The users will need
+  // either the last value of the scalar recurrence or the last value of the
+  // vector recurrence we extracted in the middle block. Since the loop is in
+  // LCSSA form, we just need to find the phi node for the original scalar
+  // recurrence in the exit block, and then add an edge for the middle block.
+  for (auto &I : *LoopExitBlock) {
+    auto *LCSSAPhi = dyn_cast<PHINode>(&I);
+    if (!LCSSAPhi)
+      break;
+    if (LCSSAPhi->getIncomingValue(0) == Phi) {
+      LCSSAPhi->addIncoming(Extract, LoopMiddleBlock);
+      break;
+    }
+  }
+}
+
 void InnerLoopVectorizer::fixLCSSAPHIs() {
   for (BasicBlock::iterator LEI = LoopExitBlock->begin(),
        LEE = LoopExitBlock->end(); LEI != LEE; ++LEI) {
     PHINode *LCSSAPhi = dyn_cast<PHINode>(LEI);
     if (!LCSSAPhi) break;
     if (LCSSAPhi->getNumIncomingValues() == 1)
       LCSSAPhi->addIncoming(UndefValue::get(LCSSAPhi->getType()),
                             LoopMiddleBlock);
[58 unchanged lines not shown]	InnerLoopVectorizer::createBlockInMask(BasicBlock *BB) {

return BlockMask;		return BlockMask;
}		}

void InnerLoopVectorizer::widenPHIInstruction(		void InnerLoopVectorizer::widenPHIInstruction(
Instruction *PN, InnerLoopVectorizer::VectorParts &Entry, unsigned UF,		Instruction *PN, InnerLoopVectorizer::VectorParts &Entry, unsigned UF,
unsigned VF, PhiVector *PV) {		unsigned VF, PhiVector *PV) {
PHINode* P = cast<PHINode>(PN);		PHINode* P = cast<PHINode>(PN);
// Handle reduction variables:		// Handle recurrences.
if (Legal->isReductionVariable(P)) {		if (Legal->isReductionVariable(P) || Legal->isFirstOrderRecurrence(P)) {
for (unsigned part = 0; part < UF; ++part) {		for (unsigned part = 0; part < UF; ++part) {
// This is phase one of vectorizing PHIs.		// This is phase one of vectorizing PHIs.
Type *VecTy = (VF == 1) ? PN->getType() :		Type *VecTy = (VF == 1) ? PN->getType() :
VectorType::get(PN->getType(), VF);		VectorType::get(PN->getType(), VF);
Entry[part] = PHINode::Create(		Entry[part] = PHINode::Create(
VecTy, 2, "vec.phi", &*LoopVectorBody.back()->getFirstInsertionPt());		VecTy, 2, "vec.phi", &*LoopVectorBody.back()->getFirstInsertionPt());
}		}
PV->push_back(P);		PV->push_back(P);
[679 unchanged lines not shown]	for (BasicBlock::iterator it = (bb)->begin(), e = (bb)->end(); it != e;
if (RecurrenceDescriptor::isReductionPHI(Phi, TheLoop, RedDes)) {		if (RecurrenceDescriptor::isReductionPHI(Phi, TheLoop, RedDes)) {
if (RedDes.hasUnsafeAlgebra())		if (RedDes.hasUnsafeAlgebra())
Requirements->addUnsafeAlgebraInst(RedDes.getUnsafeAlgebraInst());		Requirements->addUnsafeAlgebraInst(RedDes.getUnsafeAlgebraInst());
AllowedExit.insert(RedDes.getLoopExitInstr());		AllowedExit.insert(RedDes.getLoopExitInstr());
Reductions[Phi] = RedDes;		Reductions[Phi] = RedDes;
continue;		continue;
}		}

		if (RecurrenceDescriptor::isFirstOrderRecurrence(Phi, TheLoop, DT)) {
		FirstOrderRecurrences.insert(Phi);
		continue;
		}

emitAnalysis(VectorizationReport(&*it) <<		emitAnalysis(VectorizationReport(&*it) <<
"value that could not be identified as "		"value that could not be identified as "
"reduction is used outside the loop");		"reduction is used outside the loop");
DEBUG(dbgs() << "LV: Found an unidentified PHI."<< *Phi <<"\n");		DEBUG(dbgs() << "LV: Found an unidentified PHI."<< *Phi <<"\n");
return false;		return false;
}// end of PHI handling		}// end of PHI handling

// We handle calls that:		// We handle calls that:
[161 unchanged lines not shown]	bool LoopVectorizationLegality::isInductionVariable(const Value *V) {
Value *In0 = const_cast<Value*>(V);		Value *In0 = const_cast<Value*>(V);
PHINode *PN = dyn_cast_or_null<PHINode>(In0);		PHINode *PN = dyn_cast_or_null<PHINode>(In0);
if (!PN)		if (!PN)
return false;		return false;

return Inductions.count(PN);		return Inductions.count(PN);
}		}

		bool LoopVectorizationLegality::isFirstOrderRecurrence(const PHINode *Phi) {
		return FirstOrderRecurrences.count(Phi);
		}

bool LoopVectorizationLegality::blockNeedsPredication(BasicBlock *BB) {		bool LoopVectorizationLegality::blockNeedsPredication(BasicBlock *BB) {
return LoopAccessInfo::blockNeedsPredication(BB, TheLoop, DT);		return LoopAccessInfo::blockNeedsPredication(BB, TheLoop, DT);
}		}

bool LoopVectorizationLegality::blockCanBePredicated(BasicBlock *BB,		bool LoopVectorizationLegality::blockCanBePredicated(BasicBlock *BB,
SmallPtrSetImpl<Value *> &SafePtrs) {		SmallPtrSetImpl<Value *> &SafePtrs) {

for (BasicBlock::iterator it = BB->begin(), e = BB->end(); it != e; ++it) {		for (BasicBlock::iterator it = BB->begin(), e = BB->end(); it != e; ++it) {
[873 unchanged lines not shown]	case Instruction::GetElementPtr:
// We mark this instruction as zero-cost because the cost of GEPs in		// We mark this instruction as zero-cost because the cost of GEPs in
// vectorized code depends on whether the corresponding memory instruction		// vectorized code depends on whether the corresponding memory instruction
// is scalarized or not. Therefore, we handle GEPs with the memory		// is scalarized or not. Therefore, we handle GEPs with the memory
// instruction cost.		// instruction cost.
return 0;		return 0;
case Instruction::Br: {		case Instruction::Br: {
return TTI.getCFInstrCost(I->getOpcode());		return TTI.getCFInstrCost(I->getOpcode());
}		}
case Instruction::PHI:		case Instruction::PHI: {
		auto *Phi = cast<PHINode>(I);

		// First-order recurrences are replaced by vector shuffles inside the loop.
		if (VF > 1 && Legal->isFirstOrderRecurrence(Phi))
		return TTI.getShuffleCost(TargetTransformInfo::SK_ExtractSubvector,
		VectorTy, VF - 1, VectorTy);

//TODO: IF-converted IFs become selects.		// TODO: IF-converted IFs become selects.
return 0;		return 0;
		}
case Instruction::Add:		case Instruction::Add:
case Instruction::FAdd:		case Instruction::FAdd:
case Instruction::Sub:		case Instruction::Sub:
case Instruction::FSub:		case Instruction::FSub:
case Instruction::Mul:		case Instruction::Mul:
case Instruction::FMul:		case Instruction::FMul:
case Instruction::UDiv:		case Instruction::UDiv:
case Instruction::SDiv:		case Instruction::SDiv:
[477 unchanged lines not shown]
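The `Instruction::PHI` case in the cost model prices a first-order recurrence as one `SK_ExtractSubvector` shuffle at index `VF - 1`. The standalone C++ check below (hypothetical helper names; `std::vector<int>` standing in for IR vectors) illustrates why that shuffle kind fits. Taking `VF` consecutive lanes starting at `VF - 1` from the concatenation of the previous and current vectors yields the same lanes as the mask `<VF-1, VF, ..., 2*VF-2>`. It shows the shape of the operation only and says nothing about what a given target's `getShuffleCost` returns.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Vec = std::vector<int>;  // stands in for a <VF x T> value

// Lanes Index .. Index+Width-1 of the concatenation <V1, V2>.
Vec extractSubvectorOfConcat(const Vec &V1, const Vec &V2, std::size_t Index,
                             std::size_t Width) {
  Vec Concat(V1);
  Concat.insert(Concat.end(), V2.begin(), V2.end());
  assert(Index + Width <= Concat.size());
  return Vec(Concat.begin() + Index, Concat.begin() + Index + Width);
}

// The same lanes computed with the mask the vectorizer emits.
Vec recurrenceShuffle(const Vec &Prev, const Vec &Cur) {
  const std::size_t VF = Prev.size();
  assert(Cur.size() == VF);
  Vec Result(VF);
  for (std::size_t L = 0; L < VF; ++L) {
    std::size_t M = VF - 1 + L;  // mask element for lane L
    Result[L] = M < VF ? Prev[M] : Cur[M - VF];
  }
  return Result;
}
```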

llvm/trunk/test/Transforms/LoopVectorize/AArch64/first-order-recurrence.ll

				; RUN: opt < %s -loop-vectorize -force-vector-width=4 -force-vector-interleave=1 -dce -instcombine -S | FileCheck %s
				; RUN: opt < %s -loop-vectorize -force-vector-width=4 -force-vector-interleave=2 -dce -instcombine -S | FileCheck %s --check-prefix=UNROLL

				target datalayout = "e-m:e-i64:64-i128:128-n32:64-S128"

				; CHECK-LABEL: @recurrence_1
				;
				; void recurrence_1(int *a, int *b, int n) {
				; for(int i = 0; i < n; i++)
				; b[i] = a[i] + a[i - 1];
				; }
				;
				; CHECK: vector.ph:
				; CHECK: %vector.recur.init = insertelement <4 x i32> undef, i32 %pre_load, i32 3
				;
				; CHECK: vector.body:
				; CHECK: %vector.recur = phi <4 x i32> [ %vector.recur.init, %vector.ph ], [ [[L1:%[a-zA-Z0-9.]+]], %vector.body ]
				; CHECK: [[L1]] = load <4 x i32>
				; CHECK: {{.*}} = shufflevector <4 x i32> %vector.recur, <4 x i32> [[L1]], <4 x i32> <i32 3, i32 4, i32 5, i32 6>
				;
				; CHECK: middle.block:
				; CHECK: %vector.recur.extract = extractelement <4 x i32> [[L1]], i32 3
				;
				; CHECK: scalar.ph:
				; CHECK: %scalar.recur.init = phi i32 [ %vector.recur.extract, %middle.block ], [ %pre_load, %vector.memcheck ], [ %pre_load, %min.iters.checked ], [ %pre_load, %for.preheader ]
				;
				; CHECK: scalar.body:
				; CHECK: %scalar.recur = phi i32 [ %scalar.recur.init, %scalar.ph ], [ {{.*}}, %scalar.body ]
				;
				; UNROLL: vector.body:
				; UNROLL: %vector.recur = phi <4 x i32> [ %vector.recur.init, %vector.ph ], [ [[L2:%[a-zA-Z0-9.]+]], %vector.body ]
				; UNROLL: [[L1:%[a-zA-Z0-9.]+]] = load <4 x i32>
				; UNROLL: [[L2]] = load <4 x i32>
				; UNROLL: {{.*}} = shufflevector <4 x i32> %vector.recur, <4 x i32> [[L1]], <4 x i32> <i32 3, i32 4, i32 5, i32 6>
				; UNROLL: {{.*}} = shufflevector <4 x i32> [[L1]], <4 x i32> [[L2]], <4 x i32> <i32 3, i32 4, i32 5, i32 6>
				;
				; UNROLL: middle.block:
				; UNROLL: %vector.recur.extract = extractelement <4 x i32> [[L2]], i32 3
				;
				define void @recurrence_1(i32* nocapture readonly %a, i32* nocapture %b, i32 %n) {
				entry:
				br label %for.preheader

				for.preheader:
				%arrayidx.phi.trans.insert = getelementptr inbounds i32, i32* %a, i64 0
				%pre_load = load i32, i32* %arrayidx.phi.trans.insert
				br label %scalar.body

				scalar.body:
				%0 = phi i32 [ %pre_load, %for.preheader ], [ %1, %scalar.body ]
				%indvars.iv = phi i64 [ 0, %for.preheader ], [ %indvars.iv.next, %scalar.body ]
				%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
				%arrayidx32 = getelementptr inbounds i32, i32* %a, i64 %indvars.iv.next
				%1 = load i32, i32* %arrayidx32
				%arrayidx34 = getelementptr inbounds i32, i32* %b, i64 %indvars.iv
				%add35 = add i32 %1, %0
				store i32 %add35, i32* %arrayidx34
				%lftr.wideiv = trunc i64 %indvars.iv.next to i32
				%exitcond = icmp eq i32 %lftr.wideiv, %n
				br i1 %exitcond, label %for.exit, label %scalar.body

				for.exit:
				ret void
				}
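The CHECK and UNROLL lines for `@recurrence_1` pin down both the shuffle mask and how interleaved parts are chained. This standalone C++ sketch reproduces that expectation outside LLVM: `std::vector<int>` stands in for `<4 x i32>`, `shuffle` follows `shufflevector`'s rule (mask index `k < VF` reads lane `k` of the first operand, otherwise lane `k - VF` of the second), and `shufflePrevious` mirrors the per-part loop in the LoopVectorize.cpp diff. All helper names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Vec = std::vector<int>;  // stands in for a <VF x i32> value

// Mask <VF-1, VF, ..., 2*VF-2>: the last lane of the previous vector followed
// by the first VF-1 lanes of the current one.
std::vector<std::size_t> recurrenceMask(std::size_t VF) {
  std::vector<std::size_t> Mask;
  for (std::size_t I = 0; I < VF; ++I)
    Mask.push_back(VF - 1 + I);
  return Mask;
}

// Models shufflevector: indices below VF read V1, the rest read V2.
Vec shuffle(const Vec &V1, const Vec &V2, const std::vector<std::size_t> &Mask) {
  assert(V1.size() == V2.size());
  const std::size_t VF = V1.size();
  Vec Result;
  for (std::size_t M : Mask)
    Result.push_back(M < VF ? V1[M] : V2[M - VF]);
  return Result;
}

// Part 0 pairs the vector phi with the first part's load; each later part
// pairs with the part before it. Latch receives the back-edge value of the
// vector phi, which is also the vector the middle block extracts from.
std::vector<Vec> shufflePrevious(const Vec &VecPhi,
                                 const std::vector<Vec> &PreviousParts,
                                 Vec &Latch) {
  const std::vector<std::size_t> Mask = recurrenceMask(VecPhi.size());
  std::vector<Vec> Shuffles;
  Vec Incoming = VecPhi;
  for (const Vec &Part : PreviousParts) {
    Shuffles.push_back(shuffle(Incoming, Part, Mask));
    Incoming = Part;
  }
  Latch = Incoming;
  return Shuffles;
}
```

With VF = 1 the mask degenerates to `<0>`, which just forwards the first operand; that agrees with the diff, which skips the shuffle and reuses `Incoming` when `VF == 1`.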

				; CHECK-LABEL: @recurrence_2
				;
				; int recurrence_2(int *a, int n) {
				; int minmax;
				; for (int i = 0; i < n; ++i)
				; minmax = min(minmax, max(a[i] - a[i-1], 0));
				; return minmax;
				; }
				;
				; CHECK: vector.ph:
				; CHECK: %vector.recur.init = insertelement <4 x i32> undef, i32 %.pre, i32 3
				;
				; CHECK: vector.body:
				; CHECK: %vector.recur = phi <4 x i32> [ %vector.recur.init, %vector.ph ], [ [[L1:%[a-zA-Z0-9.]+]], %vector.body ]
				; CHECK: [[L1]] = load <4 x i32>
				; CHECK: {{.*}} = shufflevector <4 x i32> %vector.recur, <4 x i32> [[L1]], <4 x i32> <i32 3, i32 4, i32 5, i32 6>
				;
				; CHECK: middle.block:
				; CHECK: %vector.recur.extract = extractelement <4 x i32> [[L1]], i32 3
				;
				; CHECK: scalar.ph:
				; CHECK: %scalar.recur.init = phi i32 [ %vector.recur.extract, %middle.block ], [ %.pre, %min.iters.checked ], [ %.pre, %for.preheader ]
				;
				; CHECK: scalar.body:
				; CHECK: %scalar.recur = phi i32 [ %scalar.recur.init, %scalar.ph ], [ {{.*}}, %scalar.body ]
				;
				; UNROLL: vector.body:
				; UNROLL: %vector.recur = phi <4 x i32> [ %vector.recur.init, %vector.ph ], [ [[L2:%[a-zA-Z0-9.]+]], %vector.body ]
				; UNROLL: [[L1:%[a-zA-Z0-9.]+]] = load <4 x i32>
				; UNROLL: [[L2]] = load <4 x i32>
				; UNROLL: {{.*}} = shufflevector <4 x i32> %vector.recur, <4 x i32> [[L1]], <4 x i32> <i32 3, i32 4, i32 5, i32 6>
				; UNROLL: {{.*}} = shufflevector <4 x i32> [[L1]], <4 x i32> [[L2]], <4 x i32> <i32 3, i32 4, i32 5, i32 6>
				;
				; UNROLL: middle.block:
				; UNROLL: %vector.recur.extract = extractelement <4 x i32> [[L2]], i32 3
				;
				define i32 @recurrence_2(i32* nocapture readonly %a, i32 %n) {
				entry:
				%cmp27 = icmp sgt i32 %n, 0
				br i1 %cmp27, label %for.preheader, label %for.cond.cleanup

				for.preheader:
				%arrayidx2.phi.trans.insert = getelementptr inbounds i32, i32* %a, i64 -1
				%.pre = load i32, i32* %arrayidx2.phi.trans.insert, align 4
				br label %scalar.body

				for.cond.cleanup.loopexit:
				%minmax.0.cond.lcssa = phi i32 [ %minmax.0.cond, %scalar.body ]
				br label %for.cond.cleanup

				for.cond.cleanup:
				%minmax.0.lcssa = phi i32 [ undef, %entry ], [ %minmax.0.cond.lcssa, %for.cond.cleanup.loopexit ]
				ret i32 %minmax.0.lcssa

				scalar.body:
				%0 = phi i32 [ %.pre, %for.preheader ], [ %1, %scalar.body ]
				%indvars.iv = phi i64 [ 0, %for.preheader ], [ %indvars.iv.next, %scalar.body ]
				%minmax.028 = phi i32 [ undef, %for.preheader ], [ %minmax.0.cond, %scalar.body ]
				%arrayidx = getelementptr inbounds i32, i32* %a, i64 %indvars.iv
				%1 = load i32, i32* %arrayidx, align 4
				%sub3 = sub nsw i32 %1, %0
				%cmp4 = icmp sgt i32 %sub3, 0
				%cond = select i1 %cmp4, i32 %sub3, i32 0
				%cmp5 = icmp slt i32 %minmax.028, %cond
				%minmax.0.cond = select i1 %cmp5, i32 %minmax.028, i32 %cond
				%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
				%lftr.wideiv = trunc i64 %indvars.iv.next to i32
				%exitcond = icmp eq i32 %lftr.wideiv, %n
				br i1 %exitcond, label %for.cond.cleanup.loopexit, label %scalar.body
				}

				; CHECK-LABEL: @recurrence_3
				;
				; void recurrence_3(short *a, double *b, int n, float f, short p) {
				; b[0] = (double)a[0] - f * (double)p;
				; for (int i = 1; i < n; i++)
				; b[i] = (double)a[i] - f * (double)a[i - 1];
				; }
				;
				;
				; CHECK: vector.ph:
				; CHECK: %vector.recur.init = insertelement <4 x i16> undef, i16 %0, i32 3
				;
				; CHECK: vector.body:
				; CHECK: %vector.recur = phi <4 x i16> [ %vector.recur.init, %vector.ph ], [ [[L1:%[a-zA-Z0-9.]+]], %vector.body ]
				; CHECK: [[L1]] = load <4 x i16>
				; CHECK: {{.*}} = shufflevector <4 x i16> %vector.recur, <4 x i16> [[L1]], <4 x i32> <i32 3, i32 4, i32 5, i32 6>
				;
				; CHECK: middle.block:
				; CHECK: %vector.recur.extract = extractelement <4 x i16> [[L1]], i32 3
				;
				; CHECK: scalar.ph:
				; CHECK: %scalar.recur.init = phi i16 [ %vector.recur.extract, %middle.block ], [ %0, %vector.memcheck ], [ %0, %min.iters.checked ], [ %0, %for.preheader ]
				;
				; CHECK: scalar.body:
				; CHECK: %scalar.recur = phi i16 [ %scalar.recur.init, %scalar.ph ], [ {{.*}}, %scalar.body ]
				;
				; UNROLL: vector.body:
				; UNROLL: %vector.recur = phi <4 x i16> [ %vector.recur.init, %vector.ph ], [ [[L2:%[a-zA-Z0-9.]+]], %vector.body ]
				; UNROLL: [[L1:%[a-zA-Z0-9.]+]] = load <4 x i16>
				; UNROLL: [[L2]] = load <4 x i16>
				; UNROLL: {{.*}} = shufflevector <4 x i16> %vector.recur, <4 x i16> [[L1]], <4 x i32> <i32 3, i32 4, i32 5, i32 6>
				; UNROLL: {{.*}} = shufflevector <4 x i16> [[L1]], <4 x i16> [[L2]], <4 x i32> <i32 3, i32 4, i32 5, i32 6>
				;
				; UNROLL: middle.block:
				; UNROLL: %vector.recur.extract = extractelement <4 x i16> [[L2]], i32 3
				;
				define void @recurrence_3(i16* nocapture readonly %a, double* nocapture %b, i32 %n, float %f, i16 %p) {
				entry:
				%0 = load i16, i16* %a, align 2
				%conv = sitofp i16 %0 to double
				%conv1 = fpext float %f to double
				%conv2 = sitofp i16 %p to double
				%mul = fmul fast double %conv2, %conv1
				%sub = fsub fast double %conv, %mul
				store double %sub, double* %b, align 8
				%cmp25 = icmp sgt i32 %n, 1
				br i1 %cmp25, label %for.preheader, label %for.end

				for.preheader:
				br label %scalar.body

				scalar.body:
				%1 = phi i16 [ %0, %for.preheader ], [ %2, %scalar.body ]
				%advars.iv = phi i64 [ %advars.iv.next, %scalar.body ], [ 1, %for.preheader ]
				%arrayidx5 = getelementptr inbounds i16, i16* %a, i64 %advars.iv
				%2 = load i16, i16* %arrayidx5, align 2
				%conv6 = sitofp i16 %2 to double
				%conv11 = sitofp i16 %1 to double
				%mul12 = fmul fast double %conv11, %conv1
				%sub13 = fsub fast double %conv6, %mul12
				%arrayidx15 = getelementptr inbounds double, double* %b, i64 %advars.iv
				store double %sub13, double* %arrayidx15, align 8
				%advars.iv.next = add nuw nsw i64 %advars.iv, 1
				%lftr.wideiv = trunc i64 %advars.iv.next to i32
				%exitcond = icmp eq i32 %lftr.wideiv, %n
				br i1 %exitcond, label %for.end.loopexit, label %scalar.body

				for.end.loopexit:
				br label %for.end

				for.end:
				ret void
				}
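The `@recurrence_2` test mixes a first-order recurrence (`%0`) with a min reduction (`%minmax.028`) in one loop. The following standalone C++ sketch checks that the two compose: each lane computes `max(a[i] - a[i-1], 0)` from the shuffled vector, folds it into a per-lane minimum, and the lanes are combined in the middle block before the scalar remainder continues. It assumes VF = 4, UF = 1, and a starting `minmax` of `INT_MAX` in place of the C source's uninitialized value (the IR uses `undef`); `scalarMinMax` and `vectorMinMax` are hypothetical names.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <climits>
#include <cstddef>
#include <vector>

constexpr std::size_t VF = 4;  // assumed vectorization factor (UF = 1)

// Scalar reference; Pre is the value hoisted into the preheader (a[-1]).
int scalarMinMax(const std::vector<int> &A, int Pre) {
  int MinMax = INT_MAX;  // assumption: stands in for the undef start value
  int Prev = Pre;
  for (int X : A) {
    MinMax = std::min(MinMax, std::max(X - Prev, 0));
    Prev = X;
  }
  return MinMax;
}

int vectorMinMax(const std::vector<int> &A, int Pre) {
  const std::size_t VecEnd = A.size() - A.size() % VF;
  std::array<int, VF> Recur{};  // vector.recur.init: lane VF-1 holds Pre
  Recur[VF - 1] = Pre;
  std::array<int, VF> Acc;      // per-lane min accumulator
  Acc.fill(INT_MAX);
  std::size_t I = 0;
  for (; I < VecEnd; I += VF) {
    for (std::size_t L = 0; L < VF; ++L) {
      // Lane L of the shuffled vector: a[i-1] for element i = I + L.
      int Prev = L == 0 ? Recur[VF - 1] : A[I + L - 1];
      Acc[L] = std::min(Acc[L], std::max(A[I + L] - Prev, 0));
    }
    for (std::size_t L = 0; L < VF; ++L)
      Recur[L] = A[I + L];  // this iteration's load becomes the phi value
  }
  assert(A.size() - I < VF);  // fewer than VF iterations remain
  // Middle block: reduce the lanes and extract lane VF-1 of the recurrence.
  int MinMax = *std::min_element(Acc.begin(), Acc.end());
  int Prev = Recur[VF - 1];
  for (; I < A.size(); ++I) {
    MinMax = std::min(MinMax, std::max(A[I] - Prev, 0));
    Prev = A[I];
  }
  return MinMax;
}
```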

Revision Contents

Diff 48508

llvm/trunk/include/llvm/Transforms/Utils/LoopUtils.h

llvm/trunk/lib/Transforms/Utils/LoopUtils.cpp

llvm/trunk/lib/Transforms/Vectorize/LoopVectorize.cpp

llvm/trunk/test/Transforms/LoopVectorize/AArch64/first-order-recurrence.ll
