This is an archive of the discontinued LLVM Phabricator instance.

[LV] Sink scalar operands of predicated instructions
Closed, Public

Authored by mssimpso on Oct 14 2016, 12:17 PM.

Details

Reviewers

anemet
mkuper
gilr

Commits

rGc62266d680d8: [LV] Sink scalar operands of predicated instructions
rL285097: [LV] Sink scalar operands of predicated instructions

Summary

When we predicate an instruction (div, rem, store) we place the instruction in its own basic block within the vectorized loop. If a predicated instruction has scalar operands, it's possible to recursively sink these scalar expressions into the predicated block so that they might avoid execution. This patch sinks as much scalar computation as possible into predicated blocks. We previously were able to sink such operands only if they were extractelement instructions.

For example, if we have a predicated store "a[i] = x", instead of generating:

vector.body:
  ...
  %i = add i64 %index, 1
  %p = getelementptr inbounds i32, i32* %a, i64 %i
  %x = extractelement <2 x i32> %vec, i32 0
  ...
pred.store:
  store i32 %x, i32* %p
  ...

We will now generate:

vector.body:
  ...
pred.store:
  %i = add i64 %index, 1
  %p = getelementptr inbounds i32, i32* %a, i64 %i
  %x = extractelement <2 x i32> %vec, i32 0
  store i32 %x, i32* %p
  ...
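The effect described in the summary can be modeled outside of LLVM. The following is a minimal sketch, not the patch itself: `ToyInst`, `sinkOperands`, and `makeExampleProgram` are hypothetical names, blocks are plain strings, and the phi-node and loop-membership checks of the real code are left out. It only illustrates the rule "sink an operand once all of its users are in the predicated block, then reconsider that operand's own operands".

```cpp
#include <string>
#include <vector>

// Toy stand-in for an IR instruction (hypothetical; not an LLVM type).
struct ToyInst {
  std::string Name;
  std::string Block;         // Name of the containing basic block.
  std::vector<int> Operands; // Indices of the instructions this one uses.
  bool HasSideEffects;
};

// Moves into the block of Prog[PredInst] every side-effect-free instruction
// whose users all live in that block. Returns the number of moved
// instructions.
int sinkOperands(std::vector<ToyInst> &Prog, int PredInst) {
  const std::string PredBlock = Prog[PredInst].Block;

  // True if every instruction that uses Def is in the predicated block.
  auto allUsersInPredBlock = [&](int Def) -> bool {
    for (const ToyInst &User : Prog)
      for (int Op : User.Operands)
        if (Op == Def && User.Block != PredBlock)
          return false;
    return true;
  };

  std::vector<int> Worklist = Prog[PredInst].Operands;
  std::vector<int> Reanalyze; // Candidates we could not decide on yet.
  int Moved = 0;
  bool Changed;
  do {
    Worklist.insert(Worklist.end(), Reanalyze.begin(), Reanalyze.end());
    Reanalyze.clear();
    Changed = false;
    while (!Worklist.empty()) {
      int I = Worklist.back();
      Worklist.pop_back();
      ToyInst &Inst = Prog[I];
      if (Inst.Block == PredBlock || Inst.HasSideEffects)
        continue;
      if (!allUsersInPredBlock(I)) {
        Reanalyze.push_back(I);
        continue;
      }
      // Sink the instruction and reconsider its operands.
      Inst.Block = PredBlock;
      Worklist.insert(Worklist.end(), Inst.Operands.begin(),
                      Inst.Operands.end());
      ++Moved;
      Changed = true;
    }
  } while (Changed);
  return Moved;
}

// The predicated store from the summary, plus two extra users (assumed for
// illustration) that keep %index and %vec in vector.body.
std::vector<ToyInst> makeExampleProgram() {
  return {
      {"%index", "vector.body", {}, false},       // 0
      {"%vec", "vector.body", {}, false},         // 1
      {"%i", "vector.body", {0}, false},          // 2: add %index, 1
      {"%p", "vector.body", {2}, false},          // 3: getelementptr ... %i
      {"%x", "vector.body", {1}, false},          // 4: extractelement %vec
      {"store", "pred.store", {4, 3}, true},      // 5: store %x, %p
      {"%index.next", "vector.body", {0}, false}, // 6: another user of %index
      {"%other", "vector.body", {1}, false},      // 7: another user of %vec
  };
}
```

Calling `sinkOperands(Prog, 5)` on the example program moves `%p`, `%i`, and `%x` into `pred.store`. `%index` and `%vec` stay in `vector.body` because each still has a user there.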

Diff Detail

Repository: rL LLVM

Event Timeline

mssimpso updated this revision to Diff 74728. Oct 14 2016, 12:17 PM

mssimpso retitled this revision from to [LV] Sink scalar operands of predicated instructions.

mssimpso updated this object.

mssimpso added reviewers: anemet, mkuper, gilr.

mssimpso added subscribers: llvm-commits, mcrosier.

Herald added a subscriber: mzolotukhin. Oct 14 2016, 12:17 PM

mssimpso added a parent revision: D25631: [LV] Avoid emitting trivially dead instructions. Oct 14 2016, 12:23 PM

Why do this inside the vectorizer? Are we missing a later code sinking pass?

In D25632#570824, @mkuper wrote:

Why do this inside the vectorizer? Are we missing a later code sinking pass?

Hi Michael,

We actually do have an IR sinking pass (Transforms/Scalar/Sink.cpp), but it's not in the current pass pipeline. I've experimented with it in the past and observed some fairly large performance regressions when using it. It doesn't seem like the compiler is tuned for an "always sink where possible" pass. The sinking implemented in this patch is only aimed at the operands of the predicated instructions. It's really an extension of Gil's work for predicated instructions and all the scalarization we do now. We were previously only sinking the extractelement instructions we created when predicating, but we can do more.

The end goal is to get us closer to preserving the membership of the original predicated blocks if profitable, rather than always vectorizing and hoisting the instructions through if-conversion (aside from the stores, divs, and rems we can't vectorize). So we will need access to the cost model. The current patch sinks all the existing scalar operands, but the cost model may deem it more profitable for us to scalarize an instruction we otherwise would have vectorized, knowing that it is guaranteed to be sunk into a predicated block. I have a follow-on patch that does this.

With that in mind, my thoughts are that in the vectorizer, we (1) know the state of the program before and after if-conversion (i.e., was this instruction originally conditionally executed? If so, it might be good to leave it that way. If not, this instruction isn't needed on all paths, so let's always sink it), and (2) at least have some way to judge the profitability of this kind of decision.

Hope that helps (and makes sense)!

That makes sense, especially the cost angle.

On the other hand, my worry is that long-term, as this kind of thing accumulates, we may end up with an ad-hoc "organically grown" mini-optimizer inside the vectorizer. And I'd really like to try to avoid that. The fact this sinking patch requires D25631 to perform DCE inside the vectorizer only reinforces that concern.

So I'm not really sure what we should do. Adam, Gil, thoughts?

lib/Transforms/Vectorize/LoopVectorize.cpp
4279 ↗	(On Diff #74728)	I think the standard way to do this is to run "while (!WorkList.empty())" and keep a side map of nodes you're done with, so you will not add them to the worklist again. This avoids both the iterator invalidation problem and the problem of recognizing the fixed point. It may make things more complicated in this case, though.
4291 ↗	(On Diff #74728)	Is removing from the middle of a SetVector O(N)? I'd expect this case to happen a lot. (I understand you're doing it to keep Idx constant while making the Worklist smaller. As above, perhaps change the structure of the loop?)
4304 ↗	(On Diff #74728)	PredBB->getFirstInsertionPt() (which is essentially the same thing, but nicer. :-) )
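The loop shape suggested in the first inline comment can be sketched generically. This is an illustration under assumptions, not the code that landed: `OperandGraph` and `collectTransitiveOperands` are hypothetical names, and nodes are plain integers rather than LLVM values. Items are only ever popped from the back of the worklist, and a side set of finished nodes stops anything from being processed twice.

```cpp
#include <unordered_set>
#include <vector>

// Hypothetical operand graph: Operands[N] lists the nodes that node N uses.
using OperandGraph = std::vector<std::vector<int>>;

// Returns every node reachable from Root through operand edges, in the order
// in which the worklist visits them.
std::vector<int> collectTransitiveOperands(const OperandGraph &Operands,
                                           int Root) {
  std::vector<int> Worklist(Operands[Root].begin(), Operands[Root].end());
  std::unordered_set<int> Visited; // Side set of nodes we are done with.
  std::vector<int> Order;

  while (!Worklist.empty()) {
    // Popping from the back is constant-time and never invalidates an
    // iterator into the middle of the container.
    int N = Worklist.back();
    Worklist.pop_back();

    // insert().second is false if N was already visited.
    if (!Visited.insert(N).second)
      continue;

    Order.push_back(N);
    for (int Op : Operands[N])
      Worklist.push_back(Op);
  }
  return Order;
}
```

Because each node enters `Visited` at most once, the loop terminates even when the graph contains a cycle, and no separate "did anything change" flag is needed for a pure traversal. The sinking problem in this patch differs in that a node may become sinkable only after another node moves, which is why the comment notes this structure may complicate things here.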

dorit added a subscriber: dorit. Oct 18 2016, 12:43 AM

Hi Michael,

In D25632#571664, @mkuper wrote:

On the other hand, my worry is that long-term, as this kind of thing accumulates, we may end up with an ad-hoc "organically grown" mini-optimizer inside the vectorizer. And I'd really like to try to avoid that.

So this is a constant dilemma (e.g. in D21620). It's not just the question of whether a later pass will optimize but also of how to cost-model later hypothetical optimizations. I agree there's a slippery slope here, so we need to estimate the ROI per case.
I think this patch still falls under the "Generate no junk knowingly" category. Letting a later pass perform the actual sinking won't save us from performing the analysis in favor of the cost model's accuracy. The implementation is focused on the predicated instructions and is encapsulated in a single function, so it doesn't clutter the code (hope this will also hold for cost-model changes). I say Aye for this one.

In D25632#571664, @mkuper wrote:

The fact this sinking patch requires D25631 to perform DCE inside the vectorizer only reinforces that concern.

Since D25631 fixes a somewhat sloppy behavior of the vectorizer I think it also falls under "Generate no junk knowingly", so it could stand on its own as part of the vectorizer's induction variable handling logic.

Michael/Gil,

Thanks very much for the feedback and discussion! I'll go ahead and work on updating the patch according to Michael's suggestions.

In D25632#571664, @mkuper wrote:

That makes sense, especially the cost angle.

On the other hand, my worry is that long-term, as this kind of thing accumulates, we may end up with an ad-hoc "organically grown" mini-optimizer inside the vectorizer. And I'd really like to try to avoid that. The fact this sinking patch requires D25631 to perform DCE inside the vectorizer only reinforces that concern.

Yes, I agree a mini-optimizer inside the vectorizer is not something we want. But to clarify things somewhat, we're not really optimizing existing code as much as we are teaching the vectorizer not to emit poor code, that is, not to "knowingly generate junk", to paraphrase Gil. In D25631, we're not actually performing DCE. In fact, by not emitting would-be dead code, we're preventing a later DCE pass from having to work as much.

We certainly shouldn't try and reinvent the wheel when it isn't necessary. If a standard pass in the pipeline can clean up after the vectorizer (and the compile time penalty from holding on to and churning over the sub-optimal code we generate is not a concern), we should let it do so, in favor of simplicity in the vectorizer. We don't have a pass that does this though (in the limited sense of the predicated instructions), and there's value in letting the cost model guide the future scalarization/sinking choices we can make.

In D25632#572696, @gilr wrote:

It's not just the question of whether a later pass will optimize but also of how to cost-model later hypothetical optimizations.

This is definitely true, but it can actually get a little worse! Later passes can use their own cost model computed over the "junk" the vectorizer generates. For example, in the LTO pipeline, loop unrolling is run after the vectorizer, but before InstCombine cleans up the code. We should probably fix this.

lib/Transforms/Vectorize/LoopVectorize.cpp
4279 ↗	(On Diff #74728)	Sure, I'll restructure the loop. And you're right, removing from the middle of SetVector is not constant-time, so we should avoid that. Thanks!
4304 ↗	(On Diff #74728)	Sounds good!

Ok, consider me convinced. :-)

gilr added inline comments. Oct 18 2016, 12:07 PM

lib/Transforms/Vectorize/LoopVectorize.cpp
4308 ↗	(On Diff #74728)	Double "have" in comment.

Addressed comments from Michael and Gil. Thanks!

Restructured the loop.
Updated comments.

LGTM with a nit (inline)

lib/Transforms/Vectorize/LoopVectorize.cpp
4299 ↗	(On Diff #75060)	Nit - maybe move mayHaveSideEffects() to the end here? The other tests seem much lighter (to a lesser extent, this also applies to Loop->contains()).
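The nit relies on short-circuit evaluation of `||`: once an earlier operand is true, the later ones are not evaluated. A small sketch of the effect, using hypothetical predicates (`cheapTest`, `expensiveTest`) and a call counter in place of the real LLVM queries:

```cpp
#include <vector>

// Counts how often the (pretend) expensive predicate runs.
struct Probe {
  int ExpensiveCalls;
};

// Stand-in for a light check.
bool cheapTest(int X) { return X % 2 == 0; }

// Stand-in for a heavier check; the counter makes its evaluations visible.
bool expensiveTest(int X, Probe &P) {
  ++P.ExpensiveCalls;
  return X > 6;
}

// Cheap test first: the expensive one only runs when the cheap one is false.
int countSkippedCheapFirst(const std::vector<int> &Values, Probe &P) {
  int Skipped = 0;
  for (int X : Values)
    if (cheapTest(X) || expensiveTest(X, P))
      ++Skipped;
  return Skipped;
}

// Expensive test first: it runs for every element.
int countSkippedExpensiveFirst(const std::vector<int> &Values, Probe &P) {
  int Skipped = 0;
  for (int X : Values)
    if (expensiveTest(X, P) || cheapTest(X))
      ++Skipped;
  return Skipped;
}
```

Both orderings skip the same elements, since the predicates here have no observable effect other than the counter. Only the number of expensive evaluations changes.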

This revision is now accepted and ready to land. Oct 19 2016, 10:20 AM

mssimpso added inline comments. Oct 19 2016, 1:45 PM

lib/Transforms/Vectorize/LoopVectorize.cpp
4299 ↗	(On Diff #75060)	Sure, sounds good. I'll reorder the conditions. Thanks!

Michael, did you have any comments about the new loop structure?

Sorry, lost track of this, thanks for pinging.
LGTM.

Closed by commit rL285097: [LV] Sink scalar operands of predicated instructions (authored by mssimpso). Oct 25 2016, 12:09 PM

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path                                                                    Size
llvm/trunk/lib/Transforms/Vectorize/LoopVectorize.cpp                   91 lines
llvm/trunk/test/Transforms/LoopVectorize/consecutive-ptr-uniforms.ll    10 lines
llvm/trunk/test/Transforms/LoopVectorize/if-pred-stores.ll              10 lines
llvm/trunk/test/Transforms/LoopVectorize/induction.ll                   12 lines

Diff 75758

llvm/trunk/lib/Transforms/Vectorize/LoopVectorize.cpp

void fixFirstOrderRecurrence(PHINode *Phi);		void fixFirstOrderRecurrence(PHINode *Phi);

/// \brief The Loop exit block may have single value PHI nodes where the		/// \brief The Loop exit block may have single value PHI nodes where the
/// incoming value is 'Undef'. While vectorizing we only handled real values		/// incoming value is 'Undef'. While vectorizing we only handled real values
/// that were defined inside the loop. Here we fix the 'undef case'.		/// that were defined inside the loop. Here we fix the 'undef case'.
/// See PR14725.		/// See PR14725.
void fixLCSSAPHIs();		void fixLCSSAPHIs();

		/// Iteratively sink the scalarized operands of a predicated instruction into
		/// the block that was created for it.
		void sinkScalarOperands(Instruction *PredInst);

/// Predicate conditional instructions that require predication on their		/// Predicate conditional instructions that require predication on their
/// respective conditions.		/// respective conditions.
void predicateInstructions();		void predicateInstructions();

/// Collect the instructions from the original loop that would be trivially		/// Collect the instructions from the original loop that would be trivially
/// dead in the vectorized loop if generated.		/// dead in the vectorized loop if generated.
void collectTriviallyDeadInstructions();		void collectTriviallyDeadInstructions();

auto *IndUpdate = cast<Instruction>(Ind->getIncomingValueForBlock(Latch));		auto *IndUpdate = cast<Instruction>(Ind->getIncomingValueForBlock(Latch));
if (all_of(IndUpdate->users(), [&](User *U) -> bool {		if (all_of(IndUpdate->users(), [&](User *U) -> bool {
return U == Ind || DeadInstructions.count(cast<Instruction>(U));		return U == Ind || DeadInstructions.count(cast<Instruction>(U));
}))		}))
DeadInstructions.insert(IndUpdate);		DeadInstructions.insert(IndUpdate);
}		}
}		}

		void InnerLoopVectorizer::sinkScalarOperands(Instruction *PredInst) {

		  // The basic block and loop containing the predicated instruction.
		  auto *PredBB = PredInst->getParent();
		  auto *VectorLoop = LI->getLoopFor(PredBB);

		  // Initialize a worklist with the operands of the predicated instruction.
		  SetVector<Value *> Worklist(PredInst->op_begin(), PredInst->op_end());

		  // Holds instructions that we need to analyze again. An instruction may be
		  // reanalyzed if we don't yet know if we can sink it or not.
		  SmallVector<Instruction *, 8> InstsToReanalyze;

		  // Returns true if a given use occurs in the predicated block. Phi nodes use
		  // their operands in their corresponding predecessor blocks.
		  auto isBlockOfUsePredicated = [&](Use &U) -> bool {
		    auto *I = cast<Instruction>(U.getUser());
		    BasicBlock *BB = I->getParent();
		    if (auto *Phi = dyn_cast<PHINode>(I))
		      BB = Phi->getIncomingBlock(
		          PHINode::getIncomingValueNumForOperand(U.getOperandNo()));
		    return BB == PredBB;
		  };

		  // Iteratively sink the scalarized operands of the predicated instruction
		  // into the block we created for it. When an instruction is sunk, its
		  // operands are then added to the worklist. The algorithm ends after one pass
		  // through the worklist doesn't sink a single instruction.
		  bool Changed;
		  do {

		    // Add the instructions that need to be reanalyzed to the worklist, and
		    // reset the changed indicator.
		    Worklist.insert(InstsToReanalyze.begin(), InstsToReanalyze.end());
		    InstsToReanalyze.clear();
		    Changed = false;

		    while (!Worklist.empty()) {
		      auto *I = dyn_cast<Instruction>(Worklist.pop_back_val());

		      // We can't sink an instruction if it is a phi node, is already in the
		      // predicated block, is not in the loop, or may have side effects.
		      if (!I || isa<PHINode>(I) || I->getParent() == PredBB ||
		          !VectorLoop->contains(I) || I->mayHaveSideEffects())
		        continue;

		      // It's legal to sink the instruction if all its uses occur in the
		      // predicated block. Otherwise, there's nothing to do yet, and we may
		      // need to reanalyze the instruction.
		      if (!all_of(I->uses(), isBlockOfUsePredicated)) {
		        InstsToReanalyze.push_back(I);
		        continue;
		      }

		      // Move the instruction to the beginning of the predicated block, and add
		      // its operands to the worklist.
		      I->moveBefore(&*PredBB->getFirstInsertionPt());
		      Worklist.insert(I->op_begin(), I->op_end());

		      // The sinking may have enabled other instructions to be sunk, so we will
		      // need to iterate.
		      Changed = true;
		    }
		  } while (Changed);
		}

void InnerLoopVectorizer::predicateInstructions() {		void InnerLoopVectorizer::predicateInstructions() {

// For each instruction I marked for predication on value C, split I into its		// For each instruction I marked for predication on value C, split I into its
// own basic block to form an if-then construct over C.		// own basic block to form an if-then construct over C. Since I may be fed by
// Since I may be fed by extractelement and/or be feeding an insertelement		// an extractelement instruction or other scalar operand, we try to
// generated during scalarization we try to move such instructions into the		// iteratively sink its scalar operands into the predicated block. If I feeds
// predicated basic block as well. For the insertelement this also means that		// an insertelement instruction, we try to move this instruction into the
// the PHI will be created for the resulting vector rather than for the		// predicated block as well. For non-void types, a phi node will be created
// scalar instruction.		// for the resulting value (either vector or scalar).
		//
// So for some predicated instruction, e.g. the conditional sdiv in:		// So for some predicated instruction, e.g. the conditional sdiv in:
//		//
// for.body:		// for.body:
// ...		// ...
// %add = add nsw i32 %mul, %0		// %add = add nsw i32 %mul, %0
// %cmp5 = icmp sgt i32 %2, 7		// %cmp5 = icmp sgt i32 %2, 7
// br i1 %cmp5, label %if.then, label %if.end		// br i1 %cmp5, label %if.then, label %if.end
//		//

for (auto KV : PredicatedInstructions) {		for (auto KV : PredicatedInstructions) {
BasicBlock::iterator I(KV.first);		BasicBlock::iterator I(KV.first);
BasicBlock *Head = I->getParent();		BasicBlock *Head = I->getParent();
auto BB = SplitBlock(Head, &*std::next(I), DT, LI);		auto BB = SplitBlock(Head, &*std::next(I), DT, LI);
auto T = SplitBlockAndInsertIfThen(KV.second, &*I, /*Unreachable=*/false,		auto T = SplitBlockAndInsertIfThen(KV.second, &*I, /*Unreachable=*/false,
/*BranchWeights=*/nullptr, DT, LI);		/*BranchWeights=*/nullptr, DT, LI);
I->moveBefore(T);		I->moveBefore(T);
// Try to move any extractelement we may have created for the predicated		sinkScalarOperands(&*I);
// instruction into the Then block.
for (Use &Op : I->operands()) {
auto OpInst = dyn_cast<ExtractElementInst>(&Op);
if (OpInst && OpInst->hasOneUse()) // TODO: more accurately - hasOneUser()
OpInst->moveBefore(&*I);
}

I->getParent()->setName(Twine("pred.") + I->getOpcodeName() + ".if");		I->getParent()->setName(Twine("pred.") + I->getOpcodeName() + ".if");
BB->setName(Twine("pred.") + I->getOpcodeName() + ".continue");		BB->setName(Twine("pred.") + I->getOpcodeName() + ".continue");

// If the instruction is non-void create a Phi node at reconvergence point.		// If the instruction is non-void create a Phi node at reconvergence point.
if (!I->getType()->isVoidTy()) {		if (!I->getType()->isVoidTy()) {
Value *IncomingTrue = nullptr;		Value *IncomingTrue = nullptr;
Value *IncomingFalse = nullptr;		Value *IncomingFalse = nullptr;

llvm/trunk/test/Transforms/LoopVectorize/consecutive-ptr-uniforms.ll

	; vectorization. The store is scalarized because it's in a predicated block.			; vectorization. The store is scalarized because it's in a predicated block.
	; Even though the load in this example is vectorized and only uses the pointer			; Even though the load in this example is vectorized and only uses the pointer
	; as if it were uniform, the store is scalarized, making the pointer			; as if it were uniform, the store is scalarized, making the pointer
	; non-uniform.			; non-uniform.
	;			;
	; INTER-NOT: LV: Found uniform instruction: %tmp0 = getelementptr inbounds %pair, %pair* %p, i64 %i, i32 0			; INTER-NOT: LV: Found uniform instruction: %tmp0 = getelementptr inbounds %pair, %pair* %p, i64 %i, i32 0
	; INTER: vector.body			; INTER: vector.body
	; INTER: %index = phi i64 [ 0, %vector.ph ], [ %index.next, {{.*}} ]			; INTER: %index = phi i64 [ 0, %vector.ph ], [ %index.next, {{.*}} ]
	; INTER: %[[I1:.+]] = or i64 %index, 1
	; INTER: %[[I2:.+]] = or i64 %index, 2
	; INTER: %[[I3:.+]] = or i64 %index, 3
	; INTER: %[[G0:.+]] = getelementptr inbounds %pair, %pair* %p, i64 %index, i32 0			; INTER: %[[G0:.+]] = getelementptr inbounds %pair, %pair* %p, i64 %index, i32 0
				; INTER: %[[B0:.+]] = bitcast i32* %[[G0]] to <8 x i32>*
				; INTER: %wide.vec = load <8 x i32>, <8 x i32>* %[[B0]], align 8
				; INTER: %[[I1:.+]] = or i64 %index, 1
	; INTER: getelementptr inbounds %pair, %pair* %p, i64 %[[I1]], i32 0			; INTER: getelementptr inbounds %pair, %pair* %p, i64 %[[I1]], i32 0
				; INTER: %[[I2:.+]] = or i64 %index, 2
	; INTER: getelementptr inbounds %pair, %pair* %p, i64 %[[I2]], i32 0			; INTER: getelementptr inbounds %pair, %pair* %p, i64 %[[I2]], i32 0
				; INTER: %[[I3:.+]] = or i64 %index, 3
	; INTER: getelementptr inbounds %pair, %pair* %p, i64 %[[I3]], i32 0			; INTER: getelementptr inbounds %pair, %pair* %p, i64 %[[I3]], i32 0
	; INTER: %[[B0:.+]] = bitcast i32* %[[G0]] to <8 x i32>*
	; INTER: %wide.vec = load <8 x i32>, <8 x i32>* %[[B0]], align 8
	; INTER: br i1 {{.*}}, label %middle.block, label %vector.body			; INTER: br i1 {{.*}}, label %middle.block, label %vector.body
	;			;
	define void @predicated_store(%pair *%p, i32 %x, i64 %n) {			define void @predicated_store(%pair *%p, i32 %x, i64 %n) {
	entry:			entry:
	br label %for.body			br label %for.body

	for.body:			for.body:
	%i = phi i64 [ %i.next, %if.merge ], [ 0, %entry ]			%i = phi i64 [ %i.next, %if.merge ], [ 0, %entry ]

llvm/trunk/test/Transforms/LoopVectorize/if-pred-stores.ll

	; RUN: opt -S -vectorize-num-stores-pred=1 -force-vector-width=1 -force-vector-interleave=2 -loop-vectorize -verify-loop-info -simplifycfg < %s | FileCheck %s --check-prefix=UNROLL			; RUN: opt -S -vectorize-num-stores-pred=1 -force-vector-width=1 -force-vector-interleave=2 -loop-vectorize -verify-loop-info -simplifycfg < %s | FileCheck %s --check-prefix=UNROLL
	; RUN: opt -S -vectorize-num-stores-pred=1 -force-vector-width=1 -force-vector-interleave=2 -loop-vectorize -verify-loop-info < %s | FileCheck %s --check-prefix=UNROLL-NOSIMPLIFY			; RUN: opt -S -vectorize-num-stores-pred=1 -force-vector-width=1 -force-vector-interleave=2 -loop-vectorize -verify-loop-info < %s | FileCheck %s --check-prefix=UNROLL-NOSIMPLIFY
	; RUN: opt -S -vectorize-num-stores-pred=1 -force-vector-width=2 -force-vector-interleave=1 -loop-vectorize -enable-cond-stores-vec -verify-loop-info -simplifycfg < %s | FileCheck %s --check-prefix=VEC			; RUN: opt -S -vectorize-num-stores-pred=1 -force-vector-width=2 -force-vector-interleave=1 -loop-vectorize -enable-cond-stores-vec -verify-loop-info -simplifycfg < %s | FileCheck %s --check-prefix=VEC

	target datalayout = "e-m:o-i64:64-f80:128-n8:16:32:64-S128"			target datalayout = "e-m:o-i64:64-f80:128-n8:16:32:64-S128"

	; Test predication of stores.			; Test predication of stores.
	define i32 @test(i32* nocapture %f) #0 {			define i32 @test(i32* nocapture %f) #0 {
	entry:			entry:
	br label %for.body			br label %for.body

	; VEC-LABEL: test			; VEC-LABEL: test
	; VEC: %[[v0:.+]] = add i64 %index, 0			; VEC: %[[v0:.+]] = add i64 %index, 0
	; VEC: %[[v1:.+]] = add i64 %index, 1
	; VEC: %[[v2:.+]] = getelementptr inbounds i32, i32* %f, i64 %[[v0]]
	; VEC: %[[v4:.+]] = getelementptr inbounds i32, i32* %f, i64 %[[v1]]
	; VEC: %[[v8:.+]] = icmp sgt <2 x i32> %{{.*}}, <i32 100, i32 100>			; VEC: %[[v8:.+]] = icmp sgt <2 x i32> %{{.*}}, <i32 100, i32 100>
	; VEC: %[[v9:.+]] = add nsw <2 x i32> %{{.*}}, <i32 20, i32 20>			; VEC: %[[v9:.+]] = add nsw <2 x i32> %{{.*}}, <i32 20, i32 20>
	; VEC: %[[v10:.+]] = and <2 x i1> %[[v8]], <i1 true, i1 true>			; VEC: %[[v10:.+]] = and <2 x i1> %[[v8]], <i1 true, i1 true>
	; VEC: %[[o1:.+]] = or <2 x i1> zeroinitializer, %[[v10]]			; VEC: %[[o1:.+]] = or <2 x i1> zeroinitializer, %[[v10]]
	; VEC: %[[v11:.+]] = extractelement <2 x i1> %[[o1]], i32 0			; VEC: %[[v11:.+]] = extractelement <2 x i1> %[[o1]], i32 0
	; VEC: %[[v12:.+]] = icmp eq i1 %[[v11]], true			; VEC: %[[v12:.+]] = icmp eq i1 %[[v11]], true
	; VEC: br i1 %[[v12]], label %[[cond:.+]], label %[[else:.+]]			; VEC: br i1 %[[v12]], label %[[cond:.+]], label %[[else:.+]]
	;			;
	; VEC: [[cond]]:			; VEC: [[cond]]:
	; VEC: %[[v13:.+]] = extractelement <2 x i32> %[[v9]], i32 0			; VEC: %[[v13:.+]] = extractelement <2 x i32> %[[v9]], i32 0
				; VEC: %[[v2:.+]] = getelementptr inbounds i32, i32* %f, i64 %[[v0]]
	; VEC: store i32 %[[v13]], i32* %[[v2]], align 4			; VEC: store i32 %[[v13]], i32* %[[v2]], align 4
	; VEC: br label %[[else:.+]]			; VEC: br label %[[else:.+]]
	;			;
	; VEC: [[else]]:			; VEC: [[else]]:
	; VEC: %[[v15:.+]] = extractelement <2 x i1> %[[o1]], i32 1			; VEC: %[[v15:.+]] = extractelement <2 x i1> %[[o1]], i32 1
	; VEC: %[[v16:.+]] = icmp eq i1 %[[v15]], true			; VEC: %[[v16:.+]] = icmp eq i1 %[[v15]], true
	; VEC: br i1 %[[v16]], label %[[cond2:.+]], label %[[else2:.+]]			; VEC: br i1 %[[v16]], label %[[cond2:.+]], label %[[else2:.+]]
	;			;
	; VEC: [[cond2]]:			; VEC: [[cond2]]:
	; VEC: %[[v17:.+]] = extractelement <2 x i32> %[[v9]], i32 1			; VEC: %[[v17:.+]] = extractelement <2 x i32> %[[v9]], i32 1
				; VEC: %[[v1:.+]] = add i64 %index, 1
				; VEC: %[[v4:.+]] = getelementptr inbounds i32, i32* %f, i64 %[[v1]]
	; VEC: store i32 %[[v17]], i32* %[[v4]], align 4			; VEC: store i32 %[[v17]], i32* %[[v4]], align 4
	; VEC: br label %[[else2:.+]]			; VEC: br label %[[else2:.+]]
	;			;
	; VEC: [[else2]]:			; VEC: [[else2]]:

	; UNROLL-LABEL: test			; UNROLL-LABEL: test
	; UNROLL: vector.body:			; UNROLL: vector.body:
	; UNROLL: %[[IND:[a-zA-Z0-9]+]] = add i64 %{{.*}}, 0			; UNROLL: %[[IND:[a-zA-Z0-9]+]] = add i64 %{{.*}}, 0
	; UNROLL: %[[IND1:[a-zA-Z0-9]+]] = add i64 %{{.*}}, 1			; UNROLL: %[[IND1:[a-zA-Z0-9]+]] = add i64 %{{.*}}, 1
	; UNROLL: %[[v0:[a-zA-Z0-9]+]] = getelementptr inbounds i32, i32* %f, i64 %[[IND]]			; UNROLL: %[[v0:[a-zA-Z0-9]+]] = getelementptr inbounds i32, i32* %f, i64 %[[IND]]
	; UNROLL: %[[v1:[a-zA-Z0-9]+]] = getelementptr inbounds i32, i32* %f, i64 %[[IND1]]			; UNROLL: %[[v1:[a-zA-Z0-9]+]] = getelementptr inbounds i32, i32* %f, i64 %[[IND1]]
	; UNROLL: %[[v2:[a-zA-Z0-9]+]] = load i32, i32* %[[v0]], align 4			; UNROLL: %[[v2:[a-zA-Z0-9]+]] = load i32, i32* %[[v0]], align 4
	; UNROLL: %[[v3:[a-zA-Z0-9]+]] = load i32, i32* %[[v1]], align 4			; UNROLL: %[[v3:[a-zA-Z0-9]+]] = load i32, i32* %[[v1]], align 4
	; UNROLL: %[[v4:[a-zA-Z0-9]+]] = icmp sgt i32 %[[v2]], 100			; UNROLL: %[[v4:[a-zA-Z0-9]+]] = icmp sgt i32 %[[v2]], 100
	; UNROLL: %[[v5:[a-zA-Z0-9]+]] = icmp sgt i32 %[[v3]], 100			; UNROLL: %[[v5:[a-zA-Z0-9]+]] = icmp sgt i32 %[[v3]], 100
	; UNROLL: %[[v6:[a-zA-Z0-9]+]] = add nsw i32 %[[v2]], 20
	; UNROLL: %[[v7:[a-zA-Z0-9]+]] = add nsw i32 %[[v3]], 20
	; UNROLL: %[[o1:[a-zA-Z0-9]+]] = or i1 false, %[[v4]]			; UNROLL: %[[o1:[a-zA-Z0-9]+]] = or i1 false, %[[v4]]
	; UNROLL: %[[o2:[a-zA-Z0-9]+]] = or i1 false, %[[v5]]			; UNROLL: %[[o2:[a-zA-Z0-9]+]] = or i1 false, %[[v5]]
	; UNROLL: %[[v8:[a-zA-Z0-9]+]] = icmp eq i1 %[[o1]], true			; UNROLL: %[[v8:[a-zA-Z0-9]+]] = icmp eq i1 %[[o1]], true
	; UNROLL: br i1 %[[v8]], label %[[cond:[a-zA-Z0-9.]+]], label %[[else:[a-zA-Z0-9.]+]]			; UNROLL: br i1 %[[v8]], label %[[cond:[a-zA-Z0-9.]+]], label %[[else:[a-zA-Z0-9.]+]]
	;			;
	; UNROLL: [[cond]]:			; UNROLL: [[cond]]:
				; UNROLL: %[[v6:[a-zA-Z0-9]+]] = add nsw i32 %[[v2]], 20
	; UNROLL: store i32 %[[v6]], i32* %[[v0]], align 4			; UNROLL: store i32 %[[v6]], i32* %[[v0]], align 4
	; UNROLL: br label %[[else]]			; UNROLL: br label %[[else]]
	;			;
	; UNROLL: [[else]]:			; UNROLL: [[else]]:
	; UNROLL: %[[v9:[a-zA-Z0-9]+]] = icmp eq i1 %[[o2]], true			; UNROLL: %[[v9:[a-zA-Z0-9]+]] = icmp eq i1 %[[o2]], true
	; UNROLL: br i1 %[[v9]], label %[[cond2:[a-zA-Z0-9.]+]], label %[[else2:[a-zA-Z0-9.]+]]			; UNROLL: br i1 %[[v9]], label %[[cond2:[a-zA-Z0-9.]+]], label %[[else2:[a-zA-Z0-9.]+]]
	;			;
	; UNROLL: [[cond2]]:			; UNROLL: [[cond2]]:
				; UNROLL: %[[v7:[a-zA-Z0-9]+]] = add nsw i32 %[[v3]], 20
	; UNROLL: store i32 %[[v7]], i32* %[[v1]], align 4			; UNROLL: store i32 %[[v7]], i32* %[[v1]], align 4
	; UNROLL: br label %[[else2]]			; UNROLL: br label %[[else2]]
	;			;
	; UNROLL: [[else2]]:			; UNROLL: [[else2]]:

	for.body:			for.body:
	%indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.inc ]			%indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.inc ]
	%arrayidx = getelementptr inbounds i32, i32* %f, i64 %indvars.iv			%arrayidx = getelementptr inbounds i32, i32* %f, i64 %indvars.iv

llvm/trunk/test/Transforms/LoopVectorize/induction.ll

	; x /= i;			; x /= i;
	; sum += x;			; sum += x;
	; }			; }
	;			;
	; CHECK-LABEL: @scalarize_induction_variable_05(			; CHECK-LABEL: @scalarize_induction_variable_05(
	; CHECK: vector.body:			; CHECK: vector.body:
	; CHECK: %index = phi i32 [ 0, %vector.ph ], [ %index.next, %pred.udiv.continue2 ]			; CHECK: %index = phi i32 [ 0, %vector.ph ], [ %index.next, %pred.udiv.continue2 ]
	; CHECK: %[[I0:.+]] = add i32 %index, 0			; CHECK: %[[I0:.+]] = add i32 %index, 0
	; CHECK: %[[I1:.+]] = add i32 %index, 1
	; CHECK: getelementptr inbounds i32, i32* %a, i32 %[[I0]]			; CHECK: getelementptr inbounds i32, i32* %a, i32 %[[I0]]
	; CHECK: pred.udiv.if:			; CHECK: pred.udiv.if:
	; CHECK: udiv i32 {{.*}}, %[[I0]]			; CHECK: udiv i32 {{.*}}, %[[I0]]
	; CHECK: pred.udiv.if1:			; CHECK: pred.udiv.if1:
				; CHECK: %[[I1:.+]] = add i32 %index, 1
	; CHECK: udiv i32 {{.*}}, %[[I1]]			; CHECK: udiv i32 {{.*}}, %[[I1]]
	;			;
	; UNROLL-NO_IC-LABEL: @scalarize_induction_variable_05(			; UNROLL-NO_IC-LABEL: @scalarize_induction_variable_05(
	; UNROLL-NO-IC: vector.body:			; UNROLL-NO-IC: vector.body:
	; UNROLL-NO-IC: %index = phi i32 [ 0, %vector.ph ], [ %index.next, %pred.udiv.continue11 ]			; UNROLL-NO-IC: %index = phi i32 [ 0, %vector.ph ], [ %index.next, %pred.udiv.continue11 ]
	; UNROLL-NO-IC: %[[I0:.+]] = add i32 %index, 0			; UNROLL-NO-IC: %[[I0:.+]] = add i32 %index, 0
	; UNROLL-NO-IC: %[[I1:.+]] = add i32 %index, 1
	; UNROLL-NO-IC: %[[I2:.+]] = add i32 %index, 2			; UNROLL-NO-IC: %[[I2:.+]] = add i32 %index, 2
	; UNROLL-NO-IC: %[[I3:.+]] = add i32 %index, 3
	; UNROLL-NO-IC: getelementptr inbounds i32, i32* %a, i32 %[[I0]]			; UNROLL-NO-IC: getelementptr inbounds i32, i32* %a, i32 %[[I0]]
	; UNROLL-NO-IC: getelementptr inbounds i32, i32* %a, i32 %[[I2]]			; UNROLL-NO-IC: getelementptr inbounds i32, i32* %a, i32 %[[I2]]
	; UNROLL-NO-IC: pred.udiv.if:			; UNROLL-NO-IC: pred.udiv.if:
	; UNROLL-NO-IC: udiv i32 {{.*}}, %[[I0]]			; UNROLL-NO-IC: udiv i32 {{.*}}, %[[I0]]
	; UNROLL-NO-IC: pred.udiv.if6:			; UNROLL-NO-IC: pred.udiv.if6:
				; UNROLL-NO-IC: %[[I1:.+]] = add i32 %index, 1
	; UNROLL-NO-IC: udiv i32 {{.*}}, %[[I1]]			; UNROLL-NO-IC: udiv i32 {{.*}}, %[[I1]]
	; UNROLL-NO-IC: pred.udiv.if8:			; UNROLL-NO-IC: pred.udiv.if8:
	; UNROLL-NO-IC: udiv i32 {{.*}}, %[[I2]]			; UNROLL-NO-IC: udiv i32 {{.*}}, %[[I2]]
	; UNROLL-NO-IC: pred.udiv.if10:			; UNROLL-NO-IC: pred.udiv.if10:
				; UNROLL-NO-IC: %[[I3:.+]] = add i32 %index, 3
	; UNROLL-NO-IC: udiv i32 {{.*}}, %[[I3]]			; UNROLL-NO-IC: udiv i32 {{.*}}, %[[I3]]
	;			;
	; IND-LABEL: @scalarize_induction_variable_05(			; IND-LABEL: @scalarize_induction_variable_05(
	; IND: vector.body:			; IND: vector.body:
	; IND: %index = phi i32 [ 0, %vector.ph ], [ %index.next, %pred.udiv.continue2 ]			; IND: %index = phi i32 [ 0, %vector.ph ], [ %index.next, %pred.udiv.continue2 ]
	; IND: %[[I1:.+]] = or i32 %index, 1
	; IND: %[[E0:.+]] = sext i32 %index to i64			; IND: %[[E0:.+]] = sext i32 %index to i64
	; IND: getelementptr inbounds i32, i32* %a, i64 %[[E0]]			; IND: getelementptr inbounds i32, i32* %a, i64 %[[E0]]
	; IND: pred.udiv.if:			; IND: pred.udiv.if:
	; IND: udiv i32 {{.*}}, %index			; IND: udiv i32 {{.*}}, %index
	; IND: pred.udiv.if1:			; IND: pred.udiv.if1:
				; IND: %[[I1:.+]] = or i32 %index, 1
	; IND: udiv i32 {{.*}}, %[[I1]]			; IND: udiv i32 {{.*}}, %[[I1]]
	;			;
	; UNROLL-LABEL: @scalarize_induction_variable_05(			; UNROLL-LABEL: @scalarize_induction_variable_05(
	; UNROLL: vector.body:			; UNROLL: vector.body:
	; UNROLL: %index = phi i32 [ 0, %vector.ph ], [ %index.next, %pred.udiv.continue11 ]			; UNROLL: %index = phi i32 [ 0, %vector.ph ], [ %index.next, %pred.udiv.continue11 ]
	; UNROLL: %[[I1:.+]] = or i32 %index, 1
	; UNROLL: %[[I2:.+]] = or i32 %index, 2			; UNROLL: %[[I2:.+]] = or i32 %index, 2
	; UNROLL: %[[I3:.+]] = or i32 %index, 3
	; UNROLL: %[[E0:.+]] = sext i32 %index to i64			; UNROLL: %[[E0:.+]] = sext i32 %index to i64
	; UNROLL: %[[G0:.+]] = getelementptr inbounds i32, i32* %a, i64 %[[E0]]			; UNROLL: %[[G0:.+]] = getelementptr inbounds i32, i32* %a, i64 %[[E0]]
	; UNROLL: getelementptr i32, i32* %[[G0]], i64 2			; UNROLL: getelementptr i32, i32* %[[G0]], i64 2
	; UNROLL: pred.udiv.if:			; UNROLL: pred.udiv.if:
	; UNROLL: udiv i32 {{.*}}, %index			; UNROLL: udiv i32 {{.*}}, %index
	; UNROLL: pred.udiv.if6:			; UNROLL: pred.udiv.if6:
				; UNROLL: %[[I1:.+]] = or i32 %index, 1
	; UNROLL: udiv i32 {{.*}}, %[[I1]]			; UNROLL: udiv i32 {{.*}}, %[[I1]]
	; UNROLL: pred.udiv.if8:			; UNROLL: pred.udiv.if8:
	; UNROLL: udiv i32 {{.*}}, %[[I2]]			; UNROLL: udiv i32 {{.*}}, %[[I2]]
	; UNROLL: pred.udiv.if10:			; UNROLL: pred.udiv.if10:
				; UNROLL: %[[I3:.+]] = or i32 %index, 3
	; UNROLL: udiv i32 {{.*}}, %[[I3]]			; UNROLL: udiv i32 {{.*}}, %[[I3]]

	define i32 @scalarize_induction_variable_05(i32* %a, i32 %x, i1 %c, i32 %n) {			define i32 @scalarize_induction_variable_05(i32* %a, i32 %x, i1 %c, i32 %n) {
	entry:			entry:
	br label %for.body			br label %for.body

	for.body:			for.body:
	%i = phi i32 [ 0, %entry ], [ %i.next, %if.end ]			%i = phi i32 [ 0, %entry ], [ %i.next, %if.end ]