This is an archive of the discontinued LLVM Phabricator instance.

[LV] Keep predicated instructions in the same block
AbandonedPublic

Authored by mssimpso on Nov 11 2016, 12:26 PM.

Details

Reviewers

mkuper
gilr

Summary

When predicating scalar instructions, we previously placed every instruction that requires predication into its own block, regardless of whether multiple instructions may have occurred in the same block in the original loop. For example, if a division and a store from the same block in the original loop require predication, we created a new basic block for each of the 2 x VF x UF scalar instructions in the vector loop.

This patch modifies code generation for instruction predication such that we keep the predicated instructions in the same block after vectorization if they were in the same block before vectorization.
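As a rough illustration of the block counts described above, the sketch below uses a hypothetical loop (guardedDivide) in which a division and a store share one conditional block, and assumes the target scalarizes both with predication. The helper names are made up for this note and are not part of LoopVectorize; they only restate the arithmetic from the summary.

```cpp
#include <cstddef>

// Hypothetical source loop: the sdiv and the store live in the same
// conditional block, and both are assumed to require predication.
void guardedDivide(int *a, const int *b, const int *c, std::size_t n) {
  for (std::size_t i = 0; i < n; ++i)
    if (c[i] != 0)
      a[i] = b[i] / c[i];
}

// Predicated blocks per vector iteration when every predicated scalar
// instruction gets its own block (the behavior the summary describes).
constexpr unsigned blocksOnePerInstruction(unsigned NumPredInsts, unsigned VF,
                                           unsigned UF) {
  return NumPredInsts * VF * UF;
}

// Predicated blocks per vector iteration when instructions that shared an
// original block also share a predicated block (the goal of this patch).
constexpr unsigned blocksOnePerOriginalBlock(unsigned VF, unsigned UF) {
  return VF * UF;
}
```

With VF = 4 and UF = 2, the two predicated instructions would go from 16 blocks to 8 under this accounting.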

Diff Detail

Build Status

Buildable 1265
Build 1265: arc lint + arc unit

Event Timeline

mssimpso updated this revision to Diff 77651. Nov 11 2016, 12:26 PM

mssimpso retitled this revision from to [LV] Keep predicated instructions in the same block.

mssimpso updated this object.

mssimpso added reviewers: mkuper, gilr.

mssimpso added subscribers: llvm-commits, mcrosier.

Herald added a subscriber: mzolotukhin. Nov 11 2016, 12:26 PM

gilr added inline comments. Nov 13 2016, 10:06 AM

lib/Transforms/Vectorize/LoopVectorize.cpp
7111–7115	Why force the creation of the edge masks here?

mkuper added inline comments. Nov 13 2016, 4:15 PM

lib/Transforms/Vectorize/LoopVectorize.cpp
482–484	As long as you're changing this comment - this is no longer only for "un-vectorizable" instructions.
4317–4318	This changes now, right?
4491	Do we generally prefer this to defining the vector inside the loop, from a performance standpoint? (The latter would be clearer, I think).
4493	What happens in the "Lane == 0 && Legal->isUniformAfterVectorization(I)" case? I'm having a bit of trouble imagining what the code ends up looking like.

mssimpso added inline comments. Nov 14 2016, 12:07 PM

lib/Transforms/Vectorize/LoopVectorize.cpp
482–484	That's true! I'll update the comment.
4317–4318	That's right; I missed this. I'll update this comment as well.
4491	Not that I'm aware of. I'm happy to move the vector definition inside the loop.
4493	This loop is collecting in VectorLoopPredInsts the scalarized instructions we produced in scalarizeInstruction. It then predicates all the scalarized instructions for a given unroll part and vector lane. If an instruction is uniform-after-vectorization we only generate values for Lane zero during scalarization, and getScalarValue asserts if we try to get a value for a Lane > 0. However, as I'm thinking about this, I don't think we can ever end up with an instruction that requires predication that is also marked uniform-after-vectorization. Unless I'm missing something, we should probably just remove this if condition. What do you think?
7111–7115	We have to create the edge masks in program order because they are cached in MaskCache. PHI widening, for example, also calls createEdgeMask. During PHI widening, if a mask isn't already in the cache, we will produce it at that time. But then when we go back to perform the actual predication, we would reuse the masks we created for the PHIs. But these masks may not dominate the branches we create for the predicated blocks, so we have to produce the masks in order. The existing tests in if-pred-non-void.ll require this.
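The ordering problem described in the last reply can be sketched with a toy model. This is not LoopVectorize code: ToyEmitter, emit, and maskDominatesBranch are hypothetical names, the vector body is modeled as a straight line of numbered positions, and "dominates" is simplified to "was emitted earlier". Only the idea of a mask cache filled on first request is taken from the discussion.

```cpp
#include <map>
#include <string>
#include <utility>

// Toy model: every emitted value receives the next position in a
// straight-line vector loop body.
struct ToyEmitter {
  using Edge = std::pair<std::string, std::string>;
  int NextPos = 0;
  std::map<Edge, int> MaskCache; // edge -> position of its cached mask

  int emit() { return NextPos++; }

  // Return the cached mask position, or emit the mask at the current
  // position and cache it.
  int createEdgeMask(const Edge &E) {
    auto It = MaskCache.find(E);
    if (It != MaskCache.end())
      return It->second;
    int Pos = emit();
    MaskCache[E] = Pos;
    return Pos;
  }
};

// Returns true if the mask that predication will reuse was emitted before
// the scalarized instruction that the predicating branch must guard.
bool maskDominatesBranch(bool CreateMaskWhenScalarizing) {
  ToyEmitter Em;
  ToyEmitter::Edge E{"for.body", "if.then"};
  if (CreateMaskWhenScalarizing)
    Em.createEdgeMask(E);             // mask emitted in program order
  int ScalarizedInst = Em.emit();     // instruction that needs predication
  int MaskPos = Em.createEdgeMask(E); // later request, e.g. PHI widening
  return MaskPos < ScalarizedInst;
}
```

In this model, creating the mask only when a later PHI asks for it leaves the cached mask after the instruction it is supposed to guard, which mirrors the dominance concern raised in the reply.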

mkuper added inline comments. Nov 14 2016, 1:40 PM

lib/Transforms/Vectorize/LoopVectorize.cpp
4493	"However, as I'm thinking about this, I don't think we can ever end up with an instruction that requires predication that is also marked uniform-after-vectorization. Unless I'm missing something, we should probably just remove this if condition. What do you think?"
	Yes, that's why I was having trouble imagining it. I couldn't figure out what a uniform-after-vectorization predicated instruction looks like. I think we should remove this, and have an assert somewhere to make sure it really doesn't happen. (If we find out that it does happen, I guess it's some edge case we're not thinking about, so I'm not sure this code would do the right thing anyway.)

Addressed comments from Michael and Gil.

Updated comments
Added assert for uniform-after-vectorization
Moved vector definition inside loop
Made some auto types explicit

gilr added inline comments. Nov 14 2016, 3:00 PM

lib/Transforms/Vectorize/LoopVectorize.cpp
7111–7115	Ahhh ... right. This seems somewhat unclean, though: the patch moves block mask creation logic from scalarization to predication (which makes sense), but must leave some of it behind for caching reasons. So as long as we're generating masks in program order for caching reasons, could the original createBlockInMask call be retained if the same caching was added for block masks? This would make it clear that it's the block mask we need generated here, avoid partly-duplicating its code and make the masks caching behavior more consistent (more of a getOrCreate API).
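A minimal sketch of the getOrCreate shape suggested here, under loud assumptions: BlockMaskCache and getOrCreateBlockInMask are hypothetical names, blocks are keyed by name rather than by BasicBlock pointer, and VectorParts is reduced to a vector of integers. It only shows that repeated requests for a block's mask reuse one cached entry.

```cpp
#include <string>
#include <unordered_map>
#include <vector>

// Stand-in for the per-unroll-part mask values (an assumption of this sketch).
using VectorParts = std::vector<int>;

class BlockMaskCache {
  std::unordered_map<std::string, VectorParts> BlockInCache;
  unsigned NumCreated = 0;

public:
  // Return the cached block-in mask, building it on the first request only.
  const VectorParts &getOrCreateBlockInMask(const std::string &Block,
                                            unsigned UF) {
    auto It = BlockInCache.find(Block);
    if (It != BlockInCache.end())
      return It->second;
    ++NumCreated;
    // Placeholder mask: one all-true entry per unroll part.
    return BlockInCache.emplace(Block, VectorParts(UF, 1)).first->second;
  }

  unsigned numCreated() const { return NumCreated; }
};
```

With this shape, a call made during scalarization (to fix the emission order) and a later call made during predication would both go through the same entry point and observe the same mask.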

mssimpso added inline comments. Nov 15 2016, 7:20 AM

lib/Transforms/Vectorize/LoopVectorize.cpp
7111–7115	Sure, we can do that. I'll add a cache for the block-in masks and update the patch. Thanks!

Addressed Gil's comments.

Added caching for block-in masks, and restored call to createBlockInMask in scalarizeInstruction
Updated comments

LGTM, but please also wait for Gil re the masking change.

So committing to a single predicated block may be too aggressive at this point, for example in the following where the srem is moved into the predicated block and above the vectorized sub feeding it:

void foo(int* a, int b, int* c, int* d) {
  for (int i = 0; i < 10000; ++i) {
    int x = 333;
    if (a[i] > 777) {
      int t1 = a[i] / c[i];
      int t2 = b - t1;
      x = t2 % d[i];
    }
    a[i] += x;
  }
}

I'm not sure it would still serve the needs of the smarter-scalarization work, but as a standalone improvement to predication logic we can still try to combine predicated instructions into a mutual basic block wherever possible (e.g. reuse the last predicated basic block if possible, create a new one otherwise).
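To make the ordering constraint in foo concrete, here is a scalar emulation of one vector iteration, assuming VF = 2 and UF = 1 and assuming the division and remainder are scalarized with predication while the subtraction is widened. fooVectorBody is a hypothetical helper written for this note; inactive lanes hold zero here, whereas the real vector IR would leave them undef.

```cpp
#include <array>
#include <cstddef>

constexpr std::size_t VF = 2; // hypothetical vectorization factor

// Emulates one vector iteration of foo's loop body on VF consecutive elements.
void fooVectorBody(int *a, int b, const int *c, const int *d) {
  std::array<bool, VF> Mask{};
  std::array<int, VF> T1{}, T2{}, X{};

  // Widened compare: the mask for entering the if-body.
  for (std::size_t L = 0; L < VF; ++L)
    Mask[L] = a[L] > 777;

  // First group of predicated blocks: one guarded division per lane.
  for (std::size_t L = 0; L < VF; ++L)
    if (Mask[L])
      T1[L] = a[L] / c[L];

  // Widened subtraction: it reads every lane of T1, so it can only be
  // emitted after all of the guarded divisions above.
  for (std::size_t L = 0; L < VF; ++L)
    T2[L] = b - T1[L];

  // Second group of predicated blocks: one guarded remainder per lane.
  // Inactive lanes keep the default value 333.
  for (std::size_t L = 0; L < VF; ++L)
    X[L] = Mask[L] ? T2[L] % d[L] : 333;

  for (std::size_t L = 0; L < VF; ++L)
    a[L] += X[L];
}
```

The remainder for a lane depends on the widened subtraction, which in turn depends on the divisions of all lanes, so the division and remainder for that lane cannot sit in one predicated block without the remainder ending up above the subtraction that feeds it.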

...and ignore my LGTM, Gil's right. :-)

Ah, you're right, Gil. Thanks very much for pointing this out. It seems this work will not be as straightforward as I thought, since there will be times when splitting the original block is unavoidable. I think it probably makes sense to table this patch for the moment and perhaps come back to it after the scalarization work. I'll finish addressing all the comments over at D26083. Thanks!

I'm abandoning this patch since the original approach was not correct. A new approach that combines predicated instructions into the same basic block when possible will likely look different enough from this patch that we should start a new review.

Revision Contents

Path	Size
lib/Transforms/Vectorize/LoopVectorize.cpp	251 lines
test/Transforms/LoopVectorize/AArch64/predication_costs.ll	6 lines
test/Transforms/LoopVectorize/if-pred-non-void.ll	209 lines
test/Transforms/LoopVectorize/induction.ll	32 lines
test/Transforms/LoopVectorize/interleaved-accesses-pred-stores.ll	14 lines

Diff 78006

lib/Transforms/Vectorize/LoopVectorize.cpp

[435 lines not shown]	protected:
/// that were defined inside the loop. Here we fix the 'undef case'.		/// that were defined inside the loop. Here we fix the 'undef case'.
/// See PR14725.		/// See PR14725.
void fixLCSSAPHIs();		void fixLCSSAPHIs();

/// Iteratively sink the scalarized operands of a predicated instruction into		/// Iteratively sink the scalarized operands of a predicated instruction into
/// the block that was created for it.		/// the block that was created for it.
void sinkScalarOperands(Instruction *PredInst);		void sinkScalarOperands(Instruction *PredInst);

/// Predicate conditional instructions that require predication on their		/// Predicate the instructions in the vectorized loop whose corresponding
/// respective conditions.		/// instructions in the scalar loop have been marked scalar-with-predication
		/// by the legality analysis.
void predicateInstructions();		void predicateInstructions();

		/// Predicate the instructions in \p PredInsts by the condition \p Cmp. All
		/// the given instructions will be placed in the same basic block.
		void predicateInstructions(ArrayRef<Instruction *> PredInsts, Value *Cmp);

/// Collect the instructions from the original loop that would be trivially		/// Collect the instructions from the original loop that would be trivially
/// dead in the vectorized loop if generated.		/// dead in the vectorized loop if generated.
void collectTriviallyDeadInstructions();		void collectTriviallyDeadInstructions();

/// Shrinks vector element sizes to the smallest bitwidth they can be legally		/// Shrinks vector element sizes to the smallest bitwidth they can be legally
/// represented as.		/// represented as.
void truncateToMinimalBitwidths();		void truncateToMinimalBitwidths();

[13 lines not shown]	protected:
/// arbitrary length vectors.		/// arbitrary length vectors.
void widenPHIInstruction(Instruction *PN, unsigned UF, unsigned VF,		void widenPHIInstruction(Instruction *PN, unsigned UF, unsigned VF,
PhiVector *PV);		PhiVector *PV);

/// Insert the new loop to the loop hierarchy and pass manager		/// Insert the new loop to the loop hierarchy and pass manager
/// and update the analysis passes.		/// and update the analysis passes.
void updateAnalysis();		void updateAnalysis();

/// This instruction is un-vectorizable. Implement it as a sequence		/// Represent instruction \p Instr from the original loop as a sequence of
/// of scalars. If \p IfPredicateInstr is true we need to 'hide' each		/// scalar instructions in the vectorized loop.
/// scalarized instruction behind an if block predicated on the control		virtual void scalarizeInstruction(Instruction *Instr);
/// dependence of the instruction.
virtual void scalarizeInstruction(Instruction *Instr,
bool IfPredicateInstr = false);

/// Vectorize Load and Store instructions,		/// Vectorize Load and Store instructions,
virtual void vectorizeMemoryInstruction(Instruction *Instr);		virtual void vectorizeMemoryInstruction(Instruction *Instr);

/// Create a broadcast instruction. This method generates a broadcast		/// Create a broadcast instruction. This method generates a broadcast
/// instruction (shuffle) for loop invariant values and for the induction		/// instruction (shuffle) for loop invariant values and for the induction
/// value. If this is the induction variable then we extend it to N, N+1, ...		/// value. If this is the induction variable then we extend it to N, N+1, ...
/// this is needed because each iteration in the loop corresponds to a SIMD		/// this is needed because each iteration in the loop corresponds to a SIMD
[258 lines not shown]	protected:
PHINode *OldInduction;		PHINode *OldInduction;

/// Maps values from the original loop to their corresponding values in the		/// Maps values from the original loop to their corresponding values in the
/// vectorized loop. A key value can map to either vector values, scalar		/// vectorized loop. A key value can map to either vector values, scalar
/// values or both kinds of values, depending on whether the key was		/// values or both kinds of values, depending on whether the key was
/// vectorized and scalarized.		/// vectorized and scalarized.
ValueMap VectorLoopValueMap;		ValueMap VectorLoopValueMap;

/// Store instructions that should be predicated, as a pair		/// Holds the predicates created for the edges between given source and
/// <StoreInst, Predicate>		/// destination blocks.
SmallVector<std::pair<Instruction *, Value *>, 4> PredicatedInstructions;
EdgeMaskCache MaskCache;		EdgeMaskCache MaskCache;

		/// Holds the predicates created for entry into the given basic blocks.
		DenseMap<BasicBlock *, VectorParts> BlockInCache;

/// Trip count of the original loop.		/// Trip count of the original loop.
Value *TripCount;		Value *TripCount;
/// Trip count of the widened loop (TripCount - TripCount % (VF*UF))		/// Trip count of the widened loop (TripCount - TripCount % (VF*UF))
Value *VectorTripCount;		Value *VectorTripCount;

/// The legality analysis.		/// The legality analysis.
LoopVectorizationLegality *Legal;		LoopVectorizationLegality *Legal;

[20 lines not shown]	InnerLoopUnroller(Loop *OrigLoop, PredicatedScalarEvolution &PSE,
const TargetTransformInfo *TTI, AssumptionCache *AC,		const TargetTransformInfo *TTI, AssumptionCache *AC,
OptimizationRemarkEmitter *ORE, unsigned UnrollFactor,		OptimizationRemarkEmitter *ORE, unsigned UnrollFactor,
LoopVectorizationLegality *LVL,		LoopVectorizationLegality *LVL,
LoopVectorizationCostModel *CM)		LoopVectorizationCostModel *CM)
: InnerLoopVectorizer(OrigLoop, PSE, LI, DT, TLI, TTI, AC, ORE, 1,		: InnerLoopVectorizer(OrigLoop, PSE, LI, DT, TLI, TTI, AC, ORE, 1,
UnrollFactor, LVL, CM) {}		UnrollFactor, LVL, CM) {}

private:		private:
void scalarizeInstruction(Instruction *Instr,		void scalarizeInstruction(Instruction *Instr) override;
bool IfPredicateInstr = false) override;
void vectorizeMemoryInstruction(Instruction *Instr) override;		void vectorizeMemoryInstruction(Instruction *Instr) override;
Value *getBroadcastInstrs(Value *V) override;		Value *getBroadcastInstrs(Value *V) override;
Value *getStepVector(Value *Val, int StartIdx, Value *Step,		Value *getStepVector(Value *Val, int StartIdx, Value *Step,
Instruction::BinaryOps Opcode =		Instruction::BinaryOps Opcode =
Instruction::BinaryOpsEnd) override;		Instruction::BinaryOpsEnd) override;
Value *reverseVector(Value *Vec) override;		Value *reverseVector(Value *Vec) override;
};		};

[1,975 lines not shown]	void InnerLoopVectorizer::vectorizeMemoryInstruction(Instruction *Instr) {
// target abi alignment in such a case.		// target abi alignment in such a case.
const DataLayout &DL = Instr->getModule()->getDataLayout();		const DataLayout &DL = Instr->getModule()->getDataLayout();
if (!Alignment)		if (!Alignment)
Alignment = DL.getABITypeAlignment(ScalarDataTy);		Alignment = DL.getABITypeAlignment(ScalarDataTy);
unsigned AddressSpace = Ptr->getType()->getPointerAddressSpace();		unsigned AddressSpace = Ptr->getType()->getPointerAddressSpace();

// Scalarize the memory instruction if necessary.		// Scalarize the memory instruction if necessary.
if (Legal->memoryInstructionMustBeScalarized(Instr, VF))		if (Legal->memoryInstructionMustBeScalarized(Instr, VF))
return scalarizeInstruction(Instr, Legal->isScalarWithPredication(Instr));		return scalarizeInstruction(Instr);

// Determine if the pointer operand of the access is either consecutive or		// Determine if the pointer operand of the access is either consecutive or
// reverse consecutive.		// reverse consecutive.
int ConsecutiveStride = Legal->isConsecutivePtr(Ptr);		int ConsecutiveStride = Legal->isConsecutivePtr(Ptr);
bool Reverse = ConsecutiveStride < 0;		bool Reverse = ConsecutiveStride < 0;

// Determine if either a gather or scatter operation is legal.		// Determine if either a gather or scatter operation is legal.
bool CreateGatherScatter =		bool CreateGatherScatter =
[153 lines not shown]	if (CreateGatherScatter) {
NewLI = Builder.CreateAlignedLoad(VecPtr, Alignment, "wide.load");		NewLI = Builder.CreateAlignedLoad(VecPtr, Alignment, "wide.load");
Entry[Part] = Reverse ? reverseVector(NewLI) : NewLI;		Entry[Part] = Reverse ? reverseVector(NewLI) : NewLI;
}		}
addMetadata(NewLI, LI);		addMetadata(NewLI, LI);
}		}
VectorLoopValueMap.initVector(Instr, Entry);		VectorLoopValueMap.initVector(Instr, Entry);
}		}

void InnerLoopVectorizer::scalarizeInstruction(Instruction *Instr,		void InnerLoopVectorizer::scalarizeInstruction(Instruction *Instr) {
bool IfPredicateInstr) {
assert(!Instr->getType()->isAggregateType() && "Can't handle vectors");		assert(!Instr->getType()->isAggregateType() && "Can't handle vectors");
DEBUG(dbgs() << "LV: Scalarizing"		DEBUG(dbgs() << "LV: Scalarizing: " << *Instr << '\n');
<< (IfPredicateInstr ? " and predicating:" : ":") << *Instr
<< '\n');
// Holds vector parameters or scalars, in case of uniform vals.		// Holds vector parameters or scalars, in case of uniform vals.
SmallVector<VectorParts, 4> Params;		SmallVector<VectorParts, 4> Params;

setDebugLocFromInst(Builder, Instr);		setDebugLocFromInst(Builder, Instr);

// Does this instruction return a value ?		// Does this instruction return a value ?
bool IsVoidRetTy = Instr->getType()->isVoidTy();		bool IsVoidRetTy = Instr->getType()->isVoidTy();

// Initialize a new scalar map entry.		// Initialize a new scalar map entry.
ScalarParts Entry(UF);		ScalarParts Entry(UF);

VectorParts Cond;		// If the instruction requires predication, emit the block-in mask for its
if (IfPredicateInstr)		// parent block. The mask will be stored in BlockInCache and made available
Cond = createBlockInMask(Instr->getParent());		// for reuse (e.g., when performing the actual predication).
		if (Legal->isScalarWithPredication(Instr))
		createBlockInMask(Instr->getParent());

// Determine the number of scalars we need to generate for each unroll		// Determine the number of scalars we need to generate for each unroll
// iteration. If the instruction is uniform, we only need to generate the		// iteration. If the instruction is uniform, we only need to generate the
// first lane. Otherwise, we generate all VF values.		// first lane. Otherwise, we generate all VF values.
unsigned Lanes = Legal->isUniformAfterVectorization(Instr) ? 1 : VF;		unsigned Lanes = Legal->isUniformAfterVectorization(Instr) ? 1 : VF;

// For each vector unroll 'part':		// For each vector unroll 'part':
for (unsigned Part = 0; Part < UF; ++Part) {		for (unsigned Part = 0; Part < UF; ++Part) {
Entry[Part].resize(VF);		Entry[Part].resize(VF);
// For each scalar that we create:		// For each scalar that we create:
for (unsigned Lane = 0; Lane < Lanes; ++Lane) {		for (unsigned Lane = 0; Lane < Lanes; ++Lane) {

// Start if-block.
Value *Cmp = nullptr;
if (IfPredicateInstr) {
Cmp = Builder.CreateExtractElement(Cond[Part], Builder.getInt32(Lane));
Cmp = Builder.CreateICmp(ICmpInst::ICMP_EQ, Cmp,
ConstantInt::get(Cmp->getType(), 1));
}

Instruction *Cloned = Instr->clone();		Instruction *Cloned = Instr->clone();
if (!IsVoidRetTy)		if (!IsVoidRetTy)
Cloned->setName(Instr->getName() + ".cloned");		Cloned->setName(Instr->getName() + ".cloned");

// Replace the operands of the cloned instructions with their scalar		// Replace the operands of the cloned instructions with their scalar
// equivalents in the new loop.		// equivalents in the new loop.
for (unsigned op = 0, e = Instr->getNumOperands(); op != e; ++op) {		for (unsigned op = 0, e = Instr->getNumOperands(); op != e; ++op) {
auto *NewOp = getScalarValue(Instr->getOperand(op), Part, Lane);		auto *NewOp = getScalarValue(Instr->getOperand(op), Part, Lane);
Cloned->setOperand(op, NewOp);		Cloned->setOperand(op, NewOp);
}		}
addNewMetadata(Cloned, Instr);		addNewMetadata(Cloned, Instr);

// Place the cloned scalar in the new loop.		// Place the cloned scalar in the new loop.
Builder.Insert(Cloned);		Builder.Insert(Cloned);

// Add the cloned scalar to the scalar map entry.		// Add the cloned scalar to the scalar map entry.
Entry[Part][Lane] = Cloned;		Entry[Part][Lane] = Cloned;

// If we just cloned a new assumption, add it the assumption cache.		// If we just cloned a new assumption, add it the assumption cache.
if (auto *II = dyn_cast<IntrinsicInst>(Cloned))		if (auto *II = dyn_cast<IntrinsicInst>(Cloned))
if (II->getIntrinsicID() == Intrinsic::assume)		if (II->getIntrinsicID() == Intrinsic::assume)
AC->registerAssumption(II);		AC->registerAssumption(II);

// End if-block.
if (IfPredicateInstr)
PredicatedInstructions.push_back(std::make_pair(Cloned, Cmp));
}		}
}		}
VectorLoopValueMap.initScalar(Instr, Entry);		VectorLoopValueMap.initScalar(Instr, Entry);
}		}

PHINode *InnerLoopVectorizer::createInductionVariable(Loop *L, Value *Start,		PHINode *InnerLoopVectorizer::createInductionVariable(Loop *L, Value *Start,
Value *End, Value *Step,		Value *End, Value *Step,
Instruction *DL) {		Instruction *DL) {
[1,228 lines not shown]	void InnerLoopVectorizer::sinkScalarOperands(Instruction *PredInst) {

// Initialize a worklist with the operands of the predicated instruction.		// Initialize a worklist with the operands of the predicated instruction.
SetVector<Value *> Worklist(PredInst->op_begin(), PredInst->op_end());		SetVector<Value *> Worklist(PredInst->op_begin(), PredInst->op_end());

// Holds instructions that we need to analyze again. An instruction may be		// Holds instructions that we need to analyze again. An instruction may be
// reanalyzed if we don't yet know if we can sink it or not.		// reanalyzed if we don't yet know if we can sink it or not.
SmallVector<Instruction *, 8> InstsToReanalyze;		SmallVector<Instruction *, 8> InstsToReanalyze;

		// The location where an instruction will be sunk. This location is updated
		// whenever we sink a new instruction.
		Instruction *InsertPoint = PredInst;

// Returns true if a given use occurs in the predicated block. Phi nodes use		// Returns true if a given use occurs in the predicated block. Phi nodes use
// their operands in their corresponding predecessor blocks.		// their operands in their corresponding predecessor blocks.
auto isBlockOfUsePredicated = [&](Use &U) -> bool {		auto isBlockOfUsePredicated = [&](Use &U) -> bool {
auto *I = cast<Instruction>(U.getUser());		auto *I = cast<Instruction>(U.getUser());
BasicBlock *BB = I->getParent();		BasicBlock *BB = I->getParent();
if (auto *Phi = dyn_cast<PHINode>(I))		if (auto *Phi = dyn_cast<PHINode>(I))
BB = Phi->getIncomingBlock(		BB = Phi->getIncomingBlock(
PHINode::getIncomingValueNumForOperand(U.getOperandNo()));		PHINode::getIncomingValueNumForOperand(U.getOperandNo()));
[25 lines not shown]	while (!Worklist.empty()) {
// It's legal to sink the instruction if all its uses occur in the		// It's legal to sink the instruction if all its uses occur in the
// predicated block. Otherwise, there's nothing to do yet, and we may		// predicated block. Otherwise, there's nothing to do yet, and we may
// need to reanalyze the instruction.		// need to reanalyze the instruction.
if (!all_of(I->uses(), isBlockOfUsePredicated)) {		if (!all_of(I->uses(), isBlockOfUsePredicated)) {
InstsToReanalyze.push_back(I);		InstsToReanalyze.push_back(I);
continue;		continue;
}		}

// Move the instruction to the beginning of the predicated block, and add		// Move the instruction to the insert point, and add it's operands to the
// it's operands to the worklist.		// worklist. We update the insert point to be the newly sunk instruction.
I->moveBefore(&*PredBB->getFirstInsertionPt());		I->moveBefore(InsertPoint);
		InsertPoint = I;
Worklist.insert(I->op_begin(), I->op_end());		Worklist.insert(I->op_begin(), I->op_end());

// The sinking may have enabled other instructions to be sunk, so we will		// The sinking may have enabled other instructions to be sunk, so we will
// need to iterate.		// need to iterate.
Changed = true;		Changed = true;
}		}
} while (Changed);		} while (Changed);
}		}

void InnerLoopVectorizer::predicateInstructions() {		void InnerLoopVectorizer::predicateInstructions(
		ArrayRef<Instruction *> PredInsts, Value *Cmp) {

// For each instruction I marked for predication on value C, split I into its		// Predicate all instructions in PredInsts on value Cmp by placing the
// own basic block to form an if-then construct over C. Since I may be fed by		// instructions into a single, newly created basic block, forming an if-then
// an extractelement instruction or other scalar operand, we try to		// construction over Cmp. The instructions in PredInsts are assumed to be
// iteratively sink its scalar operands into the predicated block. If I feeds		// scalarized instructions from the vector loop corresponding to the same
// an insertelement instruction, we try to move this instruction into the		// unroll part and vector lane. Since the instructions may be fed by
// predicated block as well. For non-void types, a phi node will be created		// extractelement instructions or other scalar operands, we try to
// for the resulting value (either vector or scalar).		// iteratively sink these scalar operands into the predicated block. If an
		// instruction feeds an insertelement instruction, we try to move this
		// instruction into the predicated block as well. For non-void types, a phi
		// node will be created for the resulting value (either vector or scalar).
//		//
// So for some predicated instruction, e.g. the conditional sdiv in:		// So for some predicated instruction, e.g. the conditional sdiv in:
//		//
// for.body:		// for.body:
// ...		// ...
// %add = add nsw i32 %mul, %0		// %add = add nsw i32 %mul, %0
// %cmp5 = icmp sgt i32 %2, 7		// %cmp5 = icmp sgt i32 %2, 7
// br i1 %cmp5, label %if.then, label %if.end		// br i1 %cmp5, label %if.then, label %if.end
[26 lines not shown]	void InnerLoopVectorizer::predicateInstructions(
// side-effects by the sdiv instructions on the inactive elements, yielding		// side-effects by the sdiv instructions on the inactive elements, yielding
// (after cleanup):		// (after cleanup):
//		//
// vector.body:		// vector.body:
// ...		// ...
// %5 = add nsw <2 x i32> %4, %wide.load		// %5 = add nsw <2 x i32> %4, %wide.load
// %8 = icmp sgt <2 x i32> %wide.load52, <i32 7, i32 7>		// %8 = icmp sgt <2 x i32> %wide.load52, <i32 7, i32 7>
// %9 = extractelement <2 x i1> %8, i32 0		// %9 = extractelement <2 x i1> %8, i32 0
// br i1 %9, label %pred.sdiv.if, label %pred.sdiv.continue		// br i1 %9, label %pred.if, label %pred.continue
//		//
// pred.sdiv.if:		// pred.if:
// %10 = extractelement <2 x i32> %wide.load, i32 0		// %10 = extractelement <2 x i32> %wide.load, i32 0
// %11 = extractelement <2 x i32> %wide.load51, i32 0		// %11 = extractelement <2 x i32> %wide.load51, i32 0
// %12 = sdiv i32 %10, %11		// %12 = sdiv i32 %10, %11
// %13 = insertelement <2 x i32> undef, i32 %12, i32 0		// %13 = insertelement <2 x i32> undef, i32 %12, i32 0
// br label %pred.sdiv.continue		// br label %pred.continue
//		//
// pred.sdiv.continue:		// pred.continue:
// %14 = phi <2 x i32> [ undef, %vector.body ], [ %13, %pred.sdiv.if ]		// %14 = phi <2 x i32> [ undef, %vector.body ], [ %13, %pred.if ]
// %15 = extractelement <2 x i1> %8, i32 1		// %15 = extractelement <2 x i1> %8, i32 1
// br i1 %15, label %pred.sdiv.if54, label %pred.sdiv.continue55		// br i1 %15, label %pred.if54, label %pred.continue55
//		//
// pred.sdiv.if54:		// pred.if54:
// %16 = extractelement <2 x i32> %wide.load, i32 1		// %16 = extractelement <2 x i32> %wide.load, i32 1
// %17 = extractelement <2 x i32> %wide.load51, i32 1		// %17 = extractelement <2 x i32> %wide.load51, i32 1
// %18 = sdiv i32 %16, %17		// %18 = sdiv i32 %16, %17
// %19 = insertelement <2 x i32> %14, i32 %18, i32 1		// %19 = insertelement <2 x i32> %14, i32 %18, i32 1
// br label %pred.sdiv.continue55		// br label %pred.continue55
//		//
// pred.sdiv.continue55:		// pred.continue55:
// %20 = phi <2 x i32> [ %14, %pred.sdiv.continue ], [ %19, %pred.sdiv.if54 ]		// %20 = phi <2 x i32> [ %14, %pred.continue ], [ %19, %pred.if54 ]
// %predphi = select <2 x i1> %8, <2 x i32> %20, <2 x i32> %5		// %predphi = select <2 x i1> %8, <2 x i32> %20, <2 x i32> %5

for (auto KV : PredicatedInstructions) {		BasicBlock::iterator Front(PredInsts.front());
BasicBlock::iterator I(KV.first);		BasicBlock *Head = Front->getParent();
BasicBlock *Head = I->getParent();		BasicBlock *BB = SplitBlock(Head, &*std::next(Front), DT, LI);
auto *BB = SplitBlock(Head, &*std::next(I), DT, LI);		TerminatorInst *T = SplitBlockAndInsertIfThen(
auto *T = SplitBlockAndInsertIfThen(KV.second, &*I, /*Unreachable=*/false,		Cmp, &*Front, /*Unreachable=*/false, /*BranchWeights=*/nullptr, DT, LI);
/*BranchWeights=*/nullptr, DT, LI);		T->getParent()->setName("pred.if");
		BB->setName("pred.continue");

		// Holds instructions whose uses will need to be replaced by the phi nodes we
		// create. We maintain a vector of these pairs so we can perform the
		// replacements after all instructions have been predicated and sunk.
		SmallVector<std::pair<Instruction *, PHINode *>, 4> Replacements;

		for (Instruction *I : PredInsts) {
I->moveBefore(T);		I->moveBefore(T);
sinkScalarOperands(&*I);		sinkScalarOperands(&*I);

I->getParent()->setName(Twine("pred.") + I->getOpcodeName() + ".if");
BB->setName(Twine("pred.") + I->getOpcodeName() + ".continue");

// If the instruction is non-void create a Phi node at reconvergence point.		// If the instruction is non-void create a Phi node at reconvergence point.
if (!I->getType()->isVoidTy()) {		if (!I->getType()->isVoidTy()) {
Value *IncomingTrue = nullptr;		Instruction *IncomingTrue = nullptr;
Value *IncomingFalse = nullptr;		Value *IncomingFalse = nullptr;

if (I->hasOneUse() && isa<InsertElementInst>(*I->user_begin())) {		if (I->hasOneUse() && isa<InsertElementInst>(*I->user_begin())) {
// If the predicated instruction is feeding an insert-element, move it		// If the predicated instruction is feeding an insert-element, move it
// into the Then block; Phi node will be created for the vector.		// into the Then block; Phi node will be created for the vector.
InsertElementInst *IEI = cast<InsertElementInst>(*I->user_begin());		InsertElementInst *IEI = cast<InsertElementInst>(*I->user_begin());
IEI->moveBefore(T);		IEI->moveBefore(T);
IncomingTrue = IEI; // the new vector with the inserted element.		IncomingTrue = IEI; // the new vector with the inserted element.
IncomingFalse = IEI->getOperand(0); // the unmodified vector		IncomingFalse = IEI->getOperand(0); // the unmodified vector
} else {		} else {
// Phi node will be created for the scalar predicated instruction.		// Phi node will be created for the scalar predicated instruction.
IncomingTrue = &*I;		IncomingTrue = &*I;
IncomingFalse = UndefValue::get(I->getType());		IncomingFalse = UndefValue::get(I->getType());
}		}

BasicBlock *PostDom = I->getParent()->getSingleSuccessor();		BasicBlock *PostDom = I->getParent()->getSingleSuccessor();
assert(PostDom && "Then block has multiple successors");		assert(PostDom && "Then block has multiple successors");
PHINode *Phi =		PHINode *Phi =
PHINode::Create(IncomingTrue->getType(), 2, "", &PostDom->front());		PHINode::Create(IncomingTrue->getType(), 2, "", &PostDom->front());
IncomingTrue->replaceAllUsesWith(Phi);
Phi->addIncoming(IncomingFalse, Head);		Phi->addIncoming(IncomingFalse, Head);
Phi->addIncoming(IncomingTrue, I->getParent());		Phi->addIncoming(IncomingTrue, I->getParent());
		Replacements.push_back(std::make_pair(IncomingTrue, Phi));
		}
}		}

		// Replace all uses of the predicated instruction (or insertelement
		// instruction) with the new phi node we created for it. We ignore uses in
		// the same basic block and the use by the phi node itself.
		for (std::pair<Instruction *, PHINode *> &R : Replacements)
		for (User *U : R.first->users()) {
		if (auto *I = dyn_cast<Instruction>(U))
		if (I == R.second || R.first->getParent() == I->getParent())
		continue;
		U->replaceUsesOfWith(R.first, R.second);
}		}

DEBUG(DT->verifyDomTree());		DEBUG(DT->verifyDomTree());
}		}

		void InnerLoopVectorizer::predicateInstructions() {
		for (BasicBlock *BB : OrigLoop->blocks()) {
		if (!Legal->blockNeedsPredication(BB))
		continue;

		// Collect the instructions in the original loop whose corresponding
		// instructions in the vector loop must be predicated.
		SmallVector<Instruction *, 4> ScalarLoopPredInsts;
		for (Instruction &I : *BB)
		if (Legal->isScalarWithPredication(&I)) {
		assert(!Legal->isUniformAfterVectorization(&I) &&
		"Uniform after vectorization instruction requires predication");
		DEBUG(dbgs() << "LV: Predicating: " << I << '\n');
		ScalarLoopPredInsts.push_back(&I);
		}
		if (ScalarLoopPredInsts.empty())
		continue;

		// Set the insert point to the first instruction that requires predication.
		// Note that the instruction must have been scalarized when vectorizing the
		// loop since it requires predication.
		Builder.SetInsertPoint(
		cast<Instruction>(getScalarValue(ScalarLoopPredInsts.front(), 0, 0)));

		// Create the block mask. We do this once for all the instructions in the
		// block.
		VectorParts Cond = createBlockInMask(BB);

		// We're going to create a single block corresponding to each of the VF x
		// UF iterations of the original loop.
		for (unsigned Part = 0; Part < UF; ++Part)
		for (unsigned Lane = 0; Lane < VF; ++Lane) {

		// Collect the instructions in the vector loop for this lane and part
		// corresponding to each instruction in ScalarLoopPredInsts.
		mkuper: Do we generally prefer this to defining the vector inside the loop, from a performance standpoint? (The latter would be clearer, I think).
		mssimpso: Not that I'm aware of. I'm happy to move the vector definition inside the loop.
		SmallVector<Instruction *, 4> VectorLoopPredInsts;
		for (Instruction *I : ScalarLoopPredInsts)
		mkuper: What happens in the "Lane == 0 && Legal->isUniformAfterVectorization(I)" case? I'm having a bit of trouble imagining what the code ends up looking like.
		mssimpso: This loop is collecting in VectorLoopPredInsts the scalarized instructions we produced in scalarizeInstruction. It then predicates all the scalarized instructions for a given unroll part and vector lane. If an instruction is uniform-after-vectorization, we only generate values for lane zero during scalarization, and getScalarValue asserts if we try to get a value for a lane > 0. However, as I'm thinking about this, I don't think we can ever end up with an instruction that requires predication that is also marked uniform-after-vectorization. Unless I'm missing something, we should probably just remove this if condition. What do you think?
		mkuper: > However, as I'm thinking about this, I don't think we can ever end up with an instruction that requires predication that is also marked uniform-after-vectorization. Unless I'm missing something, we should probably just remove this if condition. What do you think?
		Yes, that's why I was having trouble imagining it. I couldn't figure out what a uniform-after-vectorization predicated instruction looks like. I think we should remove this, and have an assert somewhere to make sure it really doesn't happen. (If we find out that it does happen, I guess it's some edge case we're not thinking about, so I'm not sure this code would do the right thing anyway.)
		VectorLoopPredInsts.push_back(
		cast<Instruction>(getScalarValue(I, Part, Lane)));

		// Set the insert point to the first instruction that requires
		// predication for this lane and part.
		Builder.SetInsertPoint(VectorLoopPredInsts.front());

		// Get the block mask value corresponding to this lane and part.
		Value *Cmp = Cond[Part];
		if (VF > 1)
		Cmp =
		Builder.CreateExtractElement(Cond[Part], Builder.getInt32(Lane));
		Cmp = Builder.CreateICmp(ICmpInst::ICMP_EQ, Cmp,
		ConstantInt::get(Cmp->getType(), 1));

		// Predicate the instructions in VectorLoopPredInsts with Cmp.
		predicateInstructions(VectorLoopPredInsts, Cmp);
		}
		}
		}
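To make the effect on block count concrete, here is a minimal sketch (not code from the patch) that counts the predicated "then" blocks created for one original block under the two schemes described in the summary. The function names and the `NumPredInsts` parameter are hypothetical and exist only for illustration.

```cpp
#include <cassert>

// Hypothetical helper: number of predicated blocks when every scalarized
// instruction that needs predication gets its own block (previous behavior).
unsigned blocksPerInstruction(unsigned NumPredInsts, unsigned VF, unsigned UF) {
  return NumPredInsts * VF * UF;
}

// Hypothetical helper: number of predicated blocks when all predicated
// instructions from one original block share a single block for each
// (part, lane) pair, as the loop above does.
unsigned blocksPerIteration(unsigned NumPredInsts, unsigned VF, unsigned UF) {
  return NumPredInsts == 0 ? 0 : VF * UF;
}
```

For the division-plus-store example from the summary with VF = 4 and UF = 2, the first function gives 16 blocks and the second gives 8.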

InnerLoopVectorizer::VectorParts		InnerLoopVectorizer::VectorParts
InnerLoopVectorizer::createEdgeMask(BasicBlock *Src, BasicBlock *Dst) {		InnerLoopVectorizer::createEdgeMask(BasicBlock *Src, BasicBlock *Dst) {
assert(is_contained(predecessors(Dst), Src) && "Invalid edge");		assert(is_contained(predecessors(Dst), Src) && "Invalid edge");

// Look for cached value.		// Look for cached value.
std::pair<BasicBlock *, BasicBlock *> Edge(Src, Dst);		std::pair<BasicBlock *, BasicBlock *> Edge(Src, Dst);
EdgeMaskCache::iterator ECEntryIt = MaskCache.find(Edge);		EdgeMaskCache::iterator ECEntryIt = MaskCache.find(Edge);
if (ECEntryIt != MaskCache.end())		if (ECEntryIt != MaskCache.end())
Show All 22 Lines	InnerLoopVectorizer::createEdgeMask(BasicBlock Src, BasicBlock Dst) {
MaskCache[Edge] = SrcMask;		MaskCache[Edge] = SrcMask;
return SrcMask;		return SrcMask;
}		}

InnerLoopVectorizer::VectorParts		InnerLoopVectorizer::VectorParts
InnerLoopVectorizer::createBlockInMask(BasicBlock *BB) {		InnerLoopVectorizer::createBlockInMask(BasicBlock *BB) {
assert(OrigLoop->contains(BB) && "Block is not a part of a loop");		assert(OrigLoop->contains(BB) && "Block is not a part of a loop");

		// If the block-in mask for this basic block is cached, return it.
		auto BICEntryIt = BlockInCache.find(BB);
		if (BICEntryIt != BlockInCache.end())
		return BICEntryIt->second;

// Loop incoming mask is all-one.		// Loop incoming mask is all-one.
if (OrigLoop->getHeader() == BB) {		if (OrigLoop->getHeader() == BB) {
Value *C = ConstantInt::get(IntegerType::getInt1Ty(BB->getContext()), 1);		Value *C = ConstantInt::get(IntegerType::getInt1Ty(BB->getContext()), 1);
return getVectorValue(C);		VectorParts BlockMask = getVectorValue(C);
		BlockInCache[BB] = BlockMask;
		return BlockMask;
}		}

// This is the block mask. We OR all incoming edges, and with zero.		// This is the block mask. We OR all incoming edges, and with zero.
Value *Zero = ConstantInt::get(IntegerType::getInt1Ty(BB->getContext()), 0);		Value *Zero = ConstantInt::get(IntegerType::getInt1Ty(BB->getContext()), 0);
VectorParts BlockMask = getVectorValue(Zero);		VectorParts BlockMask = getVectorValue(Zero);

// For each pred:		// For each pred:
for (pred_iterator it = pred_begin(BB), e = pred_end(BB); it != e; ++it) {		for (pred_iterator it = pred_begin(BB), e = pred_end(BB); it != e; ++it) {
VectorParts EM = createEdgeMask(*it, BB);		VectorParts EM = createEdgeMask(*it, BB);
for (unsigned part = 0; part < UF; ++part)		for (unsigned part = 0; part < UF; ++part)
BlockMask[part] = Builder.CreateOr(BlockMask[part], EM[part]);		BlockMask[part] = Builder.CreateOr(BlockMask[part], EM[part]);
}		}

		BlockInCache[BB] = BlockMask;
return BlockMask;		return BlockMask;
}		}
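The mask computation in createBlockInMask can be modelled outside LLVM as a per-part OR over the incoming edge masks. The following is a rough sketch under that assumption: `Mask`, `MaskParts`, and `blockInMask` are hypothetical stand-ins that use plain booleans instead of IR values, and the caching and loop-header special case are left out.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One mask per unroll part; each mask holds one bool per vector lane.
using Mask = std::vector<bool>;
using MaskParts = std::vector<Mask>;

// Hypothetical stand-in for the block-in mask: start from an all-false mask
// and OR in the mask of every incoming edge, part by part and lane by lane.
MaskParts blockInMask(const std::vector<MaskParts> &EdgeMasks, std::size_t UF,
                      std::size_t VF) {
  MaskParts Block(UF, Mask(VF, false));
  for (const MaskParts &EM : EdgeMasks)
    for (std::size_t Part = 0; Part < UF; ++Part)
      for (std::size_t Lane = 0; Lane < VF; ++Lane)
        Block[Part][Lane] = Block[Part][Lane] || EM[Part][Lane];
  return Block;
}
```

A lane is active in the block if it is active on at least one incoming edge, which is why the accumulation starts from zero.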

void InnerLoopVectorizer::widenPHIInstruction(Instruction *PN, unsigned UF,		void InnerLoopVectorizer::widenPHIInstruction(Instruction *PN, unsigned UF,
unsigned VF, PhiVector *PV) {		unsigned VF, PhiVector *PV) {
PHINode *P = cast<PHINode>(PN);		PHINode *P = cast<PHINode>(PN);
// Handle recurrences.		// Handle recurrences.
if (Legal->isReductionVariable(P) || Legal->isFirstOrderRecurrence(P)) {		if (Legal->isReductionVariable(P) || Legal->isFirstOrderRecurrence(P)) {
▲ Show 20 Lines • Show All 163 Lines • ▼ Show 20 Lines	for (Instruction &I : *BB) {

case Instruction::UDiv:		case Instruction::UDiv:
case Instruction::SDiv:		case Instruction::SDiv:
case Instruction::SRem:		case Instruction::SRem:
case Instruction::URem:		case Instruction::URem:
// Scalarize with predication if this instruction may divide by zero and		// Scalarize with predication if this instruction may divide by zero and
// block execution is conditional, otherwise fallthrough.		// block execution is conditional, otherwise fallthrough.
if (Legal->isScalarWithPredication(&I)) {		if (Legal->isScalarWithPredication(&I)) {
scalarizeInstruction(&I, true);		scalarizeInstruction(&I);
continue;		continue;
}		}
case Instruction::Add:		case Instruction::Add:
case Instruction::FAdd:		case Instruction::FAdd:
case Instruction::Sub:		case Instruction::Sub:
case Instruction::FSub:		case Instruction::FSub:
case Instruction::Mul:		case Instruction::Mul:
case Instruction::FMul:		case Instruction::FMul:
▲ Show 20 Lines • Show All 2,324 Lines • ▼ Show 20 Lines	void LoopVectorizationCostModel::collectValuesToIgnore() {

// Insert values known to be scalar into VecValuesToIgnore.		// Insert values known to be scalar into VecValuesToIgnore.
for (auto *BB : TheLoop->getBlocks())		for (auto *BB : TheLoop->getBlocks())
for (auto &I : *BB)		for (auto &I : *BB)
if (Legal->isScalarAfterVectorization(&I))		if (Legal->isScalarAfterVectorization(&I))
VecValuesToIgnore.insert(&I);		VecValuesToIgnore.insert(&I);
}		}

void InnerLoopUnroller::scalarizeInstruction(Instruction *Instr,		void InnerLoopUnroller::scalarizeInstruction(Instruction *Instr) {
bool IfPredicateInstr) {
assert(!Instr->getType()->isAggregateType() && "Can't handle vectors");		assert(!Instr->getType()->isAggregateType() && "Can't handle vectors");
// Holds vector parameters or scalars, in case of uniform vals.		// Holds vector parameters or scalars, in case of uniform vals.
SmallVector<VectorParts, 4> Params;		SmallVector<VectorParts, 4> Params;

setDebugLocFromInst(Builder, Instr);		setDebugLocFromInst(Builder, Instr);

// Does this instruction return a value ?		// Does this instruction return a value ?
bool IsVoidRetTy = Instr->getType()->isVoidTy();		bool IsVoidRetTy = Instr->getType()->isVoidTy();

// Initialize a new scalar map entry.		// Initialize a new scalar map entry.
ScalarParts Entry(UF);		ScalarParts Entry(UF);

VectorParts Cond;		// If the instruction requires predication, emit the block-in mask for its
if (IfPredicateInstr)		// parent block. The mask will be stored in BlockInCache and made available
Cond = createBlockInMask(Instr->getParent());		// for reuse (e.g., when performing the actual predication).
		if (Legal->isScalarWithPredication(Instr))
		createBlockInMask(Instr->getParent());
		gilr: Why force the creation of the edge masks here?
		mssimpso: We have to create the edge masks in program order because they are cached in MaskCache. PHI widening, for example, also calls createEdgeMask. During PHI widening, if a mask isn't already in the cache, we will produce it at that time. But then when we go back to perform the actual predication, we would reuse the masks we created for the PHIs. But these masks may not dominate the branches we create for the predicated blocks, so we have to produce the masks in order. The existing tests in if-pred-non-void.ll require this.
		gilr: Ahhh ... right. This seems somewhat unclean, though: the patch moves block mask creation logic from scalarization to predication (which makes sense), but must leave some of it behind for caching reasons. So as long as we're generating masks in program order for caching reasons, could the original createBlockInMask call be retained if the same caching was added for block masks? This would make it clear that it's the block mask we need generated here, avoid partly duplicating its code, and make the mask caching behavior more consistent (more of a getOrCreate API).
		mssimpso: Sure, we can do that. I'll add a cache for the block-in masks and update the patch. Thanks!
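The ordering concern in this thread can be illustrated with a small get-or-create cache. This is only a sketch under simplifying assumptions: `MaskEmitter` and `getOrCreateBlockInMask` are hypothetical names, and a "mask" is reduced to the index at which it was first emitted. The point is that the first request fixes where the mask lives, so whoever asks first determines whether later users are dominated by it.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical model: a "mask" is just the position in the instruction
// stream at which it was first emitted.
struct MaskEmitter {
  std::map<std::string, unsigned> BlockInCache; // block name -> emission index
  std::vector<std::string> Emitted;             // emission order

  // Get-or-create: emit the mask on the first request, reuse it afterwards.
  unsigned getOrCreateBlockInMask(const std::string &BB) {
    auto It = BlockInCache.find(BB);
    if (It != BlockInCache.end())
      return It->second;
    unsigned Index = static_cast<unsigned>(Emitted.size());
    Emitted.push_back(BB);
    BlockInCache[BB] = Index;
    return Index;
  }
};
```

Requesting the mask while scalarizing, in program order, means the cached entry is emitted before any later consumer asks for it; a second request for the same block returns the same index and emits nothing new.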

// For each vector unroll 'part':		// For each vector unroll 'part':
for (unsigned Part = 0; Part < UF; ++Part) {		for (unsigned Part = 0; Part < UF; ++Part) {
Entry[Part].resize(1);		Entry[Part].resize(1);
// For each scalar that we create:		// For each scalar that we create:

// Start an "if (pred) a[i] = ..." block.
Value *Cmp = nullptr;
if (IfPredicateInstr) {
if (Cond[Part]->getType()->isVectorTy())
Cond[Part] =
Builder.CreateExtractElement(Cond[Part], Builder.getInt32(0));
Cmp = Builder.CreateICmp(ICmpInst::ICMP_EQ, Cond[Part],
ConstantInt::get(Cond[Part]->getType(), 1));
}

Instruction *Cloned = Instr->clone();		Instruction *Cloned = Instr->clone();
if (!IsVoidRetTy)		if (!IsVoidRetTy)
Cloned->setName(Instr->getName() + ".cloned");		Cloned->setName(Instr->getName() + ".cloned");

// Replace the operands of the cloned instructions with their scalar		// Replace the operands of the cloned instructions with their scalar
// equivalents in the new loop.		// equivalents in the new loop.
for (unsigned op = 0, e = Instr->getNumOperands(); op != e; ++op) {		for (unsigned op = 0, e = Instr->getNumOperands(); op != e; ++op) {
auto *NewOp = getScalarValue(Instr->getOperand(op), Part, 0);		auto *NewOp = getScalarValue(Instr->getOperand(op), Part, 0);
Cloned->setOperand(op, NewOp);		Cloned->setOperand(op, NewOp);
}		}

// Place the cloned scalar in the new loop.		// Place the cloned scalar in the new loop.
Builder.Insert(Cloned);		Builder.Insert(Cloned);

// Add the cloned scalar to the scalar map entry.		// Add the cloned scalar to the scalar map entry.
Entry[Part][0] = Cloned;		Entry[Part][0] = Cloned;

// If we just cloned a new assumption, add it the assumption cache.		// If we just cloned a new assumption, add it the assumption cache.
if (auto *II = dyn_cast<IntrinsicInst>(Cloned))		if (auto *II = dyn_cast<IntrinsicInst>(Cloned))
if (II->getIntrinsicID() == Intrinsic::assume)		if (II->getIntrinsicID() == Intrinsic::assume)
AC->registerAssumption(II);		AC->registerAssumption(II);

// End if-block.
if (IfPredicateInstr)
PredicatedInstructions.push_back(std::make_pair(Cloned, Cmp));
}		}
VectorLoopValueMap.initScalar(Instr, Entry);		VectorLoopValueMap.initScalar(Instr, Entry);
}		}

void InnerLoopUnroller::vectorizeMemoryInstruction(Instruction *Instr) {		void InnerLoopUnroller::vectorizeMemoryInstruction(Instruction *Instr) {
auto *SI = dyn_cast<StoreInst>(Instr);		return scalarizeInstruction(Instr);
bool IfPredicateInstr = (SI && Legal->blockNeedsPredication(SI->getParent()));

return scalarizeInstruction(Instr, IfPredicateInstr);
}		}

Value *InnerLoopUnroller::reverseVector(Value *Vec) { return Vec; }		Value *InnerLoopUnroller::reverseVector(Value *Vec) { return Vec; }

Value *InnerLoopUnroller::getBroadcastInstrs(Value *V) { return V; }		Value *InnerLoopUnroller::getBroadcastInstrs(Value *V) { return V; }

Value *InnerLoopUnroller::getStepVector(Value *Val, int StartIdx, Value *Step,		Value *InnerLoopUnroller::getStepVector(Value *Val, int StartIdx, Value *Step,
Instruction::BinaryOps BinOp) {		Instruction::BinaryOps BinOp) {
▲ Show 20 Lines • Show All 375 Lines • Show Last 20 Lines

test/Transforms/LoopVectorize/AArch64/predication_costs.ll

	Show All 13 Lines
	; This test checks that we correctly compute the cost of the predicated udiv			; This test checks that we correctly compute the cost of the predicated udiv
	; instruction. If we assume the block probability is 50%, we compute the cost			; instruction. If we assume the block probability is 50%, we compute the cost
	; as:			; as:
	;			;
	; Cost of udiv:			; Cost of udiv:
	; (udiv(2) + extractelement(6) + insertelement(3)) / 2 = 5			; (udiv(2) + extractelement(6) + insertelement(3)) / 2 = 5
	;			;
	; CHECK: Found an estimated cost of 5 for VF 2 For instruction: %tmp4 = udiv i32 %tmp2, %tmp3			; CHECK: Found an estimated cost of 5 for VF 2 For instruction: %tmp4 = udiv i32 %tmp2, %tmp3
	; CHECK: Scalarizing and predicating: %tmp4 = udiv i32 %tmp2, %tmp3			; CHECK: Scalarizing: %tmp4 = udiv i32 %tmp2, %tmp3
				; CHECK: Predicating: %tmp4 = udiv i32 %tmp2, %tmp3
	;			;
	define i32 @predicated_udiv(i32* %a, i32* %b, i1 %c, i64 %n) {			define i32 @predicated_udiv(i32* %a, i32* %b, i1 %c, i64 %n) {
	entry:			entry:
	br label %for.body			br label %for.body

	for.body:			for.body:
	%i = phi i64 [ 0, %entry ], [ %i.next, %for.inc ]			%i = phi i64 [ 0, %entry ], [ %i.next, %for.inc ]
	%r = phi i32 [ 0, %entry ], [ %tmp6, %for.inc ]			%r = phi i32 [ 0, %entry ], [ %tmp6, %for.inc ]
	Show All 24 Lines
	; This test checks that we correctly compute the cost of the predicated store			; This test checks that we correctly compute the cost of the predicated store
	; instruction. If we assume the block probability is 50%, we compute the cost			; instruction. If we assume the block probability is 50%, we compute the cost
	; as:			; as:
	;			;
	; Cost of store:			; Cost of store:
	; (store(4) + extractelement(6)) / 2 = 5			; (store(4) + extractelement(6)) / 2 = 5
	;			;
	; CHECK: Found an estimated cost of 5 for VF 2 For instruction: store i32 %tmp2, i32* %tmp0, align 4			; CHECK: Found an estimated cost of 5 for VF 2 For instruction: store i32 %tmp2, i32* %tmp0, align 4
	; CHECK: Scalarizing and predicating: store i32 %tmp2, i32* %tmp0, align 4			; CHECK: Scalarizing: store i32 %tmp2, i32* %tmp0, align 4
				; CHECK: Predicating: store i32 %tmp2, i32* %tmp0, align 4
	;			;
	define void @predicated_store(i32* %a, i1 %c, i32 %x, i64 %n) {			define void @predicated_store(i32* %a, i1 %c, i32 %x, i64 %n) {
	entry:			entry:
	br label %for.body			br label %for.body

	for.body:			for.body:
	%i = phi i64 [ 0, %entry ], [ %i.next, %for.inc ]			%i = phi i64 [ 0, %entry ], [ %i.next, %for.inc ]
	%tmp0 = getelementptr inbounds i32, i32* %a, i64 %i			%tmp0 = getelementptr inbounds i32, i32* %a, i64 %i
	Show All 16 Lines
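The arithmetic in the two test comments above can be checked with a few lines of code. This is a sketch, not the cost model's implementation: `predicatedCost` is a hypothetical helper, the component costs are the numbers quoted in the comments, and truncating integer division is an assumption inferred from the stated result of 5 for the udiv case.

```cpp
#include <cassert>

// Hypothetical helper: scale the cost of a scalarized, predicated sequence by
// the assumed probability that its block executes (50% => divide by 2).
// Truncating integer division is an assumption based on the test comments.
unsigned predicatedCost(unsigned ScalarizedCost,
                        unsigned ReciprocalBlockProb = 2) {
  return ScalarizedCost / ReciprocalBlockProb;
}
```

With the quoted numbers, the udiv case is (2 + 6 + 3) / 2 and the store case is (4 + 6) / 2; both evaluate to 5 under these assumptions.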

test/Transforms/LoopVectorize/if-pred-non-void.ll

Show All 10 Lines	define void @test(i32* nocapture %asd, i32* nocapture %aud,
i32* nocapture %asr, i32* nocapture %aur) {		i32* nocapture %asr, i32* nocapture %aur) {
entry:		entry:
br label %for.body		br label %for.body

for.cond.cleanup: ; preds = %if.end		for.cond.cleanup: ; preds = %if.end
ret void		ret void

; CHECK-LABEL: test		; CHECK-LABEL: test
; CHECK: vector.body:		; CHECK: vector.body:
; CHECK: %[[SDEE:[a-zA-Z0-9]+]] = extractelement <2 x i1> %{{.*}}, i32 0		; CHECK: %[[ADD0:.+]] = add nsw <2 x i32> %[[LOAD0:.+]], <i32 23, i32 23>
; CHECK: %[[SDCC:[a-zA-Z0-9]+]] = icmp eq i1 %[[SDEE]], true		; CHECK-NEXT: %[[ADD1:.+]] = add nsw <2 x i32> %[[LOAD1:.+]], <i32 24, i32 24>
; CHECK: br i1 %[[SDCC]], label %[[CSD:[a-zA-Z0-9.]+]], label %[[ESD:[a-zA-Z0-9.]+]]		; CHECK-NEXT: %[[ADD2:.+]] = add nsw <2 x i32> %[[LOAD2:.+]], <i32 25, i32 25>
; CHECK: [[CSD]]:		; CHECK-NEXT: %[[ADD3:.+]] = add nsw <2 x i32> %[[LOAD3:.+]], <i32 26, i32 26>
; CHECK: %[[SDA0:[a-zA-Z0-9]+]] = extractelement <2 x i32> %{{.*}}, i32 0		; CHECK: %[[COND0:.+]] = extractelement <2 x i1> %{{.*}}, i32 0
; CHECK: %[[SDA1:[a-zA-Z0-9]+]] = extractelement <2 x i32> %{{.*}}, i32 0		; CHECK-NEXT: %[[CMP0:.+]] = icmp eq i1 %[[COND0]], true
; CHECK: %[[SD0:[a-zA-Z0-9]+]] = sdiv i32 %[[SDA0]], %[[SDA1]]		; CHECK-NEXT: br i1 %[[CMP0]], label %[[IF0:.+]], label %[[CONT0:.+]]
; CHECK: %[[SD1:[a-zA-Z0-9]+]] = insertelement <2 x i32> undef, i32 %[[SD0]], i32 0		; CHECK: [[IF0]]:
; CHECK: br label %[[ESD]]		; CHECK-NEXT: %[[E0_0:.+]] = extractelement <2 x i32> %[[ADD0]], i32 0
; CHECK: [[ESD]]:		; CHECK-NEXT: %[[E0_1:.+]] = extractelement <2 x i32> %[[LOAD0]], i32 0
; CHECK: %[[SDR:[a-zA-Z0-9]+]] = phi <2 x i32> [ undef, %vector.body ], [ %[[SD1]], %[[CSD]] ]		; CHECK-NEXT: %[[SDIV0:.+]] = sdiv i32 %[[E0_0]], %[[E0_1]]
; CHECK: %[[SDEEH:[a-zA-Z0-9]+]] = extractelement <2 x i1> %{{.*}}, i32 1		; CHECK-NEXT: %[[I0_0:.+]] = insertelement <2 x i32> undef, i32 %[[SDIV0]], i32 0
; CHECK: %[[SDCCH:[a-zA-Z0-9]+]] = icmp eq i1 %[[SDEEH]], true		; CHECK-NEXT: %[[E0_2:.+]] = extractelement <2 x i32> %[[ADD1]], i32 0
; CHECK: br i1 %[[SDCCH]], label %[[CSDH:[a-zA-Z0-9.]+]], label %[[ESDH:[a-zA-Z0-9.]+]]		; CHECK-NEXT: %[[E0_3:.+]] = extractelement <2 x i32> %[[LOAD1]], i32 0
; CHECK: [[CSDH]]:		; CHECK-NEXT: %[[UDIV0:.+]] = udiv i32 %[[E0_2]], %[[E0_3]]
; CHECK: %[[SDA0H:[a-zA-Z0-9]+]] = extractelement <2 x i32> %{{.*}}, i32 1		; CHECK-NEXT: %[[I0_1:.+]] = insertelement <2 x i32> undef, i32 %[[UDIV0]], i32 0
; CHECK: %[[SDA1H:[a-zA-Z0-9]+]] = extractelement <2 x i32> %{{.*}}, i32 1		; CHECK-NEXT: %[[E0_4:.+]] = extractelement <2 x i32> %[[ADD2]], i32 0
; CHECK: %[[SD0H:[a-zA-Z0-9]+]] = sdiv i32 %[[SDA0H]], %[[SDA1H]]		; CHECK-NEXT: %[[E0_5:.+]] = extractelement <2 x i32> %[[LOAD2]], i32 0
; CHECK: %[[SD1H:[a-zA-Z0-9]+]] = insertelement <2 x i32> %[[SDR]], i32 %[[SD0H]], i32 1		; CHECK-NEXT: %[[SREM0:.+]] = srem i32 %[[E0_4]], %[[E0_5]]
; CHECK: br label %[[ESDH]]		; CHECK-NEXT: %[[I0_2:.+]] = insertelement <2 x i32> undef, i32 %[[SREM0]], i32 0
; CHECK: [[ESDH]]:		; CHECK-NEXT: %[[E0_6:.+]] = extractelement <2 x i32> %[[ADD3]], i32 0
; CHECK: %{{.*}} = phi <2 x i32> [ %[[SDR]], %[[ESD]] ], [ %[[SD1H]], %[[CSDH]] ]		; CHECK-NEXT: %[[E0_7:.+]] = extractelement <2 x i32> %[[LOAD3]], i32 0
		; CHECK-NEXT: %[[UREM0:.+]] = urem i32 %[[E0_6]], %[[E0_7]]
; CHECK: %[[UDEE:[a-zA-Z0-9]+]] = extractelement <2 x i1> %{{.*}}, i32 0		; CHECK-NEXT: %[[I0_3:.+]] = insertelement <2 x i32> undef, i32 %[[UREM0]], i32 0
; CHECK: %[[UDCC:[a-zA-Z0-9]+]] = icmp eq i1 %[[UDEE]], true		; CHECK-NEXT: br label %[[CONT0]]
; CHECK: br i1 %[[UDCC]], label %[[CUD:[a-zA-Z0-9.]+]], label %[[EUD:[a-zA-Z0-9.]+]]		; CHECK: [[CONT0]]:
; CHECK: [[CUD]]:		; CHECK-NEXT: %[[PHI3:.+]] = phi <2 x i32> [ undef, %vector.body ], [ %[[I0_3]], %[[IF0]] ]
; CHECK: %[[UDA0:[a-zA-Z0-9]+]] = extractelement <2 x i32> %{{.*}}, i32 0		; CHECK-NEXT: %[[PHI2:.+]] = phi <2 x i32> [ undef, %vector.body ], [ %[[I0_2]], %[[IF0]] ]
; CHECK: %[[UDA1:[a-zA-Z0-9]+]] = extractelement <2 x i32> %{{.*}}, i32 0		; CHECK-NEXT: %[[PHI1:.+]] = phi <2 x i32> [ undef, %vector.body ], [ %[[I0_1]], %[[IF0]] ]
; CHECK: %[[UD0:[a-zA-Z0-9]+]] = udiv i32 %[[UDA0]], %[[UDA1]]		; CHECK-NEXT: %[[PHI0:.+]] = phi <2 x i32> [ undef, %vector.body ], [ %[[I0_0]], %[[IF0]] ]
; CHECK: %[[UD1:[a-zA-Z0-9]+]] = insertelement <2 x i32> undef, i32 %[[UD0]], i32 0		; CHECK-NEXT: %[[COND1:.+]] = extractelement <2 x i1> %{{.*}}, i32 1
; CHECK: br label %[[EUD]]		; CHECK-NEXT: %[[CMP1:.+]] = icmp eq i1 %[[COND1]], true
; CHECK: [[EUD]]:		; CHECK-NEXT: br i1 %[[CMP1]], label %[[IF1:.+]], label %[[CONT1:.+]]
; CHECK: %{{.*}} = phi <2 x i32> [ undef, %{{.*}} ], [ %[[UD1]], %[[CUD]] ]		; CHECK: [[IF1]]:
		; CHECK-NEXT: %[[E1_0:.+]] = extractelement <2 x i32> %[[ADD0]], i32 1
; CHECK: %[[SREE:[a-zA-Z0-9]+]] = extractelement <2 x i1> %{{.*}}, i32 0		; CHECK-NEXT: %[[E1_1:.+]] = extractelement <2 x i32> %[[LOAD0]], i32 1
; CHECK: %[[SRCC:[a-zA-Z0-9]+]] = icmp eq i1 %[[SREE]], true		; CHECK-NEXT: %[[SDIV1:.+]] = sdiv i32 %[[E1_0]], %[[E1_1]]
; CHECK: br i1 %[[SRCC]], label %[[CSR:[a-zA-Z0-9.]+]], label %[[ESR:[a-zA-Z0-9.]+]]		; CHECK-NEXT: %[[I1_0:.+]] = insertelement <2 x i32> %[[PHI0]], i32 %[[SDIV1]], i32 1
; CHECK: [[CSR]]:		; CHECK-NEXT: %[[E1_2:.+]] = extractelement <2 x i32> %[[ADD1]], i32 1
; CHECK: %[[SRA0:[a-zA-Z0-9]+]] = extractelement <2 x i32> %{{.*}}, i32 0		; CHECK-NEXT: %[[E1_3:.+]] = extractelement <2 x i32> %[[LOAD1]], i32 1
; CHECK: %[[SRA1:[a-zA-Z0-9]+]] = extractelement <2 x i32> %{{.*}}, i32 0		; CHECK-NEXT: %[[UDIV1:.+]] = udiv i32 %[[E1_2]], %[[E1_3]]
; CHECK: %[[SR0:[a-zA-Z0-9]+]] = srem i32 %[[SRA0]], %[[SRA1]]		; CHECK-NEXT: %[[I1_1:.+]] = insertelement <2 x i32> %[[PHI1]], i32 %[[UDIV1]], i32 1
; CHECK: %[[SR1:[a-zA-Z0-9]+]] = insertelement <2 x i32> undef, i32 %[[SR0]], i32 0		; CHECK-NEXT: %[[E1_4:.+]] = extractelement <2 x i32> %[[ADD2]], i32 1
; CHECK: br label %[[ESR]]		; CHECK-NEXT: %[[E1_5:.+]] = extractelement <2 x i32> %[[LOAD2]], i32 1
; CHECK: [[ESR]]:		; CHECK-NEXT: %[[SREM1:.+]] = srem i32 %[[E1_4]], %[[E1_5]]
; CHECK: %{{.*}} = phi <2 x i32> [ undef, %{{.*}} ], [ %[[SR1]], %[[CSR]] ]		; CHECK-NEXT: %[[I1_2:.+]] = insertelement <2 x i32> %[[PHI2]], i32 %[[SREM1]], i32 1
		; CHECK-NEXT: %[[E1_6:.+]] = extractelement <2 x i32> %[[ADD3]], i32 1
; CHECK: %[[UREE:[a-zA-Z0-9]+]] = extractelement <2 x i1> %{{.*}}, i32 0		; CHECK-NEXT: %[[E1_7:.+]] = extractelement <2 x i32> %[[LOAD3]], i32 1
; CHECK: %[[URCC:[a-zA-Z0-9]+]] = icmp eq i1 %[[UREE]], true		; CHECK-NEXT: %[[UREM1:.+]] = urem i32 %[[E1_6]], %[[E1_7]]
; CHECK: br i1 %[[URCC]], label %[[CUR:[a-zA-Z0-9.]+]], label %[[EUR:[a-zA-Z0-9.]+]]		; CHECK-NEXT: %[[I1_3:.+]] = insertelement <2 x i32> %[[PHI3]], i32 %[[UREM1]], i32 1
; CHECK: [[CUR]]:		; CHECK-NEXT: br label %[[CONT1]]
; CHECK: %[[URA0:[a-zA-Z0-9]+]] = extractelement <2 x i32> %{{.*}}, i32 0		; CHECK: [[CONT1]]:
; CHECK: %[[URA1:[a-zA-Z0-9]+]] = extractelement <2 x i32> %{{.*}}, i32 0		; CHECK-NEXT: phi <2 x i32> [ %[[PHI3]], %[[CONT0]] ], [ %[[I1_3]], %[[IF1]] ]
; CHECK: %[[UR0:[a-zA-Z0-9]+]] = urem i32 %[[URA0]], %[[URA1]]		; CHECK-NEXT: phi <2 x i32> [ %[[PHI2]], %[[CONT0]] ], [ %[[I1_2]], %[[IF1]] ]
; CHECK: %[[UR1:[a-zA-Z0-9]+]] = insertelement <2 x i32> undef, i32 %[[UR0]], i32 0		; CHECK-NEXT: phi <2 x i32> [ %[[PHI1]], %[[CONT0]] ], [ %[[I1_1]], %[[IF1]] ]
; CHECK: br label %[[EUR]]		; CHECK-NEXT: phi <2 x i32> [ %[[PHI0]], %[[CONT0]] ], [ %[[I1_0]], %[[IF1]] ]
; CHECK: [[EUR]]:		; CHECK: br {{.*}} label %middle.block, label %vector.body
; CHECK: %{{.*}} = phi <2 x i32> [ undef, %{{.*}} ], [ %[[UR1]], %[[CUR]] ]

for.body: ; preds = %if.end, %entry		for.body: ; preds = %if.end, %entry
%indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %if.end ]		%indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %if.end ]
%isd = getelementptr inbounds i32, i32* %asd, i64 %indvars.iv		%isd = getelementptr inbounds i32, i32* %asd, i64 %indvars.iv
%iud = getelementptr inbounds i32, i32* %aud, i64 %indvars.iv		%iud = getelementptr inbounds i32, i32* %aud, i64 %indvars.iv
%isr = getelementptr inbounds i32, i32* %asr, i64 %indvars.iv		%isr = getelementptr inbounds i32, i32* %asr, i64 %indvars.iv
%iur = getelementptr inbounds i32, i32* %aur, i64 %indvars.iv		%iur = getelementptr inbounds i32, i32* %aur, i64 %indvars.iv
%lsd = load i32, i32* %isd, align 4		%lsd = load i32, i32* %isd, align 4
Show All 31 Lines
define void @test_scalar2scalar(i32* nocapture %asd, i32* nocapture %bsd) {		define void @test_scalar2scalar(i32* nocapture %asd, i32* nocapture %bsd) {
entry:		entry:
br label %for.body		br label %for.body

for.cond.cleanup: ; preds = %if.end		for.cond.cleanup: ; preds = %if.end
ret void		ret void

; CHECK-LABEL: test_scalar2scalar		; CHECK-LABEL: test_scalar2scalar
; CHECK: vector.body:		; CHECK: vector.body:
; CHECK: br i1 %{{.*}}, label %[[THEN:[a-zA-Z0-9.]+]], label %[[FI:[a-zA-Z0-9.]+]]		; CHECK: %[[LOAD0:.+]] = load <2 x i32>, <2 x i32>* {{.*}}, align 4
; CHECK: [[THEN]]:		; CHECK: %[[LOAD1:.+]] = load <2 x i32>, <2 x i32>* {{.*}}, align 4
; CHECK: %[[PD:[a-zA-Z0-9]+]] = sdiv i32 %{{.*}}, %{{.*}}		; CHECK: %[[ADD0:.+]] = add nsw <2 x i32> %[[LOAD0]], <i32 23, i32 23>
; CHECK: br label %[[FI]]		; CHECK: %[[COND0:.+]] = extractelement <2 x i1> %{{.*}}, i32 0
; CHECK: [[FI]]:		; CHECK-NEXT: %[[CMP0:.+]] = icmp eq i1 %[[COND0]], true
; CHECK: %{{.*}} = phi i32 [ undef, %vector.body ], [ %[[PD]], %[[THEN]] ]		; CHECK-NEXT: br i1 %[[CMP0]], label %[[IF0:.+]], label %[[CONT0:.+]]
		; CHECK: [[IF0]]:
		; CHECK-NEXT: %[[E0_0:.+]] = extractelement <2 x i32> %[[ADD0]], i32 0
		; CHECK-NEXT: %[[E0_1:.+]] = extractelement <2 x i32> %[[LOAD0]], i32 0
		; CHECK-NEXT: %[[SDIV0_0:.+]] = sdiv i32 %[[E0_0]], %[[E0_1]]
		; CHECK-NEXT: %[[E0_2:.+]] = extractelement <2 x i32> %[[LOAD1]], i32 0
		; CHECK-NEXT: %[[SDIV0_1:.+]] = sdiv i32 %[[E0_2]], %[[SDIV0_0]]
		; CHECK-NEXT: %[[I0:.+]] = insertelement <2 x i32> undef, i32 %[[SDIV0_1]], i32 0
		; CHECK-NEXT: br label %[[CONT0]]
		; CHECK: [[CONT0]]:
		; CHECK-NEXT: %[[PHI:.+]] = phi <2 x i32> [ undef, %vector.body ], [ %[[I0]], %[[IF0]] ]
		; CHECK-NEXT: phi i32 [ undef, %vector.body ], [ %[[SDIV0_0]], %[[IF0]] ]
		; CHECK-NEXT: %[[COND1:.+]] = extractelement <2 x i1> %{{.*}}, i32 1
		; CHECK-NEXT: %[[CMP1:.+]] = icmp eq i1 %[[COND1]], true
		; CHECK-NEXT: br i1 %[[CMP1]], label %[[IF1:.+]], label %[[CONT1:.+]]
		; CHECK: [[IF1]]:
		; CHECK-NEXT: %[[E1_0:.+]] = extractelement <2 x i32> %[[ADD0]], i32 1
		; CHECK-NEXT: %[[E1_1:.+]] = extractelement <2 x i32> %[[LOAD0]], i32 1
		; CHECK-NEXT: %[[SDIV1_0:.+]] = sdiv i32 %[[E1_0]], %[[E1_1]]
		; CHECK-NEXT: %[[E1_2:.+]] = extractelement <2 x i32> %[[LOAD1]], i32 1
		; CHECK-NEXT: %[[SDIV1_1:.+]] = sdiv i32 %[[E1_2]], %[[SDIV1_0]]
		; CHECK-NEXT: %[[I1:.+]] = insertelement <2 x i32> %[[PHI]], i32 %[[SDIV1_1]], i32 1
		; CHECK-NEXT: br label %[[CONT1]]
		; CHECK: [[CONT1]]:
		; CHECK-NEXT: phi <2 x i32> [ %[[PHI]], %[[CONT0]] ], [ %[[I1]], %[[IF1]] ]
		; CHECK-NEXT: phi i32 [ undef, %[[CONT0]] ], [ %[[SDIV1_0]], %[[IF1]] ]
		; CHECK: br {{.*}} label %middle.block, label %vector.body

for.body: ; preds = %if.end, %entry		for.body: ; preds = %if.end, %entry
%indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %if.end ]		%indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %if.end ]
%isd = getelementptr inbounds i32, i32* %asd, i64 %indvars.iv		%isd = getelementptr inbounds i32, i32* %asd, i64 %indvars.iv
%lsd = load i32, i32* %isd, align 4		%lsd = load i32, i32* %isd, align 4
%isd.b = getelementptr inbounds i32, i32* %bsd, i64 %indvars.iv		%isd.b = getelementptr inbounds i32, i32* %bsd, i64 %indvars.iv
%lsd.b = load i32, i32* %isd.b, align 4		%lsd.b = load i32, i32* %isd.b, align 4
%psd = add nsw i32 %lsd, 23		%psd = add nsw i32 %lsd, 23
Show All 16 Lines
define void @pr30172(i32* nocapture %asd, i32* nocapture %bsd) {		define void @pr30172(i32* nocapture %asd, i32* nocapture %bsd) {
entry:		entry:
br label %for.body		br label %for.body

for.cond.cleanup: ; preds = %if.end		for.cond.cleanup: ; preds = %if.end
ret void		ret void

; CHECK-LABEL: pr30172		; CHECK-LABEL: pr30172
; CHECK: vector.body:		; CHECK: vector.body:
; CHECK: %[[CMP1:.+]] = icmp slt <2 x i32> %[[VAL:.+]], <i32 100, i32 100>		; CHECK: %[[LOAD0:.+]] = load <2 x i32>, <2 x i32>* {{.*}}, align 4
; CHECK: %[[CMP2:.+]] = icmp sge <2 x i32> %[[VAL]], <i32 200, i32 200>		; CHECK: %[[LOAD1:.+]] = load <2 x i32>, <2 x i32>* {{.*}}, align 4
; CHECK: %[[XOR:.+]] = xor <2 x i1> %[[CMP1]], <i1 true, i1 true>		; CHECK: %[[ADD0:.+]] = add nsw <2 x i32> %[[LOAD0]], <i32 23, i32 23>
; CHECK: %[[AND1:.+]] = and <2 x i1> %[[XOR]], <i1 true, i1 true>		; CHECK: %[[COND0:.+]] = extractelement <2 x i1> %{{.*}}, i32 0
; CHECK: %[[OR1:.+]] = or <2 x i1> zeroinitializer, %[[AND1]]		; CHECK-NEXT: %[[CMP0:.+]] = icmp eq i1 %[[COND0]], true
; CHECK: %[[AND2:.+]] = and <2 x i1> %[[CMP2]], %[[OR1]]		; CHECK-NEXT: br i1 %[[CMP0]], label %[[IF0:.+]], label %[[CONT0:.+]]
; CHECK: %[[OR2:.+]] = or <2 x i1> zeroinitializer, %[[AND2]]		; CHECK: [[IF0]]:
; CHECK: %[[AND3:.+]] = and <2 x i1> %[[CMP1]], <i1 true, i1 true>		; CHECK-NEXT: %[[E0_0:.+]] = extractelement <2 x i32> %[[ADD0]], i32 0
; CHECK: %[[OR3:.+]] = or <2 x i1> %[[OR2]], %[[AND3]]		; CHECK-NEXT: %[[E0_1:.+]] = extractelement <2 x i32> %[[LOAD0]], i32 0
; CHECK: %[[EXTRACT:.+]] = extractelement <2 x i1> %[[OR3]], i32 0		; CHECK-NEXT: %[[SDIV0_0:.+]] = sdiv i32 %[[E0_0]], %[[E0_1]]
; CHECK: %[[MASK:.+]] = icmp eq i1 %[[EXTRACT]], true		; CHECK-NEXT: %[[E0_2:.+]] = extractelement <2 x i32> %[[LOAD1]], i32 0
; CHECK: br i1 %[[MASK]], label %[[THEN:[a-zA-Z0-9.]+]], label %[[FI:[a-zA-Z0-9.]+]]		; CHECK-NEXT: %[[SDIV0_1:.+]] = sdiv i32 %[[E0_2]], %[[SDIV0_0]]
; CHECK: [[THEN]]:		; CHECK-NEXT: %[[I0:.+]] = insertelement <2 x i32> undef, i32 %[[SDIV0_1]], i32 0
; CHECK: %[[PD:[a-zA-Z0-9]+]] = sdiv i32 %{{.}}, %{{.}}		; CHECK-NEXT: br label %[[CONT0]]
; CHECK: br label %[[FI]]		; CHECK: [[CONT0]]:
; CHECK: [[FI]]:		; CHECK-NEXT: %[[PHI:.+]] = phi <2 x i32> [ undef, %vector.body ], [ %[[I0]], %[[IF0]] ]
; CHECK: %{{.*}} = phi i32 [ undef, %vector.body ], [ %[[PD]], %[[THEN]] ]		; CHECK-NEXT: phi i32 [ undef, %vector.body ], [ %[[SDIV0_0]], %[[IF0]] ]
		; CHECK-NEXT: %[[COND1:.+]] = extractelement <2 x i1> %{{.*}}, i32 1
		; CHECK-NEXT: %[[CMP1:.+]] = icmp eq i1 %[[COND1]], true
		; CHECK-NEXT: br i1 %[[CMP1]], label %[[IF1:.+]], label %[[CONT1:.+]]
		; CHECK: [[IF1]]:
		; CHECK-NEXT: %[[E1_0:.+]] = extractelement <2 x i32> %[[ADD0]], i32 1
		; CHECK-NEXT: %[[E1_1:.+]] = extractelement <2 x i32> %[[LOAD0]], i32 1
		; CHECK-NEXT: %[[SDIV1_0:.+]] = sdiv i32 %[[E1_0]], %[[E1_1]]
		; CHECK-NEXT: %[[E1_2:.+]] = extractelement <2 x i32> %[[LOAD1]], i32 1
		; CHECK-NEXT: %[[SDIV1_1:.+]] = sdiv i32 %[[E1_2]], %[[SDIV1_0]]
		; CHECK-NEXT: %[[I1:.+]] = insertelement <2 x i32> %[[PHI]], i32 %[[SDIV1_1]], i32 1
		; CHECK-NEXT: br label %[[CONT1]]
		; CHECK: [[CONT1]]:
		; CHECK-NEXT: phi <2 x i32> [ %[[PHI]], %[[CONT0]] ], [ %[[I1]], %[[IF1]] ]
		; CHECK-NEXT: phi i32 [ undef, %[[CONT0]] ], [ %[[SDIV1_0]], %[[IF1]] ]
		; CHECK: br {{.*}} label %middle.block, label %vector.body

for.body: ; preds = %if.end, %entry		for.body: ; preds = %if.end, %entry
%indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %if.end ]		%indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %if.end ]
%isd = getelementptr inbounds i32, i32* %asd, i64 %indvars.iv		%isd = getelementptr inbounds i32, i32* %asd, i64 %indvars.iv
%lsd = load i32, i32* %isd, align 4		%lsd = load i32, i32* %isd, align 4
%isd.b = getelementptr inbounds i32, i32* %bsd, i64 %indvars.iv		%isd.b = getelementptr inbounds i32, i32* %bsd, i64 %indvars.iv
%lsd.b = load i32, i32* %isd.b, align 4		%lsd.b = load i32, i32* %isd.b, align 4
%psd = add nsw i32 %lsd, 23		%psd = add nsw i32 %lsd, 23

test/Transforms/LoopVectorize/induction.ll

	; int x = a[i];			; int x = a[i];
	; if (c)			; if (c)
	; x /= i;			; x /= i;
	; sum += x;			; sum += x;
	; }			; }
	;			;
	; CHECK-LABEL: @scalarize_induction_variable_05(			; CHECK-LABEL: @scalarize_induction_variable_05(
	; CHECK: vector.body:			; CHECK: vector.body:
	; CHECK: %index = phi i32 [ 0, %vector.ph ], [ %index.next, %pred.udiv.continue2 ]			; CHECK: %index = phi i32 [ 0, %vector.ph ], [ %index.next, %pred.continue2 ]
	; CHECK: %[[I0:.+]] = add i32 %index, 0			; CHECK: %[[I0:.+]] = add i32 %index, 0
	; CHECK: getelementptr inbounds i32, i32* %a, i32 %[[I0]]			; CHECK: getelementptr inbounds i32, i32* %a, i32 %[[I0]]
	; CHECK: pred.udiv.if:			; CHECK: pred.if:
	; CHECK: udiv i32 {{.*}}, %[[I0]]			; CHECK: udiv i32 {{.*}}, %[[I0]]
	; CHECK: pred.udiv.if1:			; CHECK: pred.if1:
	; CHECK: %[[I1:.+]] = add i32 %index, 1			; CHECK: %[[I1:.+]] = add i32 %index, 1
	; CHECK: udiv i32 {{.*}}, %[[I1]]			; CHECK: udiv i32 {{.*}}, %[[I1]]
	;			;
	; UNROLL-NO-IC-LABEL: @scalarize_induction_variable_05(			; UNROLL-NO-IC-LABEL: @scalarize_induction_variable_05(
	; UNROLL-NO-IC: vector.body:			; UNROLL-NO-IC: vector.body:
	; UNROLL-NO-IC: %index = phi i32 [ 0, %vector.ph ], [ %index.next, %pred.udiv.continue11 ]			; UNROLL-NO-IC: %index = phi i32 [ 0, %vector.ph ], [ %index.next, %pred.continue11 ]
	; UNROLL-NO-IC: %[[I0:.+]] = add i32 %index, 0			; UNROLL-NO-IC: %[[I0:.+]] = add i32 %index, 0
	; UNROLL-NO-IC: %[[I2:.+]] = add i32 %index, 2			; UNROLL-NO-IC: %[[I2:.+]] = add i32 %index, 2
	; UNROLL-NO-IC: getelementptr inbounds i32, i32* %a, i32 %[[I0]]			; UNROLL-NO-IC: getelementptr inbounds i32, i32* %a, i32 %[[I0]]
	; UNROLL-NO-IC: getelementptr inbounds i32, i32* %a, i32 %[[I2]]			; UNROLL-NO-IC: getelementptr inbounds i32, i32* %a, i32 %[[I2]]
	; UNROLL-NO-IC: pred.udiv.if:			; UNROLL-NO-IC: pred.if:
	; UNROLL-NO-IC: udiv i32 {{.*}}, %[[I0]]			; UNROLL-NO-IC: udiv i32 {{.*}}, %[[I0]]
	; UNROLL-NO-IC: pred.udiv.if6:			; UNROLL-NO-IC: pred.if6:
	; UNROLL-NO-IC: %[[I1:.+]] = add i32 %index, 1			; UNROLL-NO-IC: %[[I1:.+]] = add i32 %index, 1
	; UNROLL-NO-IC: udiv i32 {{.*}}, %[[I1]]			; UNROLL-NO-IC: udiv i32 {{.*}}, %[[I1]]
	; UNROLL-NO-IC: pred.udiv.if8:			; UNROLL-NO-IC: pred.if8:
	; UNROLL-NO-IC: udiv i32 {{.*}}, %[[I2]]			; UNROLL-NO-IC: udiv i32 {{.*}}, %[[I2]]
	; UNROLL-NO-IC: pred.udiv.if10:			; UNROLL-NO-IC: pred.if10:
	; UNROLL-NO-IC: %[[I3:.+]] = add i32 %index, 3			; UNROLL-NO-IC: %[[I3:.+]] = add i32 %index, 3
	; UNROLL-NO-IC: udiv i32 {{.*}}, %[[I3]]			; UNROLL-NO-IC: udiv i32 {{.*}}, %[[I3]]
	;			;
	; IND-LABEL: @scalarize_induction_variable_05(			; IND-LABEL: @scalarize_induction_variable_05(
	; IND: vector.body:			; IND: vector.body:
	; IND: %index = phi i32 [ 0, %vector.ph ], [ %index.next, %pred.udiv.continue2 ]			; IND: %index = phi i32 [ 0, %vector.ph ], [ %index.next, %pred.continue2 ]
	; IND: %[[E0:.+]] = sext i32 %index to i64			; IND: %[[E0:.+]] = sext i32 %index to i64
	; IND: getelementptr inbounds i32, i32* %a, i64 %[[E0]]			; IND: getelementptr inbounds i32, i32* %a, i64 %[[E0]]
	; IND: pred.udiv.if:			; IND: pred.if:
	; IND: udiv i32 {{.*}}, %index			; IND: udiv i32 {{.*}}, %index
	; IND: pred.udiv.if1:			; IND: pred.if1:
	; IND: %[[I1:.+]] = or i32 %index, 1			; IND: %[[I1:.+]] = or i32 %index, 1
	; IND: udiv i32 {{.*}}, %[[I1]]			; IND: udiv i32 {{.*}}, %[[I1]]
	;			;
	; UNROLL-LABEL: @scalarize_induction_variable_05(			; UNROLL-LABEL: @scalarize_induction_variable_05(
	; UNROLL: vector.body:			; UNROLL: vector.body:
	; UNROLL: %index = phi i32 [ 0, %vector.ph ], [ %index.next, %pred.udiv.continue11 ]			; UNROLL: %index = phi i32 [ 0, %vector.ph ], [ %index.next, %pred.continue11 ]
	; UNROLL: %[[I2:.+]] = or i32 %index, 2			; UNROLL: %[[I2:.+]] = or i32 %index, 2
	; UNROLL: %[[E0:.+]] = sext i32 %index to i64			; UNROLL: %[[E0:.+]] = sext i32 %index to i64
	; UNROLL: %[[G0:.+]] = getelementptr inbounds i32, i32* %a, i64 %[[E0]]			; UNROLL: %[[G0:.+]] = getelementptr inbounds i32, i32* %a, i64 %[[E0]]
	; UNROLL: getelementptr i32, i32* %[[G0]], i64 2			; UNROLL: getelementptr i32, i32* %[[G0]], i64 2
	; UNROLL: pred.udiv.if:			; UNROLL: pred.if:
	; UNROLL: udiv i32 {{.*}}, %index			; UNROLL: udiv i32 {{.*}}, %index
	; UNROLL: pred.udiv.if6:			; UNROLL: pred.if6:
	; UNROLL: %[[I1:.+]] = or i32 %index, 1			; UNROLL: %[[I1:.+]] = or i32 %index, 1
	; UNROLL: udiv i32 {{.*}}, %[[I1]]			; UNROLL: udiv i32 {{.*}}, %[[I1]]
	; UNROLL: pred.udiv.if8:			; UNROLL: pred.if8:
	; UNROLL: udiv i32 {{.*}}, %[[I2]]			; UNROLL: udiv i32 {{.*}}, %[[I2]]
	; UNROLL: pred.udiv.if10:			; UNROLL: pred.if10:
	; UNROLL: %[[I3:.+]] = or i32 %index, 3			; UNROLL: %[[I3:.+]] = or i32 %index, 3
	; UNROLL: udiv i32 {{.*}}, %[[I3]]			; UNROLL: udiv i32 {{.*}}, %[[I3]]

	define i32 @scalarize_induction_variable_05(i32* %a, i32 %x, i1 %c, i32 %n) {			define i32 @scalarize_induction_variable_05(i32* %a, i32 %x, i1 %c, i32 %n) {
	entry:			entry:
	br label %for.body			br label %for.body

	for.body:			for.body:

test/Transforms/LoopVectorize/interleaved-accesses-pred-stores.ll

	; CHECK: %[[IsZero:[a-zA-Z0-9]+]] = icmp eq i64 %n.mod.vf, 0			; CHECK: %[[IsZero:[a-zA-Z0-9]+]] = icmp eq i64 %n.mod.vf, 0
	; CHECK: %[[R:.+]] = select i1 %[[IsZero]], i64 2, i64 %n.mod.vf			; CHECK: %[[R:.+]] = select i1 %[[IsZero]], i64 2, i64 %n.mod.vf
	; CHECK: %n.vec = sub nsw i64 %[[N]], %[[R]]			; CHECK: %n.vec = sub nsw i64 %[[N]], %[[R]]
	;			;
	; CHECK: vector.body:			; CHECK: vector.body:
	; CHECK: %wide.vec = load <4 x i64>, <4 x i64>* %{{.*}}			; CHECK: %wide.vec = load <4 x i64>, <4 x i64>* %{{.*}}
	; CHECK: %strided.vec = shufflevector <4 x i64> %wide.vec, <4 x i64> undef, <2 x i32> <i32 0, i32 2>			; CHECK: %strided.vec = shufflevector <4 x i64> %wide.vec, <4 x i64> undef, <2 x i32> <i32 0, i32 2>
	;			;
	; CHECK: pred.store.if			; CHECK: pred.if
	; CHECK: %[[X1:.+]] = extractelement <4 x i64> %wide.vec, i32 0			; CHECK: %[[X1:.+]] = extractelement <4 x i64> %wide.vec, i32 0
	; CHECK: store i64 %[[X1]], {{.*}}			; CHECK: store i64 %[[X1]], {{.*}}
	;			;
	; CHECK: pred.store.if			; CHECK: pred.if
	; CHECK: %[[X2:.+]] = extractelement <4 x i64> %wide.vec, i32 2			; CHECK: %[[X2:.+]] = extractelement <4 x i64> %wide.vec, i32 2
	; CHECK: store i64 %[[X2]], {{.*}}			; CHECK: store i64 %[[X2]], {{.*}}

	define void @interleaved_with_cond_store_0(%pair *%p, i64 %x, i64 %n) {			define void @interleaved_with_cond_store_0(%pair *%p, i64 %x, i64 %n) {
	entry:			entry:
	br label %for.body			br label %for.body

	for.body:			for.body:
	; CHECK: %[[IsZero:[a-zA-Z0-9]+]] = icmp eq i64 %n.mod.vf, 0			; CHECK: %[[IsZero:[a-zA-Z0-9]+]] = icmp eq i64 %n.mod.vf, 0
	; CHECK: %[[R:.+]] = select i1 %[[IsZero]], i64 2, i64 %n.mod.vf			; CHECK: %[[R:.+]] = select i1 %[[IsZero]], i64 2, i64 %n.mod.vf
	; CHECK: %n.vec = sub nsw i64 %[[N]], %[[R]]			; CHECK: %n.vec = sub nsw i64 %[[N]], %[[R]]
	;			;
	; CHECK: vector.body:			; CHECK: vector.body:
	; CHECK: %[[L1:.+]] = load <4 x i64>, <4 x i64>* %{{.*}}			; CHECK: %[[L1:.+]] = load <4 x i64>, <4 x i64>* %{{.*}}
	; CHECK: %strided.vec = shufflevector <4 x i64> %[[L1]], <4 x i64> undef, <2 x i32> <i32 0, i32 2>			; CHECK: %strided.vec = shufflevector <4 x i64> %[[L1]], <4 x i64> undef, <2 x i32> <i32 0, i32 2>
	;			;
	; CHECK: pred.store.if			; CHECK: pred.if
	; CHECK: %[[X1:.+]] = extractelement <4 x i64> %wide.vec, i32 0			; CHECK: %[[X1:.+]] = extractelement <4 x i64> %wide.vec, i32 0
	; CHECK: store i64 %[[X1]], {{.*}}			; CHECK: store i64 %[[X1]], {{.*}}
	;			;
	; CHECK: pred.store.if			; CHECK: pred.if
	; CHECK: %[[X2:.+]] = extractelement <4 x i64> %wide.vec, i32 2			; CHECK: %[[X2:.+]] = extractelement <4 x i64> %wide.vec, i32 2
	; CHECK: store i64 %[[X2]], {{.*}}			; CHECK: store i64 %[[X2]], {{.*}}
	;			;
	; CHECK: pred.store.continue			; CHECK: pred.continue
	; CHECK: %[[L2:.+]] = load <4 x i64>, <4 x i64>* {{.*}}			; CHECK: %[[L2:.+]] = load <4 x i64>, <4 x i64>* {{.*}}
	; CHECK: %[[X3:.+]] = extractelement <4 x i64> %[[L2]], i32 0			; CHECK: %[[X3:.+]] = extractelement <4 x i64> %[[L2]], i32 0
	; CHECK: store i64 %[[X3]], {{.*}}			; CHECK: store i64 %[[X3]], {{.*}}
	; CHECK: %[[X4:.+]] = extractelement <4 x i64> %[[L2]], i32 2			; CHECK: %[[X4:.+]] = extractelement <4 x i64> %[[L2]], i32 2
	; CHECK: store i64 %[[X4]], {{.*}}			; CHECK: store i64 %[[X4]], {{.*}}

	define void @interleaved_with_cond_store_1(%pair *%p, i64 %x, i64 %n) {			define void @interleaved_with_cond_store_1(%pair *%p, i64 %x, i64 %n) {
	entry:			entry:
	; CHECK: %n.vec = sub nsw i64 %[[N]], %[[R]]			; CHECK: %n.vec = sub nsw i64 %[[N]], %[[R]]
	;			;
	; CHECK: vector.body:			; CHECK: vector.body:
	; CHECK: %[[L1:.+]] = load <4 x i64>, <4 x i64>* %{{.*}}			; CHECK: %[[L1:.+]] = load <4 x i64>, <4 x i64>* %{{.*}}
	; CHECK: %strided.vec = shufflevector <4 x i64> %[[L1]], <4 x i64> undef, <2 x i32> <i32 0, i32 2>			; CHECK: %strided.vec = shufflevector <4 x i64> %[[L1]], <4 x i64> undef, <2 x i32> <i32 0, i32 2>
	; CHECK: store i64 %x, {{.*}}			; CHECK: store i64 %x, {{.*}}
	; CHECK: store i64 %x, {{.*}}			; CHECK: store i64 %x, {{.*}}
	;			;
	; CHECK: pred.store.if			; CHECK: pred.if
	; CHECK: %[[X1:.+]] = extractelement <4 x i64> %wide.vec, i32 0			; CHECK: %[[X1:.+]] = extractelement <4 x i64> %wide.vec, i32 0
	; CHECK: store i64 %[[X1]], {{.*}}			; CHECK: store i64 %[[X1]], {{.*}}
	;			;
	; CHECK: pred.store.if			; CHECK: pred.if
	; CHECK: %[[X2:.+]] = extractelement <4 x i64> %wide.vec, i32 2			; CHECK: %[[X2:.+]] = extractelement <4 x i64> %wide.vec, i32 2
	; CHECK: store i64 %[[X2]], {{.*}}			; CHECK: store i64 %[[X2]], {{.*}}

	define void @interleaved_with_cond_store_2(%pair *%p, i64 %x, i64 %n) {			define void @interleaved_with_cond_store_2(%pair *%p, i64 %x, i64 %n) {
	entry:			entry:
	br label %for.body			br label %for.body

	for.body:			for.body: