This is an archive of the discontinued LLVM Phabricator instance.

[LV] Merge floating point and integer induction widening code
ClosedPublic

Authored by mssimpso on Feb 21 2017, 9:34 AM.

Details

Reviewers

delena
mkuper

Commits

rGbdc9c7888099: [LV] Merge floating-point and integer induction widening code
rL296145: [LV] Merge floating-point and integer induction widening code

Summary

This patch merges the existing floating point induction variable widening code into the integer induction variable widening code, creating a single set of functions for both kinds of inductions. The primary motivation for doing this is to enable vector phi node creation for floating point induction variables.
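As a rough illustration of what a vector phi node for a floating-point induction variable computes, the sketch below simulates the lanes in plain C++. It is not code from the patch: `simulateVectorFpInduction` and its parameters are hypothetical names, and it assumes an induction that is updated with an addition.

```cpp
#include <cassert>
#include <vector>

// Hypothetical model (not LLVM code): the lanes held by a widened
// floating-point induction phi across successive vector iterations.
std::vector<std::vector<float>> simulateVectorFpInduction(float Start,
                                                          float Step,
                                                          unsigned VF,
                                                          unsigned Iterations) {
  assert(VF > 1 && "VF should be greater than one");

  // Preheader: <Start, Start + Step, ..., Start + (VF - 1) * Step>.
  std::vector<float> VecInd(VF);
  for (unsigned Lane = 0; Lane < VF; ++Lane)
    VecInd[Lane] = Start + static_cast<float>(Lane) * Step;

  // Loop-invariant update applied to every lane: VF * Step.
  const float SplatVF = static_cast<float>(VF) * Step;

  std::vector<std::vector<float>> Trace;
  for (unsigned I = 0; I < Iterations; ++I) {
    Trace.push_back(VecInd);   // value of the phi in this iteration
    for (float &Lane : VecInd) // models the "step.add" update
      Lane += SplatVF;
  }
  return Trace;
}
```

With `Start = 1.0`, `Step = 0.5` and `VF = 4`, the first iteration holds the lanes 1.0, 1.5, 2.0, 2.5 and the second holds 3.0, 3.5, 4.0, 4.5.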

Diff Detail

Build Status

Buildable 4186
Build 4186: arc lint + arc unit

Event Timeline

mssimpso created this revision. Feb 21 2017, 9:34 AM

Herald added a subscriber: mzolotukhin. Feb 21 2017, 9:34 AM

Thanks, this basically looks good, a few comments inline.

lib/Transforms/Vectorize/LoopVectorize.cpp
387	Nothing changed here, you just moved it around, right?
400	One of the users of this for the int case was a getSigned(), and now it's a get(). Are you sure this is correct?
test/Transforms/LoopVectorize/float-induction.ll
4	Did you run it through the update script? If you did, could you have the diff show the actual diff vs. running it with the old code?

delena added inline comments. Feb 21 2017, 11:46 PM

lib/Transforms/Vectorize/LoopVectorize.cpp
2468	I suggest simplifying the expression: IV->getType()->isIntegerTy() || IV != OldInduction
2502	ID.getStep() should already be a SCEVUnknown if it is not SCEVable. I think the following code should work: const SCEV *Step = ID.getStep(); if (PSE.getSE()->isSCEVable(IV->getType())) { SCEVExpander Exp(*PSE.getSE(), DL, "induction"); Step = Exp.expandCodeFor(Step, ID.getStep()->getType(), LoopVectorPreHeader->getTerminator()); } Also, I think you should pass IV->getType() instead of ID.getStep()->getType() when calling expandCodeFor(); it will expand/truncate if needed.
2637	I'm not 100% sure about the following comment: For FP, the operation may be FADD or FSUB. It depends on reduction opcode.

mkuper added inline comments. Feb 22 2017, 12:29 AM

lib/Transforms/Vectorize/LoopVectorize.cpp
2502	Step needs to be a Value, not a SCEV, hence the else. (Perhaps you've read it as "cast<SCEVUnknown>(ID.getStep()->getValue());", not "cast<SCEVUnknown>(ID.getStep())->getValue()"?) As to the other change - I think this patch is trying to be NFC for ints, so I think I'd prefer we not change that in this patch.
2637	I think you're right. (You mean induction opcode, right?)

mssimpso added inline comments. Feb 22 2017, 6:24 AM

lib/Transforms/Vectorize/LoopVectorize.cpp
387	Right, I just moved it so I could reuse it.
400	You're right, nice catch!
2468	Yes that's much simpler. Thanks!
2637	Ah, yes that's true. We'll need to get the step BinOp from the induction descriptor. Thanks!
test/Transforms/LoopVectorize/float-induction.ll
4	Hey Michael, can you clarify what you mean here? I did use the script to help generate the checks (then moved the test into this file). You're just wanting to see the lines that are different with and without the patch?

mkuper added inline comments. Feb 22 2017, 10:51 AM

test/Transforms/LoopVectorize/float-induction.ll
4	Yes, otherwise it's kind of hard to see what changed. I suggest running the script over the test with the existing code, committing that test, and rebasing this patch on that. That way we can actually see what happened.

mssimpso added inline comments. Feb 22 2017, 10:54 AM

test/Transforms/LoopVectorize/float-induction.ll
4	Great, that's what I was thinking as well.

Addressed comments from Michael and Elena. Thanks!

Changed the constant helper to use signed values.
Simplified assert in widenIntOrFpInduction.
Used the induction opcode from the induction descriptor for the "AddOp" in buildScalarSteps and createVectorIntOrFpInductionPHI. I also corrected one of the test cases that I had updated incorrectly the first time around; it now uses fsub instead of fadd for the induction update.
Rebased new test case.
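To make the fadd/fsub point concrete, here is a minimal sketch of the scalar-step arithmetic in plain C++. It is an illustration under assumptions, not the patch itself: `InductionOp` and `scalarSteps` are hypothetical stand-ins for the induction opcode and for buildScalarSteps, and only the two floating-point opcodes discussed above are modeled.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for the induction opcode taken from the
// induction descriptor; only the two FP cases from the review are modeled.
enum class InductionOp { FAdd, FSub };

// Models ScalarIV <op> ((VF * Part + Lane) * Step) for every unroll part
// and lane. Entry[Part][Lane] is the scalar value for that lane.
std::vector<std::vector<double>> scalarSteps(double ScalarIV, double Step,
                                             InductionOp Op, unsigned VF,
                                             unsigned UF) {
  assert(VF > 1 && "VF should be greater than one");
  std::vector<std::vector<double>> Entry(UF, std::vector<double>(VF));
  for (unsigned Part = 0; Part < UF; ++Part) {
    for (unsigned Lane = 0; Lane < VF; ++Lane) {
      const double Mul = static_cast<double>(VF * Part + Lane) * Step;
      // Always adding here would be wrong for a decreasing induction,
      // which is the case the corrected test now covers.
      Entry[Part][Lane] =
          Op == InductionOp::FAdd ? ScalarIV + Mul : ScalarIV - Mul;
    }
  }
  return Entry;
}
```

For a start of 10.0, a step of 0.5, `VF = 2` and `UF = 2`, the FSub variant yields 10.0, 9.5 for the first part and 9.0, 8.5 for the second, while the FAdd variant yields 10.0, 10.5 and 11.0, 11.5.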

mkuper added inline comments. Feb 22 2017, 12:13 PM

lib/Transforms/Vectorize/LoopVectorize.cpp
400	So you're saying the get() on line 2592 should have been a getSigned() too? (Although I'm not sure it matters, for that case.)
test/Transforms/LoopVectorize/float-induction.ll
4	Sorry, I meant for all test cases you changed, not just the new one.

mssimpso added inline comments. Feb 22 2017, 12:23 PM

lib/Transforms/Vectorize/LoopVectorize.cpp
400	Both of the original uses (get and getSigned) could only ever have values greater than one (and much less than the max int64_t) so I don't think it makes a difference either way. If we go with a helper function here, I think the signed version is less confusing.
test/Transforms/LoopVectorize/float-induction.ll
4	Ah, I see! Sorry for the confusion. I'll run the script over the tests I changed and rebase.

mkuper added inline comments. Feb 22 2017, 1:04 PM

lib/Transforms/Vectorize/LoopVectorize.cpp
400	Yeah, makes sense.
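For readers following the get() versus getSigned() exchange on line 400: as far as I understand the API, ConstantInt::get takes its value as a uint64_t (unsigned unless told otherwise), while ConstantInt::getSigned takes an int64_t. The sketch below is a hypothetical plain C++ model of that distinction, not LLVM code; `Wide128`, `fromUnsigned`, `fromSigned` and `samePattern` are invented names. It widens a 64-bit value into a 128-bit pattern either by zero-filling or by copying the sign bit.

```cpp
#include <cstdint>

// Hypothetical 128-bit pattern, stored as two 64-bit halves.
struct Wide128 {
  uint64_t Lo;
  uint64_t Hi;
};

// Models an unsigned conversion: the upper bits are filled with zeros.
Wide128 fromUnsigned(uint64_t C) { return {C, 0}; }

// Models a signed conversion: the upper bits copy the sign bit.
Wide128 fromSigned(int64_t C) {
  return {static_cast<uint64_t>(C), C < 0 ? ~uint64_t{0} : uint64_t{0}};
}

// True if both conversions produced the same bit pattern.
bool samePattern(const Wide128 &A, const Wide128 &B) {
  return A.Lo == B.Lo && A.Hi == B.Hi;
}
```

In this model the two conversions agree for any non-negative value, which matches the observation above that small positive constants such as VF or a lane index are unaffected. They differ only for negative values widened past 64 bits.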

Rebased tests.

LGTM

This revision is now accepted and ready to land. Feb 22 2017, 3:52 PM

delena added inline comments. Feb 22 2017, 11:06 PM

lib/Transforms/Vectorize/LoopVectorize.cpp
4741	Is the pointer induction really different from int and FP induction? Maybe you can put everything under widenInduction?

mssimpso added inline comments. Feb 23 2017, 4:12 AM

lib/Transforms/Vectorize/LoopVectorize.cpp
4741	It's not, and I'm planning to work on this soon. Thanks!

delena added inline comments. Feb 23 2017, 4:29 AM

lib/Transforms/Vectorize/LoopVectorize.cpp
4741	I'd put them all in one patch and call the function widenInduction(), but it is not a stumbling block anyway.

mssimpso added inline comments. Feb 23 2017, 5:00 AM

lib/Transforms/Vectorize/LoopVectorize.cpp
4741	Would you mind if we did this in a separate patch? I was trying to focus this one on floating-point, while keeping everything else as NFC as possible. Once committed, we can make sure no performance issues crop up for anyone before moving on to the pointer induction variables. The suggestion makes sense; I would just like to keep things small for review and codegen investigations should the need arise.

mssimpso added inline comments. Feb 24 2017, 9:34 AM

lib/Transforms/Vectorize/LoopVectorize.cpp
4741	Hi Elena, I'm going to go ahead and commit this - assuming no issues arise over the weekend, I'll submit a patch to take care of the pointer inductions early next week, like you suggested.

Closed by commit rL296145: [LV] Merge floating-point and integer induction widening code (authored by mssimpso). Feb 24 2017, 10:32 AM

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path	Size
lib/Transforms/Vectorize/LoopVectorize.cpp	157 lines
test/Transforms/LoopVectorize/float-induction.ll	156 lines

Diff 89392

lib/Transforms/Vectorize/LoopVectorize.cpp

[... 378 lines not shown ...]
/// A helper function that returns the reciprocal of the block probability of		/// A helper function that returns the reciprocal of the block probability of
/// predicated blocks. If we return X, we are assuming the predicated block		/// predicated blocks. If we return X, we are assuming the predicated block
/// will execute once for for every X iterations of the loop header.		/// will execute once for for every X iterations of the loop header.
///		///
/// TODO: We should use actual block probability here, if available. Currently,		/// TODO: We should use actual block probability here, if available. Currently,
/// we always assume predicated blocks have a 50% chance of executing.		/// we always assume predicated blocks have a 50% chance of executing.
static unsigned getReciprocalPredBlockProb() { return 2; }		static unsigned getReciprocalPredBlockProb() { return 2; }

		/// A helper function that adds a 'fast' flag to floating point operations.
		static Value *addFastMathFlag(Value *V) {
		if (isa<FPMathOperator>(V)) {
		FastMathFlags Flags;
		Flags.setUnsafeAlgebra();
		cast<Instruction>(V)->setFastMathFlags(Flags);
		}
		return V;
		}

		/// A helper function that returns an integer or floating-point constant with
		/// value C.
		static Constant *getSignedIntOrFpConstant(Type *Ty, int64_t C) {
		return Ty->isIntegerTy() ? ConstantInt::getSigned(Ty, C)
		: ConstantFP::get(Ty, C);
		}

/// InnerLoopVectorizer vectorizes loops which contain only one basic		/// InnerLoopVectorizer vectorizes loops which contain only one basic
/// block to a specified vectorization factor (VF).		/// block to a specified vectorization factor (VF).
/// This class performs the widening of scalars into vectors, or multiple		/// This class performs the widening of scalars into vectors, or multiple
/// scalars. This class also implements the following features:		/// scalars. This class also implements the following features:
/// * It inserts an epilogue loop for handling loops that don't have iteration		/// * It inserts an epilogue loop for handling loops that don't have iteration
/// counts that are known to be a multiple of the vectorization factor.		/// counts that are known to be a multiple of the vectorization factor.
/// * It handles the code generation for reduction variables.		/// * It handles the code generation for reduction variables.
/// * Scalarization (implementation using scalars) of un-vectorizable		/// * Scalarization (implementation using scalars) of un-vectorizable
[... 137 lines not shown ...]	virtual Value *getStepVector(Value *Val, int StartIdx, Value *Step,
Instruction::BinaryOps Opcode =		Instruction::BinaryOps Opcode =
Instruction::BinaryOpsEnd);		Instruction::BinaryOpsEnd);

/// Compute scalar induction steps. \p ScalarIV is the scalar induction		/// Compute scalar induction steps. \p ScalarIV is the scalar induction
/// variable on which to base the steps, \p Step is the size of the step, and		/// variable on which to base the steps, \p Step is the size of the step, and
/// \p EntryVal is the value from the original loop that maps to the steps.		/// \p EntryVal is the value from the original loop that maps to the steps.
/// Note that \p EntryVal doesn't have to be an induction variable (e.g., it		/// Note that \p EntryVal doesn't have to be an induction variable (e.g., it
/// can be a truncate instruction).		/// can be a truncate instruction).
void buildScalarSteps(Value *ScalarIV, Value *Step, Value *EntryVal);		void buildScalarSteps(Value *ScalarIV, Value *Step, Value *EntryVal,
		const InductionDescriptor &ID);

/// Create a vector induction phi node based on an existing scalar one. \p		/// Create a vector induction phi node based on an existing scalar one. \p
/// EntryVal is the value from the original loop that maps to the vector phi		/// EntryVal is the value from the original loop that maps to the vector phi
/// node, and \p Step is the loop-invariant step. If \p EntryVal is a		/// node, and \p Step is the loop-invariant step. If \p EntryVal is a
/// truncate instruction, instead of widening the original IV, we widen a		/// truncate instruction, instead of widening the original IV, we widen a
/// version of the IV truncated to \p EntryVal's type.		/// version of the IV truncated to \p EntryVal's type.
void createVectorIntInductionPHI(const InductionDescriptor &II, Value *Step,		void createVectorIntOrFpInductionPHI(const InductionDescriptor &II,
Instruction *EntryVal);		Value *Step, Instruction *EntryVal);

/// Widen an integer induction variable \p IV. If \p Trunc is provided, the		/// Widen an integer or floating-point induction variable \p IV. If \p Trunc
/// induction variable will first be truncated to the corresponding type.		/// is provided, the integer induction variable will first be truncated to
void widenIntInduction(PHINode *IV, TruncInst *Trunc = nullptr);		/// the corresponding type.
		void widenIntOrFpInduction(PHINode *IV, TruncInst *Trunc = nullptr);

/// Returns true if an instruction \p I should be scalarized instead of		/// Returns true if an instruction \p I should be scalarized instead of
/// vectorized for the chosen vectorization factor.		/// vectorized for the chosen vectorization factor.
bool shouldScalarizeInstruction(Instruction *I) const;		bool shouldScalarizeInstruction(Instruction *I) const;

/// Returns true if we should generate a scalar version of \p IV.		/// Returns true if we should generate a scalar version of \p IV.
bool needsScalarInduction(Instruction *IV) const;		bool needsScalarInduction(Instruction *IV) const;

[... 1,789 lines not shown ...]	if (Invariant)
Builder.SetInsertPoint(LoopVectorPreHeader->getTerminator());		Builder.SetInsertPoint(LoopVectorPreHeader->getTerminator());

// Broadcast the scalar into all locations in the vector.		// Broadcast the scalar into all locations in the vector.
Value *Shuf = Builder.CreateVectorSplat(VF, V, "broadcast");		Value *Shuf = Builder.CreateVectorSplat(VF, V, "broadcast");

return Shuf;		return Shuf;
}		}

void InnerLoopVectorizer::createVectorIntInductionPHI(		void InnerLoopVectorizer::createVectorIntOrFpInductionPHI(
const InductionDescriptor &II, Value *Step, Instruction *EntryVal) {		const InductionDescriptor &II, Value *Step, Instruction *EntryVal) {
Value *Start = II.getStartValue();		Value *Start = II.getStartValue();
assert(Step->getType()->isIntegerTy() &&
"Cannot widen an IV having a step with a non-integer type");

// Construct the initial value of the vector IV in the vector loop preheader		// Construct the initial value of the vector IV in the vector loop preheader
auto CurrIP = Builder.saveIP();		auto CurrIP = Builder.saveIP();
Builder.SetInsertPoint(LoopVectorPreHeader->getTerminator());		Builder.SetInsertPoint(LoopVectorPreHeader->getTerminator());
if (isa<TruncInst>(EntryVal)) {		if (isa<TruncInst>(EntryVal)) {
		assert(Start->getType()->isIntegerTy() &&
		"Truncation requires an integer type");
auto *TruncType = cast<IntegerType>(EntryVal->getType());		auto *TruncType = cast<IntegerType>(EntryVal->getType());
Step = Builder.CreateTrunc(Step, TruncType);		Step = Builder.CreateTrunc(Step, TruncType);
Start = Builder.CreateCast(Instruction::Trunc, Start, TruncType);		Start = Builder.CreateCast(Instruction::Trunc, Start, TruncType);
}		}
Value *SplatStart = Builder.CreateVectorSplat(VF, Start);		Value *SplatStart = Builder.CreateVectorSplat(VF, Start);
Value *SteppedStart = getStepVector(SplatStart, 0, Step);		Value *SteppedStart =
		getStepVector(SplatStart, 0, Step, II.getInductionOpcode());

		// We create vector phi nodes for both integer and floating-point induction
		// variables. Here, we determine the kind of arithmetic we will perform.
		Instruction::BinaryOps AddOp;
		Instruction::BinaryOps MulOp;
		if (Step->getType()->isIntegerTy()) {
		AddOp = Instruction::Add;
		MulOp = Instruction::Mul;
		} else {
		AddOp = II.getInductionOpcode();
		MulOp = Instruction::FMul;
		}

		// Multiply the vectorization factor by the step using integer or
		// floating-point arithmetic as appropriate.
		Value *ConstVF = getSignedIntOrFpConstant(Step->getType(), VF);
		Value *Mul = addFastMathFlag(Builder.CreateBinOp(MulOp, Step, ConstVF));

// Create a vector splat to use in the induction update.		// Create a vector splat to use in the induction update.
//		//
// FIXME: If the step is non-constant, we create the vector splat with		// FIXME: If the step is non-constant, we create the vector splat with
// IRBuilder. IRBuilder can constant-fold the multiply, but it doesn't		// IRBuilder. IRBuilder can constant-fold the multiply, but it doesn't
// handle a constant vector splat.		// handle a constant vector splat.
auto *ConstVF = ConstantInt::getSigned(Step->getType(), VF);
auto *Mul = Builder.CreateMul(Step, ConstVF);
Value *SplatVF = isa<Constant>(Mul)		Value *SplatVF = isa<Constant>(Mul)
? ConstantVector::getSplat(VF, cast<Constant>(Mul))		? ConstantVector::getSplat(VF, cast<Constant>(Mul))
: Builder.CreateVectorSplat(VF, Mul);		: Builder.CreateVectorSplat(VF, Mul);
Builder.restoreIP(CurrIP);		Builder.restoreIP(CurrIP);

// We may need to add the step a number of times, depending on the unroll		// We may need to add the step a number of times, depending on the unroll
// factor. The last of those goes into the PHI.		// factor. The last of those goes into the PHI.
PHINode *VecInd = PHINode::Create(SteppedStart->getType(), 2, "vec.ind",		PHINode *VecInd = PHINode::Create(SteppedStart->getType(), 2, "vec.ind",
&*LoopVectorBody->getFirstInsertionPt());		&*LoopVectorBody->getFirstInsertionPt());
Instruction *LastInduction = VecInd;		Instruction *LastInduction = VecInd;
VectorParts Entry(UF);		VectorParts Entry(UF);
for (unsigned Part = 0; Part < UF; ++Part) {		for (unsigned Part = 0; Part < UF; ++Part) {
Entry[Part] = LastInduction;		Entry[Part] = LastInduction;
LastInduction = cast<Instruction>(		LastInduction = cast<Instruction>(addFastMathFlag(
Builder.CreateAdd(LastInduction, SplatVF, "step.add"));		Builder.CreateBinOp(AddOp, LastInduction, SplatVF, "step.add")));
}		}
VectorLoopValueMap.initVector(EntryVal, Entry);		VectorLoopValueMap.initVector(EntryVal, Entry);
if (isa<TruncInst>(EntryVal))		if (isa<TruncInst>(EntryVal))
addMetadata(Entry, EntryVal);		addMetadata(Entry, EntryVal);

// Move the last step to the end of the latch block. This ensures consistent		// Move the last step to the end of the latch block. This ensures consistent
// placement of all induction updates.		// placement of all induction updates.
auto *LoopVectorLatch = LI->getLoopFor(LoopVectorBody)->getLoopLatch();		auto *LoopVectorLatch = LI->getLoopFor(LoopVectorBody)->getLoopLatch();
[... 16 lines not shown ...]	if (shouldScalarizeInstruction(IV))
return true;		return true;
auto isScalarInst = [&](User *U) -> bool {		auto isScalarInst = [&](User *U) -> bool {
auto *I = cast<Instruction>(U);		auto *I = cast<Instruction>(U);
return (OrigLoop->contains(I) && shouldScalarizeInstruction(I));		return (OrigLoop->contains(I) && shouldScalarizeInstruction(I));
};		};
return any_of(IV->users(), isScalarInst);		return any_of(IV->users(), isScalarInst);
}		}

void InnerLoopVectorizer::widenIntInduction(PHINode *IV, TruncInst *Trunc) {		void InnerLoopVectorizer::widenIntOrFpInduction(PHINode *IV, TruncInst *Trunc) {

		assert((IV->getType()->isIntegerTy() || IV != OldInduction) &&
		"Primary induction variable must have an integer type");

auto II = Legal->getInductionVars()->find(IV);		auto II = Legal->getInductionVars()->find(IV);
assert(II != Legal->getInductionVars()->end() && "IV is not an induction");		assert(II != Legal->getInductionVars()->end() && "IV is not an induction");

auto ID = II->second;		auto ID = II->second;
assert(IV->getType() == ID.getStartValue()->getType() && "Types must match");		assert(IV->getType() == ID.getStartValue()->getType() && "Types must match");

// The scalar value to broadcast. This will be derived from the canonical		// The scalar value to broadcast. This will be derived from the canonical
// induction variable.		// induction variable.
[... 11 lines not shown ...]	void InnerLoopVectorizer::widenIntOrFpInduction(PHINode *IV, TruncInst *Trunc) {
// least one user in the loop that is not widened.		// least one user in the loop that is not widened.
auto NeedsScalarIV = VF > 1 && needsScalarInduction(EntryVal);		auto NeedsScalarIV = VF > 1 && needsScalarInduction(EntryVal);

// Generate code for the induction step. Note that induction steps are		// Generate code for the induction step. Note that induction steps are
// required to be loop-invariant		// required to be loop-invariant
assert(PSE.getSE()->isLoopInvariant(ID.getStep(), OrigLoop) &&		assert(PSE.getSE()->isLoopInvariant(ID.getStep(), OrigLoop) &&
"Induction step should be loop invariant");		"Induction step should be loop invariant");
auto &DL = OrigLoop->getHeader()->getModule()->getDataLayout();		auto &DL = OrigLoop->getHeader()->getModule()->getDataLayout();
		Value *Step = nullptr;
		if (PSE.getSE()->isSCEVable(IV->getType())) {
SCEVExpander Exp(*PSE.getSE(), DL, "induction");		SCEVExpander Exp(*PSE.getSE(), DL, "induction");
Value *Step = Exp.expandCodeFor(ID.getStep(), ID.getStep()->getType(),		Step = Exp.expandCodeFor(ID.getStep(), ID.getStep()->getType(),
LoopVectorPreHeader->getTerminator());		LoopVectorPreHeader->getTerminator());
		} else {
		Step = cast<SCEVUnknown>(ID.getStep())->getValue();
		}

// Try to create a new independent vector induction variable. If we can't		// Try to create a new independent vector induction variable. If we can't
// create the phi node, we will splat the scalar induction variable in each		// create the phi node, we will splat the scalar induction variable in each
// loop iteration.		// loop iteration.
if (VF > 1 && !shouldScalarizeInstruction(EntryVal)) {		if (VF > 1 && !shouldScalarizeInstruction(EntryVal)) {
createVectorIntInductionPHI(ID, Step, EntryVal);		createVectorIntOrFpInductionPHI(ID, Step, EntryVal);
VectorizedIV = true;		VectorizedIV = true;
}		}

// If we haven't yet vectorized the induction variable, or if we will create		// If we haven't yet vectorized the induction variable, or if we will create
// a scalar one, we need to define the scalar induction variable and step		// a scalar one, we need to define the scalar induction variable and step
// values. If we were given a truncation type, truncate the canonical		// values. If we were given a truncation type, truncate the canonical
// induction variable and step. Otherwise, derive these values from the		// induction variable and step. Otherwise, derive these values from the
// induction descriptor.		// induction descriptor.
if (!VectorizedIV || NeedsScalarIV) {		if (!VectorizedIV || NeedsScalarIV) {
if (Trunc) {		if (Trunc) {
auto *TruncType = cast<IntegerType>(Trunc->getType());		auto *TruncType = cast<IntegerType>(Trunc->getType());
assert(Step->getType()->isIntegerTy() &&		assert(Step->getType()->isIntegerTy() &&
"Truncation requires an integer step");		"Truncation requires an integer step");
ScalarIV = Builder.CreateCast(Instruction::Trunc, Induction, TruncType);		ScalarIV = Builder.CreateCast(Instruction::Trunc, Induction, TruncType);
Step = Builder.CreateTrunc(Step, TruncType);		Step = Builder.CreateTrunc(Step, TruncType);
} else {		} else {
ScalarIV = Induction;		ScalarIV = Induction;
if (IV != OldInduction) {		if (IV != OldInduction) {
ScalarIV = Builder.CreateSExtOrTrunc(ScalarIV, IV->getType());		ScalarIV = IV->getType()->isIntegerTy()
		? Builder.CreateSExtOrTrunc(ScalarIV, IV->getType())
		: Builder.CreateCast(Instruction::SIToFP, Induction,
		IV->getType());
ScalarIV = ID.transform(Builder, ScalarIV, PSE.getSE(), DL);		ScalarIV = ID.transform(Builder, ScalarIV, PSE.getSE(), DL);
ScalarIV->setName("offset.idx");		ScalarIV->setName("offset.idx");
}		}
}		}
}		}

// If we haven't yet vectorized the induction variable, splat the scalar		// If we haven't yet vectorized the induction variable, splat the scalar
// induction variable, and build the necessary step vectors.		// induction variable, and build the necessary step vectors.
if (!VectorizedIV) {		if (!VectorizedIV) {
Value *Broadcasted = getBroadcastInstrs(ScalarIV);		Value *Broadcasted = getBroadcastInstrs(ScalarIV);
VectorParts Entry(UF);		VectorParts Entry(UF);
for (unsigned Part = 0; Part < UF; ++Part)		for (unsigned Part = 0; Part < UF; ++Part)
Entry[Part] = getStepVector(Broadcasted, VF * Part, Step);		Entry[Part] =
		getStepVector(Broadcasted, VF * Part, Step, ID.getInductionOpcode());
VectorLoopValueMap.initVector(EntryVal, Entry);		VectorLoopValueMap.initVector(EntryVal, Entry);
if (Trunc)		if (Trunc)
addMetadata(Entry, Trunc);		addMetadata(Entry, Trunc);
}		}

// If an induction variable is only used for counting loop iterations or		// If an induction variable is only used for counting loop iterations or
// calculating addresses, it doesn't need to be widened. Create scalar steps		// calculating addresses, it doesn't need to be widened. Create scalar steps
// that can be used by instructions we will later scalarize. Note that the		// that can be used by instructions we will later scalarize. Note that the
// addition of the scalar steps will not increase the number of instructions		// addition of the scalar steps will not increase the number of instructions
// in the loop in the common case prior to InstCombine. We will be trading		// in the loop in the common case prior to InstCombine. We will be trading
// one vector extract for each scalar step.		// one vector extract for each scalar step.
if (NeedsScalarIV)		if (NeedsScalarIV)
buildScalarSteps(ScalarIV, Step, EntryVal);		buildScalarSteps(ScalarIV, Step, EntryVal, ID);
}		}

Value *InnerLoopVectorizer::getStepVector(Value *Val, int StartIdx, Value *Step,		Value *InnerLoopVectorizer::getStepVector(Value *Val, int StartIdx, Value *Step,
Instruction::BinaryOps BinOp) {		Instruction::BinaryOps BinOp) {
// Create and check the types.		// Create and check the types.
assert(Val->getType()->isVectorTy() && "Must be a vector");		assert(Val->getType()->isVectorTy() && "Must be a vector");
int VLen = Val->getType()->getVectorNumElements();		int VLen = Val->getType()->getVectorNumElements();

[... 43 lines not shown ...]	Value *InnerLoopVectorizer::getStepVector(Value *Val, int StartIdx, Value *Step,

Value *BOp = Builder.CreateBinOp(BinOp, Val, MulOp, "induction");		Value *BOp = Builder.CreateBinOp(BinOp, Val, MulOp, "induction");
if (isa<Instruction>(BOp))		if (isa<Instruction>(BOp))
cast<Instruction>(BOp)->setFastMathFlags(Flags);		cast<Instruction>(BOp)->setFastMathFlags(Flags);
return BOp;		return BOp;
}		}

void InnerLoopVectorizer::buildScalarSteps(Value *ScalarIV, Value *Step,		void InnerLoopVectorizer::buildScalarSteps(Value *ScalarIV, Value *Step,
Value *EntryVal) {		Value *EntryVal,
		const InductionDescriptor &ID) {

// We shouldn't have to build scalar steps if we aren't vectorizing.		// We shouldn't have to build scalar steps if we aren't vectorizing.
assert(VF > 1 && "VF should be greater than one");		assert(VF > 1 && "VF should be greater than one");

// Get the value type and ensure it and the step have the same integer type.		// Get the value type and ensure it and the step have the same integer type.
Type *ScalarIVTy = ScalarIV->getType()->getScalarType();		Type *ScalarIVTy = ScalarIV->getType()->getScalarType();
assert(ScalarIVTy->isIntegerTy() && ScalarIVTy == Step->getType() &&		assert(ScalarIVTy == Step->getType() &&
"Val and Step should have the same integer type");		"Val and Step should have the same type");

		// We build scalar steps for both integer and floating-point induction
		// variables. Here, we determine the kind of arithmetic we will perform.
		Instruction::BinaryOps AddOp;
		Instruction::BinaryOps MulOp;
		if (ScalarIVTy->isIntegerTy()) {
		AddOp = Instruction::Add;
		MulOp = Instruction::Mul;
		} else {
		AddOp = ID.getInductionOpcode();
		MulOp = Instruction::FMul;
		}

// Determine the number of scalars we need to generate for each unroll		// Determine the number of scalars we need to generate for each unroll
// iteration. If EntryVal is uniform, we only need to generate the first		// iteration. If EntryVal is uniform, we only need to generate the first
// lane. Otherwise, we generate all VF values.		// lane. Otherwise, we generate all VF values.
unsigned Lanes =		unsigned Lanes =
Cost->isUniformAfterVectorization(cast<Instruction>(EntryVal), VF) ? 1 : VF;		Cost->isUniformAfterVectorization(cast<Instruction>(EntryVal), VF) ? 1 : VF;

// Compute the scalar steps and save the results in VectorLoopValueMap.		// Compute the scalar steps and save the results in VectorLoopValueMap.
ScalarParts Entry(UF);		ScalarParts Entry(UF);
for (unsigned Part = 0; Part < UF; ++Part) {		for (unsigned Part = 0; Part < UF; ++Part) {
Entry[Part].resize(VF);		Entry[Part].resize(VF);
for (unsigned Lane = 0; Lane < Lanes; ++Lane) {		for (unsigned Lane = 0; Lane < Lanes; ++Lane) {
auto *StartIdx = ConstantInt::get(ScalarIVTy, VF * Part + Lane);		auto *StartIdx = getSignedIntOrFpConstant(ScalarIVTy, VF * Part + Lane);
auto *Mul = Builder.CreateMul(StartIdx, Step);		auto *Mul = addFastMathFlag(Builder.CreateBinOp(MulOp, StartIdx, Step));
auto *Add = Builder.CreateAdd(ScalarIV, Mul);		auto *Add = addFastMathFlag(Builder.CreateBinOp(AddOp, ScalarIV, Mul));
Entry[Part][Lane] = Add;		Entry[Part][Lane] = Add;
}		}
}		}
VectorLoopValueMap.initScalar(EntryVal, Entry);		VectorLoopValueMap.initScalar(EntryVal, Entry);
}		}
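
As a reading aid for the loop above, here is a minimal sketch in plain C++ (not LLVM API code) of the values that end up in Entry[Part][Lane] for a floating-point induction. The StepOp enum and the scalarSteps function are hypothetical names introduced for illustration; StepOp stands in for the induction opcode raised in the inline comments, which for FP may be FADD or FSUB.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for the induction opcode (FAdd or FSub for FP).
enum class StepOp { Add, Sub };

// Models the scalar-step computation for an FP induction:
//   Entry[Part][Lane] = ScalarIV (op) (VF * Part + Lane) * Step
// When Uniform is true, only lane 0 of each part is computed; the
// remaining lanes are left value-initialized (0.0f) in this model.
std::vector<std::vector<float>> scalarSteps(float ScalarIV, float Step,
                                            StepOp Op, unsigned VF,
                                            unsigned UF, bool Uniform) {
  assert(VF > 0 && UF > 0 && "expected at least one lane and one part");
  const unsigned Lanes = Uniform ? 1 : VF;
  std::vector<std::vector<float>> Entry(UF);
  for (unsigned Part = 0; Part < UF; ++Part) {
    Entry[Part].resize(VF);
    for (unsigned Lane = 0; Lane < Lanes; ++Lane) {
      const float StartIdx = static_cast<float>(VF * Part + Lane);
      const float Mul = StartIdx * Step;
      Entry[Part][Lane] =
          (Op == StepOp::Add) ? ScalarIV + Mul : ScalarIV - Mul;
    }
  }
  return Entry;
}
```

For example, scalarSteps(10.0f, 0.5f, StepOp::Sub, 4, 2, false) fills part 0 with 10.0, 9.5, 9.0, 8.5 and part 1 with 8.0, 7.5, 7.0, 6.5.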

int LoopVectorizationLegality::isConsecutivePtr(Value *Ptr) {		int LoopVectorizationLegality::isConsecutivePtr(Value *Ptr) {

[...]	if (Instruction *V = CSEMap.lookup(In)) {
In->eraseFromParent();		In->eraseFromParent();
continue;		continue;
}		}

CSEMap[In] = In;		CSEMap[In] = In;
}		}
}		}

/// \brief Adds a 'fast' flag to floating point operations.
static Value *addFastMathFlag(Value *V) {
if (isa<FPMathOperator>(V)) {
FastMathFlags Flags;
Flags.setUnsafeAlgebra();
cast<Instruction>(V)->setFastMathFlags(Flags);
}
return V;
}
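
A note on why the flag is relevant here, offered as an illustration rather than as part of the patch: widening an FP induction replaces a chain of scalar additions with multiples of the step, which is a reassociation that strict IEEE semantics do not permit in general. The sketch below is plain C++ with hypothetical function names and contrasts the two evaluation orders. For a step such as 0.5f that is exactly representable the results agree; for a step such as 0.1f they can differ in the low bits.

```cpp
// Value of x after N iterations of "x += Step", as the scalar loop computes it.
float repeatedAdd(float Init, float Step, unsigned N) {
  float X = Init;
  for (unsigned I = 0; I < N; ++I)
    X += Step;
  return X;
}

// Closed form for iteration N: Init + N * Step. A widened induction
// evaluates lanes in this style instead of chaining N additions.
float closedForm(float Init, float Step, unsigned N) {
  return Init + static_cast<float>(N) * Step;
}
```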

/// \brief Estimate the overhead of scalarizing an instruction. This is a		/// \brief Estimate the overhead of scalarizing an instruction. This is a
/// convenience wrapper for the type-based getScalarizationOverhead API.		/// convenience wrapper for the type-based getScalarizationOverhead API.
static unsigned getScalarizationOverhead(Instruction *I, unsigned VF,		static unsigned getScalarizationOverhead(Instruction *I, unsigned VF,
const TargetTransformInfo &TTI) {		const TargetTransformInfo &TTI) {
if (VF == 1)		if (VF == 1)
return 0;		return 0;

unsigned Cost = 0;		unsigned Cost = 0;
[...]	void InnerLoopVectorizer::widenPHIInstruction(Instruction *PN, unsigned UF,
const DataLayout &DL = OrigLoop->getHeader()->getModule()->getDataLayout();		const DataLayout &DL = OrigLoop->getHeader()->getModule()->getDataLayout();

// FIXME: The newly created binary instructions should contain nsw/nuw flags,		// FIXME: The newly created binary instructions should contain nsw/nuw flags,
// which can be found from the original scalar operations.		// which can be found from the original scalar operations.
switch (II.getKind()) {		switch (II.getKind()) {
case InductionDescriptor::IK_NoInduction:		case InductionDescriptor::IK_NoInduction:
llvm_unreachable("Unknown induction");		llvm_unreachable("Unknown induction");
case InductionDescriptor::IK_IntInduction:		case InductionDescriptor::IK_IntInduction:
return widenIntInduction(P);		case InductionDescriptor::IK_FpInduction:
		return widenIntOrFpInduction(P);
case InductionDescriptor::IK_PtrInduction: {		case InductionDescriptor::IK_PtrInduction: {
// Handle the pointer induction variable case.		// Handle the pointer induction variable case.
assert(P->getType()->isPointerTy() && "Unexpected type.");		assert(P->getType()->isPointerTy() && "Unexpected type.");
		delena: Is the pointer induction really different from int and FP induction? Maybe you can put everything under widenInduction?
		mssimpso: It's not, and I'm planning to work on this soon. Thanks!
		delena: I'd put them all in one patch and call the function widenInduction(), but it is not a stumbling block anyway.
		mssimpso: Would you mind if we did this in a separate patch? I was trying to focus this one on floating-point, while keeping everything else as NFC as possible. Once committed, we can make sure no performance issues crop up for anyone before moving on to the pointer induction variables. The suggestion makes sense - I just would like to keep things small for review and codegen investigations should the need arise.
		mssimpso: Hi Elena, I'm going to go ahead and commit this - assuming no issues arise over the weekend, I'll submit a patch to take care of the pointer inductions early next week, like you suggested.
// This is the normalized GEP that starts counting at zero.		// This is the normalized GEP that starts counting at zero.
Value *PtrInd = Induction;		Value *PtrInd = Induction;
PtrInd = Builder.CreateSExtOrTrunc(PtrInd, II.getStep()->getType());		PtrInd = Builder.CreateSExtOrTrunc(PtrInd, II.getStep()->getType());
// Determine the number of scalars we need to generate for each unroll		// Determine the number of scalars we need to generate for each unroll
// iteration. If the instruction is uniform, we only need to generate the		// iteration. If the instruction is uniform, we only need to generate the
// first lane. Otherwise, we generate all VF values.		// first lane. Otherwise, we generate all VF values.
unsigned Lanes = Cost->isUniformAfterVectorization(P, VF) ? 1 : VF;		unsigned Lanes = Cost->isUniformAfterVectorization(P, VF) ? 1 : VF;
// These are the scalar results. Notice that we don't generate vector GEPs		// These are the scalar results. Notice that we don't generate vector GEPs
// because scalar GEPs result in better code.		// because scalar GEPs result in better code.
ScalarParts Entry(UF);		ScalarParts Entry(UF);
for (unsigned Part = 0; Part < UF; ++Part) {		for (unsigned Part = 0; Part < UF; ++Part) {
Entry[Part].resize(VF);		Entry[Part].resize(VF);
for (unsigned Lane = 0; Lane < Lanes; ++Lane) {		for (unsigned Lane = 0; Lane < Lanes; ++Lane) {
Constant *Idx = ConstantInt::get(PtrInd->getType(), Lane + Part * VF);		Constant *Idx = ConstantInt::get(PtrInd->getType(), Lane + Part * VF);
Value *GlobalIdx = Builder.CreateAdd(PtrInd, Idx);		Value *GlobalIdx = Builder.CreateAdd(PtrInd, Idx);
Value *SclrGep = II.transform(Builder, GlobalIdx, PSE.getSE(), DL);		Value *SclrGep = II.transform(Builder, GlobalIdx, PSE.getSE(), DL);
SclrGep->setName("next.gep");		SclrGep->setName("next.gep");
Entry[Part][Lane] = SclrGep;		Entry[Part][Lane] = SclrGep;
}		}
}		}
VectorLoopValueMap.initScalar(P, Entry);		VectorLoopValueMap.initScalar(P, Entry);
return;		return;
}		}
case InductionDescriptor::IK_FpInduction: {
assert(P->getType() == II.getStartValue()->getType() &&
"Types must match");
// Handle other induction variables that are now based on the
// canonical one.
assert(P != OldInduction && "Primary induction can be integer only");

Value *V = Builder.CreateCast(Instruction::SIToFP, Induction, P->getType());
V = II.transform(Builder, V, PSE.getSE(), DL);
V->setName("fp.offset.idx");

// Now we have scalar op: %fp.offset.idx = StartVal +/- Induction*StepVal

Value *Broadcasted = getBroadcastInstrs(V);
// After broadcasting the induction variable we need to make the vector
// consecutive by adding StepVal*0, StepVal*1, StepVal*2, etc.
Value *StepVal = cast<SCEVUnknown>(II.getStep())->getValue();
VectorParts Entry(UF);
for (unsigned part = 0; part < UF; ++part)
Entry[part] = getStepVector(Broadcasted, VF * part, StepVal,
II.getInductionOpcode());
VectorLoopValueMap.initVector(P, Entry);
return;
}
}		}
}		}

/// A helper function for checking whether an integer division-related		/// A helper function for checking whether an integer division-related
/// instruction may divide by zero (in which case it must be predicated if		/// instruction may divide by zero (in which case it must be predicated if
/// executed conditionally in the scalar code).		/// executed conditionally in the scalar code).
/// TODO: It may be worthwhile to generalize and check isKnownNonZero().		/// TODO: It may be worthwhile to generalize and check isKnownNonZero().
/// Non-zero divisors that are non compile-time constants will not be		/// Non-zero divisors that are non compile-time constants will not be
[...]	case Instruction::BitCast: {
auto *CI = dyn_cast<CastInst>(&I);		auto *CI = dyn_cast<CastInst>(&I);
setDebugLocFromInst(Builder, CI);		setDebugLocFromInst(Builder, CI);

// Optimize the special case where the source is a constant integer		// Optimize the special case where the source is a constant integer
// induction variable. Notice that we can only optimize the 'trunc' case		// induction variable. Notice that we can only optimize the 'trunc' case
// because (a) FP conversions lose precision, (b) sext/zext may wrap, and		// because (a) FP conversions lose precision, (b) sext/zext may wrap, and
// (c) other casts depend on pointer size.		// (c) other casts depend on pointer size.
if (Cost->isOptimizableIVTruncate(CI, VF)) {		if (Cost->isOptimizableIVTruncate(CI, VF)) {
widenIntInduction(cast<PHINode>(CI->getOperand(0)),		widenIntOrFpInduction(cast<PHINode>(CI->getOperand(0)),
cast<TruncInst>(CI));		cast<TruncInst>(CI));
break;		break;
}		}

/// Vectorize casts.		/// Vectorize casts.
Type *DestTy =		Type *DestTy =
(VF == 1) ? CI->getType() : VectorType::get(CI->getType(), VF);		(VF == 1) ? CI->getType() : VectorType::get(CI->getType(), VF);

const VectorParts &A = getVectorValue(CI->getOperand(0));		const VectorParts &A = getVectorValue(CI->getOperand(0));
[...]
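
The comment in the cast-widening hunk says only the 'trunc' of an integer induction variable can be optimized. One way to see point (b): truncation commutes with wrapping addition, so a narrow induction tracks the truncated wide one exactly, whereas zero extension stops matching once the narrow value wraps. A sketch in plain C++ with hypothetical function names, not LLVM code:

```cpp
#include <cstdint>

// Returns true if trunc(Wide) equals a separately stepped 32-bit IV on
// every iteration. Both sides are arithmetic modulo 2^32, so this holds.
bool truncCommutes(uint64_t Start, uint64_t Step, unsigned Iters) {
  uint64_t Wide = Start;
  uint32_t Narrow = static_cast<uint32_t>(Start);
  for (unsigned I = 0; I < Iters; ++I) {
    if (static_cast<uint32_t>(Wide) != Narrow)
      return false;
    Wide += Step;
    Narrow += static_cast<uint32_t>(Step);
  }
  return true;
}

// Returns true if zext(Narrow) equals a separately stepped 32-bit IV on
// every iteration. This fails as soon as the 8-bit IV wraps around.
bool zextCommutes(uint8_t Start, uint8_t Step, unsigned Iters) {
  uint8_t Narrow = Start;
  uint32_t Wide = Start;
  for (unsigned I = 0; I < Iters; ++I) {
    if (static_cast<uint32_t>(Narrow) != Wide)
      return false;
    Narrow = static_cast<uint8_t>(Narrow + Step);
    Wide += Step;
  }
  return true;
}
```

Starting the 8-bit IV at 250 with step 1 makes zextCommutes fail within ten iterations, while truncCommutes holds even when the 64-bit IV crosses the 32-bit boundary.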

test/Transforms/LoopVectorize/float-induction.ll

	; RUN: opt < %s -loop-vectorize -force-vector-interleave=1 -force-vector-width=4 -dce -instcombine -S | FileCheck --check-prefix VEC4_INTERL1 %s			; RUN: opt < %s -loop-vectorize -force-vector-interleave=1 -force-vector-width=4 -dce -instcombine -S | FileCheck --check-prefix VEC4_INTERL1 %s
	; RUN: opt < %s -loop-vectorize -force-vector-interleave=2 -force-vector-width=4 -dce -instcombine -S | FileCheck --check-prefix VEC4_INTERL2 %s			; RUN: opt < %s -loop-vectorize -force-vector-interleave=2 -force-vector-width=4 -dce -instcombine -S | FileCheck --check-prefix VEC4_INTERL2 %s
	; RUN: opt < %s -loop-vectorize -force-vector-interleave=2 -force-vector-width=1 -dce -instcombine -S | FileCheck --check-prefix VEC1_INTERL2 %s			; RUN: opt < %s -loop-vectorize -force-vector-interleave=2 -force-vector-width=1 -dce -instcombine -S | FileCheck --check-prefix VEC1_INTERL2 %s
	; RUN: opt < %s -loop-vectorize -force-vector-interleave=1 -force-vector-width=2 -dce -simplifycfg -instcombine -S | FileCheck --check-prefix VEC2_INTERL1_PRED_STORE %s			; RUN: opt < %s -loop-vectorize -force-vector-interleave=1 -force-vector-width=2 -dce -simplifycfg -instcombine -S | FileCheck --check-prefix VEC2_INTERL1_PRED_STORE %s
				mkuper: Did you run it through the update script? If you did, could you have the diff show the actual diff vs. running it with the old code?
				mssimpso: Hey Michael, can you clarify what you mean here? I did use the script to help generate the checks (then moved the test into this file). You're just wanting to see the lines that are different with and without the patch?
				mkuper: Yes, otherwise it's kind of hard to see what changed. I suggest running the script over the test with the existing code, committing that test, and rebasing this patch on that. That way we can actually see what happened.
				mssimpso: Great, that's what I was thinking as well.
				mkuper: Sorry, I meant for all test cases you changed, not just the new one.
				mssimpso: Ah, I see! Sorry for the confusion. I'll run the script over the tests I changed and rebase.

	; VEC4_INTERL1-LABEL: @fp_iv_loop1(			; VEC4_INTERL1-LABEL: @fp_iv_loop1(
	; VEC4_INTERL1: %[[FP_INC:.*]] = load float, float* @fp_inc			; VEC4_INTERL1: vector.ph:
				; VEC4_INTERL1-NEXT: [[DOTSPLATINSERT:%.*]] = insertelement <4 x float> undef, float %init, i32 0
				; VEC4_INTERL1-NEXT: [[DOTSPLAT:%.*]] = shufflevector <4 x float> [[DOTSPLATINSERT]], <4 x float> undef, <4 x i32> zeroinitializer
				; VEC4_INTERL1-NEXT: [[DOTSPLATINSERT2:%.*]] = insertelement <4 x float> undef, float %fpinc, i32 0
				; VEC4_INTERL1-NEXT: [[DOTSPLAT3:%.*]] = shufflevector <4 x float> [[DOTSPLATINSERT2]], <4 x float> undef, <4 x i32> zeroinitializer
				; VEC4_INTERL1-NEXT: [[TMP5:%.*]] = fmul fast <4 x float> [[DOTSPLAT3]], <float 0.000000e+00, float 1.000000e+00, float 2.000000e+00, float 3.000000e+00>
				; VEC4_INTERL1-NEXT: [[INDUCTION4:%.*]] = fsub fast <4 x float> [[DOTSPLAT]], [[TMP5]]
				; VEC4_INTERL1-NEXT: [[TMP6:%.*]] = fmul fast float %fpinc, 4.000000e+00
				; VEC4_INTERL1-NEXT: [[DOTSPLATINSERT5:%.*]] = insertelement <4 x float> undef, float [[TMP6]], i32 0
				; VEC4_INTERL1-NEXT: [[DOTSPLAT6:%.*]] = shufflevector <4 x float> [[DOTSPLATINSERT5]], <4 x float> undef, <4 x i32> zeroinitializer
				; VEC4_INTERL1-NEXT: br label %vector.body
	; VEC4_INTERL1: vector.body:			; VEC4_INTERL1: vector.body:
	; VEC4_INTERL1: %[[FP_INDEX:.*]] = sitofp i64 {{.*}} to float			; VEC4_INTERL1-NEXT: [[INDEX:%.*]] = phi i64 [ 0, %vector.ph ], [ [[INDEX_NEXT:%.*]], %vector.body ]
	; VEC4_INTERL1: %[[VEC_INCR:.*]] = fmul fast float {{.*}}, %[[FP_INDEX]]			; VEC4_INTERL1-NEXT: [[VEC_IND:%.*]] = phi <4 x float> [ [[INDUCTION4]], %vector.ph ], [ [[VEC_IND_NEXT:%.*]], %vector.body ]
	; VEC4_INTERL1: %[[FP_OFFSET_IDX:.*]] = fsub fast float %init, %[[VEC_INCR]]			; VEC4_INTERL1-NEXT: [[TMP7:%.*]] = getelementptr inbounds float, float* %A, i64 [[INDEX]]
	; VEC4_INTERL1: %[[BRCT_INSERT:.*]] = insertelement <4 x float> undef, float %[[FP_OFFSET_IDX]], i32 0			; VEC4_INTERL1-NEXT: [[TMP8:%.*]] = bitcast float* [[TMP7]] to <4 x float>*
	; VEC4_INTERL1-NEXT: %[[BRCT_SPLAT:.*]] = shufflevector <4 x float> %[[BRCT_INSERT]], <4 x float> undef, <4 x i32> zeroinitializer			; VEC4_INTERL1-NEXT: store <4 x float> [[VEC_IND]], <4 x float>* [[TMP8]], align 4
	; VEC4_INTERL1: %[[BRCT_INSERT:.*]] = insertelement {{.*}} %[[FP_INC]]			; VEC4_INTERL1-NEXT: [[INDEX_NEXT]] = add i64 [[INDEX]], 4
	; VEC4_INTERL1-NEXT: %[[FP_INC_BCST:.*]] = shufflevector <4 x float> %[[BRCT_INSERT]], {{.*}} zeroinitializer			; VEC4_INTERL1-NEXT: [[VEC_IND_NEXT]] = fsub fast <4 x float> [[VEC_IND]], [[DOTSPLAT6]]
	; VEC4_INTERL1: %[[VSTEP:.*]] = fmul fast <4 x float> %[[FP_INC_BCST]], <float 0.000000e+00, float 1.000000e+00, float 2.000000e+00, float 3.000000e+00>			; VEC4_INTERL1: br i1 {{.*}}, label %middle.block, label %vector.body
	; VEC4_INTERL1-NEXT: %[[VEC_INDUCTION:.*]] = fsub fast <4 x float> %[[BRCT_SPLAT]], %[[VSTEP]]
	; VEC4_INTERL1: store <4 x float> %[[VEC_INDUCTION]]
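
The check lines for fp_iv_loop1 show the change in generated code: the removed checks (left) rebuild the vector from the integer index on every iteration (sitofp, fmul, fsub, splat), while the new checks (right) compute the start vector once in vector.ph and carry it in a vector phi. Below is a minimal C++ model of both shapes, assuming VF = 4, an fsub induction as in fp_iv_loop1, and a trip count that is a multiple of VF. The names oldShape and newShape are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <vector>

constexpr unsigned VF = 4;

// Old shape: derive the vector from the integer index in every iteration.
std::vector<float> oldShape(float Init, float Step, unsigned N) {
  assert(N % VF == 0 && "model assumes no scalar remainder loop");
  std::vector<float> A(N);
  for (unsigned Index = 0; Index < N; Index += VF) {
    // Corresponds to fp.offset.idx = init - sitofp(index) * step.
    const float OffsetIdx = Init - static_cast<float>(Index) * Step;
    for (unsigned L = 0; L < VF; ++L)
      A[Index + L] = OffsetIdx - Step * static_cast<float>(L);
  }
  return A;
}

// New shape: a vector phi initialized in vector.ph and updated by a splat.
std::vector<float> newShape(float Init, float Step, unsigned N) {
  assert(N % VF == 0 && "model assumes no scalar remainder loop");
  std::vector<float> A(N);
  std::array<float, VF> VecInd;
  for (unsigned L = 0; L < VF; ++L)
    VecInd[L] = Init - Step * static_cast<float>(L); // start vector
  const float Splat = Step * static_cast<float>(VF); // step * VF
  for (unsigned Index = 0; Index < N; Index += VF) {
    for (unsigned L = 0; L < VF; ++L) {
      A[Index + L] = VecInd[L]; // store of the vector phi
      VecInd[L] -= Splat;       // next value of the vector phi
    }
  }
  return A;
}
```

With an exactly representable step such as 0.25f, both functions produce identical arrays; the new shape simply avoids the per-iteration int-to-FP conversion and splat.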

	; VEC4_INTERL2-LABEL: @fp_iv_loop1(			; VEC4_INTERL2-LABEL: @fp_iv_loop1(
	; VEC4_INTERL2: %[[FP_INC:.*]] = load float, float* @fp_inc			; VEC4_INTERL2: vector.ph:
				; VEC4_INTERL2-NEXT: [[DOTSPLATINSERT:%.*]] = insertelement <4 x float> undef, float %init, i32 0
				; VEC4_INTERL2-NEXT: [[DOTSPLAT:%.*]] = shufflevector <4 x float> [[DOTSPLATINSERT]], <4 x float> undef, <4 x i32> zeroinitializer
				; VEC4_INTERL2-NEXT: [[DOTSPLATINSERT3:%.*]] = insertelement <4 x float> undef, float %fpinc, i32 0
				; VEC4_INTERL2-NEXT: [[DOTSPLAT4:%.*]] = shufflevector <4 x float> [[DOTSPLATINSERT3]], <4 x float> undef, <4 x i32> zeroinitializer
				; VEC4_INTERL2-NEXT: [[TMP5:%.*]] = fmul fast <4 x float> [[DOTSPLAT4]], <float 0.000000e+00, float 1.000000e+00, float 2.000000e+00, float 3.000000e+00>
				; VEC4_INTERL2-NEXT: [[INDUCTION5:%.*]] = fsub fast <4 x float> [[DOTSPLAT]], [[TMP5]]
				; VEC4_INTERL2-NEXT: [[TMP6:%.*]] = fmul fast float %fpinc, 4.000000e+00
				; VEC4_INTERL2-NEXT: [[DOTSPLATINSERT6:%.*]] = insertelement <4 x float> undef, float [[TMP6]], i32 0
				; VEC4_INTERL2-NEXT: [[DOTSPLAT7:%.*]] = shufflevector <4 x float> [[DOTSPLATINSERT6]], <4 x float> undef, <4 x i32> zeroinitializer
				; VEC4_INTERL2-NEXT: br label %vector.body
	; VEC4_INTERL2: vector.body:			; VEC4_INTERL2: vector.body:
	; VEC4_INTERL2: %[[INDEX:.*]] = sitofp i64 {{.*}} to float			; VEC4_INTERL2-NEXT: [[INDEX:%.*]] = phi i64 [ 0, %vector.ph ], [ [[INDEX_NEXT:%.*]], %vector.body ]
	; VEC4_INTERL2: %[[VEC_INCR:.*]] = fmul fast float %{{.*}}, %[[INDEX]]			; VEC4_INTERL2-NEXT: [[VEC_IND:%.*]] = phi <4 x float> [ [[INDUCTION5]], %vector.ph ], [ [[VEC_IND_NEXT:%.*]], %vector.body ]
	; VEC4_INTERL2: fsub fast float %init, %[[VEC_INCR]]			; VEC4_INTERL2-NEXT: [[STEP_ADD:%.*]] = fsub fast <4 x float> [[VEC_IND]], [[DOTSPLAT7]]
	; VEC4_INTERL2: %[[VSTEP1:.*]] = fmul fast <4 x float> %{{.*}}, <float 0.000000e+00, float 1.000000e+00, float 2.000000e+00, float 3.000000e+00>			; VEC4_INTERL2-NEXT: [[TMP7:%.*]] = getelementptr inbounds float, float* %A, i64 [[INDEX]]
	; VEC4_INTERL2-NEXT: %[[VEC_INDUCTION1:.*]] = fsub fast <4 x float> {{.*}}, %[[VSTEP1]]			; VEC4_INTERL2-NEXT: [[TMP8:%.*]] = bitcast float* [[TMP7]] to <4 x float>*
	; VEC4_INTERL2: %[[VSTEP2:.*]] = fmul fast <4 x float> %{{.*}}, <float 4.000000e+00, float 5.000000e+00, float 6.000000e+00, float 7.000000e+00>			; VEC4_INTERL2-NEXT: store <4 x float> [[VEC_IND]], <4 x float>* [[TMP8]], align 4
	; VEC4_INTERL2-NEXT: %[[VEC_INDUCTION2:.*]] = fsub fast <4 x float> {{.*}}, %[[VSTEP2]]			; VEC4_INTERL2-NEXT: [[TMP9:%.*]] = getelementptr float, float* [[TMP7]], i64 4
	; VEC4_INTERL2: store <4 x float> %[[VEC_INDUCTION1]]			; VEC4_INTERL2-NEXT: [[TMP10:%.*]] = bitcast float* [[TMP9]] to <4 x float>*
	; VEC4_INTERL2: store <4 x float> %[[VEC_INDUCTION2]]			; VEC4_INTERL2-NEXT: store <4 x float> [[STEP_ADD]], <4 x float>* [[TMP10]], align 4
				; VEC4_INTERL2-NEXT: [[INDEX_NEXT]] = add i64 [[INDEX]], 8
				; VEC4_INTERL2-NEXT: [[VEC_IND_NEXT]] = fsub fast <4 x float> [[STEP_ADD]], [[DOTSPLAT7]]
				; VEC4_INTERL2: br i1 {{.*}}, label %middle.block, label %vector.body
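
With an interleave count of 2, the new checks derive the second part (STEP_ADD) from the vector phi with one extra fsub, and the phi then advances from STEP_ADD rather than from VEC_IND. The C++ model below follows that structure and compares it with the scalar loop. It assumes VF = 4, UF = 2, an fsub induction, and a trip count that is a multiple of 8; the function names are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <vector>

// Reference: the scalar loop "A[i] = x; x -= Step;".
std::vector<float> scalarLoop(float Init, float Step, unsigned N) {
  std::vector<float> A(N);
  float X = Init;
  for (unsigned I = 0; I < N; ++I) {
    A[I] = X;
    X -= Step;
  }
  return A;
}

// Model of the VF = 4, UF = 2 vector loop from the new check lines.
std::vector<float> interleaved(float Init, float Step, unsigned N) {
  assert(N % 8 == 0 && "model assumes no scalar remainder loop");
  std::vector<float> A(N);
  std::array<float, 4> VecInd;
  for (unsigned L = 0; L < 4; ++L)
    VecInd[L] = Init - Step * static_cast<float>(L); // start vector
  const float Splat = Step * 4.0f;                   // step * VF
  for (unsigned Index = 0; Index < N; Index += 8) {
    std::array<float, 4> StepAdd;
    for (unsigned L = 0; L < 4; ++L)
      StepAdd[L] = VecInd[L] - Splat; // second unrolled part
    for (unsigned L = 0; L < 4; ++L) {
      A[Index + L] = VecInd[L];       // first store
      A[Index + 4 + L] = StepAdd[L];  // second store, offset by VF
    }
    for (unsigned L = 0; L < 4; ++L)
      VecInd[L] = StepAdd[L] - Splat; // phi advances from the last part
  }
  return A;
}
```

Each loop iteration therefore covers eight elements while performing only two vector subtractions on the induction.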

	; VEC1_INTERL2-LABEL: @fp_iv_loop1(			; VEC1_INTERL2-LABEL: @fp_iv_loop1(
	; VEC1_INTERL2: %[[FP_INC:.*]] = load float, float* @fp_inc			; VEC1_INTERL2: %[[FP_INC:.*]] = load float, float* @fp_inc
	; VEC1_INTERL2: vector.body:			; VEC1_INTERL2: vector.body:
	; VEC1_INTERL2: %[[INDEX:.*]] = sitofp i64 {{.*}} to float			; VEC1_INTERL2: %[[INDEX:.*]] = sitofp i64 {{.*}} to float
	; VEC1_INTERL2: %[[STEP:.*]] = fmul fast float %{{.*}}, %[[INDEX]]			; VEC1_INTERL2: %[[STEP:.*]] = fmul fast float %{{.*}}, %[[INDEX]]
	; VEC1_INTERL2: %[[FP_OFFSET_IDX:.*]] = fsub fast float %init, %[[STEP]]			; VEC1_INTERL2: %[[FP_OFFSET_IDX:.*]] = fsub fast float %init, %[[STEP]]
	; VEC1_INTERL2: %[[SCALAR_INDUCTION2:.*]] = fsub fast float %[[FP_OFFSET_IDX]], %[[FP_INC]]			; VEC1_INTERL2: %[[SCALAR_INDUCTION2:.*]] = fsub fast float %[[FP_OFFSET_IDX]], %[[FP_INC]]
	[...]
	; float x = init;			; float x = init;
	; for (int i=0; i < N; ++i) {			; for (int i=0; i < N; ++i) {
	; A[i] = x;			; A[i] = x;
	; x += 0.5;			; x += 0.5;
	; }			; }
	;}			;}

	; VEC4_INTERL1-LABEL: @fp_iv_loop2(			; VEC4_INTERL1-LABEL: @fp_iv_loop2(
	; VEC4_INTERL1: vector.body			; VEC4_INTERL1: vector.ph:
	; VEC4_INTERL1: %[[index:.*]] = phi i64 [ 0, %vector.ph ]			; VEC4_INTERL1-NEXT: [[DOTSPLATINSERT:%.*]] = insertelement <4 x float> undef, float %init, i32 0
	; VEC4_INTERL1: sitofp i64 %[[index]] to float			; VEC4_INTERL1-NEXT: [[DOTSPLAT:%.*]] = shufflevector <4 x float> [[DOTSPLATINSERT]], <4 x float> undef, <4 x i32> zeroinitializer
	; VEC4_INTERL1: %[[VAR1:.*]] = fmul fast float {{.*}}, 5.000000e-01			; VEC4_INTERL1-NEXT: [[INDUCTION2:%.*]] = fadd fast <4 x float> [[DOTSPLAT]], <float 0.000000e+00, float 5.000000e-01, float 1.000000e+00, float 1.500000e+00>
	; VEC4_INTERL1: %[[VAR2:.*]] = fadd fast float %[[VAR1]]			; VEC4_INTERL1-NEXT: br label %vector.body
	; VEC4_INTERL1: insertelement <4 x float> undef, float %[[VAR2]], i32 0			; VEC4_INTERL1: vector.body:
	; VEC4_INTERL1: shufflevector <4 x float> {{.*}}, <4 x float> undef, <4 x i32> zeroinitializer			; VEC4_INTERL1-NEXT: [[INDEX:%.*]] = phi i64 [ 0, %vector.ph ], [ [[INDEX_NEXT:%.*]], %vector.body ]
	; VEC4_INTERL1: fadd fast <4 x float> {{.*}}, <float 0.000000e+00, float 5.000000e-01, float 1.000000e+00, float 1.500000e+00>			; VEC4_INTERL1-NEXT: [[VEC_IND:%.*]] = phi <4 x float> [ [[INDUCTION2]], %vector.ph ], [ [[VEC_IND_NEXT:%.*]], %vector.body ]
	; VEC4_INTERL1: store <4 x float>			; VEC4_INTERL1-NEXT: [[TMP5:%.*]] = getelementptr inbounds float, float* %A, i64 [[INDEX]]
				; VEC4_INTERL1-NEXT: [[TMP6:%.*]] = bitcast float* [[TMP5]] to <4 x float>*
				; VEC4_INTERL1-NEXT: store <4 x float> [[VEC_IND]], <4 x float>* [[TMP6]], align 4
				; VEC4_INTERL1-NEXT: [[INDEX_NEXT]] = add i64 [[INDEX]], 4
				; VEC4_INTERL1-NEXT: [[VEC_IND_NEXT]] = fadd fast <4 x float> [[VEC_IND]], <float 2.000000e+00, float 2.000000e+00, float 2.000000e+00, float 2.000000e+00>
				; VEC4_INTERL1: br i1 {{.*}}, label %middle.block, label %vector.body

	define void @fp_iv_loop2(float %init, float* noalias nocapture %A, i32 %N) #0 {			define void @fp_iv_loop2(float %init, float* noalias nocapture %A, i32 %N) #0 {
	entry:			entry:
	%cmp4 = icmp sgt i32 %N, 0			%cmp4 = icmp sgt i32 %N, 0
	br i1 %cmp4, label %for.body.preheader, label %for.end			br i1 %cmp4, label %for.body.preheader, label %for.end

	for.body.preheader: ; preds = %entry			for.body.preheader: ; preds = %entry
	br label %for.body			br label %for.body
	[...]
	; for (; i < N; ++i) {			; for (; i < N; ++i) {
	; A[i] = x;			; A[i] = x;
	; x += fp_inc;			; x += fp_inc;
	; y -= 0.5;			; y -= 0.5;
	; B[i] = x + y;			; B[i] = x + y;
	; C[i] = y;			; C[i] = y;
	; }			; }
	;}			;}

	; VEC4_INTERL1-LABEL: @fp_iv_loop3(			; VEC4_INTERL1-LABEL: @fp_iv_loop3(
	; VEC4_INTERL1: vector.body			; VEC4_INTERL1: for.body.lr.ph:
	; VEC4_INTERL1: %[[index:.*]] = phi i64 [ 0, %vector.ph ]			; VEC4_INTERL1: [[TMP0:%.*]] = load float, float* @fp_inc, align 4
	; VEC4_INTERL1: sitofp i64 %[[index]] to float			; VEC4_INTERL1: vector.ph:
	; VEC4_INTERL1: %[[VAR1:.*]] = fmul fast float {{.*}}, -5.000000e-01			; VEC4_INTERL1-NEXT: [[DOTSPLATINSERT:%.*]] = insertelement <4 x float> undef, float %init, i32 0
	; VEC4_INTERL1: fadd fast float %[[VAR1]]			; VEC4_INTERL1-NEXT: [[DOTSPLAT:%.*]] = shufflevector <4 x float> [[DOTSPLATINSERT]], <4 x float> undef, <4 x i32> zeroinitializer
	; VEC4_INTERL1: fadd fast <4 x float> {{.*}}, <float -5.000000e-01, float -1.000000e+00, float -1.500000e+00, float -2.000000e+00>			; VEC4_INTERL1-NEXT: [[DOTSPLATINSERT5:%.*]] = insertelement <4 x float> undef, float [[TMP0]], i32 0
	; VEC4_INTERL1: store <4 x float>			; VEC4_INTERL1-NEXT: [[DOTSPLAT6:%.*]] = shufflevector <4 x float> [[DOTSPLATINSERT5]], <4 x float> undef, <4 x i32> zeroinitializer
				; VEC4_INTERL1-NEXT: [[TMP7:%.*]] = fmul fast <4 x float> [[DOTSPLAT6]], <float 0.000000e+00, float 1.000000e+00, float 2.000000e+00, float 3.000000e+00>
				; VEC4_INTERL1-NEXT: [[INDUCTION7:%.*]] = fadd fast <4 x float> [[DOTSPLAT]], [[TMP7]]
				; VEC4_INTERL1-NEXT: [[TMP8:%.*]] = fmul fast float [[TMP0]], 4.000000e+00
				; VEC4_INTERL1-NEXT: [[DOTSPLATINSERT8:%.*]] = insertelement <4 x float> undef, float [[TMP8]], i32 0
				; VEC4_INTERL1-NEXT: [[DOTSPLAT9:%.*]] = shufflevector <4 x float> [[DOTSPLATINSERT8]], <4 x float> undef, <4 x i32> zeroinitializer
				; VEC4_INTERL1-NEXT: [[BROADCAST_SPLATINSERT12:%.*]] = insertelement <4 x float> undef, float [[TMP0]], i32 0
				; VEC4_INTERL1-NEXT: [[BROADCAST_SPLAT13:%.*]] = shufflevector <4 x float> [[BROADCAST_SPLATINSERT12]], <4 x float> undef, <4 x i32> zeroinitializer
				; VEC4_INTERL1-NEXT: br label [[VECTOR_BODY:%.*]]
				; VEC4_INTERL1: vector.body:
				; VEC4_INTERL1-NEXT: [[INDEX:%.*]] = phi i64 [ 0, %vector.ph ], [ [[INDEX_NEXT:%.*]], %vector.body ]
				; VEC4_INTERL1-NEXT: [[VEC_IND:%.*]] = phi <4 x float> [ <float 0x3FB99999A0000000, float 0xBFD99999A0000000, float 0xBFECCCCCC0000000, float 0xBFF6666660000000>, %vector.ph ], [ [[VEC_IND_NEXT:%.*]], %vector.body ]
				; VEC4_INTERL1-NEXT: [[VEC_IND10:%.*]] = phi <4 x float> [ [[INDUCTION7]], %vector.ph ], [ [[VEC_IND_NEXT11:%.*]], %vector.body ]
				; VEC4_INTERL1-NEXT: [[TMP9:%.*]] = getelementptr inbounds float, float* %A, i64 [[INDEX]]
				; VEC4_INTERL1-NEXT: [[TMP10:%.*]] = bitcast float* [[TMP9]] to <4 x float>*
				; VEC4_INTERL1-NEXT: store <4 x float> [[VEC_IND10]], <4 x float>* [[TMP10]], align 4
				; VEC4_INTERL1-NEXT: [[TMP11:%.*]] = fadd fast <4 x float> [[VEC_IND10]], [[BROADCAST_SPLAT13]]
				; VEC4_INTERL1-NEXT: [[TMP12:%.*]] = fadd fast <4 x float> [[VEC_IND]], <float -5.000000e-01, float -5.000000e-01, float -5.000000e-01, float -5.000000e-01>
				; VEC4_INTERL1-NEXT: [[TMP13:%.*]] = fadd fast <4 x float> [[TMP12]], [[TMP11]]
				; VEC4_INTERL1-NEXT: [[TMP14:%.*]] = getelementptr inbounds float, float* %B, i64 [[INDEX]]
				; VEC4_INTERL1-NEXT: [[TMP15:%.*]] = bitcast float* [[TMP14]] to <4 x float>*
				; VEC4_INTERL1-NEXT: store <4 x float> [[TMP13]], <4 x float>* [[TMP15]], align 4
				; VEC4_INTERL1-NEXT: [[TMP16:%.*]] = getelementptr inbounds float, float* %C, i64 [[INDEX]]
				; VEC4_INTERL1-NEXT: [[TMP17:%.*]] = bitcast float* [[TMP16]] to <4 x float>*
				; VEC4_INTERL1-NEXT: store <4 x float> [[TMP12]], <4 x float>* [[TMP17]], align 4
				; VEC4_INTERL1-NEXT: [[INDEX_NEXT]] = add i64 [[INDEX]], 4
				; VEC4_INTERL1-NEXT: [[VEC_IND_NEXT]] = fadd fast <4 x float> [[VEC_IND]], <float -2.000000e+00, float -2.000000e+00, float -2.000000e+00, float -2.000000e+00>
				; VEC4_INTERL1-NEXT: [[VEC_IND_NEXT11]] = fadd fast <4 x float> [[VEC_IND10]], [[DOTSPLAT9]]
				; VEC4_INTERL1: br i1 {{.*}}, label %middle.block, label %vector.body
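
The start vector of the first phi in these checks is printed in LLVM's hexadecimal floating-point notation, which gives the bit pattern of the float value widened to double. The small helper below (plain C++, hypothetical name decodeHexFloat) decodes such a constant; applied to the four lanes it yields 0.1f, -0.4f, -0.9f and -1.4f, i.e. lanes spaced by the -0.5 step of y.

```cpp
#include <cstdint>
#include <cstring>

// Decodes an LLVM IR "float 0x..." constant: the 64 bits are the IEEE-754
// double representation of a value that is exactly representable as float.
float decodeHexFloat(std::uint64_t Bits) {
  double D;
  static_assert(sizeof(D) == sizeof(Bits), "expected a 64-bit double");
  std::memcpy(&D, &Bits, sizeof(D)); // reinterpret the bits as a double
  return static_cast<float>(D);      // exact for such constants
}
```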

	define void @fp_iv_loop3(float %init, float* noalias nocapture %A, float* noalias nocapture %B, float* noalias nocapture %C, i32 %N) #1 {			define void @fp_iv_loop3(float %init, float* noalias nocapture %A, float* noalias nocapture %B, float* noalias nocapture %C, i32 %N) #1 {
	entry:			entry:
	%cmp9 = icmp sgt i32 %N, 0			%cmp9 = icmp sgt i32 %N, 0
	br i1 %cmp9, label %for.body.lr.ph, label %for.end			br i1 %cmp9, label %for.body.lr.ph, label %for.end

	for.body.lr.ph: ; preds = %entry			for.body.lr.ph: ; preds = %entry
	%0 = load float, float* @fp_inc, align 4			%0 = load float, float* @fp_inc, align 4
	[...]
	; float x = 1.0;			; float x = 1.0;
	; for (int i=0; i < N; ++i) {			; for (int i=0; i < N; ++i) {
	; A[i] = x;			; A[i] = x;
	; x += 0.5;			; x += 0.5;
	; }			; }
	;}			;}

	; VEC4_INTERL1-LABEL: @fp_iv_loop4(			; VEC4_INTERL1-LABEL: @fp_iv_loop4(
	; VEC4_INTERL1: vector.body			; VEC4_INTERL1: vector.ph:
	; VEC4_INTERL1-NOT: fmul fast <4 x float>			; VEC4_INTERL1-NEXT: br label %vector.body
	; VEC4_INTERL1: %[[induction:.*]] = fadd fast <4 x float> %{{.*}}, <float 0.000000e+00, float 5.000000e-01, float 1.000000e+00, float 1.500000e+00>			; VEC4_INTERL1: vector.body:
	; VEC4_INTERL1: store <4 x float> %[[induction]]			; VEC4_INTERL1-NEXT: [[INDEX:%.*]] = phi i64 [ 0, %vector.ph ], [ [[INDEX_NEXT:%.*]], %vector.body ]
				; VEC4_INTERL1-NEXT: [[VEC_IND:%.*]] = phi <4 x float> [ <float 1.000000e+00, float 1.500000e+00, float 2.000000e+00, float 2.500000e+00>, %vector.ph ], [ [[VEC_IND_NEXT:%.*]], %vector.body ]
				; VEC4_INTERL1-NEXT: [[TMP5:%.*]] = getelementptr inbounds float, float* %A, i64 [[INDEX]]
				; VEC4_INTERL1-NEXT: [[TMP6:%.*]] = bitcast float* [[TMP5]] to <4 x float>*
				; VEC4_INTERL1-NEXT: store <4 x float> [[VEC_IND]], <4 x float>* [[TMP6]], align 4
				; VEC4_INTERL1-NEXT: [[INDEX_NEXT]] = add i64 [[INDEX]], 4
				; VEC4_INTERL1-NEXT: [[VEC_IND_NEXT]] = fadd fast <4 x float> [[VEC_IND]], <float 2.000000e+00, float 2.000000e+00, float 2.000000e+00, float 2.000000e+00>
				; VEC4_INTERL1: br i1 {{.*}}, label %middle.block, label %vector.body

	define void @fp_iv_loop4(float* noalias nocapture %A, i32 %N) {			define void @fp_iv_loop4(float* noalias nocapture %A, i32 %N) {
	entry:			entry:
	%cmp4 = icmp sgt i32 %N, 0			%cmp4 = icmp sgt i32 %N, 0
	br i1 %cmp4, label %for.body.preheader, label %for.end			br i1 %cmp4, label %for.body.preheader, label %for.end

	for.body.preheader: ; preds = %entry			for.body.preheader: ; preds = %entry
	br label %for.body			br label %for.body
	[...]
	for.end: ; preds = %for.end.loopexit, %entry			for.end: ; preds = %for.end.loopexit, %entry
	ret void			ret void
	}			}

	; VEC2_INTERL1_PRED_STORE-LABEL: @non_primary_iv_float_scalar(			; VEC2_INTERL1_PRED_STORE-LABEL: @non_primary_iv_float_scalar(
	; VEC2_INTERL1_PRED_STORE: vector.body:			; VEC2_INTERL1_PRED_STORE: vector.body:
	; VEC2_INTERL1_PRED_STORE-NEXT: [[INDEX:%.*]] = phi i64 [ [[INDEX_NEXT:%.*]], %[[PRED_STORE_CONTINUE7:.*]] ], [ 0, %min.iters.checked ]			; VEC2_INTERL1_PRED_STORE-NEXT: [[INDEX:%.*]] = phi i64 [ [[INDEX_NEXT:%.*]], %[[PRED_STORE_CONTINUE7:.*]] ], [ 0, %min.iters.checked ]
	; VEC2_INTERL1_PRED_STORE-NEXT: [[TMP1:%.*]] = sitofp i64 [[INDEX]] to float			; VEC2_INTERL1_PRED_STORE-NEXT: [[TMP1:%.*]] = sitofp i64 [[INDEX]] to float
	; VEC2_INTERL1_PRED_STORE-NEXT: [[BROADCAST_SPLATINSERT3:%.*]] = insertelement <2 x float> undef, float [[TMP1]], i32 0
	; VEC2_INTERL1_PRED_STORE-NEXT: [[BROADCAST_SPLAT4:%.*]] = shufflevector <2 x float> [[BROADCAST_SPLATINSERT3]], <2 x float> undef, <2 x i32> zeroinitializer
	; VEC2_INTERL1_PRED_STORE-NEXT: [[INDUCTION5:%.*]] = fadd fast <2 x float> [[BROADCAST_SPLAT4]], <float 0.000000e+00, float 1.000000e+00>
	; VEC2_INTERL1_PRED_STORE-NEXT: [[TMP2:%.*]] = getelementptr inbounds float, float* %A, i64 [[INDEX]]			; VEC2_INTERL1_PRED_STORE-NEXT: [[TMP2:%.*]] = getelementptr inbounds float, float* %A, i64 [[INDEX]]
	; VEC2_INTERL1_PRED_STORE-NEXT: [[TMP3:%.*]] = bitcast float* [[TMP2]] to <2 x float>*			; VEC2_INTERL1_PRED_STORE-NEXT: [[TMP3:%.*]] = bitcast float* [[TMP2]] to <2 x float>*
	; VEC2_INTERL1_PRED_STORE-NEXT: [[WIDE_LOAD:%.*]] = load <2 x float>, <2 x float>* [[TMP3]], align 4			; VEC2_INTERL1_PRED_STORE-NEXT: [[WIDE_LOAD:%.*]] = load <2 x float>, <2 x float>* [[TMP3]], align 4
	; VEC2_INTERL1_PRED_STORE-NEXT: [[TMP4:%.*]] = fcmp fast oeq <2 x float> [[WIDE_LOAD]], zeroinitializer			; VEC2_INTERL1_PRED_STORE-NEXT: [[TMP4:%.*]] = fcmp fast oeq <2 x float> [[WIDE_LOAD]], zeroinitializer
	; VEC2_INTERL1_PRED_STORE-NEXT: [[TMP5:%.*]] = extractelement <2 x i1> [[TMP4]], i32 0			; VEC2_INTERL1_PRED_STORE-NEXT: [[TMP5:%.*]] = extractelement <2 x i1> [[TMP4]], i32 0
	; VEC2_INTERL1_PRED_STORE-NEXT: br i1 [[TMP5]], label %[[PRED_STORE_IF:.*]], label %[[PRED_STORE_CONTINUE:.*]]			; VEC2_INTERL1_PRED_STORE-NEXT: br i1 [[TMP5]], label %[[PRED_STORE_IF:.*]], label %[[PRED_STORE_CONTINUE:.*]]
	; VEC2_INTERL1_PRED_STORE: [[PRED_STORE_IF]]:			; VEC2_INTERL1_PRED_STORE: [[PRED_STORE_IF]]:
	; VEC2_INTERL1_PRED_STORE-NEXT: [[TMP6:%.*]] = extractelement <2 x float> [[INDUCTION5]], i32 0
	; VEC2_INTERL1_PRED_STORE-NEXT: [[TMP7:%.*]] = getelementptr inbounds float, float* %A, i64 [[INDEX]]			; VEC2_INTERL1_PRED_STORE-NEXT: [[TMP7:%.*]] = getelementptr inbounds float, float* %A, i64 [[INDEX]]
	; VEC2_INTERL1_PRED_STORE-NEXT: store float [[TMP6]], float* [[TMP7]], align 4			; VEC2_INTERL1_PRED_STORE-NEXT: store float [[TMP1]], float* [[TMP7]], align 4
	; VEC2_INTERL1_PRED_STORE-NEXT: br label %[[PRED_STORE_CONTINUE]]			; VEC2_INTERL1_PRED_STORE-NEXT: br label %[[PRED_STORE_CONTINUE]]
	; VEC2_INTERL1_PRED_STORE: [[PRED_STORE_CONTINUE]]:			; VEC2_INTERL1_PRED_STORE: [[PRED_STORE_CONTINUE]]:
	; VEC2_INTERL1_PRED_STORE-NEXT: [[TMP8:%.*]] = extractelement <2 x i1> [[TMP4]], i32 1			; VEC2_INTERL1_PRED_STORE-NEXT: [[TMP8:%.*]] = extractelement <2 x i1> [[TMP4]], i32 1
	; VEC2_INTERL1_PRED_STORE-NEXT: br i1 [[TMP8]], label %[[PRED_STORE_IF6:.*]], label %[[PRED_STORE_CONTINUE7]]			; VEC2_INTERL1_PRED_STORE-NEXT: br i1 [[TMP8]], label %[[PRED_STORE_IF6:.*]], label %[[PRED_STORE_CONTINUE7]]
	; VEC2_INTERL1_PRED_STORE: [[PRED_STORE_IF6]]:			; VEC2_INTERL1_PRED_STORE: [[PRED_STORE_IF6]]:
	; VEC2_INTERL1_PRED_STORE-NEXT: [[TMP9:%.*]] = extractelement <2 x float> [[INDUCTION5]], i32 1			; VEC2_INTERL1_PRED_STORE-NEXT: [[TMP9:%.*]] = fadd fast float [[TMP1]], 1.000000e+00
	; VEC2_INTERL1_PRED_STORE-NEXT: [[TMP10:%.*]] = or i64 [[INDEX]], 1			; VEC2_INTERL1_PRED_STORE-NEXT: [[TMP10:%.*]] = or i64 [[INDEX]], 1
	; VEC2_INTERL1_PRED_STORE-NEXT: [[TMP11:%.*]] = getelementptr inbounds float, float* %A, i64 [[TMP10]]			; VEC2_INTERL1_PRED_STORE-NEXT: [[TMP11:%.*]] = getelementptr inbounds float, float* %A, i64 [[TMP10]]
	; VEC2_INTERL1_PRED_STORE-NEXT: store float [[TMP9]], float* [[TMP11]], align 4			; VEC2_INTERL1_PRED_STORE-NEXT: store float [[TMP9]], float* [[TMP11]], align 4
	; VEC2_INTERL1_PRED_STORE-NEXT: br label %[[PRED_STORE_CONTINUE7]]			; VEC2_INTERL1_PRED_STORE-NEXT: br label %[[PRED_STORE_CONTINUE7]]
	; VEC2_INTERL1_PRED_STORE: [[PRED_STORE_CONTINUE7]]:			; VEC2_INTERL1_PRED_STORE: [[PRED_STORE_CONTINUE7]]:
	; VEC2_INTERL1_PRED_STORE-NEXT: [[INDEX_NEXT]] = add i64 [[INDEX]], 2			; VEC2_INTERL1_PRED_STORE-NEXT: [[INDEX_NEXT]] = add i64 [[INDEX]], 2
	; VEC2_INTERL1_PRED_STORE: br i1 {{.*}}, label %middle.block, label %vector.body			; VEC2_INTERL1_PRED_STORE: br i1 {{.*}}, label %middle.block, label %vector.body

	[...]