This is an archive of the discontinued LLVM Phabricator instance.

[LV] Don't emit unused scalars for uniform instructions
Closed, Public

Authored by mssimpso on Sep 6 2016, 12:47 PM.

Details

Reviewers

anemet
wmi
mkuper

Commits

rG15869f86d849: [LV] Don't emit unused scalars for uniform instructions
rL282087: [LV] Don't emit unused scalars for uniform instructions

Summary

If we identify an instruction as uniform after vectorization, we know that we should only use the value corresponding to the first vector lane of each unroll iteration. However, when scalarizing such instructions, we still produce values for the other vector lanes. This patch prevents us from generating the unused scalars.
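
The effect on the number of emitted scalars can be sketched with a small standalone model. This is not LLVM code: `countScalars` and its parameters are hypothetical names chosen for illustration, with `VF` (vector width) and `UF` (unroll factor) used in the same sense as in the patch.

```cpp
#include <cassert>

// Hypothetical model (not part of LLVM): counts the scalar copies emitted
// for one scalarized instruction.
// VF = vectorization factor, UF = unroll (interleave) factor.
unsigned countScalars(unsigned VF, unsigned UF, bool IsUniform) {
  // A uniform instruction only needs lane zero of each unroll iteration.
  unsigned Lanes = IsUniform ? 1 : VF;
  unsigned Count = 0;
  for (unsigned Part = 0; Part < UF; ++Part)
    for (unsigned Lane = 0; Lane < Lanes; ++Lane)
      ++Count;
  return Count;
}
```

Under these assumptions, with VF = 4 and UF = 2 a uniform instruction yields 2 scalars rather than 8, one per unroll iteration.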

Diff Detail

Repository: rL LLVM

Event Timeline

mssimpso updated this revision to Diff 70454. Sep 6 2016, 12:47 PM

mssimpso retitled this revision to [LV] Don't emit unused scalars for uniform instructions.

mssimpso updated this object.

mssimpso added reviewers: mkuper, wmi, anemet.

mssimpso added subscribers: samparker, hfinkel, mcrosier, llvm-commits.

Herald added a subscriber: mzolotukhin. Sep 6 2016, 12:48 PM

mssimpso added a parent revision: D24271: [LV] Don't mark pointers used by scalarized memory accesses uniform. Sep 6 2016, 12:48 PM

mssimpso mentioned this in D23889: [LV] Scalarize instructions marked scalar after vectorization. Sep 6 2016, 12:51 PM

mssimpso added a child revision: D23889: [LV] Scalarize instructions marked scalar after vectorization.

mssimpso added a parent revision: D24511: [LV] Process pointer IVs with PHINodes in collectLoopUniforms. Sep 13 2016, 8:40 AM

Added assert in getScalarValue.

Now that we're more conservative about the values we mark uniform in collectLoopUniforms, this patch should be ready to go. I added an assert to be safe.
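
The invariant behind the assert can be sketched in isolation. The `ScalarEntry` struct and `getScalar` function below are hypothetical stand-ins, not the vectorizer's data structures; they only assume that scalar values are indexed as `[Part][Lane]` and that a uniform value has a meaningful entry for lane zero only. The sketch parenthesizes the condition so the message string applies to the whole check; that is a choice made here, not necessarily the form used in the patch.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for per-value scalar storage: Parts[Part][Lane].
struct ScalarEntry {
  bool IsUniform;
  std::vector<std::vector<int>> Parts;
};

// Returns the scalar for (Part, Lane). Asking for a non-zero lane of a
// uniform value is a programming error and trips the assertion.
int getScalar(const ScalarEntry &E, unsigned Part, unsigned Lane) {
  assert((Lane == 0 || !E.IsUniform) && "Uniform values only have lane zero");
  return E.Parts[Part][Lane];
}
```

In this model, requesting lane zero is always valid, while requesting any other lane is valid only for non-uniform values.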

Thanks, Matt.
This LGTM, except for some nits.

lib/Transforms/Vectorize/LoopVectorize.cpp
2364 ↗	(On Diff #71819)	The comment is slightly incorrect now ("last" vector lane).
2379 ↗	(On Diff #71819)	As long as you're touching this, maybe change the variable name? "Width" is weird.
2380 ↗	(On Diff #71819)	Maybe just build a splat (one insert + shuffle) directly? InstCombine will clean this up, but IIUC the whole point of this patch is to generate cleaner code pre-instcombine.
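
The difference between a splat and a per-lane insertelement chain can be sketched with a standalone model. `buildVector` is a hypothetical helper, not an LLVM API; it assumes a splat costs one insert plus one shuffle, as described in the comment above, and simply counts modelled operations.

```cpp
#include <cassert>
#include <vector>

// Hypothetical model (not LLVM code) of building one vector value from
// scalar lanes. Ops receives the number of modelled IR instructions.
std::vector<int> buildVector(const std::vector<int> &Scalars, unsigned VF,
                             bool IsUniform, unsigned &Ops) {
  std::vector<int> Vec(VF, 0);
  Ops = 0;
  if (IsUniform) {
    // Splat: one insertelement into lane 0 ...
    Vec[0] = Scalars[0];
    ++Ops;
    // ... then one shufflevector with an all-zero mask copies lane 0
    // into every lane.
    for (unsigned Lane = 1; Lane < VF; ++Lane)
      Vec[Lane] = Vec[0];
    ++Ops;
  } else {
    // One insertelement per lane.
    for (unsigned Lane = 0; Lane < VF; ++Lane) {
      Vec[Lane] = Scalars[Lane];
      ++Ops;
    }
  }
  return Vec;
}
```

Under this model, a uniform value at VF = 4 needs two operations and a single scalar input, while the non-uniform path needs four inserts and all four scalars.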

This revision is now accepted and ready to land. Sep 20 2016, 3:23 PM

Thanks, Michael! I'll address your comments before committing.

lib/Transforms/Vectorize/LoopVectorize.cpp
2364 ↗	(On Diff #71819)	True. I'll update the comment here.
2379 ↗	(On Diff #71819)	I agree. I'll change this to "Lane" like we have in other places.
2380 ↗	(On Diff #71819)	Good point!

Closed by commit rL282087: [LV] Don't emit unused scalars for uniform instructions (authored by mssimpso). Sep 21 2016, 9:59 AM

This revision was automatically updated to reflect the committed changes.

mssimpso marked 3 inline comments as done.

Revision Contents

Path	Size
llvm/trunk/lib/Transforms/Vectorize/LoopVectorize.cpp	72 lines
llvm/trunk/test/Transforms/LoopVectorize/induction.ll	8 lines
llvm/trunk/test/Transforms/LoopVectorize/reverse_induction.ll	30 lines

Diff 72079

llvm/trunk/lib/Transforms/Vectorize/LoopVectorize.cpp

[...] void InnerLoopVectorizer::buildScalarSteps(Value *ScalarIV, Value *Step,
// We shouldn't have to build scalar steps if we aren't vectorizing.		// We shouldn't have to build scalar steps if we aren't vectorizing.
assert(VF > 1 && "VF should be greater than one");		assert(VF > 1 && "VF should be greater than one");

// Get the value type and ensure it and the step have the same integer type.		// Get the value type and ensure it and the step have the same integer type.
Type *ScalarIVTy = ScalarIV->getType()->getScalarType();		Type *ScalarIVTy = ScalarIV->getType()->getScalarType();
assert(ScalarIVTy->isIntegerTy() && ScalarIVTy == Step->getType() &&		assert(ScalarIVTy->isIntegerTy() && ScalarIVTy == Step->getType() &&
"Val and Step should have the same integer type");		"Val and Step should have the same integer type");

		auto scalarUserIsUniform = [&](User *U) -> bool {
		auto *I = cast<Instruction>(U);
		return !OrigLoop->contains(I) || !Legal->isScalarAfterVectorization(I) ||
		Legal->isUniformAfterVectorization(I);
		};

		// Determine the number of scalars we need to generate for each unroll
		// iteration. If EntryVal is uniform or all its scalar users are uniform, we
		// only need to generate the first lane. Otherwise, we generate all VF
		// values. We are essentially determining if the induction variable has no
		// "multi-scalar" (non-uniform scalar) users.
		unsigned Lanes =
		Legal->isUniformAfterVectorization(cast<Instruction>(EntryVal)) ||
		all_of(EntryVal->users(), scalarUserIsUniform)
		? 1
		: VF;

// Compute the scalar steps and save the results in VectorLoopValueMap.		// Compute the scalar steps and save the results in VectorLoopValueMap.
ScalarParts Entry(UF);		ScalarParts Entry(UF);
for (unsigned Part = 0; Part < UF; ++Part) {		for (unsigned Part = 0; Part < UF; ++Part) {
Entry[Part].resize(VF);		Entry[Part].resize(VF);
for (unsigned Lane = 0; Lane < VF; ++Lane) {		for (unsigned Lane = 0; Lane < Lanes; ++Lane) {
auto *StartIdx = ConstantInt::get(ScalarIVTy, VF * Part + Lane);		auto *StartIdx = ConstantInt::get(ScalarIVTy, VF * Part + Lane);
auto *Mul = Builder.CreateMul(StartIdx, Step);		auto *Mul = Builder.CreateMul(StartIdx, Step);
auto *Add = Builder.CreateAdd(ScalarIV, Mul);		auto *Add = Builder.CreateAdd(ScalarIV, Mul);
Entry[Part][Lane] = Add;		Entry[Part][Lane] = Add;
}		}
}		}
VectorLoopValueMap.initScalar(EntryVal, Entry);		VectorLoopValueMap.initScalar(EntryVal, Entry);
}		}
[...] InnerLoopVectorizer::getVectorValue(Value *V) {
// If the value has not been vectorized, check if it has been scalarized		// If the value has not been vectorized, check if it has been scalarized
// instead. If it has been scalarized, and we actually need the value in		// instead. If it has been scalarized, and we actually need the value in
// vector form, we will construct the vector values on demand.		// vector form, we will construct the vector values on demand.
if (VectorLoopValueMap.hasScalar(V)) {		if (VectorLoopValueMap.hasScalar(V)) {

// Initialize a new vector map entry.		// Initialize a new vector map entry.
VectorParts Entry(UF);		VectorParts Entry(UF);

		// If we've scalarized a value, that value should be an instruction.
		auto *I = cast<Instruction>(V);

// If we aren't vectorizing, we can just copy the scalar map values over to		// If we aren't vectorizing, we can just copy the scalar map values over to
// the vector map.		// the vector map.
if (VF == 1) {		if (VF == 1) {
for (unsigned Part = 0; Part < UF; ++Part)		for (unsigned Part = 0; Part < UF; ++Part)
Entry[Part] = getScalarValue(V, Part, 0);		Entry[Part] = getScalarValue(V, Part, 0);
return VectorLoopValueMap.initVector(V, Entry);		return VectorLoopValueMap.initVector(V, Entry);
}		}

// Get the last scalarized instruction. This corresponds to the instruction		// Get the last scalar instruction we generated for V. If the value is
// we created for the last vector lane on the last unroll iteration.		// known to be uniform after vectorization, this corresponds to lane zero
auto *LastInst = cast<Instruction>(getScalarValue(V, UF - 1, VF - 1));		// of the last unroll iteration. Otherwise, the last instruction is the one
		// we created for the last vector lane of the last unroll iteration.
		unsigned LastLane = Legal->isUniformAfterVectorization(I) ? 0 : VF - 1;
		auto *LastInst = cast<Instruction>(getScalarValue(V, UF - 1, LastLane));

// Set the insert point after the last scalarized instruction. This ensures		// Set the insert point after the last scalarized instruction. This ensures
// the insertelement sequence will directly follow the scalar definitions.		// the insertelement sequence will directly follow the scalar definitions.
auto OldIP = Builder.saveIP();		auto OldIP = Builder.saveIP();
auto NewIP = std::next(BasicBlock::iterator(LastInst));		auto NewIP = std::next(BasicBlock::iterator(LastInst));
Builder.SetInsertPoint(&*NewIP);		Builder.SetInsertPoint(&*NewIP);

// However, if we are vectorizing, we need to construct the vector values		// However, if we are vectorizing, we need to construct the vector values.
// using insertelement instructions. Since the resulting vectors are stored		// If the value is known to be uniform after vectorization, we can just
// in VectorLoopValueMap, we will only generate the insertelements once.		// broadcast the scalar value corresponding to lane zero for each unroll
		// iteration. Otherwise, we construct the vector values using insertelement
		// instructions. Since the resulting vectors are stored in
		// VectorLoopValueMap, we will only generate the insertelements once.
for (unsigned Part = 0; Part < UF; ++Part) {		for (unsigned Part = 0; Part < UF; ++Part) {
Value *Insert = UndefValue::get(VectorType::get(V->getType(), VF));		Value *VectorValue = nullptr;
		if (Legal->isUniformAfterVectorization(I)) {
		VectorValue = getBroadcastInstrs(getScalarValue(V, Part, 0));
		} else {
		VectorValue = UndefValue::get(VectorType::get(V->getType(), VF));
for (unsigned Lane = 0; Lane < VF; ++Lane)		for (unsigned Lane = 0; Lane < VF; ++Lane)
Insert = Builder.CreateInsertElement(		VectorValue = Builder.CreateInsertElement(
Insert, getScalarValue(V, Part, Lane), Builder.getInt32(Lane));		VectorValue, getScalarValue(V, Part, Lane),
Entry[Part] = Insert;		Builder.getInt32(Lane));
		}
		Entry[Part] = VectorValue;
}		}
Builder.restoreIP(OldIP);		Builder.restoreIP(OldIP);
return VectorLoopValueMap.initVector(V, Entry);		return VectorLoopValueMap.initVector(V, Entry);
}		}

// If this scalar is unknown, assume that it is a constant or that it is		// If this scalar is unknown, assume that it is a constant or that it is
// loop invariant. Broadcast V and save the value for future uses.		// loop invariant. Broadcast V and save the value for future uses.
Value *B = getBroadcastInstrs(V);		Value *B = getBroadcastInstrs(V);
return VectorLoopValueMap.initVector(V, VectorParts(UF, B));		return VectorLoopValueMap.initVector(V, VectorParts(UF, B));
}		}

Value *InnerLoopVectorizer::getScalarValue(Value *V, unsigned Part,		Value *InnerLoopVectorizer::getScalarValue(Value *V, unsigned Part,
unsigned Lane) {		unsigned Lane) {

// If the value is not an instruction contained in the loop, it should		// If the value is not an instruction contained in the loop, it should
// already be scalar.		// already be scalar.
if (OrigLoop->isLoopInvariant(V))		if (OrigLoop->isLoopInvariant(V))
return V;		return V;

		assert(Lane > 0 ? !Legal->isUniformAfterVectorization(cast<Instruction>(V))
		: true && "Uniform values only have lane zero");

// If the value from the original loop has not been vectorized, it is		// If the value from the original loop has not been vectorized, it is
// represented by UF x VF scalar values in the new loop. Return the requested		// represented by UF x VF scalar values in the new loop. Return the requested
// scalar value.		// scalar value.
if (VectorLoopValueMap.hasScalar(V))		if (VectorLoopValueMap.hasScalar(V))
return VectorLoopValueMap.ScalarMapStorage[V][Part][Lane];		return VectorLoopValueMap.ScalarMapStorage[V][Part][Lane];

// If the value has not been scalarized, get its entry in VectorLoopValueMap		// If the value has not been scalarized, get its entry in VectorLoopValueMap
// for the given unroll part. If this entry is not a vector type (i.e., the		// for the given unroll part. If this entry is not a vector type (i.e., the
[...] void InnerLoopVectorizer::scalarizeInstruction(Instruction *Instr,

// Initialize a new scalar map entry.		// Initialize a new scalar map entry.
ScalarParts Entry(UF);		ScalarParts Entry(UF);

VectorParts Cond;		VectorParts Cond;
if (IfPredicateInstr)		if (IfPredicateInstr)
Cond = createBlockInMask(Instr->getParent());		Cond = createBlockInMask(Instr->getParent());

		// Determine the number of scalars we need to generate for each unroll
		// iteration. If the instruction is uniform, we only need to generate the
		// first lane. Otherwise, we generate all VF values.
		unsigned Lanes = Legal->isUniformAfterVectorization(Instr) ? 1 : VF;

// For each vector unroll 'part':		// For each vector unroll 'part':
for (unsigned Part = 0; Part < UF; ++Part) {		for (unsigned Part = 0; Part < UF; ++Part) {
Entry[Part].resize(VF);		Entry[Part].resize(VF);
// For each scalar that we create:		// For each scalar that we create:
for (unsigned Lane = 0; Lane < VF; ++Lane) {		for (unsigned Lane = 0; Lane < Lanes; ++Lane) {

// Start if-block.		// Start if-block.
Value *Cmp = nullptr;		Value *Cmp = nullptr;
if (IfPredicateInstr) {		if (IfPredicateInstr) {
Cmp = Builder.CreateExtractElement(Cond[Part], Builder.getInt32(Lane));		Cmp = Builder.CreateExtractElement(Cond[Part], Builder.getInt32(Lane));
Cmp = Builder.CreateICmp(ICmpInst::ICMP_EQ, Cmp,		Cmp = Builder.CreateICmp(ICmpInst::ICMP_EQ, Cmp,
ConstantInt::get(Cmp->getType(), 1));		ConstantInt::get(Cmp->getType(), 1));
}		}
[...] void InnerLoopVectorizer::widenPHIInstruction(Instruction *PN, unsigned UF,
case InductionDescriptor::IK_IntInduction:		case InductionDescriptor::IK_IntInduction:
return widenIntInduction(P);		return widenIntInduction(P);
case InductionDescriptor::IK_PtrInduction: {		case InductionDescriptor::IK_PtrInduction: {
// Handle the pointer induction variable case.		// Handle the pointer induction variable case.
assert(P->getType()->isPointerTy() && "Unexpected type.");		assert(P->getType()->isPointerTy() && "Unexpected type.");
// This is the normalized GEP that starts counting at zero.		// This is the normalized GEP that starts counting at zero.
Value *PtrInd = Induction;		Value *PtrInd = Induction;
PtrInd = Builder.CreateSExtOrTrunc(PtrInd, II.getStep()->getType());		PtrInd = Builder.CreateSExtOrTrunc(PtrInd, II.getStep()->getType());
		// Determine the number of scalars we need to generate for each unroll
		// iteration. If the instruction is uniform, we only need to generate the
		// first lane. Otherwise, we generate all VF values.
		unsigned Lanes = Legal->isUniformAfterVectorization(P) ? 1 : VF;
// These are the scalar results. Notice that we don't generate vector GEPs		// These are the scalar results. Notice that we don't generate vector GEPs
// because scalar GEPs result in better code.		// because scalar GEPs result in better code.
ScalarParts Entry(UF);		ScalarParts Entry(UF);
for (unsigned Part = 0; Part < UF; ++Part) {		for (unsigned Part = 0; Part < UF; ++Part) {
Entry[Part].resize(VF);		Entry[Part].resize(VF);
for (unsigned Lane = 0; Lane < VF; ++Lane) {		for (unsigned Lane = 0; Lane < Lanes; ++Lane) {
Constant *Idx = ConstantInt::get(PtrInd->getType(), Lane + Part * VF);		Constant *Idx = ConstantInt::get(PtrInd->getType(), Lane + Part * VF);
Value *GlobalIdx = Builder.CreateAdd(PtrInd, Idx);		Value *GlobalIdx = Builder.CreateAdd(PtrInd, Idx);
Value *SclrGep = II.transform(Builder, GlobalIdx, PSE.getSE(), DL);		Value *SclrGep = II.transform(Builder, GlobalIdx, PSE.getSE(), DL);
SclrGep->setName("next.gep");		SclrGep->setName("next.gep");
Entry[Part][Lane] = SclrGep;		Entry[Part][Lane] = SclrGep;
}		}
}		}
VectorLoopValueMap.initScalar(P, Entry);		VectorLoopValueMap.initScalar(P, Entry);
[...]

llvm/trunk/test/Transforms/LoopVectorize/induction.ll

	[...]
	;			;
	; for (int i = 0; i < n; ++i)			; for (int i = 0; i < n; ++i)
	; sum += a[i];			; sum += a[i];
	;			;
	; CHECK-LABEL: @scalarize_induction_variable_01(			; CHECK-LABEL: @scalarize_induction_variable_01(
	; CHECK: vector.body:			; CHECK: vector.body:
	; CHECK: %index = phi i64 [ 0, %vector.ph ], [ %index.next, %vector.body ]			; CHECK: %index = phi i64 [ 0, %vector.ph ], [ %index.next, %vector.body ]
	; CHECK: %[[i0:.+]] = add i64 %index, 0			; CHECK: %[[i0:.+]] = add i64 %index, 0
	; CHECK: %[[i1:.+]] = add i64 %index, 1
	; CHECK: getelementptr inbounds i64, i64* %a, i64 %[[i0]]			; CHECK: getelementptr inbounds i64, i64* %a, i64 %[[i0]]
	; CHECK: getelementptr inbounds i64, i64* %a, i64 %[[i1]]
	;			;
	; UNROLL-NO-IC-LABEL: @scalarize_induction_variable_01(			; UNROLL-NO-IC-LABEL: @scalarize_induction_variable_01(
	; UNROLL-NO-IC: vector.body:			; UNROLL-NO-IC: vector.body:
	; UNROLL-NO-IC: %index = phi i64 [ 0, %vector.ph ], [ %index.next, %vector.body ]			; UNROLL-NO-IC: %index = phi i64 [ 0, %vector.ph ], [ %index.next, %vector.body ]
	; UNROLL-NO-IC: %[[i0:.+]] = add i64 %index, 0			; UNROLL-NO-IC: %[[i0:.+]] = add i64 %index, 0
	; UNROLL-NO-IC: %[[i1:.+]] = add i64 %index, 1
	; UNROLL-NO-IC: %[[i2:.+]] = add i64 %index, 2			; UNROLL-NO-IC: %[[i2:.+]] = add i64 %index, 2
	; UNROLL-NO-IC: %[[i3:.+]] = add i64 %index, 3
	; UNROLL-NO-IC: getelementptr inbounds i64, i64* %a, i64 %[[i0]]			; UNROLL-NO-IC: getelementptr inbounds i64, i64* %a, i64 %[[i0]]
	; UNROLL-NO-IC: getelementptr inbounds i64, i64* %a, i64 %[[i1]]
	; UNROLL-NO-IC: getelementptr inbounds i64, i64* %a, i64 %[[i2]]			; UNROLL-NO-IC: getelementptr inbounds i64, i64* %a, i64 %[[i2]]
	; UNROLL-NO-IC: getelementptr inbounds i64, i64* %a, i64 %[[i3]]
	;			;
	; IND-LABEL: @scalarize_induction_variable_01(			; IND-LABEL: @scalarize_induction_variable_01(
	; IND: vector.body:			; IND: vector.body:
	; IND: %index = phi i64 [ 0, %vector.ph ], [ %index.next, %vector.body ]			; IND: %index = phi i64 [ 0, %vector.ph ], [ %index.next, %vector.body ]
	; IND-NOT: add i64 {{.*}}, 2			; IND-NOT: add i64 {{.*}}, 2
	; IND: getelementptr inbounds i64, i64* %a, i64 %index			; IND: getelementptr inbounds i64, i64* %a, i64 %index
	;			;
	; UNROLL-LABEL: @scalarize_induction_variable_01(			; UNROLL-LABEL: @scalarize_induction_variable_01(
	[...]
	; CHECK: %[[INSERT:.*]] = insertelement <2 x i32> undef, i32 %i, i32 0			; CHECK: %[[INSERT:.*]] = insertelement <2 x i32> undef, i32 %i, i32 0
	; CHECK: %[[SPLAT:.*]] = shufflevector <2 x i32> %[[INSERT]], <2 x i32> undef, <2 x i32> zeroinitializer			; CHECK: %[[SPLAT:.*]] = shufflevector <2 x i32> %[[INSERT]], <2 x i32> undef, <2 x i32> zeroinitializer
	; CHECK: %[[START:.*]] = add <2 x i32> %[[SPLAT]], <i32 0, i32 1>			; CHECK: %[[START:.*]] = add <2 x i32> %[[SPLAT]], <i32 0, i32 1>
	; CHECK: vector.body:			; CHECK: vector.body:
	; CHECK: %index = phi i32 [ 0, %vector.ph ], [ %index.next, %vector.body ]			; CHECK: %index = phi i32 [ 0, %vector.ph ], [ %index.next, %vector.body ]
	; CHECK: %vec.ind = phi <2 x i32> [ %[[START]], %vector.ph ], [ %vec.ind.next, %vector.body ]			; CHECK: %vec.ind = phi <2 x i32> [ %[[START]], %vector.ph ], [ %vec.ind.next, %vector.body ]
	; CHECK: %offset.idx = add i32 %i, %index			; CHECK: %offset.idx = add i32 %i, %index
	; CHECK: %[[A1:.*]] = add i32 %offset.idx, 0			; CHECK: %[[A1:.*]] = add i32 %offset.idx, 0
	; CHECK: %[[A2:.*]] = add i32 %offset.idx, 1
	; CHECK: %[[G1:.*]] = getelementptr inbounds i32, i32* %a, i32 %[[A1]]			; CHECK: %[[G1:.*]] = getelementptr inbounds i32, i32* %a, i32 %[[A1]]
	; CHECK: %[[G2:.*]] = getelementptr inbounds i32, i32* %a, i32 %[[A2]]
	; CHECK: %[[G3:.*]] = getelementptr i32, i32* %[[G1]], i32 0			; CHECK: %[[G3:.*]] = getelementptr i32, i32* %[[G1]], i32 0
	; CHECK: %[[B1:.*]] = bitcast i32* %[[G3]] to <2 x i32>*			; CHECK: %[[B1:.*]] = bitcast i32* %[[G3]] to <2 x i32>*
	; CHECK: store <2 x i32> %vec.ind, <2 x i32>* %[[B1]]			; CHECK: store <2 x i32> %vec.ind, <2 x i32>* %[[B1]]
	; CHECK: %index.next = add i32 %index, 2			; CHECK: %index.next = add i32 %index, 2
	; CHECK: %vec.ind.next = add <2 x i32> %vec.ind, <i32 2, i32 2>			; CHECK: %vec.ind.next = add <2 x i32> %vec.ind, <i32 2, i32 2>
	; CHECK: %[[CMP:.*]] = icmp eq i32 %index.next, %n.vec			; CHECK: %[[CMP:.*]] = icmp eq i32 %index.next, %n.vec
	; CHECK: br i1 %[[CMP]]			; CHECK: br i1 %[[CMP]]
	;			;
	[...]

llvm/trunk/test/Transforms/LoopVectorize/reverse_induction.ll

	; RUN: opt < %s -loop-vectorize -force-vector-interleave=2 -force-vector-width=4 -S | FileCheck %s			; RUN: opt < %s -loop-vectorize -force-vector-interleave=2 -force-vector-width=4 -S | FileCheck %s

	target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-n8:16:32:64-S128"			target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-n8:16:32:64-S128"

	; Make sure consecutive vector generates correct negative indices.			; Make sure consecutive vector generates correct negative indices.
	; PR15882			; PR15882

	; CHECK: %index = phi i64 [ 0, %vector.ph ], [ %index.next, %vector.body ]			; CHECK: %index = phi i64 [ 0, %vector.ph ], [ %index.next, %vector.body ]
	; CHECK: %offset.idx = sub i64 %startval, %index			; CHECK: %offset.idx = sub i64 %startval, %index
	; CHECK: %[[a0:.+]] = add i64 %offset.idx, 0			; CHECK: %[[a0:.+]] = add i64 %offset.idx, 0
	; CHECK: %[[a1:.+]] = add i64 %offset.idx, -1
	; CHECK: %[[a2:.+]] = add i64 %offset.idx, -2
	; CHECK: %[[a3:.+]] = add i64 %offset.idx, -3
	; CHECK: %[[a4:.+]] = add i64 %offset.idx, -4			; CHECK: %[[a4:.+]] = add i64 %offset.idx, -4
	; CHECK: %[[a5:.+]] = add i64 %offset.idx, -5
	; CHECK: %[[a6:.+]] = add i64 %offset.idx, -6
	; CHECK: %[[a7:.+]] = add i64 %offset.idx, -7

	define i32 @reverse_induction_i64(i64 %startval, i32 * %ptr) {			define i32 @reverse_induction_i64(i64 %startval, i32 * %ptr) {
	entry:			entry:
	br label %for.body			br label %for.body

	for.body:			for.body:
	%add.i7 = phi i64 [ %startval, %entry ], [ %add.i, %for.body ]			%add.i7 = phi i64 [ %startval, %entry ], [ %add.i, %for.body ]
	%i.06 = phi i32 [ 0, %entry ], [ %inc4, %for.body ]			%i.06 = phi i32 [ 0, %entry ], [ %inc4, %for.body ]
	[...]
	loopend:			loopend:
	ret i32 %inc.redux			ret i32 %inc.redux
	}			}

	; CHECK-LABEL: @reverse_induction_i128(			; CHECK-LABEL: @reverse_induction_i128(
	; CHECK: %index = phi i128 [ 0, %vector.ph ], [ %index.next, %vector.body ]			; CHECK: %index = phi i128 [ 0, %vector.ph ], [ %index.next, %vector.body ]
	; CHECK: %offset.idx = sub i128 %startval, %index			; CHECK: %offset.idx = sub i128 %startval, %index
	; CHECK: %[[a0:.+]] = add i128 %offset.idx, 0			; CHECK: %[[a0:.+]] = add i128 %offset.idx, 0
	; CHECK: %[[a1:.+]] = add i128 %offset.idx, -1
	; CHECK: %[[a2:.+]] = add i128 %offset.idx, -2
	; CHECK: %[[a3:.+]] = add i128 %offset.idx, -3
	; CHECK: %[[a4:.+]] = add i128 %offset.idx, -4			; CHECK: %[[a4:.+]] = add i128 %offset.idx, -4
	; CHECK: %[[a5:.+]] = add i128 %offset.idx, -5
	; CHECK: %[[a6:.+]] = add i128 %offset.idx, -6
	; CHECK: %[[a7:.+]] = add i128 %offset.idx, -7

	define i32 @reverse_induction_i128(i128 %startval, i32 * %ptr) {			define i32 @reverse_induction_i128(i128 %startval, i32 * %ptr) {
	entry:			entry:
	br label %for.body			br label %for.body

	for.body:			for.body:
	%add.i7 = phi i128 [ %startval, %entry ], [ %add.i, %for.body ]			%add.i7 = phi i128 [ %startval, %entry ], [ %add.i, %for.body ]
	%i.06 = phi i32 [ 0, %entry ], [ %inc4, %for.body ]			%i.06 = phi i32 [ 0, %entry ], [ %inc4, %for.body ]
	[...]
	loopend:			loopend:
	ret i32 %inc.redux			ret i32 %inc.redux
	}			}

	; CHECK-LABEL: @reverse_induction_i16(			; CHECK-LABEL: @reverse_induction_i16(
	; CHECK: %index = phi i32 [ 0, %vector.ph ], [ %index.next, %vector.body ]			; CHECK: %index = phi i32 [ 0, %vector.ph ], [ %index.next, %vector.body ]
	; CHECK: %offset.idx = sub i16 %startval, {{.*}}			; CHECK: %offset.idx = sub i16 %startval, {{.*}}
	; CHECK: %[[a0:.+]] = add i16 %offset.idx, 0			; CHECK: %[[a0:.+]] = add i16 %offset.idx, 0
	; CHECK: %[[a1:.+]] = add i16 %offset.idx, -1
	; CHECK: %[[a2:.+]] = add i16 %offset.idx, -2
	; CHECK: %[[a3:.+]] = add i16 %offset.idx, -3
	; CHECK: %[[a4:.+]] = add i16 %offset.idx, -4			; CHECK: %[[a4:.+]] = add i16 %offset.idx, -4
	; CHECK: %[[a5:.+]] = add i16 %offset.idx, -5
	; CHECK: %[[a6:.+]] = add i16 %offset.idx, -6
	; CHECK: %[[a7:.+]] = add i16 %offset.idx, -7

	define i32 @reverse_induction_i16(i16 %startval, i32 * %ptr) {			define i32 @reverse_induction_i16(i16 %startval, i32 * %ptr) {
	entry:			entry:
	br label %for.body			br label %for.body

	for.body:			for.body:
	%add.i7 = phi i16 [ %startval, %entry ], [ %add.i, %for.body ]			%add.i7 = phi i16 [ %startval, %entry ], [ %add.i, %for.body ]
	%i.06 = phi i32 [ 0, %entry ], [ %inc4, %for.body ]			%i.06 = phi i32 [ 0, %entry ], [ %inc4, %for.body ]
	[...]
	; --reverse_induction;			; --reverse_induction;
	; }			; }
	; }			; }

	; CHECK-LABEL: @reverse_forward_induction_i64_i8(			; CHECK-LABEL: @reverse_forward_induction_i64_i8(
	; CHECK: %index = phi i64 [ 0, %vector.ph ], [ %index.next, %vector.body ]			; CHECK: %index = phi i64 [ 0, %vector.ph ], [ %index.next, %vector.body ]
	; CHECK: %offset.idx = sub i64 1023, %index			; CHECK: %offset.idx = sub i64 1023, %index
	; CHECK: %[[a0:.+]] = add i64 %offset.idx, 0			; CHECK: %[[a0:.+]] = add i64 %offset.idx, 0
	; CHECK: %[[a1:.+]] = add i64 %offset.idx, -1
	; CHECK: %[[a2:.+]] = add i64 %offset.idx, -2
	; CHECK: %[[a3:.+]] = add i64 %offset.idx, -3
	; CHECK: %[[a4:.+]] = add i64 %offset.idx, -4			; CHECK: %[[a4:.+]] = add i64 %offset.idx, -4
	; CHECK: %[[a5:.+]] = add i64 %offset.idx, -5
	; CHECK: %[[a6:.+]] = add i64 %offset.idx, -6
	; CHECK: %[[a7:.+]] = add i64 %offset.idx, -7

	define void @reverse_forward_induction_i64_i8() {			define void @reverse_forward_induction_i64_i8() {
	entry:			entry:
	br label %while.body			br label %while.body

	while.body:			while.body:
	%indvars.iv = phi i64 [ 1023, %entry ], [ %indvars.iv.next, %while.body ]			%indvars.iv = phi i64 [ 1023, %entry ], [ %indvars.iv.next, %while.body ]
	%forward_induction.05 = phi i8 [ 0, %entry ], [ %inc, %while.body ]			%forward_induction.05 = phi i8 [ 0, %entry ], [ %inc, %while.body ]
	[...]
	while.end:			while.end:
	ret void			ret void
	}			}

	; CHECK-LABEL: @reverse_forward_induction_i64_i8_signed(			; CHECK-LABEL: @reverse_forward_induction_i64_i8_signed(
	; CHECK: %index = phi i64 [ 0, %vector.ph ], [ %index.next, %vector.body ]			; CHECK: %index = phi i64 [ 0, %vector.ph ], [ %index.next, %vector.body ]
	; CHECK: %offset.idx = sub i64 1023, %index			; CHECK: %offset.idx = sub i64 1023, %index
	; CHECK: %[[a0:.+]] = add i64 %offset.idx, 0			; CHECK: %[[a0:.+]] = add i64 %offset.idx, 0
	; CHECK: %[[a1:.+]] = add i64 %offset.idx, -1
	; CHECK: %[[a2:.+]] = add i64 %offset.idx, -2
	; CHECK: %[[a3:.+]] = add i64 %offset.idx, -3
	; CHECK: %[[a4:.+]] = add i64 %offset.idx, -4			; CHECK: %[[a4:.+]] = add i64 %offset.idx, -4
	; CHECK: %[[a5:.+]] = add i64 %offset.idx, -5
	; CHECK: %[[a6:.+]] = add i64 %offset.idx, -6
	; CHECK: %[[a7:.+]] = add i64 %offset.idx, -7

	define void @reverse_forward_induction_i64_i8_signed() {			define void @reverse_forward_induction_i64_i8_signed() {
	entry:			entry:
	br label %while.body			br label %while.body

	while.body:			while.body:
	%indvars.iv = phi i64 [ 1023, %entry ], [ %indvars.iv.next, %while.body ]			%indvars.iv = phi i64 [ 1023, %entry ], [ %indvars.iv.next, %while.body ]
	%forward_induction.05 = phi i8 [ -127, %entry ], [ %inc, %while.body ]			%forward_induction.05 = phi i8 [ -127, %entry ], [ %inc, %while.body ]
	[...]