This is an archive of the discontinued LLVM Phabricator instance.

[LV] Switch to using canonical induction variables.
ClosedPublic

Authored by jmolloy on Aug 24 2015, 8:59 AM.


Details

Reviewers

anemet
nadav
mzolotukhin
aschwaighofer
hfinkel

Summary

Vectorized loops only ever have one induction variable. All induction PHIs from the scalar loop are rewritten to be in terms of this single indvar.

We were trying very hard to pick an indvar that already existed, even if that indvar wasn't canonical (didn't start at zero). But trying so hard is really fruitless - creating a new, canonical, indvar only results in one extra add in the worst case and that add is trivially easy to push through the PHI out of the loop by instcombine.

If we try and be less clever here and instead let instcombine clean up our mess (as we do in many other places in LV), we can remove unneeded complexity.
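As an illustration only (this is not code from the patch), the rewrite described above can be sketched in plain C++. The function names and the loop body (`i * 2`) are hypothetical. The first loop uses an induction variable that starts at `start`; the second uses a canonical index that starts at zero and steps by one, and recovers the original value with a single add.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Scalar loop whose induction variable starts at `start`, not at zero.
std::vector<int64_t> with_noncanonical_iv(int64_t start, int64_t n) {
    assert(n >= 0 && "trip count must be non-negative");
    std::vector<int64_t> out;
    for (int64_t i = start; i < start + n; ++i)
        out.push_back(i * 2);
    return out;
}

// Same loop driven by a canonical index (starts at 0, steps by 1).
// The original induction value is recomputed as start + index.
std::vector<int64_t> with_canonical_iv(int64_t start, int64_t n) {
    assert(n >= 0 && "trip count must be non-negative");
    std::vector<int64_t> out;
    for (int64_t index = 0; index < n; ++index) {
        int64_t offset_idx = start + index;  // the one extra add
        out.push_back(offset_idx * 2);
    }
    return out;
}
```

Both functions produce the same sequence for any non-negative `n`; the only cost of the canonical form is the `start + index` add, which later passes are expected to clean up.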

Diff Detail

Repository: rL LLVM

Event Timeline

jmolloy updated this revision to Diff 32963. Aug 24 2015, 8:59 AM

jmolloy retitled this revision to [LV] Switch to using canonical induction variables.

jmolloy updated this object.

jmolloy added reviewers: anemet, mzolotukhin.

jmolloy set the repository for this revision to rL LLVM.

jmolloy added a subscriber: llvm-commits.

I am OK with this, but I'd like someone else to sign off on this one too. Alternatively, showing that there is no relevant asm diff on the test-suite (including externals) would probably also be a good indication that this case is over-engineered locally.

When you say InstCombine do you mean LSR?

We should probably also add a comment that OldInduction is now a canonical induction variable. And remove ExtendedIdx, which I think is unused now.
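As a rough model of what "canonical" means in this review (integer induction, starts at zero, steps by one), the selection can be sketched as below. This is a simplified, hypothetical stand-in and not the LoopVectorize code: `IntInduction`, `is_canonical` and `pick_canonical` are invented names, and types are reduced to a bit width.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical, simplified description of an integer induction PHI.
struct IntInduction {
    int64_t start;  // value on entry to the loop
    int64_t step;   // amount added each iteration
    unsigned bits;  // width of the integer type
};

// Canonical here means: starts at zero and steps by one.
bool is_canonical(const IntInduction &iv) {
    return iv.start == 0 && iv.step == 1;
}

// Returns the index of the widest canonical induction (the last one on
// ties), or -1 if the loop has none.
int pick_canonical(const std::vector<IntInduction> &ivs) {
    int best = -1;
    for (std::size_t i = 0; i < ivs.size(); ++i) {
        if (!is_canonical(ivs[i]))
            continue;
        if (best < 0 || ivs[i].bits >= ivs[static_cast<std::size_t>(best)].bits)
            best = static_cast<int>(i);
    }
    return best;
}
```

In this model a result of -1 corresponds to the case from the summary where no existing PHI qualifies and a new canonical induction variable is created instead.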

Hi Adam,

I ran the test-suite and got 66 files changed out of 1733 (about 3.8%). Of the ones that changed, I didn't see anything obvious, but there was quite a bit of register allocation churn in many of them, which made it very difficult to spot real differences.

I have a patch in my queue to remove ExtendedIdx later - I can merge that into this one.

Cheers,

James

Also, FWIW, I ran all our testing on this and got a slight improvement (2.5%) in ammp in spec2000. No other significant changes.

In D12285#232977, @jmolloy wrote:

I have a patch in my queue to remove ExtendedIdx later - I can merge that into this one.

Up to you, just wanted to make sure it does not stay there.

Can you please also address my question regarding LSR vs. InstCombine?

Sure. If we have a single, non-canonical phi, and we create a canonical one, then we'll end up with one phi with a constant add inside the loop.

I would have expected it to be instcombine that would take that add and thread it up before the loop.

Having said that, instcombine doesn't look at loop depths so yes, it would probably be LSR (or indvars?)

James
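To make the exchange above concrete, here is a small sketch of the "constant add inside the loop" and of the same loop after that add has been moved in front of it. It is written by hand in C++ with hypothetical function names; which pass performs this in practice (LSR or indvars) is exactly the open question in the thread, so the sketch shows only the before and after shapes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Canonical index with the add of `start` still inside the loop body.
int64_t sum_add_in_loop(const int32_t *a, std::size_t start, std::size_t n) {
    assert(a != nullptr || n == 0);
    int64_t sum = 0;
    for (std::size_t index = 0; index < n; ++index)
        sum += a[start + index];  // add executed on every iteration
    return sum;
}

// Same loop with the add hoisted: the base address is computed once
// before the loop, and the body indexes from it directly.
int64_t sum_add_hoisted(const int32_t *a, std::size_t start, std::size_t n) {
    assert(a != nullptr || n == 0);
    const int32_t *base = a + start;  // computed once, ahead of the loop
    int64_t sum = 0;
    for (std::size_t index = 0; index < n; ++index)
        sum += base[index];
    return sum;
}
```

The two functions return the same result; the second simply folds the start offset into a value that is loop-invariant.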


Hi Adam,

Rebased and uploaded with full context.

Would it be acceptable to commit this patch and monitor for performance regressions? Without a specific regression test that shows a pattern being pessimized, and with my own testing showing no regressions, to me it makes sense to proceed and watch performance numbers carefully for a regression.

Is this acceptable to you?

Cheers,

James

jmolloy mentioned this in D12286: [LV] Never widen an induction variable. Aug 30 2015, 3:07 AM

I think that this is a good change. In many other places in the vectorizer the design was that we let other passes (such as CSE, InstCombine and LSR) clean up after us. I am totally okay with letting LSR do the cleanup. If I remember correctly we have always relied on LSR to do the cleanup and I don't remember why we have the logic for searching other induction variables.

LGTM.

anemet accepted this revision. Aug 31 2015, 8:55 PM

anemet edited edge metadata.

This revision is now accepted and ready to land. Aug 31 2015, 8:55 PM

r246630.

Revision Contents

Path                                                Size
lib/Transforms/Vectorize/LoopVectorize.cpp          22 lines
test/Transforms/LoopVectorize/induction.ll          6 lines
test/Transforms/LoopVectorize/reverse_induction.ll  9 lines

Diff 33524

lib/Transforms/Vectorize/LoopVectorize.cpp

[... 2,646 lines not shown ...]	ExitCountValue = CastInst::CreatePointerCast(ExitCountValue, IdxTy,
"exitcount.ptrcnt.to.int",		"exitcount.ptrcnt.to.int",
VectorPH->getTerminator());		VectorPH->getTerminator());

Instruction *CheckMinIters =		Instruction *CheckMinIters =
CmpInst::Create(Instruction::ICmp, CmpInst::ICMP_ULT, ExitCountValue,		CmpInst::Create(Instruction::ICmp, CmpInst::ICMP_ULT, ExitCountValue,
ConstantInt::get(ExitCountValue->getType(), VF * UF),		ConstantInt::get(ExitCountValue->getType(), VF * UF),
"min.iters.check", VectorPH->getTerminator());		"min.iters.check", VectorPH->getTerminator());

// The loop index does not have to start at Zero. Find the original start
// value from the induction PHI node. If we don't have an induction variable
// then we know that it starts at zero.
Builder.SetInsertPoint(VectorPH->getTerminator());		Builder.SetInsertPoint(VectorPH->getTerminator());
Value *StartIdx = ExtendedIdx =		Value *StartIdx = ExtendedIdx = ConstantInt::get(IdxTy, 0);
OldInduction
? Builder.CreateZExt(OldInduction->getIncomingValueForBlock(VectorPH),
IdxTy)
: ConstantInt::get(IdxTy, 0);

// Count holds the overall loop count (N).		// Count holds the overall loop count (N).
Value *Count = Exp.expandCodeFor(ExitCount, ExitCount->getType(),		Value *Count = Exp.expandCodeFor(ExitCount, ExitCount->getType(),
VectorPH->getTerminator());		VectorPH->getTerminator());

LoopBypassBlocks.push_back(VectorPH);		LoopBypassBlocks.push_back(VectorPH);

// Split the single block loop into the two loop structure described above.		// Split the single block loop into the two loop structure described above.
[... 865 lines not shown ...]	case InductionDescriptor::IK_IntInduction: {
Value *Broadcasted;		Value *Broadcasted;
if (P == OldInduction) {		if (P == OldInduction) {
// Handle the canonical induction variable. We might have had to		// Handle the canonical induction variable. We might have had to
// extend the type.		// extend the type.
Broadcasted = Builder.CreateTrunc(Induction, PhiTy);		Broadcasted = Builder.CreateTrunc(Induction, PhiTy);
} else {		} else {
// Handle other induction variables that are now based on the		// Handle other induction variables that are now based on the
// canonical one.		// canonical one.
Value *NormalizedIdx = Builder.CreateSub(Induction, ExtendedIdx,		auto *V = Builder.CreateSExtOrTrunc(Induction, PhiTy);
"normalized.idx");		Broadcasted = II.transform(Builder, V);
NormalizedIdx = Builder.CreateSExtOrTrunc(NormalizedIdx, PhiTy);
Broadcasted = II.transform(Builder, NormalizedIdx);
Broadcasted->setName("offset.idx");		Broadcasted->setName("offset.idx");
}		}
Broadcasted = getBroadcastInstrs(Broadcasted);		Broadcasted = getBroadcastInstrs(Broadcasted);
// After broadcasting the induction variable we need to make the vector		// After broadcasting the induction variable we need to make the vector
// consecutive by adding 0, 1, 2, etc.		// consecutive by adding 0, 1, 2, etc.
for (unsigned part = 0; part < UF; ++part)		for (unsigned part = 0; part < UF; ++part)
Entry[part] = getStepVector(Broadcasted, VF * part, II.getStepValue());		Entry[part] = getStepVector(Broadcasted, VF * part, II.getStepValue());
return;		return;
[... 572 lines not shown ...]	for (BasicBlock::iterator it = (bb)->begin(), e = (bb)->end(); it != e;
// Get the widest type.		// Get the widest type.
if (!WidestIndTy)		if (!WidestIndTy)
WidestIndTy = convertPointerToIntegerType(DL, PhiTy);		WidestIndTy = convertPointerToIntegerType(DL, PhiTy);
else		else
WidestIndTy = getWiderType(DL, PhiTy, WidestIndTy);		WidestIndTy = getWiderType(DL, PhiTy, WidestIndTy);

// Int inductions are special because we only allow one IV.		// Int inductions are special because we only allow one IV.
if (ID.getKind() == InductionDescriptor::IK_IntInduction &&		if (ID.getKind() == InductionDescriptor::IK_IntInduction &&
ID.getStepValue()->isOne()) {		ID.getStepValue()->isOne() &&
		isa<Constant>(ID.getStartValue()) &&
		cast<Constant>(ID.getStartValue())->isNullValue()) {
// Use the phi node with the widest type as induction. Use the last		// Use the phi node with the widest type as induction. Use the last
// one if there are multiple (no good reason for doing this other		// one if there are multiple (no good reason for doing this other
// than it is expedient).		// than it is expedient). We've checked that it begins at zero and
		// steps by one, so this is a canonical induction variable.
if (!Induction || PhiTy == WidestIndTy)		if (!Induction || PhiTy == WidestIndTy)
Induction = Phi;		Induction = Phi;
}		}

DEBUG(dbgs() << "LV: Found an induction variable.\n");		DEBUG(dbgs() << "LV: Found an induction variable.\n");

// Until we explicitly handle the case of an induction variable with		// Until we explicitly handle the case of an induction variable with
// an outside loop user we have to give up vectorizing this loop.		// an outside loop user we have to give up vectorizing this loop.
[... 1,376 lines not shown ...]

test/Transforms/LoopVectorize/induction.ll

	; RUN: opt < %s -loop-vectorize -force-vector-interleave=1 -force-vector-width=2 -S | FileCheck %s			; RUN: opt < %s -loop-vectorize -force-vector-interleave=1 -force-vector-width=2 -S | FileCheck %s

	target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-n8:16:32:64-S128"			target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-n8:16:32:64-S128"

	; Make sure that we can handle multiple integer induction variables.			; Make sure that we can handle multiple integer induction variables.
	; CHECK-LABEL: @multi_int_induction(			; CHECK-LABEL: @multi_int_induction(
	; CHECK: vector.body:			; CHECK: vector.body:
	; CHECK: %index = phi i64 [ 0, %vector.ph ], [ %index.next, %vector.body ]			; CHECK: %index = phi i64 [ 0, %vector.ph ], [ %index.next, %vector.body ]
	; CHECK: %normalized.idx = sub i64 %index, 0			; CHECK: %[[VAR:.*]] = trunc i64 %index to i32
	; CHECK: %[[VAR:.*]] = trunc i64 %normalized.idx to i32
	; CHECK: %offset.idx = add i32 190, %[[VAR]]			; CHECK: %offset.idx = add i32 190, %[[VAR]]
	define void @multi_int_induction(i32* %A, i32 %N) {			define void @multi_int_induction(i32* %A, i32 %N) {
	for.body.lr.ph:			for.body.lr.ph:
	br label %for.body			br label %for.body

	for.body:			for.body:
	%indvars.iv = phi i64 [ 0, %for.body.lr.ph ], [ %indvars.iv.next, %for.body ]			%indvars.iv = phi i64 [ 0, %for.body.lr.ph ], [ %indvars.iv.next, %for.body ]
	%count.09 = phi i32 [ 190, %for.body.lr.ph ], [ %inc, %for.body ]			%count.09 = phi i32 [ 190, %for.body.lr.ph ], [ %inc, %for.body ]
	[... 118 lines not shown ...]
	}			}

	; When generating the overflow check we must sure that the induction start value			; When generating the overflow check we must sure that the induction start value
	; is defined before the branch to the scalar preheader.			; is defined before the branch to the scalar preheader.

	; CHECK-LABEL: testoverflowcheck			; CHECK-LABEL: testoverflowcheck
	; CHECK: entry			; CHECK: entry
	; CHECK: %[[LOAD:.*]] = load i8			; CHECK: %[[LOAD:.*]] = load i8
	; CHECK: %[[VAL:.*]] = zext i8 %[[LOAD]] to i32
	; CHECK: br			; CHECK: br

	; CHECK: scalar.ph			; CHECK: scalar.ph
	; CHECK: phi i32 [ %{{.*}}, %middle.block ], [ %[[VAL]], %entry ]			; CHECK: phi i8 [ %{{.*}}, %middle.block ], [ %[[LOAD]], %entry ]

	@e = global i8 1, align 1			@e = global i8 1, align 1
	@d = common global i32 0, align 4			@d = common global i32 0, align 4
	@c = common global i32 0, align 4			@c = common global i32 0, align 4
	define i32 @testoverflowcheck() {			define i32 @testoverflowcheck() {
	entry:			entry:
	%.pr.i = load i8, i8* @e, align 1			%.pr.i = load i8, i8* @e, align 1
	%0 = load i32, i32* @d, align 4			%0 = load i32, i32* @d, align 4
	[... 14 lines not shown ...]

test/Transforms/LoopVectorize/reverse_induction.ll

	[... 90 lines not shown ...]
	; a[reverse_induction] = forward_induction;			; a[reverse_induction] = forward_induction;
	; --reverse_induction;			; --reverse_induction;
	; }			; }
	; }			; }

	; CHECK-LABEL: @reverse_forward_induction_i64_i8(			; CHECK-LABEL: @reverse_forward_induction_i64_i8(
	; CHECK: vector.body			; CHECK: vector.body
	; CHECK: %index = phi i64 [ 0, %vector.ph ], [ %index.next, %vector.body ]			; CHECK: %index = phi i64 [ 0, %vector.ph ], [ %index.next, %vector.body ]
	; CHECK: %normalized.idx = sub i64 %index, 0			; CHECK: %offset.idx = sub i64 1023, %index
	; CHECK: %offset.idx = sub i64 1023, %normalized.idx
	; CHECK: trunc i64 %index to i8			; CHECK: trunc i64 %index to i8

	define void @reverse_forward_induction_i64_i8() {			define void @reverse_forward_induction_i64_i8() {
	entry:			entry:
	br label %while.body			br label %while.body

	while.body:			while.body:
	%indvars.iv = phi i64 [ 1023, %entry ], [ %indvars.iv.next, %while.body ]			%indvars.iv = phi i64 [ 1023, %entry ], [ %indvars.iv.next, %while.body ]
	%forward_induction.05 = phi i8 [ 0, %entry ], [ %inc, %while.body ]			%forward_induction.05 = phi i8 [ 0, %entry ], [ %inc, %while.body ]
	%inc = add i8 %forward_induction.05, 1			%inc = add i8 %forward_induction.05, 1
	%conv = zext i8 %inc to i32			%conv = zext i8 %inc to i32
	%arrayidx = getelementptr inbounds [1024 x i32], [1024 x i32]* @a, i64 0, i64 %indvars.iv			%arrayidx = getelementptr inbounds [1024 x i32], [1024 x i32]* @a, i64 0, i64 %indvars.iv
	store i32 %conv, i32* %arrayidx, align 4			store i32 %conv, i32* %arrayidx, align 4
	%indvars.iv.next = add i64 %indvars.iv, -1			%indvars.iv.next = add i64 %indvars.iv, -1
	%0 = trunc i64 %indvars.iv to i32			%0 = trunc i64 %indvars.iv to i32
	%cmp = icmp sgt i32 %0, 0			%cmp = icmp sgt i32 %0, 0
	br i1 %cmp, label %while.body, label %while.end			br i1 %cmp, label %while.body, label %while.end

	while.end:			while.end:
	ret void			ret void
	}			}

	; CHECK-LABEL: @reverse_forward_induction_i64_i8_signed(			; CHECK-LABEL: @reverse_forward_induction_i64_i8_signed(
	; CHECK: vector.body:			; CHECK: vector.body:
	; CHECK: %index = phi i64 [ 129, %vector.ph ], [ %index.next, %vector.body ]			; CHECK: %index = phi i64 [ 0, %vector.ph ], [ %index.next, %vector.body ]
	; CHECK: %normalized.idx = sub i64 %index, 129			; CHECK: %offset.idx = sub i64 1023, %index
	; CHECK: %offset.idx = sub i64 1023, %normalized.idx
	; CHECK: trunc i64 %index to i8

	define void @reverse_forward_induction_i64_i8_signed() {			define void @reverse_forward_induction_i64_i8_signed() {
	entry:			entry:
	br label %while.body			br label %while.body

	while.body:			while.body:
	%indvars.iv = phi i64 [ 1023, %entry ], [ %indvars.iv.next, %while.body ]			%indvars.iv = phi i64 [ 1023, %entry ], [ %indvars.iv.next, %while.body ]
	%forward_induction.05 = phi i8 [ -127, %entry ], [ %inc, %while.body ]			%forward_induction.05 = phi i8 [ -127, %entry ], [ %inc, %while.body ]
	[... 12 lines not shown ...]
