This is an archive of the discontinued LLVM Phabricator instance.


Differential D5333

[IndVarSimplify] Widen signed loop compare instructions to enable additional optimizations.
Closed, Public

Authored by mcrosier on Sep 12 2014, 10:14 AM.


Details

Reviewers

chandlerc
compnerd
atrick
sunfish
bkramer

Summary

This change removes the trunc that sits between an induction variable and its associated compare, which allows other optimizations (e.g., instcombine, LSR) to take effect. A sext may be added to the compare's other operand, but it can often be hoisted out of the loop.

For example,

int *ptr;
int e, idx;
int foo() {
  int i;
  idx = -1;
  for (i = 0; i <= e; i++)
    if (!ptr[i]) {
      idx = i;
      break;
    }
  return idx;
}

Before this change, on AArch64 we generate the following loop:

.LBB0_2: // %for.body
  ldr   w11, [x10, x0, lsl #2]
  cbz   w11, .LBB0_5
  add   x0, x0, #1          // ++i / pre-increment
  sub   w11, w0, #1         // rematerialize i
  cmp   w11, w9             // compare i
  b.lt  .LBB0_2

After:

.LBB0_2: // %for.body
  ldr   w11, [x10]          // remove shift as we can now increment base by 4
  add   x0, x0, #1          // i++ / post-increment i
  cbz   w11, .LBB0_5
  add   x10, x10, #4        // base += 4
  cmp   x0, x9
  b.lt  .LBB0_2

With a little more work we should be able to generate a post-increment load (my next patch) to generate the following loop:

.LBB0_2: // %for.body
  ldr   w11, [x10], 4
  add   x0, x0, #1
  cbz   w11, .LBB0_5
  cmp   x0, x9
  b.lt  .LBB0_2

This change in isolation has a minimal effect on performance (i.e., nothing outside of noise). However, it enables better use of post-increment loads/stores, which is why I deemed it important. Please have a look.

Chad
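
The effect on the loop test can be sketched at the C level. The two functions below are hypothetical illustrations, not compiler output: find_trunc_compare models the loop once the IV is 64 bits wide but still truncated for the 32-bit compare, and find_wide_compare models the loop after this change, with the limit sign-extended once outside the loop. The function names and the explicit array parameter are assumptions made for the example, and the two are only expected to agree while the narrow IV does not overflow.

```c
#include <stdint.h>

/* Before: the 64-bit IV is truncated back to 32 bits for the exit test. */
int find_trunc_compare(const int *ptr, int e) {
  for (int64_t iv = 0; (int32_t)iv <= e; iv++)
    if (!ptr[iv])
      return (int)iv;
  return -1;
}

/* After: the limit is sign-extended once, outside the loop, and the
   exit test compares the 64-bit IV directly. */
int find_wide_compare(const int *ptr, int e) {
  int64_t wide_e = (int64_t)e; /* models the hoisted sext */
  for (int64_t iv = 0; iv <= wide_e; iv++)
    if (!ptr[iv])
      return (int)iv;
  return -1;
}
```

Both functions return the index of the first zero element in ptr[0..e], or -1 if there is none; only the form of the exit test differs.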

Diff Detail

Event Timeline

mcrosier updated this revision to Diff 13639. Sep 12 2014, 10:14 AM

mcrosier retitled this revision to [IndVarSimplify] Widen signed loop compare instructions to enable additional optimizations.

mcrosier updated this object.

mcrosier edited the test plan for this revision.

mcrosier added reviewers: atrick, bkramer, chandlerc, sunfish, compnerd.

mcrosier added subscribers: Unknown Object (MLST), Jiangning, jmolloy.

Herald added a subscriber: aemerson. Sep 12 2014, 10:14 AM

Just a few comments to save on review time.

Chad

lib/Transforms/Scalar/IndVarSimplify.cpp
935	Now that I think about it, I'm not sure the single-use check is necessary.
954	I reuse getExtend because it tries hard to hoist the extend as far out of the loop as possible.
test/Transforms/IndVarSimplify/elim-extend.ll
10	This is the sext on limit, which is outside of the loop. This isn't an extend we need to CHECK-NOT.
test/Transforms/IndVarSimplify/verify-scev.ll
383	Benjamin, can you verify this is a safe change on this test case? My patch changes the SCEV output, so I modified the test case to match the new result.
test/Transforms/IndVarSimplify/widen-loop-comp.ll
19	This is the example from the commit-list message.
test/Transforms/LoopSimplify/merge-exits.ll
1	FileCheckize.
14	No sext in loop.

Looks fine to me, though I'm not familiar with this code. You should get a LGTM from someone who knows the area. I'm mostly just reading through to build up my own background in the area.

lib/Transforms/Scalar/IndVarSimplify.cpp
935	Why does it matter that the compare has one use? We're not changing the result of the ICmp are we?
test/Transforms/LoopSimplify/merge-exits.ll
1	Separating this cleanup into its own change before making the change required for your patch would make the diff a lot more understandable.

I just glanced at this and still want to take a little closer look. In particular I want to understand why LSR behavior changes. Ping me if I don't get back to it.

I generally would not trade a trunc for a sext. But in this case, it probably works out better most of the time. So your change seems reasonable if it doesn't regress things. We definitely want canonical looking loop compares and that must have something to do with LSR behaving better.

Did you also see changes in InstCombine?

I wonder if you'll hit an assertion if both compare operands happen to use the same IV:

j = i*2 - k
cmp i, j

It would be good to just handle the unsigned case instead of leaving a fixme so behavior is consistent. It doesn't have to be in this commit though.

Thanks for the feedback, Phillip.

Andy,
I'll have to revisit the LSR code to better understand the change. At one point I understood what was going on, but it has since been paged out. Assuming this patch gets accepted, I'll investigate the unsigned implementation.

Chad

lib/Transforms/Scalar/IndVarSimplify.cpp
935	Exactly. I'll remove the check and rerun the analysis.
test/Transforms/LoopSimplify/merge-exits.ll
1	r217698.

Updated based on reviews. Removed the singleUse check on icmp. Also removed the FileCheckize change (committed in r217698).

dpeixott added a subscriber: dpeixott. Sep 12 2014, 12:29 PM

dpeixott added inline comments.

lib/Transforms/Scalar/IndVarSimplify.cpp
931	Does your comment about single use still apply? I don't see a check for it.

zinovy.nis added a subscriber: zinovy.nis. Sep 15 2014, 12:50 AM

I think the reason removing the trunc works well in practice is that LSR can easily tell that an immediate offset can be folded into the compare. I'm sure it would be possible to make LSR handle the trunc, but fixing Indvars to generate a cleaner loop test is a fine approach.

I'm still curious if this triggers any InstCombines that we were missing before.

This revision is now accepted and ready to land. Sep 15 2014, 7:59 PM

In D5333#16, @atrick wrote:

I think the reason removing the trunc works well in practice is that LSR can easily tell that an immediate offset can be folded into the compare. I'm sure it would be possible to make LSR handle the trunc, but fixing Indvars to generate a cleaner loop test is a fine approach.

Yes, that's what I'm seeing as well.

LSR Use: Kind=Basic, Offsets={0}, widest fixup type: i64

reg({0,+,1}<nuw><nsw><%for.body>)
reg({-1,+,1}<nw><%for.body>) + imm(1)

The latter case is only considered once the trunc is removed.
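
The two formulae above describe the same value. An add recurrence {start,+,step} evaluates to start + n*step on iteration n, so {-1,+,1} plus an immediate of 1 matches {0,+,1} on every iteration. A small sketch of that arithmetic (the helper names addrec_at and formulae_agree are made up for this illustration and are not LLVM APIs):

```c
#include <stdint.h>

/* Value of the add recurrence {start,+,step} on iteration n. */
int64_t addrec_at(int64_t start, int64_t step, int64_t n) {
  return start + n * step;
}

/* Returns 1 if reg({0,+,1}) and reg({-1,+,1}) + imm(1) agree on the
   first `iters` iterations, 0 otherwise. */
int formulae_agree(int64_t iters) {
  for (int64_t n = 0; n < iters; n++)
    if (addrec_at(0, 1, n) != addrec_at(-1, 1, n) + 1)
      return 0;
  return 1;
}
```

Calling formulae_agree with any positive iteration count is expected to return 1; the sketch only checks the arithmetic and says nothing about which form LSR ends up preferring.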

The specific instcombine I thought might be affected was committed by David here:
http://llvm.org/viewvc/llvm-project?view=revision&revision=179316

From my original code I was hoping to fold the extra subtract into the compare:

ldr w11, [x10, x0, lsl #2]
cbz w11, .LBB0_5
add x0, x0, #1
sub w11, w0, #1 <--- fold into compare
cmp w11, w9
b.lt .LBB0_2

However, r179316 couldn't handle this case because of the trunc. After a few pointers from David M. I realized that modifying that code to handle the trunc wasn't the right approach and that's how we arrived at this patch. Based on that limited experience, I could imagine other compare instcombines not working due to intervening truncs. However, I haven't actually seen this in practice.

Thanks for the review, Andy. I'm going to do a little more analysis before committing. My performance runs were on devices that were rather unstable at the time, so I'd like to rerun everything so I can sleep easier at night. :)

Chad

Thanks LGTM. Please follow up by removing the FIXME for unsigned compare.

Revised patch which now handles unsigned compares.

(Andy, I'm sure you can appreciate this..) I had a bit of variance in my performance runs (both A53 and A57), but overall I saw a few small improvements.

Committed in r217953. Post-commit reviews welcome, but hopefully this is in good shape.

njw45 added a subscriber: njw45. Sep 25 2014, 1:47 AM

This comment was removed by njw45.

Nick,
The promotion of unsigned comparisons was reverted for this very reason.
I believe the step has to be a unit step of one to be correct. There may
be other cases, but this is the only one I can think of at the moment.

Regardless, that part of the code has been reverted.

Chad
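
The unit-step observation can be checked by brute force over an 8-bit IV. The sketch below is an illustration only: the function names and the iteration cap are assumptions, and it uses a strict `<` exit test. It counts, for a given step, how many bounds in [0, 255] give a different trip count once the IV is promoted to 64 bits.

```c
#include <assert.h>
#include <stdint.h>

/* Trip count of `for (uint8_t i = 0; i < b; i += step)`, capped at `cap`
   so that a non-terminating loop still returns. */
unsigned trips_narrow(uint8_t b, uint8_t step, unsigned cap) {
  assert(step > 0);
  unsigned n = 0;
  for (uint8_t i = 0; i < b && n < cap; i = (uint8_t)(i + step))
    n++;
  return n;
}

/* Same loop with the IV and the bound promoted to 64 bits. */
unsigned trips_wide(uint8_t b, uint8_t step, unsigned cap) {
  assert(step > 0);
  unsigned n = 0;
  for (uint64_t i = 0; i < (uint64_t)b && n < cap; i += step)
    n++;
  return n;
}

/* Number of bounds b in [0, 255] for which the two loops disagree. */
unsigned count_mismatches(uint8_t step, unsigned cap) {
  unsigned bad = 0;
  for (unsigned b = 0; b <= 255; b++)
    if (trips_narrow((uint8_t)b, step, cap) != trips_wide((uint8_t)b, step, cap))
      bad++;
  return bad;
}
```

With a step of 1 the two loops agree for every bound, while a step of 10 produces mismatches for bounds near 255, where the narrow IV wraps before it reaches the bound. The predicate matters too: with `<=` and a bound of 255 the narrow loop never exits even with a unit step.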

I'm following up on my mail to llvmdev - I don't think this change is quite correct. The test case
below is a slight modification of the one in the final version of this
change; it's basically (in C):
uint32_t test5(uint32_t* a, uint8_t b) {
  uint32_t sum = 0;
  for(uint8_t i = 0; i <= b; i += 10) {
    sum += a[i];
  }
  return sum;
}
LLVM IR:
; CHECK-LABEL: @test5
; CHECK: zext i8 %b
; CHECK: for.cond:
; CHECK: phi i64
; CHECK: icmp ule i64

define i32 @test5(i32* %a, i8 %b) {
entry:
  br label %for.cond

for.cond:
  %sum.0 = phi i32 [ 0, %entry ], [ %add, %for.body ]
  %i.0 = phi i8 [ 0, %entry ], [ %inc, %for.body ]
  %cmp = icmp ule i8 %i.0, %b
  br i1 %cmp, label %for.body, label %for.end

for.body:
  %idxprom = sext i8 %i.0 to i64
  %arrayidx = getelementptr inbounds i32* %a, i64 %idxprom
  %0 = load i32* %arrayidx, align 4
  %add = add nsw i32 %sum.0, %0
  %inc = add nsw i8 %i.0, 10
  br label %for.cond

for.end:
  ret i32 %sum.0
}
If b were (e.g.) 251, the loop would be infinite: it is effectively
`for (uint8_t i = 0; i <= 251; i += 10)`, and when i reaches 250 and 10 is
added, i wraps around to 4, so `i <= 251` is still true. If you promote i to
a uint64_t, the loop terminates at this point because 260 > 251. Is this
analysis correct?

Thanks -

Nick
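
The wrap described above comes down to one step of arithmetic in each width. A minimal sketch, with helper names invented for the illustration and the step of 10 taken from the test case:

```c
#include <stdint.h>

/* One narrow (8-bit) step of the IV: wraps modulo 256. */
uint8_t next_narrow(uint8_t i) { return (uint8_t)(i + 10); }

/* One widened (64-bit) step of the IV: no wrap in this range. */
uint64_t next_wide(uint64_t i) { return i + 10; }

/* Exit tests as they would be evaluated in each width. */
int stays_in_loop_narrow(uint8_t i, uint8_t b) { return i <= b; }
int stays_in_loop_wide(uint64_t i, uint8_t b) { return i <= (uint64_t)b; }
```

Starting from i = 250 with b = 251, the narrow step yields 4 and the narrow test keeps the loop running, whereas the wide step yields 260 and the wide test exits, so the promoted loop does not preserve the original behavior.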


Hi Chad -
I think this patch caused or exposed some other bug here:
http://llvm.org/bugs/show_bug.cgi?id=21030

Thanks, Sanjay. I'll investigate now.


This change caused a 24% performance regression in SciMark2's Sparse matmult benchmark on Bloomfield processors and 12% on Harpertown processors; "-O3 -march=native" generates identical assembly in both cases. The issue is filed at http://llvm.org/bugs/show_bug.cgi?id=22589 and appears to be due to increased register pressure from the newly exposed optimizations, which causes spills. It is processor-specific: the regression is minimal on other processors like Haswell due to their additional registers (at least for this specific test case).

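Revision Contents

Path                                                      Size
lib/Transforms/Scalar/IndVarSimplify.cpp                  42 lines
test/Transforms/IndVarSimplify/2011-09-27-hoistsext.ll    1 line
test/Transforms/IndVarSimplify/elim-extend.ll             1 line
test/Transforms/IndVarSimplify/no-iv-rewrite.ll           2 lines
test/Transforms/IndVarSimplify/verify-scev.ll             8 lines
test/Transforms/IndVarSimplify/widen-loop-comp.ll         137 lines
test/Transforms/LoopSimplify/merge-exits.ll               54 lines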

Diff 13639

lib/Transforms/Scalar/IndVarSimplify.cpp

Context not available.

   Instruction *WidenIVUse(NarrowIVDefUse DU, SCEVExpander &Rewriter);

+  bool WidenLoopCompare(NarrowIVDefUse DU);
+
   void pushNarrowIVUsers(Instruction *NarrowDef, Instruction *WideDef);
 };
 } // anonymous namespace
Context not available.
   DU.NarrowUse->replaceUsesOfWith(DU.NarrowDef, Trunc);
 }

+/// If the narrow use is a compare instruction, and the compare has a single use
+/// then widen the compare instruction (and possibly the other operand).
+bool WidenIV::WidenLoopCompare(NarrowIVDefUse DU) {
+  ICmpInst *Cmp = dyn_cast<ICmpInst>(DU.NarrowUse);
+  if (!Cmp || !Cmp->hasOneUse())
+    return false;
+
+  /// FIXME: Add support for unsigned compare.
+  bool IsSigned = CmpInst::isSigned(Cmp->getPredicate());
+  if (!IsSigned)
+    return false;
+
+  Value *Op = Cmp->getOperand(Cmp->getOperand(0) == DU.NarrowDef ? 1 : 0);
+  unsigned CastWidth = SE->getTypeSizeInBits(Op->getType());
+  unsigned IVWidth = SE->getTypeSizeInBits(WideType);
+  assert (CastWidth <= IVWidth && "Unexpected width while widening compare.");
+
+  // Widen the compare instruction.
+  IRBuilder<> Builder(getInsertPointForUses(DU.NarrowUse, DU.NarrowDef, DT));
+  DU.NarrowUse->replaceUsesOfWith(DU.NarrowDef, DU.WideDef);
+
+  // Widen the other operand of the compare, if necessary.
+  if (CastWidth < IVWidth) {
+    Value *ExtOp = getExtend(Op, WideType, IsSigned, Cmp);
+    DU.NarrowUse->replaceUsesOfWith(Op, ExtOp);
+  }
+  return true;
+}
+
 /// WidenIVUse - Determine whether an individual user of the narrow IV can be
 /// widened. If so, return the wide clone of the user.
 Instruction *WidenIV::WidenIVUse(NarrowIVDefUse DU, SCEVExpander &Rewriter) {
Context not available.

   // Does this user itself evaluate to a recurrence after widening?
   const SCEVAddRecExpr *WideAddRec = GetWideRecurrence(DU.NarrowUse);
+  if (!WideAddRec)
+    WideAddRec = GetExtendedOperandRecurrence(DU);
+
   if (!WideAddRec) {
-    WideAddRec = GetExtendedOperandRecurrence(DU);
-  }
-  if (!WideAddRec) {
+    // If use is a loop condition, try to promote the condition instead of
+    // truncating the IV first.
+    if (WidenLoopCompare(DU))
+      return nullptr;
+
     // This user does not evaluate to a recurence after widening, so don't
     // follow it. Instead insert a Trunc to kill off the original use,
     // eventually isolating the original narrow IV so it can be removed.
Context not available.

test/Transforms/IndVarSimplify/2011-09-27-hoistsext.ll

Context not available.

 ; CHECK: for.body:
 ; CHECK-NOT: sext
+; CHECK: indvars.iv.next
 ; CHECK: br
 for.body:
   %i2.115 = phi i32 [ 0, %entry ], [ %add249, %for.body ]
Context not available.

test/Transforms/IndVarSimplify/elim-extend.ll

Context not available.
 define void @postincConstIV(i8* %base, i32 %limit) nounwind {
 entry:
   br label %loop
+; CHECK: sext i32
 ; CHECK: loop:
 ; CHECK-NOT: sext
 ; CHECK: exit:
Context not available.

test/Transforms/IndVarSimplify/no-iv-rewrite.ll

Context not available.
 ; loop and the OR instruction is replaced by an ADD keeping the result
 ; equivalent.
 ;
+; CHECK: sext
 ; CHECK: loop:
 ; CHECK: phi i64
 ; CHECK-NOT: sext
-; CHECK: icmp slt i32
 ; CHECK: exit:
 ; CHECK: add i64
 loop:
Context not available.

test/Transforms/IndVarSimplify/verify-scev.ll

Context not available.

 for.body65.lr.ph: ; preds = %for.body48
   %0 = load i32* undef, align 4
+  %1 = sext i32 %0 to i64
   br label %for.body65.us

 for.body65.us: ; preds = %for.inc219.us, %for.body65.lr.ph
-  %k.09.us = phi i32 [ %inc.us, %for.inc219.us ], [ 1, %for.body65.lr.ph ]
-  %idxprom66.us = sext i32 %k.09.us to i64
+  %indvars.iv = phi i64 [ %indvars.iv.next, %for.inc219.us ], [ 1, %for.body65.lr.ph ]
   br i1 undef, label %for.inc219.us, label %if.end72.us

 if.end72.us: ; preds = %for.body65.us
Context not available.
   br i1 undef, label %for.cond139.loopexit.us, label %for.cond152.us

 for.inc219.us: ; preds = %for.cond139.loopexit.us, %if.end110.us, %if.end93.us, %for.body65.us
-  %inc.us = add nsw i32 %k.09.us, 1
-  %cmp64.us = icmp sgt i32 %inc.us, %0
+  %indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
+  %cmp64.us = icmp sgt i64 %indvars.iv.next, %1
   br i1 %cmp64.us, label %for.inc221, label %for.body65.us

	for.cond139.loopexit.us: ; preds = %for.cond152.us	for.cond139.loopexit.us: ; preds = %for.cond152.us
Context not available.

test/Transforms/IndVarSimplify/widen-loop-comp.ll

This file was added.

				; RUN: opt < %s -indvars -S | FileCheck %s
				target triple = "aarch64--linux-gnu"

				; Check the loop exit i32 compare instruction and operand are widened to i64
				; instead of truncating IV before its use in the i32 compare instruction.

				@idx = common global i32 0, align 4
				@e = common global i32 0, align 4
				@ptr = common global i32* null, align 8

				; CHECK-LABEL: @test1
				; CHECK: for.body.lr.ph:
				; CHECK: sext i32
				; CHECK: for.cond:
				; CHECK: icmp slt i64
				; CHECK: for.body:
				; CHECK: phi i64

				define i32 @test1() {
				entry:
				store i32 -1, i32* @idx, align 4
				%0 = load i32* @e, align 4
				%cmp4 = icmp slt i32 %0, 0
				br i1 %cmp4, label %for.end.loopexit, label %for.body.lr.ph

				for.body.lr.ph:
				%1 = load i32** @ptr, align 8
				%2 = load i32* @e, align 4
				br label %for.body

				for.cond:
				%inc = add nsw i32 %i.05, 1
				%cmp = icmp slt i32 %i.05, %2
				br i1 %cmp, label %for.body, label %for.cond.for.end.loopexit_crit_edge

				for.body:
				%i.05 = phi i32 [ 0, %for.body.lr.ph ], [ %inc, %for.cond ]
				%idxprom = sext i32 %i.05 to i64
				%arrayidx = getelementptr inbounds i32* %1, i64 %idxprom
				%3 = load i32* %arrayidx, align 4
				%tobool = icmp eq i32 %3, 0
				br i1 %tobool, label %if.then, label %for.cond

				if.then:
				%i.05.lcssa = phi i32 [ %i.05, %for.body ]
				store i32 %i.05.lcssa, i32* @idx, align 4
				br label %for.end

				for.cond.for.end.loopexit_crit_edge:
				br label %for.end.loopexit

				for.end.loopexit:
				br label %for.end

				for.end:
				%4 = load i32* @idx, align 4
				ret i32 %4
				}

				; CHECK-LABEL: @test2
				; CHECK: for.body4.us
				; CHECK: %indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
				; CHECK: %cmp2.us = icmp slt i64
				; CHECK-NOT: %2 = trunc i64 %indvars.iv.next to i32
				; CHECK-NOT: %cmp2.us = icmp slt i32

				define void @test2([8 x i8]* %a, i8* %b, i8 %limit) {
				entry:
				%conv = zext i8 %limit to i32
				%cmp23 = icmp eq i8 %limit, 0
				br i1 %cmp23, label %for.cond1.preheader, label %for.cond1.preheader.us

				for.cond1.preheader.us:
				%storemerge5.us = phi i32 [ 0, %entry ], [ %inc14.us, %for.inc13.us ]
				br i1 true, label %for.body4.lr.ph.us, label %for.inc13.us

				for.inc13.us:
				%inc14.us = add nsw i32 %storemerge5.us, 1
				%cmp.us = icmp slt i32 %inc14.us, 4
				br i1 %cmp.us, label %for.cond1.preheader.us, label %for.end

				for.body4.us:
				%storemerge14.us = phi i32 [ 0, %for.body4.lr.ph.us ], [ %inc.us, %for.body4.us ]
				%idxprom.us = sext i32 %storemerge14.us to i64
				%arrayidx6.us = getelementptr inbounds [8 x i8]* %a, i64 %idxprom5.us, i64 %idxprom.us
				%0 = load i8* %arrayidx6.us, align 1
				%idxprom7.us = zext i8 %0 to i64
				%arrayidx8.us = getelementptr inbounds i8* %b, i64 %idxprom7.us
				%1 = load i8* %arrayidx8.us, align 1
				store i8 %1, i8* %arrayidx6.us, align 1
				%inc.us = add nsw i32 %storemerge14.us, 1
				%cmp2.us = icmp slt i32 %inc.us, %conv
				br i1 %cmp2.us, label %for.body4.us, label %for.inc13.us

				for.body4.lr.ph.us:
				%idxprom5.us = sext i32 %storemerge5.us to i64
				br label %for.body4.us

				for.cond1.preheader:
				%storemerge5 = phi i32 [ 0, %entry ], [ %inc14, %for.inc13 ]
				br i1 false, label %for.inc13, label %for.inc13

				for.inc13:
				%inc14 = add nsw i32 %storemerge5, 1
				%cmp = icmp slt i32 %inc14, 4
				br i1 %cmp, label %for.cond1.preheader, label %for.end

				for.end:
				ret void
				}

				; CHECK-LABEL: @test3
				; CHECK: for.cond:
				; CHECK: phi i64
				; CHECK: icmp ne i64

				define i32 @test3(i32* %a) {
				entry:
				br label %for.cond

				for.cond:
				%sum.0 = phi i32 [ 0, %entry ], [ %add, %for.body ]
				%i.0 = phi i32 [ 0, %entry ], [ %inc, %for.body ]
				%cmp = icmp slt i32 %i.0, 1000
				br i1 %cmp, label %for.body, label %for.end

				for.body:
				%idxprom = sext i32 %i.0 to i64
				%arrayidx = getelementptr inbounds i32* %a, i64 %idxprom
				%0 = load i32* %arrayidx, align 4
				%add = add nsw i32 %sum.0, %0
				%inc = add nsw i32 %i.0, 1
				br label %for.cond

				for.end:
				ret i32 %sum.0
				}

test/Transforms/LoopSimplify/merge-exits.ll

-; RUN: opt < %s -loop-simplify -loop-rotate -instcombine -indvars -S -verify-loop-info -verify-dom-info > %t
-; RUN: not grep sext %t
-; RUN: grep "phi i64" %t | count 1
+; RUN: opt < %s -loop-simplify -loop-rotate -instcombine -indvars -S -verify-loop-info -verify-dom-info | FileCheck %s

 ; Loopsimplify should be able to merge the two loop exits
 ; into one, so that loop rotate can rotate the loop, so
Context not available.

 target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-n32:64"

-define float @t(float* %pTmp1, float* %peakWeight, i32 %bandEdgeIndex) nounwind {
+; CHECK-LABEL: @test1
+; CHECK: bb
+; CHECK: phi i64
+; CHECK-NOT: phi i64
+; CHECK-NOT: sext
+
+define float @test1(float* %pTmp1, float* %peakWeight, i32 %bandEdgeIndex) nounwind {
 entry:
-  %t0 = load float* %peakWeight, align 4 ; <float> [#uses=1]
+  %t0 = load float* %peakWeight, align 4
   br label %bb1

-bb: ; preds = %bb2
-  %t1 = sext i32 %hiPart.0 to i64 ; <i64> [#uses=1]
-  %t2 = getelementptr float* %pTmp1, i64 %t1 ; <float*> [#uses=1]
-  %t3 = load float* %t2, align 4 ; <float> [#uses=1]
-  %t4 = fadd float %t3, %distERBhi.0 ; <float> [#uses=1]
-  %t5 = add i32 %hiPart.0, 1 ; <i32> [#uses=2]
-  %t6 = sext i32 %t5 to i64 ; <i64> [#uses=1]
-  %t7 = getelementptr float* %peakWeight, i64 %t6 ; <float*> [#uses=1]
-  %t8 = load float* %t7, align 4 ; <float> [#uses=1]
-  %t9 = fadd float %t8, %peakCount.0 ; <float> [#uses=1]
+bb:
+  %t1 = sext i32 %hiPart.0 to i64
+  %t2 = getelementptr float* %pTmp1, i64 %t1
+  %t3 = load float* %t2, align 4
+  %t4 = fadd float %t3, %distERBhi.0
+  %t5 = add i32 %hiPart.0, 1
+  %t6 = sext i32 %t5 to i64
+  %t7 = getelementptr float* %peakWeight, i64 %t6
+  %t8 = load float* %t7, align 4
+  %t9 = fadd float %t8, %peakCount.0
   br label %bb1

-bb1: ; preds = %bb, %entry
-  %peakCount.0 = phi float [ %t0, %entry ], [ %t9, %bb ] ; <float> [#uses=2]
-  %hiPart.0 = phi i32 [ 0, %entry ], [ %t5, %bb ] ; <i32> [#uses=3]
-  %distERBhi.0 = phi float [ 0.000000e+00, %entry ], [ %t4, %bb ] ; <float> [#uses=3]
-  %t10 = fcmp uge float %distERBhi.0, 2.500000e+00 ; <i1> [#uses=1]
+bb1:
+  %peakCount.0 = phi float [ %t0, %entry ], [ %t9, %bb ]
+  %hiPart.0 = phi i32 [ 0, %entry ], [ %t5, %bb ]
+  %distERBhi.0 = phi float [ 0.000000e+00, %entry ], [ %t4, %bb ]
+  %t10 = fcmp uge float %distERBhi.0, 2.500000e+00
   br i1 %t10, label %bb3, label %bb2

-bb2: ; preds = %bb1
-  %t11 = add i32 %bandEdgeIndex, -1 ; <i32> [#uses=1]
-  %t12 = icmp sgt i32 %t11, %hiPart.0 ; <i1> [#uses=1]
+bb2:
+  %t11 = add i32 %bandEdgeIndex, -1
+  %t12 = icmp sgt i32 %t11, %hiPart.0
   br i1 %t12, label %bb, label %bb3

-bb3: ; preds = %bb2, %bb1
-  %t13 = fdiv float %peakCount.0, %distERBhi.0 ; <float> [#uses=1]
+bb3:
+  %t13 = fdiv float %peakCount.0, %distERBhi.0
   ret float %t13
 }
Context not available.
