
Details

Reviewers

majnemer
efriedma
qcolombet
spatel
DIVYA

Summary

This change brings performance of zlib up by 10%. The example below is from a
hot loop in longest_match() from zlib.

do.body:

%cur_match.addr.0 = phi i32 [ %cur_match, %entry ], [ %2, %do.cond ]
%idx.ext = zext i32 %cur_match.addr.0 to i64
%add.ptr = getelementptr inbounds i8, i8* %win, i64 %idx.ext
%add.ptr2 = getelementptr inbounds i8, i8* %add.ptr, i64 %idx.ext1
%add.ptr3 = getelementptr inbounds i8, i8* %add.ptr2, i64 -1

In this example %idx.ext1 is loop invariant. It is moved above the use of the
loop induction variable %idx.ext so that LICM can hoist it out of the loop.
Operands with loop-carried dependences are sunk down the GEP chain. This patch
produces the following output:

do.body:

%cur_match.addr.0 = phi i32 [ %cur_match, %entry ], [ %2, %do.cond ]
%idx.ext = zext i32 %cur_match.addr.0 to i64
%add.ptr = getelementptr inbounds i8, i8* %win, i64 %idx.ext1
%add.ptr2 = getelementptr inbounds i8, i8* %add.ptr, i64 -1
%add.ptr3 = getelementptr inbounds i8, i8* %add.ptr2, i64 %idx.ext
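
The effect of the reordering can be sketched in plain C++. This is an illustration only, not the zlib source: the function names are hypothetical, offsets into a std::vector stand in for the GEP chain, bestLen plays the role of %idx.ext1 (loop invariant) and curMatch the role of %idx.ext (changes every iteration).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Original order: the loop-varying index is applied first, so every step of
// the offset computation depends on the current iteration.
inline std::uint64_t sumOriginal(const std::vector<std::uint8_t> &win,
                                 const std::vector<std::size_t> &matches,
                                 std::size_t bestLen) {
  std::uint64_t sum = 0;
  for (std::size_t curMatch : matches) {
    std::size_t off = curMatch; // like %add.ptr  (loop variant)
    off += bestLen;             // like %add.ptr2 (variant + invariant)
    off -= 1;                   // like %add.ptr3
    sum += win.at(off);
  }
  return sum;
}

// Reassociated order: the invariant part is grouped first, so it can be
// computed once outside the loop, which is what LICM does after the patch.
inline std::uint64_t sumReassociated(const std::vector<std::uint8_t> &win,
                                     const std::vector<std::size_t> &matches,
                                     std::size_t bestLen) {
  const std::size_t invariant = bestLen - 1; // hoisted: no loop dependence
  std::uint64_t sum = 0;
  for (std::size_t curMatch : matches)
    sum += win.at(invariant + curMatch); // one add left inside the loop
  return sum;
}
```

Both functions read the same bytes; only the grouping of the additions differs, which is why the transform is a reassociation and not a change in semantics.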

Diff Detail

Event Timeline

DIVYA created this revision. Nov 10 2017, 9:01 AM

This patch reduces the number of instructions LLVM generates on aarch64 for longest_match(), the hottest function in the zlib-ng library.

Assembly code for the longest_match function before applying the patch.
The code contains 2 adds inside the loop:
.LBB0_7:                                // %do.body37
                                        //   in Loop: Header=BB0_8 Depth=2
        add     x26, x8, x20
        add     x10, x26, x9
        ldurh   w10, [x10, #-1]
        cmp     w10, w28, uxth
        b.eq    .LBB0_10

After applying the patch:
.LBB0_8:                                // %if.then49
                                        //   Parent Loop BB0_5 Depth=1
                                        // =>  This Inner Loop Header: Depth=2
        add     x26, x8, x20
        ldrh    w10, [x26, x9]
        cmp     w10, w28, uxth
        b.ne    .LBB0_8
.LBB0_11:

Perf stat result before applying the patch, on aarch64, for the minizip executable of the zlib-ng library (used llvm/build/lib as the source file to be compressed):

267701.266460      task-clock (msec)         #    0.503 CPUs utilized
         35,437      context-switches          #    0.132 K/sec
            147      cpu-migrations            #    0.001 K/sec
            176      page-faults               #    0.001 K/sec
294,334,720,322      cycles                    #    1.099 GHz
326,901,036,222      instructions              #    1.11  insns per cycle

Perf stat result after applying the patch, on aarch64, for the minizip executable of the zlib-ng library:

265034.440160      task-clock (msec)         #    0.500 CPUs utilized
         35,480      context-switches          #    0.134 K/sec
            161      cpu-migrations            #    0.001 K/sec
            180      page-faults               #    0.001 K/sec
291,392,526,820      cycles                    #    1.099 GHz
321,014,404,980      instructions              #    1.10  insns per cycle
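
The relative change between the two runs can be worked out from the counters. The helper below is a minimal sketch with a hypothetical name (percentReduction), and it assumes the first perf stat block is the run without the patch and the second block is the run with it.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Relative reduction, in percent, going from `before` to `after`.
// A positive result means the counter went down.
inline double percentReduction(std::uint64_t before, std::uint64_t after) {
  return 100.0 *
         (static_cast<double>(before) - static_cast<double>(after)) /
         static_cast<double>(before);
}
```

Applied to the counters quoted above, percentReduction(294334720322, 291392526820) gives roughly 1.0% fewer cycles and percentReduction(326901036222, 321014404980) gives roughly 1.8% fewer instructions for this single pair of runs.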

DIVYA edited the summary of this revision. Nov 10 2017, 9:18 AM

mgrang added a subscriber: mgrang. Nov 10 2017, 11:03 AM

mgrang added inline comments.

lib/Transforms/InstCombine/InstructionCombining.cpp
1697	nit: Period after comment.
1725	Please consider using early exits: https://llvm.org/docs/CodingStandards.html#use-early-exits-and-continue-to-simplify-code. Something like: if (!LI) return nullptr; <main logic goes here>
1729	nit: Period after comment.

See also https://reviews.llvm.org/D8911 / https://bugs.llvm.org/show_bug.cgi?id=23163, which originally introduced this check. I don't think checking whether the operands are loop-invariant really solves that problem... you still get exactly the same code duplication, just not inside the innermost loop.

lib/Transforms/InstCombine/InstructionCombining.cpp
1731	Constants are loop-invariant; no need to check for them explicitly.
test/Transforms/InstCombine/gep-combine-loop-invariant.ll
3	Please write tests to only run one pass; putting multiple passes in the RUN line makes it confusing to figure out what you're actually expecting each individual pass to do. (You can add a comment to explain, if it isn't obvious why a transform is profitable.)

DIVYA updated this revision to Diff 122667. Nov 13 2017, 8:55 AM

DIVYA marked 5 inline comments as done.

For the test case in https://bugs.llvm.org/show_bug.cgi?id=23163, the patch does not create extra instructions, since the operands are not loop invariant.
In the cases where extra instructions are produced, the gep(gep ...) merge happens only when the second operands are loop invariant, so the LICM pass will move them out of the loop.
I am also checking that the first operand of Src is not loop invariant: if the first operand of the Src GEP and both second operands are all loop invariant, the GEPs should not be combined, because LICM will hoist them out of the loop anyway and combining would only create extra instructions.

And for the cases where the extra instructions are produced, gep(gep ...) merging optimization happens only when the second operands are loop invariant, and hence LICM pass will move them out of the loop.

"L" here is the innermost loop relative to the GEP. The innermost loop might not be the important loop (e.g. it might have a low trip count and get unrolled). And even if the addition is hoisted out of the hottest loop, you're still increasing codesize and register pressure.

Do you have numbers for LLVM testsuite or SPEC or something?

lib/Transforms/InstCombine/InstructionCombining.cpp
1740	CreateAdd can't return nullptr.

mgrang added inline comments. Nov 13 2017, 5:19 PM

lib/Transforms/InstCombine/InstructionCombining.cpp
1735	Too many parentheses. Can be simplified: if (!L->isLoopInvariant(GO1) || !L->isLoopInvariant(SO1) || L->isLoopInvariant(Src->getOperand(0)))

The motivation behind this patch is that it brings up the performance
of gzip deflate by 10%: this pattern occurs in the hot spot of
longest_match() in zlib.

To answer Eli's question, here are the performance numbers
for SPEC2000 int with and without the patch on A72 Firefly.
A positive value is a percent speedup; a negative value is a percent slowdown.

164.gzip      0.1000
175.vpr      -1.5400
176.gcc      -0.5200
177.mesa      0.6200
179.art       2.1500
181.mcf      -0.7100
183.equake    1.2800
186.crafty    0.5400
188.ammp     -0.8700
197.parser    0.0400
252.eon       0.4700
253.perlbmk   0.7200
254.gap      -0.8300
255.vortex   -0.3400
256.bzip2    -0.0400
300.twolf     0.4800

I will update the patch with all corrections asked in previous comments.

Update patch on today's tree.
Fixed parentheses.

Herald added a subscriber: hiraditya. Mar 8 2018, 1:35 PM

sebpop updated this revision to Diff 137652. Mar 8 2018, 1:46 PM

sebpop edited the summary of this revision.

The 175.vpr result looks bad.

I'm still not convinced this is the right approach; if you look at the testcase as a reassociation problem, you can optimize it without duplicating code.

llvm/lib/Transforms/InstCombine/InstructionCombining.cpp
1726 ↗	(On Diff #137649)	CreateAdd (still) can't return null.

In D39906#1031961, @efriedma wrote:

The 175.vpr result looks bad.

That's noise, I just reran that benchmark and the 1% slowdown is within the noise level: in the new run vpr shows a speedup of 1.47%.

I'm still not convinced this is the right approach; if you look at the testcase as a reassociation problem, you can optimize it without duplicating code.

Could you please explain how you want to optimize this sequence?
I also don't see where "duplicating code" comes from.

Also, looking at the AVX512 example that needed to be amended, you can see that
this pattern matches in quite a few places there, and that it transforms two GEPs
into only one.

I also don't see where "duplicating code" comes from.

You're inserting an "add" instruction without removing any other instructions; in general, this hurts codesize/performance. This is why the combine is restricted in the first place. You might get lucky sometimes and simplify away the extra instruction, but you can't rely on that.

The "reassociation" way to optimize this is to consider the following two instructions:

%add.ptr = getelementptr inbounds i8, i8* %win, i64 %idx.ext
%add.ptr2 = getelementptr inbounds i8, i8* %add.ptr, i64 %idx.ext1

These can be reassociated into the following:

%tmp = getelementptr inbounds i8, i8* %win, i64 %idx.ext1
%add.ptr2 = getelementptr inbounds i8, i8* %tmp, i64 %idx.ext

The first GEP then gets hoisted out of the loop.
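
The reassociation rule described above can be modelled with a small self-contained C++ sketch. The types and names here (Index, GepChain, reassociate) are hypothetical stand-ins, not LLVM classes; the actual patch additionally requires loop info to be available, both GEPs to have exactly two operands, and Src->hasOneUse(), all of which this toy model leaves out.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Toy stand-in for a GEP index operand and whether it is loop invariant.
struct Index {
  std::string name;
  bool loopInvariant;
};

// Toy stand-in for gep(gep(base, inner), outer).
struct GepChain {
  Index inner; // index of the source GEP, applied to the base pointer first
  Index outer; // index of the GEP that uses the source GEP
};

// Swap the two indices when the outer one is loop invariant and the inner one
// is not. Afterwards the inner GEP depends only on invariant values, so a
// LICM-like pass could hoist it. Returns true when the chain was changed.
inline bool reassociate(GepChain &chain) {
  if (chain.outer.loopInvariant && !chain.inner.loopInvariant) {
    std::swap(chain.inner, chain.outer);
    return true;
  }
  return false;
}
```

No instruction is added or removed by the swap, which is the point of treating this as reassociation: the chain keeps two GEPs and only their index operands trade places.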

Rewrote the patch as suggested by Eli.

efriedma added inline comments. Mar 9 2018, 2:10 PM

llvm/lib/Transforms/InstCombine/InstructionCombining.cpp
1683 ↗	(On Diff #137834)	I think you need to check `Src->hasOneUse()`?

sebpop updated this revision to Diff 137869. Mar 9 2018, 4:24 PM

Ping.

LGTM

This revision is now accepted and ready to land. Mar 23 2018, 2:52 PM

Committed in https://reviews.llvm.org/rL328539

Diff 137869

lib/Transforms/InstCombine/InstructionCombining.cpp

(lines not shown) Instruction *InstCombiner::visitGetElementPtrInst(GetElementPtrInst &GEP) {

   // Combine Indices - If the source pointer to this getelementptr instruction
   // is a getelementptr instruction, combine the indices of the two
   // getelementptr instructions into a single instruction.
   if (GEPOperator *Src = dyn_cast<GEPOperator>(PtrOp)) {
     if (!shouldMergeGEPs(*cast<GEPOperator>(&GEP), *Src))
       return nullptr;

+    // Try to reassociate loop invariant GEP chains to enable LICM.
+    if (LI && Src->getNumOperands() == 2 && GEP.getNumOperands() == 2 &&
+        Src->hasOneUse()) {
+      if (Loop *L = LI->getLoopFor(GEP.getParent())) {
+        Value *GO1 = GEP.getOperand(1);
+        Value *SO1 = Src->getOperand(1);
+        // Reassociate the two GEPs if SO1 is variant in the loop and GO1 is
+        // invariant: this breaks the dependence between GEPs and allows LICM
+        // to hoist the invariant part out of the loop.
+        if (L->isLoopInvariant(GO1) && !L->isLoopInvariant(SO1)) {
+          Src->setOperand(1, GO1);
+          GEP.setOperand(1, SO1);
+          return &GEP;
+        }
+      }
+    }
+
     // Note that if our source is a gep chain itself then we wait for that
     // chain to be resolved before we perform this transformation. This
     // avoids us creating a TON of code in some cases.
     if (GEPOperator *SrcGEP =
           dyn_cast<GEPOperator>(Src->getOperand(0)))
       if (SrcGEP->getNumOperands() == 2 && shouldMergeGEPs(*Src, *SrcGEP))
         return nullptr; // Wait until our source is folded to completion.

     SmallVector<Value*, 8> Indices;

     // Find out whether the last index in the source GEP is a sequential idx.
     bool EndsWithSequential = false;
     for (gep_type_iterator I = gep_type_begin(*Src), E = gep_type_end(*Src);
          I != E; ++I)
       EndsWithSequential = I.isSequential();

(11 lines not shown) if (EndsWithSequential) {
       if (SO1->getType() != GO1->getType())
         return nullptr;

       Value *Sum =
           SimplifyAddInst(GO1, SO1, false, false, SQ.getWithInstruction(&GEP));
       // Only do the combine when we are sure the cost after the
       // merge is never more than that before the merge.
       if (Sum == nullptr)
         return nullptr;

       // Update the GEP in place if possible.
       if (Src->getNumOperands() == 2) {
         GEP.setOperand(0, Src->getOperand(0));
         GEP.setOperand(1, Sum);
         return &GEP;
       }
       Indices.append(Src->op_begin()+1, Src->op_end()-1);
       Indices.push_back(Sum);
       Indices.append(GEP.op_begin()+2, GEP.op_end());
     } else if (isa<Constant>(*GEP.idx_begin()) &&
                cast<Constant>(*GEP.idx_begin())->isNullValue() &&
                Src->getNumOperands() != 1) {
       // Otherwise we can do the fold if the first index of the GEP is a zero
       Indices.append(Src->op_begin()+1, Src->op_end());
       Indices.append(GEP.idx_begin()+1, GEP.idx_end());
     }

     if (!Indices.empty())
       return GEP.isInBounds() && Src->isInBounds()
                  ? GetElementPtrInst::CreateInBounds(
                        Src->getSourceElementType(), Src->getOperand(0), Indices,
                        GEP.getName())
(remaining lines not shown)

test/Transforms/InstCombine/gep-combine-loop-invariant.ll

This file was added.

				; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
				; RUN: opt < %s -instcombine -S | FileCheck %s
				target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
				target triple = "x86_64-unknown-linux-gnu"

				; Function Attrs: norecurse nounwind readonly uwtable
				define i32 @foo(i8* nocapture readnone %match, i32 %cur_match, i32 %best_len, i32 %scan_end, i32* nocapture readonly %prev, i32 %limit, i32 %chain_length, i8* nocapture readonly %win, i32 %wmask) {
				; CHECK-LABEL: @foo(
				; CHECK-NEXT: entry:
				; CHECK-NEXT: [[IDX_EXT2:%.*]] = zext i32 [[CUR_MATCH:%.*]] to i64
				; CHECK-NEXT: [[ADD_PTR4:%.*]] = getelementptr inbounds i8, i8* [[WIN:%.*]], i64 [[IDX_EXT2]]
				; CHECK-NEXT: [[IDX_EXT1:%.*]] = zext i32 [[BEST_LEN:%.*]] to i64
				; CHECK-NEXT: [[ADD_PTR25:%.*]] = getelementptr inbounds i8, i8* [[ADD_PTR4]], i64 [[IDX_EXT1]]
				; CHECK-NEXT: [[ADD_PTR36:%.*]] = getelementptr inbounds i8, i8* [[ADD_PTR25]], i64 -1
				; CHECK-NEXT: [[TMP0:%.*]] = bitcast i8* [[ADD_PTR36]] to i32*
				; CHECK-NEXT: [[TMP1:%.*]] = load i32, i32* [[TMP0]], align 4
				; CHECK-NEXT: [[CMP7:%.*]] = icmp eq i32 [[TMP1]], [[SCAN_END:%.*]]
				; CHECK-NEXT: br i1 [[CMP7]], label [[DO_END:%.*]], label [[IF_THEN_LR_PH:%.*]]
				; CHECK: if.then.lr.ph:
				; CHECK-NEXT: br label [[IF_THEN:%.*]]
				; CHECK: do.body:
				; CHECK-NEXT: [[IDX_EXT:%.*]] = zext i32 [[TMP4:%.*]] to i64
				; CHECK-NEXT: [[ADD_PTR:%.*]] = getelementptr inbounds i8, i8* [[WIN]], i64 [[IDX_EXT1]]
				; CHECK-NEXT: [[ADD_PTR2:%.*]] = getelementptr inbounds i8, i8* [[ADD_PTR]], i64 -1
				; CHECK-NEXT: [[ADD_PTR3:%.*]] = getelementptr inbounds i8, i8* [[ADD_PTR2]], i64 [[IDX_EXT]]
				; CHECK-NEXT: [[TMP2:%.*]] = bitcast i8* [[ADD_PTR3]] to i32*
				; CHECK-NEXT: [[TMP3:%.*]] = load i32, i32* [[TMP2]], align 4
				; CHECK-NEXT: [[CMP:%.*]] = icmp eq i32 [[TMP3]], [[SCAN_END]]
				; CHECK-NEXT: br i1 [[CMP]], label [[DO_END]], label [[IF_THEN]]
				; CHECK: if.then:
				; CHECK-NEXT: [[CUR_MATCH_ADDR_09:%.*]] = phi i32 [ [[CUR_MATCH]], [[IF_THEN_LR_PH]] ], [ [[TMP4]], [[DO_BODY:%.*]] ]
				; CHECK-NEXT: [[CHAIN_LENGTH_ADDR_08:%.*]] = phi i32 [ [[CHAIN_LENGTH:%.*]], [[IF_THEN_LR_PH]] ], [ [[DEC:%.*]], [[DO_BODY]] ]
				; CHECK-NEXT: [[AND:%.*]] = and i32 [[CUR_MATCH_ADDR_09]], [[WMASK:%.*]]
				; CHECK-NEXT: [[IDXPROM:%.*]] = zext i32 [[AND]] to i64
				; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds i32, i32* [[PREV:%.*]], i64 [[IDXPROM]]
				; CHECK-NEXT: [[TMP4]] = load i32, i32* [[ARRAYIDX]], align 4
				; CHECK-NEXT: [[CMP4:%.*]] = icmp ugt i32 [[TMP4]], [[LIMIT:%.*]]
				; CHECK-NEXT: br i1 [[CMP4]], label [[LAND_LHS_TRUE:%.*]], label [[DO_END]]
				; CHECK: land.lhs.true:
				; CHECK-NEXT: [[DEC]] = add i32 [[CHAIN_LENGTH_ADDR_08]], -1
				; CHECK-NEXT: [[CMP5:%.*]] = icmp eq i32 [[DEC]], 0
				; CHECK-NEXT: br i1 [[CMP5]], label [[DO_END]], label [[DO_BODY]]
				; CHECK: do.end:
				; CHECK-NEXT: [[CONT_0:%.*]] = phi i32 [ 1, [[ENTRY:%.*]] ], [ 0, [[IF_THEN]] ], [ 0, [[LAND_LHS_TRUE]] ], [ 1, [[DO_BODY]] ]
				; CHECK-NEXT: ret i32 [[CONT_0]]
				;
				entry:
				%idx.ext2 = zext i32 %cur_match to i64
				%add.ptr4 = getelementptr inbounds i8, i8* %win, i64 %idx.ext2
				%idx.ext1 = zext i32 %best_len to i64
				%add.ptr25 = getelementptr inbounds i8, i8* %add.ptr4, i64 %idx.ext1
				%add.ptr36 = getelementptr inbounds i8, i8* %add.ptr25, i64 -1
				%0 = bitcast i8* %add.ptr36 to i32*
				%1 = load i32, i32* %0, align 4
				%cmp7 = icmp eq i32 %1, %scan_end
				br i1 %cmp7, label %do.end, label %if.then.lr.ph

				if.then.lr.ph: ; preds = %entry
				br label %if.then

				do.body: ; preds = %land.lhs.true
				%chain_length.addr.0 = phi i32 [ %dec, %land.lhs.true ]
				%cur_match.addr.0 = phi i32 [ %4, %land.lhs.true ]
				%idx.ext = zext i32 %cur_match.addr.0 to i64
				%add.ptr = getelementptr inbounds i8, i8* %win, i64 %idx.ext
				%add.ptr2 = getelementptr inbounds i8, i8* %add.ptr, i64 %idx.ext1
				%add.ptr3 = getelementptr inbounds i8, i8* %add.ptr2, i64 -1
				%2 = bitcast i8* %add.ptr3 to i32*
				%3 = load i32, i32* %2, align 4
				%cmp = icmp eq i32 %3, %scan_end
				br i1 %cmp, label %do.end, label %if.then

				if.then: ; preds = %if.then.lr.ph, %do.body
				%cur_match.addr.09 = phi i32 [ %cur_match, %if.then.lr.ph ], [ %cur_match.addr.0, %do.body ]
				%chain_length.addr.08 = phi i32 [ %chain_length, %if.then.lr.ph ], [ %chain_length.addr.0, %do.body ]
				%and = and i32 %cur_match.addr.09, %wmask
				%idxprom = zext i32 %and to i64
				%arrayidx = getelementptr inbounds i32, i32* %prev, i64 %idxprom
				%4 = load i32, i32* %arrayidx, align 4
				%cmp4 = icmp ugt i32 %4, %limit
				br i1 %cmp4, label %land.lhs.true, label %do.end

				land.lhs.true: ; preds = %if.then
				%dec = add i32 %chain_length.addr.08, -1
				%cmp5 = icmp eq i32 %dec, 0
				br i1 %cmp5, label %do.end, label %do.body

				do.end: ; preds = %do.body, %land.lhs.true, %if.then, %entry
				%cont.0 = phi i32 [ 1, %entry ], [ 0, %if.then ], [ 0, %land.lhs.true ], [ 1, %do.body ]
				ret i32 %cont.0
				}

test/Transforms/LoopVectorize/consecutive-ptr-uniforms.ll

(lines not shown)
 ; Check that a reverse consecutive pointer is recognized as uniform and remains
 ; uniform after vectorization.
 ;
 ; CHECK: LV: Found uniform instruction: %tmp1 = getelementptr inbounds i32, i32* %a, i64 %i
 ; CHECK: vector.body
 ; CHECK: %index = phi i64 [ 0, %vector.ph ], [ %index.next, %vector.body ]
 ; CHECK: %offset.idx = sub i64 %n, %index
 ; CHECK-NOT: getelementptr
-; CHECK: %[[G0:.+]] = getelementptr inbounds i32, i32* %a, i64 %offset.idx
-; CHECK: getelementptr i32, i32* %[[G0]], i64 -3
+; CHECK: %[[G0:.+]] = getelementptr inbounds i32, i32* %a, i64 -3
+; CHECK: getelementptr i32, i32* %[[G0]], i64 %offset.idx
 ; CHECK-NOT: getelementptr
 ; CHECK: br i1 {{.*}}, label %middle.block, label %vector.body
 ;
 define i32 @consecutive_ptr_reverse(i32* %a, i64 %n) {
 entry:
   br label %for.body

 for.body:
(remaining lines not shown)

This is an archive of the discontinued LLVM Phabricator instance.

[InstCombine] reassociate loop invariant GEP chains to enable LICM
ClosedPublic
