
Details

Reviewers

kiranchandramohan
nikic
paulwalker-arm
david-arm
dmgreen

Commits

rGe13bed4c5f35: [PATCH] [llvm] [InstCombine] Canonicalise ADD+GEP

Summary

This patch tries to canonicalise add + gep to gep + gep.

Co-authored-by: Paul Walker <paul.walker@arm.com>

Diff Detail

Repository: rG LLVM Github Monorepo

Unit Tests: Failed

Time	Test
60 ms	x64 debian > LLVM.Transforms/InstCombine::gep-combine-loop-invariant.ll
180 ms	x64 debian > LLVM.Transforms/LoopVectorize::induction.ll
110 ms	x64 debian > LLVM.Transforms/LoopVectorize::interleaved-accesses.ll
60 ms	x64 debian > LLVM.Transforms/LoopVectorize::invariant-store-vectorization.ll
60 ms	x64 debian > LLVM.Transforms/LoopVectorize::runtime-check.ll
(10 tests failed in total; the first 5 are listed here.)

Event Timeline

d-smirnov created this revision. Jul 19 2023, 2:55 AM

Herald added a project: Restricted Project. Jul 19 2023, 2:55 AM

Herald added subscribers: arphaman, hiraditya.

d-smirnov requested review of this revision. Jul 19 2023, 2:55 AM

Herald added a project: Restricted Project. Jul 19 2023, 2:55 AM

Herald added a subscriber: llvm-commits.

d-smirnov added a reviewer: kiranchandramohan. Jul 19 2023, 2:56 AM

This needs to happen in LICM, not InstCombine.

This revision now requires changes to proceed. Jul 19 2023, 3:25 AM

Herald added a subscriber: StephenFan. Jul 19 2023, 3:25 AM

kiranchandramohan added reviewers: paulwalker-arm, david-arm. Jul 19 2023, 3:56 AM

kiranchandramohan edited the summary of this revision. Jul 19 2023, 4:02 AM

Herald added a subscriber: kristof.beyls. Jul 19 2023, 4:02 AM

Alternatively, it would also be possible to canonicalize gep(p, add(x, y)) to gep(gep(p, x), y) in general, in which case existing GEP reassociation support in LICM will take care of the rest. Arguably this is cleaner (as we should have a canonical form between these two possibilities), but it's more likely to cause fallout.
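
For illustration, here is a source-level analogue of the two forms discussed above. This is a minimal C++ sketch and not code from the patch; the function names `sum_add_then_index` and `sum_hoisted_base` are hypothetical. The first function recomputes `i + disp` on every iteration, which corresponds to gep(p, add(x, y)). The second adds `disp` to the base pointer once before the loop, which is roughly what LICM could produce from gep(gep(p, x), y) when p and x are loop-invariant.

```cpp
#include <cassert>
#include <cstddef>

// Form 1: the index i + disp is recomputed on every iteration.
// This mirrors gep(p, add(x, y)).
float sum_add_then_index(const float *a, std::ptrdiff_t disp, std::size_t n) {
  assert(a != nullptr);
  float total = 0.0f;
  for (std::size_t i = 0; i < n; ++i)
    total += a[static_cast<std::ptrdiff_t>(i) + disp];
  return total;
}

// Form 2: the invariant part (a + disp) is computed once, outside the loop.
// This mirrors gep(gep(p, x), y) after the inner GEP has been hoisted.
float sum_hoisted_base(const float *a, std::ptrdiff_t disp, std::size_t n) {
  assert(a != nullptr);
  const float *base = a + disp;
  float total = 0.0f;
  for (std::size_t i = 0; i < n; ++i)
    total += base[i];
  return total;
}
```

Both functions read the same elements in the same order, so they return the same sum for the same inputs; only the place where `disp` is added differs.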

Harbormaster completed remote builds in B246479: Diff 541923. Jul 19 2023, 6:11 AM

Relaxed checks

Herald added a project: Restricted Project. Sep 15 2023, 3:08 PM

Herald added subscribers: cfe-commits, wangpc, zzheng.

Harbormaster completed remote builds in B257295: Diff 556881. Sep 15 2023, 3:24 PM

updated

Harbormaster completed remote builds in B257298: Diff 556884. Sep 15 2023, 4:19 PM

unit test fixed

Harbormaster completed remote builds in B257304: Diff 556894. Sep 16 2023, 3:49 AM

unit tests

Harbormaster completed remote builds in B257306: Diff 556898. Sep 16 2023, 12:05 PM

@nikic Could you check out the updated code to make sure we're on the right track before I try to fix the rest of the unit tests?

unit tests

Harbormaster completed remote builds in B257396: Diff 557022. Sep 19 2023, 3:59 AM

Hexagon test updated

Harbormaster completed remote builds in B257445: Diff 557104. Sep 20 2023, 3:02 AM

d-smirnov added a reviewer: dmgreen. Sep 20 2023, 3:22 AM

paulwalker-arm added inline comments. Sep 20 2023, 3:32 AM

llvm/lib/Transforms/InstCombine/InstructionCombining.cpp
2197–2239	Perhaps move this block after the `We do not handle pointer-vector geps here` immediately below so this test can be removed.

nikic added inline comments. Sep 20 2023, 4:00 AM

llvm/lib/Transforms/InstCombine/InstructionCombining.cpp
2201	This needs a one-use check. The transform is not profitable if we have to keep both the add and the gep. Can also use `match(GEP.getOperand(1), m_Add(...))` here.
2204	This no longer checks for loop invariance, so we should remove any invariance-related terminology.
2216	This inbounds preservation is incorrect: https://alive2.llvm.org/ce/z/bJZvQG It's even incorrect if the add is also nsw.
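
A small model of why carrying inbounds over to both new GEPs can be wrong. This is a sketch under simplified assumptions and not a reproduction of the linked Alive2 counterexample: `ObjectModel` and both helper functions are hypothetical, offsets are counted in elements from the start of a single object, and one-past-the-end is treated as in bounds. The point is that x + y can land inside the object while the intermediate offset x does not, and no signed overflow is needed for that to happen.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of one allocated object with `size` elements.
// Offsets 0..size (inclusive, i.e. one-past-the-end) count as in bounds.
struct ObjectModel {
  std::int64_t size;
  bool in_bounds(std::int64_t offset) const {
    return offset >= 0 && offset <= size;
  }
};

// Original form: one GEP with index x + y. Only the final offset matters.
bool single_gep_inbounds_ok(const ObjectModel &obj, std::int64_t x,
                            std::int64_t y) {
  assert(obj.size >= 0);
  return obj.in_bounds(x + y);
}

// Split form with inbounds on both GEPs: the intermediate offset x
// must be in bounds as well as the final offset x + y.
bool split_geps_inbounds_ok(const ObjectModel &obj, std::int64_t x,
                            std::int64_t y) {
  assert(obj.size >= 0);
  return obj.in_bounds(x) && obj.in_bounds(x + y);
}
```

With a 16-element object, x = 100 and y = -96, the single GEP ends at offset 4 and is fine, but the split form would pass through offset 100. Marking the intermediate GEP inbounds would therefore claim more than the original code guaranteed.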

Amended

Reordered and removed extra check

updated

Harbormaster completed remote builds in B257470: Diff 557138. Sep 20 2023, 12:45 PM

d-smirnov retitled this revision from [PATCH] [llvm] [InstCombine] Reassociate loop invariant GEP index calculations. to [PATCH] [llvm] [InstCombine] Canonicalise ADD+GEP. Sep 21 2023, 9:25 AM

d-smirnov edited the summary of this revision.

comment updated

Harbormaster completed remote builds in B257503: Diff 557187. Sep 21 2023, 11:35 AM

@nikic Updated. Please review

nikic added inline comments. Oct 2 2023, 6:39 AM

llvm/lib/Transforms/InstCombine/InstructionCombining.cpp
2246–2248
2254
2257	No need for the NewGEP variable.

amended

@nikic Amended.

Harbormaster completed remote builds in B257724: Diff 557538. Oct 4 2023, 1:50 AM

LGTM

We should give this a try, but I think there is a fairly large chance that this will cause regressions somewhere and a more targeted change may be necessary (e.g. only do this for loop-invariants in LICM).

llvm/lib/Transforms/InstCombine/InstructionCombining.cpp
2255–2256	IRBuilder needs to be used for all but the last instruction.

This revision is now accepted and ready to land. Oct 4 2023, 6:34 AM

Updated

Harbormaster completed remote builds in B257767: Diff 557609. Oct 5 2023, 8:59 AM

Closed by commit rGe13bed4c5f35: [PATCH] [llvm] [InstCombine] Canonicalise ADD+GEP (authored by d-smirnov, committed by MatsPetersson). Oct 6 2023, 4:38 AM

This revision was automatically updated to reflect the committed changes.

MatsPetersson added a commit: rGe13bed4c5f35: [PATCH] [llvm] [InstCombine] Canonicalise ADD+GEP.

How does this patch work with visitGEPOfGEP that does a reverse transformation?

// Replace: gep (gep %P, long B), long A, ...
// With:    T = long A+B; gep %P, T, ...

In D155688#4653347, @fiigii wrote:
How does this patch work with visitGEPOfGEP that does a reverse transformation?
// Replace: gep (gep %P, long B), long A, ...
// With:    T = long A+B; gep %P, T, ...

The reverse transform is only done if A + B simplifies.

By the way, this change did cause some code size regressions: http://llvm-compile-time-tracker.com/compare.php?from=a16f6462d756804276d4b39267b3c19bcd6949fe&to=e13bed4c5f3544c076ce57e36d9a11eefa5a7815&stat=size-text

The one that stood out to me is that btGjkEpa2.cpp from bullet has become 13% larger.

The reverse transform is only done if A + B simplifies.

Looks like `simplifyAddInst` may give add expressions, so I guess this patch may make IC run into infinite loops.

Additionally, this change could make longer GEP chains that could hurt other optimizations by exceeding AA or value-tracking thresholds.

We have some improvements with the patch, most notable: 549.fotonik_3d improves about 6%.
@nikic Should we revert the patch and try another location for it (in LICM pass, as you previously suggested)?

In D155688#4653520, @fiigii wrote:

The reverse transform is only done if A + B simplifies.

Looks like `simplifyAddInst` may give add expressions, so I guess this patch may make IC run into infinite loops.

simplifyAddInst can return an add instruction, but it will be an existing one. It will never introduce a new one. So I'm not sure how this would result in infinite loops?
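
To illustrate the termination argument, here is a toy model of the two rewrites. It is a sketch under strong assumptions and not how InstCombine is implemented: `Term`, `Level`, `Chain`, `trySimplifyAdd` and `runToFixpoint` are all hypothetical names, and "A + B simplifies" is reduced to "both operands are constants", whereas the real simplifyAddInst handles more cases. A `Chain` is a list of nested GEPs, innermost first, and each `Level` holds either one index term or the two operands of an add.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

struct Term {
  std::optional<long> constant;  // set for constant indices
  std::string symbol;            // name for non-constant indices
};
using Level = std::vector<Term>;   // one GEP: a single index, or add(x, y)
using Chain = std::vector<Level>;  // nested GEPs, innermost first

// Stand-in for "A + B simplifies": only two constants fold (assumption).
std::optional<Term> trySimplifyAdd(const Term &a, const Term &b) {
  if (a.constant && b.constant)
    return Term{*a.constant + *b.constant, ""};
  return std::nullopt;
}

// Applies both rules until nothing changes. Returns the number of rewrites
// performed, or -1 if max_steps is reached (which would suggest ping-pong).
int runToFixpoint(Chain &chain, int max_steps = 100) {
  assert(max_steps > 0);
  int steps = 0;
  bool changed = true;
  while (changed) {
    changed = false;
    for (std::size_t i = 0; i < chain.size() && !changed; ++i) {
      if (chain[i].size() == 2) {
        // Forward rule: gep(p, add(x, y)) -> gep(gep(p, x), y).
        Term x = chain[i][0], y = chain[i][1];
        chain[i] = Level{x};
        chain.insert(chain.begin() + i + 1, Level{y});
        changed = true;
      } else if (i + 1 < chain.size() && chain[i].size() == 1 &&
                 chain[i + 1].size() == 1) {
        // Reverse rule: merge two GEPs only if the sum simplifies.
        if (auto folded = trySimplifyAdd(chain[i][0], chain[i + 1][0])) {
          chain[i] = Level{*folded};
          chain.erase(chain.begin() + i + 1);
          changed = true;
        }
      }
    }
    if (changed && ++steps >= max_steps)
      return -1;
  }
  return steps;
}
```

In this model a split of add(x, 3) is never undone, because x + 3 does not simplify, while two adjacent constant indices are merged once and then stay merged. Since the reverse rule never creates a new add, the forward rule has nothing new to split, so the loop reaches a fixpoint.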

In D155688#4653629, @d-smirnov wrote:

We have some improvements with the patch, most notable: 549.fotonik_3d improves about 6%.
@nikic Should we revert the patch and try another location for it (in LICM pass, as you previously suggested)?

I don't think we have cause to revert just yet, as we're not aware of any specific issues.

That would be fine. Thanks for explaining.

After this patch was recently pulled into my downstream, I'm seeing a lot of invariant.gep created by LICM. For example, in LBM_performStreamCollide in 470.lbm there are 65 of them. On RISC-V, these all get created in registers outside the loop and get spilled. Is ARM seeing anything like this or do you have more addressing modes that allow CodeGenPrepare to bring these back into the loop?

I hadn't realized this came from someone at Arm. The performance results I had were overall roughly flat, with some improvements and regressions. I think there were still some people working through some fixes for some of the knock-on effects but with those nothing large would stick out in what I saw.

I would expect Loop Strength Reduction (maybe with CGP) to be able to optimize the addressing modes back to something that is optimal for the loop if it can. It's not always super reliable though. Might there be something going wrong in that pass?

Diff 541923

llvm/lib/Transforms/InstCombine/InstructionCombining.cpp

Context: if (GEP.getOperand(1)->getType()->getScalarSizeInBits() ==

    Value *Y;
    Value *X = GEP.getOperand(0);
    if (Matched &&
        match(V, m_Sub(m_PtrToInt(m_Value(Y)), m_PtrToInt(m_Specific(X)))) &&
        getUnderlyingObject(X) == getUnderlyingObject(Y))
      return CastInst::CreatePointerBitCastOrAddrSpaceCast(Y, GEPType);
  }

  if (LI && (GEP.getNumIndices() == 1) && !GEP.getType()->isVectorTy()) {
    auto *BB = GEP.getParent();
    auto *L = LI->getLoopFor(BB);
    auto *Idx = dyn_cast<BinaryOperator>(GEP.getOperand(1));

nikic, inline comment (marked Done): This needs a one-use check. The transform is not profitable if we have to keep *both* the add and the gep. Can also use match(GEP.getOperand(1), m_Add(...)) here.

    // Try to reassociate loop invariant index calculations to enable LICM.
    if (L && Idx && (Idx->getOpcode() == Instruction::Add)) {

nikic, inline comment (marked Done): This no longer checks for loop invariance, so we should remove any invariance-related terminology.

      Value *Ptr = GEP.getOperand(0);
      Value *InvIdx = Idx->getOperand(0);
      Value *NonInvIdx = Idx->getOperand(1);
      if (!L->isLoopInvariant(InvIdx))
        std::swap(InvIdx, NonInvIdx);
      if (L->isLoopInvariant(InvIdx) && !L->isLoopInvariant(NonInvIdx) &&
          L->isLoopInvariant(Ptr)) {
        // Ensure Idx can be eliminated.
        auto IsDead = [BB, L](User *U) {
          auto *G = dyn_cast<GetElementPtrInst>(U);

nikic, inline comment (marked Done): This inbounds preservation is incorrect: https://alive2.llvm.org/ce/z/bJZvQG It's even incorrect if the add is also nsw.

          return G && (G->getNumIndices() == 1) && (G->getParent() == BB) &&
                 L->isLoopInvariant(G->getOperand(0));
        };
        if (Idx->hasOneUse() ||
            std::all_of(Idx->user_begin(), Idx->user_end(), IsDead)) {
          // rewrite:
          //   %idx = add i64 %invariant, %indvars.iv
          //   %gep = getelementptr i32, i32* %ptr, i64 %idx
          // as:
          //   %newptr = getelementptr i32, i32* %ptr, i64 %invariant
          //   %newgep = getelementptr i32, i32* %newptr, i64 %indvars.iv
          auto *NewPtr = GetElementPtrInst::Create(GEP.getResultElementType(),
                                                   Ptr, InvIdx, "", &GEP);
          auto *NewGEP = GetElementPtrInst::Create(GEP.getResultElementType(),
                                                   NewPtr, NonInvIdx);
          NewGEP->setIsInBounds(GEP.isInBounds());
          return NewGEP;
        }

paulwalker-arm, inline comment (marked Done): Perhaps move this block after the `We do not handle pointer-vector geps here` immediately below so this test can be removed.

  // We do not handle pointer-vector geps here.
  if (GEPType->isVectorTy())
    return nullptr;
  if (!GEP.isInBounds()) {
    unsigned IdxWidth =
        DL.getIndexSizeInBits(PtrOp->getType()->getPointerAddressSpace());
    APInt BasePtrOffset(IdxWidth, 0);
    Value *UnderlyingPtrOp =

nikic, inline comment (marked Done), suggested edit:

  // Try to replace ADD + GEP with GEP + GEP.
- if (BinaryOperator *Idx =
-         dyn_cast_or_null<BinaryOperator>(GEP.getOperand(1)))
-   if ((Idx->getOpcode() == Instruction::Add) && Idx->hasOneUse()) {
+ Value *Idx1, *Idx2;
+ if (match(GEP.getOperand(1), m_OneUse(m_Add(m_Value(Idx1), m_Value(Idx2))))) {
  //   %idx = add i64 %idx1, %idx2

        PtrOp->stripAndAccumulateInBoundsConstantOffsets(DL,
                                                         BasePtrOffset);
    bool CanBeNull, CanBeFreed;
    uint64_t DerefBytes = UnderlyingPtrOp->getPointerDereferenceableBytes(
        DL, CanBeNull, CanBeFreed);
    if (!CanBeNull && !CanBeFreed && DerefBytes != 0) {

nikic, inline comment (marked Done), suggested edit:

  //   %newgep = getelementptr i32, i32* %newptr, i64 %idx2
- Value *Ptr = GEP.getOperand(0);
+ Value *Ptr = GEP.getPointerOperand();
  auto *NewPtr = GetElementPtrInst::Create(

      if (GEP.accumulateConstantOffset(DL, BasePtrOffset) &&
          BasePtrOffset.isNonNegative()) {

nikic, inline comment (marked Done), suggested edit:

  Value *Ptr = GEP.getPointerOperand();
- auto *NewPtr = GetElementPtrInst::Create(GEP.getResultElementType(), Ptr,
-                                          Idx1, "", &GEP);
+ auto *NewPtr = Builder.CreateGEP(GEP.getResultElementType(), Ptr, Idx1);
  return GetElementPtrInst::Create(GEP.getResultElementType(), NewPtr,

IRBuilder needs to be used for all but the last instruction.

        APInt AllocSize(IdxWidth, DerefBytes);

nikic, inline comment (marked Done): No need for the NewGEP variable.

        if (BasePtrOffset.ule(AllocSize)) {
          return GetElementPtrInst::CreateInBounds(
              GEP.getSourceElementType(), PtrOp, Indices, GEP.getName());
        }

llvm/test/Transforms/InstCombine/gep-combine-loop-invariant.ll

loop: ; preds = %loop, %entry
  %lr1 = lshr <2 x i64> %pi1, <i64 21, i64 21>
  %sl1 = shl nuw nsw <2 x i64> %lr1, <i64 7, i64 7>
  %e5 = getelementptr inbounds i8, <2 x ptr> %base, <2 x i64> %sl1
  %e6 = getelementptr inbounds i8, <2 x ptr> %e5, i64 80
  call void @blackhole(<2 x ptr> %e6)
  br label %loop
}

; Test that ADD->GEP chains get reassociated to separate invariants and
; thus provide more LICM opportunities.
define void @test1(float* %a, float* %b, i64 %disp) {
; CHECK-LABEL: @test1(
entry:
  br label %for.body

for.body:
; CHECK: for.body:
  %indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]
; CHECK: [[INV:%.*]] = getelementptr float, float* %a, i64 %disp
; CHECK: %arrayidx = getelementptr inbounds float, float* [[INV]], i64 %indvars.iv
  %idx = add i64 %indvars.iv, %disp
  %arrayidx = getelementptr inbounds float, float* %a, i64 %idx
  %0 = load float, float* %arrayidx
  %div = fdiv float 1.500000e+00, %0
; CHECK: [[INV1:%.*]] = getelementptr float, float* %b, i64 %disp
; CHECK: %arrayidx1 = getelementptr inbounds float, float* [[INV1]], i64 %indvars.iv
  %arrayidx1 = getelementptr inbounds float, float* %b, i64 %idx
  store float %div, float* %arrayidx1
  %indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
  %exitcond = icmp ne i64 %indvars.iv.next, 1024
  br i1 %exitcond, label %for.body, label %for.cond.cleanup

for.cond.cleanup:
  ret void
}

; As test1 but ensure no transformation when we cannot prove the original ADD
; will become redundant, because it's used by a non-GEP.
define i64 @test2(float* %a, float* %b, i64 %disp) {
; CHECK-LABEL: @test2(
entry:
  br label %for.body

for.body:
; CHECK: for.body:
  %indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]
; CHECK: %idx = add i64 %indvars.iv, %disp
; CHECK: %arrayidx = getelementptr inbounds float, float* %a, i64 %idx
  %idx = add i64 %indvars.iv, %disp
  %arrayidx = getelementptr inbounds float, float* %a, i64 %idx
  %0 = load float, float* %arrayidx
  %div = fdiv float 1.500000e+00, %0
; CHECK: %arrayidx1 = getelementptr inbounds float, float* %b, i64 %idx
  %arrayidx1 = getelementptr inbounds float, float* %b, i64 %idx
  store float %div, float* %arrayidx1
  %indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
  %exitcond = icmp ne i64 %indvars.iv.next, 1024
  br i1 %exitcond, label %for.body, label %for.cond.cleanup

for.cond.cleanup:
  ret i64 %idx
}

; As test1 but ensure no transformation when we cannot prove the original ADD
; will become redundant, because the second GEP's ptr is not invariant.
define void @test3(float* %a, float* %b, i64 %disp) {
; CHECK-LABEL: @test3(
entry:
  br label %for.body

for.body:
; CHECK: for.body:
  %indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]
; CHECK: %idx = add i64 %indvars.iv, %disp
; CHECK: %arrayidx = getelementptr inbounds float, float* %a, i64 %idx
  %idx = add i64 %indvars.iv, %disp
  %arrayidx = getelementptr inbounds float, float* %a, i64 %idx
  %0 = load float, float* %arrayidx
  %div = fdiv float 1.500000e+00, %0
; CHECK: %arrayidx1 = getelementptr inbounds float, float* %arrayidx, i64 %idx
  %arrayidx1 = getelementptr inbounds float, float* %arrayidx, i64 %idx
  store float %div, float* %arrayidx1
  %indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
  %exitcond = icmp ne i64 %indvars.iv.next, 1024
  br i1 %exitcond, label %for.body, label %for.cond.cleanup

for.cond.cleanup:
  ret void
}

; This would crash because we did not expect to be able to constant fold a GEP.

define void @PR51485(<2 x i64> %v) {
; CHECK-LABEL: @PR51485(
; CHECK-NEXT: entry:
; CHECK-NEXT: br label [[LOOP:%.*]]
; CHECK: loop:
; CHECK-NEXT: [[SL1:%.*]] = shl nuw nsw <2 x i64> [[V:%.*]], <i64 7, i64 7>

This is an archive of the discontinued LLVM Phabricator instance.

[PATCH] [llvm] [InstCombine] Canonicalise ADD+GEP
Closed, Public
