This is an archive of the discontinued LLVM Phabricator instance.

[InstCombine] Skip merging non-free GEP
Needs Review · Public

Authored by junbuml on Sep 28 2018, 1:42 PM.

Details

Reviewers

javed.absar
sebpop
dneilson
efriedma
davide

Summary

Currently, InstCombine aggressively merges GEPs because doing so can help backends
make better decisions with complex addressing modes. However, in some cases,
merging GEPs can produce more instructions. This change skips merging GEPs if a
non-free GEP is used in multiple basic blocks.

For the test case (unit_skip_gep_merge.ll), this patch changes code generation (-O3) for AArch64
from:

foo:
        orr     w8, wzr, #0x18
        madd    x8, x2, x8, x0
        ldr     x8, [x8, #8]
        str     x8, [x4]
        tbz     w1, #0, .LBB0_2
// %bb.1:
        orr     w8, wzr, #0x18
        madd    x8, x2, x8, x0
        ldr     x8, [x8, #16]
        str     x8, [x3]
.LBB0_2:
        ret

to:

foo:
        orr     w8, wzr, #0x18
        madd    x8, x2, x8, x0
        ldr     x9, [x8, #8]
        str     x9, [x4]
        tbz     w1, #0, .LBB0_2
// %bb.1:
        ldr     x8, [x8, #16]
        str     x8, [x3]
.LBB0_2:
        ret
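
For orientation, the test case corresponds roughly to the C++ below. This is a sketch reconstructed from the assembly above, not the source the author used: the struct layout (three pointer-sized fields, 24 bytes on AArch64), the field names, and the helper `fieldAddress` are assumptions made for illustration. Both loads share the address `B + v * sizeof(ST)`; when the GEPs are merged, each load recomputes that product in its own block, which is the duplicated `orr`/`madd` pair in the "from" listing.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical element type: three pointer-sized fields, which gives the
// 0x18 stride and the #8 / #16 load offsets seen on a 64-bit target.
struct ST {
  char *f0;
  char *f1;
  char *f2;
};

// Roughly what the test function does: one load is unconditional, the other
// is guarded by c. Both start from the same element address.
void foo(ST *B, bool c, std::int64_t v, char **S, char **S2) {
  ST *Elem = &B[v];  // the non-free GEP: B + v * sizeof(ST)
  *S2 = Elem->f1;    // constant field offset from Elem
  if (c)
    *S = Elem->f2;   // constant field offset from Elem
}

// Address of a field when the two GEPs are merged into one computation:
// base + v * sizeof(ST) + field offset.
std::uintptr_t fieldAddress(std::uintptr_t Base, std::int64_t V,
                            std::size_t FieldOffset) {
  return Base + static_cast<std::uintptr_t>(V) * sizeof(ST) + FieldOffset;
}
```

Keeping `Elem` as a separate value lets both blocks reuse the multiply-add, which is the effect the patch aims for.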

Diff Detail

Event Timeline

junbuml created this revision. Sep 28 2018, 1:42 PM

Herald added a reviewer: javed.absar. Sep 28 2018, 1:42 PM

Herald added subscribers: hiraditya, kristof.beyls, mcrosier.

junbuml added reviewers: sebpop, dneilson, efriedma, davide. Sep 28 2018, 1:52 PM

junbuml added a subscriber: llvm-commits.

Do you have perf/size numbers?

I'm sort of confused that MachineCSE doesn't trigger on your testcase, but that isn't really relevant here, I guess.

llvm/lib/Transforms/InstCombine/InstructionCombining.cpp
1779	For array GEPs, we have a check to try to make sure we aren't making a GEP more expensive (see EndsWithSequential etc.) Maybe instead of adding a different kind of check, it would make more sense to extend the existing check to struct GEPs?

dmgreen added a subscriber: dmgreen. Oct 1 2018, 5:43 AM

> Do you have perf/size numbers?

I didn't see any size or performance degradation in my SPEC2000 runs. I only observed minor performance improvements in some internal benchmarks.

> I'm sort of confused that MachineCSE doesn't trigger on your testcase, but that isn't really relevant here, I guess.

I also think MachineCSE should be able to catch it at the machine level, but somehow it does not. I will check this too. However, I believe it's better not to duplicate the same calculation in an early phase unless the duplication has a clear benefit.

llvm/lib/Transforms/InstCombine/InstructionCombining.cpp
1779	Did you mean to move this whole check (line 1680~1687) into EndsWithSequential?

efriedma added inline comments. Oct 1 2018, 1:23 PM

llvm/lib/Transforms/InstCombine/InstructionCombining.cpp
1779	No. We currently have a check in the "if (EndsWithSequential)" which restricts GEP folding. That doesn't apply to your testcase because the last index is a struct index. Instead, it falls into the "else if (isa<Constant>(*GEP.idx_begin())" case, which doesn't have a profitability check. My suggestion is to adapt the EndsWithSequential profitability check to also apply to struct GEPs.

junbuml added inline comments. Oct 1 2018, 2:34 PM

llvm/lib/Transforms/InstCombine/InstructionCombining.cpp
1779	The last index (%v) of Src in the testcase still indexes through a pointer to a struct, which is sequential. This testcase actually goes into "if (EndsWithSequential)", and the Src (%arrayidx8) is merged with the GEPs (%tmp1 and %tmp2). It looks like "if (EndsWithSequential)" has no check like the one in this patch.
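
The merge described here can be modelled outside LLVM. The following is a simplified sketch, not the InstCombine code: `Index`, `simplifyAdd`, and `mergeIndices` are hypothetical names, and `simplifyAdd` only stands in for the role SimplifyAddInst plays in the patch context (the fold succeeds only when the sum needs no new add instruction).

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a GEP index: a constant or a named value.
struct Index {
  bool IsConst;
  long Value;        // meaningful only when IsConst is true
  std::string Name;  // meaningful only when IsConst is false
};

Index constant(long V) { return {true, V, ""}; }
Index variable(std::string N) { return {false, 0, std::move(N)}; }

// Succeeds only when A + B folds to a constant or to one of the operands,
// so the merged GEP does not require a new add instruction.
bool simplifyAdd(const Index &A, const Index &B, Index &Sum) {
  if (A.IsConst && B.IsConst) { Sum = constant(A.Value + B.Value); return true; }
  if (A.IsConst && A.Value == 0) { Sum = B; return true; }
  if (B.IsConst && B.Value == 0) { Sum = A; return true; }
  return false;
}

// gep (gep P, Src...), GEP...  becomes
// gep P, Src[0..n-1), Src.back() + GEP[0], GEP[1..)
bool mergeIndices(const std::vector<Index> &Src, const std::vector<Index> &GEP,
                  std::vector<Index> &Out) {
  if (Src.empty() || GEP.empty())
    return false;
  Index Sum;
  if (!simplifyAdd(GEP.front(), Src.back(), Sum))
    return false;
  Out.assign(Src.begin(), Src.end() - 1);
  Out.push_back(Sum);
  Out.insert(Out.end(), GEP.begin() + 1, GEP.end());
  return true;
}
```

In this model, Src indices `[v]` merged with GEP indices `[0, 1]` give `[v, 1]`, which matches %arrayidx8 being folded into %tmp1.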

efriedma added inline comments. Oct 1 2018, 6:45 PM

llvm/lib/Transforms/InstCombine/InstructionCombining.cpp
1779	Oh, sorry, you're right, I managed to confuse myself based on what we should be doing, rather than what we're actually doing at the moment. I don't like adding heuristics based on the uses of the GEP; that's likely to lead to inconsistent results, and possibly O(N^2) compile-time. What I think we should be doing instead is trying to maintain the invariant that each GEP either has all constant offsets, or has exactly one non-zero offset. Roughly, the rule allows GEPs which will lower to one addition instruction. It composes well, other optimizations can already deal with it (IIRC), and it's a straightforward extension of what we're already doing in the "if (EndsWithSequential)" block.

junbuml added inline comments. Oct 2 2018, 9:26 AM

llvm/lib/Transforms/InstCombine/InstructionCombining.cpp
1779	> What I think we should be doing instead is trying to maintain the invariant that each GEP either has all constant offsets, or has exactly one non-zero offset.

I'm afraid I don't fully follow what you meant here. I guess you meant that, overall, InstCombine should be less aggressive in merging GEPs, and should try to keep each GEP to either all constant offsets or exactly one non-constant offset? If merging GEPs would break this, we do not merge.

> It composes well, other optimizations can already deal with it (IIRC)

I guess CodeGenPrepare and ISel. Please let me know of other passes that handle something like this.

efriedma added inline comments. Oct 2 2018, 11:04 AM

llvm/lib/Transforms/InstCombine/InstructionCombining.cpp
1779	> If merging GEPs can break this, we do not merge.

Yes, that's what I meant.
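
The rule agreed on in this exchange can be written as a standalone check. This is a minimal sketch under the reading used in the thread (all constant offsets, or exactly one non-constant offset with no non-zero constants); `Idx` and `couldBeExpensive` are illustrative names, and the function mirrors the shape of couldBeExpensiveGEP in the diff without using LLVM types.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for a GEP index: constant with a value, or not.
struct Idx {
  bool IsConst;
  long Value;  // meaningful only when IsConst is true
};

// Returns true when the index list breaks the rule: more than one
// non-constant index, or a non-constant index combined with a non-zero
// constant index.
bool couldBeExpensive(const std::vector<Idx> &Indices) {
  bool FoundNonConst = false;
  bool FoundNonZeroConst = false;
  for (const Idx &I : Indices) {
    if (I.IsConst) {
      if (I.Value != 0)
        FoundNonZeroConst = true;
    } else {
      if (FoundNonConst)
        return true;
      FoundNonConst = true;
    }
    if (FoundNonConst && FoundNonZeroConst)
      return true;
  }
  return false;
}
```

For the test case, the merged index list would be `[%v, 1]`: one non-constant index plus a non-zero constant, so the check reports it as potentially expensive and the merge is skipped when Src has more than one use.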

evandro added a subscriber: evandro. Oct 2 2018, 11:31 AM

Tried to address Eli's comment. Please take a look and let me know if you have any comments.

With the last modification I made based on Eli's comment, I didn't see any significant changes in size or performance in my SPEC2000 test on AArch64. I observed minor performance improvements in some internal benchmarks.

Kindly ping.

Kindly ping one more time.

brzycki added a subscriber: brzycki. Oct 29 2018, 1:43 PM

Revision Contents

Path                                                        Size
llvm/lib/Transforms/InstCombine/InstructionCombining.cpp    33 lines
llvm/test/Transforms/InstCombine/gepphigep.ll               4 lines
llvm/test/Transforms/InstCombine/unit_skip_gep_merge.ll     28 lines

Diff 169109

llvm/lib/Transforms/InstCombine/InstructionCombining.cpp

[... 1,103 lines not shown ...]	static bool shouldMergeGEPs(GEPOperator &GEP, GEPOperator &Src) {
// Src. If Src is not a trivial GEP too, don't combine		// Src. If Src is not a trivial GEP too, don't combine
// the indices.		// the indices.
if (GEP.hasAllZeroIndices() && !Src.hasAllZeroIndices() &&		if (GEP.hasAllZeroIndices() && !Src.hasAllZeroIndices() &&
!Src.hasOneUse())		!Src.hasOneUse())
return false;		return false;
return true;		return true;
}		}

		static bool couldBeExpensiveGEP(ArrayRef<const Value *> Indices) {
		// We try to maintain that GEPs have either all constant offset or exactly
		// one non-constant offset. If merging GEPs can break this, we do not merge.
		bool FoundNonConst = false;
		bool FoundNonZeroConst = false;
		for (unsigned Idx = 0, Size = Indices.size(); Idx != Size; ++Idx) {
		if (auto *C = dyn_cast<Constant>(Indices[Idx])) {
		if (!C->isNullValue())
		FoundNonZeroConst = true;
		} else {
		if (FoundNonConst)
		return true;
		FoundNonConst = true;
		}
		if (FoundNonConst && FoundNonZeroConst)
		return true;
		}
		return false;
		}

/// Return a value X such that Val = X * Scale, or null if none.		/// Return a value X such that Val = X * Scale, or null if none.
/// If the multiplication is known not to overflow, then NoSignedWrap is set.		/// If the multiplication is known not to overflow, then NoSignedWrap is set.
Value *InstCombiner::Descale(Value *Val, APInt Scale, bool &NoSignedWrap) {		Value *InstCombiner::Descale(Value *Val, APInt Scale, bool &NoSignedWrap) {
assert(isa<IntegerType>(Val->getType()) && "Can only descale integers!");		assert(isa<IntegerType>(Val->getType()) && "Can only descale integers!");
assert(cast<IntegerType>(Val->getType())->getBitWidth() ==		assert(cast<IntegerType>(Val->getType())->getBitWidth() ==
Scale.getBitWidth() && "Scale not compatible with value!");		Scale.getBitWidth() && "Scale not compatible with value!");

// If Val is zero or Scale is one then Val = Val * Scale.		// If Val is zero or Scale is one then Val = Val * Scale.
[... 631 lines not shown ...]	if (auto *Src = dyn_cast<GEPOperator>(PtrOp)) {

// Find out whether the last index in the source GEP is a sequential idx.		// Find out whether the last index in the source GEP is a sequential idx.
bool EndsWithSequential = false;		bool EndsWithSequential = false;
for (gep_type_iterator I = gep_type_begin(Src), E = gep_type_end(Src);		for (gep_type_iterator I = gep_type_begin(Src), E = gep_type_end(Src);
I != E; ++I)		I != E; ++I)
EndsWithSequential = I.isSequential();		EndsWithSequential = I.isSequential();

// Can we combine the two pointer arithmetics offsets?		// Can we combine the two pointer arithmetics offsets?
if (EndsWithSequential) {		if (EndsWithSequential) {
// Replace: gep (gep %P, long B), long A, ...		// Replace: gep (gep %P, long B), long A, ...
// With: T = long A+B; gep %P, T, ...		// With: T = long A+B; gep %P, T, ...
Value *SO1 = Src->getOperand(Src->getNumOperands()-1);		Value *SO1 = Src->getOperand(Src->getNumOperands()-1);
Value *GO1 = GEP.getOperand(1);		Value *GO1 = GEP.getOperand(1);

// If they aren't the same type, then the input hasn't been processed		// If they aren't the same type, then the input hasn't been processed
// by the loop above yet (which canonicalizes sequential index types to		// by the loop above yet (which canonicalizes sequential index types to
// intptr_t). Just avoid transforming this until the input has been		// intptr_t). Just avoid transforming this until the input has been
// normalized.		// normalized.
if (SO1->getType() != GO1->getType())		if (SO1->getType() != GO1->getType())
return nullptr;		return nullptr;

Value *Sum =		Value *Sum =
SimplifyAddInst(GO1, SO1, false, false, SQ.getWithInstruction(&GEP));		SimplifyAddInst(GO1, SO1, false, false, SQ.getWithInstruction(&GEP));
// Only do the combine when we are sure the cost after the		// Only do the combine when we are sure the cost after the
// merge is never more than that before the merge.		// merge is never more than that before the merge.
if (Sum == nullptr)		if (Sum == nullptr)
return nullptr;		return nullptr;

// Update the GEP in place if possible.		// Update the GEP in place if possible.
if (Src->getNumOperands() == 2) {		if (Src->getNumOperands() == 2) {
		if (!Src->hasOneUse()) {
		Indices.push_back(Sum);
		Indices.append(GEP.op_begin() + 2, GEP.op_end());
		if (couldBeExpensiveGEP(Indices))
		return nullptr;
		}
GEP.setOperand(0, Src->getOperand(0));		GEP.setOperand(0, Src->getOperand(0));
GEP.setOperand(1, Sum);		GEP.setOperand(1, Sum);
return &GEP;		return &GEP;
}		}

Indices.append(Src->op_begin()+1, Src->op_end()-1);		Indices.append(Src->op_begin()+1, Src->op_end()-1);
Indices.push_back(Sum);		Indices.push_back(Sum);
Indices.append(GEP.op_begin()+2, GEP.op_end());		Indices.append(GEP.op_begin()+2, GEP.op_end());
} else if (isa<Constant>(*GEP.idx_begin()) &&		} else if (isa<Constant>(*GEP.idx_begin()) &&
cast<Constant>(*GEP.idx_begin())->isNullValue() &&		cast<Constant>(*GEP.idx_begin())->isNullValue() &&
Src->getNumOperands() != 1) {		Src->getNumOperands() != 1) {
// Otherwise we can do the fold if the first index of the GEP is a zero		// Otherwise we can do the fold if the first index of the GEP is a zero
Indices.append(Src->op_begin()+1, Src->op_end());		Indices.append(Src->op_begin()+1, Src->op_end());
Indices.append(GEP.idx_begin()+1, GEP.idx_end());		Indices.append(GEP.idx_begin()+1, GEP.idx_end());
}		}

if (!Indices.empty())		if (!Indices.empty()) {
		if (!Src->hasOneUse() && couldBeExpensiveGEP(Indices))
		return nullptr;

return GEP.isInBounds() && Src->isInBounds()		return GEP.isInBounds() && Src->isInBounds()
? GetElementPtrInst::CreateInBounds(		? GetElementPtrInst::CreateInBounds(
Src->getSourceElementType(), Src->getOperand(0), Indices,		Src->getSourceElementType(), Src->getOperand(0), Indices,
GEP.getName())		GEP.getName())
: GetElementPtrInst::Create(Src->getSourceElementType(),		: GetElementPtrInst::Create(Src->getSourceElementType(),
Src->getOperand(0), Indices,		Src->getOperand(0), Indices,
GEP.getName());		GEP.getName());
}		}
		}

if (GEP.getNumIndices() == 1) {		if (GEP.getNumIndices() == 1) {
unsigned AS = GEP.getPointerAddressSpace();		unsigned AS = GEP.getPointerAddressSpace();
if (GEP.getOperand(1)->getType()->getScalarSizeInBits() ==		if (GEP.getOperand(1)->getType()->getScalarSizeInBits() ==
DL.getIndexSizeInBits(AS)) {		DL.getIndexSizeInBits(AS)) {
uint64_t TyAllocSize = DL.getTypeAllocSize(GEPEltType);		uint64_t TyAllocSize = DL.getTypeAllocSize(GEPEltType);

bool Matched = false;		bool Matched = false;
[... 1,665 lines not shown ...]

llvm/test/Transforms/InstCombine/gepphigep.ll

[... 46 lines not shown ...]	bb:
%tmp20 = getelementptr inbounds %struct2, %struct2* %tmp1, i64 %tmp19		%tmp20 = getelementptr inbounds %struct2, %struct2* %tmp1, i64 %tmp19
%tmp21 = getelementptr inbounds %struct2, %struct2* %tmp20, i64 0, i32 0		%tmp21 = getelementptr inbounds %struct2, %struct2* %tmp20, i64 0, i32 0
store i32 0, i32* %tmp21, align 4		store i32 0, i32* %tmp21, align 4
%tmp24 = getelementptr inbounds %struct2, %struct2* %tmp10, i64 0, i32 1		%tmp24 = getelementptr inbounds %struct2, %struct2* %tmp10, i64 0, i32 1
%tmp25 = load i32, i32* %tmp24, align 4		%tmp25 = load i32, i32* %tmp24, align 4
ret i32 %tmp25		ret i32 %tmp25

; CHECK-LABEL: @test2(		; CHECK-LABEL: @test2(
; CHECK: getelementptr inbounds %struct2, %struct2* %tmp1, i64 %tmp9, i32 0		; CHECK: getelementptr inbounds %struct2, %struct2* %tmp10, i64 0, i32 0
; CHECK: getelementptr inbounds %struct2, %struct2* %tmp1, i64 %tmp19, i32 0		; CHECK: getelementptr inbounds %struct2, %struct2* %tmp1, i64 %tmp19, i32 0
; CHECK: getelementptr inbounds %struct2, %struct2* %tmp1, i64 %tmp9, i32 1		; CHECK: getelementptr inbounds %struct2, %struct2* %tmp10, i64 0, i32 1
}		}

; Check that instcombine doesn't insert GEPs before landingpad.		; Check that instcombine doesn't insert GEPs before landingpad.

define i32 @test3(%struct3* %dm, i1 %tmp4, i64 %tmp9, i64 %tmp19, i64 %tmp20, i64 %tmp21) personality i8* bitcast (i32 (...)* @__gxx_personality_v0 to i8*) {		define i32 @test3(%struct3* %dm, i1 %tmp4, i64 %tmp9, i64 %tmp19, i64 %tmp20, i64 %tmp21) personality i8* bitcast (i32 (...)* @__gxx_personality_v0 to i8*) {
bb:		bb:
%tmp = getelementptr inbounds %struct3, %struct3* %dm, i64 0		%tmp = getelementptr inbounds %struct3, %struct3* %dm, i64 0
br i1 %tmp4, label %bb1, label %bb2		br i1 %tmp4, label %bb1, label %bb2
[... 121 lines not shown ...]

llvm/test/Transforms/InstCombine/unit_skip_gep_merge.ll

This file was added.

; RUN: opt -instcombine -S < %s | FileCheck %s

%ST = type {i8*, i8*, i8*}
declare void @bar(i8*)

; CHECK-LABEL: @foo
; CHECK-LABEL: entry
; CHECK: %tmp1 = getelementptr inbounds %ST, %ST* %arrayidx8
; CHECK-LABEL: BB0
; CHECK: %tmp2 = getelementptr inbounds %ST, %ST* %arrayidx8
define void @foo(%ST* %B, i1 %c, i64 %v, i8** %S, i8** %S2) {
entry:
  %arrayidx8 = getelementptr inbounds %ST, %ST* %B, i64 %v
  %tmp1 = getelementptr inbounds %ST, %ST* %arrayidx8, i64 0, i32 1
  %r = load i8*, i8** %tmp1
  store i8* %r, i8** %S2
  br i1 %c, label %BB0, label %BB2

BB0:
  %tmp2 = getelementptr inbounds %ST, %ST* %arrayidx8, i64 0, i32 2
  %l = load i8*, i8** %tmp2
  store i8* %l, i8** %S
  br label %BB2

BB2:
  ret void
}