This is an archive of the discontinued LLVM Phabricator instance.

[InstCombine] Always Fold GEP chains where the first index is zero
AbandonedPublic

Authored by mssimpso on Feb 28 2017, 2:26 PM.

Details

Reviewers

majnemer
eli.friedman
efriedma

Summary

When combining the indices of GEP chains, we currently wait until the source of the chain has been simplified before simplifying its users. For example, when trying to fold %tmp1 into %tmp2 in the code below, we give up so that we can try to fold %tmp0 into %tmp1 first.

%tmp0 = getelementptr %pair, %pair* %p, i64 %i
%tmp1 = getelementptr %pair, %pair* %tmp0, i64 1
%tmp2 = getelementptr %pair, %pair* %tmp1, i64 0, i32 1

However, if the %tmp0 - %tmp1 combine is unsuccessful, we never return to try to simplify %tmp2, even though we could.

This patch causes us to always perform the zero-index simplification useful for this example. With this patch, we will simplify the above code to:

%tmp0 = getelementptr %pair, %pair* %p, i64 %i
%tmp2 = getelementptr %pair, %pair* %tmp0, i64 1, i32 1
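
To illustrate why the fold preserves the computed address, here is a minimal C++ sketch of the two forms using ordinary pointer arithmetic. The layout of %pair is not shown in this review, so the `Pair` struct below (two 32-bit fields) is an assumption, and the names `chained`, `folded`, and `sameAddress` are hypothetical helpers rather than anything from the patch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for the IR type %pair. Its real definition is not
// shown in the review; two 32-bit fields are assumed here.
struct Pair {
  std::int32_t first;
  std::int32_t second;
};

// Mirrors the original three-instruction chain.
inline std::int32_t *chained(Pair *p, std::ptrdiff_t i) {
  Pair *tmp0 = p + i;      // getelementptr %pair, %pair* %p, i64 %i
  Pair *tmp1 = tmp0 + 1;   // getelementptr %pair, %pair* %tmp0, i64 1
  return &tmp1[0].second;  // getelementptr %pair, %pair* %tmp1, i64 0, i32 1
}

// Mirrors the folded form: %tmp1 is gone and its index replaces the zero.
inline std::int32_t *folded(Pair *p, std::ptrdiff_t i) {
  Pair *tmp0 = p + i;      // getelementptr %pair, %pair* %p, i64 %i
  return &tmp0[1].second;  // getelementptr %pair, %pair* %tmp0, i64 1, i32 1
}

// Returns true if both forms yield the same address for every in-bounds i.
inline bool sameAddress() {
  Pair buf[8] = {};
  for (std::ptrdiff_t i = 0; i + 1 < 8; ++i)
    if (chained(buf, i) != folded(buf, i))
      return false;
  return true;
}
```

Calling `sameAddress()` compares the two forms over a small buffer. The sketch only models the address computation; it says nothing about when InstCombine chooses to apply the fold.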

Diff Detail

Build Status

Buildable 4412
Build 4412: arc lint + arc unit

Event Timeline

mssimpso created this revision. Feb 28 2017, 2:26 PM

mssimpso edited the summary of this revision.

I'd like to be a bit cautious about making changes here. The transform in question is essentially duplicating code: "y = gep(x, i); z = gep(y, 0, j)" is two mul and two add operations; "y = gep(x, i); z = gep(x, i, j)" is three mul and three add operations. So making it more aggressive could hurt performance in some cases.

I haven't really thought through what we should be doing here, but the right fix is probably more than just rearranging the code.

In D30474#689051, @efriedma wrote:

"y = gep(x, i); z =gep(x, i, j)" is three mul and three add operations.

Are you including the cost of computing y after the transformation here? y can be eliminated now (assuming z was its only user). Would you be more comfortable if there was a single user check?

If there's a single user, it's probably fine. Or maybe I'm just being overly cautious. In any case, please run the testsuite (or some equivalent benchmarking) to make sure there aren't any unexpected effects.

Long-term, I'd like to see this code rewritten so it doesn't depend on IR types.
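
The operation counts in this exchange can be tallied with a toy cost model. This is only a sketch: it assumes every non-zero GEP index costs one multiply and one add and that a constant zero index is free, which matches the counts quoted above but is not how LLVM actually costs GEPs. `Cost`, `gepCost`, and the three scenario functions are hypothetical names.

```cpp
#include <cassert>

// Toy cost of address computation, counted in multiplies and adds.
struct Cost {
  int mul;
  int add;
};

// Assumption: each non-zero index costs one mul and one add; zero is free.
constexpr Cost gepCost(int nonZeroIndices) {
  return Cost{nonZeroIndices, nonZeroIndices};
}

constexpr Cost operator+(Cost a, Cost b) {
  return Cost{a.mul + b.mul, a.add + b.add};
}

// y = gep(x, i); z = gep(y, 0, j)
constexpr Cost beforeFold() { return gepCost(1) + gepCost(1); }

// y = gep(x, i); z = gep(x, i, j), with y still needed by other users
constexpr Cost afterFoldKeepingY() { return gepCost(1) + gepCost(2); }

// z = gep(x, i, j), with y dead because z was its only user
constexpr Cost afterFoldDroppingY() { return gepCost(2); }

static_assert(beforeFold().mul == 2 && beforeFold().add == 2, "two and two");
static_assert(afterFoldKeepingY().mul == 3 && afterFoldKeepingY().add == 3,
              "three and three");
static_assert(afterFoldDroppingY().mul == 2 && afterFoldDroppingY().add == 2,
              "back to two and two");
```

Under these assumptions the fold only adds work when y survives, which is the case a single-use check rules out.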

mssimpso added inline comments. Mar 1 2017, 9:55 AM

lib/Transforms/InstCombine/InstructionCombining.cpp
1546–1553	Hey Eli, the single-use check is good enough for my purposes. But if we do that, I think it makes more sense to just guard the bail-out check here with !Src->hasOneUse(). This will enable the full combine instead of just the zero simplification. So if GEP is the only user of Src, we go ahead with the combine and Src will be eliminated; otherwise, we bail out and wait for Src to be simplified first. I will test this approach and update the patch with the results.

Addressed feedback from Eli.

Added a single use condition to the bail out check. If the source GEP has only one use, we now perform the combine instead of bailing out.
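
The effect of the added condition can be sketched with a small stand-alone model. `ToyGep` is a hypothetical, simplified stand-in and not the LLVM `GEPOperator` API; in particular, the `mergeableIntoSource` flag merely stands in for the result of `shouldMergeGEPs(Src, SrcGEP)`.

```cpp
#include <cassert>

// Hypothetical, simplified model of a GEP node (not the LLVM API).
struct ToyGep {
  const ToyGep *source;      // pointer operand if it is itself a GEP, else nullptr
  int numOperands;           // pointer operand plus indices
  int numUses;               // number of users of this GEP
  bool mergeableIntoSource;  // stands in for shouldMergeGEPs(this, source)
};

// Before the patch: wait whenever Src could still be folded into its own source.
inline bool waitsBeforePatch(const ToyGep &src) {
  return src.source != nullptr && src.source->numOperands == 2 &&
         src.mergeableIntoSource;
}

// After the patch: wait only if Src does not have exactly one use.
inline bool waitsAfterPatch(const ToyGep &src) {
  return src.numUses != 1 && waitsBeforePatch(src);
}
```

With a chain shaped like the one in the summary, where %tmp1 has %tmp2 as its only user, `waitsBeforePatch` reports a bail-out while `waitsAfterPatch` lets the combine proceed; once %tmp1 gains a second user, both models wait.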

This patch didn't hit much in my testing of spec2000, spec2006, and the test-suite (AArch64/Kryo). It generally resulted in a reduction in the static number of instructions in a handful of programs. Performance was noise aside from a ~2% improvement in spec2000/perlbmk.

LGTM.

This revision is now accepted and ready to land. Mar 1 2017, 5:16 PM

I'm abandoning this change for now. Performance results were mostly noise, so I'm not sure it's worth committing. I might come back to it if I find a compelling test case.

Revision Contents

Path                                                   Size

lib/Transforms/InstCombine/InstructionCombining.cpp    14 lines
test/Transforms/InstCombine/getelementptr.ll           12 lines

Diff 90211

lib/Transforms/InstCombine/InstructionCombining.cpp

 Instruction *InstCombiner::visitGetElementPtrInst(GetElementPtrInst &GEP) {
 [...]
   // Combine Indices - If the source pointer to this getelementptr instruction
   // is a getelementptr instruction, combine the indices of the two
   // getelementptr instructions into a single instruction.
   //
   if (GEPOperator *Src = dyn_cast<GEPOperator>(PtrOp)) {
     if (!shouldMergeGEPs(cast<GEPOperator>(&GEP), Src))
       return nullptr;
 
-    // Note that if our source is a gep chain itself then we wait for that
-    // chain to be resolved before we perform this transformation. This
-    // avoids us creating a TON of code in some cases.
-    if (GEPOperator *SrcGEP =
-            dyn_cast<GEPOperator>(Src->getOperand(0)))
-      if (SrcGEP->getNumOperands() == 2 && shouldMergeGEPs(Src, SrcGEP))
-        return nullptr; // Wait until our source is folded to completion.
+    // Note that if our source is a gep chain itself having more than one use
+    // then we wait for that chain to be resolved before we perform this
+    // transformation. This avoids us creating a TON of code in some cases.
+    if (!Src->hasOneUse())
+      if (GEPOperator *SrcGEP = dyn_cast<GEPOperator>(Src->getOperand(0)))
+        if (SrcGEP->getNumOperands() == 2 && shouldMergeGEPs(Src, SrcGEP))
+          return nullptr; // Wait until our source is folded to completion.
 
     SmallVector<Value*, 8> Indices;
 
     // Find out whether the last index in the source GEP is a sequential idx.
     bool EndsWithSequential = false;
     for (gep_type_iterator I = gep_type_begin(Src), E = gep_type_end(Src);
          I != E; ++I)
       EndsWithSequential = I.isSequential();
 [...]

test/Transforms/InstCombine/getelementptr.ll

 [...]
 ; CHECK-NEXT: getelementptr [128 x i32]
 ; CHECK-NEXT: addrspacecast i32*
 ; CHECK-NEXT: ret i32 addrspace(1)*
   %gep = getelementptr [128 x i32], [128 x i32]* %p, i32 0, i32 0
   %x = addrspacecast i32* %gep to i32 addrspace(1)*
   ret i32 addrspace(1)* %x
 }
 
+define i32* @resolve_gep_chains(%pair* %p, i64 %i) {
+; CHECK-LABEL: @resolve_gep_chains(
+; CHECK-NEXT: [[TMP0:%.*]] = getelementptr %pair, %pair* %p, i64 %i
+; CHECK-NEXT: [[TMP2:%.*]] = getelementptr %pair, %pair* [[TMP0]], i64 1, i32 1
+; CHECK-NEXT: ret i32* [[TMP2]]
+;
+  %tmp0 = getelementptr %pair, %pair* %p, i64 %i
+  %tmp1 = getelementptr %pair, %pair* %tmp0, i64 1
+  %tmp2 = getelementptr %pair, %pair* %tmp1, i64 0, i32 1
+  ret i32* %tmp2
+}
+
 ; CHECK: attributes [[NUW]] = { nounwind }