This is an archive of the discontinued LLVM Phabricator instance.

[InstCombine] Always Fold GEP chains where the first index is zero
AbandonedPublic

Authored by mssimpso on Feb 28 2017, 2:26 PM.

Details

Summary

When combining the indices of GEP chains, we currently wait until the source of the chain has been simplified before simplifying its users. For example, when trying to fold %tmp1 into %tmp2 in the code below, we give up so that we can try to fold %tmp0 into %tmp1 first.

%tmp0 = getelementptr %pair, %pair* %p, i64 %i
%tmp1 = getelementptr %pair, %pair* %tmp0, i64 1
%tmp2 = getelementptr %pair, %pair* %tmp1, i64 0, i32 1

However if the %tmp0 - %tmp1 combine is unsuccessful, we will never actually return to try and simplify %tmp2, even though we can.

This patch causes us to always perform the zero-index simplification useful for this example. With this patch, we will simplify the above code to:

%tmp0 = getelementptr %pair, %pair* %p, i64 %i
%tmp2 = getelementptr %pair, %pair* %tmp0, i64 1, i32 1

Event Timeline

mssimpso created this revision.Feb 28 2017, 2:26 PM
mssimpso edited the summary of this revision. (Show Details)

I'd like to be a bit cautious about making changes here. The transform in question is essentially duplicating code: "y = gep(x, i); z =gep(y, 0, j)" is two mul and two add operations; "y = gep(x, i); z =gep(x, i, j)" is three mul and three add operations. So making it more aggressive could hurt performance in some cases.

I haven't really thought through what we should be doing here, but the right fix is probably more than just rearranging the code.

"y = gep(x, i); z =gep(x, i, j)" is three mul and three add operations.

Are you including the cost of computing y after the transformation here? y can be eliminated now (assuming z was its only user). Would you be more comfortable if there was a single user check?

If there's a single user, it's probably fine. Or maybe I'm just being overly cautious. In any case, please run the testsuite (or some equivalent benchmarking) to make sure there aren't any unexpected effects.

Long-term, I'd like to see this code rewritten so it doesn't depend on IR types.

mssimpso added inline comments.Mar 1 2017, 9:55 AM
lib/Transforms/InstCombine/InstructionCombining.cpp
1546–1553

Hey Eli,

The single-use check is good enough for my purposes. But if we do that, I think it makes more sense to just guard the bail out check here with !Src->hasOneUse(). This will enable the full combine instead of just the zero simplification. So if GEP is the only user of Src, we go ahead with the combine and Src will be eliminated; otherwise, bail out and wait for Src to be simplified first.

I will test this approach and update the patch with the results.

mssimpso updated this revision to Diff 90211.Mar 1 2017, 11:23 AM
mssimpso marked an inline comment as done.

Addressed feedback from Eli.

  • Added a single use condition to the bail out check. If the source GEP has only one use, we now perform the combine instead of bailing out.

This patch didn't hit much in my testing of spec2000, spec2006, and the test-suite (AArch64/Kryo). It generally resulted in a reduction in the static number of instructions in a handful of programs. Performance was noise aside from a ~2% improvement in spec2000/perlbmk.

efriedma accepted this revision.Mar 1 2017, 5:16 PM

LGTM.

This revision is now accepted and ready to land.Mar 1 2017, 5:16 PM
mssimpso abandoned this revision.Jun 16 2017, 10:05 AM

I'm abandoning this change for now. Performance results were mostly noise, so I'm not sure it's worth committing. I might come back to it if I find a compelling test case.