Page MenuHomePhabricator

[InstCombine] Skip merging non-free GEP
Needs ReviewPublic

Authored by junbuml on Sep 28 2018, 1:42 PM.

Details

Summary

Currently, InstCombine aggressively merge GEPs because it can help backends make
better decision with complex addressing modes. However, in some cases, merging
GEPs can cause more instructions. This change will skip merging GEPs if a
non-free GEP is used in multiple basic blocks.

For the test-case (unit_skip_gep_merge.ll), this patch change code generation (-O3) for AArch64 :
from :

foo:
        orr     w8, wzr, #0x18
        madd    x8, x2, x8, x0
        ldr     x8, [x8, #8]
        str     x8, [x4]
        tbz     w1, #0, .LBB0_2
// %bb.1:
        orr     w8, wzr, #0x18
        madd    x8, x2, x8, x0
        ldr     x8, [x8, #16]
        str     x8, [x3]
.LBB0_2:
        ret

to :

foo:
        orr     w8, wzr, #0x18
        madd    x8, x2, x8, x0
        ldr     x9, [x8, #8]
        str     x9, [x4]
        tbz     w1, #0, .LBB0_2
// %bb.1:
        ldr     x8, [x8, #16]
        str     x8, [x3]
.LBB0_2:
        ret

Diff Detail

Event Timeline

junbuml created this revision.Sep 28 2018, 1:42 PM

Do you have perf/size numbers?

I'm sort of confused that MachineCSE doesn't trigger on your testcase, but that isn't really relevant here, I guess.

llvm/lib/Transforms/InstCombine/InstructionCombining.cpp
1779

For array GEPs, we have a check to try to make sure we aren't making a GEP more expensive (see EndsWithSequential etc.) Maybe instead of adding a different kind of check, it would make more sense to extend the existing check to struct GEPs?

dmgreen added a subscriber: dmgreen.Oct 1 2018, 5:43 AM

Do you have perf/size numbers?

I didn't see any degradation in size/perf in my spec2000. Just observed minor performance improvement in some internal benchmarks.

I'm sort of confused that MachineCSE doesn't trigger on your testcase, but that isn't really relevant here, I guess.

I also think MachineCSE should be able to catch it in the machine level, but somehow it's not. I will check this too. However, I believe it's better not to distribute the same calculations in early phase unless we have clear benefits with the duplicated calculations.

llvm/lib/Transforms/InstCombine/InstructionCombining.cpp
1779

Did you mean to move this whole check (line 1680~1687) into EndsWithSequential?

efriedma added inline comments.Oct 1 2018, 1:23 PM
llvm/lib/Transforms/InstCombine/InstructionCombining.cpp
1779

No.

We currently have a check in the "if (EndsWithSequential)" which restricts GEP folding. That doesn't apply to your testcase because the last index is a struct index. Instead, it falls into the "else if (isa<Constant>(*GEP.idx_begin())" case, which doesn't have a profitability check. My suggestion is to adapt the EndsWithSequential profitability check to also apply to struct GEPs.

junbuml added inline comments.Oct 1 2018, 2:34 PM
llvm/lib/Transforms/InstCombine/InstructionCombining.cpp
1779

The last index (%v) in the test testcase is still a pointer of struct, which is sequential. This testcase actually get into "if (EndsWithSequential)" and the Src (%arrayidx8) is merged with GEP (%tmp1 and %tmp2). Looks like there no such check we have in this patch in "if (EndsWithSequential)".

efriedma added inline comments.Oct 1 2018, 6:45 PM
llvm/lib/Transforms/InstCombine/InstructionCombining.cpp
1779

Oh, sorry, you're right, I managed to confuse myself based on what we should be doing, rather than what we're actually doing at the moment.

I don't like adding heuristics based on the uses of the GEP; that's likely to lead to inconsistent results, and possibly O(N^2) compile-time.

What I think we should be doing instead is trying to maintain the invariant that each GEP either has all constant offsets, or has exactly one non-zero offset. Roughly, the rule allows GEPs which will lower to one addition instruction. It composes well, other optimizations can already deal with it (IIRC), and it's a straightforward extension of what we're already doing in the "if (EndsWithSequential)" block.

junbuml added inline comments.Oct 2 2018, 9:26 AM
llvm/lib/Transforms/InstCombine/InstructionCombining.cpp
1779

What I think we should be doing instead is trying to maintain the invariant that each GEP either has all constant offsets, or has exactly one non-zero offset.

I'm afraid if I fully catch what you meant here. I guess you meant that overall we need to be less aggressive in merging GEPs in instcombine, and in instcombine, we try to keep GEPs have either all constant offset or exactly one non-constant offset? If merging GEPs can break this, we do not merge.

It composes well, other optimizations can already deal with it (IIRC)

I guess CodeGenprepare and ISel. Please let me know other passes which handle something like this.

efriedma added inline comments.Oct 2 2018, 11:04 AM
llvm/lib/Transforms/InstCombine/InstructionCombining.cpp
1779

If merging GEPs can break this, we do not merge.

Yes, that's what I meant.

junbuml updated this revision to Diff 169109.Oct 10 2018, 3:16 PM

Tried to handle Eli's comment. Please take a look and let me know any comment.

With the last modification I made based on Eli's comment, I didn't see any significant changes in size / performance in my spec2000 test on AArch64. Observed minor performance improvement in some internal benchmarks.

Kindly ping.

Kindly ping one more time.