This is an archive of the discontinued LLVM Phabricator instance.

[PATCH] [llvm] [InstCombine] Canonicalise ADD+GEP
ClosedPublic

Authored by d-smirnov on Jul 19 2023, 2:55 AM.

Details

Summary

This patch tries to canonicalise add + gep to gep + gep.

Co-authored-by: Paul Walker <paul.walker@arm.com>

Diff Detail

Event Timeline

d-smirnov created this revision.Jul 19 2023, 2:55 AM
Herald added a project: Restricted Project. · View Herald TranscriptJul 19 2023, 2:55 AM
d-smirnov requested review of this revision.Jul 19 2023, 2:55 AM
Herald added a project: Restricted Project. · View Herald TranscriptJul 19 2023, 2:55 AM
nikic requested changes to this revision.Jul 19 2023, 3:25 AM
nikic added a subscriber: nikic.

This needs to happen in LICM, not InstCombine.

This revision now requires changes to proceed.Jul 19 2023, 3:25 AM
kiranchandramohan edited the summary of this revision. (Show Details)Jul 19 2023, 4:02 AM
nikic added a comment.Jul 19 2023, 4:50 AM

Alternatively, it would also be possible to canonicalize gep(p, add(x, y)) to gep(gep(p, x), y) in general, in which case existing GEP reassociation support in LICM will take care of the rest. Arguably this is cleaner (as we should have a canonical form between these two possibilities), but it's more likely to cause fallout.

d-smirnov updated this revision to Diff 556881.Sep 15 2023, 3:08 PM

Relaxed checks

Herald added a project: Restricted Project. · View Herald TranscriptSep 15 2023, 3:08 PM
d-smirnov updated this revision to Diff 556894.Sep 16 2023, 3:16 AM

unit test fixed

@nikic Could you check out the updated code to make sure we're on the right track before I try to fix the rest of the unit tests?

d-smirnov updated this revision to Diff 557022.Sep 19 2023, 3:24 AM

unit tests

d-smirnov updated this revision to Diff 557104.Sep 20 2023, 2:34 AM

Hexagon test updated

paulwalker-arm added inline comments.Sep 20 2023, 3:32 AM
llvm/lib/Transforms/InstCombine/InstructionCombining.cpp
2318–2319

Perhaps move this block after the We do not handle pointer-vector geps here immediately below so this test can be removed.

nikic added inline comments.Sep 20 2023, 4:00 AM
llvm/lib/Transforms/InstCombine/InstructionCombining.cpp
2322

This needs a one-use check. The transform is not profitable if we have to keep *both* the add and the gep.

Can also use match(GEP.getOperand(1), m_Add(...)) here.

2325

This no longer checks for loop invariance, so we should remove any invariance-related terminology.

2337

This inbounds preservation is incorrect: https://alive2.llvm.org/ce/z/bJZvQG

It's even incorrect if the add is also nsw.

d-smirnov marked 3 inline comments as done.

Reordered and removed extra check

d-smirnov marked an inline comment as done.Sep 20 2023, 12:03 PM

updated

d-smirnov retitled this revision from [PATCH] [llvm] [InstCombine] Reassociate loop invariant GEP index calculations. to [PATCH] [llvm] [InstCombine] Canonicalise ADD+GEP.Sep 21 2023, 9:25 AM
d-smirnov edited the summary of this revision. (Show Details)
d-smirnov updated this revision to Diff 557187.Sep 21 2023, 9:33 AM

comment updated

@nikic Updated. Please review

nikic added inline comments.Oct 2 2023, 6:39 AM
llvm/lib/Transforms/InstCombine/InstructionCombining.cpp
2325–2327
2333
2336

No need for the NewGEP variable.

d-smirnov updated this revision to Diff 557538.Oct 2 2023, 10:32 AM
d-smirnov marked 2 inline comments as done.

amended

d-smirnov marked an inline comment as done.Oct 2 2023, 10:34 AM

@nikic Amended.

nikic accepted this revision.Oct 4 2023, 6:34 AM

LGTM

We should give this a try, but I think there is a fairly large chance that this will cause regressions somewhere and a more targeted change may be necessary (e.g. only do this for loop-invariants in LICM).

llvm/lib/Transforms/InstCombine/InstructionCombining.cpp
2334–2335

IRBuilder needs to be used for all but the last instruction.

This revision is now accepted and ready to land.Oct 4 2023, 6:34 AM
d-smirnov marked an inline comment as done.Oct 5 2023, 4:02 AM

Updated

This revision was automatically updated to reflect the committed changes.
fiigii added a subscriber: fiigii.Oct 6 2023, 6:34 PM

How does this patch work with visitGEPOfGEP that does a reverse transformation?

// Replace: gep (gep %P, long B), long A, ...
// With:    T = long A+B; gep %P, T, ...
nikic added a comment.Oct 9 2023, 5:29 AM

How does this patch work with visitGEPOfGEP that does a reverse transformation?

// Replace: gep (gep %P, long B), long A, ...
// With:    T = long A+B; gep %P, T, ...

The reverse transform is only done if A + B simplifies.


By the way, this change did cause some code size regressions: http://llvm-compile-time-tracker.com/compare.php?from=a16f6462d756804276d4b39267b3c19bcd6949fe&to=e13bed4c5f3544c076ce57e36d9a11eefa5a7815&stat=size-text

The one that stood out to me is that btGjkEpa2.cpp from bullet has become 13% larger.

fiigii added a comment.Oct 9 2023, 2:04 PM

The reverse transform is only done if A + B simplifies.

Looks like`simplifyAddInst` may give add expressions, so I guess this patch may make IC run into infinite loops.

Additionally, this change could make longer GEP chains that could hurt other optimizations by exceeding AA or value-tracking thresholds.

We have some improvements with the patch, most notable: 549.fotonik_3d improves about 6%.
@nikic Should we revert the patch and try another location for it (in LICM pass, as you previously suggested)?

nikic added a comment.Oct 10 2023, 8:35 AM

The reverse transform is only done if A + B simplifies.

Looks like`simplifyAddInst` may give add expressions, so I guess this patch may make IC run into infinite loops.

simplifyAddInst can return an add instruction, but it will be an existing one. It will never introduce a new one. So I'm not sure how this would result in infinite loops?

We have some improvements with the patch, most notable: 549.fotonik_3d improves about 6%.
@nikic Should we revert the patch and try another location for it (in LICM pass, as you previously suggested)?

I don't think we have cause to revert just yet, as we're not aware of any specific issues.

That would be fine. Thanks for explaining.

craig.topper added a subscriber: craig.topper.EditedTue, Nov 21, 6:51 PM

After this patch was recently pulled into my downstream, I'm seeing a lot of invariant.gep created by LICM. For example, in LBM_performStreamCollide in 470.lbm there are 65 of them. On RISC-V, these all get created in registers outside the loop and get spilled. Is ARM seeing anything like this or do you have more addressing modes that allow CodeGenPrepare to bring these back into the loop?

I hadn't realized this came from someone at Arm. The performance results I had were overall roughly flat, with some improvements and regressions. I think there were still some people working through some fixes for some of the knock-on effects but with those nothing large would stick out in what I saw.

I would expect Loop Strength Reduction (maybe with CGP) to be able to optimize the addressing modes back to something that is optimal for the loop if it can. It's not always super reliable though. Might there be something going wrong in that pass?