This patch tries to canonicalise add + gep to gep + gep.
Co-authored-by: Paul Walker <paul.walker@arm.com>
Paths
| Differential D155688
[PATCH] [llvm] [InstCombine] Canonicalise ADD+GEP ClosedPublic Authored by d-smirnov on Jul 19 2023, 2:55 AM.
Details Summary This patch tries to canonicalise add + gep to gep + gep. Co-authored-by: Paul Walker <paul.walker@arm.com>
Diff Detail
Event TimelineThis revision now requires changes to proceed.Jul 19 2023, 3:25 AM Comment Actions Alternatively, it would also be possible to canonicalize gep(p, add(x, y)) to gep(gep(p, x), y) in general, in which case existing GEP reassociation support in LICM will take care of the rest. Arguably this is cleaner (as we should have a canonical form between these two possibilities), but it's more likely to cause fallout. Comment Actions @nikic Could you check out the updated code to make sure we're on the right track before I try to fix the rest of the unit tests?
d-smirnov retitled this revision from [PATCH] [llvm] [InstCombine] Reassociate loop invariant GEP index calculations. to [PATCH] [llvm] [InstCombine] Canonicalise ADD+GEP.Sep 21 2023, 9:25 AM Comment Actions LGTM We should give this a try, but I think there is a fairly large chance that this will cause regressions somewhere and a more targeted change may be necessary (e.g. only do this for loop-invariants in LICM).
This revision is now accepted and ready to land.Oct 4 2023, 6:34 AM Closed by commit rGe13bed4c5f35: [PATCH] [llvm] [InstCombine] Canonicalise ADD+GEP (authored by d-smirnov, committed by MatsPetersson). · Explain WhyOct 6 2023, 4:38 AM This revision was automatically updated to reflect the committed changes. Comment Actions How does this patch work with visitGEPOfGEP that does a reverse transformation? // Replace: gep (gep %P, long B), long A, ... // With: T = long A+B; gep %P, T, ... Comment Actions
The reverse transform is only done if A + B simplifies. By the way, this change did cause some code size regressions: http://llvm-compile-time-tracker.com/compare.php?from=a16f6462d756804276d4b39267b3c19bcd6949fe&to=e13bed4c5f3544c076ce57e36d9a11eefa5a7815&stat=size-text The one that stood out to me is that btGjkEpa2.cpp from bullet has become 13% larger. Comment Actions
Looks like`simplifyAddInst` may give add expressions, so I guess this patch may make IC run into infinite loops. Additionally, this change could make longer GEP chains that could hurt other optimizations by exceeding AA or value-tracking thresholds. Comment Actions We have some improvements with the patch, most notable: 549.fotonik_3d improves about 6%. Comment Actions
simplifyAddInst can return an add instruction, but it will be an existing one. It will never introduce a new one. So I'm not sure how this would result in infinite loops?
I don't think we have cause to revert just yet, as we're not aware of any specific issues. Comment Actions After this patch was recently pulled into my downstream, I'm seeing a lot of invariant.gep created by LICM. For example, in LBM_performStreamCollide in 470.lbm there are 65 of them. On RISC-V, these all get created in registers outside the loop and get spilled. Is ARM seeing anything like this or do you have more addressing modes that allow CodeGenPrepare to bring these back into the loop? Comment Actions I hadn't realized this came from someone at Arm. The performance results I had were overall roughly flat, with some improvements and regressions. I think there were still some people working through some fixes for some of the knock-on effects but with those nothing large would stick out in what I saw. I would expect Loop Strength Reduction (maybe with CGP) to be able to optimize the addressing modes back to something that is optimal for the loop if it can. It's not always super reliable though. Might there be something going wrong in that pass?
Revision Contents
Diff 557628 clang/test/CodeGenCXX/microsoft-abi-dynamic-cast.cpp
llvm/lib/Transforms/InstCombine/InstructionCombining.cpp
llvm/test/CodeGen/Hexagon/autohvx/vector-align-tbaa.ll
llvm/test/Transforms/InstCombine/align-addr.ll
llvm/test/Transforms/InstCombine/mem-par-metadata-memcpy.ll
llvm/test/Transforms/InstCombine/memrchr-4.ll
llvm/test/Transforms/InstCombine/shift.ll
llvm/test/Transforms/LoopVectorize/AArch64/sve-interleaved-accesses.ll
llvm/test/Transforms/LoopVectorize/AArch64/sve-widen-phi.ll
llvm/test/Transforms/LoopVectorize/induction.ll
llvm/test/Transforms/LoopVectorize/interleaved-accesses.ll
llvm/test/Transforms/LoopVectorize/invariant-store-vectorization.ll
llvm/test/Transforms/LoopVectorize/runtime-check.ll
llvm/test/Transforms/LowerMatrixIntrinsics/multiply-fused-loops.ll
|
Perhaps move this block after the We do not handle pointer-vector geps here immediately below so this test can be removed.