This patch tries to canonicalise add + gep to gep + gep.
Co-authored-by: Paul Walker <paul.walker@arm.com>
Differential D155688
[PATCH] [llvm] [InstCombine] Canonicalise ADD+GEP d-smirnov on Jul 19 2023, 2:55 AM. Authored by
Details This patch tries to canonicalise add + gep to gep + gep. Co-authored-by: Paul Walker <paul.walker@arm.com>
Diff Detail
Unit Tests Event TimelineComment Actions Alternatively, it would also be possible to canonicalize gep(p, add(x, y)) to gep(gep(p, x), y) in general, in which case existing GEP reassociation support in LICM will take care of the rest. Arguably this is cleaner (as we should have a canonical form between these two possibilities), but it's more likely to cause fallout. Comment Actions @nikic Could you check out the updated code to make sure we're on the right track before I try to fix the rest of the unit tests?
Comment Actions LGTM We should give this a try, but I think there is a fairly large chance that this will cause regressions somewhere and a more targeted change may be necessary (e.g. only do this for loop-invariants in LICM).
Comment Actions How does this patch work with visitGEPOfGEP that does a reverse transformation? // Replace: gep (gep %P, long B), long A, ... // With: T = long A+B; gep %P, T, ... Comment Actions The reverse transform is only done if A + B simplifies. By the way, this change did cause some code size regressions: http://llvm-compile-time-tracker.com/compare.php?from=a16f6462d756804276d4b39267b3c19bcd6949fe&to=e13bed4c5f3544c076ce57e36d9a11eefa5a7815&stat=size-text The one that stood out to me is that btGjkEpa2.cpp from bullet has become 13% larger. Comment Actions
Looks like`simplifyAddInst` may give add expressions, so I guess this patch may make IC run into infinite loops. Additionally, this change could make longer GEP chains that could hurt other optimizations by exceeding AA or value-tracking thresholds. Comment Actions We have some improvements with the patch, most notable: 549.fotonik_3d improves about 6%. Comment Actions simplifyAddInst can return an add instruction, but it will be an existing one. It will never introduce a new one. So I'm not sure how this would result in infinite loops? I don't think we have cause to revert just yet, as we're not aware of any specific issues. Comment Actions After this patch was recently pulled into my downstream, I'm seeing a lot of invariant.gep created by LICM. For example, in LBM_performStreamCollide in 470.lbm there are 65 of them. On RISC-V, these all get created in registers outside the loop and get spilled. Is ARM seeing anything like this or do you have more addressing modes that allow CodeGenPrepare to bring these back into the loop? Comment Actions I hadn't realized this came from someone at Arm. The performance results I had were overall roughly flat, with some improvements and regressions. I think there were still some people working through some fixes for some of the knock-on effects but with those nothing large would stick out in what I saw. I would expect Loop Strength Reduction (maybe with CGP) to be able to optimize the addressing modes back to something that is optimal for the loop if it can. It's not always super reliable though. Might there be something going wrong in that pass? |
This needs a one-use check. The transform is not profitable if we have to keep *both* the add and the gep.
Can also use match(GEP.getOperand(1), m_Add(...)) here.