Hi Tim, Jingyue and other reviewers,
This patch is based on http://reviews.llvm.org/D5863, which fixes some problems in the SeparateConstOffsetFromGEP pass. Please apply that patch first if you want to try this one.
We find that LLVM does not handle CSE of GEPs (getelementptrs) well. A GEP can be very complex: it can have many constant and variable indices mixed together, and the variable indices may themselves contain constants. Such complex GEPs are usually kept until CodeGen. But CodeGen can only see one basic block at a time, so CSE opportunities among GEPs that span basic blocks (e.g., two GEPs used in two different basic blocks, or one GEP used in two basic blocks) are missed.
Currently there is a pass called SeparateConstOffsetFromGEP, which can separate the constant within variable indices and split a complex GEP into two GEPs. But this is not enough to solve the cross-basic-block problem described above.
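To make the existing split concrete, here is a minimal C++ sketch of the idea at source level. It is only an analogy to what the pass does on IR; the function names and the constant 5 are made up for illustration and do not come from the patch.

```cpp
#include <cassert>
#include <cstddef>

// Original form: one address computation whose variable index (i + 5)
// has a constant hidden inside it.
inline int *addr_original(int *a, std::ptrdiff_t i) {
  return &a[i + 5];
}

// Split form: a variable part (a + i) followed by a constant offset (+ 5),
// mirroring the "one complex GEP becomes two GEPs" idea.
inline int *addr_split(int *a, std::ptrdiff_t i) {
  int *variable_part = a + i;  // first step: variable index only
  return variable_part + 5;    // second step: constant offset only
}

// Hypothetical self-check: both forms must yield the same address.
inline void check_split_matches_original() {
  int buf[16] = {};
  for (std::ptrdiff_t i = 0; i <= 10; ++i)  // i + 5 stays within buf
    assert(addr_original(buf, i) == addr_split(buf, i));
}
```

The variable part `a + i` no longer carries the constant, so it can be shared by other accesses such as `a[i + 6]` that differ only in that constant.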
So I improved this pass. It now separates constants within indices for both sequential and struct types. Most importantly, it also transforms complex GEPs into a "ptrtoint + arithmetic + inttoptr" form, so that CSE opportunities across basic blocks can be found. This also benefits the address sinking logic in CodeGenPrepare for complex GEPs that cannot be matched by an addressing mode, because part of the address calculation can still be sunk, which can result in a better addressing mode. The EarlyCSE pass is run after this pass to do CSE, and the LICM pass is also run to hoist any address calculations that are loop invariant.
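As a rough source-level illustration of the "ptrtoint + arithmetic + inttoptr" form, the sketch below computes the same address twice: once as a single complex access and once as plain integer arithmetic. The `Elem` type and the function names are hypothetical, and the sketch assumes a mainstream platform where a pointer round-trips through `std::uintptr_t` unchanged; it is not the code the pass emits.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical element type; not taken from the patch.
struct Elem {
  int tag;
  float vals[8];
};

// Direct form, analogous to one complex GEP: &p[i].vals[j + 2].
inline float *addr_gep(Elem *p, std::size_t i, std::size_t j) {
  return &p[i].vals[j + 2];
}

// Lowered form: ptrtoint, then integer arithmetic, then inttoptr.
inline float *addr_lowered(Elem *p, std::size_t i, std::size_t j) {
  std::uintptr_t base = reinterpret_cast<std::uintptr_t>(p);  // ptrtoint
  std::uintptr_t row = base + i * sizeof(Elem);  // sequential index
  std::uintptr_t elt = row + j * sizeof(float);  // variable index in the struct
  // Constant parts folded together: struct field offset plus the "+ 2".
  std::uintptr_t off = offsetof(Elem, vals) + 2 * sizeof(float);
  return reinterpret_cast<float *>(elt + off);  // inttoptr
}

// Hypothetical self-check over a small array.
inline void check_lowered_matches_gep() {
  Elem arr[4] = {};
  for (std::size_t i = 0; i < 4; ++i)
    for (std::size_t j = 0; j + 2 < 8; ++j)
      assert(addr_lowered(arr, i, j) == addr_gep(arr, i, j));
}
```

Once the address is written this way, intermediate values such as `row` are ordinary integer expressions, so two accesses in different basic blocks that share `p` and `i` expose a common subexpression that a pass like EarlyCSE can see.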
If no CSE opportunities are found after this arithmetic transformation, there is still no obvious performance regression, because CodeGen always performs the same transformation anyway. We just move it several passes ahead of CodeGen.
I tested performance on A57 (AArch64 target) with SPEC CPU2000 and SPEC CPU2006. There are no obvious regressions, and there are improvements on the following benchmarks:
For the benchmarks that don't show an obvious improvement, the assembly code still shows better address calculation and addressing modes.
For other targets such as NVPTX, I cannot test this patch. I expect it to benefit performance there as well, or at least not to cause regressions.