Partial Redundancy Elimination (PRE) of GEPs prevents CodeGenPrepare (CGP) from sinking the addressing-mode computation of memory instructions back to their uses. The problem comes from the PHIs that PRE inserts: they confuse CGP and make it bail out.
I found this problem while looking at the sqlite amalgamation from https://www.sqlite.org/download.html.
We could teach CGP to look through PHI nodes in FindAllMemoryUses, but this would increase compilation time (the scan is currently limited to 20 memory instructions, and sqlite needs 6 times more). Moreover, CGP still wouldn't be able to handle GEPs that have a different base and offset but correspond to the same Value Number (as in the regression test).
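To make the failure mode concrete, here is a simplified toy model of a capped, recursive use-scan in the spirit of FindAllMemoryUses. It is not CGP's actual code: the `Inst`/`Kind` types and function names are hypothetical, stores are counted without checking which operand is the address, and calls are simply treated as unfoldable. It shows the two ways the scan gives up that are discussed above: reaching a PHI (which is not address arithmetic) and exceeding the 20-use budget.

```cpp
#include <cassert>
#include <set>
#include <vector>

// Hypothetical toy IR, for illustration only (not LLVM's classes).
enum class Kind { Load, Store, GEP, Add, Phi, Call };

struct Inst {
  Kind kind;
  std::vector<Inst *> users;  // instructions that use this value
};

// Budget on visited uses, matching the limit of 20 quoted above.
constexpr int kMaxMemoryUsesToScan = 20;

// Only address arithmetic is followed; a PHI (or a call) is not.
static bool mightBeFoldable(const Inst &I) {
  return I.kind == Kind::GEP || I.kind == Kind::Add;
}

// Collects transitive load/store users of an address computation.
// Returns true if the scan gives up (so sinking would be abandoned),
// false if every transitive use was classified.
bool findAllMemoryUses(Inst *I, std::vector<Inst *> &memUses,
                       std::set<Inst *> &considered, int &seen) {
  if (!considered.insert(I).second)
    return false;  // already visited: nothing new to learn
  if (!mightBeFoldable(*I))
    return true;   // e.g. a PHI inserted by PRE: bail out
  for (Inst *U : I->users) {
    if (seen++ >= kMaxMemoryUsesToScan)
      return true; // too many uses: bail to bound compile time
    if (U->kind == Kind::Load || U->kind == Kind::Store) {
      memUses.push_back(U);
      continue;
    }
    if (findAllMemoryUses(U, memUses, considered, seen))
      return true; // a nested user made the scan give up
  }
  return false;
}
```

In this model, a GEP whose only users are loads is fully classified, while the same GEP feeding a PHI makes the whole scan return "give up". Looking through PHIs would mean recursing past them, which spends more of the fixed budget per address computation.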
This looks good for performance and code size. I am posting some performance numbers reported by LNT for llvm-test-suite, spec2000, and spec2006, targeting Cortex-A57 (AArch64) at -O3, using a recent LLVM trunk revision with my patch applied.
Performance Improvements - execution_time
Performance Regressions - execution_time
Performance Improvements - mem_bytes