When we enabled the new PM, we noticed several large regressions. One cause is that the new PM inlines in a different order. The legacy PM has a step that the new PM lacks: it moves the call sites that call functions in the current SCC to the end of the iteration list. I don't know whether the new PM omits this step intentionally, but I can see two benefits of doing it.
- Functions outside the current SCC are inlined first, which discourages functions inside the same SCC from inlining each other and bloating the code size.
- Inlining a callee inside the current SCC likely makes the caller recursive, and LLVM does not inline recursive functions.
One drawback I can think of is that call sites from the same caller are stored in two places in the list instead of one, so we may have to switch function proxies more often.
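For reviewers unfamiliar with the step, the reordering described above can be sketched as a swap-based partition of the call site list. This is only an illustration with plain standard-library types, not the code in the patch: `CallSite`, `moveSCCCallsToEnd`, and the use of function names as strings are all hypothetical stand-ins.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a call site: the caller and callee names.
struct CallSite {
  std::string Caller;
  std::string Callee;
};

// Moves every call site whose callee is in the current SCC to the end of
// the list and returns the index of the first such call site. The relative
// order of call sites is not preserved.
std::size_t moveSCCCallsToEnd(std::vector<CallSite> &CallSites,
                              const std::set<std::string> &SCCFunctions) {
  std::size_t FirstCallInSCC = CallSites.size();
  for (std::size_t I = 0; I < FirstCallInSCC;) {
    if (SCCFunctions.count(CallSites[I].Callee)) {
      // Shrink the unexamined range and park this call site at its end.
      // I is not advanced: the swapped-in element still has to be checked.
      --FirstCallInSCC;
      if (I != FirstCallInSCC)
        std::swap(CallSites[I], CallSites[FirstCallInSCC]);
    } else {
      ++I;
    }
  }
  assert(FirstCallInSCC <= CallSites.size());
  return FirstCallInSCC;
}
```

With an SCC of `{f, g}` and call sites `f->g`, `f->ext1`, `g->f`, `g->ext2`, the two calls to `ext1` and `ext2` end up at the front of the list and are visited first. The calls from `f` are then split across both halves, which is the drawback noted above.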
This patch simply copies the code from the legacy PM to the new PM. Here are the SPEC performance and code size changes:
benchmark | code size (%) (- is smaller) | performance (%) (+ is faster) |
spec2000/ammp | -1.32 | +0.02 |
spec2000/vortex | -1.06 | -0.21 |
spec2006/gobmk | +0.8 | +1.22 |
spec2006/povray | -0.11 | +31.90 |
spec2017/leela | -1.67 | -0.78 |
spec2017/povray | -0.24 | +27.12 |