[AArch64] Homogeneous Prolog and Epilog for Size Optimization
Needs Review · Public

Authored by kyulee on Mar 22 2020, 11:31 AM.

Details

Summary

The prolog and epilog that save and restore callee-saved registers tend to be irregular, with differing immediate offsets, so the machine outliner rarely outlines them when optimizing for size. Combining stack operations in D18619 increased this irregularity further.
This patch tries to emit homogeneous stores and loads that use the same offset in the prolog and epilog, respectively. We observed that a homogeneous prolog and epilog significantly increases the chance of outlining, which reduces code size. However, the result was still far from minimum-size code because the return-address register, x30, requires special handling.
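To illustrate why same-offset stores help the outliner, here is a toy sketch, not code from the patch. It prints a simplified "combined" prolog (one stack adjustment, then stores at increasing offsets) next to a homogeneous one (every pair pushed with the same pre-indexed offset) and counts the instructions two functions have in common. The function names, the register order, and the frame layout are all assumptions made for the example.

```cpp
#include <cstddef>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Toy model only: names and layout are illustrative, not taken from the patch.
using RegPair = std::pair<std::string, std::string>;

// Combined style: the first store adjusts sp by the whole save area, the
// remaining stores use offsets that depend on their position in the frame.
std::vector<std::string> combinedProlog(const std::vector<RegPair> &Pairs) {
  std::vector<std::string> Insts;
  for (std::size_t I = 0; I < Pairs.size(); ++I) {
    std::string Addr =
        I == 0 ? "[sp, #-" + std::to_string(16 * Pairs.size()) + "]!"
               : "[sp, #" + std::to_string(16 * I) + "]";
    Insts.push_back("stp " + Pairs[I].first + ", " + Pairs[I].second + ", " +
                    Addr);
  }
  return Insts;
}

// Homogeneous style: every pair is pushed with the same pre-indexed offset,
// so the instruction text no longer depends on the frame size.
std::vector<std::string> homogeneousProlog(const std::vector<RegPair> &Pairs) {
  std::vector<std::string> Insts;
  for (const RegPair &P : Pairs)
    Insts.push_back("stp " + P.first + ", " + P.second + ", [sp, #-16]!");
  return Insts;
}

// Count the instructions of B that also appear, textually identical, in A.
std::size_t sharedInstructions(const std::vector<std::string> &A,
                               const std::vector<std::string> &B) {
  std::set<std::string> InA(A.begin(), A.end());
  std::size_t N = 0;
  for (const std::string &Inst : B)
    if (InA.count(Inst))
      ++N;
  return N;
}
```

For a function saving two register pairs and another saving the same two plus a third, the combined form shares only one instruction because the sp adjustment differs, while the homogeneous form shares both common stores.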
This patch custom-outlines prolog and epilog in place:

  • Injects HOM_Prolog and HOM_Epilog pseudo instructions in the Prolog and Epilog Injection pass.
  • Lowers and optimizes them in the AArch64LowerHomogeneousPrologEpilog pass.
  • Creates outlined helpers on demand. Identical helpers are merged by the linker.
  • Introduces an opt-in flag to enable this feature, plus a threshold flag to control how aggressively outlining is applied, according to the application's needs.
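As a rough illustration of "created on demand", the sketch below models a per-module helper cache keyed on the helper kind and the ordered list of saved registers. It is a toy under stated assumptions: `HelperKind`, `HelperCache`, `getOrCreate`, and the naming scheme are hypothetical and do not come from the patch. The idea it shows is that a name derived purely from the helper's contents is the same in every translation unit, which is one way identical helpers could end up mergeable at link time.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Toy model only: none of these names come from the patch.
enum class HelperKind { Prolog, Epilog };

struct HelperCache {
  // Key: helper kind plus the ordered list of callee-saved registers.
  using Key = std::pair<HelperKind, std::vector<std::string>>;
  std::map<Key, std::string> Helpers;

  // Return the helper name for this register list, creating it on first use.
  std::string getOrCreate(HelperKind Kind,
                          const std::vector<std::string> &Regs) {
    assert(Regs.size() % 2 == 0 && "registers are expected in pairs");
    Key K{Kind, Regs};
    auto It = Helpers.find(K);
    if (It != Helpers.end())
      return It->second; // Already created: reuse it.

    // Derive the name from the contents only, so equal helpers get equal names.
    std::string Name =
        Kind == HelperKind::Prolog ? "HELPER_PROLOG" : "HELPER_EPILOG";
    for (const std::string &R : Regs)
      Name += "_" + R;
    Helpers.emplace(K, Name);
    return Name;
  }
};
```

Two functions that save the same registers get the same helper, so only one body is emitted per module in this model; a prolog and an epilog over the same registers remain distinct helpers.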

This reduced code size by an average of 4% on LLVM-TestSuite/CTMark when targeting arm64 at -Oz.

Event Timeline

kyulee created this revision. Mar 22 2020, 11:31 AM

Hello. I like the idea. It's something we thought about internally, but no one has ever worked on it enough to see how much of an improvement it gives in general.

llvm/lib/Target/AArch64/AArch64FrameLowering.cpp
229

Can you explain what CompactUnwindFrame is, and why it's needed for this to work? I'm not really an expert on Frame Lowering, but is there a way to get this to work without that restriction?

kyulee added inline comments. Mar 30 2020, 10:22 AM
llvm/lib/Target/AArch64/AArch64FrameLowering.cpp
229

This guarantees that registers are used in pairs and that the ordering of CSR save locations is fixed, which simplifies the current logic for correctness. An immediate remedy would be to support the non-pair register case, but I'm not an expert on all the other platforms and calling conventions either, and I'm not sure how I would test and validate such a relaxation. I think it's probably better to leave this extension to folks who know the details of their platforms.