Prolog and epilog to handle callee-save registers tend to be irregular with different immediate offsets, which are not often being outlined (by machine outliner) when optimizing for size. From D18619, combining stack operations stretched irregularity further.
This patch tries to emit homogeneous stores and loads with the same offset for prolog and epilog respectively. We have observed that this homogeneous prolog and epilog significantly increased the chance of outlining, resulting in a code size reduction. However, it was still far from the minimum size code due to requiring the special handling of the return register, x30.
This patch custom-outlines prolog and epilog in place:
- Injects HOM_Prolog and HOM_Epilog psuedo instructions in Prolog and Epilog Injection Pass
- Lower and optimize them in AArchLowerHomogneousPrologEpilog Pass
- Outlined helpers are created on demand. Identical helpers are merged by the linker.
- An opt-in flag is introduced to enable this feature. Another threshold flag is also introduced to control the aggressiveness of outlining for application's need.
This reduced an average of 4% of code size on LLVM-TestSuite/CTMark targeting arm64/-Oz.
Can you explain what CompactUnwindFrame is, and why it's needed for this to work? I'm not really an expert on Frame Lowering, but is there a way to get this to work without that restriction?