This is an archive of the discontinued LLVM Phabricator instance.

AArch64: Add option to use shared epilogues in compiler-rt
Needs Review (Public)

Authored by MatzeB on Dec 16 2015, 7:25 PM.

Details

Summary

Most AArch64 function epilogues look the same: a series of ldp
instructions followed by a ret. In fact, nearly all epilogues fall into 1 of
16 patterns. These 16 epilogues are put into compiler-rt to be shared,
and the epilogue code is replaced with a jump to these epilogues.

In a test suite compiled with -Os, sharing the epilogues gives a 1.7%
code size reduction.
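
As a rough illustration of where the saving comes from (a back-of-the-envelope sketch, not taken from the patch): every AArch64 instruction is 4 bytes, so an inline epilogue of n ldp instructions plus a ret occupies 4 * (n + 1) bytes. The sketch assumes the whole epilogue is replaced by exactly one branch instruction; the function name `epilogue_bytes_saved` is hypothetical.

```python
INSTR_BYTES = 4  # every AArch64 (A64) instruction is 4 bytes wide


def epilogue_bytes_saved(num_ldp: int) -> int:
    """Bytes saved in one function when `num_ldp` ldp instructions plus a
    ret are replaced by a single branch to a shared epilogue.

    Assumption: the whole epilogue is replaced by exactly one branch.
    """
    if num_ldp < 1:
        raise ValueError("expected at least one ldp in the epilogue")
    inline_size = (num_ldp + 1) * INSTR_BYTES  # n ldp instructions, then ret
    shared_size = INSTR_BYTES                  # one branch instruction
    return inline_size - shared_size


for n in range(1, 5):
    print(n, "ldp ->", epilogue_bytes_saved(n), "bytes saved per epilogue")
```

Under these assumptions the per-function saving is 4 bytes per ldp; the one-time cost of the shared epilogue bodies themselves is not modelled here.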

This patch adds the -aarch64-shared-epilogues switch. I will perform more
benchmarking to decide whether this is a candidate for -Os or -Oz.

Related to rdar://23082514

Diff Detail

Repository
rL LLVM

Event Timeline

MatzeB updated this revision to Diff 43096. Dec 16 2015, 7:25 PM
MatzeB retitled this revision to AArch64: Add option to use shared epilogues in compiler-rt.
MatzeB updated this object.
MatzeB set the repository for this revision to rL LLVM.
MatzeB added subscribers: llvm-commits, ab.

I think the general idea of sharing epilogues is a good idea - at the very least when optimizing for size.
Did you also happen to measure the impact on performance?

Overall, I'm wondering if it wouldn't be better to let the compiler put the epilogue functions in comdat sections (or the equivalent for non-ELF object formats), rather than having them in compiler-rt. I think doing so would have the following advantages:

  • It's possible to catch all epilogues, not just the N (16 in the attached patch) most often used ones as seen in a benchmark corpus.
  • The epilogues can more easily be tuned for specific cores when the epilogues are produced by the compiler rather than being stored in compiler-rt. E.g. I've been told that this technique also has been used effectively in other compilers when targeting AArch32. On some AArch32 cores using LDRD tends to be more efficient than using LDM.
  • My gut feel is that if, over time, we want to modify epilogues, a scheme where the compiler still emits the epilogues is the most flexible. Retaining all versions of epilogues in compiler-rt potentially required by all LLVM revisions ever used may end up being a bookkeeping nightmare.
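
The coverage trade-off in the first bullet can be sketched as follows. This is only an illustration: the corpus is made-up toy data (not measured), each epilogue is modelled as the ordered list of registers it restores, and `coverage_of_top_n` is a hypothetical helper name.

```python
from collections import Counter


def coverage_of_top_n(epilogues, n):
    """Return (the n most frequent patterns, fraction of epilogues covered)."""
    counts = Counter(tuple(e) for e in epilogues)
    top = counts.most_common(n)
    covered = sum(count for _, count in top)
    return [pattern for pattern, _ in top], covered / len(epilogues)


# Made-up toy corpus, for illustration only (not measured data).
corpus = (
    [["X29", "X30"]] * 5
    + [["X19", "X20", "X29", "X30"]] * 3
    + [["X21", "X22", "X19", "X20", "X29", "X30"]] * 2
)

patterns, fraction = coverage_of_top_n(corpus, 2)
print(patterns, fraction)
```

A fixed set chosen this way leaves every pattern outside the top N unshared, whereas a compiler that emits the shared epilogues itself can cover whatever patterns actually occur.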

Obviously, a well-defined naming scheme will be needed to define the epilogue functions (e.g. should they contain a version number?), but I think that's true no matter whether the epilogue functions are produced by the compiler or inserted into compiler-rt.

This also made me wonder if something similar could be done for function prologues? I couldn't immediately think of why it would be impossible - but the overheads involved probably will be higher than with epilogues, e.g. having to do a call to a prologue function, rather than doing a branch to an epilogue function?

I think the general idea of sharing epilogues is a good idea - at the very least when optimizing for size.
Did you also happen to measure the impact on performance?

Overall, I'm wondering if it wouldn't be better to let the compiler put the epilogue functions in comdat sections (or the equivalent for non-ELF object formats), rather than having them in compiler-rt. I think doing so would have the following advantages:

  • It's possible to catch all epilogues, not just the N (16 in the attached patch) most often used ones as seen in a benchmark corpus.
  • The epilogues can more easily be tuned for specific cores when the epilogues are produced by the compiler rather than being stored in compiler-rt. E.g. I've been told that this technique also has been used effectively in other compilers when targeting AArch32. On some AArch32 cores using LDRD tends to be more efficient than using LDM.
  • My gut feel is that if, over time, we want to modify epilogues, a scheme where the compiler still emits the epilogues is the most flexible. Retaining all versions of epilogues in compiler-rt potentially required by all LLVM revisions ever used may end up being a bookkeeping nightmare.

Yes, I agree, and I have been thinking about this as well. I discarded the idea when I realized that we have no infrastructure to place basic blocks into different sections.
However, thinking about this now, it may be possible to create pseudo functions on the fly, just like the pseudo functions I put into compiler-rt. I'll look into this.

Obviously, a well-defined naming scheme will be needed to define the epilogue functions (e.g. should they contain a version number?), but I think that's true no matter whether the epilogue functions are produced by the compiler or inserted into compiler-rt.

We can just describe the contents of the block in a unique way (in this implementation the name contains all the restored registers in order of restoration).

This also made me wonder if something similar could be done for function prologues? I couldn't immediately think of why it would be impossible - but the overheads involved probably will be higher than with epilogues, e.g. having to do a call to a prologue function, rather than doing a branch to an epilogue function?

It may be possible to do something with the prologues as well, but as these require a function call or a similar mechanism, the performance impact seemed bigger.

asb added a subscriber: asb. Dec 21 2015, 12:01 PM

For reference, Andrew Waterman and others in the RISC-V team looked at using function calls to register store/load helper functions to reduce size overhead in epilogues and prologues as an alternative to supporting load-multiple and store-multiple in the ISA. David Patterson described some of this work at the last RISC-V workshop http://riscv.org/workshop-jun2015/riscv-compressed-workshop-june2015.pdf. See slide 15.

jmolloy resigned from this revision. Jan 6 2016, 5:29 AM
jmolloy edited reviewers, added: kristof.beyls; removed: jmolloy.
ab added a comment. Jan 11 2016, 11:55 AM
  • My gut feel is that if, over time, we want to modify epilogues, a scheme where the compiler still emits the epilogues is the most flexible. Retaining all versions of epilogues in compiler-rt potentially required by all LLVM revisions ever used may end up being a bookkeeping nightmare.

We don't claim to support non-matching compiler-rt versions though, do we? I thought users/distros were supposed to always use the correct version.

lib/Target/AArch64/AArch64FrameLowering.cpp, line 856

When does RET occur here? I can't remember a way to bypass RET_ReallyLR.

In D15600#323879, @ab wrote:
  • My gut feel is that if, over time, we want to modify epilogues, a scheme where the compiler still emits the epilogues is the most flexible. Retaining all versions of epilogues in compiler-rt potentially required by all LLVM revisions ever used may end up being a bookkeeping nightmare.

We don't claim to support non-matching compiler-rt versions though, do we? I thought users/distros were supposed to always use the correct version.

I'm thinking of the case where object files compiled with different LLVM versions are linked together. If this isn't supported, it e.g. makes it pretty hard/impossible to distribute a library of binary code that can be linked with code generated by a number of different versions of LLVM. Allowing people that ship binary libraries to not have to ship a separate library for every single revision of clang/llvm seems like a good thing to me. Always requiring linking against the compiler-rt run-time library would probably also make it near-impossible to link together code produced by different compilers, unless these epilogue functions end up being defined in a de facto runtime library ABI?
I'm not sure if there's an official policy on this though.

All in all, it seems simpler to me to not have these epilogue functions in compiler-rt, but rather produce them in every object file that relies on them.

mcrosier resigned from this revision. Jan 12 2016, 7:53 AM
mcrosier removed a reviewer: mcrosier.

I looked into producing comdat functions and unfortunately I am not sure we can easily do this at the moment. All the codegen passes and the usual CodeGen/Passes.cpp pipeline are built from (Machine)FunctionPasses, which are not allowed to create additional functions. I don't see an easy way out there yet.

As for keeping the epilogues in compiler-rt: I do not see how this case is any worse than anything else we have in compiler-rt. If you link with an incompatible/older version there is always a chance that things won't work; this should be the same for epilogues as for, say, soft-float intrinsics.

To avoid people accidentally changing the epilogue function, I deliberately chose names that pretty much completely describe the content of the epilogue function: __epilogue_X19_X20_X21_X22 does what you would expect it to do: restore X19, X20, X21 and X22 in that order and return. I don't see how anyone would change the content of that function without also choosing a different function name.
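
A minimal sketch of such a self-describing naming scheme, assuming the `__epilogue_` prefix and `_` separator seen in the example name above. The helper `shared_epilogue_name` is hypothetical and is not the code from the patch.

```python
def shared_epilogue_name(restored_regs):
    """Build a symbol name listing the restored registers in restore order."""
    if not restored_regs:
        raise ValueError("an epilogue must restore at least one register")
    return "__epilogue_" + "_".join(restored_regs)


print(shared_epilogue_name(["X19", "X20", "X21", "X22"]))
# A different restore order yields a different symbol name.
print(shared_epilogue_name(["X21", "X22", "X19", "X20"]))
```

Because the name encodes both the registers and their order, any change to what the function restores implies a new symbol rather than a silent change behind an existing one.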

Hi Matthias,

My main objection is around requiring people to use compiler-rt as the run-time library.
Right now, it isn't required to use compiler-rt as the run-time library. E.g. on Linux, libgcc is often used as the run-time library.
Sure, some off-by-default features, like the sanitizers, do require compiler-rt. But then users explicitly opt in with a command-line option.
I'm assuming the goal is for shared-epilogue generation to be on by default. Users on systems linking against the libgcc run-time library would then suddenly see link failures without having opted in to a particular feature.
If the right solution is for these functions to be in all AArch64-supporting run-time libraries, then they ought to be defined in the AArch64 run-time library ABI, which would take quite a bit of time and effort.

In short, I think there are 2 practical ways forward:

  1. Only enable shared-epilogue generation on platforms that already require compiler-rt as the run-time library. I'm guessing that is Darwin-based platforms, but probably not much else?
  2. As suggested by Renato in a previous comment, add a pass early enough in the pipeline that it is still allowed to create functions, and have it add the function definitions for all of the shared epilogues. Even if some of the shared epilogues aren't used in the translation unit, or in the finally linked program, that's OK: the linker should eliminate those. The drawback is that object files will be slightly larger. The advantage is that this should work on all platforms.

All in all, with the limited knowledge I have of all the details involved, I prefer option 2 if possible.