Page MenuHomePhabricator

[ELF][PPC64] Synthesize _savegpr[01]_{14..31} and _restgpr[01]_{14..31}

Authored by MaskRay on May 14 2020, 5:53 PM.



In the 64-bit ELF V2 API Specification: Power Architecture, GPR
Save and Restore Functions defines some special functions which may be
referenced by GCC produced assembly (LLVM does not reference them).

With GCC -Os, when the number of call-saved registers exceeds a certain
threshold, GCC generates _savegpr0_* _restgpr0_* calls and expects the
linker to define them. See and . This
is weird because libgcc.a would be the natural place. However, the linker
generation approach has the advantage that the linker can generate
multiple copies to avoid long branch thunks. We don't consider the
advantage significant enough to complicate our trunk implementation, so
we take a simple approach.

  • Check whether _savegpr[01]_{14..31} and _restgpr[01]_{14..31} are used
  • If yes, assemble the predefined .s and add the object file as LazyObjFile.

Diff Detail

Event Timeline

MaskRay created this revision.May 14 2020, 5:53 PM
MaskRay marked 2 inline comments as done.May 14 2020, 6:04 PM
MaskRay added inline comments.
61 ↗(On Diff #264134)

I should add .hidden as well.


I should change one test to check -shared

MaskRay updated this revision to Diff 264138.May 14 2020, 6:56 PM
MaskRay edited the summary of this revision. (Show Details)


MaskRay updated this revision to Diff 264459.May 16 2020, 4:37 PM

Better approach.
Synthesize InputSections

Bdragon28 added inline comments.May 20 2020, 12:35 PM

The example in the ABI doc was illustrating that you can split it into multiple chunks if you make sure both sides finish off saving the rest of the registers. It's actually cleaner to just have it continue dropping through until the end.

I believe the reason for the ABI doc having the example like it is is probably something to do with r30,r31 save/restore being much more common than full save/restore so binutils ld has a small and a large version.

Since we're not bothering with sticking multiple copies in anyway, it doesn't matter and we can go with the simpler implementation here of not treating r29 differently.


These are at fixed offsets to catch counting errors in the code above, correct?

MaskRay updated this revision to Diff 265335.May 20 2020, 1:42 PM
MaskRay marked 4 inline comments as done.

Simplify _restgpr0_*


Thanks for the comment. Let me simplify _restgpr0_*

This is probably some microarchitectural optimization we don't care much.



Thanks for implementing this MaskRay. I've left a couple comments, but still need to go through the testing.


Since we are always writing the entire routine into the buffer anyway why not just hard code the array contents so we don't have to compute it at runtime?


real minor nit: I think the code would be a bit clearer if this check cake outside the lamda: eg

if (!defined.empty())

Does it clutter the code too much to keep track of the correct value in the loop that calls `addOptional' and not have to adjust this later?

MaskRay marked 5 inline comments as done.May 21 2020, 9:27 AM
MaskRay added inline comments.

We need to define symbols _savegpr* and _restgpr*, so a loop is needed. There is little saving if we move instructions alone to a constant array.


Mostly there are 4 callers. Placing it here saves some code.


Yes, the alternative approach will clutter the code. It will require 2 loops, with one finding the start point and the second placing instructions and definitions. This post adjustment approach is the simplest I can find.

MaskRay marked 2 inline comments as done.May 21 2020, 9:35 AM
sfertile added inline comments.May 21 2020, 12:31 PM

The savings mostly come from making the 4 loops we have now (and the 2 that need to be added later for the fpr save/restore routines) nearly identical. We can then write a helper function or lambda and call that 6 time instead of having 6 loops once we add support for _savefpr* and _restfpr* as well. With a minor tweak I believe the helper would also work for the vector save restore routines as well.

Something like:

Helper(uint32_t *Buffer, uint32_t BufferSize, StringRef Prefix) {
  int r = findFirstOf(prefix);  // returns the register number corresponding to the first symbol we need to define.
  // Any referenced symbols are defined, nothing to do.
  if (r == 32)

  int Value = 0;
  const int Offset = 4 * (r - 14);
  for (r < 32; ++r) {
   Name = FormatName(prefix, r)
    addOptional(Name, Value, defined);
    Value += 4;
  // No need to check if defined is empty becuase we would have returned already if no symbols need to be defiend.
  addSection(Buffer + Offset, Buffer + Size, defined);

Its longer and more verbose then one of the for loops, but I think its much easier to follow the logic, and then it only takes 6 lines like:

Helper(savegpr0, ArraySize(savegpr0), "__savegpr0_");

Its more lines of code total, since defining the arrays adds more lines then we save with the replacing the for loops with the helper but I think its much more approachable for a reader.


I think this approach is simpler in terms of less lines of code, but not necessarily simpler to read or understand.

MaskRay updated this revision to Diff 265613.May 21 2020, 2:47 PM
MaskRay marked an inline comment as done.



Thanks for the suggestion. I have found a way to simplify the code.

MaskRay marked 3 inline comments as done.May 21 2020, 2:52 PM
MaskRay updated this revision to Diff 265618.May 21 2020, 3:05 PM

Simplify with MutableArrayRef<uint32_t>

sfertile accepted this revision.May 26 2020, 6:52 AM

One comment but otherwise LGTM.


nit: from is always 14 so we don't need it as an argument, we can just always start the loop at 14.

This revision is now accepted and ready to land.May 26 2020, 6:52 AM
MaskRay marked 2 inline comments as done.May 26 2020, 9:06 AM

Thanks for the review! Will commit soon...


This is reserved for _savevr_20 and _restvr_20 which start from 20 instead of 14..

This revision was automatically updated to reflect the committed changes.
MaskRay marked an inline comment as done.