Page MenuHomePhabricator

[ELF][PPC64] Synthesize _savegpr[01]_{14..31} and _restgpr[01]_{14..31}
ClosedPublic

Authored by MaskRay on Thu, May 14, 5:53 PM.

Details

Summary

In the 64-bit ELF V2 API Specification: Power Architecture, 2.3.3.1. GPR
Save and Restore Functions defines some special functions which may be
referenced by GCC produced assembly (LLVM does not reference them).

With GCC -Os, when the number of call-saved registers exceeds a certain
threshold, GCC generates _savegpr0_* _restgpr0_* calls and expects the
linker to define them. See
https://sourceware.org/pipermail/binutils/2002-February/017444.html and
https://sourceware.org/pipermail/binutils/2004-August/036765.html . This
is weird because libgcc.a would be the natural place. However, the linker
generation approach has the advantage that the linker can generate
multiple copies to avoid long branch thunks. We don't consider the
advantage significant enough to complicate our trunk implementation, so
we take a simple approach.

  • Check whether _savegpr[01]_{14..31} and _restgpr[01]_{14..31} are used
  • If yes, assemble the predefined .s and add the object file as LazyObjFile.

Diff Detail

Event Timeline

MaskRay created this revision.Thu, May 14, 5:53 PM
MaskRay marked 2 inline comments as done.Thu, May 14, 6:04 PM
MaskRay added inline comments.
lld/ELF/Arch/PPC64SaveRest.cpp
61 ↗(On Diff #264134)

I should add .hidden as well.

lld/test/ELF/ppc64-savegpr1.s
15

I should change one test to check -shared

MaskRay updated this revision to Diff 264138.Thu, May 14, 6:56 PM
MaskRay edited the summary of this revision. (Show Details)

adjust

MaskRay updated this revision to Diff 264459.Sat, May 16, 4:37 PM

Better approach.
Synthesize InputSections

Bdragon28 added inline comments.Wed, May 20, 12:35 PM
lld/ELF/Arch/PPC64.cpp
165

The example in the ABI doc was illustrating that you can split it into multiple chunks if you make sure both sides finish off saving the rest of the registers. It's actually cleaner to just have it continue dropping through until the end.

I believe the reason for the ABI doc having the example like it is is probably something to do with r30,r31 save/restore being much more common than full save/restore so binutils ld has a small and a large version.

Since we're not bothering with sticking multiple copies in anyway, it doesn't matter and we can go with the simpler implementation here of not treating r29 differently.

179

These are at fixed offsets to catch counting errors in the code above, correct?

MaskRay updated this revision to Diff 265335.Wed, May 20, 1:42 PM
MaskRay marked 4 inline comments as done.

Simplify _restgpr0_*

lld/ELF/Arch/PPC64.cpp
165

Thanks for the comment. Let me simplify _restgpr0_*

This is probably some microarchitectural optimization we don't care much.

179

Yes.

Thanks for implementing this MaskRay. I've left a couple comments, but still need to go through the testing.

lld/ELF/Arch/PPC64.cpp
133

Since we are always writing the entire routine into the buffer anyway why not just hard code the array contents so we don't have to compute it at runtime?

143

real minor nit: I think the code would be a bit clearer if this check cake outside the lamda: eg

if (!defined.empty())
  addSection(...)
152

Does it clutter the code too much to keep track of the correct value in the loop that calls `addOptional' and not have to adjust this later?

MaskRay marked 5 inline comments as done.Thu, May 21, 9:27 AM
MaskRay added inline comments.
lld/ELF/Arch/PPC64.cpp
133

We need to define symbols _savegpr* and _restgpr*, so a loop is needed. There is little saving if we move instructions alone to a constant array.

143

Mostly there are 4 callers. Placing it here saves some code.

152

Yes, the alternative approach will clutter the code. It will require 2 loops, with one finding the start point and the second placing instructions and definitions. This post adjustment approach is the simplest I can find.

MaskRay marked 2 inline comments as done.Thu, May 21, 9:35 AM
sfertile added inline comments.Thu, May 21, 12:31 PM
lld/ELF/Arch/PPC64.cpp
133

The savings mostly come from making the 4 loops we have now (and the 2 that need to be added later for the fpr save/restore routines) nearly identical. We can then write a helper function or lambda and call that 6 time instead of having 6 loops once we add support for _savefpr* and _restfpr* as well. With a minor tweak I believe the helper would also work for the vector save restore routines as well.

Something like:

Helper(uint32_t *Buffer, uint32_t BufferSize, StringRef Prefix) {
  int r = findFirstOf(prefix);  // returns the register number corresponding to the first symbol we need to define.
  // Any referenced symbols are defined, nothing to do.
  if (r == 32)
  return;

  int Value = 0;
  const int Offset = 4 * (r - 14);
  for (r < 32; ++r) {
   Name = FormatName(prefix, r)
    addOptional(Name, Value, defined);
    Value += 4;
  }
  // No need to check if defined is empty becuase we would have returned already if no symbols need to be defiend.
  addSection(Buffer + Offset, Buffer + Size, defined);
}

Its longer and more verbose then one of the for loops, but I think its much easier to follow the logic, and then it only takes 6 lines like:

Helper(savegpr0, ArraySize(savegpr0), "__savegpr0_");

Its more lines of code total, since defining the arrays adds more lines then we save with the replacing the for loops with the helper but I think its much more approachable for a reader.

152

I think this approach is simpler in terms of less lines of code, but not necessarily simpler to read or understand.

MaskRay updated this revision to Diff 265613.Thu, May 21, 2:47 PM
MaskRay marked an inline comment as done.

Simplify

lld/ELF/Arch/PPC64.cpp
133

Thanks for the suggestion. I have found a way to simplify the code.

MaskRay marked 3 inline comments as done.Thu, May 21, 2:52 PM
MaskRay updated this revision to Diff 265618.Thu, May 21, 3:05 PM

Simplify with MutableArrayRef<uint32_t>

sfertile accepted this revision.Tue, May 26, 6:52 AM

One comment but otherwise LGTM.

lld/ELF/Arch/PPC64.cpp
125

nit: from is always 14 so we don't need it as an argument, we can just always start the loop at 14.

This revision is now accepted and ready to land.Tue, May 26, 6:52 AM
MaskRay marked 2 inline comments as done.Tue, May 26, 9:06 AM

Thanks for the review! Will commit soon...

lld/ELF/Arch/PPC64.cpp
125

This is reserved for _savevr_20 and _restvr_20 which start from 20 instead of 14..

This revision was automatically updated to reflect the committed changes.
MaskRay marked an inline comment as done.