This is an archive of the discontinued LLVM Phabricator instance.

[PPC64] toc-indirect to toc-relative relaxation.
AbandonedPublic

Authored by sfertile on Nov 19 2018, 12:52 PM.

Details

Summary

When accessing a global variable which is not defined in the translation-unit being compiled the compilers on ppc64 will generate a toc-entry for the global and a got-indirect access using that toc-entry (as if it were a .got entry)

For example the following C code

extern int aGlobal;

int foo(void) {
  return aGlobal;
}

would get translated to:

        addis 3, 2, .LC0@toc@ha 
        ld 3, .LC0@toc@l(3)
        lwa 3, 0(3)

...

        .section        .toc,"aw",@progbits
.LC0:
        .tc aGlobal[TC],aGlobal

Where .LC0@toc is the offset from the TOC base-pointer to the label .LC0, and .tc aGlobal[TC],aGlobal creates an entry in the .toc section to store the address of aGlobal.
The first 2 instructions build the address of the toc-entry and load the address of aGlobal into r3, while the following lwa instruction loads the value of the global into r3.

If the global being accessed is a non-preemptable definition and the offset from the TOC pointer to the definition can be materialized with 2 instructions then we can relax the first 2 instructions to add that offset to the TOC base pointer rather then loading the address out of the .toc.

Diff Detail

Repository
rLLD LLVM Linker

Event Timeline

sfertile created this revision.Nov 19 2018, 12:52 PM

Zaara pointed out I am missing the full context, I'll re post the patch with the context fixed shortly.

sfertile updated this revision to Diff 174677.Nov 19 2018, 1:23 PM

Added full context

syzaara added inline comments.Nov 20 2018, 6:45 AM
ELF/Arch/PPC64.cpp
264

Instead of 58, we can use the enum DFormOpcd which has LD defined to 58.

266

Expeceted -> Expected

ELF/InputSection.cpp
872

Maybe use Target->GotEntrySize instead of 8.

879

entires -> entries

966

relxation -> relaxation

MaskRay added inline comments.Nov 21 2018, 1:28 PM
ELF/Arch/PPC64.cpp
267

Instr &= 0x03FFFFFF;

ELF/InputSection.cpp
882

Offset / 8 /* Toc Entry size */ -> Index

895

Maybe just return None;

MaskRay added inline comments.Nov 21 2018, 1:41 PM
ELF/InputSection.cpp
968

They are put here probably because of ARM/AArch64 precedents: getAArch64UndefinedRelativeWeakVA getARMStaticBase etc.

If TOC is PPC/PPC64 specific, the static function tryRelaxToc and many of its callees may be renamed to include the substring PPC.

ELF/Relocations.h
70

If the TOC (Table of Contents) section is PPC/PPC64 specific this should probably be named R_PPC_RELAX_TOC. R_AARCH64_RELAX_TLS_GD_TO_IE_PAGE_PC is a precedent.

sfertile updated this revision to Diff 175064.Nov 22 2018, 10:44 AM

Addressed initial review comments.

sfertile marked 9 inline comments as done.Nov 22 2018, 10:45 AM
sfertile added inline comments.
ELF/InputSection.cpp
895

Thanks, I wasn't aware of 'None'.

ruiu added inline comments.Nov 26 2018, 10:56 AM
ELF/Arch/PPC64.cpp
266

Error messages should start with a lowercase letter.

I think you should use error() instead of llvm_unreachable() because this condition can be triggered by a user. llvm_unreachable should be used only to report an internal error or a bug.

267–269

I think Instr = (Instr & 0x3FFFFFF) | 0x38000000; is a bit more straightforward.

Or,

writeInstrFromHalf16(Loc, (Instr & 0x3FFFFFF) | 0x38000000);

is perhaps better.

271

This is an unusual formatting style in lld. Please put break before }.

274

U -> u

ELF/InputSection.cpp
867

ElfTy -> ELFT

868

Can you return std::pair<Defined *, int64_t> instead of Optional<std::pair<Defined *, int64_t>>? It looks like you can simply return {nullptr, 0} to indicate a null value.

870

Replace auto with the actual type.

882

Ditto

923

Please do not use auto in lld unless its actual type is obvious from right-hand side.

grimar added a subscriber: grimar.Nov 27 2018, 1:34 AM
grimar added inline comments.
ELF/InputSection.cpp
900

Seems the intention was to name it 'relaxTocPPC64'?

911

dyn_cast -> cast as null return value is not expected here.

930

Defined* -> Defined

sfertile updated this revision to Diff 176625.Dec 4 2018, 7:13 AM
sfertile marked an inline comment as done.

Addressed further review comments.

sfertile marked 11 inline comments as done.Dec 4 2018, 7:17 AM
sfertile added inline comments.
ELF/InputSection.cpp
900

I named it relaxToPPC64 because the function is looking up the symbol we will potentially relax to. I failed to realize that its unfortunately close to relaxToc which would also make sense as a name :)

grimar added a comment.Dec 4 2018, 7:48 AM

I did not have a chance to debug it, but my comment is below.

ELF/InputSection.cpp
969

I think we usually reach here (relocateAlloc) when we already know will we do the relaxation or not.
I.e. for example on x86_64 for R_RELAX_GOT_PC we just call Target->relaxGot because check in the adjustRelaxExpr the conditions for relaxing. And here then we either have R_RELAX_GOT_PC which name contains RELAX and means we *will do* the relaxation or we would have another R_* expression if we are not going to relax.

So I think you should not have a R_PPC64_RELAX_TOC expression here if you are not going to relax. I think you should check the conditions somewhere earlier. Also, the platform-specific code can live into own file (like ELF\Arch\PPC64.cpp) then. I think you can put all the functions you added there.

sfertile marked an inline comment as done.Dec 5 2018, 8:44 AM

Related to @grimar comment, I am also playing around with doing some of the checking early and mapping toc based relocations that are not-relaxable to got based relocations during scanRelocs. Ideally I'd like to be able to add a separate step for PPC64 once all relocations have been scanned that will iterate over the .toc sections to handle large-code model where any symbols that can't be safely relaxed get added to the .got and all remaining toc based relocations get relaxed. This should hopefully allow me to get rid of the .toc sections in the final binaries, but I'm not familiar enough with large-code model, or gcc and xlc codegen to know if this is feasible yet.

ELF/InputSection.cpp
969

The got-indirect to toc-relative relaxation is unique in this regard because we can't decide if we can relax until the data segment has been fully built. During scanRelocs we could see if the indirection is to a defined symbol, but that is only enough to know that the relocation is potentially relax-able. We still need the offset from the TOC base pointer to the symbols definition to fit in a signed 32-bit offset (which isn't guaranteed in the large code model), and scanRelocs is too early to determine that. I could add the relax checking earlier than here but that would involve doing the same scan over the .toc sections relocations and replacing those that are indeed relaxable. I decided not to do that in the spirit of keeping it simple.

I like the idea of moving the platform specific code into PPC64.cpp, and will update the patch with that change, but I think here is the right place to perform the relaxation.

sfertile updated this revision to Diff 179574.Dec 27 2018, 1:17 PM

Moved the toc relaxation logic into the PPC64 target.

Few comments/suggestions.

ELF/Arch/PPC64.cpp
110

Warnings and error messages should start from the lowercase.

120

The same.

127

You can write shorter:

return {dyn_cast_or_null<Defined>(&IndirectSym), getAddend<ELFT>(Rela)};
137

Why not DefSym->Section->Name != ".toc" ?

138

return {};

141

return {};

155

So maybe you should never reach here if !Config->TocOptimize?
Seems you can do something like the following in getRelExpr:

case R_PPC64_TOC16_HA:
case R_PPC64_TOC16_LO_DS:
  return Config->TocOptimize ? R_PPC64_RELAX_TOC : R_XXXX;
157

RelaxTo.first/RelaxTo.second are a bit hard to follow and read. Lets do:

Defined *D;
int64_t Addend;
std::tie(D, Addend) = relaxTo(Rel);
ELF/InputSection.cpp
969

OK, thanks for the explanation.

ELF/Target.h
177

I think Returns false otherwise part is an obvious continuation and hence excessive.

test/ELF/ppc64-got-indirect.s
8

You can use

--check-prefixes=CHECK,CHECK-LE
test/ELF/ppc64-toc-relax-jumptable.s
37

Can you remove the excessive indentations please?

test/ELF/ppc64-toc-relax.s
63
# CHECK-LABEL: ...
sfertile updated this revision to Diff 179765.Dec 31 2018, 9:19 AM

Addressed review comments

sfertile marked 10 inline comments as done.Dec 31 2018, 9:23 AM
sfertile updated this revision to Diff 179766.Dec 31 2018, 9:26 AM

Forgot to update formatting in one of the tests.

sfertile marked 2 inline comments as done.Dec 31 2018, 9:27 AM
ruiu added inline comments.Jan 2 2019, 3:55 PM
ELF/Arch/PPC64.cpp
110

Isn't this an error?

120

Ditto

sfertile marked an inline comment as done.Jan 3 2019, 12:14 PM
sfertile added inline comments.
ELF/Arch/PPC64.cpp
110

Ideally it would be, however I found a gcc bug where it emits a toc entry without a corresponding relocation. This causes the look ups for any relocations after the skipped one to fail. I am still trying to cut the problem down to something smaller so I can understand it better (and open a bug report to get it fixed), but it seems that the rest of the codegen is valid (ie nothing tries to read or write to the toc entry missing the relocation). I don't think the problem is prevalent enough to warrant changing to a slower scan of the relocations in the case we do find a mismatch, and because the rest of the codegen is 'good' I don't think we should emit an error either.

sfertile marked an inline comment as done.Jan 7 2019, 7:12 AM
sfertile added inline comments.
ELF/Arch/PPC64.cpp
110

I've got some clarification from the ppc64 gcc/binutils developers and although the case I found was a bug , both gcc and xlc will emit values that don't need relocations into the .toc section in certian circumstances. I'll have to fix the lookup logic to deal with the fact I can't assume every .toc entry has a relocation.

sfertile marked 2 inline comments as not done.Jan 7 2019, 7:45 AM
sfertile updated this revision to Diff 182331.Jan 17 2019, 10:13 AM

Fixed look up to handle .toc sections that don't have a relocation for every entry and added a comprehensive test to verify we perform the correct optimization on all access types in this case.

MaskRay added inline comments.Jan 28 2019, 11:30 PM
ELF/Arch/PPC64.cpp
166

The title of this revision says "toc-indirect" while here and some other places say "got-indirect". Which term is preferred? (Or maybe they are two different things and I am confused..)

MaskRay added inline comments.Jan 28 2019, 11:32 PM
ELF/Arch/PPC64.cpp
107

if (Relas.empty())

ruiu added inline comments.Jan 29 2019, 4:00 PM
ELF/Arch/PPC64.cpp
102

It feels this comment can be improved. IIUC, this function scans relocations for a given section to find one whose offset is the same as a given offset. Is this correct? If so, can you add something like that as a comment?

106

I'd define an alias at the beginning of this function like this:

typedef typename ELFT::Rela RelTy;
112

When we define a local helper lambda function inside a function, we capture variables just by [&] instead of explicitly enumerating variables.

113

nit: use shorter variable name for a variable with just two lines of scope.

163

Can you add a high-level comment as to what optimization this function is supposed to do?

sfertile updated this revision to Diff 184516.Jan 31 2019, 9:05 AM

Addressed several review comments, and fixed bug where I was erroneously relaxing preepmtible symbols.

sfertile marked 6 inline comments as done.Jan 31 2019, 9:46 AM
sfertile added inline comments.
ELF/Arch/PPC64.cpp
112

I've updated to follow suit, but can I ask the rational behind this? For a lambda as small as this one its easy to see that only one variable is used and its not modified, but in general I would prefer to be explicit and not capture anymore then I used.

166

got-indirect is the term used in the ABI and so I have tried to stick mostly to that, especially for anything a user would see like a diagnostic. toc-indirect is a term I made up to describe the way gcc/clang/xlc actually produce a got-indirecton as its a bit different then what I think most would expect. The 2 ways have slightly different input, but the output from the linker is the same.

got-indirect -> its up to the linker to allocate a got entry for sym. The got entry is accessed relative to the toc-pointer.

 addis r3, r2, sym@got@ha   
 ld        r3, sym@got@l(r3) 
lwa       r3, r3

vs toc-indirect --> the compiler allocates an entry in the .toc section and puts a relocation for the symbols address into that .toc entry. The toc entry is accessed relative to the toc-pointer.

    addis r3, r2, .LC0@toc@ha   
    ld        r3, .LC0@toc@l(r3) 
   lwa       r3, r3
...

   .toc
.LC0:
   .tc sym[TC], sym # emits a relocation for sym. Roughly equivalent to .quad sym?

All this extra work in looking up the relocation to find the symbol thats being accessed is because the compiler emits the second form rather then the first. In the final binary the .toc sections are either merged into the .got section (bfd/gold) or allocated right after the .got section (lld) and you end up with a got-indirect access.

ruiu added a comment.Jan 31 2019, 10:40 AM

Generally looking good, but I have one question.

ELF/Arch/PPC64.cpp
129

I think I don't understand why you can't relax if the number of relocations for the .toc is smaller than the number of entires in .toc. I thought that if you can find a relocation that has a specific offset, that's the relocation that you are looking for, and you can read a symbol and an offset from that relocation. I think there's something more about PPC64 ABI that I don't know of. Could you explain?

sfertile marked an inline comment as done.Jan 31 2019, 1:31 PM
sfertile added inline comments.
ELF/Arch/PPC64.cpp
129

I think the missing piece here is that I am assuming each toc entry has at most 1 relocation. The relocations are R_PPC64_ADDR64 and relocates one full entry each, or we have a constant/value with no relocation. That way a look up should always find an r_offset greater then or equal to Offset on the first look-up, except for when we truncated the index because the one we calculated was outside the array bounds. If we truncated the index and find a r_offset < Offset we can return because the array is sorted by offset.

The patch needs a rebase.

ELF/Arch/PPC64.cpp
115

Sym as a reference can't be null, so you can use dyn_cast instead of dyn_cast_or_null.

118

The relocations are sorted by offset

IIRC this is not guaranteed but I think it is fine to lose toc optimization if the input does not have this property.

The linear search below for (auto R : llvm::reverse(Relas.slice(0, Index))) { concerns me.

Is it possible to add a LastRelaIndex variable in InputSectionBase::relocateAlloc:

if (!tryRelaxTocPPC64(Type, Rel, Expr, BufLoc, LastRelaIndex))

and keep it monotonically increasing in getSymAndAddend? If the input doesn't guarantee increasing r_offset, it may lose TOC optimizations for some relocations, but as I said it should be fine.

Herald added a project: Restricted Project. · View Herald TranscriptMar 26 2019, 11:08 PM
Herald added a subscriber: jdoerfert. · View Herald Transcript
sfertile marked an inline comment as done.Mar 27 2019, 8:41 AM
sfertile added inline comments.
ELF/Arch/PPC64.cpp
118

The relocations are sorted by offset

IIRC this is not guaranteed

I believe its guaranteed for binutils tools. We also sort the relocations for the .toc section in scanRelocs in case there are other tools which don't produce the relocations in sorted order.

The linear search below for (auto R : llvm::reverse(Relas.slice(0, Index))) { concerns me

I believe missing toc relocations to be an uncommon thing. As far as I am aware llvm will not place constants into the .toc section. gcc will for a few very specific cases but it is still not very common. xlc is the outlier here as I heard it will do this some what commonly despite there being no need to with the V2 abi.

For the objects I found that were missing relocations when compiled with gcc , we either found the relocation we were looking for, or broke out of the search in a few comparisons. If a linear search does become a problem for some pathological inputs we could switch to a binary search but I think doing that initially we would end up pessimaling the common case for something that might not ever naturally occur.

MaskRay added inline comments.Mar 28 2019, 4:17 AM
ELF/Arch/PPC64.cpp
118

I was not suggesting a binary search.

I suggested to change:

for (const Relocation &Rel : Relocations) { // the loop is in relocateAlloc
  // Offset is increasing
  for (Index = computed(Rel); J >= 0; J--) // the loop is in PPC64.cpp getSymAndAddend
    if (Offset == Relas[Index].r_offset) {
      /* found */
      break;
    }
}

to

Index = 0
for (const Relocation &Rel : Relocations) { // the loop is in relocateAlloc
  // Offset is increasing
  for (; Index < Relas.size(); ++Index)
    // Index is increasing
    if (Offset == Relas[Index].r_offset) {
      /* found */
      break;
    }
}
MaskRay added inline comments.Mar 28 2019, 4:21 AM
ELF/Arch/PPC64.cpp
118

Sorry, see below:

for (const Relocation &Rel : Relocations) { // the loop is in relocateAlloc
  // Offset is increasing
  for (Index = computed(Rel); Index >= 0; Index--) // the loop is in PPC64.cpp getSymAndAddend
    if (Offset == Relas[Index].r_offset) {
      /* found */
      break;
    }
}

to

Index = 0
for (const Relocation &Rel : Relocations) { // the loop is in relocateAlloc
  // Offset is increasing
  for (; Index < Relas.size(); ++Index)
    // Index is increasing
    if (Offset == Relas[Index].r_offset) {
      /* found */
      break;
    } else if (Offset < Relas[Index].r_offset)
      break;
}
sfertile added inline comments.Mar 28 2019, 9:05 AM
ELF/Arch/PPC64.cpp
118

I think I may have misunderstood which relocations you were referring too with the first comment about sort order. Only the relocations that operate on the .toc section are sorted. (Otherwise I'm misunderstanding this proposed change, perhaps we could talk on IRC to clairfy?)

Here is my understanding of the proposed look-up:

Index = 0
for (const Relocation &Rel : Relocations) { // the loop is in relocateAlloc
  // Offset is not (guranteed) increasing, Addend is not (typically) increasing.
  Rel.Sym    // This must be .toc if its an indirect.
  Rel.Addend // The addends will not be in any particular order. 
                         // When looking up the symbol we indirect too, we want to find the
                        //  relocation that relocates the toc section at offset `Rel.Addend`.
  Rel.Offset // The offset in the section being relocated. Unrelated to the lookup performed in the .toc section.

  // Inside the PPC64 target.
  // Loop over the relocations for the .toc section. These **are** sorted based on offset.

  for (; Index < Relas.size(); ++Index)
    // Index is increasing
    if (Offset == Relas[Index].r_offset) {
      /* found */
      break;
    } else if (Offset < Relas[Index].r_offset)
      break;
}
  1. Any indirect that refers to an offset smaller then the first missing relocation would still be optimized as its array mapping is still correct.
  2. Any offset strictly between the first missing relocation and the largest toc offset seen so far will not be optimized as Relas[Index].r_offset > Offset
  3. Any toc offset greater then or equal too the largest toc offset seen so far will be optimized (as long as it has a relocation). If the new offset is greater then the previous maximum offset, it increases the size of the window described in #2.

@sfertile Didn't catch you these days on IRC...

I've created D60958 for my idea.

@sfertile Didn't catch you these days on IRC...

I've created D60958 for my idea.

Sorry about that. I had a workshop the week following the last time we talked and I've been focused on getting parts of our AIX work moving since then. I'm going over D60958 right now. Thanks for doing this.

sfertile abandoned this revision.May 2 2019, 9:12 AM

Abandoning in favour of D60958.