Page MenuHomePhabricator

[PPC64] toc-indirect to toc-relative relaxation.
Needs ReviewPublic

Authored by sfertile on Nov 19 2018, 12:52 PM.

Details

Summary

When accessing a global variable which is not defined in the translation-unit being compiled the compilers on ppc64 will generate a toc-entry for the global and a got-indirect access using that toc-entry (as if it were a .got entry)

For example the following C code

extern int aGlobal;

int foo(void) {
  return aGlobal;
}

would get translated to:

        addis 3, 2, .LC0@toc@ha 
        ld 3, .LC0@toc@l(3)
        lwa 3, 0(3)

...

        .section        .toc,"aw",@progbits
.LC0:
        .tc aGlobal[TC],aGlobal

Where .LC0@toc is the offset from the TOC base-pointer to the label .LC0, and .tc aGlobal[TC],aGlobal creates an entry in the .toc section to store the address of aGlobal.
The first 2 instructions build the address of the toc-entry and load the address of aGlobal into r3, while the following lwa instruction loads the value of the global into r3.

If the global being accessed is a non-preemptable definition and the offset from the TOC pointer to the definition can be materialized with 2 instructions then we can relax the first 2 instructions to add that offset to the TOC base pointer rather then loading the address out of the .toc.

Diff Detail

Repository
rLLD LLVM Linker

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes

Zaara pointed out I am missing the full context, I'll re post the patch with the context fixed shortly.

sfertile updated this revision to Diff 174677.Nov 19 2018, 1:23 PM

Added full context

syzaara added inline comments.Nov 20 2018, 6:45 AM
ELF/Arch/PPC64.cpp
369

Instead of 58, we can use the enum DFormOpcd which has LD defined to 58.

371

Expeceted -> Expected

ELF/InputSection.cpp
879

Maybe use Target->GotEntrySize instead of 8.

886

entires -> entries

894

relxation -> relaxation

MaskRay added inline comments.Nov 21 2018, 1:28 PM
ELF/Arch/PPC64.cpp
372

Instr &= 0x03FFFFFF;

ELF/InputSection.cpp
889

Offset / 8 /* Toc Entry size */ -> Index

902

Maybe just return None;

MaskRay added inline comments.Nov 21 2018, 1:41 PM
ELF/InputSection.cpp
896

They are put here probably because of ARM/AArch64 precedents: getAArch64UndefinedRelativeWeakVA getARMStaticBase etc.

If TOC is PPC/PPC64 specific, the static function tryRelaxToc and many of its callees may be renamed to include the substring PPC.

ELF/Relocations.h
77

If the TOC (Table of Contents) section is PPC/PPC64 specific this should probably be named R_PPC_RELAX_TOC. R_AARCH64_RELAX_TLS_GD_TO_IE_PAGE_PC is a precedent.

sfertile updated this revision to Diff 175064.Nov 22 2018, 10:44 AM

Addressed initial review comments.

sfertile marked 9 inline comments as done.Nov 22 2018, 10:45 AM
sfertile added inline comments.
ELF/InputSection.cpp
902

Thanks, I wasn't aware of 'None'.

ruiu added inline comments.Nov 26 2018, 10:56 AM
ELF/Arch/PPC64.cpp
371

Error messages should start with a lowercase letter.

I think you should use error() instead of llvm_unreachable() because this condition can be triggered by a user. llvm_unreachable should be used only to report an internal error or a bug.

372–374

I think Instr = (Instr & 0x3FFFFFF) | 0x38000000; is a bit more straightforward.

Or,

writeInstrFromHalf16(Loc, (Instr & 0x3FFFFFF) | 0x38000000);

is perhaps better.

376

This is an unusual formatting style in lld. Please put break before }.

379

U -> u

ELF/InputSection.cpp
874

ElfTy -> ELFT

875

Can you return std::pair<Defined *, int64_t> instead of Optional<std::pair<Defined *, int64_t>>? It looks like you can simply return {nullptr, 0} to indicate a null value.

877

Replace auto with the actual type.

889

Ditto

930

Please do not use auto in lld unless its actual type is obvious from right-hand side.

grimar added a subscriber: grimar.Nov 27 2018, 1:34 AM
grimar added inline comments.
ELF/InputSection.cpp
907

Seems the intention was to name it 'relaxTocPPC64'?

918

dyn_cast -> cast as null return value is not expected here.

937

Defined* -> Defined

sfertile updated this revision to Diff 176625.Dec 4 2018, 7:13 AM
sfertile marked an inline comment as done.

Addressed further review comments.

sfertile marked 11 inline comments as done.Dec 4 2018, 7:17 AM
sfertile added inline comments.
ELF/InputSection.cpp
907

I named it relaxToPPC64 because the function is looking up the symbol we will potentially relax to. I failed to realize that its unfortunately close to relaxToc which would also make sense as a name :)

grimar added a comment.Dec 4 2018, 7:48 AM

I did not have a chance to debug it, but my comment is below.

ELF/InputSection.cpp
897

I think we usually reach here (relocateAlloc) when we already know will we do the relaxation or not.
I.e. for example on x86_64 for R_RELAX_GOT_PC we just call Target->relaxGot because check in the adjustRelaxExpr the conditions for relaxing. And here then we either have R_RELAX_GOT_PC which name contains RELAX and means we *will do* the relaxation or we would have another R_* expression if we are not going to relax.

So I think you should not have a R_PPC64_RELAX_TOC expression here if you are not going to relax. I think you should check the conditions somewhere earlier. Also, the platform-specific code can live into own file (like ELF\Arch\PPC64.cpp) then. I think you can put all the functions you added there.

sfertile marked an inline comment as done.Dec 5 2018, 8:44 AM

Related to @grimar comment, I am also playing around with doing some of the checking early and mapping toc based relocations that are not-relaxable to got based relocations during scanRelocs. Ideally I'd like to be able to add a separate step for PPC64 once all relocations have been scanned that will iterate over the .toc sections to handle large-code model where any symbols that can't be safely relaxed get added to the .got and all remaining toc based relocations get relaxed. This should hopefully allow me to get rid of the .toc sections in the final binaries, but I'm not familiar enough with large-code model, or gcc and xlc codegen to know if this is feasible yet.

ELF/InputSection.cpp
897

The got-indirect to toc-relative relaxation is unique in this regard because we can't decide if we can relax until the data segment has been fully built. During scanRelocs we could see if the indirection is to a defined symbol, but that is only enough to know that the relocation is potentially relax-able. We still need the offset from the TOC base pointer to the symbols definition to fit in a signed 32-bit offset (which isn't guaranteed in the large code model), and scanRelocs is too early to determine that. I could add the relax checking earlier than here but that would involve doing the same scan over the .toc sections relocations and replacing those that are indeed relaxable. I decided not to do that in the spirit of keeping it simple.

I like the idea of moving the platform specific code into PPC64.cpp, and will update the patch with that change, but I think here is the right place to perform the relaxation.

sfertile updated this revision to Diff 179574.Dec 27 2018, 1:17 PM

Moved the toc relaxation logic into the PPC64 target.

Few comments/suggestions.

ELF/Arch/PPC64.cpp
109

Warnings and error messages should start from the lowercase.

119

The same.

126

You can write shorter:

return {dyn_cast_or_null<Defined>(&IndirectSym), getAddend<ELFT>(Rela)};
136

Why not DefSym->Section->Name != ".toc" ?

137

return {};

140

return {};

154

So maybe you should never reach here if !Config->TocOptimize?
Seems you can do something like the following in getRelExpr:

case R_PPC64_TOC16_HA:
case R_PPC64_TOC16_LO_DS:
  return Config->TocOptimize ? R_PPC64_RELAX_TOC : R_XXXX;
156

RelaxTo.first/RelaxTo.second are a bit hard to follow and read. Lets do:

Defined *D;
int64_t Addend;
std::tie(D, Addend) = relaxTo(Rel);
ELF/InputSection.cpp
897

OK, thanks for the explanation.

ELF/Target.h
172

I think Returns false otherwise part is an obvious continuation and hence excessive.

test/ELF/ppc64-got-indirect.s
4–5

You can use

--check-prefixes=CHECK,CHECK-LE
test/ELF/ppc64-toc-relax-jumptable.s
37

Can you remove the excessive indentations please?

test/ELF/ppc64-toc-relax.s
63
# CHECK-LABEL: ...
sfertile updated this revision to Diff 179765.Dec 31 2018, 9:19 AM

Addressed review comments

sfertile marked 10 inline comments as done.Dec 31 2018, 9:23 AM
sfertile updated this revision to Diff 179766.Dec 31 2018, 9:26 AM

Forgot to update formatting in one of the tests.

sfertile marked 2 inline comments as done.Dec 31 2018, 9:27 AM
ruiu added inline comments.Jan 2 2019, 3:55 PM
ELF/Arch/PPC64.cpp
109

Isn't this an error?

119

Ditto

sfertile marked an inline comment as done.Jan 3 2019, 12:14 PM
sfertile added inline comments.
ELF/Arch/PPC64.cpp
109

Ideally it would be, however I found a gcc bug where it emits a toc entry without a corresponding relocation. This causes the look ups for any relocations after the skipped one to fail. I am still trying to cut the problem down to something smaller so I can understand it better (and open a bug report to get it fixed), but it seems that the rest of the codegen is valid (ie nothing tries to read or write to the toc entry missing the relocation). I don't think the problem is prevalent enough to warrant changing to a slower scan of the relocations in the case we do find a mismatch, and because the rest of the codegen is 'good' I don't think we should emit an error either.

sfertile marked an inline comment as done.Jan 7 2019, 7:12 AM
sfertile added inline comments.
ELF/Arch/PPC64.cpp
109

I've got some clarification from the ppc64 gcc/binutils developers and although the case I found was a bug , both gcc and xlc will emit values that don't need relocations into the .toc section in certian circumstances. I'll have to fix the lookup logic to deal with the fact I can't assume every .toc entry has a relocation.

sfertile marked 2 inline comments as not done.Jan 7 2019, 7:45 AM
sfertile updated this revision to Diff 182331.Jan 17 2019, 10:13 AM

Fixed look up to handle .toc sections that don't have a relocation for every entry and added a comprehensive test to verify we perform the correct optimization on all access types in this case.

MaskRay added inline comments.Jan 28 2019, 11:30 PM
ELF/Arch/PPC64.cpp
174

The title of this revision says "toc-indirect" while here and some other places say "got-indirect". Which term is preferred? (Or maybe they are two different things and I am confused..)

MaskRay added inline comments.Jan 28 2019, 11:32 PM
ELF/Arch/PPC64.cpp
115

if (Relas.empty())

ruiu added inline comments.Jan 29 2019, 4:00 PM
ELF/Arch/PPC64.cpp
110

It feels this comment can be improved. IIUC, this function scans relocations for a given section to find one whose offset is the same as a given offset. Is this correct? If so, can you add something like that as a comment?

114

I'd define an alias at the beginning of this function like this:

typedef typename ELFT::Rela RelTy;
120

When we define a local helper lambda function inside a function, we capture variables just by [&] instead of explicitly enumerating variables.

121

nit: use shorter variable name for a variable with just two lines of scope.

171

Can you add a high-level comment as to what optimization this function is supposed to do?

sfertile updated this revision to Diff 184516.Jan 31 2019, 9:05 AM

Addressed several review comments, and fixed bug where I was erroneously relaxing preepmtible symbols.

sfertile marked 6 inline comments as done.Jan 31 2019, 9:46 AM
sfertile added inline comments.
ELF/Arch/PPC64.cpp
120

I've updated to follow suit, but can I ask the rational behind this? For a lambda as small as this one its easy to see that only one variable is used and its not modified, but in general I would prefer to be explicit and not capture anymore then I used.

174

got-indirect is the term used in the ABI and so I have tried to stick mostly to that, especially for anything a user would see like a diagnostic. toc-indirect is a term I made up to describe the way gcc/clang/xlc actually produce a got-indirecton as its a bit different then what I think most would expect. The 2 ways have slightly different input, but the output from the linker is the same.

got-indirect -> its up to the linker to allocate a got entry for sym. The got entry is accessed relative to the toc-pointer.

 addis r3, r2, sym@got@ha   
 ld        r3, sym@got@l(r3) 
lwa       r3, r3

vs toc-indirect --> the compiler allocates an entry in the .toc section and puts a relocation for the symbols address into that .toc entry. The toc entry is accessed relative to the toc-pointer.

    addis r3, r2, .LC0@toc@ha   
    ld        r3, .LC0@toc@l(r3) 
   lwa       r3, r3
...

   .toc
.LC0:
   .tc sym[TC], sym # emits a relocation for sym. Roughly equivalent to .quad sym?

All this extra work in looking up the relocation to find the symbol thats being accessed is because the compiler emits the second form rather then the first. In the final binary the .toc sections are either merged into the .got section (bfd/gold) or allocated right after the .got section (lld) and you end up with a got-indirect access.

ruiu added a comment.Jan 31 2019, 10:40 AM

Generally looking good, but I have one question.

ELF/Arch/PPC64.cpp
137

I think I don't understand why you can't relax if the number of relocations for the .toc is smaller than the number of entires in .toc. I thought that if you can find a relocation that has a specific offset, that's the relocation that you are looking for, and you can read a symbol and an offset from that relocation. I think there's something more about PPC64 ABI that I don't know of. Could you explain?

sfertile marked an inline comment as done.Jan 31 2019, 1:31 PM
sfertile added inline comments.
ELF/Arch/PPC64.cpp
137

I think the missing piece here is that I am assuming each toc entry has at most 1 relocation. The relocations are R_PPC64_ADDR64 and relocates one full entry each, or we have a constant/value with no relocation. That way a look up should always find an r_offset greater then or equal to Offset on the first look-up, except for when we truncated the index because the one we calculated was outside the array bounds. If we truncated the index and find a r_offset < Offset we can return because the array is sorted by offset.