- Support R_SPARC_GOT13 relocation, completing the support for 32bit PIC model.
- Support the R_SPARC_GOTDATA_OP_ family of relocations and its optimizations.
- Reserve a GOT entry as required by the ABI.
- Emit RELA entries as required by the ABI.
- Avoid the use of .got.plt.
- Apply the R_SPARC_JMP_SLOT relocation on PLT entries.
Details
Diff Detail
Event Timeline
Commits should be something like:
- Fix lld to use rela
- Finish GOT support
- Add IE TLS
- Add LD TLS
- Add GD TLS
I dislike that we are fixing these stuff increasingly with so many patches.
See the history of Arch/PPC.cpp - most stuff are implemented with very few commits.
I'm getting a bit tired of splitting and merging patches, can we please decide on what the best course of action is and end this suffering?
Hm, MaskRay makes a good point. I guess my real complaint when I stop and think about it is not combining multiple things into one commit, but having a commit that says "do X" when it's really "do X and Y". One commit that's "fix everything", or two commits "fix everything except TLS" and "finish TLS", seems fine to me, so long as the descriptions are correct and don't sneak in changes that are unrelated to the commit _message_.
I think the problem is the existing port is too broken. Aside from that, some relocation types are clearly not tested. Some GOT/PLT relocations are likely not properly tested.
The bar for the initial check-in was too low I'd say.
I think for a mature port, fixing things incrementally make for good commit history. But I cannot feel similarly for sparcv9 which apparently misses many things.
Personally I like the test organization of ppc32-* the most. Not too many tests (x86_64 tends to have the problem that there are too many files) but sufficient coverage is provided.
By looking at the history, you can see that I ensured GOT/PLT and every related things are correct in one commit and we never need to fix the implemented stuff.
I think it is useful to set up a minimal goal. When you fix/implement GOT/PLT, make sure some basic programs with GOT/PLT are really working.
If we do "fix GOT a" "fot GOT b" "fix PLT c", it is very likely that a subsequent commit may update an earlier commit. This is less elegant.
So what's the state of this port? When I implemented ppc32 support and added riscv support, I made sure glibc/musl ppc32/riscv dynamically linking programs work.
Some (candidate) milestones: (1) support -fno-pic -no-pie -nostdlib link (2) support dynamic linking (many stuff are here: RELATIVE/COPY/GOT/PLT/...) (3) support tls (4) support extra things that glibc static links require
The port is good enough to be able to successfully link the whole Zig compiler and its test suite.
This set of patches is part of my effort to let it build and run a whole Zig program that's dynamically linked against glibc, I'm progressively fixing lld as needed.
My current roadmap is:
- Implement the remaining TLS models {IE, GD, LD}
- Implement the GOTDATA relocations.
Some fixup commits to fix what was broken/missing are perhaps inelegant but necessary.
Add back the RELA bit, no explicit test is needed as the existing ones are already covering that bit.
Sort of. I managed to link and run (under Qemu) a dynamically-linked PIE executable after applying a few more touch-ups, the idea is to turn this patchset into one targeting GOT/PLT support, as you suggested.
No updated patch was uploaded because I've hit a problem with (I think) how LLD handles GOT entries and how SPARC GOTDATA relocations are supposed to work. At the moment the plan is to avoid implementing the GOTDATA optimizations and treat them as regular GOT-relative relocations instead.
Below you can find the set of relocations emitted by GCC for a small hello-world program, the six GOTDATA triplets reference six different objects in the .rodata sections ... by means of a STT_SECTION reference with non-zero addend. Since all the six relocations point to the same symbol LLD generates a single GOT entry (every call to isInGot returns true) and that generates binaries that are crashing at runtime.
As you can see the code sequence for this kind of relocation expects a single GOT entry for each relocation w/ different addends:
sethi %gdop_hix22(symbol), %l1 xor %l1, %gdop_lox10(symbol), %l1 ld [%l7 + %l1], %l2, %gdop(symbol)
I'm open to suggestions on how to overcome this problem, of course relaxing this relocation kind solves the problem at hand (even though that opens another can of worms, RelExpr already has 64 elements and RelExprMaskBuilder requires some work to accommodate more entries) but I wonder if that'd be only papering over a bigger problem.
Section (3) .rela.text { 0x4 R_SPARC_PC22 _GLOBAL_OFFSET_TABLE_ 0xFFFFFFFFFFFFFFF8 0x8 R_SPARC_PC10 _GLOBAL_OFFSET_TABLE_ 0xFFFFFFFFFFFFFFFC 0xC R_SPARC_WPLT30 __sparc_get_pc_thunk.l7 0x0 0x24 R_SPARC_GOTDATA_OP_HIX22 .rodata 0x0 0x28 R_SPARC_GOTDATA_OP_LOX10 .rodata 0x0 0x2C R_SPARC_GOTDATA_OP .rodata 0x0 0x34 R_SPARC_WPLT30 puts 0x0 0x3C R_SPARC_GOTDATA_OP_HIX22 .rodata 0x10 0x40 R_SPARC_GOTDATA_OP_LOX10 .rodata 0x10 0x44 R_SPARC_GOTDATA_OP .rodata 0x10 0x4C R_SPARC_GOTDATA_OP_HIX22 .rodata 0x18 0x50 R_SPARC_GOTDATA_OP_LOX10 .rodata 0x18 0x54 R_SPARC_GOTDATA_OP .rodata 0x18 0x5C R_SPARC_WPLT30 fopen 0x0 0x74 R_SPARC_GOTDATA_OP_HIX22 .rodata 0x28 0x78 R_SPARC_GOTDATA_OP_LOX10 .rodata 0x28 0x7C R_SPARC_GOTDATA_OP .rodata 0x28 0x84 R_SPARC_WPLT30 puts 0x0 0x98 R_SPARC_GOTDATA_OP_HIX22 .rodata 0x30 0x9C R_SPARC_GOTDATA_OP_LOX10 .rodata 0x30 0xA0 R_SPARC_GOTDATA_OP .rodata 0x30 0xA8 R_SPARC_WPLT30 fwrite 0x0 0xB4 R_SPARC_WPLT30 fclose 0x0 0xBC R_SPARC_GOTDATA_OP_HIX22 .rodata 0x38 0xC0 R_SPARC_GOTDATA_OP_LOX10 .rodata 0x38 0xC4 R_SPARC_GOTDATA_OP .rodata 0x38 0xCC R_SPARC_WPLT30 puts 0x0 }
A relocation that behaves as R_GOT_OFF or, when it's possible to relax it, as a R_GOTREL.
Yes, the R_GOT_OFF version of what R_RELAX_GOT_PC is to R_GOT_PC (e.g. movq foo@GOTPCREL(%rip), %rax -> leaq foo(%rip), %rax, where the latter means leaq foo@PCREL(%rip) but that's implied by the %rip base).
Oh. Yes, that's the idea. At the moment I'm treating all of them as R_GOT_OFF but, as explained above, that's not working for every binary.
Here's the updated patch, I'm now able to link and run several small programs linked against glibc (in both PIC/PIE mode), shared libraries and some basic examples w/ ifuncs.
The use of uint128_t is a hack that works pretty well, suggestions are welcome.
Once D102575 I'll be able to commit some tests.
Downstream we turned it into a std::pair<uint64_t, uint64_t> (https://github.com/CTSRD-CHERI/llvm-project/blob/de58a2a25c32471b4b3c2aec77fee624e24120cd/lld/ELF/Relocations.cpp#L136-L179), though something more general where it's parameterised on the number of words might be nicer rather than hand-rolling a pair version.
Interesting, I started implementing it in terms of an array of uint64_t elements.
Using a std::pair looks good to me, it's quite unlikely for RelExpr to become bigger than 128 (hopefully). Do you plan on upstreaming this change?
clang-tidy: warning: 'getDynRel' overrides a member function but is not marked 'override' [clang-diagnostic-inconsistent-missing-override]
not useful