R_X86_64_GOTTPOFF is not always requires GOT entries. Some relocations can be converted to local ones.
This patch is a port of gold optimizations for this relocation:
https://android.googlesource.com/toolchain/binutils/+/53b6ed3bceea971857c996b6dcb96de96b99335f/binutils-2.19/gold/x86_64.cc#2551
Details
Diff Detail
- Repository
- rL LLVM
Event Timeline
ELF/Target.cpp | ||
---|---|---|
362 ↗ | (On Diff #40303) | I dont sure what to do here. How to check out of bound for negative idx. Should I check (will need BufBegin arg I guess) or we can assume its never be out ? |
ELF/InputSection.cpp | ||
---|---|---|
150–154 ↗ | (On Diff #40303) | Can you move this above line 137 just like "if (Target->isTlsGlobalDynamicReloc(Type)) { ... }"? Also, do you think you can merge getTlsOptimization and relocateTlsToLe into one function? I'd probably write like this. // Some TLS relocations are always compiled to use GOT, but the linker // is sometimes able to rewrite it so that it doesn't use GOT. This may not // only apply relocations but also modify preceding instructions. if (applyOptimizeTls(BufLoc, Type, AddrLoc, Body)) continue; |
ELF/Target.cpp | ||
356 ↗ | (On Diff #40303) | Please add a reference to Ulrich's document. |
362–368 ↗ | (On Diff #40303) | Is that actually a possible scenario that one wants to add a TLS value to SP register? Do you have to take care of that? |
ELF/Target.h | ||
61 ↗ | (On Diff #40303) | Currently it only returns None or ToLE, so can you change the signature of the function so that it returns a boolean value? |
ELF/InputSection.cpp | ||
---|---|---|
150–154 ↗ | (On Diff #40303) | About merging into one. |
ELF/Target.cpp | ||
362–368 ↗ | (On Diff #40303) | I did not analyse how much people do that :) |
ELF/Target.h | ||
61 ↗ | (On Diff #40303) | Will do. |
ELF/InputSection.cpp | ||
---|---|---|
150–154 ↗ | (On Diff #40303) | Fair. Then please rename relocateTlsOptimize. (Readers don't really want to know how it is going to be optimized in this context, so "ToLe" is too detailed.) |
ELF/Target.cpp | ||
362–368 ↗ | (On Diff #40303) | As long as it can be written simple enough, I'm fine, but currently it looks a bit too cryptic. There are six instructions this function may emit. Could you explain what each instruction mean? 0x49 0x81 0xc? |
Review comments addressed.
Simplified.
Test updated.
ELF/InputSection.cpp | ||
---|---|---|
150–154 ↗ | (On Diff #40303) | Renaming is done, but I did not move it because I need SymVA variable. Also this part of code became shorter now. And it uses continue like Target->relocNeedsCopy, so it looks to be in the most consistent place now, no ? |
ELF/Target.cpp | ||
362–368 ↗ | (On Diff #40303) | That is (but I simplified code, so actual table is below in comments)
This is 0x49 0x81 0xC? XX XX XX XX, add $0xXXXXXXXX, %r?
This is 0x49 0xC7 0xC? XX XX XX XX, mov $0xXXXXXXXX, %r?
Never saw not the 0x48 as prefix for that cases, so
Never saw not the 0x48 as prefix for that cases, so
This is 0x4D 0x8D 0x8? XX XX XX XX, lea 0xXXXXXXXX(%r?),%r?
Never saw not the 0x48 as prefix for that cases, so |
362–368 ↗ | (On Diff #40303) | gold and bfd contain misleading comment about that special rsp handling. In fact that also touch r12 for them: asm: addq tls0@GOTTPOFF(%rip), %rsp addq tls0@GOTTPOFF(%rip), %r8 ... addq tls0@GOTTPOFF(%rip), %r12 ... addq tls0@GOTTPOFF(%rip), %r15 00000000004000f0 <main>: 400105: 48 81 c4 f8 ff ff ff add $0xfffffffffffffff8,%rsp 40010c: 4d 8d 80 f8 ff ff ff lea -0x8(%r8),%r8 ... 400128: 49 81 c4 f8 ff ff ff add $0xfffffffffffffff8,%r12 ... 40013d: 4d 8d bf f8 ff ff ff lea -0x8(%r15),%r15 Looks like a bug of gold/bfd for me. Updated possible emit table is:
|
ELF/Target.h | ||
61 ↗ | (On Diff #40303) | Done. |
Or I can insert the fixed (in compare with gold/bfd) check affecting only rsp, but that will enlarge the void X86_64TargetInfo::relocateTlsOptimize()
ELF/Target.cpp | ||
---|---|---|
375–378 ↗ | (On Diff #40406) | No. As I wrote in comments it will enlarge that code, but I can do that. Should I ? |
379 ↗ | (On Diff #40406) | According to manual its optimized exactly to R_X86_64_TPOFF32. Not sure how important skipped in that case OOR check as well: case R_X86_64_TPOFF32: if (!isInt<32>(Val)) error("R_X86_64_TPOFF32 out of range"); |
ELF/Target.cpp | ||
---|---|---|
375–378 ↗ | (On Diff #40406) | With handling SP/r12 registers will be something like next: void X86_64TargetInfo::relocateTlsOptimize(uint8_t *Loc, uint8_t *BufEnd, uint32_t Type, uint64_t P, uint64_t SA) const { uint8_t *Prefix = &Loc[-3]; uint8_t *Inst = &Loc[-2]; uint8_t *RegSlot = &Loc[-1]; uint8_t Reg = (Loc[-1]) >> 3; bool IsAdd = !(*Inst == 0x8b || Reg == 4); if (Reg == 4 && !IsAdd) *Inst = 0x81; else *Inst = IsAdd ? 0x8d : 0xc7; if (*Prefix == 0x4c) *Prefix = IsAdd ? 0x4d : 0x49; *RegSlot = IsAdd ? (0x80 | Reg | (Reg << 3)) : (0xc0 | Reg); relocateOne(Loc, BufEnd, R_X86_64_TPOFF32, P, SA); } |
I also have patch implementing optimizations for R_X86_64_TLSLD relocation. Its relies on some code changes in this one, so will post it right after it.
ELF/InputSection.cpp | ||
---|---|---|
150 ↗ | (On Diff #40484) | getTlsOptimization -> isTlsOptimized (since it returns a boolean value.) Can you move this before uintX_t SymVA = getSymVA<ELFT>(Body); ? |
151 ↗ | (On Diff #40484) | Currently you are not using BufEnd, so please remove that parameter. |
ELF/Target.cpp | ||
354 ↗ | (On Diff #40484) | Remove outermost (). |
371–372 ↗ | (On Diff #40484) | I'm wondering if this is correct. One should never use a R_X86_64_GOTTPOFF relocation with instructions other than MOV or ADD? |
Review comments addressed.
ELF/InputSection.cpp | ||
---|---|---|
150 ↗ | (On Diff #40484) | Done. |
151 ↗ | (On Diff #40484) | It is used because relocateTlsOptimize() calls relocateOne() which accepts it. |
ELF/Target.cpp | ||
354 ↗ | (On Diff #40484) | Done. |
371–372 ↗ | (On Diff #40484) | As stated in comment above the method, @gottpoff(%rip) must be used in movq or addq instructions only (its written in Ulrich manual), so yes, this relocation optimization only can have mov and add cases and thats definitely correct. |
ELF/Target.cpp | ||
---|---|---|
392 ↗ | (On Diff #40551) | I applied you patch to my local repository to see if there's a room to make this simpler, only to find that this wouldn't be significantly simplified. I slightly updated the comments and code though. Please take a look. // In some conditions, R_X86_64_GOTTPOFF relocation can be optimized to // R_X86_64_TPOFF32 so that R_X86_64_TPOFF32 so that it does not use GOT. // This function does that. Read "ELF Handling For Thread-Local Storage, // 5.5 x86-x64 linker optimizations" (http://www.akkadia.org/drepper/tls.pdf) // by Ulrich Drepper for details. void X86_64TargetInfo::relocateTlsOptimize(uint8_t *Loc, uint8_t *BufStart, uint8_t *BufEnd, uint64_t P, uint64_t SA) const { // Ulrich's document section 6.5 says that @gottpoff(%rip) must be // used in MOVQ or ADDQ instructions only. // "MOVQ foo@GOTTPOFF(%RIP), %REG" is transformed to "MOVQ $foo, %REG". // "ADDQ foo@GOTTPOFF(%RIP), %REG" is transformed to "LEAQ foo(%REG), %REG" // (if the register is not RSP) or "ADDQ $foo, %RSP". // Opcodes info can be found at http://ref.x86asm.net/coder64.html#x48. if (Loc - 3 < BufStart) error("TLS relocation optimization failed. Buffer overrun!"); uint8_t *Prefix = Loc - 3; uint8_t *Inst = Loc - 2; uint8_t *RegSlot = Loc - 1; uint8_t Reg = Loc[-1] >> 3; bool IsMov = *Inst == 0x8b; bool RspAdd = !IsMov && Reg == 4; // r12 and rsp registers requires special handling. // Problem is that for other registers, for example leaq 0xXXXXXXXX(%r11),%r11 // result out is 7 bytes: 4d 8d 9b XX XX XX XX, // but leaq 0xXXXXXXXX(%r12),%r12 is 8 bytes: 4d 8d a4 24 XX XX XX XX. // The same true for rsp. So we convert to addq for them, saving 1 byte that // we dont have. if (RspAdd) *Inst = 0x81; else *Inst = IsMov ? 0xc7 : 0x8d; if (*Prefix == 0x4c) *Prefix = (IsMov || RspAdd) ? 0x49 : 0x4d; *RegSlot = (IsMov || RspAdd) ? (0xc0 | Reg) : (0x80 | Reg | (Reg << 3)); relocateOne(Loc, BufEnd, R_X86_64_TPOFF32, P, SA); } |
ELF/Target.cpp | ||
---|---|---|
371 ↗ | (On Diff #40551) | Please add a test for this. It is important to show that we can actually get here with broken input. |
Review comments addressed.
ELF/Target.cpp | ||
---|---|---|
371 ↗ | (On Diff #40551) | I bit doubt that we should check such things here. Broken input for that place with or without that patch will lead to broken output. With this optimization applied or without it. The difference only in what way it broken (broken + optimization == broken x 2, but still broken). |
392 ↗ | (On Diff #40551) | Looks good. Applied your changes. Thanks ! I only added r12 to comment: // (if the register is not RSP/R12) or "ADDQ $foo, %RSP". |
test/elf2/tls-opt.s | ||
2 ↗ | (On Diff #40551) | Done. |
13 ↗ | (On Diff #40551) | Any of them is sufficient I think but we are emiting binary output and not the text. So need to keep and check the binary. At the same time text helps to read the test so it should be there at least in comments for binary I think. But there is no point to move it to comments I believe. I would leave it as is because of that all. |
test/elf2/tls-opt.s | ||
---|---|---|
24 ↗ | (On Diff #40608) | The last one also is a part of currupted output, I`ll move it below |
test/ELF/tls-opt.s | ||
---|---|---|
2 ↗ | (On Diff #40747) | Forgot to switch to ld.lld here. Will be fixed. |
LGTM with a nit.
ELF/InputSection.cpp | ||
---|---|---|
137 ↗ | (On Diff #40903) | Can you move Body.isTLS() into isTlsOptimized function? |