This is an archive of the discontinued LLVM Phabricator instance.

[ELF2] - Optimization for R_X86_64_GOTTPOFF relocation.
ClosedPublic

Authored by grimar on Nov 16 2015, 9:30 AM.

Diff Detail

Repository
rL LLVM

Event Timeline

grimar updated this revision to Diff 40303.Nov 16 2015, 9:30 AM
grimar retitled this revision from to [ELF2] - Optimization for R_X86_64_GOTTPOFF relocation..
grimar updated this object.
grimar added reviewers: ruiu, rafael.
grimar added subscribers: llvm-commits, grimar.
grimar added inline comments.Nov 16 2015, 9:33 AM
ELF/Target.cpp
362 ↗(On Diff #40303)

I dont sure what to do here. How to check out of bound for negative idx. Should I check (will need BufBegin arg I guess) or we can assume its never be out ?

ruiu added inline comments.Nov 16 2015, 10:06 AM
ELF/InputSection.cpp
150–154 ↗(On Diff #40303)

Can you move this above line 137 just like "if (Target->isTlsGlobalDynamicReloc(Type)) { ... }"?

Also, do you think you can merge getTlsOptimization and relocateTlsToLe into one function? I'd probably write like this.

// Some TLS relocations are always compiled to use GOT, but the linker
// is sometimes able to rewrite it so that it doesn't use GOT. This may not
// only apply relocations but also modify preceding instructions.
if (applyOptimizeTls(BufLoc, Type, AddrLoc, Body))
  continue;
ELF/Target.cpp
356 ↗(On Diff #40303)

Please add a reference to Ulrich's document.

362–368 ↗(On Diff #40303)

Is that actually a possible scenario that one wants to add a TLS value to SP register? Do you have to take care of that?

ELF/Target.h
61 ↗(On Diff #40303)

Currently it only returns None or ToLE, so can you change the signature of the function so that it returns a boolean value?

grimar added inline comments.Nov 16 2015, 11:53 AM
ELF/InputSection.cpp
150–154 ↗(On Diff #40303)

About merging into one.
getTlsOptimization() is also now used also in another place, in bool X86_64TargetInfo::relocNeedsGot().
So such merging probably not possible.

ELF/Target.cpp
362–368 ↗(On Diff #40303)

I did not analyse how much people do that :)
gold and bfd do the same. And it is still possible scenario even if chance is low. I jsut want to be on a safe side with this single line.

ELF/Target.h
61 ↗(On Diff #40303)

Will do.

ruiu added inline comments.Nov 16 2015, 12:26 PM
ELF/InputSection.cpp
150–154 ↗(On Diff #40303)

Fair. Then please rename relocateTlsOptimize. (Readers don't really want to know how it is going to be optimized in this context, so "ToLe" is too detailed.)

ELF/Target.cpp
362–368 ↗(On Diff #40303)

As long as it can be written simple enough, I'm fine, but currently it looks a bit too cryptic.

There are six instructions this function may emit. Could you explain what each instruction mean?

0x49 0x81 0xc?
0x49 0xc7 0xc?
0x?? 0x81 0xc?
0x?? 0xc7 0xc?
0x4d 0x8d 0x8?
0x?? 0x8d 0x8?

grimar updated this revision to Diff 40406.Nov 17 2015, 9:50 AM

Review comments addressed.
Simplified.
Test updated.

ELF/InputSection.cpp
150–154 ↗(On Diff #40303)

Renaming is done, but I did not move it because I need SymVA variable. Also this part of code became shorter now. And it uses continue like Target->relocNeedsCopy, so it looks to be in the most consistent place now, no ?

ELF/Target.cpp
362–368 ↗(On Diff #40303)

That is (but I simplified code, so actual table is below in comments)

  1. 0x49 0x81 0xc?

This is 0x49 0x81 0xC? XX XX XX XX, add $0xXXXXXXXX, %r?
Ex: 49 81 C0 00 00 00 00 = add $0x0, %r8

  1. 0x49 0xc7 0xc?

This is 0x49 0xC7 0xC? XX XX XX XX, mov $0xXXXXXXXX, %r?
Ex: 49 C7 C0 00 00 00 00 = mov $0x0, %r8

  1. 0x?? 0x81 0xc?

Never saw not the 0x48 as prefix for that cases, so
This is 0x48 0x81 0xC? XX XX XX XX, add $0xXXXXXXXX, %r??
Ex: 48 81 C0 00 00 00 00 = add $0x0, %rax

  1. 0x?? 0xc7 0xc?

Never saw not the 0x48 as prefix for that cases, so
This is 0x48 0xc7 0xc? XX XX XX XX, mov $0xXXXXXXXX, %r??
Ex: 48 C7 C0 00 00 00 00 = mov $0x0, %rax

  1. 0x4d 0x8d 0x8?

This is 0x4D 0x8D 0x8? XX XX XX XX, lea 0xXXXXXXXX(%r?),%r?
Ex: 4D 8D 89 00 00 00 00 = lea 0x0(%r9),%r9

  1. 0x?? 0x8d 0x8?

Never saw not the 0x48 as prefix for that cases, so
This is 0x48 0x8d 0x8? XX XX XX XX, lea 0xXXXXXXXX(%r??),%r??
Ex: 48 8D 89 00 00 00 00 = lea 0x0(%rcx),%rcx

362–368 ↗(On Diff #40303)

gold and bfd contain misleading comment about that special rsp handling. In fact that also touch r12 for them:

asm:
 addq tls0@GOTTPOFF(%rip), %rsp
 addq tls0@GOTTPOFF(%rip), %r8 
...
 addq tls0@GOTTPOFF(%rip), %r12
...
 addq tls0@GOTTPOFF(%rip), %r15 

00000000004000f0 <main>:
  400105:	48 81 c4 f8 ff ff ff 	add    $0xfffffffffffffff8,%rsp
  40010c:	4d 8d 80 f8 ff ff ff 	lea    -0x8(%r8),%r8
...
  400128:	49 81 c4 f8 ff ff ff 	add    $0xfffffffffffffff8,%r12
...
  40013d:	4d 8d bf f8 ff ff ff 	lea    -0x8(%r15),%r15

Looks like a bug of gold/bfd for me.
I rewrited the code to simplify it and removed special handling for rsp for now. I think we can add that if it will be needed later.

Updated possible emit table is:

  1. 0x49 0xC7 0xC?
  2. 0x?? 0xC7 0xC?
  3. 0x4D 0x8D 0x8?
  4. 0x?? 0x8D 0x8?
ELF/Target.h
61 ↗(On Diff #40303)

Done.

I rewrited the code to simplify it and removed special handling for rsp for now. I think we can add that if it will be needed later.

Or I can insert the fixed (in compare with gold/bfd) check affecting only rsp, but that will enlarge the void X86_64TargetInfo::relocateTlsOptimize()

ruiu added inline comments.Nov 17 2015, 10:00 AM
ELF/Target.cpp
367 ↗(On Diff #40406)

Add BufStart and check for the bound.

370–374 ↗(On Diff #40406)

Instruct -> Inst
IsMovOp -> IsMov

375–378 ↗(On Diff #40406)

Does this handle SP register?

379 ↗(On Diff #40406)

I think the original code was better

write32le(Loc, SA - Out<ELF64LE>::TlsPhdr->p_memsz);
grimar added inline comments.Nov 17 2015, 10:20 AM
ELF/Target.cpp
375–378 ↗(On Diff #40406)

No. As I wrote in comments it will enlarge that code, but I can do that. Should I ?

379 ↗(On Diff #40406)

According to manual its optimized exactly to R_X86_64_TPOFF32. Not sure how important skipped in that case OOR check as well:

case R_X86_64_TPOFF32:
    if (!isInt<32>(Val))
      error("R_X86_64_TPOFF32 out of range");
grimar added inline comments.Nov 17 2015, 12:11 PM
ELF/Target.cpp
375–378 ↗(On Diff #40406)

With handling SP/r12 registers will be something like next:
(generates
48 81 c4 f8 ff ff ff add $0xfffffffffffffff8,%rsp for addq tls0@GOTTPOFF(%rip), %rsp
49 81 c4 f8 ff ff ff add $0xfffffffffffffff8,%r12 for addq tls0@GOTTPOFF(%rip), %r12)

void X86_64TargetInfo::relocateTlsOptimize(uint8_t *Loc, uint8_t *BufEnd,
                                           uint32_t Type, uint64_t P,
                                           uint64_t SA) const {
  uint8_t *Prefix = &Loc[-3];
  uint8_t *Inst = &Loc[-2];
  uint8_t *RegSlot = &Loc[-1];
  uint8_t Reg = (Loc[-1]) >> 3;
  bool IsAdd = !(*Inst == 0x8b || Reg == 4);
  if (Reg == 4 && !IsAdd)
    *Inst = 0x81;
  else
    *Inst = IsAdd ? 0x8d : 0xc7;
  if (*Prefix == 0x4c)
    *Prefix = IsAdd ? 0x4d : 0x49;
  *RegSlot = IsAdd ? (0x80 | Reg | (Reg << 3)) : (0xc0 | Reg);

  relocateOne(Loc, BufEnd, R_X86_64_TPOFF32, P, SA);
}
grimar updated this revision to Diff 40484.Nov 18 2015, 1:52 AM
  • Review comments addressed
  • Added handling of rsp/r12, added comment why it is needed.
  • Added buffer overrun check.
  • Test updated (added rsp/r12 cases).
ELF/Target.cpp
367 ↗(On Diff #40406)

Done.

370–374 ↗(On Diff #40406)

Done.

I also have patch implementing optimizations for R_X86_64_TLSLD relocation. Its relies on some code changes in this one, so will post it right after it.

ruiu added inline comments.Nov 18 2015, 1:05 PM
ELF/InputSection.cpp
150 ↗(On Diff #40484)

getTlsOptimization -> isTlsOptimized (since it returns a boolean value.)

Can you move this before

uintX_t SymVA = getSymVA<ELFT>(Body);

?

151 ↗(On Diff #40484)

Currently you are not using BufEnd, so please remove that parameter.

ELF/Target.cpp
354 ↗(On Diff #40484)

Remove outermost ().

371–372 ↗(On Diff #40484)

I'm wondering if this is correct. One should never use a R_X86_64_GOTTPOFF relocation with instructions other than MOV or ADD?

grimar updated this revision to Diff 40551.Nov 18 2015, 1:40 PM

Review comments addressed.

ELF/InputSection.cpp
150 ↗(On Diff #40484)

Done.

151 ↗(On Diff #40484)

It is used because relocateTlsOptimize() calls relocateOne() which accepts it.
I am not using Type now, so removed it.

ELF/Target.cpp
354 ↗(On Diff #40484)

Done.

371–372 ↗(On Diff #40484)

As stated in comment above the method, @gottpoff(%rip) must be used in movq or addq instructions only (its written in Ulrich manual), so yes, this relocation optimization only can have mov and add cases and thats definitely correct.
Also just in case gold and bfd has the same logic (but Ulrich is more preferable information source I think).

ruiu added inline comments.Nov 18 2015, 3:24 PM
ELF/Target.cpp
392 ↗(On Diff #40551)

I applied you patch to my local repository to see if there's a room to make this simpler, only to find that this wouldn't be significantly simplified. I slightly updated the comments and code though. Please take a look.

// In some conditions, R_X86_64_GOTTPOFF relocation can be optimized to
// R_X86_64_TPOFF32 so that R_X86_64_TPOFF32 so that it does not use GOT.
// This function does that. Read "ELF Handling For Thread-Local Storage,
// 5.5 x86-x64 linker optimizations" (http://www.akkadia.org/drepper/tls.pdf)
// by Ulrich Drepper for details.
void X86_64TargetInfo::relocateTlsOptimize(uint8_t *Loc, uint8_t *BufStart,
                                           uint8_t *BufEnd, uint64_t P,
                                           uint64_t SA) const {
  // Ulrich's document section 6.5 says that @gottpoff(%rip) must be
  // used in MOVQ or ADDQ instructions only.
  // "MOVQ foo@GOTTPOFF(%RIP), %REG" is transformed to "MOVQ $foo, %REG".
  // "ADDQ foo@GOTTPOFF(%RIP), %REG" is transformed to "LEAQ foo(%REG), %REG"
  // (if the register is not RSP) or "ADDQ $foo, %RSP".
  // Opcodes info can be found at http://ref.x86asm.net/coder64.html#x48.
  if (Loc - 3 < BufStart)
    error("TLS relocation optimization failed. Buffer overrun!");
  uint8_t *Prefix = Loc - 3;
  uint8_t *Inst = Loc - 2;
  uint8_t *RegSlot = Loc - 1;
  uint8_t Reg = Loc[-1] >> 3;
  bool IsMov = *Inst == 0x8b;
  bool RspAdd = !IsMov && Reg == 4;
  // r12 and rsp registers requires special handling.
  // Problem is that for other registers, for example leaq 0xXXXXXXXX(%r11),%r11
  // result out is 7 bytes: 4d 8d 9b XX XX XX XX,
  // but leaq 0xXXXXXXXX(%r12),%r12 is 8 bytes: 4d 8d a4 24 XX XX XX XX.
  // The same true for rsp. So we convert to addq for them, saving 1 byte that
  // we dont have.
  if (RspAdd)
    *Inst = 0x81;
  else
    *Inst = IsMov ? 0xc7 : 0x8d;
  if (*Prefix == 0x4c)
    *Prefix = (IsMov || RspAdd) ? 0x49 : 0x4d;
  *RegSlot = (IsMov || RspAdd) ? (0xc0 | Reg) : (0x80 | Reg | (Reg << 3));
  relocateOne(Loc, BufEnd, R_X86_64_TPOFF32, P, SA);
}
rafael added inline comments.Nov 18 2015, 4:46 PM
test/elf2/tls-opt.s
2 ↗(On Diff #40551)

Just create a _start and drop the "-e main"

13 ↗(On Diff #40551)

Do you really need the binary content? The text should be sufficient, no? I.E.:

// DISASM-NEXT: 11000: {{.*}} movq $-8, %rax

rafael added inline comments.Nov 18 2015, 4:47 PM
ELF/Target.cpp
371 ↗(On Diff #40551)

Please add a test for this.

It is important to show that we can actually get here with broken input.

grimar updated this revision to Diff 40608.Nov 19 2015, 12:38 AM

Review comments addressed.

ELF/Target.cpp
371 ↗(On Diff #40551)

I bit doubt that we should check such things here. Broken input for that place with or without that patch will lead to broken output. With this optimization applied or without it. The difference only in what way it broken (broken + optimization == broken x 2, but still broken).
Broken input is a bug of compiler and should be covered by its tests or by our tests of broken imputs that are separate tests.
Second problem that if according manual this relocation must use only mov or add then in future llvm-mc for example may add check for that and do not generate broken output. That will make this test to fail. We can save from that only by placing precompiled corrupted binaries as inputs, but thats again not for that patch I think.
But anyways just in case I added what you asked for to test. So if you really think we should do that for some reason - I see nothing too much bad in that if it touches only the test. Its easy to remove from test at any time.

392 ↗(On Diff #40551)

Looks good. Applied your changes. Thanks !

I only added r12 to comment:

// (if the register is not RSP/R12) or "ADDQ $foo, %RSP".
test/elf2/tls-opt.s
2 ↗(On Diff #40551)

Done.

13 ↗(On Diff #40551)

Any of them is sufficient I think but we are emiting binary output and not the text. So need to keep and check the binary. At the same time text helps to read the test so it should be there at least in comments for binary I think. But there is no point to move it to comments I believe. I would leave it as is because of that all.

grimar added inline comments.Nov 19 2015, 7:42 AM
test/elf2/tls-opt.s
24 ↗(On Diff #40608)

The last one also is a part of currupted output, I`ll move it below
// Corrupred output:

grimar updated this revision to Diff 40747.Nov 20 2015, 12:43 AM

Rebased.

grimar added inline comments.Nov 20 2015, 4:53 AM
test/ELF/tls-opt.s
2 ↗(On Diff #40747)

Forgot to switch to ld.lld here. Will be fixed.

grimar updated this revision to Diff 40903.Nov 23 2015, 12:26 AM
  • Rebased
  • Removed buffer overrun check and *BeginBuf argument.
  • Test: ld.lld2->ld.lld
ruiu accepted this revision.Nov 23 2015, 11:16 AM
ruiu edited edge metadata.

LGTM with a nit.

ELF/InputSection.cpp
137 ↗(On Diff #40903)

Can you move Body.isTLS() into isTlsOptimized function?

This revision is now accepted and ready to land.Nov 23 2015, 11:16 AM
In D14713#295215, @ruiu wrote:

LGTM with a nit.

Will do, thanks !

This revision was automatically updated to reflect the committed changes.