This is an archive of the discontinued LLVM Phabricator instance.

[ELF] Don't align PT_TLS's p_memsz
ClosedPublic

Authored by MaskRay on May 17 2019, 4:16 AM.

Details

Summary

The code was added in r252352, probably to address some layout
issues. Actually PT_TLS's p_memsz doesn't need to be aligned. ld.bfd
doesn't do that.

In case of larger alignment (e.g. 64 for Android Bionic on AArch64, see
D62055), this may make the overhead much smaller.

Diff Detail

Repository
rL LLVM

Event Timeline

MaskRay created this revision.May 17 2019, 4:16 AM
ruiu accepted this revision.May 17 2019, 4:59 AM

LGTM

This revision is now accepted and ready to land.May 17 2019, 4:59 AM
This revision was automatically updated to reflect the committed changes.

I had suggested doing something like this at https://reviews.llvm.org/D53906#1326683. I like the idea, but I think we need to fix up the variant 2 case in getTlsTpOffset:

case EM_386:
case EM_X86_64:
  // Variant 2. The TLS segment is located just before the thread pointer.
  return -Out::TlsPhdr->p_memsz;

e.g. If the TLS segment is 4-bytes in size with 8-byte alignment (and assuming p_vaddr % p_align == 0), then the segment will be located at TP-8 on variant 2 targets, not TP-4.

I haven't tested this, but something like this would probably work:

return -alignTo(Out::TlsPhdr->p_memsz, Out::TlsPhdr->p_align);

Otherwise, lld and the loader will disagree about the location of TLS symbols.

I had suggested doing something like this at https://reviews.llvm.org/D53906#1326683. I like the idea, but I think we need to fix up the variant 2 case in getTlsTpOffset:

case EM_386:
case EM_X86_64:
  // Variant 2. The TLS segment is located just before the thread pointer.
  return -Out::TlsPhdr->p_memsz;

e.g. If the TLS segment is 4-bytes in size with 8-byte alignment (and assuming p_vaddr % p_align == 0), then the segment will be located at TP-8 on variant 2 targets, not TP-4.

I haven't tested this, but something like this would probably work:

return -alignTo(Out::TlsPhdr->p_memsz, Out::TlsPhdr->p_align);

Otherwise, lld and the loader will disagree about the location of TLS symbols.

Thanks for pointing this out! I pushed a quick fix. To be more precise, it should be:

diff --git c/ELF/InputSection.cpp w/ELF/InputSection.cpp
index 72a2c298d..2eb9a0452 100644
--- c/ELF/InputSection.cpp
+++ w/ELF/InputSection.cpp
@@ -594,11 +594,13 @@ static int64_t getTlsTpOffset() {
     // NB: While the ARM/AArch64 ABI formally has a 2-word TCB size, lld
     // effectively increases the TCB size to 8 words for Android compatibility.
     // It accomplishes this by increasing the segment's alignment.
-    return alignTo(Config->Wordsize * 2, Out::TlsPhdr->p_align);
+    return alignTo(Config->Wordsize * 2, Out::TlsPhdr->p_align, Out::TlsPhdr->FirstSec->Addr);
   case EM_386:
   case EM_X86_64:
     // Variant 2. The TLS segment is located just before the thread pointer.
-    return -alignTo(Out::TlsPhdr->p_memsz, Out::TlsPhdr->p_align);
+    return -(Out::TlsPhdr->p_memsz +
+             (-Out::TlsPhdr->p_memsz - Out::TlsPhdr->FirstSec->Addr &
+              Out::TlsPhdr->p_align - 1));
   case EM_PPC64:
     // The thread pointer points to a fixed offset from the start of the
     // executable's TLS segment. An offset of 0x7000 allows a signed 16-bit

The formulae are complex because they take p_vaddr%p_align!=0 into account... I didn't do that in the commit because I cannot think of a way to make p_vaddr%p_align!=0 (after we remove the ARM/AArch64 overalignment hack).

I had suggested doing something like this at https://reviews.llvm.org/D53906#1326683. I like the idea, but I think we need to fix up the variant 2 case in getTlsTpOffset:

case EM_386:
case EM_X86_64:
  // Variant 2. The TLS segment is located just before the thread pointer.
  return -Out::TlsPhdr->p_memsz;

e.g. If the TLS segment is 4-bytes in size with 8-byte alignment (and assuming p_vaddr % p_align == 0), then the segment will be located at TP-8 on variant 2 targets, not TP-4.

I haven't tested this, but something like this would probably work:

return -alignTo(Out::TlsPhdr->p_memsz, Out::TlsPhdr->p_align);

Otherwise, lld and the loader will disagree about the location of TLS symbols.

Thanks for pointing this out! I pushed a quick fix. To be more precise, it should be:

diff --git c/ELF/InputSection.cpp w/ELF/InputSection.cpp
index 72a2c298d..2eb9a0452 100644
--- c/ELF/InputSection.cpp
+++ w/ELF/InputSection.cpp
@@ -594,11 +594,13 @@ static int64_t getTlsTpOffset() {
     // NB: While the ARM/AArch64 ABI formally has a 2-word TCB size, lld
     // effectively increases the TCB size to 8 words for Android compatibility.
     // It accomplishes this by increasing the segment's alignment.
-    return alignTo(Config->Wordsize * 2, Out::TlsPhdr->p_align);
+    return alignTo(Config->Wordsize * 2, Out::TlsPhdr->p_align, Out::TlsPhdr->FirstSec->Addr);
   case EM_386:
   case EM_X86_64:
     // Variant 2. The TLS segment is located just before the thread pointer.
-    return -alignTo(Out::TlsPhdr->p_memsz, Out::TlsPhdr->p_align);
+    return -(Out::TlsPhdr->p_memsz +
+             (-Out::TlsPhdr->p_memsz - Out::TlsPhdr->FirstSec->Addr &
+              Out::TlsPhdr->p_align - 1));
   case EM_PPC64:
     // The thread pointer points to a fixed offset from the start of the
     // executable's TLS segment. An offset of 0x7000 allows a signed 16-bit

The formulae are complex because they take p_vaddr%p_align!=0 into account... I didn't do that in the commit because I cannot think of a way to make p_vaddr%p_align!=0 (after we remove the ARM/AArch64 overalignment hack).

@rprichard @ruiu I missed another two cases for x86-32: gd to le relaxation and @tpoff... rLLD361088 now.

Anyway, I think with recent llvm/lld changes, it is low overhead to implement the workaround in crtbegin*.o now.

.section .tdata,"awT",@progbits
.p2align 8
.zero 1 # delete if you don't care about ld.bfd (ld.bfd strips empty PT_TLS)

.text
_start:
  .reloc 0, R_AARCH64_NONE, .tdata

I had suggested doing something like this at https://reviews.llvm.org/D53906#1326683. I like the idea, but I think we need to fix up the variant 2 case in getTlsTpOffset:

case EM_386:
case EM_X86_64:
  // Variant 2. The TLS segment is located just before the thread pointer.
  return -Out::TlsPhdr->p_memsz;

e.g. If the TLS segment is 4-bytes in size with 8-byte alignment (and assuming p_vaddr % p_align == 0), then the segment will be located at TP-8 on variant 2 targets, not TP-4.

I haven't tested this, but something like this would probably work:

return -alignTo(Out::TlsPhdr->p_memsz, Out::TlsPhdr->p_align);

Otherwise, lld and the loader will disagree about the location of TLS symbols.

Thanks for pointing this out! I pushed a quick fix. To be more precise, it should be:

diff --git c/ELF/InputSection.cpp w/ELF/InputSection.cpp
index 72a2c298d..2eb9a0452 100644
--- c/ELF/InputSection.cpp
+++ w/ELF/InputSection.cpp
@@ -594,11 +594,13 @@ static int64_t getTlsTpOffset() {
     // NB: While the ARM/AArch64 ABI formally has a 2-word TCB size, lld
     // effectively increases the TCB size to 8 words for Android compatibility.
     // It accomplishes this by increasing the segment's alignment.
-    return alignTo(Config->Wordsize * 2, Out::TlsPhdr->p_align);
+    return alignTo(Config->Wordsize * 2, Out::TlsPhdr->p_align, Out::TlsPhdr->FirstSec->Addr);
   case EM_386:
   case EM_X86_64:
     // Variant 2. The TLS segment is located just before the thread pointer.
-    return -alignTo(Out::TlsPhdr->p_memsz, Out::TlsPhdr->p_align);
+    return -(Out::TlsPhdr->p_memsz +
+             (-Out::TlsPhdr->p_memsz - Out::TlsPhdr->FirstSec->Addr &
+              Out::TlsPhdr->p_align - 1));
   case EM_PPC64:
     // The thread pointer points to a fixed offset from the start of the
     // executable's TLS segment. An offset of 0x7000 allows a signed 16-bit

The formulae are complex because they take p_vaddr%p_align!=0 into account... I didn't do that in the commit because I cannot think of a way to make p_vaddr%p_align!=0 (after we remove the ARM/AArch64 overalignment hack).

I'm just working on some internal testing in relation to this. It's quite possible to trigger this behaviour using a linker script. All you need is for the PT_TLS segment to appear somewhere other than at the front of its PT_LOAD segment, with data before it. In my experiment, I had an 8-byte section first, followed by an 8-byte sized, 8-byte aligned .tdata and 16-byte aligned .tbss. This resulted in a PT_TLS appearing at a p_vaddr % p_align != 0 position.

I'm posting this because I wasn't sure if a test case has since been added, or whether I should just try posting my example as a test case. @MaskRay, what do you think?

I had suggested doing something like this at https://reviews.llvm.org/D53906#1326683. I like the idea, but I think we need to fix up the variant 2 case in getTlsTpOffset:

case EM_386:
case EM_X86_64:
  // Variant 2. The TLS segment is located just before the thread pointer.
  return -Out::TlsPhdr->p_memsz;

e.g. If the TLS segment is 4-bytes in size with 8-byte alignment (and assuming p_vaddr % p_align == 0), then the segment will be located at TP-8 on variant 2 targets, not TP-4.

I haven't tested this, but something like this would probably work:

return -alignTo(Out::TlsPhdr->p_memsz, Out::TlsPhdr->p_align);

Otherwise, lld and the loader will disagree about the location of TLS symbols.

Thanks for pointing this out! I pushed a quick fix. To be more precise, it should be:

diff --git c/ELF/InputSection.cpp w/ELF/InputSection.cpp
index 72a2c298d..2eb9a0452 100644
--- c/ELF/InputSection.cpp
+++ w/ELF/InputSection.cpp
@@ -594,11 +594,13 @@ static int64_t getTlsTpOffset() {
     // NB: While the ARM/AArch64 ABI formally has a 2-word TCB size, lld
     // effectively increases the TCB size to 8 words for Android compatibility.
     // It accomplishes this by increasing the segment's alignment.
-    return alignTo(Config->Wordsize * 2, Out::TlsPhdr->p_align);
+    return alignTo(Config->Wordsize * 2, Out::TlsPhdr->p_align, Out::TlsPhdr->FirstSec->Addr);
   case EM_386:
   case EM_X86_64:
     // Variant 2. The TLS segment is located just before the thread pointer.
-    return -alignTo(Out::TlsPhdr->p_memsz, Out::TlsPhdr->p_align);
+    return -(Out::TlsPhdr->p_memsz +
+             (-Out::TlsPhdr->p_memsz - Out::TlsPhdr->FirstSec->Addr &
+              Out::TlsPhdr->p_align - 1));
   case EM_PPC64:
     // The thread pointer points to a fixed offset from the start of the
     // executable's TLS segment. An offset of 0x7000 allows a signed 16-bit

The formulae are complex because they take p_vaddr%p_align!=0 into account... I didn't do that in the commit because I cannot think of a way to make p_vaddr%p_align!=0 (after we remove the ARM/AArch64 overalignment hack).

I'm just working on some internal testing in relation to this. It's quite possible to trigger this behaviour using a linker script. All you need is for the PT_TLS segment to appear somewhere other than at the front of its PT_LOAD segment, with data before it. In my experiment, I had an 8-byte section first, followed by an 8-byte sized, 8-byte aligned .tdata and 16-byte aligned .tbss. This resulted in a PT_TLS appearing at a p_vaddr % p_align != 0 position.

I'm posting this because I wasn't sure if a test case has since been added, or whether I should just try posting my example as a test case. @MaskRay, what do you think?

Adding back the code does not break any test, so there is no. If you are going to add a test, probably reuse an existing tls test and check PT_TLS.

lld/trunk/ELF/Writer.cpp