This is an archive of the discontinued LLVM Phabricator instance.

[ARM][AArch64] Revert Android Bionic PT_TLS overaligning hack
ClosedPublic

Authored by MaskRay on May 17 2019, 12:04 AM.

Details

Summary

This reverts D53906.

D53906 increased p_align of PT_TLS on ARM/AArch64 to 32/64 to make the
static TLS layout compatible with Android Bionic's ELF TLS. However,
this may cause glibc ARM/AArch64 programs to crash (see PR41527).

The faulty PT_TLS in the executable satisfies p_vaddr%p_align != 0. The
remainder is normally 0 but may be non-zero with the hack in place. The
problem is that we increase PT_TLS's p_align after OutputSections'
addresses are fixed (assignAddress()). It is possible that
p_vaddr%old_p_align = 0 while p_vaddr%new_p_align != 0.

For the thread local variable defined in the executable, lld computed
TLS offsets (local exec) are different from glibc computed TLS offsets
from another module (initial exec/generic dynamic). Note: PR41527
reported the issue of initial exec but the bug affects generic dynamic
as well.

(glibc is correct in that it compute offsets that satisfy
offset%p_align == p_vaddr%p_align, which is a basic ELF requirement.
This hack appears to work on FreeBSD rtld, musl<=1.1.22, and Bionic, but
that is just because they (and lld) incorrectly compute offsets that
satisfy offset%p_align = 0 instead.)

Android developers are fine to revert this patch, carry this patch in
their tree before figuring out a long-term solution (e.g. a dummy .tdata
with sh_addralign=64 sh_size={0,1} in crtbegin*.o files. The overhead is
now insignificant after D62059).

Event Timeline

MaskRay created this revision.May 17 2019, 12:04 AM
MaskRay updated this revision to Diff 199983.May 17 2019, 12:04 AM
MaskRay edited reviewers, added: peter.smith, rprichard, ruiu, srhines; removed: espindola.

.

Harbormaster completed remote builds in B32048: Diff 199983.
MaskRay updated this revision to Diff 199984.May 17 2019, 12:07 AM

This is correct now

MaskRay updated this revision to Diff 199985.May 17 2019, 12:11 AM
MaskRay edited the summary of this revision. (Show Details)

Reword a bit

rprichard accepted this revision.May 17 2019, 12:25 AM

I'm wondering if ruiu and/or srhines want to weigh in. I think I'm OK with reverting this change, at least for now, so that we can get the lld+glibc combination working again. It seems like a good idea to fix the https://bugs.llvm.org/show_bug.cgi?id=41527 regression in LLVM 8.0.1, which is coming up soon.

We could temporarily carry a revert of this revert in the Android tree, and I could look into doing TLS reservation using crtbegin*.o.

This revision is now accepted and ready to land.May 17 2019, 12:25 AM
MaskRay updated this revision to Diff 199987.May 17 2019, 12:48 AM
MaskRay edited the summary of this revision. (Show Details)

Explain the crash

I'm happy to go along with the Android folks decision here.

MaskRay updated this revision to Diff 200008.May 17 2019, 3:15 AM
MaskRay edited the summary of this revision. (Show Details)

Clarify: it affects generic dynamic as well (which makes it even worse)

MaskRay updated this revision to Diff 200129.May 17 2019, 7:26 PM
MaskRay edited the summary of this revision. (Show Details)

Rebase after D62059

MaskRay added a comment.EditedMay 17 2019, 7:44 PM

Now that (a) we get .reloc support (D61973 D61992), (b) R_*_NONE on Elf*_Rel targets are compatible with ld.lld -r D62052 (c) p_memsz of PT_TLS is no longer aligned (D62059; Ryan had noticed the difference in https://reviews.llvm.org/D53906#1326683 but it had not been implemented... D62059 was tricky as it broke x86-{32,64} but I think they are good now), the workaround in crtbegin_{dynamic_static}.o should be feasible and it should have very little overhead.

Just waiting for @ruiu and/or @srhines to weigh in :) This revert should immediately unbreak glibc ARM/AArch64 users. I've describe the crash scenario in the description (PR41527 says it is for initial exec, but the bug affects the generic dynamic (most common; when you access an extern __thread in a DSO) model as well):

For a thread local variable defined in the executable, lld computed TLS offset (local exec) is different from glibc computed TLS offset from another module (initial exec/generic dynamic).

Since Ryan's patch affects only executables (if (!Config->Shared && (Config->EMachine == EM_ARM || Config->EMachine == EM_AARCH64)), I assume there is some code in Bionic that fires if the executable doesn't have PT_TLS but one of the shared objects at startup (part of static TLS) has. I think that piece of code can be deleted now.

# crtbegin_{dynamic,static}.o or some other object file that defines _start. I know little about Bionic
.section .tdata,"awT",@progbits
.p2align 6  # 5 for ARM
.zero 1 # delete if you don't care about ld.bfd (ld.bfd strips empty PT_TLS)

.text
.globl _start
_start:
  ...
  .reloc 0, R_AARCH64_NONE, .tdata  # prevent --gc-sections from discarding .tdata

If you don't care about ld.bfd, the zero-sized .tdata is nearly zero overhead :) If the program doesn't use TLS, p_memsz of PT_TLS will be zero: no allocation happens.

srhines accepted this revision.May 17 2019, 7:53 PM

The revert is fine with me, although I still would like Ryan to follow up on verifying that the suggested fix for Bionic will indeed work for all cases. I think it is fine for you to merge now to unbreak things for everyone else though. Thanks for working through the problem.

Just re-confirmed that Ryan is ok to revert D53906! I'll go ahead.

This revision was automatically updated to reflect the committed changes.
ruiu added a comment.May 19 2019, 11:53 PM

I'd vote for re-submitting the original patch with a larger alignment to avoid introducing Android-specific ABI to the output, as I said in a different review thread. Is there any problem to do that?

MaskRay added a comment.EditedMay 20 2019, 12:12 AM

I'd vote for re-submitting the original patch with a larger alignment to avoid introducing Android-specific ABI to the output, as I said in a different review thread. Is there any problem to do that?

@ruiu To fix the hack isn't impossible, but that would make the code more hacky (D61824 and D61825), and some ARM/AArch users (e.g. musl) are not happy. As I described in https://reviews.llvm.org/D62055#1507426, the relevant support is all in place, the nearly zero-overhead workaround in Bionic crtbegin_{dynamic,static}.o should be straightforward now (it is strictly better than the lld patch in all aspects). I don't know why we would like to re-introduce the hack and cause churn to the 8.0.1 release.