Page MenuHomePhabricator

[ELF] Resolve R_DTPREL in .debug_* referencing discarded symbols to -1
ClosedPublic

Authored by MaskRay on Jun 30 2020, 11:38 AM.

Details

Summary

The location of a TLS variable is encoded as a DW_OP_const4u/DW_OP_const8u
followed by a DW_OP_push_tls_address (or DW_OP_GNU_push_tls_address https://sourceware.org/bugzilla/show_bug.cgi?id=11616 ).

This change follows up to D81784 and makes relocations types generalized as
R_DTPREL (e.g. R_X86_64_DTPOFF{32,64}, R_PPC64_DTPREL64) use -1 as the
tombstone value as well. This works for both TLS Variant I and Variant II
architectures.

  • arm: .long tls(tlsldo) # not working currently (R_ARM_TLS_LDO32 is R_ABS)
  • mips64: .dtpreldword tls+32768
  • ppc64: .quad tls@DTPREL+0x8000
  • riscv: neither GCC nor clang has implemented DW_AT_location. It is likely .long/.quad tls@dtprel+0x800
  • x86-32: .long tls@DTPOFF
  • x86-64: .long tls@DTPOFF; .quad tls@DTPOFF

tls has a non-negative st_value, so such relocations (st_value+addend)
never resolve to -1 in a normal (not discarded) case.

// clang -fuse-ld=lld -g -ffunction-sections a.c -Wl,--gc-sections
// foo and tls will be discarded by --gc-sections.
// DW_AT_location [DW_FORM_exprloc] (DW_OP_const8u 0xffffffffffffffff, DW_OP_GNU_push_tls_address)
thread_local int tls;
int foo() { return ++tls; }
int main() {}

Also, drop logic added in D26201 intended to address PR30793. It added a test
(gc-debuginfo-tls.s) using a non-SHF_ALLOC section and a local symbol, which
does not reflect the intended scenario: a relocation in a SHF_ALLOC section
referencing a discarded non-local symbol. For such a non .debug_* section, just
emit an error.

Diff Detail

Event Timeline

MaskRay created this revision.Jun 30 2020, 11:38 AM
Herald added a project: Restricted Project. · View Herald Transcript
MaskRay updated this revision to Diff 274578.Jun 30 2020, 12:07 PM

Delete gc-debuginfo-tls.s (the test has nothing to do with TLS so the name is inappropriate) and add gc-sections-tls.s

MaskRay updated this revision to Diff 274580.Jun 30 2020, 12:11 PM

Improve test

Sorry, I don't follow why we can't just use this approach for all relocations referencing dead allocatable code/data? We should probably only get certain relocations, but I could see us either a) easily missing another relevant one, or b) not fixing up things when a new relocation gets introduced in the future.

The testing of the change looks fine to me.

lld/ELF/InputSection.cpp
922

typed -> type

Sorry, I don't follow why we can't just use this approach for all relocations referencing dead allocatable code/data? We should probably only get certain relocations, but I could see us either a) easily missing another relevant one, or b) not fixing up things when a new relocation gets introduced in the future.

To add to this, reading up on the TLS documentation, at least for X86_64, it is possible to sometimes use R_X86_64_32 instead of R_X86_64_DTPOFF32, although I don't know if that situation would ever occur in debug data.

psmith added inline comments.Jul 1 2020, 1:36 PM
lld/ELF/InputSection.cpp
910

At the moment this won't work for Arm. The debug relocation is R_ARM_TLS_LDO32 which currently maps to the expr R_ABS. I don't know whether that is a problem in ARM.cpp. The only difference I can see is that R_DTPREL is relaxed which we don't want to do on Arm. It maybe that it is best to make R_ARM_TLS_LDO32 (S + A - TLS) to R_DTPREL and then prevent the relaxation for Arm as it is a bit more obvious.

I'll try and run some tests tomorrow.

MaskRay added a comment.EditedJul 1 2020, 2:25 PM

Sorry, I don't follow why we can't just use this approach for all relocations referencing dead allocatable code/data? We should probably only get certain relocations, or b) not fixing up things when a new relocation gets introduced in the future.

If we want to support R_X86_64_32, we'll need to truncate UINT64_MAX (R_X86_64_32 calls checkUInt(..., 32), UINT64_MAX will cause a relocation error), i.e. we'll need some code somewhere. I'd rather do it with existing type == target->symbolicRel.

but I could see us either a) easily missing another relevant one

No, there is not a need for another R_ABS relocation. For new relocation types, we should evaluate them, rather than being loose and using -1 for every new relocation type.

To add to this, reading up on the TLS documentation, at least for X86_64, it is possible to sometimes use R_X86_64_32 instead of R_X86_64_DTPOFF32, although I don't know if that situation would ever occur in debug data.

R_X86_64_{32,64} referencing a STT_TLS is incorrect.

  • R_X86_64_64: load_base + st_value + r_addend
  • R_X86_64_DTPOFFSET{32,64}: st_value + r_addend - DTP_OFFSET

DTP_OFFSET is musl term. glibc calls it TLS_DTV_OFFSET. On x86-64, DTP_OFFSET is 0 so mixing is correct if

source: http://git.musl-libc.org/cgit/musl/tree/ldso/dynlink.c#n454 I am confident with the code as I've reported a musl issue in this area and fixed a few TLS bugs in LLD.

MaskRay edited the summary of this revision. (Show Details)Jul 1 2020, 2:25 PM
MaskRay marked an inline comment as done.Jul 1 2020, 2:28 PM
MaskRay added inline comments.
lld/ELF/InputSection.cpp
910

Thanks for spotting this. R_DTPREL will work. ARM does not have a DTP offset (like ppc/mips/m68k) (TLS_DTV_OFFSET in glibc)

Added:

arm: .long tls(tlsldo) # not working currently (R_ARM_TLS_LDO32 is R_ABS)

Sorry, I don't follow why we can't just use this approach for all relocations referencing dead allocatable code/data? We should probably only get certain relocations, or b) not fixing up things when a new relocation gets introduced in the future.

If we want to support R_X86_64_32, we'll need to truncate UINT64_MAX (R_X86_64_32 calls checkUInt(..., 32), UINT64_MAX will cause a relocation error), i.e. we'll need some code somewhere. I'd rather do it with existing type == target->symbolicRel.

I see your point, but isn't the truncation going to be required for R_X86_64_DTPOFF32 anyway? Related aside: could you make sure that code path is tested too, please, since it's a real use-case, and highlighted a bug in my local attempt at fixing what this patch fixes. More generally, if you add the truncation code, you could drop the relocation check entirely, so you're not really changing the code complexity, whilst improving the code robustness in my opinion.

but I could see us either a) easily missing another relevant one

No, there is not a need for another R_ABS relocation. For new relocation types, we should evaluate them, rather than being loose and using -1 for every new relocation type.

To add to this, reading up on the TLS documentation, at least for X86_64, it is possible to sometimes use R_X86_64_32 instead of R_X86_64_DTPOFF32, although I don't know if that situation would ever occur in debug data.

R_X86_64_{32,64} referencing a STT_TLS is incorrect.

That statement is not the same as my understanding of the ps-ABI doc though. See page 76 of https://github.com/hjl-tools/x86-psABI/wiki/x86-64-psABI-1.0.pdf:

R_X86_64_DTPOFF64 and R_X86_64_DTPOFF32 compute the offset from the pointer in that entry to the referenced symbol. The linker generates such relocations in adjacent entries in the GOT, in response to R_X86_64_TLSGD and R_X86_64_TLSLD relocations. If the linker can compute the offset itself, because the referenced symbol binds locally, the relocations R_X86_64_64 and R_X86_64_32 may be used instead.

My reading of this says if a compiler knows that the symbol would be locally bound (presumably because the symbol is static/in an anonymous namespace?), it can emit an R_X86_64_32 instead of R_X86_64_DTPOFF32. That being said, although I could get gcc to emit a DTPOFF32 relocation, I wasn't able to persuade it to emit an ABS 32 relocation for the same case.

  • R_X86_64_64: load_base + st_value + r_addend
  • R_X86_64_DTPOFFSET{32,64}: st_value + r_addend - DTP_OFFSET

DTP_OFFSET is musl term. glibc calls it TLS_DTV_OFFSET. On x86-64, DTP_OFFSET is 0 so mixing is correct if

This sentence is unfinished?

MaskRay added a comment.EditedJul 2 2020, 9:31 AM

Sorry, I don't follow why we can't just use this approach for all relocations referencing dead allocatable code/data? We should probably only get certain relocations, or b) not fixing up things when a new relocation gets introduced in the future.

If we want to support R_X86_64_32, we'll need to truncate UINT64_MAX (R_X86_64_32 calls checkUInt(..., 32), UINT64_MAX will cause a relocation error), i.e. we'll need some code somewhere. I'd rather do it with existing type == target->symbolicRel.

I see your point, but isn't the truncation going to be required for R_X86_64_DTPOFF32 anyway? Related aside: could you make sure that code path is tested too, please, since it's a real use-case, and highlighted a bug in my local attempt at fixing what this patch fixes. More generally, if you add the truncation code, you could drop the relocation check entirely, so you're not really changing the code complexity, whilst improving the code robustness in my opinion.

An R_ABS relocation is conceptually unsigned (representing an address) while R_DTPREL can be considered signed (an offset added to dtpmod). R_X86_64_DTPOFF32 does not need truncation because it does not call checkUInt32.

It is tested. See .long global@dtpoff

Note that GNU ld does not allow .long foo - 1 (R_X86_64_32) if foo is 0. LLD's checkUInt32 is compatible with GNU ld.

but I could see us either a) easily missing another relevant one

No, there is not a need for another R_ABS relocation. For new relocation types, we should evaluate them, rather than being loose and using -1 for every new relocation type.

To add to this, reading up on the TLS documentation, at least for X86_64, it is possible to sometimes use R_X86_64_32 instead of R_X86_64_DTPOFF32, although I don't know if that situation would ever occur in debug data.

R_X86_64_{32,64} referencing a STT_TLS is incorrect.

That statement is not the same as my understanding of the ps-ABI doc though. See page 76 of https://github.com/hjl-tools/x86-psABI/wiki/x86-64-psABI-1.0.pdf:

R_X86_64_DTPOFF64 and R_X86_64_DTPOFF32 compute the offset from the pointer in that entry to the referenced symbol. The linker generates such relocations in adjacent entries in the GOT, in response to R_X86_64_TLSGD and R_X86_64_TLSLD relocations. If the linker can compute the offset itself, because the referenced symbol binds locally, the relocations R_X86_64_64 and R_X86_64_32 may be used instead.

My reading of this says if a compiler knows that the symbol would be locally bound (presumably because the symbol is static/in an anonymous namespace?), it can emit an R_X86_64_32 instead of R_X86_64_DTPOFF32. That being said, although I could get gcc to emit a DTPOFF32 relocation, I wasn't able to persuade it to emit an ABS 32 relocation for the same case.

This paragraph describes traditional Generic Dynamic/Local Dynamic behavior. __tls_get_addr({dtpmod, st_value-DTP_OFFSET}) = __tls_get_addr({dtpmod, 0}) + st_value-DTP_OFFSET

If __tls_get_addr({dtpmod, 0}) has been precomputed, you can add st_value - DTP_OFFSET to the value to get the address of the TLS variable. To get st_value, you can use either R_X86_64_32 or R_X86_64_DTPOFF32.
R_X86_64_DTPOFF32 can be used because on x86, DTP_OFFSET is zero.

.debug_info DW_OP_GNU_push_tls_address/DW_OP_form_tls_address is a different context. I think it is conceptually incorrectly to say that R_X86_64_32 can be used for its operand. GCC/clang do not emit R_X86_64_32.

  • R_X86_64_64: load_base + st_value + r_addend
  • R_X86_64_DTPOFFSET{32,64}: st_value + r_addend - DTP_OFFSET

DTP_OFFSET is musl term. glibc calls it TLS_DTV_OFFSET. On x86-64, DTP_OFFSET is 0 so mixing is correct if

This sentence is unfinished?

On x86-64, DTP_OFFSET is 0 so mixing can still produce correct results.

jhenderson accepted this revision.Jul 3 2020, 1:13 AM

Thanks for the explanation, LGTM, but I'd appreciate somebody else confirming things too, as I don't really follow the stuff about the other targets.

This revision is now accepted and ready to land.Jul 3 2020, 1:13 AM
psmith added a comment.Jul 3 2020, 5:28 AM

I'm ok with this as is. I'll need to check over what should be happening on Arm, but I think that can be done in a separate change.

MaskRay updated this revision to Diff 275418.Jul 3 2020, 9:43 AM
MaskRay marked an inline comment as done.
MaskRay edited the summary of this revision. (Show Details)

Adjust comment

MaskRay updated this revision to Diff 275420.Jul 3 2020, 9:50 AM

Add debug-dead-reloc-tls-arm.s

This revision was automatically updated to reflect the committed changes.