Page MenuHomePhabricator

[ELF] Fix "TLS attribute mismatch" false positives for STT_NOTYPE undefined symbols
ClosedPublic

Authored by MaskRay on Apr 18 2020, 4:33 PM.

Details

Summary

D13550 added the diagnostic to address/work around a crash.
The rule was refined by D19836 (test/ELF/tls-archive.s) to exclude Lazy symbols.

https://bugs.llvm.org/show_bug.cgi?id=45598 reported another case where the current logic has a false positive:

Bitcode does not record undefined module-level inline assembly symbols
(IRSymtab.cpp:Builder::addSymbol). Such an undefined symbol does not
have the FB_tls bit and lld will not consider it STT_TLS. When the symbol is
later replaced by a STT_TLS Defined, lld will error "TLS attribute mismatch".

This patch fixes this false positive by allowing a STT_NOTYPE undefined
symbol to be replaced by a STT_TLS.

Considered alternative:

Moving the diagnostics to scanRelocs() can improve the diagnostics (PR36049)
but that requires a fair amount of refactoring. We will need more
RelExpr members. It requires more thoughts whether it is worthwhile.

See test/ELF/tls-mismatch.s for behavior differences. We will fail to
diagnose a likely runtime bug (STT_NOTYPE non-TLS relocation referencing
a TLS definition). This is probably acceptable because compiler
generated code sets symbol types properly.

Diff Detail

Event Timeline

MaskRay created this revision.Apr 18 2020, 4:33 PM
MaskRay edited the summary of this revision. (Show Details)Apr 18 2020, 4:37 PM
MaskRay updated this revision to Diff 258564.Apr 18 2020, 5:05 PM

Improve test

I agree that the diagnostic where it currently lives is not ideal. I'm not comfortable with removing it completely as over the years I've seen several compiler bugs detected before release by this type of error; TLS variables are almost always accessed via C-code so we don't tend to see mistakes unless there is a bug. Is there another place that we can put this warning and get most of the benefit? For example if we added the check in scanRelocs the symbol combination would have completed so if there was a non-TLS symbol reference (via bitcode or assembly) that was replaced by a TLS definition we would accept that. However if the definition were non-TLS we would.

grimar added a comment.EditedApr 20 2020, 2:20 AM

I'd also try to improve it somehow rather than dropping it. (Because of a possible runtime error concern). Perhaps introducing a new option to disable this diagnostic would make sense?
Btw, do you know what is the behavior of modern bfd/gold and/or their plans?

MaskRay added a comment.EditedApr 20 2020, 7:37 AM

For example if we added the check in scanRelocs the symbol combination would have completed so if there was a non-TLS symbol reference (via bitcode or assembly) that was replaced by a TLS definition we would accept that. However if the definition were non-TLS we would.

(1)The example I intend to add to lld/test/ELF/tls-mismatch.s (below) and (2) https://bugs.llvm.org/show_bug.cgi?id=45600 suggests that testing STT_TLS is not suitable.

# RUN: echo '.globl tls1' | llvm-mc -filetype=obj -triple=x86_64 - -o %t2.o
# RUN: ld.lld %t2.o %t.o -o /dev/null

We can add the diagnostic back when (2) is implemented (likely via a new RelExpr member) (My experience is that this diagnostic has not detected anything.)

Btw, do you know what is the behavior of modern bfd/gold and/or their plans?

GNU ld does not check TLS mismatching. gold has a symbol '%s' used as both __thread and non-__thread error but its use case is a bit different (despite the similarity of code).

For example if we added the check in scanRelocs the symbol combination would have completed so if there was a non-TLS symbol reference (via bitcode or assembly) that was replaced by a TLS definition we would accept that. However if the definition were non-TLS we would.

(1)The example I intend to add to lld/test/ELF/tls-mismatch.s (below) and (2) https://bugs.llvm.org/show_bug.cgi?id=45600 suggests that testing STT_TLS is not suitable.

# RUN: echo '.globl tls1' | llvm-mc -filetype=obj -triple=x86_64 - -o %t2.o
# RUN: ld.lld %t2.o %t.o -o /dev/null

We can add the diagnostic back when (2) is implemented (likely via a new RelExpr member) (My experience is that this diagnostic has not detected anything.)

There is at least one that I know of that was caught by a link time error in LLVM D43860 the linker was BFD

/usr/bin/aarch64-linux-gnu-ld: /tmp/test2-67df6a.o(.debug_info+0x37): R_AARCH64_ABS64 used with TLS symbol x

At a glance it looks like that transformation of using an STT_SECTION symbol is not legal in ELF. I can see that an STT_SECTION symbol for a section with SHF_TLS could be considered equivalent to a local STT_TLS symbol, and LLD could be modified to handle it that way; but that isn't what the spec says and other linkers could have special case handling of TLS symbols for STT_TLS symbols and even if they don't error they may not produce correct output. Apologies if I've missed something in my haste.

STT_TLS
The symbol specifies a Thread-Local Storage entity. When defined, it gives the assigned offset for the symbol, not the actual address. Symbols of type STT_TLS can be referenced by only special thread-local storage relocations and thread-local storage relocations can only reference symbols with type STT_TLS. Implementation need not support thread-local storage.

What I was thinking of was something when processing relocations
if (reloc is TLS)

check symbol is TLS and error/warn if not

else

check symbol is not TLS and error/warn if not

By that point we need to know if the symbol is TLS (reference from bitcode should be replaced by TLS definition). That should support the use case in PR45598. Looking at PR36049 I don't think that assigning a TLS variable to an absolute value in a script or defsym makes sense and I don't think BFD or gold are particularly helpful in providing guidance. Assigning a TLS variable to another tls2=tls1 can make sense and Gold and BFD do the right thing and make tls2 STT_TLS. I think we could fix that.

Btw, do you know what is the behavior of modern bfd/gold and/or their plans?

GNU ld does not check TLS mismatching. gold has a symbol '%s' used as both __thread and non-__thread error but its use case is a bit different (despite the similarity of code).

https://bugs.llvm.org/show_bug.cgi?id=21077 suggests that BFD at least used to for AArch64

/usr/bin/aarch64-linux-gnu-ld: /tmp/test2-67df6a.o(.debug_info+0x37): R_AARCH64_ABS64 used with TLS symbol x

A grep for "used with TLS" brings up aarch64, arm, m68k, xtensa, and PPC backends with this error.

MaskRay added a comment.EditedApr 20 2020, 10:05 AM

For example if we added the check in scanRelocs the symbol combination would have completed so if there was a non-TLS symbol reference (via bitcode or assembly) that was replaced by a TLS definition we would accept that. However if the definition were non-TLS we would.

(1)The example I intend to add to lld/test/ELF/tls-mismatch.s (below) and (2) https://bugs.llvm.org/show_bug.cgi?id=45600 suggests that testing STT_TLS is not suitable.

# RUN: echo '.globl tls1' | llvm-mc -filetype=obj -triple=x86_64 - -o %t2.o
# RUN: ld.lld %t2.o %t.o -o /dev/null

We can add the diagnostic back when (2) is implemented (likely via a new RelExpr member) (My experience is that this diagnostic has not detected anything.)

There is at least one that I know of that was caught by a link time error in LLVM D43860 the linker was BFD

/usr/bin/aarch64-linux-gnu-ld: /tmp/test2-67df6a.o(.debug_info+0x37): R_AARCH64_ABS64 used with TLS symbol x

At a glance it looks like that transformation of using an STT_SECTION symbol is not legal in ELF. I can see that an STT_SECTION symbol for a section with SHF_TLS could be considered equivalent to a local STT_TLS symbol, and LLD could be modified to handle it that way; but that isn't what the spec says and other linkers could have special case handling of TLS symbols for STT_TLS symbols and even if they don't error they may not produce correct output. Apologies if I've missed something in my haste.

STT_TLS
The symbol specifies a Thread-Local Storage entity. When defined, it gives the assigned offset for the symbol, not the actual address. Symbols of type STT_TLS can be referenced by only special thread-local storage relocations and thread-local storage relocations can only reference symbols with type STT_TLS. Implementation need not support thread-local storage.

A few parties agreed to defer and thread-local storage relocations can only reference symbols with type STT_TLS to psABIs (https://groups.google.com/g/generic-abi search "TLS relocations referencing section symbols").

Rethinking about this. We can allow STT_NOTYPE, STT_SECTION and STT_TLS symbol to be overridden by a STT_TLS definition.

What I was thinking of was something when processing relocations
if (reloc is TLS)

check symbol is TLS and error/warn if not

else

check symbol is not TLS and error/warn if not

By that point we need to know if the symbol is TLS (reference from bitcode should be replaced by TLS definition). That should support the use case in PR45598. Looking at PR36049 I don't think that assigning a TLS variable to an absolute value in a script or defsym makes sense and I don't think BFD or gold are particularly helpful in providing guidance. Assigning a TLS variable to another tls2=tls1 can make sense and Gold and BFD do the right thing and make tls2 STT_TLS. I think we could fix that.

Incorporating the amendment to the ELF specification, scanRelocs can do:

if (reloc is TLS)
  error if symbol is neither of STT_TLS, STT_SECTION, STT_NOTYPE
else
  error if symbol is STT_TLS

This can improve the diagnostics because scanRelocs knows where the symbol is used (https://bugs.llvm.org/show_bug.cgi?id=36049 "Mismatched TLS error could be better").

Due to the way we handle TLS relocations: not everything is handled in handleTlsRelocation, more precisely, R_TLS/R_NEG_TLS/R_DTPREL/some needsGot relocations/R_TLSLD_PC / some R_ARM_TLS_LDO32 are not handled.

Moving the diagnostic to scanRelocs() requires some refactoring.

Btw, do you know what is the behavior of modern bfd/gold and/or their plans?

GNU ld does not check TLS mismatching. gold has a symbol '%s' used as both __thread and non-__thread error but its use case is a bit different (despite the similarity of code).

https://bugs.llvm.org/show_bug.cgi?id=21077 suggests that BFD at least used to for AArch64

/usr/bin/aarch64-linux-gnu-ld: /tmp/test2-67df6a.o(.debug_info+0x37): R_AARCH64_ABS64 used with TLS symbol x

A grep for "used with TLS" brings up aarch64, arm, m68k, xtensa, and PPC backends with this error.

Thanks for checking. I just noticed that the x86 backend does not have the diagnostic. The backend differences have been consistently causing problems.

https://sourceware.org/pipermail/binutils/2020-March/000358.html "Current BFD linker delegates most, if not all, of relocation to each backend. There are many similar codes in backends. I am sharing as much codes between i386 and x86-64 backends as I can."

Thanks for the update. If the gABI is going in that direction (let's hope Cary can get the new version in a place where it can be updated fairly soon) then I'm happy about that part. I think that if we are able to keep the error where it makes sense such as in scanRelocs then I'm happy to remove it from where it is currently, which I agree is not ideally placed to avoid false positives.

I think the link to the gABI is https://groups.google.com/forum/#!topic/generic-abi/dJ4_Y78aQ2M I think the m/WKggp9fKCgAJ part is specific to your login.

MaskRay added a comment.EditedApr 20 2020, 10:57 AM

Thanks for the update. If the gABI is going in that direction (let's hope Cary can get the new version in a place where it can be updated fairly soon) then I'm happy about that part. I think that if we are able to keep the error where it makes sense such as in scanRelocs then I'm happy to remove it from where it is currently, which I agree is not ideally placed to avoid false positives.

I think the link to the gABI is https://groups.google.com/forum/#!topic/generic-abi/dJ4_Y78aQ2M I think the m/WKggp9fKCgAJ part is specific to your login.

Sorry, I just edited my comment (after re-reading TLS relocations referencing section symbols). The new content is very different from when I made the comment initially.

If we are going to retain the "TLS mismatch" diagnostics in Symbols.h, we need to preclude STT_TLS, STT_SECTION (reasonable symbols types from regular object files), STT_NOTYPE (from bitcode module-level inline assembly).

If we are going to move the "TLS mismatch" diagnostics to scanRelocs. We need to refactor handleTlsRelocations (which requires a considerable amount of work), we may need to split some RelExpr members (R_GOT_PC -> non-TLS / TLS). We will hit the the limit of 64 RelExpr members. We can have location information for https://bugs.llvm.org/show_bug.cgi?id=36049

MaskRay updated this revision to Diff 258799.Apr 20 2020, 11:10 AM
MaskRay retitled this revision from [ELF] Delete "TLS attribute mismatch" diagnostics to [ELF] Temporarily delete "TLS attribute mismatch" diagnostics.
MaskRay edited the summary of this revision. (Show Details)

Edit description

MaskRay updated this revision to Diff 258800.Apr 20 2020, 11:17 AM
MaskRay edited the summary of this revision. (Show Details)
MaskRay removed a subscriber: dexonsmith.

Edit description

Harbormaster completed remote builds in B53988: Diff 258800.
MaskRay updated this revision to Diff 258907.EditedApr 20 2020, 11:24 PM
MaskRay retitled this revision from [ELF] Temporarily delete "TLS attribute mismatch" diagnostics to [ELF] Fix "TLS attribute mismatch" false positives for STT_NOTYPE undefined symbols.
MaskRay edited the summary of this revision. (Show Details)

After more thoughts, I think we should retain the diagnostic but add an exception that STT_NOTYPE undefined symbol should be allowed to be replaced.

The existing wording is not ideal because TLS relocations should support some non-TLS symbols, e.g. STT_SECTION, and probably also STT_NOTYPE undefined.

Doing the diagnostic in scanRelocs may be overkill.

In GNU ld, bfd/elflink.c _bfd_elf_merge_symbol detects several TLS mismatch problems.

      if (tdef && ntdef)
	_bfd_error_handler
	  /* xgettext:c-format */
	  (_("%s: TLS definition in %pB section %pA "
	     "mismatches non-TLS definition in %pB section %pA"),
	   h->root.root.string, tbfd, tsec, ntbfd, ntsec);
      else if (!tdef && !ntdef)
	_bfd_error_handler
	  /* xgettext:c-format */
	  (_("%s: TLS reference in %pB "
	     "mismatches non-TLS reference in %pB"),
	   h->root.root.string, tbfd, ntbfd);
      else if (tdef)
	_bfd_error_handler
	  /* xgettext:c-format */
	  (_("%s: TLS definition in %pB section %pA "
	     "mismatches non-TLS reference in %pB"),
	   h->root.root.string, tbfd, tsec, ntbfd);
      else
	_bfd_error_handler
	  /* xgettext:c-format */
	  (_("%s: TLS reference in %pB "
	     "mismatches non-TLS definition in %pB section %pA"),
	   h->root.root.string, tbfd, ntbfd, ntsec);

If we are going to improve the diagnostics (PR36049), we should extract the code from Symbol::replace to Symbol::mergeProperties. tls-mismatch.s has a test case where we currently fail to flag an incorrect use case.

tls-in-archive.s and tls-weak-undef.s contain two other interesting use cases which we should not error. I have not unified them into tls-mismatch.s yet.

MaskRay updated this revision to Diff 258908.Apr 20 2020, 11:30 PM

Fix a comment

psmith accepted this revision.Apr 21 2020, 12:26 AM

Thanks for the update, I'm much more comfortable with this version.

This revision is now accepted and ready to land.Apr 21 2020, 12:26 AM
grimar accepted this revision.Apr 21 2020, 2:05 AM

Looks safer to me too.

This revision was automatically updated to reflect the committed changes.