This is an archive of the discontinued LLVM Phabricator instance.

[ELF] Make dot in .tbss correct
ClosedPublic

Authored by MaskRay on Jul 30 2021, 7:22 PM.

Details

Summary

GNU ld doesn't support multiple SHF_TLS SHT_NOBITS output sections (it restores
the address after an SHF_TLS SHT_NOBITS section, so consecutive SHF_TLS
SHT_NOBITS sections will have conflicting address ranges).

That said, threadBssOffset implements limited support for consecutive SHF_TLS
SHT_NOBITS sections. (SHF_TLS SHT_PROGBITS following a SHF_TLS SHT_NOBITS can still be
incorrect.)

. in an output section description of an SHF_TLS SHT_NOBITS section is
incorrect. (https://lists.llvm.org/pipermail/llvm-dev/2021-July/151974.html)

This patch saves the end address of the previous tbss section in
ctx->tbssAddr, changes dot in the beginning of assignOffset so
that . evaluation will be correct.

Diff Detail

Event Timeline

MaskRay created this revision.Jul 30 2021, 7:22 PM
MaskRay requested review of this revision.Jul 30 2021, 7:22 PM
Herald added a project: Restricted Project. · View Herald TranscriptJul 30 2021, 7:22 PM
MaskRay updated this revision to Diff 363254.Jul 30 2021, 7:29 PM

accidentally lost LinkerScript.h change

MaskRay edited the summary of this revision. (Show Details)Jul 30 2021, 7:31 PM
noah95 added a subscriber: noah95.Aug 1 2021, 1:12 AM

Just got back from a vacation, will try and take a look as soon as I can, I've got this D107235, and D107234 on the to-do list.

I'm a little nervous about this change as is. I think that if we only support one SHT_NOBITS SHF_TLS OutputSection then we should give an error message if there is more than one. Perhaps detect this in the PT_TLS output code. My main concern is about the the internal default (no linker script) and orphan TLS sections that don't have the .tbss. prefix. IIUC these will be placed after .tbss and will be covered by the PT_TLS program header but we will silently give these the wrong address. I'm sure these will be rare, but without an error message we may silently break some programs. I'm not as concerned about the linker script case as the onus is on the author to get the patterns right so that there are no orphans.

As a thought, could we keep with a slight modification?

bool isTbss =
    (ctx->outSec->flags & SHF_TLS) && ctx->outSec->type == SHT_NOBITS;
uint64_t start = isTbss ? dot + ctx->threadBssOffset : dot;
start = alignTo(start, alignment);
uint64_t end = start + size;

if (isTbss)
  ctx->threadBssOffset = end - dot;

dot = end;
return end;

In this way we set dot, which should permit the linker script to define symbols within the same output section, but dot will then get reset by your new code

// NOBITS TLS sections reset the location as well. Having another TLS section
// below sec is unsupported and regarded as user error.

so the threadBssOffset should be as it was previously.

MaskRay updated this revision to Diff 363813.Aug 3 2021, 11:19 AM
MaskRay retitled this revision from [ELF] Make dot in .tbss correct and drop threadBssOffset to [ELF] Make dot in .tbss correct.
MaskRay edited the summary of this revision. (Show Details)

address comments

I'm a little nervous about this change as is. I think that if we only support one SHT_NOBITS SHF_TLS OutputSection then we should give an error message if there is more than one. Perhaps detect this in the PT_TLS output code. My main concern is about the the internal default (no linker script) and orphan TLS sections that don't have the .tbss. prefix. IIUC these will be placed after .tbss and will be covered by the PT_TLS program header but we will silently give these the wrong address. I'm sure these will be rare, but without an error message we may silently break some programs. I'm not as concerned about the linker script case as the onus is on the author to get the patterns right so that there are no orphans.

LG to keep the multiple tbss mechanism.

As a thought, could we keep with a slight modification?

bool isTbss =
    (ctx->outSec->flags & SHF_TLS) && ctx->outSec->type == SHT_NOBITS;
uint64_t start = isTbss ? dot + ctx->threadBssOffset : dot;
start = alignTo(start, alignment);
uint64_t end = start + size;

if (isTbss)
  ctx->threadBssOffset = end - dot;

dot = end;
return end;

In this way we set dot, which should permit the linker script to define symbols within the same output section, but dot will then get reset by your new code

// NOBITS TLS sections reset the location as well. Having another TLS section
// below sec is unsupported and regarded as user error.

so the threadBssOffset should be as it was previously.

LinkerScript::getSymbolValue does not read ctx->threadBssOffset. I tried repairing it (see below) but ended up with the current diff

 ExprValue LinkerScript::getSymbolValue(StringRef name, const Twine &loc) {
   if (name == ".") {
-    if (ctx)
-      return {ctx->outSec, false, dot - ctx->outSec->addr, loc};
-    error(loc + ": unable to get location counter value");
-    return 0;
+    if (!ctx) {
+      error(loc + ": unable to get location counter value");
+      return 0;
+    }
+    if ((ctx->outSec->flags & SHF_TLS) && ctx->outSec->type == SHT_NOBITS) {
+      uint64_t offset =
+          ctx->threadBssOffset -
+          (Out::tlsPhdr ? ctx->outSec->addr - Out::tlsPhdr->firstSec->addr : 0);
+      return {ctx->outSec, false, offset, loc};
+    }
+    return {ctx->outSec, false, dot - ctx->outSec->addr, loc};
   }
peter.smith accepted this revision.Aug 4 2021, 1:19 AM

Thanks for the update, LGTM.

This revision is now accepted and ready to land.Aug 4 2021, 1:19 AM
This revision was automatically updated to reflect the committed changes.