This is an archive of the discontinued LLVM Phabricator instance.

[JITLink] Add initial native TLS support to ELFNix platform
ClosedPublic

Authored by StephenFan on Sep 5 2021, 9:05 AM.

Details

Summary

This patch use the same way as the https://reviews.llvm.org/rGfe1fa43f16beac1506a2e73a9f7b3c81179744eb to handle the thread local variable.

It allocates 2 * pointerSize space in GOT to represent the thread key and data address. Instead of using the _tls_get_addr function, I customed a function __orc_rt_elfnix_tls_get_addr to get the address of thread local varible. Currently, this is a wip patch, only one TLS relocation R_X86_64_TLSGD is supported and I need to add the corresponding test cases.

To allocate the TLS descriptor in GOT, I need to get the edge kind information in PerGraphGOTAndPLTStubBuilder, So I add a Edge::Kind K argument in some functions in PerGraphGOTAndPLTStubBuilder.h. If it is not suitable, I can think further to solve this problem.

Diff Detail

Event Timeline

StephenFan created this revision.Sep 5 2021, 9:05 AM
StephenFan requested review of this revision.Sep 5 2021, 9:05 AM
Herald added projects: Restricted Project, Restricted Project. · View Herald TranscriptSep 5 2021, 9:05 AM
Herald added subscribers: llvm-commits, Restricted Project. · View Herald Transcript
StephenFan edited the summary of this revision. (Show Details)Sep 5 2021, 9:12 AM
StephenFan updated this revision to Diff 370812.Sep 5 2021, 9:24 AM

Improve comments

MaskRay added a subscriber: MaskRay.Sep 5 2021, 9:59 AM

TLS descriptors refer to an alternative ABI to traditional general dynamic/local dynamic TLS models. The name cannot be repurposed to a different usage.

lhames added a comment.Sep 5 2021, 3:49 PM

To allocate the TLS descriptor in GOT, I need to get the edge kind information in PerGraphGOTAndPLTStubBuilder, So I add a Edge::Kind K argument in some functions in PerGraphGOTAndPLTStubBuilder.h. If it is not
suitable, I can think further to solve this problem.

Is there a good reason to put these in the GOT? I would create and manage a different section for these, rather than re-using the GOT.

Side note: Re-using the GOT in MachO is safe, but was a bit lazy. We could probably come up with a generic TableSection<T> utility that we could re-use for GOTs, PLTs, and these new thread data structures.

TLS descriptors refer to an alternative ABI to traditional general dynamic/local dynamic TLS models. The name cannot be repurposed to a different usage.

LinkGraphs are not ELF graphs so we're free to redefine terms to a certain extent, but I agree that it's good to avoid confusion where possible. Do you have a a suggested alternative?

compiler-rt/lib/orc/elfnix_platform.cpp
113

The capitalization of THread should be fixed here.

256

The capitalization of THread should be fixed here too.

MoritzS added a subscriber: MoritzS.Sep 6 2021, 1:40 AM

To allocate the TLS descriptor in GOT, I need to get the edge kind information in PerGraphGOTAndPLTStubBuilder, So I add a Edge::Kind K argument in some functions in PerGraphGOTAndPLTStubBuilder.h. If it is not
suitable, I can think further to solve this problem.

Is there a good reason to put these in the GOT? I would create and manage a different section for these, rather than re-using the GOT.

The ELF TLS spec actually describes using the GOT for that. Each TLSGD relocation is usually converted into two adjacent GOT entries with the DTPMOD64 and DTPOFF64 relocations. I think the practical reason for that is that a TLSGD relocation is 32-bit PC-relative, so using an address of the GOT will guarantee that the 32 bit offset can represent the address.

In general I would suggest not trying to resolve a TLSGD/LD relocation at all but instead converting it to a GOTTPOFF relocation. Then, no runtime function to implement __tls_get_addr is needed. You can find the detailed description of how to do this in Section 5.5 of the ELF TLS spec (https://akkadia.org/drepper/tls.pdf).

I implemented this for RuntimeDyld (D105466) so you can take a look there. Let me know where I can help!

lhames added a comment.Sep 6 2021, 6:00 AM

The ELF TLS spec actually describes using the GOT for that.

I need to find some time to read the ELF TLS spec. Unless use of the GOT is required (which seems unlikely) I'd rather put these in their own section. In the unlikely event that they really need to be mixed with GOT entries we should spin out an ELF-specific GOT pass to handle this.

I think the practical reason for that is that a TLSGD relocation is 32-bit PC-relative, so using an address of the GOT will guarantee that the 32 bit offset can represent the address.

We can count on JITLink (and the memory manager) to do this for us. It's designed to address RuntimeDyld::MemoryManager's shortcomings in that regard.

In general I would suggest not trying to resolve a TLSGD/LD relocation at all but instead converting it to a GOTTPOFF relocation. Then, no runtime function to implement __tls_get_addr is needed.

Is that compatible with adding extra code at runtime? I suspect we'll need to go the other way: convert things to function calls by default, but make it possible to use direct models where they're safe. That's speculation having not read the TLS spec though.

In general I would suggest not trying to resolve a TLSGD/LD relocation at all but instead converting it to a GOTTPOFF relocation. Then, no runtime function to implement __tls_get_addr is needed.

Is that compatible with adding extra code at runtime? I suspect we'll need to go the other way: convert things to function calls by default, but make it possible to use direct models where they're safe. That's speculation having not read the TLS spec though.

That depends on how the allocation of TLS sections is implemented, but in general you are right, I didn't think about that.

The problem I faced when implementing the TLS relocations is that we can't chose which types of TLS relocations the object files we try to link use. This means that we need to implement the GOTTPOFF/TPOFF relocations that do not go through the indirection of __tls_get_addr. They just resolve to an offset that will be added to the value stored in %fs:0. This is usually set initially by ld.so when a process starts by using arch_prctl(ARCH_SET_FS), or for newer CPUS by using the wrfsbase instruction. This means that we can't easily extend the storage of the process that runs the runtime linker to add new TLS sections since messing with the fs register will very likely lead to unexpected results. There are several possible solutions for that with different disadvantages:

  1. Don't allow GOTTPOFF/TPOFF relocations at all. This means that we will not be able to link many pre-compiled static libraries.
  2. Allow all TLS relocations. Since we can't add new TLS storage at runtime, define a global thread-local variable which will then be used by the linker to resolve GOTTPOFF/TPOFF relocations.
  3. Rewrite the relocated code so that it does not use the fs register. On Linux we could probably replace all references to fs by gs and then manually set the gs register? I'm not sure if this can break code if we change instructions that did not originally have a relocation.

I implemented 2. in llvm-rtdyld. Since this is mainly used for testing, the size of the pre-allocated TLS section is only 16 B. In a "real" program you could easily allocate an entire page without noticing any runtime overhead. This is more than enough to link an entire static glibc, for example.

In general I would suggest not trying to resolve a TLSGD/LD relocation at all but instead converting it to a GOTTPOFF relocation. Then, no runtime function to implement __tls_get_addr is needed.

Is that compatible with adding extra code at runtime? I suspect we'll need to go the other way: convert things to function calls by default, but make it possible to use direct models where they're safe. That's speculation having not read the TLS spec though.

That depends on how the allocation of TLS sections is implemented, but in general you are right, I didn't think about that.

The problem I faced when implementing the TLS relocations is that we can't chose which types of TLS relocations the object files we try to link use. This means that we need to implement the GOTTPOFF/TPOFF relocations that do not go through the indirection of __tls_get_addr. They just resolve to an offset that will be added to the value stored in %fs:0. This is usually set initially by ld.so when a process starts by using arch_prctl(ARCH_SET_FS), or for newer CPUS by using the wrfsbase instruction. This means that we can't easily extend the storage of the process that runs the runtime linker to add new TLS sections since messing with the fs register will very likely lead to unexpected results. There are several possible solutions for that with different disadvantages:

  1. Don't allow GOTTPOFF/TPOFF relocations at all. This means that we will not be able to link many pre-compiled static libraries.
  2. Allow all TLS relocations. Since we can't add new TLS storage at runtime, define a global thread-local variable which will then be used by the linker to resolve GOTTPOFF/TPOFF relocations.
  3. Rewrite the relocated code so that it does not use the fs register. On Linux we could probably replace all references to fs by gs and then manually set the gs register? I'm not sure if this can break code if we change instructions that did not originally have a relocation.

I implemented 2. in llvm-rtdyld. Since this is mainly used for testing, the size of the pre-allocated TLS section is only 16 B. In a "real" program you could easily allocate an entire page without noticing any runtime overhead. This is more than enough to link an entire static glibc, for example.

The point I wanted to make here is that if we go for 2. (or 3.), we might as well convert all TLSGD/LD relocations to TPOFF. We would implement all the logic for the TPOFF relocations anyway and if we assume that the pre-allocated TLS storage is large enough to link all object files, this will lead to less runtime overhead to access TLS variables.

To allocate the TLS descriptor in GOT, I need to get the edge kind information in PerGraphGOTAndPLTStubBuilder, So I add a Edge::Kind K argument in some functions in PerGraphGOTAndPLTStubBuilder.h. If it is not
suitable, I can think further to solve this problem.

Is there a good reason to put these in the GOT? I would create and manage a different section for these, rather than re-using the GOT.

Put these in the GOT is not necessary. I agree with you that we need to create and manage a different section for these.

Side note: Re-using the GOT in MachO is safe, but was a bit lazy. We could probably come up with a generic TableSection<T> utility that we could re-use for GOTs, PLTs, and these new thread data structures.

Emm, I don't know how to come up with a generic TableSection<T> utility. What does the generic type `T' represents?

In general I would suggest not trying to resolve a TLSGD/LD relocation at all but instead converting it to a GOTTPOFF relocation. Then, no runtime function to implement __tls_get_addr is needed.

Is that compatible with adding extra code at runtime? I suspect we'll need to go the other way: convert things to function calls by default, but make it possible to use direct models where they're safe. That's speculation having not read the TLS spec though.

That depends on how the allocation of TLS sections is implemented, but in general you are right, I didn't think about that.

The problem I faced when implementing the TLS relocations is that we can't chose which types of TLS relocations the object files we try to link use. This means that we need to implement the GOTTPOFF/TPOFF relocations that do not go through the indirection of __tls_get_addr. They just resolve to an offset that will be added to the value stored in %fs:0. This is usually set initially by ld.so when a process starts by using arch_prctl(ARCH_SET_FS), or for newer CPUS by using the wrfsbase instruction. This means that we can't easily extend the storage of the process that runs the runtime linker to add new TLS sections since messing with the fs register will very likely lead to unexpected results. There are several possible solutions for that with different disadvantages:

  1. Don't allow GOTTPOFF/TPOFF relocations at all. This means that we will not be able to link many pre-compiled static libraries.
  2. Allow all TLS relocations. Since we can't add new TLS storage at runtime, define a global thread-local variable which will then be used by the linker to resolve GOTTPOFF/TPOFF relocations.
  3. Rewrite the relocated code so that it does not use the fs register. On Linux we could probably replace all references to fs by gs and then manually set the gs register? I'm not sure if this can break code if we change instructions that did not originally have a relocation.

I implemented 2. in llvm-rtdyld. Since this is mainly used for testing, the size of the pre-allocated TLS section is only 16 B. In a "real" program you could easily allocate an entire page without noticing any runtime overhead. This is more than enough to link an entire static glibc, for example.

Thanks, MoritzS!

I will read your implementation.

I am well familiar with the TLS implementations in ld.lld and musl and somewhat familiar with FreeBSD rtld and glibc ld.so.
I have been consulted by ghc on its FreeBSD support.

I appreciate if folks who want to add the LLVM JIT ELF support keep me in the loop.
I currently know nearly nothing about JIT but am happy to allocate some time to study.

To allocate the TLS descriptor in GOT, I need to get the edge kind information in PerGraphGOTAndPLTStubBuilder, So I add a Edge::Kind K argument in some functions in PerGraphGOTAndPLTStubBuilder.h. If it is not
suitable, I can think further to solve this problem.

Is there a good reason to put these in the GOT? I would create and manage a different section for these, rather than re-using the GOT.

The ELF TLS spec actually describes using the GOT for that. Each TLSGD relocation is usually converted into two adjacent GOT entries with the DTPMOD64 and DTPOFF64 relocations. I think the practical reason for that is that a TLSGD relocation is 32-bit PC-relative, so using an address of the GOT will guarantee that the 32 bit offset can represent the address.

It doesn't need to be the .got section. It can be any section, but ld.so is the one responsible for filling DTPMOD64/DTPOFF64 values.

In general I would suggest not trying to resolve a TLSGD/LD relocation at all but instead converting it to a GOTTPOFF relocation. Then, no runtime function to implement __tls_get_addr is needed. You can find the detailed description of how to do this in Section 5.5 of the ELF TLS spec (https://akkadia.org/drepper/tls.pdf).

I implemented this for RuntimeDyld (D105466) so you can take a look there. Let me know where I can help!

While the psABI documents of x86-32/x86-64/ppc32/ppc64 have defined TLS optimizations, many (arm/aarch64/riscv/...) don't.
So converting to GOTPTOFF is not always feasible.

  1. Allow all TLS relocations. Since we can't add new TLS storage at runtime, define a global thread-local variable which will then be used by the linker to resolve GOTTPOFF/TPOFF relocations.

I agree that reserving static TLS blocks is the most plausible approach (before looking at an implementation).

MoritzS added inline comments.Sep 8 2021, 2:53 AM
compiler-rt/lib/orc/elfnix_tls.x86-64.S
21

Do we need to save all registers here? I saw you took this from the implementation in MachO and I'm not familiar with the ABI there. But for ELFNix I don't think that is necessary. For TLSGD/TLSLD relocations the compiler already emits a regular function call to __tls_get_addr which means that it already takes care of saving the caller saved registers. Also, since you implemented __orc_rt_elfnix_tls_get_addr_impl as a regular function in C++, its generated assembly will also correctly store all callee saved registers.

StephenFan updated this revision to Diff 371777.Sep 9 2021, 9:12 PM
  1. Fix typo
  2. Rename TLS Descriptor to TLS Info entry
  3. Add PerGraphTLSInfoEntryBuilder pass to create and manage the TLS relative data structure
  4. Add test file trivial-tls.S
  5. Address @MoritzS 's comments
StephenFan updated this revision to Diff 371778.Sep 9 2021, 9:14 PM

clang-format

StephenFan added inline comments.Sep 9 2021, 9:16 PM
compiler-rt/lib/orc/elfnix_tls.x86-64.S
21

I agree with you, although I am not a ELF x86-64 expert.

StephenFan marked an inline comment as done.Sep 10 2021, 1:37 AM

Side note: Re-using the GOT in MachO is safe, but was a bit lazy. We could probably come up with a generic TableSection<T> utility that we could re-use for GOTs, PLTs, and these new thread data structures.

Emm, I don't know how to come up with a generic TableSection<T> utility. What does the generic type `T' represents?

@StephenFan -- got-pointer, tlv-pointer, plt-stub, tlv-entry-pair, ....
This would generalize the behavior in PerGraphGOTAndPLTStubsBuilder, which is a nice cleanup. I don't think it's important for this discussion though.

  1. Don't allow GOTTPOFF/TPOFF relocations at all. This means that we will not be able to link many pre-compiled static libraries.
  2. Allow all TLS relocations. Since we can't add new TLS storage at runtime, define a global thread-local variable which will then be used by the linker to resolve GOTTPOFF/TPOFF relocations.
  3. Rewrite the relocated code so that it does not use the fs register. On Linux we could probably replace all references to fs by gs and then manually set the gs register? I'm not sure if this can break code if we change instructions that did not originally have a relocation.

I implemented 2. in llvm-rtdyld. Since this is mainly used for testing, the size of the pre-allocated TLS section is only 16 B. In a "real" program you could easily allocate an entire page without noticing any runtime overhead. This is more than enough to link an entire static glibc, for example.

The point I wanted to make here is that if we go for 2. (or 3.), we might as well convert all TLSGD/LD relocations to TPOFF. We would implement all the logic for the TPOFF relocations anyway and if we assume that the pre-allocated TLS storage is large enough to link all object files, this will lead to less runtime overhead to access TLS variables.

By default we should optimize for full support of TLS and ORC features, at the cost of performance. We can also offer plugins that people can opt in to to go the other way and give up features (e.g. dynamically extensible TLV) to recapture performance.

On x86-64 and arm64 I think that we can synthesize a per-TLV stub and re-write the TLV access as a call to that stub to get the address of the requested TLV. There may be higher performance schemes that still fit our constraints (full TLS and ORC feature support), but specialized-stubs at least offer a baseline solution.

compiler-rt/lib/orc/elfnix_tls.x86-64.S
21

We don't need to save all the registers, that is just a conservative way to get up and running. I've filed https://llvm.org/PR51820 with a rough sketch of the scheme that I would like to move to eventually for MachO (and I think ELFNix could share the same code).

StephenFan retitled this revision from [JITLink][WIP] Add initial native TLS support to ELFNix platform to [JITLink] Add initial native TLS support to ELFNix platform.Sep 12 2021, 9:44 PM
lhames accepted this revision.Sep 12 2021, 9:46 PM

LGTM -- I think any follow up improvements based on this discussion can happen in-tree.

Thank you for working on this Stephen!

This revision is now accepted and ready to land.Sep 12 2021, 9:46 PM
This revision was landed with ongoing or failed builds.Sep 12 2021, 11:36 PM
This revision was automatically updated to reflect the committed changes.