This is an archive of the discontinued LLVM Phabricator instance.

[WebAssembly] Implement thread-local storage (local-exec model)
ClosedPublic

Authored by quantum on Jul 10 2019, 3:15 PM.

Details

Summary

Thread local variables are placed inside a .tdata segment. Their symbols are
offsets from the start of the segment. The address of a thread local variable
is computed as __tls_base + the offset from the start of the segment.

.tdata segment is a passive segment and memory.init is used once per thread
to initialize the thread local storage.

__tls_base is a wasm global. Since each thread has its own wasm instance,
it is effectively thread local. Currently, __tls_base must be initialized
at thread startup, and so cannot be used with dynamic libraries.

__tls_base is to be initialized with a new linker-synthesized function,
__wasm_init_tls, which takes as an argument a block of memory to use as the
storage for thread locals. It then initializes the block of memory and sets
__tls_base. As __wasm_init_tls will handle the memory initialization,
the memory does not have to be zeroed.

To help allocating memory for thread-local storage, a new compiler intrinsic
is introduced: __builtin_wasm_tls_size(). This instrinsic function returns
the size of the thread-local storage for the current function.

The expected usage is to run something like the following upon thread startup:

__wasm_init_tls(malloc(__builtin_wasm_tls_size()));

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
Herald added projects: Restricted Project, Restricted Project. · View Herald TranscriptJul 10 2019, 3:15 PM
quantum planned changes to this revision.Jul 10 2019, 3:16 PM

Need to add tests for the compiler intrinsic __builtin_wasm_tls_size.

quantum updated this revision to Diff 209084.Jul 10 2019, 3:54 PM

Add tests for __builtin_wasm_tls_size.

quantum updated this revision to Diff 209090.Jul 10 2019, 4:05 PM

Fix fail call.

This looks nice!

wasm_init_tls(calloc(builtin_wasm_tls_size(), 1));

Would it make sense to change the API contract for __wasm_init_tls to guarantee that initializes all bytes (with zeros as needed)? __wasm_init_tls knows what bytes it's initializing, so we could use plain malloc instead of calloc and avoid redundant zero initialization of those bytes.

llvm/lib/Target/WebAssembly/WebAssemblyISelLowering.cpp
1111

It looks like this supports local-exec, but would need to be extended to handle initial-exec or the other TLS models. Assuming this is correct, could you add a check for the TLS model and report_fatal_error on unsupported models?

quantum updated this revision to Diff 209102.Jul 10 2019, 5:30 PM

Fix linker warning.

quantum edited the summary of this revision. (Show Details)Jul 10 2019, 5:58 PM
quantum marked an inline comment as done.Jul 10 2019, 6:00 PM

This looks nice!

wasm_init_tls(calloc(builtin_wasm_tls_size(), 1));

Would it make sense to change the API contract for __wasm_init_tls to guarantee that initializes all bytes (with zeros as needed)? __wasm_init_tls knows what bytes it's initializing, so we could use plain malloc instead of calloc and avoid redundant zero initialization of those bytes.

Updated the summary. I just realized that the way the code is written, the plain malloc would already work.

llvm/lib/Target/WebAssembly/WebAssemblyISelLowering.cpp
1111

I'll add an error for non-local-exec for now.

Nice!
Then where should we call __wasm_init_tls, in case within a library?

clang/include/clang/Basic/BuiltinsWebAssembly.def
33

Why is it c(const)? According to this comment, this is true if this function has no side effects and doesn't read memory, i.e., the result should be only dependent on its arguments. Can't wasm globals be memory locations in machines?

lld/test/wasm/tls.ll
58

Hmm, I think there might be a way to actually print disassembly results. There are '*.test' files in test directory, in which we have some examples. For example, this test has a sequence of commands you want to run, and you can put input files in a separate directory. That test uses obj2yaml, but can we possibly use llvm-objdump or something to disassemble?

lld/wasm/Driver.cpp
540

Does this TLS thing work when Config->Shared == true, i.e., do we create TLS globals even when this is a library?

lld/wasm/Writer.cpp
431

Should we check for "mutable-global" feature too?

514

Now __tls_base is also mutable, I think we should fix the comment

769

Any reason for the new block scope?

777

Is it guaranteed that there's only one TLS segment?

llvm/include/llvm/IR/IntrinsicsWebAssembly.td
134
  • Why is it speculatable?
  • I'm not sure if it's InstNoMem, because wasm globals can be memory locations after all. What do other people think?
llvm/lib/Target/WebAssembly/WebAssemblyISelLowering.cpp
1095

If you do the conversion in WebAssemblyISelDAGToDAG.cpp, you don't need to create WebAssemblyISD node, SDTypeProfile, and SDNode for every single instruction you want to generate. This WebAssemblyISelLowering.cpp is a part of legalization so it cannot generate machine instructions directly, whereas WebAssemblyISelDAGToDAG.cpp is a part of instruction selection so you have direct access to machine instructions. I think moving routines there can be cleaner?

1109

Does this offset work when there are non-thread-local globals too?

llvm/lib/Target/WebAssembly/WebAssemblyInstrInfo.td
86

Nit: indentation

124

Nit: indentations look weird for both defs

llvm/lib/Target/WebAssembly/WebAssemblyTargetMachine.cpp
192

Looks like this feature requires both bulk memory and mutable global. Shouldn't we strip thread locals if either is not enabled?

llvm/test/CodeGen/WebAssembly/target-features-tls.ll
21–27

Is thread local feature still dependent on atomics? If not, do we still need to test this with both atomics on and off? This now looks rather dependent on bulk memory and mutable global. Should we test this feature with those options instead? (I can be mistaken, so please let me know if so)

llvm/test/CodeGen/WebAssembly/tls.ll
15

If the only difference of O0 and O2 results is the order of global.get and i32.const, you can use [[ https://llvm.org/docs/CommandGuide/FileCheck.html#the-check-dag-directive | CHECK-DAG ]] directive, like

; CHECK-DAG: global.get __tls_base
; CHECK-DAG: i32.const tls
...

The same for other functions too. Maybe we don't need separate O0 and O2 testing in this file.

quantum updated this revision to Diff 209258.Jul 11 2019, 10:49 AM

Add error for non-local-exec TLS models. Fix unit tests.

quantum marked 20 inline comments as done.Jul 11 2019, 11:15 AM

I'll clean up the table gen stuff.

clang/include/clang/Basic/BuiltinsWebAssembly.def
33

I was thinking that since the global is immutable, so the value is always a constant.

lld/test/wasm/tls.ll
58

We already know that we can do something like

Run: obj2yaml %t.wasm | sed -n '/Body:/{s/^\s*Body:\s*//;s/../0x& /gp}'  | llvm-mc -disassemble -triple=wasm32

to compare the actual assembly. As for llvm-objdump, it seems to be unable to disassemble the WASM properly:

.../tools/lld/test/wasm/Output/tls.ll.tmp.wasm:	file format WASM


Disassembly of section CODE:

00000000 CODE:
        # 4 functions in section.
       1: 02 00                        	block   	invalid_type
       3: 0b                           	end
       4: 10 00                        	call	0
       6: 20 00                        	local.get	0
       8: 24 01                        	global.set	1
       a: 20 00                        	local.get	0
       c: 41 00                        	i32.const	0
       e: 41 08                        	i32.const	8
      10: fc 08 00 00                  	memory.init	0, 0
      14: 0b                           	end
      15: 0f                           	return
      16: 00                           	llvm-objdump: lib/Target/WebAssembly/WebAssemblyGenAsmWriter.inc:2032: void llvm::WebAssemblyInstPrinter::printInstruction(const llvm::MCInst *, llvm::raw_ostream &): Assertion `Bits != 0 && "Cannot print this instruction."' failed.
lld/wasm/Driver.cpp
540

Since TLS is module specific (we can't allocate memory for other modules), we must in fact generate this symbol for every module. Shared library support will not be implemented in this diff, however.

lld/wasm/Writer.cpp
431

Do we need to? I thought it's always available since we use it for the stack pointer.

514

Will do.

769

Yes, the raw_string_ostream must destruct by the end of the scope. This is the pattern used in other functions above.

777

Yes, all the TLS input segments will be merged into .tdata.

llvm/include/llvm/IR/IntrinsicsWebAssembly.td
134

@tlively suggested that I do this. It should be speculatable because it has no side effects (since it returns a constant). As for InstrNoMem, I am not sure if globals are memory, and whether reading a constant counts as memory access.

llvm/lib/Target/WebAssembly/WebAssemblyISelLowering.cpp
1095

It seems most other architectures do it in *TargetLowering, so I decided to do the same.

1109

Yes, the linker handles the offsets. All thread locals are grouped together by the linker.

llvm/lib/Target/WebAssembly/WebAssemblyTargetMachine.cpp
192

Do we want to do that? I think having your thread locals vanish if you didn't pass -mbulk-memory is surprising. It might make more sense to strip thread locals if -pthread is not passed.

llvm/test/CodeGen/WebAssembly/target-features-tls.ll
21–27

On second thought, this test is somewhat meaningless now, since the thread local is always created regardless of whether atomics are enabled. What does @tlively think?

llvm/test/CodeGen/WebAssembly/tls.ll
15

Will do. I didn't know CHECK-DAG existed.

quantum updated this revision to Diff 209271.Jul 11 2019, 11:16 AM
quantum marked 6 inline comments as done.

Update comments regarding multable globals. Use CHECK-DAG.

quantum updated this revision to Diff 209274.Jul 11 2019, 11:21 AM

Clean up table gen.

quantum marked 5 inline comments as done.Jul 11 2019, 12:45 PM
quantum added inline comments.
clang/include/clang/Basic/BuiltinsWebAssembly.def
33

According to @tlively, there is no solid definition on whether a global (especially a constant one), counts as memory access. For now, I am going to change this to not constant. We can always change it back later.

llvm/include/llvm/IR/IntrinsicsWebAssembly.td
134

I think I am going to remove these attributes for now. We can add them back if we end up deciding that immutable globals are not memory accesses.

quantum updated this revision to Diff 209304.Jul 11 2019, 12:46 PM

Treat global.get of an immutable global as a memory access for now.

quantum retitled this revision from [WebAssembly] Implement thread-local storage for non-PIC cases to [WebAssembly] Implement thread-local storage (local-exec model).Jul 11 2019, 1:53 PM
tlively added inline comments.Jul 11 2019, 3:30 PM
clang/include/clang/Basic/BuiltinsWebAssembly.def
33

I think this requires more conversation.

lld/test/wasm/data-layout.ll
43 ↗(On Diff #209274)

These globals don't have enough information to tell the reader what they even are, and they don't have anything to do with the data layout, so how about skipping these in the test with a comment saying what is being skipped?

lld/test/wasm/tls.ll
58

It might be worth filing an LLVM bug for this (or possibly fixing in a separate CL).

lld/wasm/Symbols.cpp
207

It would be great to have an explanatory comment here.

lld/wasm/Writer.cpp
243

What happens when there are multiple TLS sections and --no-merge-data-segments is used? I assume their sizes should be added together?

643

Does this mean we can't control whether .tdata or .tbss comes first? Is that important for anything?

777

--no-merge-data-segments!

llvm/lib/Target/WebAssembly/WebAssemblyISelLowering.cpp
1095

It would be good to try @aheejin's suggestion out and see if it leads to cleaner code. Cargo culting other targets is a good way to get started, but we should still try to make our code as simple as possible.

llvm/lib/Target/WebAssembly/WebAssemblyInstrInfo.td
124

I agree with @aheejin that it would be ideal if we didn't need to add these new WebAssemblyISD items. We've come this far without needing them, so it would be good to keep that going.

llvm/lib/Target/WebAssembly/WebAssemblyTargetMachine.cpp
192

I have a patch up to make -pthread imply -mbulk-memory, so this shouldn't be a problem from a UI perspective. I agree with @aheejin that we should strip TLS if the necessary features aren't present to preserver feature-correctness.

All the errors for using TLS without bulk memory should also apply to mutable-globals, which I hadn't considered before. I will update my patch so -pthread implies -mmutable-globals as well.

llvm/test/CodeGen/WebAssembly/target-features-tls.ll
21–27

I agree with @aheejin that we should change the test to toggle bulk-memory and mutable-global and strip TLS if those are not both present.

llvm/test/CodeGen/WebAssembly/tls.ll
15

If the point of testing with -O0 and -O2 is to test both with and without FastIsel, it might be better to pass -fast-isel to explicitly enable it instead.

quantum marked 28 inline comments as done.Jul 11 2019, 6:09 PM
quantum added inline comments.
clang/include/clang/Basic/BuiltinsWebAssembly.def
33

We decided that this is really a constant, since this global is immutable.

lld/test/wasm/data-layout.ll
43 ↗(On Diff #209274)

Going to skip over __tls_base and __tls_size.

lld/test/wasm/tls.ll
58

Going to fix this at some point in the future.

lld/wasm/Symbols.cpp
207

Adding a comment.

lld/wasm/Writer.cpp
243

I'll merge the TLS segments anyways despite --no-merge-data-segments.

431

On second thought, the mutable-global feature is for import/export of mutable globals. TLS does not need to do this.

643

Yes, it does mean that. The only reason why .tbss is supposed to be separate is so that its memory can just be zeroed whereas .tdata has to have the bytes stored in the program image. Currently, we just explicitly store the zero bytes, so this won't be a problem.

777

I am making --no-merge-data-segments merge TLS segments anyway.

llvm/include/llvm/IR/IntrinsicsWebAssembly.td
134

We decided that this is a constant after all, since the global is immutable. This also simplifies the implementation of the lowering code.

llvm/lib/Target/WebAssembly/WebAssemblyInstrInfo.td
124

I'll try out an implementation with ISelDAGtoDAG later.

llvm/lib/Target/WebAssembly/WebAssemblyTargetMachine.cpp
192

Going to strip out TLS if bulk-memory is not enabled.

llvm/test/CodeGen/WebAssembly/target-features-tls.ll
21–27

Changed test to test +/-bulk-memory.

llvm/test/CodeGen/WebAssembly/tls.ll
15

Changing to use -fast-isel.

quantum updated this revision to Diff 209391.Jul 11 2019, 6:09 PM
quantum marked 10 inline comments as done.

Apply review feedback

I wonder if __tls_base should be allocated by the embedder (or by the parent/creator thread). Then it could be an *immutable* global import that is allocated up front. I guess __stack_pointer should be treated in the same way (except mutable).

I don't want to block this change on this, but just putting the idea out there.

lld/test/wasm/data-layout.ll
43 ↗(On Diff #209274)

Is there any reason these symbols need to exist in non-threaded builds. The cost of two extra globals is small but not nothing.

I'd also rather not skip over them here. If TLS changes I'd like to see this test need updating.

lld/test/wasm/data-segment-merging.ll
32 ↗(On Diff #209391)

Again, can we avoid creating this completely for non-threaded builds?

lld/test/wasm/tls.ll
58

This is known issue with the disassembler in that it doesn't know where the functions start and stop in executable output, only in object files.

lld/wasm/Writer.cpp
629

Maybe write this as a single conditional so its clear even without this comment?

Its really great to see this change BTW! Thanks.

quantum updated this revision to Diff 209535.Jul 12 2019, 11:03 AM

Guard __wasm_init_tls and TLS globals behind a flag (currently --shared-memory).

@tlively and I discussed offline and agreed that it's probably best to use the existing flag instead of adding another flag that needs to be passed when linking multi-threaded executables.

quantum marked 4 inline comments as done.Jul 12 2019, 11:04 AM
quantum added inline comments.
lld/wasm/Writer.cpp
629

Changed in latest diff.

quantum updated this revision to Diff 209560.Jul 12 2019, 12:18 PM
quantum marked an inline comment as done.

Switch to using ISelDAGToDAG instead of ISelLowering

quantum updated this revision to Diff 209566.Jul 12 2019, 12:36 PM

Don't show a crash report for incorrect flags to use TLS

@sbc100 As for __tls_base being immutable, I am not sure how this will interact with dynamic linking once we implement it. On native platforms, dlopened libraries have their TLS blocks allocated on first use.

What exactly happens to dlopened libraries when there are already threads around? Would it be feasible to create TLS blocks for when the library is loaded on existing thread?

If there's any chance this TLS ABI could be useful for WASI (I don't know if there's been any WASI work on threads yet, but it seems like there's no reason it couldn't be), then we should start a doc in tool-conventions for it. If not then we should get it behind the emscripten OS in LLVM. (and document it anyway; either in tool-conventions or somewhere in the emscripten site).

Oh, btw, any reason these have to be passive segments? Why can't we just make them active segments and let the VM initialize them for us?

Oh, btw, any reason these have to be passive segments? Why can't we just make them active segments and let the VM initialize them for us?

These need to be passive segments so that we can initialize it in different locations in linear memory for every thread.

If we had multiple memories, TLS could have been implemented as a separate linear memory, but we are stuck with mallocing and initializing for now.

If there's any chance this TLS ABI could be useful for WASI (I don't know if there's been any WASI work on threads yet, but it seems like there's no reason it couldn't be), then we should start a doc in tool-conventions for it. If not then we should get it behind the emscripten OS in LLVM. (and document it anyway; either in tool-conventions or somewhere in the emscripten site).

Documenting this in tool-conventions is planned.

quantum updated this revision to Diff 209656.Jul 12 2019, 5:42 PM

Use signed LEB128 for i32.const. Basically, do what D64612 did.

aheejin added inline comments.Jul 12 2019, 5:51 PM
lld/test/wasm/data-segments.ll
7

What is the difference between ACTIVE and ACTIVE-TLS? It looks we don't have different build processes for them. And as what @sbc100 said, can we exclude TLS from build?

lld/test/wasm/tls.ll
7

Can we possibly mix one non-tls global variable between tls1 and tls2, just to see they work together?

lld/wasm/Writer.cpp
431

Ah you're right.

514

PIng. It doesn't look like fixed yet

629

Has it been reflected? It says it has been changed but it doesn't look like it has

llvm/lib/Target/WebAssembly/WebAssemblyISelDAGToDAG.cpp
203 ↗(On Diff #209566)

These two lines can be shortened to

ReplaceNode(Node, TLSAddress);

The same applies to the code below for __tls_size too.

208 ↗(On Diff #209566)

Nit: Might be a bit cleaner to extract the expression, which is complicated, like

unsigned IntNo = cast<ConstantSDNode>(Node->getOperand(0))->getZExtValue());
switch (IntNo) {
 ...
quantum updated this revision to Diff 209658.Jul 12 2019, 6:07 PM
quantum marked 14 inline comments as done.

Change per review feedback.

lld/test/wasm/data-segments.ll
7

ACTIVE-TLS is for builds with TLS enabled. Currently, we use --shared-memory to determine that, per @tlively's recommendation. The rationale is that we don't want even more flags that need to be passed in a proper threaded build.

lld/test/wasm/tls.ll
7

Done.

lld/wasm/Writer.cpp
514

Looks like the fix got rebasing against the change that made everything lowerCamelCase. Going to do it again.

629

I think you are commenting on an old version? It is a single condition on the latest version.

llvm/lib/Target/WebAssembly/WebAssemblyISelDAGToDAG.cpp
203 ↗(On Diff #209566)

Nice. I didn't know that was possible. Changing.

208 ↗(On Diff #209566)

Done.

aheejin added inline comments.Jul 12 2019, 6:54 PM
lld/test/wasm/data-segments.ll
7

Then if we don't enable --shared-memory, we don't generate those globals? Do we have a test for that?

tlively added inline comments.Jul 12 2019, 8:30 PM
lld/wasm/Driver.cpp
540

Until we do implement TLS for shared modules, would it make sense to omit these globals and function from shared modules, since we can't use them in that context anyway?

lld/wasm/Writer.cpp
643

It would be really great if we could find a way to elide the .bss 0 bytes as a code size optimization. Since we can't assume that the memory is already zeroed the way we can with passive segments, perhaps we can use a memory.fill instruction to zero the memory? Pursuing this in a follow-on CL should be fine.

805

You could avoid duplicating these lines by making them unconditional.

905

Can you remind me how the InitTLSFunction interacts with relocatable code? I'm wondering if this should be called up in the condition with the other synthetic functions.

llvm/lib/Target/WebAssembly/WebAssemblyTargetMachine.cpp
193

I just realized that if we have atomics but not bulk memory and TLS is stripped, then we will be in the awkward situation of both using atomics and disallowing atomics because the module is not thread safe. I think the best solution would be to go back and forcibly strip both atomics and TLS if either of them would be stripped.

llvm/test/CodeGen/WebAssembly/tls.ll
2

It would be good to check the negative case, too, i.e with bulk-memory disabled.

6

Is CHECK still used as a prefix if it not listed in the invocation of FileCheck?

quantum updated this revision to Diff 209953.Jul 15 2019, 1:57 PM
quantum marked 16 inline comments as done.

Deal with review comments.

lld/test/wasm/data-segments.ll
7

Yes. In this file, the cases with --check-prefix ACTIVE will ensure that the fields are not generated.

lld/wasm/Driver.cpp
540

That would make sense for now.

lld/wasm/Writer.cpp
643

Yes, this is a good idea for a follow-on CL.

805

Right.

905

I don't think it should. __wasm_init_tls initializes the TLS block for the current module only, so every shared library needs to have its own __wasm_init_tls.

llvm/lib/Target/WebAssembly/WebAssemblyTargetMachine.cpp
193

Done.

llvm/test/CodeGen/WebAssembly/tls.ll
2

Done.

6

Yes, the default prefix is CHECK.

quantum updated this revision to Diff 209966.Jul 15 2019, 3:07 PM

Fix duplicate end function

The offset field of a segment can be a constant expression which can be a global.get of an imported global. So we could have an imported global __tls_base which is different for each thread, and have an active segment with that as its segment offset?

Another high-level question (based just on reading the CL description): The TLS-size intrinsic is per-function, does that mean that the tls-init function is called for every function? are there just multiple TLS sections per object file?

Another high-level question (based just on reading the CL description): The TLS-size intrinsic is per-function, does that mean that the tls-init function is called for every function? are there just multiple TLS sections per object file?

The TLS-size intrinsic returns the total size of TLS for the module it's called from. The initialization function initializes the TLS for the entire module.

The offset field of a segment can be a constant expression which can be a global.get of an imported global. So we could have an imported global __tls_base which is different for each thread, and have an active segment with that as its segment offset?

I didn't know that it could have been a constant expression. I don't think this would have worked very well on the main thread though, since we need to run malloc before we can compute __tls_base. I think this requires the global to be mutable, if only because we need to be able to initialize it on the main thread.

Where should we call __wasm_init_tls, in case within a library?

Where should we call __wasm_init_tls, in case within a library?

Dynamic libraries are not supported by the local-exec TLS model. Support will be implemented later. The idea is that we'll have an __ensure_tls_initialized function that will malloc and __wasm_init_tls if __tls_base == 0.

LGTM apart from one last comment

llvm/lib/Target/WebAssembly/WebAssemblyTargetMachine.cpp
193

We can be more fine grained than this. We should strip atomics or TLS only if the relevant feature is missing or the other one is stripped. It is possible to have thread-locals and bulk memory but not have any atomics to strip.

quantum marked an inline comment as done.Jul 16 2019, 1:45 PM
quantum updated this revision to Diff 210166.Jul 16 2019, 1:45 PM

More fine-grainted stripping.

sbc100 accepted this revision.Jul 16 2019, 1:56 PM
sbc100 added inline comments.
lld/wasm/Writer.cpp
400

Right now we always compiler with -fdata-sections, but if we didn't then object files might have just one ".tdata" sections. Perhaps drop the final "."?

This revision is now accepted and ready to land.Jul 16 2019, 1:56 PM
quantum updated this revision to Diff 210184.EditedJul 16 2019, 2:35 PM

Removed the trailing .. I would add a test to make sure things work without -fdata-sections, but this->Options.DataSections = true; is hard-coded in WebAssemblyTargetMachine.cpp.

quantum marked an inline comment as done.Jul 16 2019, 2:35 PM
quantum updated this revision to Diff 210187.Jul 16 2019, 2:53 PM

Remove extra braces

quantum updated this revision to Diff 210188.Jul 16 2019, 2:57 PM

Disable atomics when TLS is stripped

tlively accepted this revision.Jul 16 2019, 2:57 PM

Nice work!

This revision was automatically updated to reflect the committed changes.

I had a reply that got eaten here, so I'm going to keep trolling you on your CL since we don't have a design doc for this.
The offset field of a data segment initializer can be a global.get on an imported global. (https://webassembly.github.io/spec/core/valid/instructions.html#constant-expressions). Since each thread is separately instantiated with separate JS, we could have a global import like __tls_base which has a different value in each thread. Then we wouldn't need to manually call the init code anywhere. Would there be other advantages or disadvantages for that?

I had a reply that got eaten here, so I'm going to keep trolling you on your CL since we don't have a design doc for this.
The offset field of a data segment initializer can be a global.get on an imported global. (https://webassembly.github.io/spec/core/valid/instructions.html#constant-expressions). Since each thread is separately instantiated with separate JS, we could have a global import like __tls_base which has a different value in each thread. Then we wouldn't need to manually call the init code anywhere. Would there be other advantages or disadvantages for that?

I already answered this one:

The offset field of a segment can be a constant expression which can be a global.get of an imported global. So we could have an imported global __tls_base which is different for each thread, and have an active segment with that as its segment offset?

I didn't know that it could have been a constant expression. I don't think this would have worked very well on the main thread though, since we need to run malloc before we can compute __tls_base. I think this requires the global to be mutable, if only because we need to be able to initialize it on the main thread.

The problem I found with the import approach is that the main thread cannot use global.get of an imported global to get the location of TLS, mainly because it needs to be malloc'd. To use malloc, the main thread needs to be initialized. This will result in a circular dependency.

Another high-level question (based just on reading the CL description): The TLS-size intrinsic is per-function, does that mean that the tls-init function is called for every function? are there just multiple TLS sections per object file?

The TLS-size intrinsic returns the total size of TLS for the module it's called from. The initialization function initializes the TLS for the entire module.

The offset field of a segment can be a constant expression which can be a global.get of an imported global. So we could have an imported global __tls_base which is different for each thread, and have an active segment with that as its segment offset?

I didn't know that it could have been a constant expression. I don't think this would have worked very well on the main thread though, since we need to run malloc before we can compute __tls_base. I think this requires the global to be mutable, if only because we need to be able to initialize it on the main thread.

Interesting. I was assuming we could be clever and make the main thread special and have the linker allocate its tls space upfront (like we do for main thread stack space) rather than do it a runtime, but maybe that needlessly complicated. Seems like ideally we could want to import __tls_base but not for the main thread but I don't think of a way to express that.

mloud added a subscriber: mloud.Sep 25 2021, 11:42 AM