Thread-local variables are placed inside a .tdata segment. Their symbols are
offsets from the start of the segment. The address of a thread-local variable
is computed as __tls_base plus the variable's offset from the start of the
segment. The .tdata segment is a passive segment, and memory.init is used once
per thread to initialize the thread-local storage.
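The address computation above can be modeled in plain C. This is a minimal sketch, not the toolchain's implementation. The names tdata_layout, tdata_image, copy_tdata, tls_addr, and new_thread_block are hypothetical. A malloc'd block stands in for one thread's storage, and memcpy stands in for memory.init.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical layout of a module's .tdata segment: the thread-local
 * variables placed back to back, as the linker might lay them out. */
struct tdata_layout {
    int counter;
    long flags;
};

/* Hypothetical initializer image, standing in for the passive .tdata segment. */
static const struct tdata_layout tdata_image = { 42, 7 };

/* A thread-local "symbol" is just an offset from the start of the segment. */
#define TLS_COUNTER offsetof(struct tdata_layout, counter)
#define TLS_FLAGS   offsetof(struct tdata_layout, flags)

/* Stand-in for memory.init: copy the segment image into one thread's
 * block. Runs once per thread. */
static void copy_tdata(unsigned char *block) {
    assert(block != NULL);
    memcpy(block, &tdata_image, sizeof tdata_image);
}

/* Address of a thread-local variable: __tls_base + offset. */
static void *tls_addr(unsigned char *tls_base, size_t offset) {
    return tls_base + offset;
}

/* Hypothetical per-thread setup: allocate a block and initialize it.
 * The returned pointer plays the role of that thread's __tls_base. */
static unsigned char *new_thread_block(void) {
    unsigned char *block = malloc(sizeof tdata_image);
    if (block != NULL)
        copy_tdata(block);
    return block;
}
```

Each simulated thread gets its own block. Incrementing counter through one base therefore leaves the other thread's copy at its initial value of 42, even though both threads use the same offset.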

__tls_base is a wasm global. Since each thread has its own wasm instance,
it is effectively thread-local. Currently, __tls_base must be initialized
at thread startup, so this scheme cannot be used with dynamic libraries.

__tls_base is to be initialized by a new linker-synthesized function,
__wasm_init_tls, which takes as its argument a block of memory to use as the
storage for thread locals. It initializes that block of memory and then sets
__tls_base. Because __wasm_init_tls handles the memory initialization, the
block does not have to be zeroed beforehand.

To help with allocating memory for thread-local storage, a new compiler
intrinsic is introduced: __builtin_wasm_tls_size(). This intrinsic returns
the size of the thread-local storage for the current module.

The expected usage is to run something like the following upon thread startup:

    __wasm_init_tls(malloc(__builtin_wasm_tls_size()));
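The startup flow can likewise be sketched in C for a single thread. The names tls_base_sim, tls_size_sim, and init_tls_sim are hypothetical stand-ins for __tls_base, __builtin_wasm_tls_size(), and __wasm_init_tls. In the real scheme the function is synthesized by the linker and the size comes from the linked module; this sketch approximates the size with sizeof.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical .tdata layout and initializer image for one module. */
struct tdata_layout {
    int counter;
    double scale;
};
static const struct tdata_layout tdata_image = { 1, 0.5 };

/* Stand-in for the wasm global __tls_base. In wasm each thread's instance
 * has its own copy; this sketch models a single thread. */
static void *tls_base_sim;

/* Stand-in for __builtin_wasm_tls_size(): size of the module's TLS block. */
static size_t tls_size_sim(void) {
    return sizeof(struct tdata_layout);
}

/* Stand-in for __wasm_init_tls: copy the .tdata image into the block,
 * then record the block as this thread's TLS base. The block need not be
 * zeroed first because every byte is overwritten by the copy. */
static void init_tls_sim(void *block) {
    assert(block != NULL);
    memcpy(block, &tdata_image, tls_size_sim());
    tls_base_sim = block;
}
```

Mirroring the startup line above, init_tls_sim(malloc(tls_size_sim())) leaves counter at 1 and scale at 0.5 without any prior zeroing. Real code would check malloc's result explicitly rather than rely on assert, which is compiled out under NDEBUG.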

Why is it marked c (const)? According to this comment, c means the function
has no side effects and doesn't read memory, i.e., its result depends only on
its arguments. Can't wasm globals be memory locations on the underlying machine?