Thread local variables are placed inside a .tdata segment. Their symbols are
offsets from the start of the segment. The address of a thread local variable
is computed as __tls_base + the offset from the start of the segment.
.tdata segment is a passive segment and memory.init is used once per thread
to initialize the thread local storage.
__tls_base is a wasm global. Since each thread has its own wasm instance,
it is effectively thread local. Currently, __tls_base must be initialized
at thread startup, and so cannot be used with dynamic libraries.
__tls_base is to be initialized with a new linker-synthesized function,
__wasm_init_tls, which takes as an argument a block of memory to use as the
storage for thread locals. It then initializes the block of memory and sets
__tls_base. As __wasm_init_tls will handle the memory initialization,
the memory does not have to be zeroed.
To help allocating memory for thread-local storage, a new compiler intrinsic
is introduced: __builtin_wasm_tls_size(). This instrinsic function returns
the size of the thread-local storage for the current function.
The expected usage is to run something like the following upon thread startup:
__wasm_init_tls(malloc(__builtin_wasm_tls_size()));