s390 is special again - instead of tls_get_addr, it has tls_get_offset
with special calling conventions: the result is TP relative, and
the argument is GOT-relative. Since we need to get address of the caller's
GOT, which is in %r12, we have to use assembly like glibc does.
Aside of tls_get_offset, glibc also implements a slightly saner
tls_get_addr_internal, which takes a pointer as argument, but still
returns a TP-relative offset. It is used for dlsym() called on TLS
symbols, so we have to intercept it was well. Our __tls_get_offset
is also implemented by delegating to it.