There were two problems with my previous implementation:
- TLS was not allocated if nothing used it. There is a fallback
initialization in _init_libc, but this was not enough if one of the
init functions tried to access errno.
- __tls_get_addr was not resolved. If __tls_get_addr was called through
a PLT entry, everything would just break :(
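
For illustration only, here is a minimal sketch of the kind of code that
hits both cases. It assumes GCC or Clang (for __thread and the constructor
attribute); the names early_init, tls_counter, errno_seen_by_init and
tls_counter_value are hypothetical and not part of this change. The
constructor runs before main and touches errno, which lives in TLS on most
libcs. When code like this is built as position-independent code into a
shared object, accesses to the thread-local variable are typically compiled
into calls to __tls_get_addr, often through the PLT.

```c
#include <errno.h>

/* Hypothetical thread-local variable. In a PIC shared object, accesses to
 * it are typically lowered to a call to __tls_get_addr. */
static __thread int tls_counter;

/* Plain global used to record what the init function observed. */
static int errno_seen_in_init;

/* Init function that runs before main(). It reads and writes errno, so
 * TLS has to be set up by the time it runs. */
__attribute__((constructor))
static void early_init(void)
{
    errno = ERANGE;              /* write through the thread-local errno */
    errno_seen_in_init = errno;  /* read it back */
    tls_counter = 1;             /* touch an ordinary TLS variable too */
    errno = 0;                   /* leave errno clean for the program */
}

/* Accessors so a caller can check what happened during init. */
int errno_seen_by_init(void)
{
    return errno_seen_in_init;
}

int tls_counter_value(void)
{
    return tls_counter;
}
```

With TLS allocated early enough and __tls_get_addr resolved, both accessors
should report the values stored by early_init when called from main.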