With Android/Bionic, delay emutls deallocation to round 2 of the 4 pthread
key destructor rounds. Deallocation must run after C++ thread_local
destructors have been called, but before the final 2 rounds, because emutls
calls free, and jemalloc then needs another 2 rounds to free its
thread-specific data.
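The delay mechanism can be sketched without real threads. This is a minimal model, not the code in emutls.c: `key_slot`, `tls_array`, `run_destructor_rounds` and `dealloc_round` are hypothetical names, and the loop only imitates the way a pthread implementation clears a key's value and calls its destructor again for as long as the destructor re-registers a value.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of one pthread key slot for one thread. */
typedef struct key_slot {
    void *value;
    void (*dtor)(struct key_slot *slot, void *value);
} key_slot;

/* Illustrative stand-in for the emutls per-thread array. */
typedef struct {
    unsigned skip_destructor_rounds; /* rounds to sit out before freeing */
    int freed_on_round;              /* 0 while still allocated */
} tls_array;

static int current_round; /* 1-based; maintained by run_destructor_rounds */

/* Destructor: while rounds remain to be skipped, re-register the value so
   that the runtime calls this destructor again on the next round. */
static void tls_array_dtor(key_slot *slot, void *value) {
    tls_array *array = value;
    if (array->skip_destructor_rounds > 0) {
        array->skip_destructor_rounds--;
        slot->value = array; /* models pthread_setspecific(key, array) */
    } else {
        array->freed_on_round = current_round; /* models free(array) */
    }
}

/* Models the thread-exit loop: clear the slot, call the destructor on the
   old value, and repeat while a value is present and rounds remain. */
static void run_destructor_rounds(key_slot *slot, int max_rounds) {
    for (current_round = 1; current_round <= max_rounds; current_round++) {
        void *value = slot->value;
        if (value == NULL)
            break;
        slot->value = NULL;
        slot->dtor(slot, value);
    }
}

/* Returns the round on which the array is freed, or 0 if it never is. */
int dealloc_round(unsigned skip_rounds, int max_rounds) {
    assert(max_rounds > 0);
    tls_array array = { skip_rounds, 0 };
    key_slot slot = { &array, tls_array_dtor };
    run_destructor_rounds(&slot, max_rounds);
    return array.freed_on_round;
}
```

In this model, `dealloc_round(1, 4)` returns 2 (skip one round, free on the second), and `dealloc_round(4, 4)` returns 0 because the rounds run out before the array is freed.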
Repository: rL LLVM
Event Timeline
I uploaded this change earlier to https://android-review.googlesource.com/c/toolchain/compiler-rt/+/683602.
Thanks, Ryan, for chasing this down (literally), and restoring some sanity to teardown.
My emutls.c comment:
We can't wait until the final two rounds, because jemalloc needs two rounds
after the final malloc/free call to free its thread-specific data.
It might be helpful to explain on this review why jemalloc needs 2 rounds. Here's my comment from Google's bug tracker, http://b/78022094#comment13. I'll link to it from the code.
jemalloc's TSD (thread-specific data) code has four compile-time modes:
- JEMALLOC_MALLOC_THREAD_CLEANUP: uses __thread variables, and the code using jemalloc must call a _malloc_thread_cleanup function at thread-exit. Apparently FreeBSD uses this.
- JEMALLOC_TLS: uses __thread variables and a pthread key destructor
- _WIN32: self-explanatory
- default: uses pthread_getspecific / pthread_setspecific and a key destructor
Bionic must use the final mode (emutls uses malloc, so malloc can't use emutls).
A jemalloc TSD has four explicit states:
- uninitialized
- nominal
- purgatory
- reincarnated
Summary of jemalloc cleanup states:
- The typical state is nominal
- On nominal cleanup: free everything but the outer TSD struct, move to purgatory state, and schedule another dtor call
- On purgatory cleanup: free the TSD struct
- On reincarnated cleanup: move to purgatory state and schedule another dtor call
- Calling malloc/free moves the state from purgatory to reincarnated.
jemalloc needs 2 pthread destructor rounds to free its TSD (nominal -> purgatory, purgatory -> deallocated).
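The state summary above can be written down as a small transition table. This is a sketch of the description in this comment, not jemalloc's source: the function names are hypothetical, `TSD_FREED` is an extra marker added for the model (it is not one of the four jemalloc states), and treating a malloc/free call on a missing TSD as initialization to nominal is an assumption.

```c
#include <assert.h>

typedef enum {
    TSD_UNINITIALIZED,
    TSD_NOMINAL,
    TSD_PURGATORY,
    TSD_REINCARNATED,
    TSD_FREED /* model-only marker: the TSD struct has been freed */
} tsd_state;

/* A TSD is "live" if a key destructor call is pending for it. */
static int tsd_is_live(tsd_state s) {
    return s == TSD_NOMINAL || s == TSD_PURGATORY || s == TSD_REINCARNATED;
}

/* One key destructor call. Returning TSD_PURGATORY implies that another
   destructor call has been scheduled for the next round. */
tsd_state tsd_cleanup(tsd_state s) {
    assert(tsd_is_live(s));
    switch (s) {
    case TSD_NOMINAL:      /* free everything but the outer TSD struct */
    case TSD_REINCARNATED: /* used again after purgatory; clean up again */
        return TSD_PURGATORY;
    case TSD_PURGATORY:    /* free the TSD struct itself */
        return TSD_FREED;
    default:
        return s;
    }
}

/* Effect of a malloc/free call on the calling thread's TSD state. */
tsd_state tsd_on_malloc_or_free(tsd_state s) {
    if (s == TSD_PURGATORY)
        return TSD_REINCARNATED;
    if (s == TSD_UNINITIALIZED || s == TSD_FREED)
        return TSD_NOMINAL; /* assumption: (re)initialized on first use */
    return s;
}

/* Number of destructor rounds needed until the TSD struct is freed. */
int rounds_to_free(tsd_state s) {
    int rounds = 0;
    while (tsd_is_live(s)) {
        s = tsd_cleanup(s);
        rounds++;
    }
    return rounds;
}
```

Here `rounds_to_free(TSD_NOMINAL)` is 2, and a malloc/free call during purgatory resets the count to 2 as well, which is why the last malloc/free has to happen at least two rounds before the end.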
If emutls cleanup happens on round 2 instead of 3, then jemalloc can be completely deallocated at the start of round 2, then reinitialized when emutls calls free. In round 3, jemalloc would enter purgatory, and in round 4, it would be deallocated again. I *could* try to avoid this with an otherwise pointless realloc call in emutls round 1, but the extra reinitialization doesn't seem to make thread exit that much slower. I measured a slowdown of roughly 10-20us per thread exit.
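The round-by-round scenario can be checked with a short simulation. It is only a sketch under stated assumptions: jemalloc's destructor is taken to run before the emutls destructor within each round (the ordering described above), the thread starts with a nominal TSD, and `simulate_thread_exit` and `TSD_NONE` (which stands for both "uninitialized" and "already freed") are hypothetical names.

```c
#include <assert.h>

typedef enum {
    TSD_NONE, /* model-only: no TSD struct exists (never created or freed) */
    TSD_NOMINAL,
    TSD_PURGATORY,
    TSD_REINCARNATED
} tsd_state;

/* Returns jemalloc's TSD state after all destructor rounds have run, given
   the round on which emutls calls free. Anything other than TSD_NONE means
   the TSD was left allocated at thread exit. */
tsd_state simulate_thread_exit(int emutls_free_round, int total_rounds) {
    assert(emutls_free_round >= 1 && emutls_free_round <= total_rounds);
    tsd_state tsd = TSD_NOMINAL;
    for (int round = 1; round <= total_rounds; round++) {
        /* jemalloc's key destructor (assumed to run first in the round). */
        if (tsd == TSD_PURGATORY)
            tsd = TSD_NONE;      /* free the TSD struct */
        else if (tsd != TSD_NONE)
            tsd = TSD_PURGATORY; /* nominal or reincarnated cleanup */

        /* emutls deallocation: calling free touches jemalloc's TSD. */
        if (round == emutls_free_round) {
            if (tsd == TSD_PURGATORY)
                tsd = TSD_REINCARNATED;
            else if (tsd == TSD_NONE)
                tsd = TSD_NOMINAL; /* reinitialized */
        }
    }
    return tsd;
}
```

With 4 rounds, `simulate_thread_exit(2, 4)` ends in `TSD_NONE` after the deallocate, reinitialize, purgatory, deallocate sequence described above, while freeing on round 3 or 4 leaves the TSD in `TSD_PURGATORY` or `TSD_NOMINAL`, which is the leak the patch avoids.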
Address chh's comments from
https://android-review.googlesource.com/c/toolchain/compiler-rt/+/683602
(change int -> uintptr_t) and add a link to my Phabricator jemalloc comment.
lib/builtins/emutls.c:373 (On Diff #147953): Good catch. I think that line could have dereferenced NULL on out-of-memory.
One inline comment, but otherwise looks OK. You could also split out all of the changes unrelated to Bionic/skip_destructor_rounds into cleanup and other patches.
lib/builtins/emutls.c:27 (On Diff #148110): No links to internal bug reporting infrastructure, please.
Remove links to bug trackers. (I think the links to the internal Google
tracker need to be removed because other members of the LLVM community can't
access them. I think I *could* keep the GitHub link, but I don't think
there's anything especially useful there.) The Phabricator link is useful,
because it explains why jemalloc needs 2 rounds.
Split out part of the change into an earlier cleanup commit.