This is an archive of the discontinued LLVM Phabricator instance.

[builtins] Delay emutls deallocation for one round
ClosedPublic

Authored by rprichard on May 16 2018, 2:54 PM.

Details

Summary

With Android/Bionic, delay deallocation to round 2 of 4. It must run after
C++ thread_local destructors have been called, but before the final 2
rounds, because emutls calls free, and jemalloc then needs another 2
rounds to free its thread-specific data.

Fixes https://github.com/android-ndk/ndk/issues/687

Event Timeline

rprichard created this revision.May 16 2018, 2:54 PM
Herald added subscribers: Restricted Project, llvm-commits, delcypher. · View Herald TranscriptMay 16 2018, 2:54 PM
srhines accepted this revision.May 17 2018, 12:19 AM

Thanks, Ryan, for chasing this down (literally), and restoring some sanity to teardown.

This revision is now accepted and ready to land.May 17 2018, 12:19 AM
chh added a reviewer: chh.May 18 2018, 3:44 PM
chh added inline comments.May 18 2018, 3:47 PM
lib/builtins/emutls.c
32

s/int/uintptr_t/

349–350

s/int/uintptr_t/

My emutls.c comment:

We can't wait until the final two rounds, because jemalloc needs two rounds
after the final malloc/free call to free its thread-specific data.

It might be helpful to explain on this review why jemalloc needs 2 rounds. Here's my comment from Google's bug tracker, http://b/78022094#comment13. I'll link to it from the code.

jemalloc's TSD (thread-specific data) code has four compile-time modes:

  • JEMALLOC_MALLOC_THREAD_CLEANUP: uses __thread variables, and the code using jemalloc must call a _malloc_thread_cleanup function at thread-exit. Apparently FreeBSD uses this.
  • JEMALLOC_TLS: uses __thread variables and a pthread key destructor
  • _WIN32: self-explanatory
  • default: uses pthread_getspecific / pthread_setspecific and a key destructor

Bionic must use the final mode (emutls uses malloc, so malloc can't use emutls).

A jemalloc TSD has four explicit states:

  • uninitialized
  • nominal
  • purgatory
  • reincarnated

Summary of jemalloc cleanup states:

  • The typical state is nominal
  • On nominal cleanup: free everything but the outer TSD struct, move to purgatory state, and schedule another dtor call
  • On purgatory cleanup: free the TSD struct
  • On reincarnated cleanup: move to purgatory state and schedule another dtor call
  • Calling malloc/free moves the state from purgatory to reincarnated.

jemalloc needs 2 pthread destructor rounds to free its TSD (nominal -> purgatory, purgatory -> deallocated).

If emutls cleanup happens on round 2 instead of 3, then jemalloc can be completely deallocated at the start of round 2, then reinitialized when emutls calls free. In round 3, jemalloc would enter purgatory, and in round 4, it would be deallocated again. I *could* try to avoid this with an otherwise pointless realloc call in emutls round 1, but it doesn't seem to make thread exit that much slower. I measured a slowdown on the order of ~10-20us per thread exit.

Address chh's comments from
https://android-review.googlesource.com/c/toolchain/compiler-rt/+/683602
(change int -> uintptr_t) and add a link to my Phabricator jemalloc comment.

rprichard marked 2 inline comments as done.May 21 2018, 11:59 PM
chh accepted this revision.May 22 2018, 10:01 AM

Only two minor suggestions. Everything looks fine. Thanks.

lib/builtins/emutls.c
102

line 96 could be just emutls_setspecific(array).

371

nit, line 373 sets array field; it's better to be nested inside the if-statement of line 371.

rprichard updated this revision to Diff 148110.May 22 2018, 2:56 PM

Avoid a NULL dereference on out-of-memory. Reuse emutls_setspecific.

rprichard marked 2 inline comments as done.May 22 2018, 3:01 PM
rprichard added inline comments.
lib/builtins/emutls.c
371

Good catch. I think that line could have dereferenced NULL on out-of-memory.

srhines accepted this revision.May 23 2018, 2:38 AM

Everything looks great now. Thanks again, Ryan!

echristo accepted this revision.Jun 6 2018, 3:21 PM
echristo added a subscriber: echristo.

One inline comment, but otherwise looks ok. You could also split out all of the non-bionic/skip_destructor_rounds into cleanup and other patches as well.

lib/builtins/emutls.c
27

No links to internal bug reporting infrastructure please.

rprichard updated this revision to Diff 150222.Jun 6 2018, 4:37 PM

Remove links to bug trackers. (I think the links to the internal Google
tracker need to be removed because other members of the LLVM community can't
access them. I think I *could* keep the GitHub link, but I don't think
there's anything especially useful there.) The Phabricator link is useful,
because it explains why jemalloc needs 2 rounds.

Split out part of the change into an earlier cleanup commit.

rprichard updated this revision to Diff 150238.Jun 6 2018, 5:57 PM

I think(?) each git commit should be a separate Phabricator review?

echristo accepted this revision.Jun 6 2018, 6:10 PM

LGTM.

rprichard retitled this revision from Delay emutls deallocation for one round to [builtins] Delay emutls deallocation for one round.Jun 7 2018, 3:46 PM
This revision was automatically updated to reflect the committed changes.