This is an archive of the discontinued LLVM Phabricator instance.

[libc++] Fix PR22606 - Leak pthread_key with static storage duration to ensure all of thread-local destructors are called.
ClosedPublic

Authored by EricWF on Apr 2 2015, 9:18 AM.

Details

Summary

See https://llvm.org/bugs/show_bug.cgi?id=22606 for more discussion.

Most of the changes in this patch are file reorganization to help ensure assumptions about how __thread_specific_pointer is used hold. The assumptions are:

  • __thread_specific_ptr<Tp> is only created with a __thread_struct pointer.
  • __thread_specific_ptr<Tp> can only be constructed inside the __thread_local_data() function.

I'll remove the comments before committing. They are there for clarity during review.

Diff Detail

Event Timeline

EricWF updated this revision to Diff 23157.Apr 2 2015, 9:18 AM
EricWF retitled this revision from to [libc++] Fix PR22606 - Leak pthread_key with static storage duration to ensure all of thread-local destructors are called..
EricWF updated this object.
EricWF edited the test plan for this revision. (Show Details)
EricWF added reviewers: mclow.lists, earthdok.
EricWF added a subscriber: Unknown Object (MLST).
mclow.lists edited edge metadata.Apr 5 2015, 9:52 PM

(just to make sure that I understand), I see two changes here :

  1. Some code movement (line 310-ish to line 116-ish). No real functionality change there.
  2. Make the constructor for __thread_specific_ptr private, and add a friend decl for __thread_local_data.

Is that correct?

(just to make sure that I understand), I see two changes here :

  1. Some code movement (line 310-ish to line 116-ish). No real functionality change there.
  2. Make the constructor for __thread_specific_ptr private, and add a friend decl for __thread_local_data.

Is that correct?

That is correct, but for some reason this patch itself is incorrect. there should be a third change:

  • ~__thread_specific_ptr() does not call pthread_key_delete(__key_)
EricWF updated this revision to Diff 23278.Apr 6 2015, 10:03 AM
EricWF edited edge metadata.

Actually leak the key...

earthdok edited edge metadata.Apr 7 2015, 6:02 AM

lgtm

I'd still leave a short comment though.

Other than my inline comment, LGTM.

Since a rI'm sitting on this for a bit until I determine a couple of things.

  1. Could libc++ use std::atexit to call the destructor function before destroying the pthread key?
  2. What does the standard say about race conditions between thread-exit and program-exit? Do all created threads have to be terminated before std::exit() begins?
  3. How do the pthread_key thread-exit destructors fit into the steps C++ takes at a thread exit?

OK. I have some new thoughts. The __thread_struct destructor serves three purposes when it is invoked at thread exit.

  1. Notifying registered condition variables.
  2. Making registered shared state ready.
  3. Deleting the tuple of arguments used to start the thread.

The problem this patch is trying to solve is the case where std::exit(0) has been called. Once program termination has begun the only code that should be running are destructors and functions registered with at_exit(...). This means that the code waiting on the CV or shared state must be doing so in a destructor. Furthermore it is likely that the registered CV's have already been destroyed.

For these reasons I don't think it is safe for libc++ to run steps #1 or #2 when program termination has begun. So I'm worried about going out of our way to support this.

I'll write up some more documentation on this and hopefully reach a conclusion.

EricWF added a comment.Jul 6 2015, 4:47 PM

OK. I have some new thoughts. The __thread_struct destructor serves three purposes when it is invoked at thread exit.

  1. Notifying registered condition variables.
  2. Making registered shared state ready.
  3. Deleting the tuple of arguments used to start the thread.

I was incorrect in this comment. #3 is unrelated to this problem. They are always freed. Only #1 and #2 are any concern.

The problem is this:

Libc++ uses a single pthread key for thread local storage. This TLS is used by <condition_variable> and <future> to notify/ready consumers on thread exit. The notification is done in the pthread key's destructor which is executed during pthread_exit(...).
When the main thread begins program termination and a detached child thread, t0, has yet to terminate there is a race condition between t0 executing the TLS destructor and the main thread destroying the static pthread key. Once the main thread destroys the pthread key no more TLS destructors are run.

There are two solutions to this race condition:

  1. Leak the pthread key. If we never destroy the during program termination then all TLS destructors on detached threads will run.
  2. Reference count the key and only destroy it once all TLS have been run. This change would require a MAJOR ABI break that we could not ship for some time.

Because #2 requires an ABI break I believe that #1 is the better option. Because we only leak one key, and this key is leaked during program termination, it is very unlikely (read: almost impossible) for a user to notice this change.

However, I question if is wise to attempt to run the TLS destructors at all once program termination has begun. The shared state referenced by the destructors must have static storage duration and this static storage may or may not have been destroyed already. It seems very unlikely that a well-formed program will attempt to use the results of std::notify_at_thread_exit(...) or std::promise::set_value_at_thread_exit(...) during program termination. This is likely also undefined behavior as noted in [basic.start.term]C++17 3.6.3 p4:

If there is a use of a standard library object or function not permitted within signal handlers (18.10) that
does not happen before (1.10) completion of destruction of objects with static storage duration and execution
of std::atexit registered functions (18.5), the program has undefined behavior. [ Note: If there is a use
of an object with static storage duration that does not happen before the object’s destruction, the program
has undefined behavior. Terminating every thread before a call to std::exit or the exit from main is
sufficient, but not necessary, to satisfy these requirements. These requirements permit thread managers as
static-storage-duration objects. — end note ]

If we choose not to run the TLS destructor's once program termination has begun we should still make an effort to free all memory allocated by the internal TLS structures to placate ASAN.

In summation I believe we should take this patch as a partial fix to the above problem and continue to investigate if we should disable thread exit notifications once program termination has begun.

@mclow.lists: ping. It would be nice to fix this and prevent the ASAN failures.

mclow.lists accepted this revision.Aug 3 2015, 11:24 AM
mclow.lists edited edge metadata.

LGTM. I hate the leaking, but I think it's the best we can do at this time.

This revision is now accepted and ready to land.Aug 3 2015, 11:24 AM
EricWF closed this revision.Aug 18 2015, 12:41 PM