This is an archive of the discontinued LLVM Phabricator instance.

Differential D21803

[libcxxabi] Provide a fallback __cxa_thread_atexit() implementation
ClosedPublic

Authored by tavianator on Jun 28 2016, 10:49 AM.

Details

Reviewers

mclow.lists
danalbert
jroelofs
EricWF

Summary

__cxa_thread_atexit_impl() isn't present on all platforms, for example
Android pre-6.0. This patch uses a weak symbol to detect _impl()
support, falling back to a pthread_key_t-based implementation.
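
As a rough illustration of the detection described in the summary, the sketch below declares `__cxa_thread_atexit_impl()` as a weak symbol and tests its address at run time. The helper names `libc_has_thread_atexit_impl` and `chosen_backend` are hypothetical and not part of the patch, and the `__weak__` attribute assumes a GCC- or Clang-compatible compiler on an ELF platform.

```cpp
#include <cstring>

using Dtor = void (*)(void*);

// Weak declaration: if the C library does not define this symbol, the
// reference resolves to a null address instead of causing a link error.
extern "C" __attribute__((__weak__))
int __cxa_thread_atexit_impl(Dtor, void*, void*);

// Hypothetical helper: true when the C library provides the _impl() hook.
bool libc_has_thread_atexit_impl() {
    return &__cxa_thread_atexit_impl != nullptr;
}

// Hypothetical helper: names the code path a dispatcher would take.
const char* chosen_backend() {
    return libc_has_thread_atexit_impl() ? "libc" : "fallback";
}
```

Because the check happens at run time, the same binary can pick the libc path on a newer C library and the fallback path on an older one.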

Diff Detail

Event Timeline

tavianator updated this revision to Diff 62108. Jun 28 2016, 10:49 AM

tavianator retitled this revision from to [libcxxabi] Provide a fallback __cxa_thread_atexit() implementation.

tavianator updated this object.

tavianator added reviewers: danalbert, jroelofs.

tavianator added a subscriber: cfe-commits.

Herald added subscribers: danalbert, tberghammer. Jun 28 2016, 10:49 AM

majnemer added a subscriber: majnemer. Jun 28 2016, 3:30 PM

majnemer added inline comments.

src/cxa_thread_atexit.cpp
18–133	Is there a reason to use the weak symbol on non-Android platforms?
36–47	What happens if `pthread_key_create` fails?

tavianator added inline comments. Jun 28 2016, 6:52 PM

src/cxa_thread_atexit.cpp
18–133	Just simplicity. (There are some fringe benefits, like the ability to build against pre-2.18 glibc but have it detect ..._impl() support when run with newer libraries. Or magically supporting musl etc. if they add it, without having to recompile.) Would you rather keep the build-time checks for non-Android platforms?
36–47	Nothing good! I'll add the necessary error handling.
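
One way the missing error handling could look is sketched below. `pthread_key_create()` reports failure through its return value rather than `errno`, so the result is checked directly. The names `dtors_key`, `run_dtors` (stubbed out here) and `create_dtors_key` are placeholders, and printing to stderr is an assumption; the patch itself may report the error differently.

```cpp
#include <pthread.h>
#include <cstdio>
#include <cstring>

static pthread_key_t dtors_key;   // placeholder name for the fallback's key
static void run_dtors(void*) {}   // stub for the per-thread destructor callback

// Creates the key and reports failure instead of silently continuing.
// pthread functions return the error number directly.
bool create_dtors_key() {
    int rc = pthread_key_create(&dtors_key, run_dtors);
    if (rc != 0) {
        std::fprintf(stderr, "pthread_key_create failed: %s\n", std::strerror(rc));
        return false;
    }
    return true;
}
```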

Added error handling for pthread_key uses

You should look at __thread_specific_ptr in libcxx's <thread>. It does a lot of these things in order to satisfy the requirements of notify_all_at_thread_exit, set_value_at_thread_exit, and make_ready_at_thread_exit.

@rmaprath has been doing some work to make the threading runtime library swappable. I don't recall if his work extended to libcxxabi or not, but I'll page him anyway.

This implementation of __cxa_thread_atexit doesn't interact nicely with shared libraries. The following sequence of events causes unloaded code to get invoked.

Create thread 42
Load library foo from thread 42
Call a function with a thread_local object with a dtor.
Unload library foo.
Join thread 42

glibc does some extra work during __cxa_thread_atexit_impl to bump the reference count of the shared library so that the user's "unload" doesn't actually unload the library.

src/cxa_thread_atexit.cpp
40	I think this is correct, but it needs some comments because it is not obvious what (or why) this is implemented this way. More specifically, document the cases where run_dtors is run because of ~DtorListHolder vs. the cases where run_dtors is run because of the callback registered at pthread_key_create.

In D21803#469988, @bcraig wrote:

You should look at __thread_specific_ptr in libcxx's <thread>. It does a lot of these things in order to satisfy the requirements of notify_all_at_thread_exit, set_value_at_thread_exit, and make_ready_at_thread_exit.

Had a look at it. One thing that stands out is that notify_all_at_thread_exit() and friends are supposed to be invoked *after* thread_local destructors. But the order that pthread_key destructors run in is unspecified. This could be worked around by waiting for the second iteration through pthread_key destructors before triggering ~__thread_struct_imp(). It looks like libstdc++ has a similar bug if ..._impl() isn't available.

@rmaprath has been doing some work to make the threading runtime library swappable. I don't recall if his work extended to libcxxabi or not, but I'll page him anyway.

<__threading_support>? Seems to be libc++-specific. There's a few other raw uses of pthreads in libc++abi.

This implementation of __cxa_thread_atexit doesn't interact nicely with shared libraries. The following sequence of events causes unloaded code to get invoked.

Create thread 42

Load library foo from thread 42

Call a function with a thread_local object with a dtor.

Unload library foo.

Join thread 42

glibc does some extra work during __cxa_thread_atexit_impl to bump the reference count of the shared library so that the user's "unload" doesn't actually unload the library.

Yep. This is about as good as libc++abi can do on its own though. Note that libstdc++ has similar limitations if ..._impl() isn't available.

In D21803#470060, @tavianator wrote:

In D21803#469988, @bcraig wrote:

You should look at __thread_specific_ptr in libcxx's <thread>. It does a lot of these things in order to satisfy the requirements of notify_all_at_thread_exit, set_value_at_thread_exit, and make_ready_at_thread_exit.

Had a look at it. One thing that stands out is that notify_all_at_thread_exit() and friends are supposed to be invoked *after* thread_local destructors. But the order that pthread_key destructors run in is unspecified. This could be worked around by waiting for the second iteration through pthread_key destructors before triggering ~__thread_struct_imp(). It looks like libstdc++ has a similar bug if ..._impl() isn't available.

It also intentionally leaks the pthread key. Does the __thread_specific_ptr rationale hold for this change as well?

This implementation of __cxa_thread_atexit doesn't interact nicely with shared libraries. The following sequence of events causes unloaded code to get invoked.

Create thread 42

Load library foo from thread 42

Call a function with a thread_local object with a dtor.

Unload library foo.

Join thread 42

glibc does some extra work during __cxa_thread_atexit_impl to bump the reference count of the shared library so that the user's "unload" doesn't actually unload the library.

Yep. This is about as good as libc++abi can do on its own though. Note that libstdc++ has similar limitations if ..._impl() isn't available.

I was going to tell you that this is implementable with dladdr (which I think Android has). Then I looked more at the "prevent unloading" side of things, and it looks like that requires digging into the library structures directly. Ugh.

Comment on the limitation in the source, but you don't need to change any code for this item.

In D21803#470060, @tavianator wrote:

In D21803#469988, @bcraig wrote:

@rmaprath has been doing some work to make the threading runtime library swappable. I don't recall if his work extended to libcxxabi or not, but I'll page him anyway.

<__threading_support>? Seems to be libc++-specific. There's a few other raw uses of pthreads in libc++abi.

I plan to get started with libc++abi as soon as I'm done with libc++. I've hit a couple of bumps with the latter and am resolving them at the moment.

I don't see any new pthread dependencies introduced in this patch, so it sounds OK to me. Thanks for the ping though!

The patch itself looks OK to me. Will need approval from @mclow.lists or @EricWF.

majnemer added inline comments. Jun 29 2016, 10:59 AM

src/cxa_thread_atexit.cpp
18–133	I think that this should be an opt-in mechanism, there are platforms that presumably never need to pay the cost of the unused code (macOS comes to mind).

In D21803#470086, @bcraig wrote:

It also intentionally leaks the pthread key. Does the __thread_specific_ptr rationale hold for this change as well?

Hmm, maybe? If other global destructors run after ~DtorListHolder(), and they cause a thread_local to be initialized for the first time, __cxa_thread_atexit() might be called again. I was thinking that dtors would get re-initialized in that case but it appears it does not. So yeah, I think I'll need to leak the pthread_key_t.

I'm not sure how to avoid leaking the actual thread_local objects that get created in that situation. There's nothing left to trigger run_dtors() a second time.

src/cxa_thread_atexit.cpp
18–133	This file is only built for UNIX AND NOT (APPLE OR CYGWIN). Other platforms use something other than __cxa_thread_atexit() I assume.

Hmm, maybe? If other global destructors run after ~DtorListHolder(), and they cause a thread_local to be initialized for the first time, __cxa_thread_atexit() might be called again. I was thinking that dtors would get re-initialized in that case but it appears it does not. So yeah, I think I'll need to leak the pthread_key_t.

I'm not sure how to avoid leaking the actual thread_local objects that get created in that situation. There's nothing left to trigger run_dtors() a second time.

I'm not concerned about the loss of memory or pthread_key resources in this leak, as it is a very short-lived leak (the process is going away after all). We do need to have an idea of what happens with the destructor invocations for the other kinds of resources though.

I think the C++14 spec says what should happen.

3.6.3 Termination

[...] The completions

of the destructors for all initialized objects with thread storage duration within that thread are sequenced
before the initiation of the destructors of any object with static storage duration. If the completion of the
constructor or dynamic initialization of an object with thread storage duration is sequenced before that of
another, the completion of the destructor of the second is sequenced before the initiation of the destructor
of the first. If the completion of the constructor or dynamic initialization of an object with static storage
duration is sequenced before that of another, the completion of the destructor of the second is sequenced
before the initiation of the destructor of the first.

What that means for this implementation is that I think that __cxa_thread_atexit is allowed to be called during run_dtors. If, while running the dtor for a thread local variable 'cat', we encounter a previously unseen thread_local 'dog', the compiler will call the ctor, then register the dtor with __cxa_thread_atexit. Since it is the most recently constructed thread local object, I would expect the 'dog' dtor to be the next dtor to be run. You may be able to support this just by moving "elem = elem->next" below the dtor invocation.
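
The ordering rule quoted from the standard can be observed with ordinary thread_local objects. The sketch below is only an illustration: `Tracer`, `worker`, `run_worker` and `g_log` are made-up names, and it covers the simple case of two objects constructed in sequence on a non-main thread, not the late-registration case discussed in this comment.

```cpp
#include <string>
#include <thread>

// Written only by the worker thread, read by the caller after join().
static std::string g_log;

struct Tracer {
    char name;
    explicit Tracer(char n) : name(n) {}
    ~Tracer() { g_log += name; }   // record destruction order
};

static void worker() {
    thread_local Tracer first('a');    // constructed first
    thread_local Tracer second('b');   // constructed second
    (void)first;
    (void)second;
}

// Runs the worker on its own thread and returns the destruction order.
std::string run_worker() {
    g_log.clear();
    std::thread t(worker);
    t.join();   // thread_local destructors have completed by now
    return g_log;
}
```

Here `second` is destroyed before `first`, the reverse of their construction order.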

On the topic of __cxa_thread_atexit, was it ever specified how it interacts with things like thread cancellation?

dimitry added a subscriber: dimitry. Jun 29 2016, 4:04 PM

dimitry added inline comments.

src/cxa_thread_atexit.cpp
45	run_dtors() is called when/if libc++.so gets unloaded... but only for the thread calling dlclose()?

In D21803#470564, @joerg wrote:

On the topic of __cxa_thread_atexit, was it ever specified how it interacts with things like thread cancellation?

I don't think it's officially specified anywhere. C++ threads don't have a cancel method. The POSIX spec doesn't speak about the C++ ABI. The Itanium ABI could talk about this, but hasn't yet.

I think this implementation does the right thing with regards to cancellation though. POSIX says that first cancellation cleanup handlers are called, then thread-specific data destructors are called. pthread_cancel is still a really bad idea due to how it (doesn't) interact with RAII, but at least TLS data won't get leaked.
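
The POSIX ordering mentioned here (cleanup handlers first, then thread-specific data destructors) can be sketched as follows. To stay clear of `pthread_cancel` and its interaction with C++ unwinding, this sketch swaps in `pthread_exit()`, for which POSIX specifies the same sequence. All names (`g_order`, `g_key`, `exit_order`, and so on) are invented for the example.

```cpp
#include <pthread.h>
#include <string>

static std::string g_order;   // written by the worker, read after join
static pthread_key_t g_key;

static void cleanup_handler(void*) { g_order += "cleanup;"; }
static void tsd_destructor(void*)  { g_order += "tsd;"; }

static void* worker(void*) {
    static int dummy;
    pthread_setspecific(g_key, &dummy);   // non-null value arms the destructor
    pthread_cleanup_push(cleanup_handler, nullptr);
    pthread_exit(nullptr);                // pops and runs the handler, then TSD dtors
    pthread_cleanup_pop(0);               // unreachable; required to pair with push
    return nullptr;
}

// Returns the order in which the two callbacks ran on the exiting thread.
std::string exit_order() {
    g_order.clear();
    if (pthread_key_create(&g_key, tsd_destructor) != 0) return "";
    pthread_t t;
    if (pthread_create(&t, nullptr, worker, nullptr) != 0) return "";
    pthread_join(t, nullptr);
    pthread_key_delete(g_key);
    return g_order;
}
```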

src/cxa_thread_atexit.cpp
45	Most of the dtor magic is on the pthread_key_create side. pthreads lets you register a per-thread destructor. This destructor is only run on process termination (I think).

dimitry added inline comments. Jun 30 2016, 9:12 AM

src/cxa_thread_atexit.cpp
45	I meant the call from ~DtorListHolder()

Fixed some corner cases regarding destruction order and very-late-initialized thread_locals. Explicitly documented the known limitations compared to __cxa_thread_atexit_impl().

In D21803#470448, @bcraig wrote:

What that means for this implementation is that I think that __cxa_thread_atexit is allowed to be called during run_dtors. If, while running the dtor for a thread local variable 'cat', we encounter a previously unseen thread_local 'dog', the compiler will call the ctor, then register the dtor with __cxa_thread_atexit. Since it is the most recently constructed thread local object, I would expect the 'dog' dtor to be the next dtor to be run. You may be able to support this just by moving "elem = elem->next" below the dtor invocation.

It wasn't quite that easy (have to re-look at the pthread_key to get newly added thread_locals), but that's done in the latest patch.

src/cxa_thread_atexit.cpp
38	See http://stackoverflow.com/q/38130185/502399 for a test case that would trigger this. This may not be necessary depending on the answer to that question.
45	This has changed somewhat in the latest patch, but the gist is similar. If libc++abi.so is dlclose()d, there had better not be any still-running threads that expect to execute thread_local destructors (or any other C++ code, for that matter). In the usual case (libc++abi.so loaded at startup, not by a later dlopen()), the last run_dtors() call happens as the final thread is exiting.

Added missing __dso_handle declaration.

Fix copy-pasta that resulted in an infinite loop.

bcraig added inline comments. Jun 30 2016, 12:57 PM

src/cxa_thread_atexit.cpp
47	Why are we doing this? I can see it being a little useful when debugging / developing, so that you get an early warning that something has gone wrong, but it seems like this will always be setting a value to the value it already has.
54	Maybe this concern is unfounded, but I'm not overly fond of pthread_getspecific and setspecific in a loop. I've always been under the impression that those functions are rather slow. Could we add a layer of indirection so that we don't need to call getspecific and setspecific so often? Basically make the pointer that is directly stored in TLS an immutable pointer to pointer.

tavianator added inline comments. Jun 30 2016, 1:17 PM

src/cxa_thread_atexit.cpp
47	pthread_key destructors run after the key is set to null. I re-set it here since the loop reads the key.
54	Sure, I can do that. Would reduce the number of setspecific() calls in __cxa_thread_atexit too.
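
The behavior described in the reply on line 47 (the key's value is already null when its destructor runs) can be checked with a small sketch. POSIX specifies that the value is set to NULL before the destructor is called with the previous value as its argument. The names `g_key`, `key_destructor` and `key_is_null_during_destructor` are made up for this example.

```cpp
#include <pthread.h>

static pthread_key_t g_key;
static int g_payload = 42;
static bool g_null_in_dtor = false;
static void* g_arg_in_dtor = nullptr;

// Called at thread exit with the value the key previously held.
static void key_destructor(void* value) {
    g_arg_in_dtor = value;
    // The slot itself has already been reset to null at this point.
    g_null_in_dtor = (pthread_getspecific(g_key) == nullptr);
}

static void* worker(void*) {
    pthread_setspecific(g_key, &g_payload);
    return nullptr;
}

// True if the destructor saw a null slot but received the old value.
bool key_is_null_during_destructor() {
    if (pthread_key_create(&g_key, key_destructor) != 0) return false;
    pthread_t t;
    if (pthread_create(&t, nullptr, worker, nullptr) != 0) return false;
    pthread_join(t, nullptr);
    pthread_key_delete(g_key);
    return g_null_in_dtor && g_arg_in_dtor == &g_payload;
}
```

This is why code that reads the key from inside the destructor has to store the value back first.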

Also, can you add test cases for a lot of these things? I don't expect test cases for the DSO side of things, but a lot of the tricky atexit cases should be covered.

Added a test case for destructor ordering. Got rid of pthread_{get,set}specific in a loop.

bcraig added inline comments. Jul 5 2016, 9:24 AM

src/cxa_thread_atexit.cpp
23	I'm going to have to agree with @majnemer. I think that the config check for LIBCXXABI_HAS_CXA_THREAD_ATEXIT_IMPL should stay in place. If cxa_thread_atexit_impl exists, then all of the fallback code can disappear at preprocessing time. We do lose out on the minor benefit of avoiding some libc++ recompiles, but we also avoid code bloat. For what it's worth, I'm willing to keep the weak symbol check in place if __cxa_thread_atexit_impl isn't present, I just don't want to pay for the fallback when I know I'm not going to use it.
47	The loop doesn't read pthread_getspecific anymore. I get the need for the setspecific call here for your previous design, but I don't think it's needed now.
test/thread_local_destruction_order.pass.cpp
2	Nit: file name is wrong here.
49	Can we have a CreatesThreadLocalInDestructor in the thread_fn as well? That way we can test both the main function and a pthread. If I understand your code and comments correctly, those go through different code paths.
55	In the places where you can, validate that dtors actually are getting called. This may be your only place where you can do that. So something like 'assert(seq == 1)' here.

tavianator added inline comments. Jul 5 2016, 12:13 PM

src/cxa_thread_atexit.cpp
23	Makes sense, I'll do that.
47	__cxa_thread_atexit() calls pthread_getspecific(), so it's still needed. Otherwise it would create a new list instead of adding to the current one, and the ordering would be wrong.
test/thread_local_destruction_order.pass.cpp
49	Yep, meant to do that actually!
55	Sounds good. What I wanted to do was print some output in the destructors, and check for a certain expected output. But I couldn't figure out how to do that with lit.

bcraig added inline comments. Jul 5 2016, 12:23 PM

test/thread_local_destruction_order.pass.cpp
55	Normally, you would do that by piping the output to the llvm FileCheck utility. My unconfirmed suspicion is that that approach will run into difficulties because of libcxx and libcxxabi specific setups. I think just having the global object's dtor check is good enough for the final post condition though. I'm not terribly worried about global dtors malfunctioning.

Bring back HAVE___CXA_THREAD_ATEXIT_IMPL, and avoid the weak symbol/fallback implementation in that case
Fix a leak in an error path
Add a CreatesThreadLocalInDestructor to a non-main thread in the destructor ordering test

LGTM (with a comment nit), but you'll need to get approval from @EricWF or @mclow.lists.

I would like some of the information from your stack overflow post to make its way to the comments. In particular, I think I would like to see it documented that we have made a choice for some undefined behavior.

Update comments to mention that late-initialized thread_locals invoke undefined behavior.

Make sure the tail of the list is null.

Anything else I need to do for this patch?

Ping?

In D21803#530678, @tavianator wrote:

Ping?

Well, I still think it's fine. Maybe a direct message to @mclow.lists or @EricWF?

In D21803#530681, @bcraig wrote:

In D21803#530678, @tavianator wrote:

Ping?

Well, I still think it's fine. Maybe a direct message to @mclow.lists or @EricWF?

Is there a way to do that through Phabricator? Or did you mean to email them directly? (Not sure what their emails are but I can probably figure it out from the list history.)

The "@" will do a ping through phabricator, but a direct email is probably going to be your best bet at this point.

I'll look at this within the hour.

We can perform far fewer calls to pthread_getspecific/pthread_setspecific if we represent the list head using a global __thread DtorList* list_head = nullptr.
This also allows us to avoid the hack of setting/unsetting the key during run_dtors() which I really do not like.

Here is a patch that applies such changes: https://gist.github.com/EricWF/a071376b1216aabdd1695eec2175c374

What do you think of this idea?

src/cxa_thread_atexit.cpp
42	Can you clarify what you mean by "other threads"? How is libc++ supposed to detect and handle this problem?

In D21803#532309, @EricWF wrote:

__thread

What do you think of this idea?

You'll have to guard it against all the platforms that don't support TLS. Darwin 10.6 is one of them.

In D21803#532347, @jroelofs wrote:

In D21803#532309, @EricWF wrote:

__thread

What do you think of this idea?

You'll have to guard it against all the platforms that don't support TLS. Darwin 10.6 is one of them.

Which is fine because we shouldn't supply a definition of __cxa_thread_atexit on those platforms.

In D21803#532382, @EricWF wrote:

In D21803#532347, @jroelofs wrote:

In D21803#532309, @EricWF wrote:

__thread

What do you think of this idea?

You'll have to guard it against all the platforms that don't support TLS. Darwin 10.6 is one of them.

Which is fine because we shouldn't supply a definition of __cxa_thread_atexit on those platforms.

Ah, ok. Sounds good.

In D21803#532309, @EricWF wrote:

__thread

What do you think of this idea?

Makes sense to me, I'll integrate it into the next revision.

src/cxa_thread_atexit.cpp
42	I meant "non-main threads" ("other" is in relation to the bullet point above), but I can clarify this, sure. libc++ could be patched to do something like this:

	pthread_key_t key1, key2;

	void destructor1(void* ptr) {
	    pthread_setspecific(key2, ptr);
	}

	void destructor2(void* ptr) {
	    // Runs in the second iteration through pthread_key destructors,
	    // therefore after thread_local destructors
	}

	pthread_key_create(&key1, destructor1);
	pthread_key_create(&key2, destructor2);

	pthread_setspecific(key1, ptr);

	(Or it could use a counter/flag and a single pthread_key.) libstdc++ has the same bug when __cxa_thread_atexit_impl() isn't available, so I'm not sure that change would really be necessary. If it is, I can write up the libc++ patch.
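
A self-contained version of the two-key idea from this comment might look like the sketch below. It only demonstrates that the second destructor runs after the first one has re-armed its key. POSIX leaves the order of destructor calls unspecified, so whether that happens in a strictly later pass is up to the implementation. `g_trace`, `worker` and `run_two_key_demo` are names invented for the example.

```cpp
#include <pthread.h>
#include <string>

static pthread_key_t key1, key2;
static std::string g_trace;   // written by the worker, read after join

static void destructor2(void*) { g_trace += "2"; }

static void destructor1(void* ptr) {
    g_trace += "1";
    // key2 has no value until now, so destructor2 cannot have run yet.
    pthread_setspecific(key2, ptr);
}

static void* worker(void*) {
    static int payload;
    pthread_setspecific(key1, &payload);   // only key1 is armed up front
    return nullptr;
}

// Returns the order in which the two key destructors ran.
std::string run_two_key_demo() {
    g_trace.clear();
    if (pthread_key_create(&key1, destructor1) != 0) return "";
    if (pthread_key_create(&key2, destructor2) != 0) return "";
    pthread_t t;
    if (pthread_create(&t, nullptr, worker, nullptr) != 0) return "";
    pthread_join(t, nullptr);
    pthread_key_delete(key1);
    pthread_key_delete(key2);
    return g_trace;
}
```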

EricWF added inline comments. Sep 2 2016, 11:36 AM

src/cxa_thread_atexit.cpp
42	I don't think we need to patch this in libc++. Especially because it would be incorrect in the vast majority of cases.

Uses a __thread variable to hold the destructor list, as @EricWF suggested.

LGTM modulo bug fix.

src/cxa_thread_atexit.cpp
70	There is a bug here. If `head->next == nullptr` and `head->dtor(head->obj)` creates a thread_local variable in the destructor, then that variable's destructor will not be invoked. Here's an updated test case which catches the bug: https://gist.github.com/EricWF/3bb50d4f28b91aa28d2adefea0e94a0e

tavianator added inline comments. Sep 7 2016, 7:04 AM

src/cxa_thread_atexit.cpp
70	I can't reproduce that failure here, your exact test case passes (even with `#undef HAVE___CXA_THREAD_ATEXIT_IMPL` and the weak symbol test commented out). Tracing the implementation logic, it seems correct. If `head->next == nullptr` then this line does `dtors = nullptr`. Then if `head->dtor(head->obj)` registers a new `thread_local`, `__cxa_thread_atexit()` does `head = malloc(...); ... dtors = head;`. Then the next iteration of the loop `while (auto head = dtors) {` picks up that new node. Have I missed something?
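
The loop logic traced in this reply can be sketched in isolation. This is not the patch's code: it is a minimal model assuming a per-thread singly linked list, with `thread_local` standing in for the `__thread` variable suggested earlier, and with hypothetical names (`register_dtor`, `demo_late_registration`, the `cat`/`dog` callbacks). It leaves out the pthread key that triggers `run_dtors()` at thread exit.

```cpp
#include <cstdlib>
#include <string>

using Dtor = void (*)(void*);

struct DtorList {
    Dtor dtor;
    void* obj;
    DtorList* next;
};

// Head of the current thread's destructor list (most recent first).
static thread_local DtorList* dtors = nullptr;

// Pushes a destructor onto the front of the list.
bool register_dtor(Dtor dtor, void* obj) {
    auto* head = static_cast<DtorList*>(std::malloc(sizeof(DtorList)));
    if (!head) return false;
    head->dtor = dtor;
    head->obj = obj;
    head->next = dtors;
    dtors = head;
    return true;
}

// Re-reads the list head on every iteration, so a destructor that
// registers a new entry has that entry run next.
void run_dtors() {
    while (auto head = dtors) {
        dtors = head->next;
        head->dtor(head->obj);
        std::free(head);
    }
}

static std::string g_log;
static char kDog[] = "dog ";
static char kFirst[] = "first ";

static void log_name(void* name) { g_log += static_cast<char*>(name); }

// Models a destructor that initializes a new thread_local while running.
static void cat_dtor(void*) {
    g_log += "cat ";
    register_dtor(log_name, kDog);
}

std::string demo_late_registration() {
    g_log.clear();
    register_dtor(log_name, kFirst);
    register_dtor(cat_dtor, nullptr);
    run_dtors();
    return g_log;
}
```

In this model the late-registered `dog` entry runs immediately after `cat` and before the older `first` entry, which matches the ordering expected earlier in the thread.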

EricWF added inline comments. Sep 7 2016, 2:03 PM

src/cxa_thread_atexit.cpp
70	I can't reproduce this morning either, I must have been doing something funny. I'll look at this with a fresh head tomorrow. If I can't find anything this will be good to go. Thanks for working on this.

tavianator added inline comments. Sep 8 2016, 1:18 PM

src/cxa_thread_atexit.cpp
70	No problem! I can integrate your updated test case anyway if you want.

Hahnfeld added a subscriber: Hahnfeld. Sep 13 2016, 8:21 AM

LGTM after addressing the inline comments.

src/cxa_thread_atexit.cpp
70	Yeah, I would like to see the upgraded test case applied. At least that way we're testing the case in question. So I agree with your above analysis of what happens, and that all destructors are correctly called during the first iteration of pthread key destruction. My one issue is that we still register a new non-null key, which forces pthread to run the destructor for the key again. I would like to see this fixed.

Integrated @EricWF's expanded test case, and avoid an unneeded pthread_setspecific() call if the last thread_local's destructor initializes a new thread_local.

Herald added subscribers: mgorny, beanz. Sep 15 2016, 8:24 AM

tavianator marked 5 inline comments as done. Sep 15 2016, 8:27 AM

tavianator added inline comments.

src/cxa_thread_atexit.cpp
70	Yep, done! I guess that was the point of `thread_alive` from your original patch-to-my-patch, sorry for stripping it out.

LGTM. Thanks for the patch.

src/cxa_thread_atexit.cpp
70	Na I thought I was solving a bug that didn't exist. You give me too much credit.

This revision is now accepted and ready to land. Sep 15 2016, 12:05 PM

s/indended/intended/

rmaprath mentioned this in D24864: [libcxxabi] Refactor pthread usage into a separate API. Sep 30 2016, 1:59 AM

@tavianator: I'm about to commit D24864, which will affect this patch (indirectly). D24864 basically refactors all pthread dependencies behind a separate API. It would be pretty straightforward for you to update this patch though, just replacing pthread calls with ones in thread_support.h (perhaps adding anything missing).

Hope you don't mind me going first? If you are going to commit this soon, I can hold off D24864. Let me know.

Cheers,

/ Asiri

@rmaprath I'll merge this if needed. Feel free to commit your patch first.

@tavianator Do you need somebody to merge this for you?

In D21803#556857, @EricWF wrote:

@rmaprath I'll merge this if needed. Feel free to commit your patch first.

Yeah, @rmaprath I'm happy to rebase this over your patch.

@tavianator Do you need somebody to merge this for you?

I assume so, yeah.

In D21803#567774, @tavianator wrote:

In D21803#556857, @EricWF wrote:

@rmaprath I'll merge this if needed. Feel free to commit your patch first.

Yeah, @rmaprath I'm happy to rebase this over your patch.

My patch got a bit stuck downstream :(

So you / @EricWF can go ahead with this patch, I'll rebase mine over yours when I'm ready to commit.

/ Asiri

Committed as r283988. Thanks for the patch!

Revision Contents

Path	Size
src/
cxa_thread_atexit.cpp	123 lines
test/
CMakeLists.txt	1 line
cxa_thread_atexit_test.pass.cpp	1 line
libcxxabi/
test/
config.py	2 lines
lit.site.cfg.in	1 line
thread_local_destruction_order.pass.cpp	59 lines

Diff 70406

src/cxa_thread_atexit.cpp

				//===----------------------- cxa_thread_atexit.cpp ------------------------===//
				//
				// The LLVM Compiler Infrastructure
				//
				// This file is dual licensed under the MIT and the University of Illinois Open
				// Source Licenses. See LICENSE.TXT for details.
				//
				//===----------------------------------------------------------------------===//

				#include "abort_message.h"
				#include "cxxabi.h"
				#include <cstdlib>
				#include <pthread.h>

				namespace __cxxabiv1 {

				using Dtor = void(*)(void*);

				extern "C"
				#ifndef HAVE___CXA_THREAD_ATEXIT_IMPL
				// A weak symbol is used to detect this function's presence in the C library
				// at runtime, even if libc++ is built against an older libc
				__attribute__((__weak__))
				#endif
				int __cxa_thread_atexit_impl(Dtor, void*, void*);

				#ifndef HAVE___CXA_THREAD_ATEXIT_IMPL

				namespace {
				// This implementation is used if the C library does not provide
				// __cxa_thread_atexit_impl() for us. It has a number of limitations that are
				// difficult to impossible to address without ..._impl():
				//
				// - dso_symbol is ignored. This means that a shared library may be unloaded
				// (via dlclose()) before its thread_local destructors have run.
				//
				// - thread_local destructors for the main thread are run by the destructor of
				// a static object. This is later than expected; they should run before the
				// destructors of any objects with static storage duration.
				//
				// - thread_local destructors on non-main threads run on the first iteration
				// through the pthread_key destructors. std::notify_all_at_thread_exit()
				// and similar functions must be careful to wait until the second iteration
				// to provide their indended ordering guarantees.
				//
				// Another limitation, though one shared with ..._impl(), is that any
				// thread_locals that are first initialized after non-thread_local global
				// destructors begin to run will not be destroyed. [basic.start.term] states
				// that all thread_local destructors are sequenced before the destruction of
				// objects with static storage duration, resulting in a contradiction if a
				// thread_local is constructed after that point. Thus we consider such
				// programs ill-formed, and don't bother to run those destructors. (If the
				// program terminates abnormally after such a thread_local is constructed,
				// the destructor is not expected to run and thus there is no contradiction.
				bcraig: Maybe this concern is unfounded, but I'm not overly fond of pthread_getspecific and setspecific in a loop. I've always been under the impression that those functions are rather slow. Could we add a layer of indirection so that we don't need to call getspecific and setspecific so often? Basically make the pointer that is directly stored in TLS an immutable pointer to pointer.
				tavianator: Sure, I can do that. Would reduce the number of setspecific() calls in __cxa_thread_atexit too.
				// So construction still has to work.)

				struct DtorList {
				Dtor dtor;
				void* obj;
				DtorList* next;
				};

				// The linked list of thread-local destructors to run
				__thread DtorList* dtors = nullptr;
				// Used to trigger destructors on thread exit; value is ignored
				pthread_key_t dtors_key;

				void run_dtors(void*) {
				while (auto head = dtors) {
				dtors = head->next;
				EricWF: There is a bug here. If `head->next == nullptr` and if `head->dtor(head->obj)` creates a TL variable in the destructor then that destructor will not be invoked. Here's an updated test case which catches the bug: https://gist.github.com/EricWF/3bb50d4f28b91aa28d2adefea0e94a0e
				tavianator: I can't reproduce that failure here, your exact test case passes (even with `#undef HAVE___CXA_THREAD_ATEXIT_IMPL` and the weak symbol test commented out). Tracing the implementation logic, it seems correct. If `head->next == nullptr` then this line does `dtors = nullptr`. Then if `head->dtor(head->obj)` registers a new `thread_local`, `__cxa_thread_atexit()` does `head = malloc(...); ... dtors = head;`. Then the next iteration of the loop `while (auto head = dtors) {` picks up that new node. Have I missed something?
				EricWF: I can't reproduce this morning either, I must have been doing something funny. I'll look at this with a fresh head tomorrow. If I can't find anything this will be good to go. Thanks for working on this.
				tavianator: No problem! I can integrate your updated test case anyway if you want.
				EricWF: Yeah, I would like to see the upgraded test case applied. At least that way we're testing the case in question. So I agree with your above analysis of what happens, and that all destructors are correctly called during the first iteration of pthread key destruction. My one issue is that we still register a new non-null key, which forces pthread to run the destructor for the key again. I would like to see this fixed.
				tavianator: Yep, done! I guess that was the point of `thread_alive` from your original patch-to-my-patch, sorry for stripping it out.
				EricWF: Na, I thought I was solving a bug that didn't exist. You give me too much credit.
				head->dtor(head->obj);
				std::free(head);
				}
				}

				struct DtorsManager {
				DtorsManager() {
				// There is intentionally no matching pthread_key_delete call, as
				// __cxa_thread_atexit() may be called arbitrarily late (for example, from
				// global destructors or atexit() handlers).
				if (pthread_key_create(&dtors_key, run_dtors) != 0) {
				abort_message("pthread_key_create() failed in __cxa_thread_atexit()");
				}
				}

				~DtorsManager() {
				// pthread_key destructors do not run on threads that call exit()
				// (including when the main thread returns from main()), so we explicitly
				// call the destructor here. This runs at exit time (potentially earlier
				// if libc++abi is dlclose()'d). Any thread_locals initialized after this
				// point will not be destroyed.
				run_dtors(nullptr);
				}
				};
				} // namespace

				#endif // HAVE__CXA_THREAD_ATEXIT_IMPL

	extern "C" {			extern "C" {

				_LIBCXXABI_FUNC_VIS int __cxa_thread_atexit(Dtor dtor, void* obj, void* dso_symbol) throw() {
	#ifdef HAVE___CXA_THREAD_ATEXIT_IMPL			#ifdef HAVE___CXA_THREAD_ATEXIT_IMPL

	_LIBCXXABI_FUNC_VIS int __cxa_thread_atexit(void (*dtor)(void *), void *obj,
	void *dso_symbol) throw() {
	extern int __cxa_thread_atexit_impl(void (*)(void *), void *, void *);
	return __cxa_thread_atexit_impl(dtor, obj, dso_symbol);			return __cxa_thread_atexit_impl(dtor, obj, dso_symbol);
				#else
				if (__cxa_thread_atexit_impl) {
				return __cxa_thread_atexit_impl(dtor, obj, dso_symbol);
				} else {
				// Initialize the dtors pthread_key (uses __cxa_guard_*() for one-time
				// initialization and __cxa_atexit() for destruction)
				static DtorsManager manager;

				auto tail = dtors;
				if (!tail) {
				if (pthread_setspecific(dtors_key, &dtors_key) != 0) {
				return -1;
				}
	}			}

	#endif // HAVE__CXA_THREAD_ATEXIT_IMPL			auto head = static_cast<DtorList*>(std::malloc(sizeof(DtorList)));
				if (!head) {
				return -1;
				}

				head->dtor = dtor;
				head->obj = obj;
				head->next = tail;
				dtors = head;

				return 0;
				}
				#endif // HAVE___CXA_THREAD_ATEXIT_IMPL
				}

				majnemer: Is there a reason to use the weak symbol on non-Android platforms?
				tavianator: Just simplicity. (There are some fringe benefits, like the ability to build against pre-2.18 glibc but have it detect ..._impl() support when run with newer libraries. Or magically supporting musl etc. if they add it, without having to recompile.) Would you rather keep the build-time checks for non-Android platforms?
				majnemer: I think that this should be an opt-in mechanism, there are platforms that presumably never need to pay the cost of the unused code (macOS comes to mind).
				tavianator: This file is only built for UNIX AND NOT (APPLE OR CYGWIN). Other platforms use something other than __cxa_thread_atexit() I assume.
	} // extern "C"			} // extern "C"
	} // namespace __cxxabiv1			} // namespace __cxxabiv1
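
The weak-symbol detection discussed in the comments above can be illustrated in isolation. The following is a minimal sketch, not the patch's code: `hypothetical_optional_impl`, `fallback_impl`, and `dispatch` are made-up names, and it assumes GCC or Clang on an ELF platform such as Linux, where an undefined weak function resolves to a null address.

```cpp
#include <cassert>

// Hypothetical optional entry point; nothing in this example defines it.
// The weak attribute lets the program link even when the symbol is absent.
extern "C" int hypothetical_optional_impl(int value) __attribute__((weak));

// Stand-in for the portable fallback path (illustrative only).
int fallback_impl(int value) { return -value; }

// Prefer the optional implementation when the linker resolved it;
// an unresolved weak function compares equal to null.
int dispatch(int value) {
  if (hypothetical_optional_impl) {
    return hypothetical_optional_impl(value);
  }
  return fallback_impl(value);
}
```

With no definition of `hypothetical_optional_impl` linked in, `dispatch` takes the fallback path. If a library providing the symbol is present at load time, the same binary would use it without being rebuilt, which is the fringe benefit mentioned in the discussion.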

test/CMakeLists.txt

	Show All 10 Lines
	endif()			endif()

	pythonize_bool(LIBCXXABI_BUILD_32_BITS)			pythonize_bool(LIBCXXABI_BUILD_32_BITS)
	pythonize_bool(LIBCXX_ENABLE_SHARED)			pythonize_bool(LIBCXX_ENABLE_SHARED)
	pythonize_bool(LIBCXXABI_ENABLE_SHARED)			pythonize_bool(LIBCXXABI_ENABLE_SHARED)
	pythonize_bool(LIBCXXABI_ENABLE_THREADS)			pythonize_bool(LIBCXXABI_ENABLE_THREADS)
	pythonize_bool(LIBCXXABI_ENABLE_EXCEPTIONS)			pythonize_bool(LIBCXXABI_ENABLE_EXCEPTIONS)
	pythonize_bool(LIBCXXABI_USE_LLVM_UNWINDER)			pythonize_bool(LIBCXXABI_USE_LLVM_UNWINDER)
	pythonize_bool(LIBCXXABI_HAS_CXA_THREAD_ATEXIT_IMPL)
	set(LIBCXXABI_TARGET_INFO "libcxx.test.target_info.LocalTI" CACHE STRING			set(LIBCXXABI_TARGET_INFO "libcxx.test.target_info.LocalTI" CACHE STRING
	"TargetInfo to use when setting up test environment.")			"TargetInfo to use when setting up test environment.")
	set(LIBCXXABI_EXECUTOR "None" CACHE STRING			set(LIBCXXABI_EXECUTOR "None" CACHE STRING
	"Executor to use when running tests.")			"Executor to use when running tests.")

	set(AUTO_GEN_COMMENT "## Autogenerated by libcxxabi configuration.\n# Do not edit!")			set(AUTO_GEN_COMMENT "## Autogenerated by libcxxabi configuration.\n# Do not edit!")
	configure_file(			configure_file(
	${CMAKE_CURRENT_SOURCE_DIR}/lit.site.cfg.in			${CMAKE_CURRENT_SOURCE_DIR}/lit.site.cfg.in
	Show All 21 Lines

test/cxa_thread_atexit_test.pass.cpp

	//===--------------------- cxa_thread_atexit_test.cpp ---------------------===//			//===--------------------- cxa_thread_atexit_test.cpp ---------------------===//
	//			//
	// The LLVM Compiler Infrastructure			// The LLVM Compiler Infrastructure
	//			//
	// This file is dual licensed under the MIT and the University of Illinois Open			// This file is dual licensed under the MIT and the University of Illinois Open
	// Source Licenses. See LICENSE.TXT for details.			// Source Licenses. See LICENSE.TXT for details.
	//			//
	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//

	// REQUIRES: linux			// REQUIRES: linux
	// REQUIRES: thread_atexit

	#include <assert.h>			#include <assert.h>
	#include <cxxabi.h>			#include <cxxabi.h>

	static bool AtexitImplCalled = false;			static bool AtexitImplCalled = false;

	extern "C" int __cxa_thread_atexit_impl(void (*dtor)(void *), void *obj,			extern "C" int __cxa_thread_atexit_impl(void (*dtor)(void *), void *obj,
	void *dso_symbol) {			void *dso_symbol) {
	Show All 15 Lines

test/libcxxabi/test/config.py

Show All 31 Lines	class Configuration(LibcxxConfiguration):
def configure_obj_root(self):		def configure_obj_root(self):
self.libcxxabi_obj_root = self.get_lit_conf('libcxxabi_obj_root')		self.libcxxabi_obj_root = self.get_lit_conf('libcxxabi_obj_root')
super(Configuration, self).configure_obj_root()		super(Configuration, self).configure_obj_root()

def configure_features(self):		def configure_features(self):
super(Configuration, self).configure_features()		super(Configuration, self).configure_features()
if not self.get_lit_bool('enable_exceptions', True):		if not self.get_lit_bool('enable_exceptions', True):
self.config.available_features.add('libcxxabi-no-exceptions')		self.config.available_features.add('libcxxabi-no-exceptions')
if self.get_lit_bool('thread_atexit', True):
self.config.available_features.add('thread_atexit')

def configure_compile_flags(self):		def configure_compile_flags(self):
self.cxx.compile_flags += ['-DLIBCXXABI_NO_TIMER']		self.cxx.compile_flags += ['-DLIBCXXABI_NO_TIMER']
if self.get_lit_bool('enable_exceptions', True):		if self.get_lit_bool('enable_exceptions', True):
self.cxx.compile_flags += ['-funwind-tables']		self.cxx.compile_flags += ['-funwind-tables']
else:		else:
self.cxx.compile_flags += ['-fno-exceptions', '-DLIBCXXABI_HAS_NO_EXCEPTIONS']		self.cxx.compile_flags += ['-fno-exceptions', '-DLIBCXXABI_HAS_NO_EXCEPTIONS']
if not self.get_lit_bool('enable_threads', True):		if not self.get_lit_bool('enable_threads', True):
Show All 26 Lines

test/lit.site.cfg.in

	@AUTO_GEN_COMMENT@			@AUTO_GEN_COMMENT@
	config.cxx_under_test = "@LIBCXXABI_COMPILER@"			config.cxx_under_test = "@LIBCXXABI_COMPILER@"
	config.project_obj_root = "@CMAKE_BINARY_DIR@"			config.project_obj_root = "@CMAKE_BINARY_DIR@"
	config.libcxxabi_src_root = "@LIBCXXABI_SOURCE_DIR@"			config.libcxxabi_src_root = "@LIBCXXABI_SOURCE_DIR@"
	config.libcxxabi_obj_root = "@LIBCXXABI_BINARY_DIR@"			config.libcxxabi_obj_root = "@LIBCXXABI_BINARY_DIR@"
	config.abi_library_path = "@LIBCXXABI_LIBRARY_DIR@"			config.abi_library_path = "@LIBCXXABI_LIBRARY_DIR@"
	config.libcxx_src_root = "@LIBCXXABI_LIBCXX_PATH@"			config.libcxx_src_root = "@LIBCXXABI_LIBCXX_PATH@"
	config.cxx_headers = "@LIBCXXABI_LIBCXX_INCLUDES@"			config.cxx_headers = "@LIBCXXABI_LIBCXX_INCLUDES@"
	config.cxx_library_root = "@LIBCXXABI_LIBCXX_LIBRARY_PATH@"			config.cxx_library_root = "@LIBCXXABI_LIBCXX_LIBRARY_PATH@"
	config.llvm_unwinder = "@LIBCXXABI_USE_LLVM_UNWINDER@"			config.llvm_unwinder = "@LIBCXXABI_USE_LLVM_UNWINDER@"
	config.enable_threads = "@LIBCXXABI_ENABLE_THREADS@"			config.enable_threads = "@LIBCXXABI_ENABLE_THREADS@"
	config.use_sanitizer = "@LLVM_USE_SANITIZER@"			config.use_sanitizer = "@LLVM_USE_SANITIZER@"
	config.enable_32bit = "@LIBCXXABI_BUILD_32_BITS@"			config.enable_32bit = "@LIBCXXABI_BUILD_32_BITS@"
	config.target_info = "@LIBCXXABI_TARGET_INFO@"			config.target_info = "@LIBCXXABI_TARGET_INFO@"
	config.executor = "@LIBCXXABI_EXECUTOR@"			config.executor = "@LIBCXXABI_EXECUTOR@"
	config.thread_atexit = "@LIBCXXABI_HAS_CXA_THREAD_ATEXIT_IMPL@"
	config.libcxxabi_shared = "@LIBCXXABI_ENABLE_SHARED@"			config.libcxxabi_shared = "@LIBCXXABI_ENABLE_SHARED@"
	config.enable_shared = "@LIBCXX_ENABLE_SHARED@"			config.enable_shared = "@LIBCXX_ENABLE_SHARED@"
	config.enable_exceptions = "@LIBCXXABI_ENABLE_EXCEPTIONS@"			config.enable_exceptions = "@LIBCXXABI_ENABLE_EXCEPTIONS@"

	# Let the main config do the real work.			# Let the main config do the real work.
	lit_config.load_config(config, "@LIBCXXABI_SOURCE_DIR@/test/lit.cfg")			lit_config.load_config(config, "@LIBCXXABI_SOURCE_DIR@/test/lit.cfg")

test/thread_local_destruction_order.pass.cpp

				//===-------------- thread_local_destruction_order.pass.cpp ---------------===//
				//
				bcraig: Nit: file name is wrong here.
				// The LLVM Compiler Infrastructure
				//
				// This file is dual licensed under the MIT and the University of Illinois Open
				// Source Licenses. See LICENSE.TXT for details.
				//
				//===----------------------------------------------------------------------===//

				// UNSUPPORTED: c++98, c++03

				#include <cassert>
				#include <thread>

				int seq = 0;

				class OrderChecker {
				public:
				explicit OrderChecker(int n) : n_{n} { }

				~OrderChecker() {
				assert(seq++ == n_);
				}

				private:
				int n_;
				};

				class CreatesThreadLocalInDestructor {
				public:
				CreatesThreadLocalInDestructor(int n) : n_{n} { }

				~CreatesThreadLocalInDestructor() {
				thread_local OrderChecker checker{n_};
				}

				private:
				int n_;
				};

				OrderChecker global{6};

				void thread_fn() {
				static OrderChecker fn_static{4};
				thread_local OrderChecker fn_thread_local{1};
				thread_local CreatesThreadLocalInDestructor creates_tl{0};
				}

				int main() {
				bcraig: Can we have a CreatesThreadLocalInDestructor in the thread_fn as well? That way we can test both the main function and a pthread. If I understand your code and comments correctly, those go through different code paths.
				tavianator: Yep, meant to do that actually!
				static OrderChecker fn_static{5};

				std::thread{thread_fn}.join();
				assert(seq == 2);

				thread_local OrderChecker fn_thread_local{3};
				bcraig: In the places where you can, validate that dtors actually are getting called. This may be your only place where you can do that. So something like 'assert(seq == 1)' here.
				tavianator: Sounds good. What I wanted to do was print some output in the destructors, and check for a certain expected output. But I couldn't figure out how to do that with lit.
				bcraig: Normally, you would do that by piping the output to the llvm FileCheck utility. My unconfirmed suspicion is that that approach will run into difficulties because of libcxx and libcxxabi specific setups. I think just having the global object's dtor check is good enough for the final post condition though. I'm not terribly worried about global dtors malfunctioning.
				thread_local CreatesThreadLocalInDestructor creates_tl{2};

				return 0;
				}
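
The case exercised by `CreatesThreadLocalInDestructor` is a destructor that registers a new entry while the list is being drained. The sketch below is a simplified, single-threaded model of that loop, written only to illustrate the ordering argument from the review. It uses an ordinary global in place of the `__thread` list head, and the names `pending`, `register_dtor`, `drain`, and `run_model` are illustrative and do not appear in the patch.

```cpp
#include <cassert>
#include <cstdlib>
#include <vector>

using Dtor = void (*)(void*);

struct Node {
  Dtor dtor;
  void* obj;
  Node* next;
};

// Single-threaded stand-in for the per-thread list head.
Node* pending = nullptr;
std::vector<int> order;  // records the order in which callbacks ran

// Push a callback onto the front of the list, so callbacks run in LIFO order.
int register_dtor(Dtor dtor, void* obj) {
  auto node = static_cast<Node*>(std::malloc(sizeof(Node)));
  if (!node) return -1;
  node->dtor = dtor;
  node->obj = obj;
  node->next = pending;
  pending = node;
  return 0;
}

// Unlink the head before invoking it, so that a callback which registers a
// new entry leaves that entry at the head for the next iteration.
void drain() {
  while (Node* head = pending) {
    pending = head->next;
    head->dtor(head->obj);
    std::free(head);
  }
}

int ids[] = {0, 1, 2};

void record(void* obj) { order.push_back(*static_cast<int*>(obj)); }

// Models a destructor that initializes a new thread_local while running.
void record_and_register(void* obj) {
  record(obj);
  register_dtor(record, &ids[2]);
}

// Registers entry 0, then entry 1 (which registers entry 2 when it runs),
// drains the list, and returns the observed callback order.
std::vector<int> run_model() {
  order.clear();
  register_dtor(record, &ids[0]);
  register_dtor(record_and_register, &ids[1]);
  drain();
  return order;
}
```

In this model the late registration made by entry 1 runs immediately after it and before the older entry 0, and the list is empty once `drain()` returns.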

Revision Contents

Diff 70406

src/cxa_thread_atexit.cpp

test/CMakeLists.txt

test/cxa_thread_atexit_test.pass.cpp

test/libcxxabi/test/config.py

test/lit.site.cfg.in

test/thread_local_destruction_order.pass.cpp
