
Details

Reviewers

ldionne
Mordante
Quuxplusone
EricWF

Group Reviewers

Restricted Project

Summary

This is the 2nd of 4 changes to add support for POSIX(OFF) on z/OS.
See D117366 for more background, and D110349 for discussion of an
alternative implementation.

Depends on D117366

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

DanielMcIntosh-IBM requested review of this revision. (Jan 14 2022, 5:01 PM)

DanielMcIntosh-IBM created this revision.

Herald added a project: Restricted Project. (Jan 14 2022, 5:01 PM)

Herald added a reviewer: Restricted Project.

Herald added a subscriber: libcxx-commits.

DanielMcIntosh-IBM added a child revision: D117375: [libcxx][SystemZ][z/OS] added internal_threading_support.h. (Jan 14 2022, 5:11 PM)

Harbormaster completed remote builds in B143534: Diff 400199. (Jan 14 2022, 5:15 PM)

DanielMcIntosh-IBM edited the summary of this revision. (Jan 14 2022, 5:17 PM)

Update Documentation

Harbormaster completed remote builds in B143541: Diff 400210. (Jan 14 2022, 5:29 PM)

DanielMcIntosh-IBM added inline comments. (Jan 14 2022, 6:08 PM)

libcxx/include/__locale
206–208	Kind of ugly that we still need to `#include "mutex"` just for this. Maybe I should change this to

    #if defined(_LIBCPP_ABI_MICROSOFT)
        uintptr_t __reserved;
    #else
        unsigned long __reserved;
    #endif

and add a `static_assert(sizeof(locale::id::__reserved) == sizeof(once_flag))` somewhere. Not clear where I'd add it though.
215–216	I think technically this removes a symbol from the ABI, but it's one that really shouldn't ever get used by anybody. Does this mean I need to update something in `libcxx/lib`?
libcxx/src/locale.cpp
716–732	I'm not 100% certain the atomic load/store is necessary or that I got the memory ordering right. Looking at `std::call_once` as a reference, perhaps this should be

    if (__libcpp_atomic_load(&__id_, _AO_Acquire) != 0)

and on line 726

    __libcpp_atomic_store(&__id_, ++__next_id, _AO_Release);

However, I think the reason `std::call_once` uses acquire/release memory ordering is to make sure "all concurrent calls to call_once are guaranteed to observe any side-effects made by the active call, with no additional synchronization", which doesn't really apply here.
732	I could have also addressed this by pulling `__call_once` out of mutex.cpp into a separate file and having that use `internal_threading_support.h`. That would have made `std::call_once` usable when the base threading support library (e.g. pthread) is unavailable at runtime. However, since `std::call_once` is part of the C++11 Thread Support Library, I figured it's better to change things here so we're consistent about what we do and don't support when the thread support library is unavailable (and we don't needlessly affect the performance of `std::call_once`). Assuming this and D117373 are the only reasons we support including `<mutex>` when `_LIBCPP_HAS_NO_THREADS` is defined (instead of giving an `#error` like we do with almost every other header that is part of the C++11 Thread Support Library), and both of these changes are accepted, it may make sense to stop supporting `<mutex>` with `_LIBCPP_HAS_NO_THREADS`.
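
The reserved-space idea from the `__locale` 206–208 comment above could be checked at compile time roughly as follows. This is only a sketch: `old_id_layout` and `new_id_layout` are hypothetical stand-ins rather than the real `locale::id`, and an aligned byte array is swapped in for the `uintptr_t`/`unsigned long` member suggested in the comment so that the check does not depend on a particular platform's `once_flag` representation.

```cpp
#include <cstdint>
#include <mutex>

// Hypothetical stand-in for the layout before the change:
// a once_flag followed by the 32-bit id.
struct old_id_layout {
    std::once_flag flag;
    std::int32_t id;
};

// Hypothetical stand-in for the layout after the change: raw storage with
// the size and alignment of a once_flag, followed by the same 32-bit id.
struct new_id_layout {
    alignas(std::once_flag) unsigned char reserved[sizeof(std::once_flag)];
    std::int32_t id;
};

// True when the two stand-ins occupy the same space with the same alignment.
constexpr bool layouts_match() {
    return sizeof(new_id_layout) == sizeof(old_id_layout) &&
           alignof(new_id_layout) == alignof(old_id_layout);
}

static_assert(layouts_match(),
              "reserved storage must keep the old size and alignment");
```

A check like this still needs `<mutex>` for `sizeof(std::once_flag)`, so one option (an assumption, not something settled in this review) would be to place it in a source file that already includes `<mutex>` rather than in the header.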

Fix copy-paste error

Harbormaster completed remote builds in B143549: Diff 400221. (Jan 14 2022, 6:35 PM)

Update abi lists

Harbormaster completed remote builds in B143555: Diff 400229. (Jan 14 2022, 10:39 PM)

No change. Rebase onto D117366 and re-trigger CI

Harbormaster completed remote builds in B144330: Diff 401281. (Jan 19 2022, 10:43 AM)

DanielMcIntosh-IBM mentioned this in D120348: [libcxx][SystemZ][ POSIX(OFF) support on z/OS. (Mar 1 2022, 8:19 AM)

Do we have tests that fire up a bunch of threads and construct various locale::ids concurrently? If not we should add some. That will allow TSAN to diagnose bugs.

Overall, I think the current approach is still the best approach. call_once is always available in libc++, so you can use it unconditionally.

libcxx/src/locale.cpp
717	Where does `__id_` get initialized?
718	Before this change, there was no synchronization between different instances of `locale::id` objects because each object had its own `__flag`. There was an atomic increment, but that was non-blocking. Now, every construction of a new `locale::id` potentially blocks every construction of `locale::id`. This could significantly affect the behavior of existing multithreaded programs that use a lot of locales and a lot of threads. Also, couldn't this just be written

    static int& __id_ref = (__id_ = __cxx_atomic_add(__next_id, 1));
    return __id;

Which is just a fancier way of writing the current `call_once` version. (Which is available in all dialects and with/without threads).

This revision now requires changes to proceed. (Mar 10 2022, 3:34 PM)

Herald added a project: Restricted Project. (Mar 10 2022, 3:34 PM)

EricWF added inline comments. (Mar 10 2022, 3:36 PM)

libcxx/lib/abi/arm64-apple-darwin.libcxxabi.v1.stable.exceptions.no_new_in_libcxx.abilist
1707 ↗	(On Diff #401281)	The fact that this function went away means the ABI is potentially broken.
libcxx/src/locale.cpp
717	Nevermind, I see it now.

In D117372#3374008, @EricWF wrote:

Do we have tests that fire up a bunch of threads and construct various locale::ids concurrently? If not we should add some. That will allow TSAN to diagnose bugs.

None that I can find. Should that be part of a separate patch?
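
For reference, a stress test of the kind asked about might look roughly like the sketch below. It deliberately sticks to the public facet API (`std::locale::id`, `std::has_facet`, `std::use_facet`) instead of calling `locale::id::__get` directly; `tagged_facet`, `install_and_check` and `concurrent_ids_are_consistent` are hypothetical names, and the thread count is arbitrary. Run under TSAN, concurrent first use of each facet's `id` is what would exercise the id-assignment path.

```cpp
#include <atomic>
#include <cassert>
#include <locale>
#include <thread>
#include <vector>

// Hypothetical facet family: each Tag gets its own static locale::id.
template <int Tag>
struct tagged_facet : std::locale::facet {
    static std::locale::id id;
    int tag() const { return Tag; }
};
template <int Tag>
std::locale::id tagged_facet<Tag>::id;

// Installs a facet into a fresh locale and looks it up again.
// Installing and looking up a facet both need the facet's id.
template <int Tag>
bool install_and_check() {
    std::locale loc(std::locale::classic(), new tagged_facet<Tag>);
    return std::has_facet<tagged_facet<Tag>>(loc) &&
           std::use_facet<tagged_facet<Tag>>(loc).tag() == Tag;
}

// Races several threads on the first use of three distinct ids.
bool concurrent_ids_are_consistent(int threads_per_facet) {
    assert(threads_per_facet > 0);
    std::atomic<bool> ok{true};
    std::vector<std::thread> workers;
    for (int i = 0; i < threads_per_facet; ++i) {
        workers.emplace_back([&ok] { if (!install_and_check<0>()) ok = false; });
        workers.emplace_back([&ok] { if (!install_and_check<1>()) ok = false; });
        workers.emplace_back([&ok] { if (!install_and_check<2>()) ok = false; });
    }
    for (std::thread& w : workers) w.join();

    // Distinct facet types must have received distinct ids: a locale that
    // only holds tagged_facet<0> must not claim to hold tagged_facet<1>.
    std::locale only_zero(std::locale::classic(), new tagged_facet<0>);
    if (std::has_facet<tagged_facet<1>>(only_zero)) ok = false;
    return ok;
}
```

A test driver would call something like `concurrent_ids_are_consistent(8)` and fail if it returns `false`; whether this belongs in this patch or a separate one is left open above.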

Overall, I think the current approach is still the best approach. call_once is always available in libc++, so you can use it unconditionally.

Update: it appears that in the C++ standard, [atomics] has been moved to [thread.atomics] and what was previously known as the "Thread Support Library" is now the "Concurrency support library" (see https://github.com/cplusplus/draft/commit/d74c2170a9f4c928519461d7742293af2d141852). This will change how we have to word things in documentation & comments, but that's about it. I still believe libc++ shouldn't use anything other than atomics from the Concurrency Support Library, especially when _LIBCPP_HAS_NO_THREADS is defined. Support for atomics was separated from support for threads by D114109, so this re-organization of the standard doesn't really change anything.

I would argue that call_once shouldn't always be available in libc++. Specifically, it shouldn't be available when _LIBCPP_HAS_NO_THREADS is defined. Nearly every header that is part of the C++11 Thread Support Library (i.e. <thread>, <mutex>, <shared_mutex>, <condition_variable>, <semaphore>, <latch> and <future>) has the following right near the top:

#ifdef _LIBCPP_HAS_NO_THREADS
# error <latch> is not supported on this single threaded system
#endif

The only exceptions are <mutex> and <condition_variable>. However, all of <condition_variable> is wrapped in a #ifndef _LIBCPP_HAS_NO_THREADS, so including it does nothing when _LIBCPP_HAS_NO_THREADS is defined (aside from the #include of <__config>, <__mutex_base>, <memory> and <version> at the very start, but <__mutex_base> is similarly wrapped in a #ifndef _LIBCPP_HAS_NO_THREADS). <mutex> does the exact same thing except it's careful to put all the definitions and forward declarations needed for std::call_once outside the #if directive, just so that we can use std::call_once here in libcxx/src/locale.cpp.

This seems like a very poor design to me, and if I had to guess the only reason it's like this is probably because _LIBCPP_HAS_NO_THREADS didn't exist when locale::id::__get was written, and we didn't bother to change the design when we added it. I would actually argue that we should have forbidden the use of <mutex> and <condition_variable> with _LIBCPP_HAS_NO_THREADS, because there's nothing in there that should realistically be used by a single-threaded application. As a user, if you're on a platform without threads, using std::call_once is frankly ridiculous - replacing the std::once_flag with a boolean, and using a simple if statement to check the boolean makes WAYYYYY more sense. However, at this point, it's maybe a little late to be doing that, since we've already opened the door to such bizarre uses of libc++. We could maybe still add something like this to <mutex> and <condition_variable> (though #warning is non-standard):

#ifdef _LIBCPP_HAS_NO_THREADS
# warning <mutex> included on a single threaded system
#endif

Any changes to <mutex> and <condition_variable> aside, when we're on a single-threaded system, libc++ itself certainly (in my mind) shouldn't be relying on anything from <mutex> or <condition_variable> (including std::call_once).
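
The single-threaded alternative mentioned above (a boolean and a plain `if` in place of `std::once_flag` and `std::call_once`) can be sketched like this. `simple_once` and `count_initializations` are hypothetical names used only for illustration, and the sketch assumes there is exactly one thread.

```cpp
#include <cassert>
#include <utility>

// Hypothetical single-threaded replacement for once_flag + call_once.
// Not safe if more than one thread can reach run().
struct simple_once {
    bool done = false;

    template <class F>
    void run(F&& f) {
        if (!done) {
            std::forward<F>(f)();  // if this throws, done stays false,
            done = true;           // so a later call retries (like call_once)
        }
    }
};

// Calls run() `calls` times and reports how often the callable executed.
int count_initializations(int calls) {
    assert(calls >= 0);
    simple_once once;
    int executed = 0;
    for (int i = 0; i < calls; ++i)
        once.run([&executed] { ++executed; });
    return executed;
}
```

With this helper, `count_initializations(5)` runs the callable once and `count_initializations(0)` never runs it.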

libcxx/lib/abi/arm64-apple-darwin.libcxxabi.v1.stable.exceptions.no_new_in_libcxx.abilist
1707 ↗	(On Diff #401281)	I was under the impression this wouldn't be the first time we've removed a function from the ABI? E.g. in libcxx/lib/abi/CHANGELOG.TXT there's this at line 816:

    This change also marks __start_std_streams as hidden -- this variable is only required to initialize the streams, and nobody should depend on it from outside the dylib.

This function is in a similar situation: it should only be getting called from `libcxx/src/locale.cpp`, which is a compiled libc++ source file and thus part of the dylib.
libcxx/src/locale.cpp
718	There was still synchronization, it was just buried in `std::call_once` (specifically inside `std::__call_once` at libcxx/src/mutex.cpp:231). If anything, this version is less restrictive since the memory ordering here is `_AO_Relaxed` instead of `_AO_Acquire` like in `std::call_once`. This could improve performance and reduce the amount of time spent blocked/waiting since it allows more re-ordering of instructions than before. However, as I mentioned in a comment a little further down I'm not 100% sure I actually got this memory ordering right, since I'm not very familiar with them.

As for your proposed alternative, I'm not super familiar with how static local references work, so I might be missing something, but won't that result in only 1 call to `__cxx_atomic_add` and thus only one increment operation on `__next_id`? The current implementation gets around this by using a different `__flag_` per instance of `locale::id` (recall that `__flag_` is a non-static member of `locale::id`). I can't see any way of replicating this behaviour using just static locals.
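
The concern about static locals can be illustrated with a small single-threaded sketch. The types `static_local_id` and `per_object_id` are hypothetical stand-ins, plain increments replace the atomic add, and the point is only that a function-local static is initialized once per function, not once per object.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in using a function-local static, as in the proposal.
struct static_local_id {
    std::int32_t id_ = 0;
    static std::int32_t next_id;

    long get() {
        // The initializer runs only the first time ANY object reaches this
        // line; later calls on other objects skip it entirely.
        static std::int32_t& id_ref = (id_ = ++next_id);
        (void)id_ref;
        return id_ - 1;
    }
};
std::int32_t static_local_id::next_id = 0;

// Hypothetical stand-in with per-object state, mirroring the role that the
// non-static __flag_ member plays in the call_once version.
struct per_object_id {
    std::int32_t id_ = 0;
    static std::int32_t next_id;

    long get() {
        if (id_ == 0)
            id_ = ++next_id;
        assert(id_ > 0);
        return id_ - 1;
    }
};
std::int32_t per_object_id::next_id = 0;
```

If `a` and `b` are two fresh `static_local_id` objects and `a.get()` is the first call in the program, `a` receives id 0 but `b.get()` returns -1 because `b.id_` is never assigned. Two fresh `per_object_id` objects receive 0 and 1.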

Follow up on @EricWF's comments based on a video chat with him:
Since id::__get is apparently hit quite often, the extra performance we might get here because of the relaxed memory ordering instead of acquire/release ordering could potentially be significant. Not to mention that in the happy path we replace 1 atomic load (from id::__flag_.__state_) and 1 non-atomic load (from id::__id_) with a single atomic load (from id::__id_). I do have to run some benchmarks to double check, but I would be shocked if this somehow hurt performance.
Comparing the old sequence to the new sequence side by side makes this pretty obvious I think.

The common/happy path:

old version using std::call_once                                    |   new version using __threading_support directly

call into std::call_once                                            <=  -
atomic_load(id::__flag_.__state_, _AO_Acquire)                      <=  atomic_load(id::__id_, _AO_Relaxed)
-                                                                   ..  assign load result to local_variable
compare load result NEQ ~0                                          ==  compare local_variable NEQ 0
//(False)
return from std::call_once                                          <=  -
return id::__id_ - 1                                                <   return local_variable - 1

And the uncommon path is WAY simpler after the change:

old version using std::call_once                                    |   new version using __threading_support directly

call into std::call_once                                            <=  -
atomic_load(id::__flag_.__state_, _AO_Acquire)                      <=  atomic_load(id::__id_, _AO_Relaxed)
-                                                                   ..  assign load result to local_variable
compare load result NEQ ~0                                          ==  compare local_variable NEQ 0
{
    junk wrapper code                                               <=  -
    call into __call_once                                           <=  -
//(NO_THREADS):
    compare (volatile) id::__flag_.__state_ EQ 0                    <=  compare local_variable EQ 0
    {
        start try block                                             <=  -
        assign 1 to (volatile) id::__flag_.__state_                 <   -
        junk wrapper code                                           <=  -
        atomic_add(&__next_id, 1, _AO_Seq);                         <=  increment __next_id
        -                                                           ..  assign __next_id to local_variable
        assign __next_id to id::__id_                               ??  atomic_store(id::__id_, local_variable, _AO_Relaxed)
        assign ~0 to (volatile) id::__flag_.__state_                <   -
        catch block/end of try-block                                <=  -
    }
//(THREADS):
    lock mutex common to all std::call_once calls                   <   lock mutex common to all id::__get calls (a subset of the previous calls to std::call_once)
    -                                                               ..  assign id::__id_ to local_variable
    compare (volatile) id::__flag_.__state_ EQ 1                    <   -
        condvar_wait                                                <   -
    compare (volatile) id::__flag_.__state_ EQ 0                    <=  compare local_variable EQ 0
    {
        start try block                                             <=  -
        atomic_store(id::__flag_.__state_, 1, _AO_Relaxed)          <   -
        unlock mutex                                                <   -
        junk wrapper code                                           <=  -
        atomic_add(&__next_id, 1, _AO_Seq);                         <=  increment __next_id
        -                                                           ..  assign __next_id to local_variable
        assign __next_id to id::__id_                               <   -
        re-lock mutex                                               <   -
        atomic_store(id::__flag_.__state_, ~0, _AO_Release)         <=  atomic_store(id::__id_, local_variable, _AO_Relaxed)
        unlock mutex                                                ==  unlock mutex
        condvar_broadcast                                           <   -
        catch block/end of try-block                                <=  -
    }
    else
        unlock mutex                                                ==  unlock mutex
}
return from std::call_once                                          <=  -
return id::__id_ - 1                                                <   return local_variable - 1

Once this becomes more of a priority for us, I'll run some actual benchmarks to make sure before I consider landing this.
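
The "new version" column of the comparison above corresponds roughly to the portable sketch below. It is not the patch itself: `sketch_id` is a hypothetical stand-in for `locale::id`, and `std::atomic` and `std::mutex` are swapped in for `__libcpp_atomic_load`, `__libcpp_relaxed_store` and `__libcpp_mutex_t` purely so the example compiles outside libc++.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <mutex>

// Hypothetical stand-in for locale::id with the mutex-based __get logic.
class sketch_id {
    std::atomic<std::int32_t> id_{0};  // 0 means "not assigned yet"
    static std::int32_t next_id;       // only touched while id_mutex is held
    static std::mutex id_mutex;        // shared by all sketch_id objects

public:
    long get();
};

std::int32_t sketch_id::next_id = 0;
std::mutex sketch_id::id_mutex;

long sketch_id::get() {
    // Happy path: one relaxed load, no lock.
    std::int32_t id_copy = id_.load(std::memory_order_relaxed);
    if (id_copy != 0)
        return id_copy - 1;

    // Slow path: re-check under the lock so that only one thread assigns.
    std::lock_guard<std::mutex> lock(id_mutex);
    id_copy = id_.load(std::memory_order_relaxed);
    if (id_copy == 0) {
        id_copy = ++next_id;
        id_.store(id_copy, std::memory_order_relaxed);
    }
    assert(id_copy > 0);
    return id_copy - 1;
}
```

Relaxed ordering looks sufficient here under the assumption that callers only need the id value itself and no other data published by the assigning thread. The mutex orders the slow path, and the fast path reads either 0 or the final value. Whether that assumption holds for every caller of `id::__get` is the open question raised in the inline comments.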

Diff 400199

libcxx/include/__locale

[197 lines not shown]
     // facet(const facet&) = delete; // effectively done in __shared_count
     // void operator=(const facet&) = delete;
 private:
     virtual void __on_zero_shared() _NOEXCEPT;
 };

 class _LIBCPP_TYPE_VIS locale::id
 {
-    once_flag __flag_;
+    // Reserve space for a once_flag to preserve ABI compatibility
+    once_flag __reserved;
     int32_t __id_;

     static int32_t __next_id;
 public:
     _LIBCPP_INLINE_VISIBILITY _LIBCPP_CONSTEXPR id() :__id_(0) {}
     void operator=(const id&) = delete;
     id(const id&) = delete;

-private:
-    void __init();
 public: // only needed for tests
     long __get();

     friend class locale;
     friend class locale::__imp;
 };

 template <class _Facet>
[1,582 lines not shown]

libcxx/src/locale.cpp

 //===----------------------------------------------------------------------===//
 //
 // Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
 // See https://llvm.org/LICENSE.txt for license information.
 // SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
 //
 //===----------------------------------------------------------------------===//

 // On Solaris, we need to define something to make the C99 parts of localeconv
 // visible.
 #ifdef __sun__
 #define _LCONV_C99
 #endif

+#include "__threading_support"
 #include "algorithm"
 #include "clocale"
 #include "codecvt"
 #include "cstdio"
 #include "cstdlib"
 #include "cstring"
 #include "locale"
 #include "string"
[679 lines not shown]
 {
     delete this;
 }

 // locale::id

 int32_t locale::id::__next_id = 0;

-namespace
-{
-
-class __fake_bind
-{
-    locale::id* id_;
-    void (locale::id::* pmf_)();
-public:
-    __fake_bind(void (locale::id::* pmf)(), locale::id* id)
-        : id_(id), pmf_(pmf) {}
-
-    void operator()() const
-    {
-        (id_->*pmf_)();
-    }
-};
-
-}
-
-long
-locale::id::__get()
-{
-    call_once(__flag_, __fake_bind(&locale::id::__init, this));
-    return __id_ - 1;
-}
-
-void
-locale::id::__init()
-{
-    __id_ = __libcpp_atomic_add(&__next_id, 1);
-}
+#ifndef _LIBCPP_HAS_NO_THREADS
+_LIBCPP_SAFE_STATIC static __libcpp_mutex_t __id_mut = _LIBCPP_MUTEX_INITIALIZER;
+#endif
+
+long locale::id::__get() {
+    // Before we do anything as expensive as acquire a mutex, check if __id_ has already been set
+    auto id_copy = __libcpp_atomic_load(&__id_, _AO_Relaxed);
+    if (id_copy != 0)
+        return id_copy - 1;
+
+#ifndef _LIBCPP_HAS_NO_THREADS
+    __libcpp_mutex_lock(&__m_);
+#endif
+    id_copy = __id_;
+    if (id_copy == 0) {
+        id_copy = ++__next_id;
+        __libcpp_relaxed_store(&__id_, id_copy);
+    }
+
+#ifndef _LIBCPP_HAS_NO_THREADS
+    __libcpp_mutex_unlock(&__m_);
+#endif
+    return id_copy - 1;
+}

 // template <> class collate_byname<char>

 collate_byname<char>::collate_byname(const char* n, size_t refs)
     : collate<char>(refs),
       __l(newlocale(LC_ALL_MASK, n, 0))
 {
[5,854 lines not shown]

This is an archive of the discontinued LLVM Phabricator instance.

[libcxx] switch locale from using std::call_once to __libcpp_mutex_t
Needs RevisionPublic
