This is an archive of the discontinued LLVM Phabricator instance.

[libcxx][SystemZ][z/OS] Added is_threading_api_enabled and might_have_multiple_threads to __threading_support
AbandonedPublic

Authored by DanielMcIntosh-IBM on Sep 23 2021, 10:21 AM.

Details

Reviewers
ldionne
EricWF
rmaprath
compnerd
jroelofs
Group Reviewers
Restricted Project
Summary

On z/OS, the availability of several POSIX functions depends on the
environment at program start. As a result, we don't know at compile
time if mutexes and multithreading are supported.

The z/OS Language Environment provides a flag CEEEDB_POSIX to check
whether POSIX functions (i.e. the threading api) is enabled. It also
provides a flag CEEEDBMULTITHREAD which is set to true when more than
1 thread is running, which we can use as an alternative in some cases
to get a little extra performance out of single-threaded applications.

Since there are many places we are going to need to check one of these
flags, this PR adds functions for doing that to __threading_support.

Diff Detail

Event Timeline

DanielMcIntosh-IBM requested review of this revision.Sep 23 2021, 10:21 AM
Herald added projects: Restricted Project, Restricted Project. · View Herald TranscriptSep 23 2021, 10:21 AM
Herald added a reviewer: Restricted Project. · View Herald Transcript

No changes, just rebasing to re-trigger build and fix pre-merge checks

LGTM

libcxx/include/__support/ibm/ceeedb.h
20

Providing a link here would be helpful, if you have one.

Quuxplusone added inline comments.
libcxx/include/__threading_support
254–259

Peanut gallery says: It seems like it would be strictly easier to reason about this code if the user-facing API were called bool __libcpp_might_have_multiple_threads(). That is, IIUC, we don't care so much about what's "enabled" (potentia) so much as what is the case right now dynamically (actus).
If !__libcpp_might_have_multiple_threads(), then we can skip locking mutexes (as long as they'll definitely be unlocked before the user gets a chance to spawn a second thread). I think that's what's going on in D113069, for example.
Maybe this potentia/actus distinction is the same distinction as what you're calling are_threads_enabled (potentia) versus has_spawned_other_threads (actus); but if so, I haven't seen how you propose to use has_spawned_other_threads, and (FWLIW) I don't understand why D113069 used are_threads_enabled instead of has_spawned_other_threads.
I say "might have multiple threads" instead of "definitely has multiple threads" because I imagine it's generally not possible for libc++ to tell whether the user might have spawned threads via pthreads or whatever — basically as you say on line 385, but making sure that the name of the function reflects its semantics.

(We had a bad time with __libcpp_is_constant_evaluated, where sometimes you wanted to distinguish between __libcpp_might_be_constant_evaluated and __libcpp_is_definitely_runtime_evaluated, and sometimes you wanted to distinguish between __libcpp_is_definitely_constant_evaluated and __libcpp_might_be_runtime_evaluated; and it was never obvious at the call-site which of those two interpretations __libcpp_is_constant_evaluated was meant to convey.)

Address review comments

libcxx/include/__threading_support
254–259

I have renamed them both to hopefully make things a bit more clear.
are_threads_enabled -> is_threading_API_enabled
has_spawned_other_threads -> might_have_multiple_threads

You're right that the potentia/actus distinction is the difference between is_threading_api_enabled and might_have_multiple_threads. However is_threading_api_enabled also tells us whether it is even safe to try locking a mutex, and unlike might_have_multiple_threads, it will not change over the lifetime of the process.

D113069 is kind of a bad example of when you would use one or the other, because (as per it's description), we could probably have used might_have_multiple_threads. The only reason we didn't was out of an abundance of caution.
For some better examples of when we would want to use might_have_multiple_threads, see D113058, D113066, D112567 and D110351. For an example of when we have to use is_threading_api_enabled instead of might_have_multiple_threads, see D113054

DanielMcIntosh-IBM marked an inline comment as done.Nov 4 2021, 6:51 PM
DanielMcIntosh-IBM marked an inline comment as done.

Note that although I'm picking nits, I'm not an approver for this. I assume you'll want to get ldionne's attention on this entire patch series.

libcxx/include/__support/ibm/ceeedb.h
47

Nits: Surely '\x04' is more confusing than 4?
Should you guard against the user doing #define CEEEDB_POSIX 42 before including this header? Ditto for all non-reserved identifiers used in this file (lines 22–26, 32–36, 40, 45).

libcxx/include/__threading_support
254–259

Awesome. I know I'm bikeshedding here, but... now I understand (per the comment on line 341) that not __libcpp_is_threading_api_enabled() basically means __libcpp_will_never_have_multiple_threads(). ;)
(However, I'm not fully happy with the word "never." We mean "this-specific-process-will-never-spawn-multiple-threads," but I can easily imagine someone misinterpreting it as "libcpp-policy-states-this-platform-will-be-single-threaded-forever." I don't know how to fix that problem. So I would leave your status quo until someone (maybe you) suggests anything strictly better than __libcpp_is_threading_api_enabled.)

IIUC, __libcpp_might_have_multiple_threads is for "do I need a lock+unlock around this global access, given that it's all happening before the next user code runs," and __libcpp_is_threading_api_enabled is for "do I need a lock before this access, given that some user code is going to run under the lock and thus __libcpp_might_have_multiple_threads may become true between now and the line that does the unlock."

343–344

s/posix_on/__posix_on/

Rebase and address more review comments

libcxx/include/__support/ibm/ceeedb.h
47

The CEE prefix is reserved on z/OS for Language Environment external names, so no we don't need to guard against it. I will however add a #error message for if the header is included when not on z/OS

DanielMcIntosh-IBM marked an inline comment as done.Nov 8 2021, 11:39 AM
DanielMcIntosh-IBM retitled this revision from [libcxx][SystemZ][z/OS] Added are_threads_enabled and has_spawned_other_threads to __threading_support to [libcxx][SystemZ][z/OS] Added is_threading_api_enabled and might_have_multiple_threads to __threading_support.Nov 8 2021, 12:35 PM
DanielMcIntosh-IBM edited the summary of this revision. (Show Details)Nov 8 2021, 12:38 PM
ldionne requested changes to this revision.Nov 8 2021, 1:22 PM

I looked at the first 4-5 patches in this series, and I have a pretty big conceptual problem with this and all the little workarounds it allows creating afterwards. We already have a compile-time notion of whether threads are enabled or not and it seems really strange to me to control this sort of aspect of a program at runtime. I know that's the system you're working with and libc++ has no bearing on that, however I am pretty uneasy about opening the door to this kind of runtime customization.

Is there a reason why the underlying C library can't implement e.g. mtx_lock as a no-op if threading is disabled at runtime? The main motivation for this patch appears to be:

to get a little extra performance out of single-threaded applications.

How compelling is that performance gain? How frequent are single-threaded programs? Are there other ways to achieve that performance gain without inserting such "hacks" in the code?

TL;DR: This is crossing a pretty clear cut boundary that we have never crossed before (runtime vs compile-time), and I think it's not a great idea to do that.

This revision now requires changes to proceed.Nov 8 2021, 1:22 PM
DanielMcIntosh-IBM added a comment.EditedNov 8 2021, 2:36 PM

Is there a reason why the underlying C library can't implement e.g. mtx_lock as a no-op if threading is disabled at runtime?

Just to clarify, z/OS doesn't use the C11 threading library for __threading_support, it uses the POSIX threading library so it would be pthread_mutex_lock instead of mtx_lock.
I don't know if we could have the underlying pthread_mutex_lock and friends act as a no-op like you've described (it's managed by a separate team), but I strongly suspect it's not an option, as that could affect the behaviour of existing applications on z/OS.

However, making __libcpp_mutex_lock a no-op when the POSIX functions aren't available (aka POSIX(OFF)) might be an option, and it might work for some of the places we are making changes, but it won't work for all of them.
For example, it won't work for D113054. In POSIX(OFF), not only does pthread_once (and thus __libcpp_execute_once) not work, there's no way of allocating thread-local storage, so we'd have to implement fake versions of __libcpp_execute_once and all the thread local storage functions. That would still be technically feasible, but it would get very ugly very fast, and probably perform exceedingly poorly.

SeanP added a subscriber: SeanP.Nov 9 2021, 9:25 AM

Louis, we don't want to make the entire threading model for z/OS to be dynamic. We would like to say the threading model uses pthread, but we need to enable these guard functions to work in programs that use threading and not. As Daniel pointed out a number of the libc++ threading functions used by these guard functions are not trivial to implement for a non-threading application. I would also worry about other parts of the library that should be using threading behaving correctly. We are open to other options. We could change the conditional compilations so both forms of the guard functions are available and call the right one depending on if threading is enabled or not. Would that be acceptable?

The real problem being addressed is that programs that are run in the non POSIX environment (can be determined by an environment variable) cannot be calling the pthread functions by these guard functions. The performance benefit is side gain.

Is there a reason why the underlying C library can't implement e.g. mtx_lock as a no-op if threading is disabled at runtime?

I don't know if we could have the underlying pthread_mutex_lock and friends act as a no-op like you've described (it's managed by a separate team), but I strongly suspect it's not an option, as that could affect the behaviour of existing applications on z/OS.

I don't understand -- if it's correct to do it for libc++, it should be correct to do it for the C library too, no? If a program is single-threaded, I don't understand why the C library couldn't apply the same optimizations that you're trying to add to libc++, i.e. make mutexes no-ops and so on.

However, making __libcpp_mutex_lock a no-op when the POSIX functions aren't available (aka POSIX(OFF)) might be an option, and it might work for some of the places we are making changes, but it won't work for all of them.
For example, it won't work for D113054. In POSIX(OFF), not only does pthread_once (and thus __libcpp_execute_once) not work, there's no way of allocating thread-local storage, so we'd have to implement fake versions of __libcpp_execute_once and all the thread local storage functions. That would still be technically feasible, but it would get very ugly very fast, and probably perform exceedingly poorly.

There is actually a way to use an external threading API to implement all the libc++ threading primitives, see _LIBCPP_HAS_THREAD_API_EXTERNAL. We could promote it to official support if that were helpful, but I'd still like to understand why an optimization that is valid for libc++ isn't valid for libc, i.e. my question above.

DanielMcIntosh-IBM added a comment.EditedNov 22 2021, 2:51 PM

I don't understand -- if it's correct to do it for libc++, it should be correct to do it for the C library too, no? If a program is single-threaded, I don't understand why the C library couldn't apply the same optimizations that you're trying to add to libc++, i.e. make mutexes no-ops and so on.

As far as I can tell, the only parts of the C library that need to perform any kind of thread synchronization are the ones that the user would use when they themselves are performing thread synchronization (e.g. mtx_lock).
The places we're making changes are places where the user could run into thread synchronization code, even if the didn't call any functions related to threading. Each of the changes is to code that either doesn't have a C equivalent, or the C equivalent doesn't need to worry about thread safety.

  • Exception storage (D113054): C doesn't have exceptions
  • __call_once (D113058): This is needed because locale::id::__get() uses call_once. C makes concurrent use/modification of the current locale undefined behaviour.
  • LIBCPP debug (D113065): No C equivalent
  • Shared ptr std::atomic... overloads (D113066): C doesn't have shared pointers
  • random_shuffle (D113069): C doesn't provide any functions/algorithms as complex as random_shuffle
  • fallback malloc (D112567): This is used for exception handling so they still function in the out-of-memory case. C doesn't have exceptions.
  • static local variables (D110351): In C, static local variables are initialized before main(), so a) we don't need to keep track of whether the initialization has been performed yet and b) the initialization all happens in a single-threaded environment anyways

The goal isn't performance optimization, but to address the places in libc++ where threading code has leaked into code unrelated to the threading library. The issue is that z/OS users still need to be able to use those parts of the library when posix is disabled, but we can't completely strip out the synchronization code at compile time like _LIBCPP_HAS_NO_THREADS does because that synchronization code is needed when for when posix ISN'T disabled. Thus, we need to gate the synchronization code behind a run-time check of the availability the threading api (which is what __libcpp_is_threading_api_enabled is for).

__libcpp_is_threading_api_enabled is sufficient on its own to fix this problem, and we could stop there.
However, in most cases, if we have to gate the synchronization code behind a runtime check anyways, we might as well go one step further and use __libcpp_might_have_multiple_threads instead. The notable exception where I couldn't use __libcpp_might_have_multiple_threads is D113054.

On a related note, AIX will have the same issue with the availability of the threading library being unknown at compile time, and unlike z/OS, they do not have any equivalent of __libcpp_is_threading_api_enabled, so we've been talking internally about how to address the places where I used __libcpp_is_threading_api_enabled. This may result in updates that make it even less obvious that these changes are not about optimizations to libc++.

There is actually a way to use an external threading API to implement all the libc++ threading primitives, see _LIBCPP_HAS_THREAD_API_EXTERNAL. We could promote it to official support if that were helpful, but I'd still like to understand why an optimization that is valid for libc++ isn't valid for libc, i.e. my question above.

_LIBCPP_HAS_THREAD_API_EXTERNAL isn't helpful since we don't want a separate API. We still need to use the pthread version of the __threading_support API. Something similar to _LIBCPP_HAS_THREAD_LIBRARY_EXTERNAL could work. I'm working on an alternative implementation that will use something similar to it right now.

zibi added a subscriber: zibi.Feb 1 2022, 10:24 AM

Louis, I want to pick up the discussion on POSIX(OFF) left few months ago.

The proposal here is our favorite one and if allowed we want to proceed with it.

An alternative solution as Daniel indicated above is available in D117375 with its ancestors.
Even though this is the 4th implementation we still prefer D110349.

Please let us know if you still have a concern and if so how you want us to address it.

zibi added inline comments.Feb 3 2022, 8:53 AM
libcxx/include/__threading_support
254–259

IIUC, __libcpp_might_have_multiple_threads is for "do I need a lock+unlock around this global access, given that it's all happening before the next user code runs," and __libcpp_is_threading_api_enabled is for "do I need a lock before this access, given that some user code is going to run under the lock and thus __libcpp_might_have_multiple_threads may become true between now and the line that does the unlock."

The __libcpp_is_threading_api_enabled checks if threading is enabled (POSIX ON) or not (POSIX OFF) as its name clearly indicates that. It cannot be changed throughout the live of the application and it is set at the compile time of the application.
However, the __libcpp_might_have_multiple_threads can be changed in the life of the application. It is needed so the single threading execution can be made optimum.

zibi added a comment.Feb 11 2022, 1:53 PM

Clarifying my previous in-line comment...

libcxx/include/__threading_support
254–259

The __libcpp_is_threading_api_enabled checks if threading is enabled (POSIX ON) or not (POSIX OFF) as its name clearly indicates that. It cannot be changed throughout the live of the application and it is set at the compile time of the application.

Correction:
The __libcpp_is_threading_api_enabled() checks if threading is enabled (POSIX ON) or not (POSIX OFF) as its name clearly indicates. It cannot be changed throughout the live of the application but it can be set at the start of the application via env. var. _CEE_RUNOPTS=POSIX(OFF).

zibi added a comment.Feb 22 2022, 12:38 PM

Please see the new version in D120348.

DanielMcIntosh-IBM marked an inline comment as done.