Page MenuHomePhabricator

[libcxx] Refactor pthread usage - II
ClosedPublic

Authored by rmaprath on Apr 22 2016, 8:25 AM.

Details

Summary

This is mostly D11781 with the final review comments addressed:

  • Merged all the headers into a single __os_support header
  • Moved all internal functions (those only used by the library sources) away from the headers. This required adding a few #ifdefs into the library sources to select between the thread API.

Note that this is only a refactoring, in that it isolates pthread usage of libcxx allowing anyone to easily plug-in a different thread implementation into libcxx sources.

The final goal of this work (at least for us) is a bit more involved: we want to allow libcxx users to plug-in their own thread implementation at compile time. I have a patch for this which builds on the current one, I will be uploading it soon for comments.

Diff Detail

Event Timeline

rmaprath updated this revision to Diff 54654.Apr 22 2016, 8:25 AM
rmaprath retitled this revision from to [libcxx] Refactor pthread usage - II.
rmaprath updated this object.
rmaprath added subscribers: espositofulvio, cfe-commits.
rmaprath updated this revision to Diff 54658.Apr 22 2016, 8:29 AM

Added bit more context to the diff.

rmaprath updated this revision to Diff 54999.Apr 26 2016, 6:37 AM

Minor cleanup + Gentle ping.

bcraig added a subscriber: bcraig.Apr 26 2016, 7:08 AM
bcraig added inline comments.
src/mutex.cpp
31

Can't we just check for _LIBCPP_THREAD_API_PTHREAD once at the top of the file and #error as necessary there? I don't get the value of multiple #errors for the same condition, but sprinkled throughout the implementation. I do see the cost in the strategy, in that there are now lots of places that can bit-rot.

rmaprath updated this revision to Diff 55119.Apr 26 2016, 4:39 PM

Addressing review comments from @bcraig:

In the earlier patch, I tried to keep the __os_support header to a minimum by not exposing the internal pthread dependencies (in library sources) in this header. This however blew up on me when externalizing the threading support in D19415, where I ended up sprinkling a lot of:

#if defined(_LIBCPP_THREAD_API_PTHREAD)
  // Do pthread thing
#elif defined(_LIBCPP_THREAD_API_EXTERNAL)
  // Do other thing
#else
  // Fault
#endif

Perhaps the mistake was to view these two threading APIs as unrelated. After hiding all pthread dependencies behind the corresponding __os_xxxx functions (as suggested), everything falls into place (D19415 trivially becomes the list of __os_xxxx function declarations).

I have addressed the remaining review comments along the way. Will update D19415 soon.

@bcraig: Hope this is much better?

LGTM. You will still need to get approval from one of your original reviewers though.

mclow.lists edited edge metadata.Apr 27 2016, 12:02 PM

In general, I'm happy with this direction. I think we need to check that there are no ABI changes that happen here (I don't see any, but I'll keep looking), and I think the stuff in <__os_support> should be marked "always inline"

On a bikeshed note: is <__os_support> the right name? Or should it be something like <__thread> or <__threading_support>?

include/__os_support
32 ↗(On Diff #55119)

These should all be marked with _LIBCPP_ALWAYS_INLINE.

src/algorithm.cpp
51

What happened to the initializer here?

src/memory.cpp
130

How does this array get initialized now?

src/mutex.cpp
201

These two variables used to be initialized. Now they're not.

On a bikeshed note: is <__os_support> the right name? Or should it be something like <__thread> or <__threading_support>?

I went with __os_support in case if we'd want to group further external dependencies (like, for example, if some non-c POSIX calls are used in the upcoming filesystem library). I can revert back to __threading_support if that's preferred.

Thanks.

/ Asiri

include/__os_support
32 ↗(On Diff #55119)

Got it. Will update.

src/algorithm.cpp
51

I'm expecting the constructor of mutex to be run here at load time (before main). Note that this a libc++ mutex rather than a pthreads mutex (which does not required a constructor call like this). Would that be too much of an overhead?

src/memory.cpp
130

Same as above, but bigger (this now requires 16 constructor calls during load time). I can check if clang would figure out that this is a trivial construction (at least for the default pthreads implementation) and be able to avoid the constructor calls altogether, but may be relying on such behaviour is not good.

src/mutex.cpp
201

Same, relying on initialization at load time.

bcraig added inline comments.Apr 27 2016, 1:03 PM
src/algorithm.cpp
51

std::mutex's default ctor is constexpr. As long as the compiler supports constexpr, this initialization shouldn't require runtime code.

rmaprath added inline comments.Apr 27 2016, 1:22 PM
src/algorithm.cpp
51

Missed that!. The same applied to std::condition_variable (used below). From the sources it looks like we do cater for compilers that do not support constexpr, so for those compilers it would depend on how clever they are in optimizing out this kind of trivial constructor calls.

rmaprath updated this revision to Diff 55425.Apr 28 2016, 9:16 AM
rmaprath edited edge metadata.

Addressing review comments by @mclow.lists:

  • Switched to _LIBCPP_ALWAYS_INLINE from _LIBCPP_INLINE_VISIBILITY for all __os_xxx functions.

I've left the remaining points (naming of __os_support header and initialization of internal mutex / condition_variable instances) untouched pending further discussion.

/ Asiri

On a bikeshed note: is <__os_support> the right name? Or should it be something like <__thread> or <__threading_support>?

I went with __os_support in case if we'd want to group further external dependencies (like, for example, if some non-c POSIX calls are used in the upcoming filesystem library). I can revert back to __threading_support if that's preferred.

Thanks.

/ Asiri

I don't particularly care for <os_support> either. Either of <thread> or <__threading_support> are fine with me.

Note that there are already other OS specific calls that libcxx has to handle. A lot of the locale code wants to use BSD style foo_l functions that are present on Linux. I prefer to keep the locale code independent of the threading code, as the two don't overlap much.

For filesystem related items, I would expect those to go under <filesystem> or <filesystem_support>, or something similar.

rmaprath updated this revision to Diff 55443.Apr 28 2016, 10:53 AM

Renamed __os_support to __threading_support as suggested by @mclow.lists and @bcraig. I've left the namespace name __libcpp_os_support untouched, can change it to __libcpp_threading_support if the latter is preferred.

Thanks!

/ Asiri

LGTM. You'll still need to wait for one of the other reviewers though (probably @mclow.lists )

EricWF edited edge metadata.Apr 28 2016, 12:55 PM

The patch looks good for the most part. Nit picking I would rather have __libcpp_thread_foo instead of __libcpp_os_support::__os_thread_foo. IMO the namespace is undeeded.
@mclow.lists can you weigh in on this?

include/__mutex_base
42

Similar to the reason we choose __libcpp_mutex these macros should have a _LIBCPP prefix as well instead of an _OS one.

283

Shouldn't this be __cond_var_type?

rmaprath updated this revision to Diff 55501.Apr 28 2016, 3:30 PM
rmaprath edited edge metadata.

Addressing review comments from @EricWF:

  • Agree with all the suggested changes, I've applied them all.

Thanks!

/ Asiri

OK. I *think* this is my last round of review comments except for one specific issue. I'm still looking into the changes to the static mutex's and condition_variables in memory.cpp and algorithm.cpp. In these cases I don't want to go from compile-time initialization to run-time initialization. This could introduce the static initialization order fiasco.

include/__mutex_base
37

Why not use __libcpp_mutex_t directly? Sorry for yet another renaming comment.

42

Only one underscore in _LIBCPP_MUTEX_INITIALIZER

include/__threading_support
12

_LIBCPP_THREADING_SUPPORT

14
#include <__config>

#ifndef _LIBCPP_HAS_NO_PRAGMA_SYSTEM_HEADER
#pragma GCC system_header
#endif
31

Two comments:

  1. This is only used to initialize a recursive mutex. Let's simply name this __libcpp_recursive_mutex_init and get rid of the bool parameter.
  2. This is only used in recursive_mutex.cpp. It's OK to keep this in __threading_support for now but I'll probably want to move this out of the headers later.
34

Nit:

Let's invert the control flow here:

if (ec) return ec;
ec = pthread_mutexattr_settype(...);

Then at the end we can just return 0;

83

Nit: _LIBCPP_CONDVAR_INITIALIZER

120

Can we split this into __libcpp_thread_id_equal and __libcpp_thread_id_less?

130

This signature needs to work in C++03. Let's make the signature __libcpp_thread_create(__libcpp_thread_t* __t, _Func* __f, _Arg* __arg) because both _Func and _Arg should be pointers so there is no need to perfect forward them.

170

This signature needs to work in C++03. Make the signature __libcpp_tl_create(__libcpp_tl_key*, _Func*) since the function must be a pointer.

195

_LIBCPP_THREADING_SUPPORT

src/mutex.cpp
228

This change seems incorrect. There are some paths below where we manually unlock mut before a throw or return. Using unique_lock will result in a double unlock.

I think the fix is to change all mut.lock() and mut.unlock() calls into lk.lock()/lk.unlock() calls.

rmaprath updated this revision to Diff 55557.Apr 29 2016, 3:25 AM

OK. I *think* this is my last round of review comments except for one specific issue. I'm still looking into the changes to the static mutex's and condition_variables in memory.cpp and algorithm.cpp. In these cases I don't want to go from compile-time initialization to run-time initialization. This could introduce the static initialization order fiasco.

So, as pointed out earlier by @bcraig, there won't be a runtime overhead for those compilers supporting constexr. For those that don't, this is still a trivial enough constructor call that is very likely to get optimized away. To elaborate, we can simplify this scenario to something like:

thread.h

struct thread_t {
  int state1;
  int *state2;
  bool state3;
};

#define THREAD_INITIALIZER  {0, 0, false}

test.cpp

#include "thread.h"

class Foo {
private:
  thread_t __t_;
public:
  Foo() {__t_ = (thread_t) THREAD_INITIALIZER;}
};

Foo f();

int main() {
  return 0;
}

Compiling this with either clang or gcc in -std=c++03 does not introduce an entry in the .init_array section. This can be observed with:

> clang++ -std=c++03 -c test.cpp
> objdump -j .init_array -dxs test.o

On the other hand, if you slightly change the constructor to take a parameter, or print something into std::cout, there will be an entry in the .init_array.

Moreover, even if some compiler naively emits a runtime initialization code for these constructors (mutex and condition_variable), I don't see how that can lead to problems related to static initialization order - all these constructors do is simply copy some memory, at worse, the compiler may emit a call to memcpy - but memcpy does not (AFAIK) depend on any other library state. In other words, these constructors don't require any other parts of the library having being initialized first.

I hope that clears it?

I have addressed the rest of you comments. Two minor nits:

  • Can we keep __libcpp_mutex_t (used in __threading_support) and __mutex_type (used in __mutex_base) separate? I feel it's cleaner to make a distinction between the types of the threading API and the rest of the library. I don't have a strong opinion though, let me know if you'd like it changed still.
  • I'd rather not move __libcpp_recursive_mutex_init back into recursive_mutex.cpp, as this would introduce pthread dependencies (pthread_mutexattr_xxx) back into the library sources. It's possible to workaround that by adding further __libcpp_mutexattr_xxx functions to the threading API, but those seem to be bit too pthread-specific. Again, no strong preference.

Thanks!

/ Asiri

rmaprath updated this revision to Diff 55569.Apr 29 2016, 4:37 AM

Minor tweak: Got rid of the unnecessary template parameters on __libcpp_thread_create and __libcpp_tl_create.

rmaprath updated this revision to Diff 55571.Apr 29 2016, 4:55 AM

Missed one: s/_LIBCPP_COND_INITIALIZER/_LIBCPP_CONDVAR_INITIALIZER

rmaprath updated this revision to Diff 55593.Apr 29 2016, 6:48 AM

Added missing __confg header include.

OK. I *think* this is my last round of review comments except for one specific issue. I'm still looking into the changes to the static mutex's and condition_variables in memory.cpp and algorithm.cpp. In these cases I don't want to go from compile-time initialization to run-time initialization. This could introduce the static initialization order fiasco.

So, as pointed out earlier by @bcraig, there won't be a runtime overhead for those compilers supporting constexr. For those that don't, this is still a trivial enough constructor call that is very likely to get optimized away. To elaborate, we can simplify this scenario to something like:

thread.h

struct thread_t {
  int state1;
  int *state2;
  bool state3;
};

#define THREAD_INITIALIZER  {0, 0, false}

test.cpp

#include "thread.h"

class Foo {
private:
  thread_t __t_;
public:
  Foo() {__t_ = (thread_t) THREAD_INITIALIZER;}
};

Foo has a trivial destructor. Our mutex does not. I've already looked into the codegen for clang and in each case your change creates runtime code. The constructors may still be optimized away but it still has to register the destructors.

I'm just trying to figure out if that's going to be a problem.

Example here: https://godbolt.org/g/dJL29v

OK. I *think* this is my last round of review comments except for one specific issue. I'm still looking into the changes to the static mutex's and condition_variables in memory.cpp and algorithm.cpp. In these cases I don't want to go from compile-time initialization to run-time initialization. This could introduce the static initialization order fiasco.

So, as pointed out earlier by @bcraig, there won't be a runtime overhead for those compilers supporting constexr. For those that don't, this is still a trivial enough constructor call that is very likely to get optimized away. To elaborate, we can simplify this scenario to something like:

thread.h

struct thread_t {
  int state1;
  int *state2;
  bool state3;
};

#define THREAD_INITIALIZER  {0, 0, false}

test.cpp

#include "thread.h"

class Foo {
private:
  thread_t __t_;
public:
  Foo() {__t_ = (thread_t) THREAD_INITIALIZER;}
};

Foo has a trivial destructor. Our mutex does not. I've already looked into the codegen for clang and in each case your change creates runtime code. The constructors may still be optimized away but it still has to register the destructors.

I'm just trying to figure out if that's going to be a problem.

Example here: https://godbolt.org/g/dJL29v

Hi Eric,

Thanks for the pointer (and also for godbolt!). I can see that that my code creates runtime code to register the destructors.

Apart from the overhead (registering the destructors at load time and then actually calling them when the DSO is unloaded), I can't see how this could lead to other issues. The ABI seem to be have a pretty well defined DSO destruction process [1].

Having to handle proper destruction of internal mutexes and condition variables sound like an OK thing for a standard library to do. But then, the following commit (which replaced mutex with the pthread native mutexes in memory.cpp originally) makes me think twice:

commit e33c2d1926f49221c9d72a353d797d135a810d77
Author: Howard Hinnant <hhinnant@apple.com>
Date:   Sat Mar 16 00:17:53 2013 +0000

    This should be nothing but a load-time optimization.  I'm trying to reduce load time initializers and this is a big one.  No visible functionality change intended.
    
    git-svn-id: https://llvm.org/svn/llvm-project/libcxx/trunk@177212 91177308-0d34-0410-b5e6-96231b3b80d8

diff --git a/src/memory.cpp b/src/memory.cpp
index 14084a5..98bcc86 100644
--- a/src/memory.cpp
+++ b/src/memory.cpp
@@ -122,7 +122,15 @@ __shared_weak_count::__get_deleter(const type_info&) const _NOEXCEPT
 #if __has_feature(cxx_atomic)
 
 static const std::size_t __sp_mut_count = 16;
-static mutex mut_back[__sp_mut_count];
+static pthread_mutex_t mut_back_imp[__sp_mut_count] =
+{
+    PTHREAD_MUTEX_INITIALIZER, PTHREAD_MUTEX_INITIALIZER, PTHREAD_MUTEX_INITIALIZER, PTHREAD_MUTEX_INITIALIZER,
+    PTHREAD_MUTEX_INITIALIZER, PTHREAD_MUTEX_INITIALIZER, PTHREAD_MUTEX_INITIALIZER, PTHREAD_MUTEX_INITIALIZER,
+    PTHREAD_MUTEX_INITIALIZER, PTHREAD_MUTEX_INITIALIZER, PTHREAD_MUTEX_INITIALIZER, PTHREAD_MUTEX_INITIALIZER,
+    PTHREAD_MUTEX_INITIALIZER, PTHREAD_MUTEX_INITIALIZER, PTHREAD_MUTEX_INITIALIZER, PTHREAD_MUTEX_INITIALIZER
+};
+
+static mutex* mut_back = reinterpret_cast<std::mutex*>(mut_back_imp);
 
 _LIBCPP_CONSTEXPR __sp_mut::__sp_mut(void* p) _NOEXCEPT
    : __lx(p)

So, perhaps it is best to leave these pthread mutexes alone, purely for performance reasons. We'll have to excuse a couple of #ifdef _LIBCPP_THREAD_API_XXXX conditionals in the library sources to allow the external threading API to function.

Does that sound like an OK compromise? @EricWF, @mclow.lists?

Thanks.

/ Asiri

[1] https://mentorembedded.github.io/cxx-abi/abi.html#dso-dtor

EricWF added a comment.May 2 2016, 7:21 PM

So, perhaps it is best to leave these pthread mutexes alone, purely for performance reasons. We'll have to excuse a couple of #ifdef _LIBCPP_THREAD_API_XXXX conditionals in the library sources to allow the external threading API to function.

Does that sound like an OK compromise? @EricWF, @mclow.lists?

Sounds good to me. Although we should probably change these from pthread_mutex_t to __libcpp_mutex_t.

rmaprath updated this revision to Diff 55961.May 3 2016, 3:03 AM

As agreed, reverted back to using the native mutex / condition_variable types within library sources.

@EricWF: Good to go now?

Thanks.

/ Asiri

rmaprath updated this revision to Diff 55965.May 3 2016, 3:59 AM

Added missing initializer (typo).

EricWF accepted this revision.May 5 2016, 6:09 PM
EricWF edited edge metadata.

LGTM after the minor fixes. Thank you for the patch.

include/__threading_support
38

ec -> __ec.

125

Let's make the return type bool so it's clear.

127

return pthread_equal(t1, t2) != 0;

132

Make the return type bool.

134

return t1 < t2;

include/thread
200

Nit: Use nullptr.

This revision is now accepted and ready to land.May 5 2016, 6:09 PM
rmaprath closed this revision.May 6 2016, 8:22 AM

@EricWF: Thanks for taking the time to review! :)

Committed as r268734. The bots seem to be happy.