Add an implementation for the Win32 threading model as a backing API for
the internal c++ threading interfaces. This uses the Fls* family for
the TLS (which has the support for adding termination callbacks),
CRITICAL_SECTIONs for the recursive mutex, and Slim Reader/Writer locks
(SRW locks) for non-recursive mutexes. These APIs should all be
available on Vista or newer.
Details
- Reviewers
mclow.lists EricWF
Diff Detail
- Repository
- rL LLVM
Event Timeline
slim reader-writer locks are faster than critical sections, I'd recommend your implementation switch to those.
Also, why do you use Fls instead of Tls?
Added inline comments.
The main issue is that this patch needs to use the same <__threading_support> forward declarations as every other supported threading API; Only new definitions should be provided.
include/__threading_support | ||
---|---|---|
44 | Isn't checking _LIBCPP_HAS_THREAD_API_WIN32 enough? | |
57 | The forward declarations of the __libcpp_ threading wrapper should be shared between all API's. Please don't add your own forward declarations for Windows. | |
src/algorithm.cpp | ||
51 ↗ | (On Diff #82821) | This makes me sad because not having constant initialization manifests itself as a static initialization order fiasco in libc++. If we do go this route please file a libc++ bug stating that Win32 mutex's do not provide constant initialization. I think this should check _LIBCPP_HAS_THREAD_API_WIN32? |
src/memory.cpp | ||
158 ↗ | (On Diff #82821) | I think this should check _LIBCPP_HAS_THREAD_API_WIN32? |
src/mutex.cpp | ||
198 ↗ | (On Diff #82821) | I think this should check _LIBCPP_HAS_THREAD_API_WIN32? |
@majnemer Im using the Fls* APIs since they provide the thread termination callback, which Tls* doesn't. Good point about the SRW. Those are newer, but, we can always provide a fallback later.
I don't believe they are newer than the condition variable API so all should be fine.
The SRW initializers avoid the initializer changes, so its one less set of changes that is necessary.
include/__threading_support | ||
---|---|---|
44 | No, because the structure of the header is such that it requires redeclarations :-(. |
Use SRW locks, rebase for avoid redeclaration of interfaces, remove static initialization check removals.
include/__threading_support | ||
---|---|---|
300–305 | I don't think you can use slim rw locks for recursive locks. I think we will need to use CRITICAL_SECTION for those. std::recursive_mutex can't be used with std::condition_variable AFAIK so all you need (I think) is recursive versions of __libcpp_mutex_... Recursive locks should be used far less frequently which makes it valuable, IMO, to use slim rw locks for the non-recursive mutex implementation. | |
355 | I don't think it should be __libcpp_condvar_timedwait's problem. __libcpp_condvar_timedwait wraps pthread_cond_timedwait on POSIX platforms and the caller of __libcpp_condvar_wait properly handles spurious wakeups. The caller of __libcpp_condvar_timedwait probably should be audited. |
include/__threading_support | ||
---|---|---|
300–305 | You are absolutely right. That was something that I looked at originally and went with the CS. However, the overhead of a tagged struct is 5 or 9 bytes (sizeof(void *) + 1) bytes (ignoring padding for MS ABI). Going with that should give the benefits of always being able to properly initialize the CS instead of the kludge. |
include/__threading_support | ||
---|---|---|
355 | SG; seems that there is a single user in condition_variable.cpp |
include/__threading_support | ||
---|---|---|
300–305 | Er, isn't the overhead much more than that? IIRC, CRITICAL_SECTION is quite large. You'd be making all the users of std::mutex pay for the space of std::recursive_mutex... |
include/__threading_support | ||
---|---|---|
83 | Why not a tagged union? |
include/__threading_support | ||
---|---|---|
300–305 | Yeah, CRITICAL_SECTION is 24 bytes vs 4 bytes for a SRWLOCK. |
include/__threading_support | ||
---|---|---|
44 | Do these definitions have any affect when <Windows.h> has already been included? | |
306 | The initial __libcpp_foo declarations specify the correct linkage and visibility attributes. Please don't decorate the definitions. |
include/__threading_support | ||
---|---|---|
44 | No, they dont effect it once it has been included. They are beneficial since they reduce the amount of stuff that gets included (including things which, at least when I last checked, can cause clang to choke). | |
57 | Yeah, restructed that. There is now the dependent patch for the type-erased mutex handling. |
include/__threading_support | ||
---|---|---|
44 | And can users re-include <Windows.h> afterwards in the same TU and get all of the symbols? |
Re Eric's windows.h concern.
include/__threading_support | ||
---|---|---|
44 | I don't think so. We've recently switched to defining these two symbols in our build system, and I think that's basically the only way to make this work in a project composed of headers from various authors. I think you're right that libc++ should not define them. |
You don't need #define VC_EXTRA_LEAN since:
- VC_EXTRA_LEAN is a mistype, only VC_EXTRALEAN has some meaning in Windows ecosystem;
- VC_EXTRALEAN is used *only* in MFC headers which obviously aren't used in clang codebase.
Thus it would be better to remove this altogether, and (perhaps) replace with the relevant #define WIN32_LEAN_AND_MEAN.
Remove incorrect use of VC_EXTRALEAN. Fix signature for __libcpp_set_thread_specific. Provide CC adjustment thunks for the thread creation. Add an assertion for sleep timeout truncation. Implement exec-once interfaces.
include/__threading_support | ||
---|---|---|
448 | In posix, pthread_cond_timedwait uses absolute time but we use relative time here. | |
450 | We need to include <chrono> to use types and functions from this namespace. | |
454 | You have done it accidentally right. I realized there is a second problem here - if timeout equals to INFINITE, the thread will wait forever. INFINITE equals to (DWORD)-1, so the strict sign in the assert is required. But for sanity sake _VSTD::numeric_limits<DWORD>::max() should be replaced with INFINITE. | |
476 | We need underscores for parameters and init_routine. | |
487 | Underscores for parameters are missing. | |
517 | Trampolines will never be inlined. Should we put them in support *.cpp instead? The downside is new public symbols which can't be changed without breaking backward compatibility. The upside is that we will have only one copy of each trampoline. What do you think? | |
521 | Do we need #include <cstdlib> for free() and malloc() now? Can we use new/delete instead? BTW, What is the purpose of _VSTD? I think it is used to prevent argument-dependent lookup and so is not needed for free and malloc. | |
522 | Should we even try to pass thread exit code, given that sizeof(unsigned int) < sizeof(void*) on 64-bit system? std::thread doesn't support thread exit code anyway. |
include/__threading_support | ||
---|---|---|
44 | I don't think libc++ should not include windows.h in a public header. I'd rather write our own __threading_support_win.h that re-prototypes everything we need from windows.h, and then we can have some test in libc++ that validates that there are no mismatches when including windows.h before libc++ <mutex>. If we do this, we should sink as much win32 API usage as possible out of headers to reduce our duplication. It's worth pointing out VS 2015's thread implementation also hides its win32 API usage. | |
517 | Considering that libc++ already has a __thread_proxy trampoline, let's just give it the right CC and get rid of this trampoline. | |
583 | Yeah, we need to fix that. We should find a way to make __thread_specific_ptr::__at_thread_exit(void*) have the right convention out of the box, rather than thunking. Something like: #define __LIBCPP_TLS_CALLBACK_CC WINAPI .. // else #define __LIBCPP_TLS_CALLBACK_CC ... // <thread> static void __LIBCPP_TLS_CALLBACK_CC __at_thread_exit(void*); |
include/__threading_support | ||
---|---|---|
448 | Good catch. I guess we have to convert the timespec to a relative time before using it. Slightly annoying, but thankfully, we can fallback on chrono :-). |
include/__threading_support | ||
---|---|---|
471 | Negative timeout_ms.count() is a normal case, we shouldn't fail on assertion. The reason is that program can always be slow enough for current timestamp (system_clock::now()) to pass any fixed point in time (__ts/abstime). | |
473 | I just checked in the Windows SDK, INFINITE is defined as 0xFFFFFFFF and negative timeouts are legal anyway. |
LGTM.
include/__threading_support | ||
---|---|---|
474 | There is still problem with large timeout but let us fix it later. |
I'm seeing the following build errors on Windows:
C:\Users\Eric\workspace\libcxx\include\__threading_support(521,3): warning: cannot delete expression with pointer-to-'void' type 'void *' [-Wdelete-incomplete] delete __data; ^ ~~~~~~ C:\Users\Eric\workspace\libcxx\include\__threading_support(530,18): error: assigning to 'void *(*)(void *)' from incompatible type 'void (*)(void *)': different return type ('void *' vs 'void') data->__func = __func; ^~~~~~ C:\Users\Eric\workspace\libcxx\include\__threading_support(572,5): error: functions that differ only in their return type cannot be overloaded int __libcpp_thread_yield() ~~~ ^ C:\Users\Eric\workspace\libcxx\include\__threading_support(178,6): note: previous declaration is here void __libcpp_thread_yield(); ~~~~ ^ C:\Users\Eric\workspace\libcxx\include\__threading_support(589,5): error: functions that differ only in their return type cannot be overloaded int __libcpp_tls_get(__libcpp_tls_key __key) ~~~ ^ C:\Users\Eric\workspace\libcxx\include\__threading_support(185,7): note: previous declaration is here void *__libcpp_tls_get(__libcpp_tls_key __key); ~~~~~~^ C:\Users\Eric\workspace\libcxx\include\__threading_support(591,10): error: cannot initialize return object of type 'int' with an rvalue of type 'PVOID' (aka 'void *') return FlsGetValue(__key);
Just tried the updated patch and I still get one build error:
C:\Users\Eric\workspace\libcxx\src\thread.cpp(135,16): error: use of undeclared identifier 'nanosleep' while (nanosleep(&ts, &ts) == -1 && errno == EINTR)
include/__threading_support | ||
---|---|---|
33–34 | Can we do as Reid suggests and not expose users to windows.h? |
include/__threading_support | ||
---|---|---|
595 | The _LIBCPP_TLS_DESTRUCTOR_CC needs to appear an the initial declaration as well. |
include/__threading_support | ||
---|---|---|
34 |
I was about to ask the same question. These includes are dragging in the __deallocate macro and I would love to avoid that. |
include/__threading_support | ||
---|---|---|
34 | I feel like we would end up with a lot of duplication if we went down this route, since this is using a fair amount of Windows APIs. @rnk suggested having a test for prototype mismatches, but even with those checks there could be a high maintenance burden to the duplication. Was the main objection to WIN32_LEAN_AND_MEAN that it would be problematic for modules? If we're including windows.h, it seems strictly preferable to include it with WIN32_LEAN_AND_MEAN than without, since we'll pull in a lot less that way. Including windows.h without WIN32_LEAN_AND_MEAN can also interact with other headers badly sometimes, e.g. [winsock2.h](https://msdn.microsoft.com/en-us/library/windows/desktop/ms737629%28v=vs.85%29.aspx). |
include/__threading_support | ||
---|---|---|
34 | It seems that dragging in the __deallocate macro is inevitable :-( I submitted a patch to work around __deallocate here: https://reviews.llvm.org/D28426 |
include/__threading_support | ||
---|---|---|
34 |
The objection is that it breaks user code. For example: #include <thread> #include <Windows.h> // Windows.h already included as lean and mean. typedef NonLeanAndMeanSymbol foo; // ERROR NonLeanAndMeanSymbol not defined |
include/__threading_support | ||
---|---|---|
34 | But without the WIN32_LEAN_AND_MEAN, we're gonna break #include <thread> #include <winsock2.h> (you could fix this by reordering the includes, which would also fix your example, or by defining WIN32_LEAN_AND_MEAN yourself, but it doesn't seem great either) |
include/__threading_support | ||
---|---|---|
34 | I would much rather break that code. The fact that Windows.h doesn't play nice with other Windows headers is a problem for Windows not libc++. |
include/__threading_support | ||
---|---|---|
34 | Although I think avoiding the Windows.h include all together would be better (if possible). However I think that can be fixed after committing this. |
SVN r291333
include/__threading_support | ||
---|---|---|
33–34 | I think that I would rather do that in a follow up. This will require a fair amount of duplication. |
I think we agreed that we cannot use this macro.