We will try to use GetSystemTimePreciseAsFileTime if possible.
Reference: https://sourceforge.net/p/mingw-w64/mingw-w64/ci/59195b2d7fe26549f70969b0dd487293819f023e/.
Details
- Reviewers
compnerd ldionne Prince213 - Group Reviewers
Restricted Project - Commits
- rG3ec634e65a02: [libcxx] Use GetSystemTimePreciseAsFileTime() if available
Diff Detail
Event Timeline
@amccarth - Can you weigh in here on what would be a suitable tradeoff for atomicity/safety/practicality for runtime loaded function pointers like these? Is it fine and safe enough in practice with a plain static pointer like this?
tl;dr: I understand the desire to avoid even tiny synchronization costs when trying to read a high-precision timer, but I'm uncomfortable with the data race. I suggest initializing the static function pointer under the locking provided by the compiler.
My reasoning is below. If I got any of this wrong, I would reconsider my position.
First, we have a function-scope static getSysTime_p that's not explicitly initialized, so it get zero-initialized, which is safe to assume maps to a nullptr on any platform that runs Windows. With no explicit initialization on the definition, I don't think the compiler has to inject any synchronization to ensure the static is initialized exactly once during the first call. The zeros will be generated at compile time.
Next, with the manual initialization, there's obviously the chance that two (or more) threads could attempt it at the same time. Two threads could race on reading getSysTime_p and they could race again to write it. Officially, that's UB, right? I'm pretty sure that won't result in any tearing on x86 or x64, but I don't know about others. (Windows on ARM is a thing now.)
Since kernel32 is guaranteed to be (and to remain) loaded for the duration of any Windows-hosted process, the call to GetProcAddress should always give the same result, which mitigates the concerns expressed in the remarks for GetProcAddress.
Every thread that gets into the initialization block will try to write the same value, so maybe those races aren't a problem in real life? I cannot say definitively, and it would be very hard to write a test for. If you decide to rely on that, make sure to add a comment about it.
I feel it would be safer to guarantee the static function pointer is initialized exactly once.
The simplest option would be to initialize it where it's defined. I would move the code that finds the address to a separate function, say GetGetSysTimeProcAddress, and then simplify this code to:
static GetSysTime_t getSysTime_p = GetGetSysTimeProcAddress(); getSysTime_p(&ft);
Of course, there's a small cost on every call, since the program will have to check whether the static has been initialized, but I assume that's very small and about as optimal as possible for the platform. _Maybe_ you could do a tiny bit better manually placing your own fences, but that's outside my skill set, and it would be very difficult to test. I doubt that making the pointer an std::atomic<T> or using ::InterlockedXxx functions would be more efficient than what the compiler generates. The double-checked lock pattern is also notoriously difficult to get right.
Another option would be to make the function pointer global, and initialize it once before any other threads have a chance to call. I don't know whether the library has such a hook. The cost of the lookup isn't huge, but it would be in the startup sequence. I don't have enough data to say whether that's an acceptable trade-off.
You could make the function pointer thread-local, so that every thread has there own copy that nobody races with, but I'm not sure that's any better than the other options. Imagine the pathologically case where every thread calls only once, and the program keeps spawning new threads.
Minor note: GetModuleHandleW might be a little faster than GetModuleHandleA. I believe the -A version will do the equivalent of a call to MultiByteToWide, which you could avoid by calling -W explicitly and adding an L prefix to the string literal. (That's just for the module name. The proc names are always "ANSI".)
Oh, that's great, I had no idea that this actually ends up synchronized. That's very neat. And the overhead once initialized seems miniscule.
Yeah such a setup sounds good to me, and clearly better than the other options listed.
Minor note: GetModuleHandleW might be a little faster than GetModuleHandleA. I believe the -A version will do the equivalent of a call to MultiByteToWide, which you could avoid by calling -W explicitly and adding an L prefix to the string literal. (That's just for the module name. The proc names are always "ANSI".)
On this note, note that for store apps, we can't use GetModuleHandle at all. So if _LIBCPP_WINDOWS_STORE_APP is defined, we should stick to static compile time switching between the functions.
FWIW, for static switching of the function, it seems like the more precise definition for when it's available is more like #if (_WIN32_WINNT >= _WIN32_WINNT_WIN8 && WINAPI_FAMILY_PARTITION(WINAPI_PARTITION_DESKTOP)) || (_WIN32_WINNT >= _WIN32_WINNT_WIN10). If _LIBCPP_WINDOWS_STORE_APP is defined, then WINAPI_FAMILY_PARTITION(WINAPI_PARTITION_DESKTOP) equals false. So if _LIBCPP_WINDOWS_STORE_APP is defined, we can use the newer function since _WIN32_WINNT >= _WIN32_WINNT_WIN10.
Yes, it was formalized in C++11 and is sometimes called "magic statics." I believe Clang and GCC may have implemented them even earlier. MSVC did not implement them until VS 2015.
Back in 2004, Raymond Chen wrote a blog post suggesting that _not_ synchronizing function static initialization was intentional and appropriate.
MSVC has a command line switch to disable the injection of the synchronization code for old programs that still expect the old behavior.
Could you please improve the commit message to explain the reasoning for this? GetSystemTimePreciseAsFileTime has been available since Windows 8+, which has already been EOL'ed. Windows 10 is supposed to be EOL'ed in ~4 years. So, being more explicit about the reasoning for the dynamic resolution of the function in the commit message seems reasonable.
I think that the changes that @amccarth suggested are the right way to handle this.
I presume the reason is for mingw configs that target a lower version by default. (Though, the latest versions of mingw-w64 default to setting _WIN32_WINNT to win10, but distributions may set a different default.)
FWIW I think we could keep the statically dispatched version via ifdefs with the current condition, when we know for sure that we can use the new function.
Try to detect if GetSystemTimePreciseAsFileTime is available at runtime when it is not present at compile time.
Users might target a lower version of Windows, but the same code might later in time run in an recent version of Windows, where GetSystemTimePreciseAsFileTime is available. We'll try to detect this situation at runtime.
libcxx/src/chrono.cpp | ||
---|---|---|
93 | I think we should keep a static case with #elif !WINAPI_FAMILY_PARTITION(WINAPI_PARTITION_DESKTOP), ie for store apps targeting win8, that just uses GetSystemTimeAsFileTime unconditionally, as the runtime loading/lookup functions aren’t available for such apps. I think the negated form of the condition for the function above then would be _WIN32_WINNT < _WIN32_WINNT_WIN8 && WINAPI_FAMILY_PARTITION(WINAPI_PARTITION_DESKTOP). (It’s be neat to be able to have all of that within the #else here, I dunno if that’s possible though.) |
If we are targeting windows 8+, GetSystemTimePreciseAsFileTime should be available and there's no need to check for it.
If that's not the case, we try to detect and cache the function to use.
This seems better and more simple.
No, this isn’t what I suggested.
If (win8 && desktop) || win10 New func, statically Elif !desktop Old func, statically Else Runtime init Endif
In windows 8 store apps, the new function isn’t available. It only became available to store apps in win 10.
Thanks, this looks ok to me.
libcxx/src/chrono.cpp | ||
---|---|---|
93 | Technically, the version bit of _WIN32_WINNT >= _WIN32_WINNT_WIN8 && !WINAPI_FAMILY_PARTITION(WINAPI_PARTITION_DESKTOP) is redundant, it’s enough to just check the family. (There’s no non-desktop for pre-win8.) |
Oh, also: This will still need approval from libcxx approves, I’m not one of them.
But in the meantime, what git author line do you want for the commit, I.e. “Real Name <email@address>“?
Please do clang-format the patch as well.
libcxx/src/chrono.cpp | ||
---|---|---|
68 | Can you change this to GetFileTimePreciseAsFileTimePtr? I think that FP is a bit too terse, and confusing within the standard library implementation. I think that a using alias may be nicer. |
libcxx/src/chrono.cpp | ||
---|---|---|
93 | If you're updating the patch still, could you also remove the _WIN32_WINNT >= _WIN32_WINNT_WIN8 && part from this (the !desktop) condition as I requested? |
LGTM, but the only question I have is whether we should initialize this at file scope so as to avoid a function-local static? As it stands, we're going to be checking whether the function-local static is initialized every time we enter the function, which seems undesirable for a function like system_clock::now().
Thanks, that’s a good point. Initially I was about to respond that it’s a tradeoff between startup time and runtime - but I guess libc++ has a fair number of global startup constructors anyway, so it’s probably not a big deal anyway.
However - as the initialization order of global constructors is undefined, wouldn’t it be an issue if it’s used from a constructor in user code?
We could use the init_priority attribute to enforce that this static is initialized before other statics. Either way works for me.
Oh, right, we have a neat macro for that already (_LIBCPP_INIT_PRIORITY_MAX), that’s probably the best solution. Thanks!
I'll update the patch with a version that initializes the function pointer once at library init, with _LIBCPP_INIT_PRIORITY_MAX, to avoid any extra runtime synchronization overhead when calling the function.
Moved the function pointer initialization to file scope, to avoid needing synchronization when accessing it on each call. I had to change the plain function into a class with constructor, to be able to use _LIBCPP_INIT_PRIORITY_MAX.
I'll go ahead and push this (with the original author as git author) after the CI passes.
Can you change this to GetFileTimePreciseAsFileTimePtr? I think that FP is a bit too terse, and confusing within the standard library implementation. I think that a using alias may be nicer.