⚙ D104987 [libcxx] Use GetSystemTimePreciseAsFileTime() if available

Prince213 requested review of this revision.Jun 26 2021, 9:36 PM

Prince213 created this revision.

Herald added a reviewer: Restricted Project. · View Herald TranscriptJun 26 2021, 9:36 PM

Herald added a subscriber: libcxx-commits. · View Herald Transcript

Use WINAPI calling convention.

Fix format.

Harbormaster completed remote builds in B111172: Diff 354731.Jun 27 2021, 2:34 AM

Prince213 edited the summary of this revision. (Show Details)Jun 28 2021, 1:43 AM

Herald added a subscriber: mstorsjo. · View Herald TranscriptJun 28 2021, 1:43 AM

@amccarth - Can you weigh in here on what would be a suitable tradeoff for atomicity/safety/practicality for runtime loaded function pointers like these? Is it fine and safe enough in practice with a plain static pointer like this?

tl;dr: I understand the desire to avoid even tiny synchronization costs when trying to read a high-precision timer, but I'm uncomfortable with the data race. I suggest initializing the static function pointer under the locking provided by the compiler.

My reasoning is below. If I got any of this wrong, I would reconsider my position.

First, we have a function-scope static getSysTime_p that's not explicitly initialized, so it get zero-initialized, which is safe to assume maps to a nullptr on any platform that runs Windows. With no explicit initialization on the definition, I don't think the compiler has to inject any synchronization to ensure the static is initialized exactly once during the first call. The zeros will be generated at compile time.

Next, with the manual initialization, there's obviously the chance that two (or more) threads could attempt it at the same time. Two threads could race on reading getSysTime_p and they could race again to write it. Officially, that's UB, right? I'm pretty sure that won't result in any tearing on x86 or x64, but I don't know about others. (Windows on ARM is a thing now.)

Since kernel32 is guaranteed to be (and to remain) loaded for the duration of any Windows-hosted process, the call to GetProcAddress should always give the same result, which mitigates the concerns expressed in the remarks for GetProcAddress.

Every thread that gets into the initialization block will try to write the same value, so maybe those races aren't a problem in real life? I cannot say definitively, and it would be very hard to write a test for. If you decide to rely on that, make sure to add a comment about it.

I feel it would be safer to guarantee the static function pointer is initialized exactly once.

The simplest option would be to initialize it where it's defined. I would move the code that finds the address to a separate function, say GetGetSysTimeProcAddress, and then simplify this code to:

static GetSysTime_t getSysTime_p = GetGetSysTimeProcAddress();
getSysTime_p(&ft);

Of course, there's a small cost on every call, since the program will have to check whether the static has been initialized, but I assume that's very small and about as optimal as possible for the platform. _Maybe_ you could do a tiny bit better manually placing your own fences, but that's outside my skill set, and it would be very difficult to test. I doubt that making the pointer an std::atomic<T> or using ::InterlockedXxx functions would be more efficient than what the compiler generates. The double-checked lock pattern is also notoriously difficult to get right.

Another option would be to make the function pointer global, and initialize it once before any other threads have a chance to call. I don't know whether the library has such a hook. The cost of the lookup isn't huge, but it would be in the startup sequence. I don't have enough data to say whether that's an acceptable trade-off.

You could make the function pointer thread-local, so that every thread has there own copy that nobody races with, but I'm not sure that's any better than the other options. Imagine the pathologically case where every thread calls only once, and the program keeps spawning new threads.

Minor note: GetModuleHandleW might be a little faster than GetModuleHandleA. I believe the -A version will do the equivalent of a call to MultiByteToWide, which you could avoid by calling -W explicitly and adding an L prefix to the string literal. (That's just for the module name. The proc names are always "ANSI".)

In D104987#2855606, @amccarth wrote:
tl;dr: I understand the desire to avoid even tiny synchronization costs when trying to read a high-precision timer, but I'm uncomfortable with the data race. I suggest initializing the static function pointer under the locking provided by the compiler.

The simplest option would be to initialize it where it's defined. I would move the code that finds the address to a separate function, say GetGetSysTimeProcAddress, and then simplify this code to:
static GetSysTime_t getSysTime_p = GetGetSysTimeProcAddress();
getSysTime_p(&ft);

Oh, that's great, I had no idea that this actually ends up synchronized. That's very neat. And the overhead once initialized seems miniscule.

Yeah such a setup sounds good to me, and clearly better than the other options listed.

Minor note: GetModuleHandleW might be a little faster than GetModuleHandleA. I believe the -A version will do the equivalent of a call to MultiByteToWide, which you could avoid by calling -W explicitly and adding an L prefix to the string literal. (That's just for the module name. The proc names are always "ANSI".)

On this note, note that for store apps, we can't use GetModuleHandle at all. So if _LIBCPP_WINDOWS_STORE_APP is defined, we should stick to static compile time switching between the functions.

FWIW, for static switching of the function, it seems like the more precise definition for when it's available is more like #if (_WIN32_WINNT >= _WIN32_WINNT_WIN8 && WINAPI_FAMILY_PARTITION(WINAPI_PARTITION_DESKTOP)) || (_WIN32_WINNT >= _WIN32_WINNT_WIN10). If _LIBCPP_WINDOWS_STORE_APP is defined, then WINAPI_FAMILY_PARTITION(WINAPI_PARTITION_DESKTOP) equals false. So if _LIBCPP_WINDOWS_STORE_APP is defined, we can use the newer function since _WIN32_WINNT >= _WIN32_WINNT_WIN10.

In D104987#2856224, @mstorsjo wrote:
In D104987#2855606, @amccarth wrote:
tl;dr: I understand the desire to avoid even tiny synchronization costs when trying to read a high-precision timer, but I'm uncomfortable with the data race. I suggest initializing the static function pointer under the locking provided by the compiler.

The simplest option would be to initialize it where it's defined. I would move the code that finds the address to a separate function, say GetGetSysTimeProcAddress, and then simplify this code to:
static GetSysTime_t getSysTime_p = GetGetSysTimeProcAddress();
getSysTime_p(&ft);
Oh, that's great, I had no idea that this actually ends up synchronized. That's very neat.

Yes, it was formalized in C++11 and is sometimes called "magic statics." I believe Clang and GCC may have implemented them even earlier. MSVC did not implement them until VS 2015.

Back in 2004, Raymond Chen wrote a blog post suggesting that _not_ synchronizing function static initialization was intentional and appropriate.

MSVC has a command line switch to disable the injection of the synchronization code for old programs that still expect the old behavior.

Could you please improve the commit message to explain the reasoning for this? GetSystemTimePreciseAsFileTime has been available since Windows 8+, which has already been EOL'ed. Windows 10 is supposed to be EOL'ed in ~4 years. So, being more explicit about the reasoning for the dynamic resolution of the function in the commit message seems reasonable.

I think that the changes that @amccarth suggested are the right way to handle this.

This revision now requires changes to proceed.Jul 9 2021, 10:33 AM

I presume the reason is for mingw configs that target a lower version by default. (Though, the latest versions of mingw-w64 default to setting _WIN32_WINNT to win10, but distributions may set a different default.)

FWIW I think we could keep the statically dispatched version via ifdefs with the current condition, when we know for sure that we can use the new function.

Try to detect if GetSystemTimePreciseAsFileTime is available at runtime when it is not present at compile time.

Users might target a lower version of Windows, but the same code might later in time run in an recent version of Windows, where GetSystemTimePreciseAsFileTime is available. We'll try to detect this situation at runtime.

Harbormaster completed remote builds in B113326: Diff 357678.Jul 9 2021, 7:52 PM

mstorsjo added inline comments.Jul 10 2021, 5:07 AM

libcxx/src/chrono.cpp
93	I think we should keep a static case with #elif !WINAPI_FAMILY_PARTITION(WINAPI_PARTITION_DESKTOP), ie for store apps targeting win8, that just uses GetSystemTimeAsFileTime unconditionally, as the runtime loading/lookup functions aren’t available for such apps. I think the negated form of the condition for the function above then would be _WIN32_WINNT < _WIN32_WINNT_WIN8 && WINAPI_FAMILY_PARTITION(WINAPI_PARTITION_DESKTOP). (It’s be neat to be able to have all of that within the #else here, I dunno if that’s possible though.)

mstorsjo added a reviewer: mstorsjo.Jul 10 2021, 5:07 AM

Prince213 updated this revision to Diff 357706.Jul 10 2021, 5:41 AM

If we are targeting windows 8+, GetSystemTimePreciseAsFileTime should be available and there's no need to check for it.
If that's not the case, we try to detect and cache the function to use.

This seems better and more simple.

No, this isn’t what I suggested.

If (win8 && desktop) || win10
    New func, statically 
Elif !desktop
    Old func, statically 
Else
    Runtime init
Endif

In D104987#2868832, @Prince213 wrote:

If we are targeting windows 8+, GetSystemTimePreciseAsFileTime should be available and there's no need to check for it.
If that's not the case, we try to detect and cache the function to use.

This seems better and more simple.

In windows 8 store apps, the new function isn’t available. It only became available to store apps in win 10.

I see.

Prince213 updated this revision to Diff 357707.Jul 10 2021, 5:52 AM

Thanks, this looks ok to me.

libcxx/src/chrono.cpp
93	Technically, the version bit of _WIN32_WINNT >= _WIN32_WINNT_WIN8 && !WINAPI_FAMILY_PARTITION(WINAPI_PARTITION_DESKTOP) is redundant, it’s enough to just check the family. (There’s no non-desktop for pre-win8.)

Oh, also: This will still need approval from libcxx approves, I’m not one of them.

But in the meantime, what git author line do you want for the commit, I.e. “Real Name <email@address>“?

Harbormaster completed remote builds in B113349: Diff 357707.Jul 10 2021, 7:57 AM

Sizhe Zhao <prc.zhao@outlook.com>

Please do clang-format the patch as well.

libcxx/src/chrono.cpp
68	Can you change this to `GetFileTimePreciseAsFileTimePtr`? I think that `FP` is a bit too terse, and confusing within the standard library implementation. I think that a `using` alias may be nicer.

Prince213 updated this revision to Diff 358141.Jul 12 2021, 9:42 PM

mstorsjo added inline comments.Jul 12 2021, 10:47 PM

libcxx/src/chrono.cpp
93	If you're updating the patch still, could you also remove the `_WIN32_WINNT >= _WIN32_WINNT_WIN8 &&` part from this (the !desktop) condition as I requested?

Harbormaster completed remote builds in B113648: Diff 358141.Jul 12 2021, 11:23 PM

Prince213 updated this revision to Diff 358591.Jul 14 2021, 6:58 AM

Prince213 marked an inline comment as done.

Harbormaster completed remote builds in B113969: Diff 358591.Jul 14 2021, 8:38 AM

LGTM, thanks!

@ldionne Is this patch ok with you? It's ok'd by both me and @compnerd and passes Windows CI.

In D104987#2914398, @mstorsjo wrote:

@ldionne Is this patch ok with you? It's ok'd by both me and @compnerd and passes Windows CI.

Ping @ldionne

In D104987#2914398, @mstorsjo wrote:

@ldionne Is this patch ok with you? It's ok'd by both me and @compnerd and passes Windows CI.

Ping @ldionne

LGTM, but the only question I have is whether we should initialize this at file scope so as to avoid a function-local static? As it stands, we're going to be checking whether the function-local static is initialized every time we enter the function, which seems undesirable for a function like system_clock::now().

This revision is now accepted and ready to land.Aug 26 2021, 6:56 AM

In D104987#2967130, @ldionne wrote:

LGTM, but the only question I have is whether we should initialize this at file scope so as to avoid a function-local static? As it stands, we're going to be checking whether the function-local static is initialized every time we enter the function, which seems undesirable for a function like system_clock::now().

Thanks, that’s a good point. Initially I was about to respond that it’s a tradeoff between startup time and runtime - but I guess libc++ has a fair number of global startup constructors anyway, so it’s probably not a big deal anyway.

However - as the initialization order of global constructors is undefined, wouldn’t it be an issue if it’s used from a constructor in user code?

In D104987#2967333, @mstorsjo wrote:

In D104987#2967130, @ldionne wrote:

LGTM, but the only question I have is whether we should initialize this at file scope so as to avoid a function-local static? As it stands, we're going to be checking whether the function-local static is initialized every time we enter the function, which seems undesirable for a function like system_clock::now().

Thanks, that’s a good point. Initially I was about to respond that it’s a tradeoff between startup time and runtime - but I guess libc++ has a fair number of global startup constructors anyway, so it’s probably not a big deal anyway.

However - as the initialization order of global constructors is undefined, wouldn’t it be an issue if it’s used from a constructor in user code?

We could use the init_priority attribute to enforce that this static is initialized before other statics. Either way works for me.

In D104987#2967640, @ldionne wrote:

In D104987#2967333, @mstorsjo wrote:

In D104987#2967130, @ldionne wrote:

LGTM, but the only question I have is whether we should initialize this at file scope so as to avoid a function-local static? As it stands, we're going to be checking whether the function-local static is initialized every time we enter the function, which seems undesirable for a function like system_clock::now().

Thanks, that’s a good point. Initially I was about to respond that it’s a tradeoff between startup time and runtime - but I guess libc++ has a fair number of global startup constructors anyway, so it’s probably not a big deal anyway.

However - as the initialization order of global constructors is undefined, wouldn’t it be an issue if it’s used from a constructor in user code?

We could use the init_priority attribute to enforce that this static is initialized before other statics. Either way works for me.

Oh, right, we have a neat macro for that already (_LIBCPP_INIT_PRIORITY_MAX), that’s probably the best solution. Thanks!

I'll update the patch with a version that initializes the function pointer once at library init, with _LIBCPP_INIT_PRIORITY_MAX, to avoid any extra runtime synchronization overhead when calling the function.

Moved the function pointer initialization to file scope, to avoid needing synchronization when accessing it on each call. I had to change the plain function into a class with constructor, to be able to use _LIBCPP_INIT_PRIORITY_MAX.

I'll go ahead and push this (with the original author as git author) after the CI passes.

Harbormaster completed remote builds in B121402: Diff 368974.Aug 26 2021, 2:16 PM

Rerun CI

Harbormaster completed remote builds in B121491: Diff 369100.Aug 27 2021, 10:00 AM

Closed by commit rG3ec634e65a02: [libcxx] Use GetSystemTimePreciseAsFileTime() if available (authored by Prince213, committed by mstorsjo). · Explain WhyAug 27 2021, 10:12 AM

This revision was automatically updated to reflect the committed changes.

mstorsjo added a commit: rG3ec634e65a02: [libcxx] Use GetSystemTimePreciseAsFileTime() if available.