This is an archive of the discontinued LLVM Phabricator instance.

[builtins] Use Interlocked* intrinsics for atomics on MSVC
ClosedPublic

Authored by mstorsjo on Jul 30 2017, 11:23 PM.

Details

Summary

Tested on MSVC 2013, 2015 and 2017 targeting X86, X64 and ARM.

This fixes building emutls.c for Windows for ARM (both with clang which don't need these atomics fallbacks at all, but just failed due to the immintrin.h include before, and with MSVC).

Diff Detail

Repository
rL LLVM

Event Timeline

mstorsjo created this revision.Jul 30 2017, 11:23 PM
marsupial edited edge metadata.Aug 1 2017, 7:26 PM

I'm wondering if the #ifdef is now a dead code path entirely.
Do you have any insight into how ARM is getting __atomic_load_n and x86/64 not?

I'm wondering if the #ifdef is now a dead code path entirely.
Do you have any insight into how ARM is getting __atomic_load_n and x86/64 not?

I had only tested it with clang for ARM, where the whole ifdef isn't used. I hadn't tested this with MSVC for ARM though (I only use compiler-rt when building with clang-mingw, not when building with MSVC), but it does indeed fail there. The _load/store_be_u32/64 intrinsics aren't perhaps the most canonical way of doing atomic loads/stores, but MSVC is pretty badly lacking in that department. They do document this though:

https://msdn.microsoft.com/en-us/library/windows/desktop/ms684122(v=vs.85).aspx

Simple reads and writes to properly-aligned 32-bit variables are atomic operations. In other words, you will not end up with only one portion of the variable updated; all bits are updated in an atomic fashion. However, access is not guaranteed to be synchronized. If two threads are reading and writing from the same variable, you cannot determine if one thread will perform its read operation before the other performs its write operation.

Simple reads and writes to properly aligned 64-bit variables are atomic on 64-bit Windows. Reads and writes to 64-bit values are not guaranteed to be atomic on 32-bit Windows. Reads and writes to variables of other sizes are not guaranteed to be atomic on any platform.

Although I'm not sure how well that applies to non-x86 platforms. I can update it to use the Interlocked* intrinsics that are available for many platforms. They're a bit hairy though since not all of them are available for all platforms, and it varies a little across MSVC versions as well.

mstorsjo updated this revision to Diff 109292.Aug 2 2017, 1:20 AM
mstorsjo retitled this revision from [compiler-rt] [builtins] Only include immintrin.h if it is used to [builtins] Use _Interlocked* intrinsics for atomics on MSVC.
mstorsjo edited the summary of this revision. (Show Details)
mstorsjo added inline comments.Aug 2 2017, 1:23 AM
lib/builtins/emutls.c
219 ↗(On Diff #109292)

X64 doesn't have Add64, but have got Or64_np. ARM (and according to MSVC 2017's intrin.h, ARM64 as well) have got Add64 though.

231 ↗(On Diff #109292)

_InterlockedExchangePointer would ideally work on both 32 and 64 bit, but in MSVC 2013 for X86, this fails due to some kludge in MSVC's intrin.h.

marsupial added inline comments.Aug 2 2017, 6:29 AM
lib/builtins/emutls.c
219 ↗(On Diff #109292)

The docs for these functions say it returns the original value.
Should be fine as 0 is added/ored, but a comment explaining the situation may be wise.

mstorsjo added inline comments.Aug 2 2017, 8:27 AM
lib/builtins/emutls.c
219 ↗(On Diff #109292)

Sure, I can add that. Does my approach look sensible otherwise? Do you think I should add inline comments about the gotchas in different versions as motivation for each of them?

marsupial added inline comments.Aug 2 2017, 12:55 PM
lib/builtins/emutls.c
219 ↗(On Diff #109292)

Overall seems proper/better. I did a quick check, and the previous load/store ops are significantly faster than the interlocked variants (which is possible why I chose them), but think it adds too much complexity and weirdness for expanding it to other architectures.

So just a simple comment about the or/add of zero is returning the original value that makes the operation equivalent to a pure load is enough for me.

I think the #ifdefs are enough/self-explanatory as to platform availability.

mstorsjo updated this revision to Diff 109487.Aug 2 2017, 11:12 PM
mstorsjo retitled this revision from [builtins] Use _Interlocked* intrinsics for atomics on MSVC to [builtins] Use Interlocked* intrinsics for atomics on MSVC.

Added the requested comment. Switched to the version of the intrinsics without a leading underscore, which made it work more consistently across MSVC versions, simplifying the code yet quite a bit.

This revision is now accepted and ready to land.Aug 3 2017, 10:15 AM
This revision was automatically updated to reflect the committed changes.