Diff Detail
- Repository: rC Clang
Event Timeline
`lib/CodeGen/CGBuiltin.cpp:3003`

Is the no fence version really equivalent to this seq_cst version here?
`lib/CodeGen/CGBuiltin.cpp:3003`

I don't see InterlockedCompareExchangePointer creating a fence, but it should, since it is the fence version. So maybe that needs to be fixed (and possibly other fence functions). InterlockedCompareExchangePointer_nf should not create a fence, so the current implementation seems correct.
`lib/CodeGen/CGBuiltin.cpp:3003`

"Create a fence" is a little confusing, because the AArch64 ldaxr/stlxr instructions have built-in fences... but presumably the "nf" version should use ldxr/stxr instead. At the IR level, that corresponds to "monotonic".
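At the source level, the distinction being discussed maps onto C11's memory orderings. The following sketch (function names are illustrative, not from the patch) shows the two compare-exchange flavors: on AArch64, a seq_cst CAS lowers to an ldaxr/stlxr loop, while a relaxed CAS ("monotonic" in LLVM IR) lowers to ldxr/stxr:

```c
#include <stdatomic.h>
#include <stdbool.h>

/* seq_cst compare-exchange: lowers to an ldaxr/stlxr loop on AArch64,
   corresponding to the plain _InterlockedCompareExchangePointer. */
bool cas_seq_cst(_Atomic(void *) *dst, void **expected, void *desired) {
    return atomic_compare_exchange_strong_explicit(
        dst, expected, desired,
        memory_order_seq_cst, memory_order_seq_cst);
}

/* Relaxed compare-exchange ("monotonic" in LLVM IR): lowers to an
   ldxr/stxr loop, corresponding to the _nf ("no fence") variant. */
bool cas_relaxed(_Atomic(void *) *dst, void **expected, void *desired) {
    return atomic_compare_exchange_strong_explicit(
        dst, expected, desired,
        memory_order_relaxed, memory_order_relaxed);
}
```

Both have identical success/failure semantics; only the ordering guarantees (and hence the instructions chosen) differ.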
`lib/CodeGen/CGBuiltin.cpp:3003`

Hm, if we're supposed to have fences, we're probably missing them everywhere. I thought the atomicrmw orderings were enough, but that shows what I know about the C++ memory model. =p

We don't generate a fence for this intrinsic when MSVC does:

```c
unsigned char test_arm(long *base, long idx) {
  return _interlockedbittestandreset_acq(base, idx);
}
```

@efriedma, is MSVC over-emitting extra fences, or are we under-emitting? If this is their "bug", should we be bug-for-bug compatible with it so we get the same observable memory states?
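For reference, the intrinsic in question behaves like an atomic fetch-and with acquire ordering. A hedged C11 sketch (the helper name is hypothetical, and masking the bit index to 31 mirrors MSVC's 32-bit `long`, an assumption here):

```c
#include <stdatomic.h>

/* Sketch of _interlockedbittestandreset_acq semantics: atomically clear
   bit `idx` of *base with acquire ordering, returning the old bit value.
   The `& 31` mirrors MSVC's 32-bit long (an assumption in this sketch). */
unsigned char bittestandreset_acquire(_Atomic long *base, long idx) {
    long mask = 1L << (idx & 31);
    long old = atomic_fetch_and_explicit(base, ~mask, memory_order_acquire);
    return (old & mask) != 0;
}
```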
MSVC seems to generate ldaxr/stlxr (as well as dmb) for InterlockedCompareExchangePointer and ldxr/stxr for InterlockedCompareExchangePointer_nf.
```c
void *test_InterlockedCompareExchangePointer(void * volatile *Destination,
                                             void *Exchange, void *Comparand) {
  return _InterlockedCompareExchangePointer(Destination, Exchange, Comparand);
}
```
```asm
test_InterlockedCompareExchangePointer:
      58: ff 83 00 d1   sub   sp, sp, #32
      5c: e0 0b 00 f9   str   x0, [sp, #16]
      60: e1 07 00 f9   str   x1, [sp, #8]
      64: e2 03 00 f9   str   x2, [sp]
      68: ec 03 40 f9   ldr   x12, [sp]
      6c: ec 03 0c aa   mov   x12, x12
      70: eb 07 40 f9   ldr   x11, [sp, #8]
      74: eb 03 0b aa   mov   x11, x11
      78: ea 0b 40 f9   ldr   x10, [sp, #16]
      7c: ea 03 0a aa   mov   x10, x10
$LN3:
      80: 49 fd 5f c8   ldaxr x9, [x10]
      84: e9 03 09 aa   mov   x9, x9
      88: 3f 01 0c eb   cmp   x9, x12
      8c: 81 00 00 54   b.ne  #16 <$LN4>
      90: 4b fd 08 c8   stlxr w8, x11, [x10]
      94: 1f 01 00 71   cmp   w8, #0
      98: 41 ff ff 54   b.ne  #-24 <$LN3>
$LN4:
      9c: bf 3b 03 d5   dmb   ish
      a0: e9 0f 00 f9   str   x9, [sp, #24]
      a4: e0 0f 40 f9   ldr   x0, [sp, #24]
      a8: ff 83 00 91   add   sp, sp, #32
      ac: c0 03 5f d6   ret
```

```c
void *test_InterlockedCompareExchangePointer_nf(void * volatile *Destination,
                                                void *Exchange, void *Comparand) {
  return _InterlockedCompareExchangePointer_nf(Destination, Exchange, Comparand);
}
```
```asm
test_InterlockedCompareExchangePointer_nf:
       0: ff 83 00 d1   sub   sp, sp, #32
       4: e0 0b 00 f9   str   x0, [sp, #16]
       8: e1 07 00 f9   str   x1, [sp, #8]
       c: e2 03 00 f9   str   x2, [sp]
      10: ec 03 40 f9   ldr   x12, [sp]
      14: ec 03 0c aa   mov   x12, x12
      18: eb 07 40 f9   ldr   x11, [sp, #8]
      1c: eb 03 0b aa   mov   x11, x11
      20: ea 0b 40 f9   ldr   x10, [sp, #16]
      24: ea 03 0a aa   mov   x10, x10
$LN3:
      28: 49 7d 5f c8   ldxr  x9, [x10]
      2c: e9 03 09 aa   mov   x9, x9
      30: 3f 01 0c eb   cmp   x9, x12
      34: 81 00 00 54   b.ne  #16 <$LN4>
      38: 4b 7d 08 c8   stxr  w8, x11, [x10]
      3c: 1f 01 00 71   cmp   w8, #0
      40: 41 ff ff 54   b.ne  #-24 <$LN3>
$LN4:
      44: e9 0f 00 f9   str   x9, [sp, #24]
      48: e0 0f 40 f9   ldr   x0, [sp, #24]
      4c: ff 83 00 91   add   sp, sp, #32
      50: c0 03 5f d6   ret
      54: 00 00 00 00   <unknown>
```

AArch64 pseudo expansion does not seem to honor the AtomicOrdering types (SequentiallyConsistent/Monotonic). It seems to always generate LDAXRX/STLXRX for a 64-bit Cmp_Xchg.
`lib/CodeGen/CGBuiltin.cpp:3003`

In terms of emitting fences, maybe we are actually missing a fence. An ldaxr+stlxr pair isn't a complete fence, at least in theory; it stops loads from being hoisted past it, and stores from being sunk past it, but stores can be hoisted and loads can be sunk.

That said, a stronger fence isn't necessary to model C++11 atomics; C++11 only requires sequential consistency with other sequentially consistent operations, plus an acquire/release barrier. And a program that isn't using C++11 relaxed atomics can't depend on a stronger barrier without triggering a data race.

A program written before C++11 atomics were generally available could theoretically depend on the barrier through a series of operations that cause a data race under the C++11 rules. Since there were no clear rules before C++11, programs did things like emulate relaxed atomics on top of volatile pointers; using such a construct, a program could depend on the implied barrier. For this reason, gcc emits a fence on AArch64 after __sync_* (but not __atomic_*). And I guess MSVC does something similar for _Interlocked*.

That said, LLVM has never emitted a dmb for __sync_* operations on AArch64, and it hasn't led to any practical problems as far as I know.
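The __sync_*/__atomic_* distinction above can be illustrated with the GCC/Clang builtins themselves (a sketch; both compilers provide these builtins, but the fence-emission behavior described is target-dependent):

```c
/* __sync_* builtins carry full-barrier semantics; as noted above, gcc on
   AArch64 emits a trailing dmb for them, while LLVM historically has not. */
long cas_sync(long *p, long oldv, long newv) {
    return __sync_val_compare_and_swap(p, oldv, newv);
}

/* __atomic_* builtins let the caller pick an explicit ordering; even
   seq_cst only promises C++11 semantics, not a standalone full fence. */
long cas_atomic(long *p, long oldv, long newv) {
    __atomic_compare_exchange_n(p, &oldv, newv, 0 /* strong */,
                                __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST);
    return oldv;  /* holds the value actually observed at *p */
}
```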
Yes, I was testing at -O0. At -O2 I can see monotonic ordering affect the choice of LD/ST insts used.
`lib/CodeGen/CGBuiltin.cpp:3003`

> Is the no fence version really equivalent to this seq_cst version here?

Looking over it again, this bit is a really bad description of what actually happens. Really there are two issues: one, an operation from after an ldaxr/stlxr pair could end up between the ldaxr and stlxr; and two, a cmpxchg doesn't always execute the stlxr.
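The second issue is visible at the C level: a failed compare-exchange performs only the load, so the store-release (the stlxr) on the success path never executes and its ordering effects are absent. This is exactly why C11 cmpxchg takes a separate, typically weaker, failure ordering. A minimal sketch (the function name is illustrative):

```c
#include <stdatomic.h>
#include <stdbool.h>

/* On failure, a compare-exchange performs only the load (ldaxr/ldxr);
   the store-release (stlxr) never executes, so no release ordering is
   provided on that path. Hence the distinct failure ordering below. */
bool cas_with_failure_ordering(_Atomic int *p, int *expected, int desired) {
    return atomic_compare_exchange_strong_explicit(
        p, expected, desired,
        memory_order_seq_cst,   /* success: full read-modify-write */
        memory_order_acquire);  /* failure: load only, no store occurred */
}
```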