This is an archive of the discontinued LLVM Phabricator instance.

lib/CodeGen/CGBuiltin.cpp
3003	I don't see InterlockedCompareExchangePointer creating a fence but it should. (since it is the fence version). So maybe that needs to be fixed (and possibly other fence functions). InterlockedCompareExchangePointer_nf should not create a fence so the current implementation seems correct.

efriedma added a subscriber: efriedma.Oct 3 2018, 12:03 PM

efriedma added inline comments.

lib/CodeGen/CGBuiltin.cpp
3003	"create a fence" is a little confusing because the AArch64 ldaxr/stlxr have builtin fences... but presumably the "nf" version should use ldxr/stxr instead. At the IR level, that corresponds to "monotonic".

rnk added inline comments.Oct 4 2018, 1:47 PM

lib/CodeGen/CGBuiltin.cpp
3003	Hm, if we're supposed to have fences, we're probably missing them everywhere. I thought the atomicrmw orderings were enough, but that shows what I know about the C++ memory model. =p We don't generate a fence for this intrinsic when MSVC does: unsigned char test_arm(long *base, long idx) { return _interlockedbittestandreset_acq(base, idx); } @efriedma, is MSVC over-emitting extra fences, or are we under-emitting? If this is their "bug", should we be bug-for-bug compatible with it so we get the same observable memory states?

MSVC seems to generate ldaxr/stlxr (as well as dmb) for InterlockedCompareExchangePointer and ldxr/stxr for InterlockedCompareExchangePointer_nf.

void *test_InterlockedCompareExchangePointer(void * volatile *Destination,
                                             void *Exchange, void *Comparand) {
  return _InterlockedCompareExchangePointer(Destination, Exchange, Comparand);
}

test_InterlockedCompareExchangePointer:
      58:       ff 83 00 d1     sub     sp, sp, #32
      5c:       e0 0b 00 f9     str     x0, [sp, #16]
      60:       e1 07 00 f9     str     x1, [sp, #8]
      64:       e2 03 00 f9     str     x2, [sp]
      68:       ec 03 40 f9     ldr     x12, [sp]
      6c:       ec 03 0c aa     mov     x12, x12
      70:       eb 07 40 f9     ldr     x11, [sp, #8]
      74:       eb 03 0b aa     mov     x11, x11
      78:       ea 0b 40 f9     ldr     x10, [sp, #16]
      7c:       ea 03 0a aa     mov     x10, x10

$LN3:
      80:       49 fd 5f c8     ldaxr   x9, [x10]
      84:       e9 03 09 aa     mov     x9, x9
      88:       3f 01 0c eb     cmp     x9, x12
      8c:       81 00 00 54     b.ne    #16 <$LN4>
      90:       4b fd 08 c8     stlxr   w8, x11, [x10]
      94:       1f 01 00 71     cmp     w8, #0
      98:       41 ff ff 54     b.ne    #-24 <$LN3>

$LN4:
      9c:       bf 3b 03 d5     dmb     ish
      a0:       e9 0f 00 f9     str     x9, [sp, #24]
      a4:       e0 0f 40 f9     ldr     x0, [sp, #24]
      a8:       ff 83 00 91     add     sp, sp, #32
      ac:       c0 03 5f d6     ret

void *test_InterlockedCompareExchangePointer_nf(void * volatile *Destination,
                                             void *Exchange, void *Comparand) {
  return _InterlockedCompareExchangePointer_nf(Destination, Exchange, Comparand);
}

test_InterlockedCompareExchangePointer_nf:
       0:       ff 83 00 d1     sub     sp, sp, #32
       4:       e0 0b 00 f9     str     x0, [sp, #16]
       8:       e1 07 00 f9     str     x1, [sp, #8]
       c:       e2 03 00 f9     str     x2, [sp]
      10:       ec 03 40 f9     ldr     x12, [sp]
      14:       ec 03 0c aa     mov     x12, x12
      18:       eb 07 40 f9     ldr     x11, [sp, #8]
      1c:       eb 03 0b aa     mov     x11, x11
      20:       ea 0b 40 f9     ldr     x10, [sp, #16]
      24:       ea 03 0a aa     mov     x10, x10

$LN3:
      28:       49 7d 5f c8     ldxr    x9, [x10]
      2c:       e9 03 09 aa     mov     x9, x9
      30:       3f 01 0c eb     cmp     x9, x12
      34:       81 00 00 54     b.ne    #16 <$LN4>
      38:       4b 7d 08 c8     stxr    w8, x11, [x10]
      3c:       1f 01 00 71     cmp     w8, #0
      40:       41 ff ff 54     b.ne    #-24 <$LN3>

$LN4:
      44:       e9 0f 00 f9     str     x9, [sp, #24]
      48:       e0 0f 40 f9     ldr     x0, [sp, #24]
      4c:       ff 83 00 91     add     sp, sp, #32
      50:       c0 03 5f d6     ret
      54:       00 00 00 00  <unknown>

AArch64 Psuedo Expansion does not seem to honor AtomicOrdering types (SequentiallyConsistent/Monotonic). It seems to always generate LDAXRX/STLXRX for 64-bit Cmp_Xchg.

How are you testing?

You might see different behavior at -O0.

efriedma added inline comments.Oct 4 2018, 3:27 PM

lib/CodeGen/CGBuiltin.cpp
3003	In terms of emitting fences, maybe we are actually missing a fence. An ldaxr+stlxr pair isn't a complete fence, at least in theory; it stops loads from being hosted past it, and stores from being sunk past it, but stores can be hoisted and loads can be sunk. That said, a stronger fence isn't necessary to model C++11 atomics; C++11 only requires sequential consistency with other sequentially consistent operations, plus an acquire/release barrier. And a program that isn't using C++11 relaxed atomics can't depend on a stronger barrier without triggering a data race. A program written before C++11 atomics were generally available could theoretically depend on the barrier through a series of operations that cause a data race under the C++11 rules. Since there were no clear rules before C++11, programs did things like emulate relaxed atomics on top of volatile pointers; using such a construct, a program could depend on the implied barrier. For this reason, gcc emits a fence on AArch64 after `__sync_` (but not `__atomic_`). And I guess MSVC does something similar for `_Interlocked`. That said, LLVM has never emitted a dmb for `__sync_` operations on AArch64, and it hasn't led to any practical problems as far as I know.

In D52807#1255835, @efriedma wrote:

How are you testing?

You might see different behavior at -O0.

Yes, I was testing at -O0. At -O2 I can see monotonic ordering affect the choice of LD/ST insts used.

LGTM

This revision is now accepted and ready to land.Oct 5 2018, 12:03 PM

Closed by commit rC343881: [COFF, ARM64] Add _InterlockedCompareExchangePointer_nf intrinsic (authored by mgrang). · Explain WhyOct 5 2018, 12:51 PM

This revision was automatically updated to reflect the committed changes.

efriedma added inline comments.Oct 5 2018, 12:52 PM

lib/CodeGen/CGBuiltin.cpp
3003	it stops loads from being hosted past it, and stores from being sunk past it, but stores can be hoisted and loads can be sunk. Looking over it again, this bit is a really bad description of what actually happens. Really there are two issues: one, an operation from after an ldaxr/stlxr pair could end up betwen the ldaxr and stlxr, and two, a cmpxchg doesn't always execute the stlxr.

Revision Contents

Path

Size

include/

clang/

Basic/

Builtins.def

1 line

lib/

CodeGen/

CGBuiltin.cpp

13 lines

test/

CodeGen/

ms-intrinsics.c

15 lines

Diff 168512

include/clang/Basic/Builtins.def

	Show First 20 Lines • Show All 780 Lines • ▼ Show 20 Lines
	LANGBUILTIN(_InterlockedAnd8, "ccD*c", "n", ALL_MS_LANGUAGES)			LANGBUILTIN(_InterlockedAnd8, "ccD*c", "n", ALL_MS_LANGUAGES)
	LANGBUILTIN(_InterlockedAnd16, "ssD*s", "n", ALL_MS_LANGUAGES)			LANGBUILTIN(_InterlockedAnd16, "ssD*s", "n", ALL_MS_LANGUAGES)
	LANGBUILTIN(_InterlockedAnd, "NiNiD*Ni", "n", ALL_MS_LANGUAGES)			LANGBUILTIN(_InterlockedAnd, "NiNiD*Ni", "n", ALL_MS_LANGUAGES)
	LANGBUILTIN(_InterlockedCompareExchange8, "ccD*cc", "n", ALL_MS_LANGUAGES)			LANGBUILTIN(_InterlockedCompareExchange8, "ccD*cc", "n", ALL_MS_LANGUAGES)
	LANGBUILTIN(_InterlockedCompareExchange16, "ssD*ss", "n", ALL_MS_LANGUAGES)			LANGBUILTIN(_InterlockedCompareExchange16, "ssD*ss", "n", ALL_MS_LANGUAGES)
	LANGBUILTIN(_InterlockedCompareExchange, "NiNiD*NiNi", "n", ALL_MS_LANGUAGES)			LANGBUILTIN(_InterlockedCompareExchange, "NiNiD*NiNi", "n", ALL_MS_LANGUAGES)
	LANGBUILTIN(_InterlockedCompareExchange64, "LLiLLiD*LLiLLi", "n", ALL_MS_LANGUAGES)			LANGBUILTIN(_InterlockedCompareExchange64, "LLiLLiD*LLiLLi", "n", ALL_MS_LANGUAGES)
	LANGBUILTIN(_InterlockedCompareExchangePointer, "vvDvv*", "n", ALL_MS_LANGUAGES)			LANGBUILTIN(_InterlockedCompareExchangePointer, "vvDvv*", "n", ALL_MS_LANGUAGES)
				LANGBUILTIN(_InterlockedCompareExchangePointer_nf, "vvDvv*", "n", ALL_MS_LANGUAGES)
	LANGBUILTIN(_InterlockedDecrement16, "ssD*", "n", ALL_MS_LANGUAGES)			LANGBUILTIN(_InterlockedDecrement16, "ssD*", "n", ALL_MS_LANGUAGES)
	LANGBUILTIN(_InterlockedDecrement, "NiNiD*", "n", ALL_MS_LANGUAGES)			LANGBUILTIN(_InterlockedDecrement, "NiNiD*", "n", ALL_MS_LANGUAGES)
	LANGBUILTIN(_InterlockedExchange, "NiNiD*Ni", "n", ALL_MS_LANGUAGES)			LANGBUILTIN(_InterlockedExchange, "NiNiD*Ni", "n", ALL_MS_LANGUAGES)
	LANGBUILTIN(_InterlockedExchange8, "ccD*c", "n", ALL_MS_LANGUAGES)			LANGBUILTIN(_InterlockedExchange8, "ccD*c", "n", ALL_MS_LANGUAGES)
	LANGBUILTIN(_InterlockedExchange16, "ssD*s", "n", ALL_MS_LANGUAGES)			LANGBUILTIN(_InterlockedExchange16, "ssD*s", "n", ALL_MS_LANGUAGES)
	LANGBUILTIN(_InterlockedExchangeAdd8, "ccD*c", "n", ALL_MS_LANGUAGES)			LANGBUILTIN(_InterlockedExchangeAdd8, "ccD*c", "n", ALL_MS_LANGUAGES)
	LANGBUILTIN(_InterlockedExchangeAdd16, "ssD*s", "n", ALL_MS_LANGUAGES)			LANGBUILTIN(_InterlockedExchangeAdd16, "ssD*s", "n", ALL_MS_LANGUAGES)
	LANGBUILTIN(_InterlockedExchangeAdd, "NiNiD*Ni", "n", ALL_MS_LANGUAGES)			LANGBUILTIN(_InterlockedExchangeAdd, "NiNiD*Ni", "n", ALL_MS_LANGUAGES)
	▲ Show 20 Lines • Show All 723 Lines • Show Last 20 Lines

lib/CodeGen/CGBuiltin.cpp

This file is larger than 256 KB, so syntax highlighting is disabled by default.

Show First 20 Lines • Show All 2,993 Lines • ▼ Show 20 Lines	return EmitCall(Call->getCallee()->getType(),
EmitScalarExpr(Chain));		EmitScalarExpr(Chain));
}		}
case Builtin::BI_InterlockedExchange8:		case Builtin::BI_InterlockedExchange8:
case Builtin::BI_InterlockedExchange16:		case Builtin::BI_InterlockedExchange16:
case Builtin::BI_InterlockedExchange:		case Builtin::BI_InterlockedExchange:
case Builtin::BI_InterlockedExchangePointer:		case Builtin::BI_InterlockedExchangePointer:
return RValue::get(		return RValue::get(
EmitMSVCBuiltinExpr(MSVCIntrin::_InterlockedExchange, E));		EmitMSVCBuiltinExpr(MSVCIntrin::_InterlockedExchange, E));
case Builtin::BI_InterlockedCompareExchangePointer: {		case Builtin::BI_InterlockedCompareExchangePointer:
		case Builtin::BI_InterlockedCompareExchangePointer_nf: {
		rnkUnsubmitted Not Done Reply Inline Actions Is the no fence version really equivalent to this seq_cst version here? rnk: Is the no fence version really equivalent to this seq_cst version here?
		mgrangAuthorUnsubmitted Not Done Reply Inline Actions I don't see InterlockedCompareExchangePointer creating a fence but it should. (since it is the fence version). So maybe that needs to be fixed (and possibly other fence functions). InterlockedCompareExchangePointer_nf should not create a fence so the current implementation seems correct. mgrang: I don't see InterlockedCompareExchangePointer creating a fence but it should. (since it is the…
		rnkUnsubmitted Not Done Reply Inline Actions Hm, if we're supposed to have fences, we're probably missing them everywhere. I thought the atomicrmw orderings were enough, but that shows what I know about the C++ memory model. =p We don't generate a fence for this intrinsic when MSVC does: unsigned char test_arm(long base, long idx) { return _interlockedbittestandreset_acq(base, idx); } @efriedma, is MSVC over-emitting extra fences, or are we under-emitting? If this is their "bug", should we be bug-for-bug compatible with it so we get the same observable memory states? rnk:* Hm, if we're supposed to have fences, we're probably missing them everywhere. I thought the…
		efriedmaUnsubmitted Not Done Reply Inline Actions "create a fence" is a little confusing because the AArch64 ldaxr/stlxr have builtin fences... but presumably the "nf" version should use ldxr/stxr instead. At the IR level, that corresponds to "monotonic". efriedma: "create a fence" is a little confusing because the AArch64 ldaxr/stlxr have builtin fences...
		efriedmaUnsubmitted Not Done Reply Inline Actions In terms of emitting fences, maybe we are actually missing a fence. An ldaxr+stlxr pair isn't a complete fence, at least in theory; it stops loads from being hosted past it, and stores from being sunk past it, but stores can be hoisted and loads can be sunk. That said, a stronger fence isn't necessary to model C++11 atomics; C++11 only requires sequential consistency with other sequentially consistent operations, plus an acquire/release barrier. And a program that isn't using C++11 relaxed atomics can't depend on a stronger barrier without triggering a data race. A program written before C++11 atomics were generally available could theoretically depend on the barrier through a series of operations that cause a data race under the C++11 rules. Since there were no clear rules before C++11, programs did things like emulate relaxed atomics on top of volatile pointers; using such a construct, a program could depend on the implied barrier. For this reason, gcc emits a fence on AArch64 after `__sync_` (but not `__atomic_`). And I guess MSVC does something similar for `_Interlocked`. That said, LLVM has never emitted a dmb for `__sync_` operations on AArch64, and it hasn't led to any practical problems as far as I know. efriedma: In terms of emitting fences, maybe we are actually missing a fence. An ldaxr+stlxr pair isn't…
		efriedmaUnsubmitted Not Done Reply Inline Actions it stops loads from being hosted past it, and stores from being sunk past it, but stores can be hoisted and loads can be sunk. Looking over it again, this bit is a really bad description of what actually happens. Really there are two issues: one, an operation from after an ldaxr/stlxr pair could end up betwen the ldaxr and stlxr, and two, a cmpxchg doesn't always execute the stlxr. efriedma: > it stops loads from being hosted past it, and stores from being sunk past it, but stores can…
llvm::Type *RTy;		llvm::Type *RTy;
llvm::IntegerType *IntType =		llvm::IntegerType *IntType =
IntegerType::get(getLLVMContext(),		IntegerType::get(getLLVMContext(),
getContext().getTypeSize(E->getType()));		getContext().getTypeSize(E->getType()));
llvm::Type *IntPtrType = IntType->getPointerTo();		llvm::Type *IntPtrType = IntType->getPointerTo();

llvm::Value *Destination =		llvm::Value *Destination =
Builder.CreateBitCast(EmitScalarExpr(E->getArg(0)), IntPtrType);		Builder.CreateBitCast(EmitScalarExpr(E->getArg(0)), IntPtrType);

llvm::Value *Exchange = EmitScalarExpr(E->getArg(1));		llvm::Value *Exchange = EmitScalarExpr(E->getArg(1));
RTy = Exchange->getType();		RTy = Exchange->getType();
Exchange = Builder.CreatePtrToInt(Exchange, IntType);		Exchange = Builder.CreatePtrToInt(Exchange, IntType);

llvm::Value *Comparand =		llvm::Value *Comparand =
Builder.CreatePtrToInt(EmitScalarExpr(E->getArg(2)), IntType);		Builder.CreatePtrToInt(EmitScalarExpr(E->getArg(2)), IntType);

auto Result =		auto Ordering =
Builder.CreateAtomicCmpXchg(Destination, Comparand, Exchange,		BuiltinID == Builtin::BI_InterlockedCompareExchangePointer_nf ?
AtomicOrdering::SequentiallyConsistent,		AtomicOrdering::Monotonic : AtomicOrdering::SequentiallyConsistent;
AtomicOrdering::SequentiallyConsistent);
		auto Result = Builder.CreateAtomicCmpXchg(Destination, Comparand, Exchange,
		Ordering, Ordering);
Result->setVolatile(true);		Result->setVolatile(true);

return RValue::get(Builder.CreateIntToPtr(Builder.CreateExtractValue(Result,		return RValue::get(Builder.CreateIntToPtr(Builder.CreateExtractValue(Result,
0),		0),
RTy));		RTy));
}		}
case Builtin::BI_InterlockedCompareExchange8:		case Builtin::BI_InterlockedCompareExchange8:
case Builtin::BI_InterlockedCompareExchange16:		case Builtin::BI_InterlockedCompareExchange16:
▲ Show 20 Lines • Show All 9,754 Lines • Show Last 20 Lines

test/CodeGen/ms-intrinsics.c

	Show First 20 Lines • Show All 229 Lines • ▼ Show 20 Lines
	// CHECK: %[[EXCHANGE:[0-9]+]] = ptrtoint i8* %Exchange to [[iPTR]]			// CHECK: %[[EXCHANGE:[0-9]+]] = ptrtoint i8* %Exchange to [[iPTR]]
	// CHECK: %[[COMPARAND:[0-9]+]] = ptrtoint i8* %Comparand to [[iPTR]]			// CHECK: %[[COMPARAND:[0-9]+]] = ptrtoint i8* %Comparand to [[iPTR]]
	// CHECK: %[[XCHG:[0-9]+]] = cmpxchg volatile [[iPTR]]* %[[DEST:[0-9]+]], [[iPTR]] %[[COMPARAND:[0-9]+]], [[iPTR]] %[[EXCHANGE:[0-9]+]] seq_cst seq_cst			// CHECK: %[[XCHG:[0-9]+]] = cmpxchg volatile [[iPTR]]* %[[DEST:[0-9]+]], [[iPTR]] %[[COMPARAND:[0-9]+]], [[iPTR]] %[[EXCHANGE:[0-9]+]] seq_cst seq_cst
	// CHECK: %[[EXTRACT:[0-9]+]] = extractvalue { [[iPTR]], i1 } %[[XCHG]], 0			// CHECK: %[[EXTRACT:[0-9]+]] = extractvalue { [[iPTR]], i1 } %[[XCHG]], 0
	// CHECK: %[[RESULT:[0-9]+]] = inttoptr [[iPTR]] %[[EXTRACT]] to i8*			// CHECK: %[[RESULT:[0-9]+]] = inttoptr [[iPTR]] %[[EXTRACT]] to i8*
	// CHECK: ret i8* %[[RESULT:[0-9]+]]			// CHECK: ret i8* %[[RESULT:[0-9]+]]
	// CHECK: }			// CHECK: }

				void test_InterlockedCompareExchangePointer_nf(void volatile *Destination,
				void Exchange, void Comparand) {
				return _InterlockedCompareExchangePointer_nf(Destination, Exchange, Comparand);
				}

				// CHECK: define{{.}}i8 @test_InterlockedCompareExchangePointer_nf(i8** {{[a-z_ ]}}%Destination, i8 {{[a-z_ ]}}%Exchange, i8 {{[a-z_ ]}}%Comparand){{.}}{
				// CHECK: %[[DEST:[0-9]+]] = bitcast i8** %Destination to [[iPTR]]*
				// CHECK: %[[EXCHANGE:[0-9]+]] = ptrtoint i8* %Exchange to [[iPTR]]
				// CHECK: %[[COMPARAND:[0-9]+]] = ptrtoint i8* %Comparand to [[iPTR]]
				// CHECK: %[[XCHG:[0-9]+]] = cmpxchg volatile [[iPTR]]* %[[DEST:[0-9]+]], [[iPTR]] %[[COMPARAND:[0-9]+]], [[iPTR]] %[[EXCHANGE:[0-9]+]] monotonic monotonic
				// CHECK: %[[EXTRACT:[0-9]+]] = extractvalue { [[iPTR]], i1 } %[[XCHG]], 0
				// CHECK: %[[RESULT:[0-9]+]] = inttoptr [[iPTR]] %[[EXTRACT]] to i8*
				// CHECK: ret i8* %[[RESULT:[0-9]+]]
				// CHECK: }

	char test_InterlockedExchange8(char volatile *value, char mask) {			char test_InterlockedExchange8(char volatile *value, char mask) {
	return _InterlockedExchange8(value, mask);			return _InterlockedExchange8(value, mask);
	}			}
	// CHECK: define{{.}}i8 @test_InterlockedExchange8(i8{{[a-z_ ]}}%value, i8{{[a-z_ ]}}%mask){{.*}}{			// CHECK: define{{.}}i8 @test_InterlockedExchange8(i8{{[a-z_ ]}}%value, i8{{[a-z_ ]}}%mask){{.*}}{
	// CHECK: [[RESULT:%[0-9]+]] = atomicrmw xchg i8* %value, i8 %mask seq_cst			// CHECK: [[RESULT:%[0-9]+]] = atomicrmw xchg i8* %value, i8 %mask seq_cst
	// CHECK: ret i8 [[RESULT:%[0-9]+]]			// CHECK: ret i8 [[RESULT:%[0-9]+]]
	// CHECK: }			// CHECK: }

	▲ Show 20 Lines • Show All 366 Lines • Show Last 20 Lines

This is an archive of the discontinued LLVM Phabricator instance.

[COFF, ARM64] Add _InterlockedCompareExchangePointer_nf intrinsicClosedPublic

Details

Diff Detail

Event Timeline

Revision Contents

Diff 168512

include/clang/Basic/Builtins.def

lib/CodeGen/CGBuiltin.cpp

test/CodeGen/ms-intrinsics.c

[COFF, ARM64] Add _InterlockedCompareExchangePointer_nf intrinsic
ClosedPublic