This is an archive of the discontinued LLVM Phabricator instance.


Differential D45321

[atomics] Fix runtime calls for misaligned atomics
ClosedPublic

Authored by t.p.northover on Apr 5 2018, 6:36 AM.


Details

Reviewers

compnerd
efriedma
javed.absar
jfb

Summary

The __atomic_whatever functions in compiler-rt were only looking at the size of their argument when deciding whether the implementation should be lock-free. This is incorrect:

On x86 simple loads and stores are not atomic if the address is misaligned. They could use cmpxchg to implement loads & stores instead, but I believe this would be incompatible with GCC's ABI (and it is an ABI choice because you obviously can't mix them).
On other platforms (ARM that I know of) misaligned atomic accesses will simply fault (as well as the above).

So this patch falls back to locks for the misaligned case. It also adds the single-byte case to the generic functions since even though Clang won't make calls to them, they should probably work properly if anyone does.
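As a rough illustration of the fallback described above, here is a minimal sketch; it is not the patch itself. The names `sketch_lock`, `sketch_acquire`, `sketch_release` and `sketch_load_4` are hypothetical, and a single global spinlock stands in for the hashed lock table that atomic.c uses.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical single spinlock; atomic.c hashes the address into a lock table. */
static int sketch_lock;

static void sketch_acquire(void) {
  /* Spin until we are the one who flipped the flag from 0 to 1. */
  while (__atomic_exchange_n(&sketch_lock, 1, __ATOMIC_ACQUIRE))
    ;
}

static void sketch_release(void) {
  __atomic_store_n(&sketch_lock, 0, __ATOMIC_RELEASE);
}

/* Load 4 bytes from src into *out.
 * Returns 1 if the lock-free path was taken, 0 if the lock was used. */
static int sketch_load_4(void *src, uint32_t *out) {
  if ((uintptr_t)src % sizeof(uint32_t) == 0) {
    /* Suitably aligned: a plain atomic load is safe. */
    *out = __atomic_load_n((uint32_t *)src, __ATOMIC_SEQ_CST);
    return 1;
  }
  /* Misaligned: copy byte-wise while holding the lock. */
  sketch_acquire();
  memcpy(out, src, sizeof *out);
  sketch_release();
  return 0;
}
```

The locked path is only atomic with respect to other accesses that take the same lock, which is why every operation on a misaligned address has to agree on the fallback.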

Diff Detail

Repository: rCRT Compiler Runtime

Event Timeline

t.p.northover created this revision. Apr 5 2018, 6:36 AM


How can atomic objects end up being misaligned? Isn't it UB already?

Atomics accessed via C11 _Atomic and C++11 std::atomic will be suitably aligned, but there's a reasonable amount of legacy code that uses the GCC builtins on non-atomic types (of unknown alignment) and this is what Clang uses to implement those accesses when they come up. Also, on the LLVM side there are even fewer restrictions and load atomic i32, i32* %ptr monotonic, align 1 is perfectly valid IR that gets lowered to these calls.

In parallel I'm trying to add a performance warning for when Clang hits this issue and is forced to generate libcalls.

> Atomics accessed via C11 _Atomic and C++11 std::atomic will be suitably aligned, but there's a reasonable amount of legacy code that uses the GCC builtins on non-atomic types (of unknown alignment) and this is what Clang uses to implement those accesses when they come up.

This is illegal, right?
Even non-atomic accesses can fail and be miscompiled for unaligned variables, no?
For example, I would assume that "(uintptr_t)ptr % 4 == 0" where ptr comes from an unmarked int will be evaluated to a compile-time true.

> Also, on the LLVM side there are even fewer restrictions and load atomic i32, i32* %ptr monotonic, align 1 is perfectly valid IR that gets lowered to these calls.

I've just checked the LangRef and realised this is false. LLVM does make such loads & stores UB, though I still think the Clang use is valid (if unfortunate).

> Even non-atomic accesses can fail and be miscompiled for unaligned variables, no?
>
> For example, I would assume that "(uintptr_t)ptr % 4 == 0" where ptr comes from an unmarked int will be evaluated to a compile-time true.

For an unmarked int that's true, but there are ways to get integer types with lowered alignment requirements. Mostly revolving around applying some kind of packed attribute or pragma to either a struct or the type directly.
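For illustration, a packed struct is one way to end up with such an integer. This is a small sketch with a hypothetical `struct Packet`; it assumes the GCC/Clang `packed` attribute, and it only inspects the member's offset without performing any atomic access.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical packed layout: 'counter' sits at byte offset 1. */
struct __attribute__((packed)) Packet {
  uint8_t tag;
  uint32_t counter;
};

/* Returns 1 when the member's offset breaks uint32_t's natural alignment. */
static int sketch_counter_is_misaligned(void) {
  return offsetof(struct Packet, counter) % _Alignof(uint32_t) != 0;
}
```

Because the type itself carries the lowered alignment, the compiler cannot assume that a pointer to `counter` is 4-byte aligned.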

> I've just checked the LangRef and realised this is false. LLVM does make such loads & stores UB, though I still think the Clang use is valid (if unfortunate).

But won't Clang lower these accesses to exactly the IR that LLVM declares as UB?

> But won't Clang lower these accesses to exactly the IR that LLVM declares as UB?

No, it explicitly checks alignment to decide whether to make a libcall itself.

Also, the LLVM situation is even more nuanced: after r320243 I think the LangRef is out of date and misaligned ones can't be called UB; fortunately, it also seems to do the right thing.

> No, it explicitly checks alignment to decide whether to make a libcall itself.

Ack. Thanks for explaining.

Yes, the situation around atomic instructions in IR is a bit of a mess. Originally, the backend was required to reject instructions which wouldn't be lock-free, and all the documentation was written with that expectation. But that later changed to generating calls to __atomic_* instead, and I guess the documentation was never completely updated.

clang might also have a bug here? I think I've seen it generate calls to __atomic_load_8 with a misaligned operand.

compiler-rt/lib/builtins/atomic.c
177–178	We also need to fix this FIXME, for correctness... if we have a 16-byte atomic store implementation, we need to use it.

t.p.northover added inline comments. Apr 6 2018, 9:25 AM

compiler-rt/lib/builtins/atomic.c
177–178	Any chance I could skip that for now? It's an order of magnitude harder than fixing the misalignment problems since LLVM already generates a mixture of libcalls and cmpxchg based on the CPU for x86, which forces this to be a runtime CPUID check.
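To give a sense of what a runtime check might involve, here is a minimal sketch under stated assumptions; it is not part of the patch. The function name `sketch_has_cmpxchg16b` is hypothetical. It relies on the `__get_cpuid` helper from the `<cpuid.h>` header shipped with GCC and Clang, and on CPUID leaf 1 reporting CMPXCHG16B in ECX bit 13.

```c
#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>
#endif

/* Returns 1 if CPUID reports CMPXCHG16B, 0 otherwise (including non-x86). */
static int sketch_has_cmpxchg16b(void) {
#if defined(__x86_64__) || defined(__i386__)
  unsigned int eax, ebx, ecx, edx;
  /* __get_cpuid returns 0 when the requested leaf is unsupported. */
  if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
    return 0;
  return (ecx >> 13) & 1; /* CPUID.01H:ECX bit 13 = CMPXCHG16B */
#else
  return 0;
#endif
}
```

A real implementation would also need to cache the result and make the same decision everywhere, since lock-free and locked accesses to the same address cannot be mixed.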

efriedma added inline comments. Apr 6 2018, 10:59 AM

compiler-rt/lib/builtins/atomic.c
177–178	Can you at least fix it for targets where 16-byte atomics are always lock-free, like aarch64? (Of course I don't expect you to implement the x86 cpuid bits.)

t.p.northover added inline comments. Apr 6 2018, 11:15 AM

compiler-rt/lib/builtins/atomic.c
177–178	Sounds like a reasonable compromise. This check is easy enough to get right (I can use `defined(__SIZEOF_INT128__)` as a proxy for whether it's supported). I don't suppose you have any ideas about the `IS_LOCK_FREE_16` check though? I had a bit of a think earlier on and it's harder than it looks. The best I've come up with so far is: #define IS_LOCK_FREE_16(ptr) (__builtin_constant_p(__c11_atomic_is_lock_free(16)) && __c11_atomic_is_lock_free(16) && (uintptr_t)ptr % 16 == 0) The constant check is there because otherwise we get a realised (and unresolved) call to `__atomic_is_lock_free(16, 0)` when Clang doesn't know -- and on x86 this would, of course, be CPUID based if implemented so I can't in good conscience make Clang decide it's 0.

__atomic_always_lock_free(16,0)?

Updated so that 16-byte atomics work when the instruction is always available. Good idea Eli.
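A minimal sketch of the check that results from this suggestion, written as a function rather than the macro used in the patch; `sketch_is_lock_free_16` is a hypothetical name. The point is that `__atomic_always_lock_free` is answered by the compiler at build time, so no `__atomic_is_lock_free` libcall is left unresolved.

```c
#include <stdint.h>

/* 16-byte accesses take the lock-free path only when the target always
 * supports them AND the pointer is 16-byte aligned. */
static int sketch_is_lock_free_16(const void *ptr) {
  return __atomic_always_lock_free(16, 0) && (uintptr_t)ptr % 16 == 0;
}
```

On targets where 16-byte lock-freedom depends on the CPU model, the builtin is expected to evaluate to 0, so those targets keep using locks.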

LGTM, but maybe wait a couple days to see if @compnerd wants to comment.

This revision is now accepted and ready to land. Apr 9 2018, 12:15 PM

Sorry to update the patch after you've reviewed it, but I'm afraid as a result of https://llvm.org/PR34347 it's turned out that misaligned x86 atomics actually need the lock-free implementation, which makes this even messier than I thought.

It adds 3 problems:

1. As far as I know there's no way Clang communicates this, in fact I don't think it's even aware of it. So the updated patch has bare x86_64 and i386 checks.
2. There is no way to make Clang inline a misaligned atomic operation that I could find, and even if it could the LLVM IR would be dubious (the LangRef is sprinkled with warnings about such things being undefined and they get converted back to libcalls anyway).
3. x86 movs aren't atomic when misaligned, so load & store need a separate path for that case.

Because of the second, misaligned pointers in the new patch go down an aligned codepath with all the technical UB implications you might fear. It works, though. As far as I can tell this fixes everything except 128-bit x86 atomics in practice.

t.p.northover updated this revision to Diff 142366. Apr 13 2018, 2:45 AM

The Linux libatomic __atomic_is_lock_free returns false for unaligned pointers, even on x86. clang must generate code which is compatible with that, so it *cannot* inline misaligned atomic operations. Given that clang can't inline misaligned atomic operations anyway, I don't see any compelling reason for compiler-rt to try to lower misaligned atomics using lock-free operations.

Not that it's really relevant, but for the testcase in https://bugzilla.redhat.com/show_bug.cgi?id=1565766#c4 , clang should assume the pointer is aligned: it's undefined behavior to produce an int* which isn't 4-byte aligned.

> The Linux libatomic __atomic_is_lock_free returns false for unaligned pointers, even on x86. clang must generate code which is compatible with that, so it *cannot* inline misaligned atomic operations.

I agree we need to be compatible, but I don't see that:

#include <stdlib.h>
#include <stdio.h>

int main() {
  char *mem = malloc(16);

  for (int i = 1; i <= 64; i <<= 1)
    printf("__atomic_is_lock_free(%d, %p) = %d\n", i, (void *)(mem+1), __atomic_is_lock_free(i, mem+1));
  free(mem);
}

tells me that everything <= 4 is lock-free on x86_64 Linux (I have no idea why 4). GCC also does inline them (and I see no special misaligned libatomic implementation except for __atomic_load and __atomic_store which take a lock).

Basically, I think GCC is in as bad shape as we are but in practice lock-free implementations are what gets emitted. At least we can change our implementation in the future since we do emit a libcall for misaligned operations.

> I don't see any compelling reason for compiler-rt to try to lower misaligned atomics using lock-free operations.

My best argument is still that I think it's the maximally compatible way to proceed. It is ugly though.


efriedma added inline comments. May 16 2018, 3:50 PM

compiler-rt/lib/builtins/atomic.c
193	Isn't the return value of `__c11_atomic_compare_exchange_weak` the success boolean? Even with that fixed, though, this is UB because you're violating the alignment rules for `_Atomic`. It might appear to emit the right code for now, but it's a ticking time bomb because the compiler could optimize the "cmpxchg" to a "mov" (since you're not actually modifying the memory on success). The "right" way to do this is to use `__atomic_load` on a pointer to an unaligned type. Granted, clang currently doesn't lower that to the sequence you want (instead it generates a libcall to `__atomic_load_4`).


efriedma added inline comments. May 16 2018, 3:58 PM

compiler-rt/lib/builtins/atomic.c
193	(Or, of course, you could implement this with inline assembly, if you can't get the compiler to emit the sequence you want.)

Sorry I take so long replying to this each time. I really need to plan my time better.

compiler-rt/lib/builtins/atomic.c
193	> Isn't the return value of __c11_atomic_compare_exchange_weak the success boolean?

Oops, yes. I'll fix that.

> Even with that fixed, though, this is UB because you're violating the alignment rules for _Atomic.

Oh yes, it's definitely really dodgy. As part of the implementation I think compiler-rt gets some kind of latitude to know about Clang's implementation details there though. It's impossible to implement std::vector without UB for example, but libc++ keeps chugging along anyway.

> It might appear to emit the right code for now, but it's a ticking time bomb because the compiler could optimize the "cmpxchg" to a "mov" (since you're not actually modifying the memory on success).

I don't think it could (ignoring the whole UB => nasal demons thing). The mov would not do an atomic load, which would render the return value potentially invalid. A cmpxchg still needs to return a valid previous value even if memory is not modified. It could potentially modify it to an atomic load and then back into a call to `__atomic_load_4` though (assuming it inserted the correct barriers, which might not be possible for a cmpxchg release).

> The "right" way to do this is to use `__atomic_load` on a pointer to an unaligned type. Granted, clang currently doesn't lower that to the sequence you want (instead it generates a libcall to `__atomic_load_4`).

Yes. And even if there were a way to get Clang to generate the desired IR, LLVM doesn't do the right thing with it either (it calls `__atomic_load`). Inline assembly is an option, but even for just x86 it's significantly uglier because of the whole amd64 vs x86, cmpxchg16b thing.

Fixing cmpxchg implementation of load to return correct value.
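The point about the return value can be shown with a minimal sketch of a load built on compare-exchange. It uses the GCC-style `__atomic_compare_exchange_n` builtin on a properly aligned `uint32_t`, so it sidesteps the alignment UB discussed in this thread; `sketch_load_via_cmpxchg` is a hypothetical name, not code from the patch.

```c
#include <stdint.h>

/* Atomically read *p using compare-exchange instead of a plain load.
 * The builtin returns a success flag, NOT the loaded value; the value
 * observed in memory ends up in 'expected'. */
static uint32_t sketch_load_via_cmpxchg(uint32_t *p) {
  uint32_t expected = 0;
  /* If *p == 0 the exchange succeeds and rewrites 0 (no visible change).
   * Otherwise it fails and copies the current *p into 'expected'. */
  __atomic_compare_exchange_n(p, &expected, 0, /*weak=*/0,
                              __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST);
  return expected;
}
```

Either way `expected` holds the value that was in memory, which is why the boolean result has to be ignored here.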

Did a bit more research into the GNU libatomic behavior. As far as I can tell, my initial assessment was correct: libatomic never uses unaligned atomic operations. However, it uses a trick which makes this a little complicated to test: it promotes small atomic operations to pointer size before performing the alignment check. So, for example, libatomic supports lock-free operations with size=3, depending on the input pointer. (On x86, it still only promotes up to pointer size, even though larger lock-free atomics are available; not sure why.)
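One possible reading of the promotion trick, sketched as a hypothetical helper (this is an illustration of the idea, not libatomic's code): an access counts as lock-free when it lies entirely inside one naturally aligned pointer-sized word, whatever its own alignment.

```c
#include <stddef.h>
#include <stdint.h>

/* Returns 1 if [addr, addr + size) fits inside a single aligned,
 * pointer-sized word, so a word-wide atomic operation could cover it. */
static int sketch_fits_in_aligned_word(uintptr_t addr, size_t size) {
  size_t word = sizeof(void *);
  if (size == 0 || size > word)
    return 0;
  /* Offset of the access within its enclosing aligned word. */
  size_t offset = (size_t)(addr % word);
  return offset + size <= word;
}
```

Under this reading, with 8-byte pointers an access at offset 1 fits for sizes up to 4 but not for size 8, which would be consistent with the "everything <= 4" result reported for `mem+1` earlier in the thread.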

compiler-rt/lib/builtins/atomic.c
193	> The mov would not do an atomic load, which would render the return value potentially invalid. A cmpxchg still needs to return a valid previous value even if memory is not modified.

The normal lowering for an atomic load from an aligned pointer on x86 is "mov"; not sure what you're getting at here.

> even for just x86 it's significantly uglier because of the whole amd64 vs x86, cmpxchg16b thing.

x86 doesn't support unaligned 16-byte atomic operations.

Ping @t.p.northover and @compnerd


efriedma mentioned this in D86510: [compiler-rt] Fix atomic support functions on 32-bit architectures. Aug 25 2020, 4:00 PM

tstellar added a subscriber: tstellar. Jul 15 2021, 10:04 PM


lkail added a subscriber: lkail. Feb 8 2022, 6:22 PM

Done by D86510


Revision Contents

Path: compiler-rt/lib/builtins/atomic.c
Size: 95 lines

Diff 151698

compiler-rt/lib/builtins/atomic.c

...	static __inline Lock *lock_for_pointer(void *ptr) {
// Now use the high(er) set of bits to perturb the hash, so that we don't		// Now use the high(er) set of bits to perturb the hash, so that we don't
// get collisions from atomic fields in a single object		// get collisions from atomic fields in a single object
hash >>= 16;		hash >>= 16;
hash ^= low;		hash ^= low;
// Return a pointer to the word to use		// Return a pointer to the word to use
return locks + (hash & SPINLOCK_MASK);		return locks + (hash & SPINLOCK_MASK);
}		}

/// Macros for determining whether a size is lock free. Clang can not yet		// x86 guarantees all operations except load and store will be atomic on
/// codegen __atomic_is_lock_free(16), so for now we assume 16-byte values are		// misaligned pointers too (with possible performance penalties), and that's
/// not lock free.		// baked into the ABI.
#define IS_LOCK_FREE_1 __c11_atomic_is_lock_free(1)		#if defined(__x86_64__) || defined(__i386__)
#define IS_LOCK_FREE_2 __c11_atomic_is_lock_free(2)		#define LOCK_FREE_MISALIGNED_ATOMICS 1
#define IS_LOCK_FREE_4 __c11_atomic_is_lock_free(4)		#else
#define IS_LOCK_FREE_8 __c11_atomic_is_lock_free(8)		#define LOCK_FREE_MISALIGNED_ATOMICS 0
#define IS_LOCK_FREE_16 0		#endif

		/// Macros for determining whether a size is lock free.
		/// __c11_atomic_lock_free(16) is sometimes a dynamic property (e.g. not all
		/// x86_64 CPUs have cmpxchg16) but compiler-rt does not implement the required
		/// libcall yet so we use an approximation valid on all CPUs where it is a
		/// static property.
		#define IS_LOCK_FREE_1(ptr) __c11_atomic_is_lock_free(1)
		#define IS_LOCK_FREE_2(ptr) (__c11_atomic_is_lock_free(2) && (LOCK_FREE_MISALIGNED_ATOMICS || (uintptr_t)ptr % 2 == 0))
		#define IS_LOCK_FREE_4(ptr) (__c11_atomic_is_lock_free(4) && (LOCK_FREE_MISALIGNED_ATOMICS || (uintptr_t)ptr % 4 == 0))
		#define IS_LOCK_FREE_8(ptr) (__c11_atomic_is_lock_free(8) && (LOCK_FREE_MISALIGNED_ATOMICS || (uintptr_t)ptr % 8 == 0))
		#define IS_LOCK_FREE_16(ptr) (__atomic_always_lock_free(16, 0) && (LOCK_FREE_MISALIGNED_ATOMICS || (uintptr_t)ptr % 16 == 0))

		// Most 32-bit platforms do not have __int128_t, but fortunately they also don't
		// support 16-byte atomics so in that case the horribly wrong use of uint64_t
		// will never actually happen.
		#if defined(__SIZEOF_INT128__)
		#define LOCK_FREE_16_TYPE __uint128_t
		#else
		_Static_assert(!IS_LOCK_FREE_16(0), "lock free 16-byte atomic attempted but nothing to implement it with");
		#define LOCK_FREE_16_TYPE uint64_t
		#endif

/// Macro that calls the compiler-generated lock-free versions of functions		/// Macro that calls the compiler-generated lock-free versions of functions
/// when they exist.		/// when they exist.
#define LOCK_FREE_CASES() \		#define LOCK_FREE_CASES(ptr) \
do {\		do {\
switch (size) {\		switch (size) {\
		case 1:\
		if (IS_LOCK_FREE_1(ptr)) {\
		LOCK_FREE_ACTION(uint8_t);\
		}\
case 2:\		case 2:\
if (IS_LOCK_FREE_2) {\		if (IS_LOCK_FREE_2(ptr)) {\
LOCK_FREE_ACTION(uint16_t);\		LOCK_FREE_ACTION(uint16_t);\
}\		}\
case 4:\		case 4:\
if (IS_LOCK_FREE_4) {\		if (IS_LOCK_FREE_4(ptr)) {\
LOCK_FREE_ACTION(uint32_t);\		LOCK_FREE_ACTION(uint32_t);\
}\		}\
case 8:\		case 8:\
if (IS_LOCK_FREE_8) {\		if (IS_LOCK_FREE_8(ptr)) {\
LOCK_FREE_ACTION(uint64_t);\		LOCK_FREE_ACTION(uint64_t);\
}\		}\
case 16:\		case 16:\
if (IS_LOCK_FREE_16) {\		if (IS_LOCK_FREE_16(ptr)) {\
/* FIXME: __uint128_t isn't available on 32 bit platforms.		LOCK_FREE_ACTION(LOCK_FREE_16_TYPE);\
LOCK_FREE_ACTION(__uint128_t);*/\
}\		}\
}\		}\
} while (0)		} while (0)


/// An atomic load operation. This is atomic with respect to the source		/// An atomic load operation. This is atomic with respect to the source
/// pointer only.		/// pointer only.
void __atomic_load_c(int size, void *src, void *dest, int model) {		void __atomic_load_c(int size, void *src, void *dest, int model) {
		#if LOCK_FREE_MISALIGNED_ATOMICS
#define LOCK_FREE_ACTION(type) \		#define LOCK_FREE_ACTION(type) \
		if ((uintptr_t)src % size == 0)\
*((type*)dest) = __c11_atomic_load((_Atomic(type)*)src, model);\		*((type*)dest) = __c11_atomic_load((_Atomic(type)*)src, model);\
		else {\
		type tmp = 0;\
		__c11_atomic_compare_exchange_weak((_Atomic(type)*)src, &tmp, 0, model, model);\
		*((type*)dest) = tmp;\
		}\
return;		return;
LOCK_FREE_CASES();		#else
		#define LOCK_FREE_ACTION(type) \
		*((type*)dest) = __c11_atomic_load((_Atomic(type)*)src, model);\
		return;
		#endif
		LOCK_FREE_CASES(src);
#undef LOCK_FREE_ACTION		#undef LOCK_FREE_ACTION
Lock *l = lock_for_pointer(src);		Lock *l = lock_for_pointer(src);
lock(l);		lock(l);
memcpy(dest, src, size);		memcpy(dest, src, size);
unlock(l);		unlock(l);
}		}

/// An atomic store operation. This is atomic with respect to the destination		/// An atomic store operation. This is atomic with respect to the destination
/// pointer only.		/// pointer only.
void __atomic_store_c(int size, void *dest, void *src, int model) {		void __atomic_store_c(int size, void *dest, void *src, int model) {
		#if LOCK_FREE_MISALIGNED_ATOMICS
		#define LOCK_FREE_ACTION(type)\
		if ((uintptr_t)dest % size == 0)\
		__c11_atomic_store((_Atomic(type)*)dest, *(type*)dest, model);\
		else\
		__c11_atomic_exchange((_Atomic(type)*)dest, *(type *)dest, model);\
		return;
		#else
#define LOCK_FREE_ACTION(type) \		#define LOCK_FREE_ACTION(type) \
__c11_atomic_store((_Atomic(type)*)dest, *(type*)dest, model);\		__c11_atomic_store((_Atomic(type)*)dest, *(type*)dest, model);\
return;		return;
LOCK_FREE_CASES();		#endif
		LOCK_FREE_CASES(dest);
#undef LOCK_FREE_ACTION		#undef LOCK_FREE_ACTION
Lock *l = lock_for_pointer(dest);		Lock *l = lock_for_pointer(dest);
lock(l);		lock(l);
memcpy(dest, src, size);		memcpy(dest, src, size);
unlock(l);		unlock(l);
}		}

/// Atomic compare and exchange operation. If the value at *ptr is identical		/// Atomic compare and exchange operation. If the value at *ptr is identical
/// to the value at expected, then this copies value at desired to *ptr. If		/// to the value at expected, then this copies value at desired to *ptr. If
/// they are not, then this stores the current value from ptr in expected.		/// they are not, then this stores the current value from ptr in expected.
///		///
/// This function returns 1 if the exchange takes place or 0 if it fails.		/// This function returns 1 if the exchange takes place or 0 if it fails.
int __atomic_compare_exchange_c(int size, void *ptr, void *expected,		int __atomic_compare_exchange_c(int size, void *ptr, void *expected,
void *desired, int success, int failure) {		void *desired, int success, int failure) {
#define LOCK_FREE_ACTION(type) \		#define LOCK_FREE_ACTION(type) \
return __c11_atomic_compare_exchange_strong((_Atomic(type)*)ptr, (type*)expected,\		return __c11_atomic_compare_exchange_strong((_Atomic(type)*)ptr, (type*)expected,\
*(type*)desired, success, failure)		*(type*)desired, success, failure)
LOCK_FREE_CASES();		LOCK_FREE_CASES(ptr);
#undef LOCK_FREE_ACTION		#undef LOCK_FREE_ACTION
Lock *l = lock_for_pointer(ptr);		Lock *l = lock_for_pointer(ptr);
lock(l);		lock(l);
if (memcmp(ptr, expected, size) == 0) {		if (memcmp(ptr, expected, size) == 0) {
memcpy(ptr, desired, size);		memcpy(ptr, desired, size);
unlock(l);		unlock(l);
return 1;		return 1;
}		}
memcpy(expected, ptr, size);		memcpy(expected, ptr, size);
unlock(l);		unlock(l);
return 0;		return 0;
}		}

/// Performs an atomic exchange operation between two pointers. This is atomic		/// Performs an atomic exchange operation between two pointers. This is atomic
/// with respect to the target address.		/// with respect to the target address.
void __atomic_exchange_c(int size, void *ptr, void *val, void *old, int model) {		void __atomic_exchange_c(int size, void *ptr, void *val, void *old, int model) {
#define LOCK_FREE_ACTION(type) \		#define LOCK_FREE_ACTION(type) \
*(type*)old = __c11_atomic_exchange((_Atomic(type)*)ptr, *(type*)val,\		*(type*)old = __c11_atomic_exchange((_Atomic(type)*)ptr, *(type*)val,\
model);\		model);\
return;		return;
LOCK_FREE_CASES();		LOCK_FREE_CASES(ptr);
#undef LOCK_FREE_ACTION		#undef LOCK_FREE_ACTION
Lock *l = lock_for_pointer(ptr);		Lock *l = lock_for_pointer(ptr);
lock(l);		lock(l);
memcpy(old, ptr, size);		memcpy(old, ptr, size);
memcpy(ptr, val, size);		memcpy(ptr, val, size);
unlock(l);		unlock(l);
}		}

...	#define OPTIMISED_CASES\
OPTIMISED_CASE(1, IS_LOCK_FREE_1, uint8_t)\		OPTIMISED_CASE(1, IS_LOCK_FREE_1, uint8_t)\
OPTIMISED_CASE(2, IS_LOCK_FREE_2, uint16_t)\		OPTIMISED_CASE(2, IS_LOCK_FREE_2, uint16_t)\
OPTIMISED_CASE(4, IS_LOCK_FREE_4, uint32_t)\		OPTIMISED_CASE(4, IS_LOCK_FREE_4, uint32_t)\
OPTIMISED_CASE(8, IS_LOCK_FREE_8, uint64_t)		OPTIMISED_CASE(8, IS_LOCK_FREE_8, uint64_t)
#endif		#endif

#define OPTIMISED_CASE(n, lockfree, type)\		#define OPTIMISED_CASE(n, lockfree, type)\
type __atomic_load_##n(type *src, int model) {\		type __atomic_load_##n(type *src, int model) {\
if (lockfree)\		if (lockfree(src))\
return __c11_atomic_load((_Atomic(type)*)src, model);\		return __c11_atomic_load((_Atomic(type)*)src, model);\
Lock *l = lock_for_pointer(src);\		Lock *l = lock_for_pointer(src);\
lock(l);\		lock(l);\
type val = *src;\		type val = *src;\
unlock(l);\		unlock(l);\
return val;\		return val;\
}		}
OPTIMISED_CASES		OPTIMISED_CASES
#undef OPTIMISED_CASE		#undef OPTIMISED_CASE

#define OPTIMISED_CASE(n, lockfree, type)\		#define OPTIMISED_CASE(n, lockfree, type)\
void __atomic_store_##n(type *dest, type val, int model) {\		void __atomic_store_##n(type *dest, type val, int model) {\
if (lockfree) {\		if (lockfree(dest)) {\
__c11_atomic_store((_Atomic(type)*)dest, val, model);\		__c11_atomic_store((_Atomic(type)*)dest, val, model);\
return;\		return;\
}\		}\
Lock *l = lock_for_pointer(dest);\		Lock *l = lock_for_pointer(dest);\
lock(l);\		lock(l);\
*dest = val;\		*dest = val;\
unlock(l);\		unlock(l);\
return;\		return;\
}		}
OPTIMISED_CASES		OPTIMISED_CASES
#undef OPTIMISED_CASE		#undef OPTIMISED_CASE

#define OPTIMISED_CASE(n, lockfree, type)\		#define OPTIMISED_CASE(n, lockfree, type)\
type __atomic_exchange_##n(type *dest, type val, int model) {\		type __atomic_exchange_##n(type *dest, type val, int model) {\
if (lockfree)\		if (lockfree(dest))\
return __c11_atomic_exchange((_Atomic(type)*)dest, val, model);\		return __c11_atomic_exchange((_Atomic(type)*)dest, val, model);\
Lock *l = lock_for_pointer(dest);\		Lock *l = lock_for_pointer(dest);\
lock(l);\		lock(l);\
type tmp = *dest;\		type tmp = *dest;\
*dest = val;\		*dest = val;\
unlock(l);\		unlock(l);\
return tmp;\		return tmp;\
}		}
OPTIMISED_CASES		OPTIMISED_CASES
#undef OPTIMISED_CASE		#undef OPTIMISED_CASE

#define OPTIMISED_CASE(n, lockfree, type)\		#define OPTIMISED_CASE(n, lockfree, type)\
int __atomic_compare_exchange_##n(type *ptr, type *expected, type desired,\		int __atomic_compare_exchange_##n(type *ptr, type *expected, type desired,\
int success, int failure) {\		int success, int failure) {\
if (lockfree)\		if (lockfree(ptr))\
return __c11_atomic_compare_exchange_strong((_Atomic(type)*)ptr, expected, desired,\		return __c11_atomic_compare_exchange_strong((_Atomic(type)*)ptr, expected, desired,\
success, failure);\		success, failure);\
Lock *l = lock_for_pointer(ptr);\		Lock *l = lock_for_pointer(ptr);\
lock(l);\		lock(l);\
if (*ptr == *expected) {\		if (*ptr == *expected) {\
*ptr = desired;\		*ptr = desired;\
unlock(l);\		unlock(l);\
return 1;\		return 1;\
}\		}\
*expected = *ptr;\		*expected = *ptr;\
unlock(l);\		unlock(l);\
return 0;\		return 0;\
}		}
OPTIMISED_CASES		OPTIMISED_CASES
#undef OPTIMISED_CASE		#undef OPTIMISED_CASE

////////////////////////////////////////////////////////////////////////////////		////////////////////////////////////////////////////////////////////////////////
// Atomic read-modify-write operations for integers of various sizes.		// Atomic read-modify-write operations for integers of various sizes.
////////////////////////////////////////////////////////////////////////////////		////////////////////////////////////////////////////////////////////////////////
#define ATOMIC_RMW(n, lockfree, type, opname, op) \		#define ATOMIC_RMW(n, lockfree, type, opname, op) \
type __atomic_fetch_##opname##_##n(type *ptr, type val, int model) {\		type __atomic_fetch_##opname##_##n(type *ptr, type val, int model) {\
if (lockfree) \		if (lockfree(ptr))\
return __c11_atomic_fetch_##opname((_Atomic(type)*)ptr, val, model);\		return __c11_atomic_fetch_##opname((_Atomic(type)*)ptr, val, model);\
Lock *l = lock_for_pointer(ptr);\		Lock *l = lock_for_pointer(ptr);\
lock(l);\		lock(l);\
type tmp = *ptr;\		type tmp = *ptr;\
*ptr = tmp op val;\		*ptr = tmp op val;\
unlock(l);\		unlock(l);\
return tmp;\		return tmp;\
}		}
...