Trying to accurately model what the hardware actually supports seems to lead to a lot of people complaining, and nobody saying it's actually helpful. So just pretend everything is lock-free, and let users deal with ensuring that the __sync_* routines are actually lock-free. If anyone complains, we can just say "gcc does the same thing".
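For illustration, a minimal sketch of what that assumption means for user code (the function and variable names are made up, not from this patch), assuming a target like cortex-m0 with no atomic instructions:

/* Hypothetical example: on a target without atomic instructions this
 * builtin lowers to a call to a __sync_* library routine (e.g.
 * __sync_val_compare_and_swap_4). Under the approach described above,
 * the compiler just assumes that routine is lock-free; the user's
 * runtime/library has to actually make it so. */
int counter;

int claim_counter(void) {
  return __sync_val_compare_and_swap(&counter, 0, 1);
}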
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Unit Tests
Event Timeline
Comment Actions
Looking at GCC, the behaviour (for cortex-m0 at least) is that loads and stores are generated inline, but more complex operations go to the atomic library calls (not the sync library calls). E.g. for
int x, y;

int fn() {
  return __atomic_load_n(&x, __ATOMIC_SEQ_CST);
}

int fn2() {
  return __atomic_compare_exchange_n(&x, &y, 0, 0, 0, __ATOMIC_SEQ_CST);
}

I get with arm-none-eabi-gcc tmp.c -O1 -mcpu=cortex-m0

fn:
        ldr     r3, .L2
        dmb     ish
        ldr     r0, [r3]
        dmb     ish
        bx      lr
fn2:
        push    {lr}
        sub     sp, sp, #12
        ldr     r0, .L5
        adds    r1, r0, #4
        movs    r3, #5
        str     r3, [sp]
        movs    r3, #0
        movs    r2, #0
        bl      __atomic_compare_exchange_4
        add     sp, sp, #12
        pop     {pc}
So if we're doing this for compatibility with GCC, we should do the same.
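To round out the picture, a sketch of what that behaviour looks like for the other operations, assuming the same cortex-m0-style target (function names and exact libcall names are assumptions, not taken from GCC output):

/* Plain atomic stores should also be emitted inline (barrier + str +
 * barrier), while read-modify-write operations such as fetch-add should
 * become __atomic_* library calls (e.g. __atomic_fetch_add_4). */
int x;

void set_x(int v) {
  __atomic_store_n(&x, v, __ATOMIC_SEQ_CST);          /* expected: inline */
}

int add_to_x(int v) {
  return __atomic_fetch_add(&x, v, __ATOMIC_SEQ_CST); /* expected: libcall */
}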
Comment Actions
So gcc has two different behaviors on ARM:
- On Linux, it prefers __sync calls, and generates inline code for load/store.
- On bare metal, gcc chooses what sort of atomic call to generate based on how the source code is written: if the user writes __sync, they get __sync, and if the user writes __atomic, they get __atomic. But it generates inline code for load/store, so it's assuming the __atomic implementation is lock-free.
We'd have to hack clang IR generation to generate different IR for the two constructs. I'm not sure what the underlying logic is, or if it's worth trying to emulate.
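A rough sketch of why the two constructs would need different IR under the bare-metal gcc model (function names are invented; the libcall names are assumptions based on the behaviour described above): the same compare-and-swap written two ways ends up in two different libcall families, so clang would have to remember which builtin the user wrote.

int x, expected;

int cas_sync(void) {
  /* __sync builtin -> __sync_val_compare_and_swap_4 library call */
  return __sync_val_compare_and_swap(&x, 0, 1);
}

int cas_atomic(void) {
  /* __atomic builtin -> __atomic_compare_exchange_4 library call */
  return __atomic_compare_exchange_n(&x, &expected, 1, 0,
                                     __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST);
}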