ARMv5 and older architectures don’t support SMP and do not have atomic instructions. Still they’re in use in IoT world, where one has to stick to libgcc.
~~~
OS Laboratory. Huawei RRI. Saint-Petersburg
Differential D116088
[compiler-rt] Implement ARM atomic operations for architectures without SMP support kpdev42 on Dec 20 2021, 10:41 PM. Authored by
Details ARMv5 and older architectures don’t support SMP and do not have atomic instructions. Still they’re in use in IoT world, where one has to stick to libgcc. ~~~ OS Laboratory. Huawei RRI. Saint-Petersburg
Diff Detail
Event TimelineComment Actions I think this looks reasonable. I haven't tried it, but I guess we should proceed with it instead of holding it back, as there's nobody else reviewing it. Comment Actions Correct me if I'm wrong, but I don't think this stubs are async signal safe nor will they work for preemptive multitasking systems? Comment Actions Those stubs are basically cas loops (https://en.wikipedia.org/wiki/Compare-and-swap) which are not much different from their SMP counterparts, except memory sync ops are not used. This should work normally in case of preemption (preempting thread will spend its quota in busy-wait). Signal can be a problem if it preempt a thread executing atomic op, but I wonder if SMP code handles this Comment Actions I'm concerned providing these is going to cause issues. The provided implementation are not atomic. Blindly assuming that the user is compiling for a target that doesn't have pre-emptible threads seems like a bad idea. How do you expect users to use these methods, anyway? clang shouldn't be generating calls to these methods. (It looks like it actually does in some cases on ARM, but that's not intended behavior.) For users who want to pretend threads don't exist, we should provide a compiler flag. -mthread-model doesn't quite work right for this at the moment, but it would make sense to fix it, I think. Comment Actions D120026 is merged now, which addresses the issue of the compiler generating __sync calls where it isn't supposed to. Does anyone want to continue discussing what changes to compiler-rt would be appropriate? I didn't mean to completely shut down discussion with my comment. Comment Actions Imagine we have the following piece of code in the program: volatile int G; int foo() { return __sync_add_and_fetch(&G, 1); } Now we want having this built and running on armv5 platform. At the moment the only option we have is to use libgcc. Unfortunately this have one big disadvantage: we're only limited to Linux, because call to __sync_add_and_fetch boils down to Linux kernel user helper. We want this to work on other platforms also, and here is what compiler-rt good for. However sync ops operations in compiler-rt use memory barriers, which doesn't work on armv5: any attempt to use memory barrier on the latter will result in SIGILL. As armv5 doesn't support SMP (but still supports preemptive multitasking) it's possible in out opinion to implement sync ops as a compare and swap loop without memory barriers. What's your opinion on this? Comment Actions On a target that doesn't support SMP, you don't need memory barriers, sure. (I think we'd want a CMake flag to explicitly assert that you're only going to run the code on chips without SMP.) That doesn't really solve your issue, though. To implement atomic compare-and-swap or rmw operations, you need to ensure your code doesn't get interrupted in the middle. If your system supports multithreading, and a thread can be preempted at any time, you need one of the following:
None of these options work on armv5 without some sort of kernel assistance. (Well, technically, you can implement a spinlock with swp, but that's very inefficient.) Comment Actions At the moment, in case of compiler-rt, __sync_add_and_fetch boils down to
IMO, performance-wise there is not much difference (if any) between this and No code in compiler-rt disables interrupts, so it can and will be interrupted in the middle, Now if we are on a platform which doesn't support SMP we can use ordinary memory operations instead Comment Actions Even on a non-SMP processor, there's a bit of magic in ldrex/strex: ldrex sets a hidden "lock" bit, and strex checks it. If there's a context switch between the load and the store, the switch will clear that bit. So when the code continues to execute, the store fails. Comment Actions Well, after some investigation it turned out that:
I've reimplemented sync ops using these primitves, PTAL Comment Actions ldr+op+swp still isn't atomic. For each point in the code, please try the exercise of "what if my code is interrupted here"? The only way to use swp to implement general atomic operations is to use a separate spinlock. |