Set setMaxAtomicSizeInBitsSupported for SystemZ.
Diff Detail
- Repository
- rG LLVM Github Monorepo
Unit Tests
Event Timeline
I don't think this is quite correct, as we do support 128-bit atomics on properly aligned addresses. In any case, the back-end implements atomic load, store, and cmpxchg for i128. We do not currently implement atomicrmw, but that could be handled in terms of cmpxchg of course. For non-aligned addresses, we do need to fall back to the library routine, however.
Interestingly enough, the SystemZ back-end does not even use the AtomicExpandPass at all at the moment, so the setMaxAtomicSizeInBitsSupported change probably doesn't even have any effect? Everything still seems to work, however ...
@jonpa we don't seem to have test cases for i128 atomicrmw at the moment - could you have a loop at what happens here, and if there's maybe something we need to change? Thanks!
LegalizeTypes expands the i128 result of e.g. an AtomicLoadAdd node, which is done by DAGTypeLegalizer::ExpandAtomic(), which results in a library call. There are no tests for the atomic operations with i128, but it should work due to the TypeLegalizer. i128 is not a legal type so I guess therefore they are all expanded this way.
I could imagine that it might be beneficial to use the AtomicExpandPass to use the target instructions whenever possible, and libcalls otherwise. However, I am unsure about this: The AtomicExpandPass seems to do this (emit the libcalls only if needed, after checking alignment as well), but I see this in the Atomics document:
"One very important property of the atomic operations is that if your backend supports any inline lock-free atomic operations of a given size, you should support ALL operations of that size in a lock-free manner...it is invalid to implement atomic load using the native instruction, but cmpxchg using a library call to a function that uses a mutex, atomic load must also expand to a library call on such architectures, so that it can remain atomic with regards to a simultaneous cmpxchg, by using the same mutex."
I guess I must be missing something somehow...
No, you don't. You cannot mix lock-free and mutex atomics. E.g, if loads are lock-free and stores use a mutex. Then you have lost atomicity. Your mutex version could do stores one byte at a time.
Note that this is handled by the particular implementation of the 16-byte atomic library routines on s390x: they all perform a *runtime* check for proper alignment, and if the address actually *is* 16-byte aligned, the library routine will in fact use the appropriate 16-byte atomic instructions to implement the operation (instead of a separate mutex). That is why it *is* safe for the compiler to emit those instructions itself *if* it can prove at compile-time that the address will in fact be properly aligned at run-time. (That's also done by GCC, for example.)