This is an archive of the discontinued LLVM Phabricator instance.

[SystemZ] Set setMaxAtomicSizeInBitsSupported
AbandonedPublic

Authored by brad on Jan 7 2023, 12:24 AM.

Details

Reviewers
jonpa
uweigand
Summary

Set setMaxAtomicSizeInBitsSupported for SystemZ.

Diff Detail

Event Timeline

brad created this revision.Jan 7 2023, 12:24 AM
Herald added a project: Restricted Project. · View Herald TranscriptJan 7 2023, 12:24 AM
Herald added a subscriber: hiraditya. · View Herald Transcript
brad requested review of this revision.Jan 7 2023, 12:24 AM

I don't think this is quite correct, as we do support 128-bit atomics on properly aligned addresses. In any case, the back-end implements atomic load, store, and cmpxchg for i128. We do not currently implement atomicrmw, but that could be handled in terms of cmpxchg of course. For non-aligned addresses, we do need to fall back to the library routine, however.

Interestingly enough, the SystemZ back-end does not even use the AtomicExpandPass at all at the moment, so the setMaxAtomicSizeInBitsSupported change probably doesn't even have any effect? Everything still seems to work, however ...

@jonpa we don't seem to have test cases for i128 atomicrmw at the moment - could you have a loop at what happens here, and if there's maybe something we need to change? Thanks!

LegalizeTypes expands the i128 result of e.g. an AtomicLoadAdd node, which is done by DAGTypeLegalizer::ExpandAtomic(), which results in a library call. There are no tests for the atomic operations with i128, but it should work due to the TypeLegalizer. i128 is not a legal type so I guess therefore they are all expanded this way.

I could imagine that it might be beneficial to use the AtomicExpandPass to use the target instructions whenever possible, and libcalls otherwise. However, I am unsure about this: The AtomicExpandPass seems to do this (emit the libcalls only if needed, after checking alignment as well), but I see this in the Atomics document:

"One very important property of the atomic operations is that if your backend supports any inline lock-free atomic operations of a given size, you should support ALL operations of that size in a lock-free manner...it is invalid to implement atomic load using the native instruction, but cmpxchg using a library call to a function that uses a mutex, atomic load must also expand to a library call on such architectures, so that it can remain atomic with regards to a simultaneous cmpxchg, by using the same mutex."

I guess I must be missing something somehow...

tschuett added a subscriber: tschuett.EditedJan 14 2023, 12:56 PM

No, you don't. You cannot mix lock-free and mutex atomics. E.g, if loads are lock-free and stores use a mutex. Then you have lost atomicity. Your mutex version could do stores one byte at a time.

LegalizeTypes expands the i128 result of e.g. an AtomicLoadAdd node, which is done by DAGTypeLegalizer::ExpandAtomic(), which results in a library call. There are no tests for the atomic operations with i128, but it should work due to the TypeLegalizer. i128 is not a legal type so I guess therefore they are all expanded this way.

I could imagine that it might be beneficial to use the AtomicExpandPass to use the target instructions whenever possible, and libcalls otherwise. However, I am unsure about this: The AtomicExpandPass seems to do this (emit the libcalls only if needed, after checking alignment as well), but I see this in the Atomics document:

"One very important property of the atomic operations is that if your backend supports any inline lock-free atomic operations of a given size, you should support ALL operations of that size in a lock-free manner...it is invalid to implement atomic load using the native instruction, but cmpxchg using a library call to a function that uses a mutex, atomic load must also expand to a library call on such architectures, so that it can remain atomic with regards to a simultaneous cmpxchg, by using the same mutex."

I guess I must be missing something somehow...

Note that this is handled by the particular implementation of the 16-byte atomic library routines on s390x: they all perform a *runtime* check for proper alignment, and if the address actually *is* 16-byte aligned, the library routine will in fact use the appropriate 16-byte atomic instructions to implement the operation (instead of a separate mutex). That is why it *is* safe for the compiler to emit those instructions itself *if* it can prove at compile-time that the address will in fact be properly aligned at run-time. (That's also done by GCC, for example.)

jonpa added a comment.Mar 20 2023, 7:57 AM

I have experimented with this here: https://reviews.llvm.org/D146425

brad abandoned this revision.Jul 9 2023, 5:41 PM