This covers 128-bit loads, and atomicrmw operations without a single native instruction. Using CAS saves a bit of code size and has a better chance of succeeding with high contention on some systems.
Diff Detail
Event Timeline
The patch looks good to me.
Applying this patch solves this issue:
https://github.com/llvm/llvm-project/issues/55199
which is a reduced test from Rust generated code.
Could one of the ARM64 maintainers also approve this change?
Didn't realize this was up for review; happened to spot it on the list.
Some of these sequence seem extremely long. It should be a little better on main, since we improved ccmp formation, but can we rearrange operations somehow so we need fewer mov operations in the fast path?
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp | ||
---|---|---|
17674 | A comment here might be useful. | |
17712 | 80 cols Comment needs to be updated. | |
llvm/test/CodeGen/AArch64/arm64-atomic-128.ll | ||
836 | This bug got fixed, right? | |
llvm/test/CodeGen/AArch64/atomicrmw-xchg-fp.ll | ||
107 | These moves seem very strange. |
llvm/test/CodeGen/AArch64/arm64-atomic-128.ll | ||
---|---|---|
836 | Nevermind, this isn't a bug; got it confused with a different issue. |
llvm/test/CodeGen/AArch64/atomicrmw-xchg-fp.ll | ||
---|---|---|
107 | The first two are part of forming an xseqregclass thing from component registers, the second two are because CASP clobbers its input but we want to compare against it afterwards. Still not ideal, but not completely out there. |
Please fix the commit message to fix the incorrect claim that this reduces codesize; the only case that actually decreases in size is the atomic load.
Please file a followup bug for the extra mov instructions.
Otherwise LGTM, I guess; improving the performance under contention probably outweighs any inefficiency caused by the extra instructions.
(We might also want to mess with the code sequences for outlining, but that doesn't need to be this patch.)
Thanks Eli. I filed https://github.com/llvm/llvm-project/issues/59516 for the moves and pushed it with the changes you asked for (10d34f5538e0).
A comment here might be useful.