This is an archive of the discontinued LLVM Phabricator instance.

AArch64: use `CAS` instead of `LDX`/`STX` for more ops if available
ClosedPublic

Authored by t.p.northover on Sep 20 2021, 6:27 AM.

Details

Summary

This covers 128-bit loads, and atomicrmw operations without a single native instruction. Using CAS saves a bit of code size and has a better chance of succeeding with high contention on some systems.

Diff Detail

Event Timeline

t.p.northover created this revision.Sep 20 2021, 6:27 AM
t.p.northover requested review of this revision.Sep 20 2021, 6:27 AM
Herald added a project: Restricted Project. · View Herald TranscriptSep 20 2021, 6:27 AM
sebpop accepted this revision.Dec 12 2022, 11:51 AM
sebpop added a subscriber: sebpop.

The patch looks good to me.
Applying this patch solves this issue:
https://github.com/llvm/llvm-project/issues/55199
which is a reduced test from Rust generated code.

Could one of the ARM64 maintainers also approve this change?

This revision is now accepted and ready to land.Dec 12 2022, 11:51 AM
Herald added a project: Restricted Project. · View Herald TranscriptDec 12 2022, 11:51 AM

Didn't realize this was up for review; happened to spot it on the list.

Some of these sequence seem extremely long. It should be a little better on main, since we improved ccmp formation, but can we rearrange operations somehow so we need fewer mov operations in the fast path?

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
21876

A comment here might be useful.

21915–21918

80 cols

Comment needs to be updated.

llvm/test/CodeGen/AArch64/arm64-atomic-128.ll
774

This bug got fixed, right?

llvm/test/CodeGen/AArch64/atomicrmw-xchg-fp.ll
107

These moves seem very strange.

efriedma added inline comments.Dec 12 2022, 6:24 PM
llvm/test/CodeGen/AArch64/arm64-atomic-128.ll
774

Nevermind, this isn't a bug; got it confused with a different issue.

t.p.northover marked 2 inline comments as done.Dec 13 2022, 7:11 AM
t.p.northover added inline comments.
llvm/test/CodeGen/AArch64/atomicrmw-xchg-fp.ll
107

The first two are part of forming an xseqregclass thing from component registers, the second two are because CASP clobbers its input but we want to compare against it afterwards. Still not ideal, but not completely out there.

Updating diff to ToT.

efriedma accepted this revision.Dec 13 2022, 10:32 AM

Please fix the commit message to fix the incorrect claim that this reduces codesize; the only case that actually decreases in size is the atomic load.

Please file a followup bug for the extra mov instructions.

Otherwise LGTM, I guess; improving the performance under contention probably outweighs any inefficiency caused by the extra instructions.

(We might also want to mess with the code sequences for outlining, but that doesn't need to be this patch.)

t.p.northover closed this revision.Dec 14 2022, 4:16 AM

Thanks Eli. I filed https://github.com/llvm/llvm-project/issues/59516 for the moves and pushed it with the changes you asked for (10d34f5538e0).