This is an archive of the discontinued LLVM Phabricator instance.

[AArch64] Codegen for FEAT_LRCPC3
ClosedPublic

Authored by tmatheson on Jan 10 2023, 2:00 PM.

Details

Summary

Implements support for the following 128-bit atomic operations with +rcpc3:

  • store-release
  • store-release volatile
  • load-acquire
  • load-acquire volatile
  • load-acquire const
  • load-acquire const volatile

D126250 and D137590 added support for emitting LDAPR (Load-Acquire RCPc) rather
than LDAP (Load-Acquire) when +rcpc is available. This patch allows emitting
the 128-bit RCPc instructions added in FEAT_LRCPC3 LDIAPP/STILP. The
implementation is different from LDAPR, because there are no non-RCPc
equivalents for these new instructions.

Support for the offset variants will be added in a later patch.

Diff Detail

Event Timeline

tmatheson created this revision.Jan 10 2023, 2:00 PM
Herald added a project: Restricted Project. · View Herald TranscriptJan 10 2023, 2:00 PM
tmatheson requested review of this revision.Jan 10 2023, 2:00 PM
Herald added a project: Restricted Project. · View Herald TranscriptJan 10 2023, 2:00 PM

You need FEAT_LSE2 if you're going to generate STILP/LDIAPP from a 128-bit access, in order to get single-copy atomicity.

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
5776

Outdated?

22233

Yes that sounds reasonable

llvm/lib/Target/AArch64/GISel/AArch64LegalizerInfo.cpp
1218
tmatheson updated this revision to Diff 491788.Jan 24 2023, 7:28 AM
tmatheson marked 3 inline comments as done.

Require LSE2

lenary accepted this revision.Jan 25 2023, 3:55 AM
This revision is now accepted and ready to land.Jan 25 2023, 3:55 AM
This revision was landed with ongoing or failed builds.Jan 25 2023, 4:28 AM
This revision was automatically updated to reflect the committed changes.
efriedma added inline comments.
llvm/test/CodeGen/AArch64/Atomics/aarch64_be-atomic-load-rcpc3.ll
331

Sort of orthogonal to this change, but can someone at ARM verify if this is the sequence we actually want for sequentially consistent loads with lse2, as opposed to using caspal? (I'm a bit concerned given the issues we ran into with narrower widths on Windows; see D141748.)

efriedma added inline comments.Feb 7 2023, 1:02 PM
llvm/test/CodeGen/AArch64/Atomics/aarch64_be-atomic-load-rcpc3.ll
331

Any update here?

tmatheson added inline comments.Feb 8 2023, 2:01 AM
llvm/test/CodeGen/AArch64/Atomics/aarch64_be-atomic-load-rcpc3.ll
331

Not yet but I haven't forgotten about it.

tmatheson added inline comments.
llvm/test/CodeGen/AArch64/Atomics/aarch64_be-atomic-load-rcpc3.ll
331

Sorry this has taken a while, I've had to work on other things. The first thing to note is that a CASP implementation is slower, so as long as both are correct we should use the ldp+dmb.

@LukeGeeson has been working on adding CASP to herd7 so that we can compare ldp+dmb with casp. Using his Telechat tool, which can compare the C++ semantics with the machine code semantics and identify new behaviours introduced by the machine code, we aren't seeing any differences so far, although admittedly the number of cases we've checked is low because each CASP case has to be manually written.

We are still looking for at a way to run more extensive testing automatically, but if you have any concerns about specific scenarios we can check them.

efriedma added inline comments.Mar 20 2023, 12:57 PM
llvm/test/CodeGen/AArch64/Atomics/aarch64_be-atomic-load-rcpc3.ll
331

I guess my biggest concern here is that we're basically committing to never using sequentially consistent (non-RCPc) load/store instructions for i128, even if the instruction set adds them in the future. If you're fine with that, then I guess I don't have any other specific concerns?