This is an archive of the discontinued LLVM Phabricator instance.

[AArch64][RCPC3] Instruction selection for LDAP1/STL1 instructions
ClosedPublic

Authored by pratlucas on Jun 16 2023, 5:53 AM.

Details

Summary

This implements the DAG patterns to enable instruction selection for the
LDAP1 and STL1 instructions from FEAT_LRCPC3. The instructions should
match the following combinations:

  • Aqcuiring atomic load + vector insert element for LDAP1.
  • Vector extract element + releasing atomic store for STL1.

Patterns have also been added to cope with the DAG structure found when
dealing with 1-lane sub-vectors.

Diff Detail

Event Timeline

pratlucas created this revision.Jun 16 2023, 5:53 AM
Herald added a project: Restricted Project. · View Herald TranscriptJun 16 2023, 5:53 AM
pratlucas requested review of this revision.Jun 16 2023, 5:53 AM
Herald added a project: Restricted Project. · View Herald TranscriptJun 16 2023, 5:53 AM
LukeGeeson added a comment.EditedJun 19 2023, 8:01 AM

From my understanding of LDAP1 and STL1 these instructions (in particular LDAP) have the same ordering semantics as LDAPR [1] and hence I expect the correctness argument will be similar to the LDAPR testing [2].

I expect this patch is ok (as long as LDAP1 is not used for SC accesses or mixed) - but I'll need to implement some testing to be certain.

[1] https://developer.arm.com/documentation/ddi0602/2022-12/SIMD-FP-Instructions/LDAP1--SIMD-FP---Load-Acquire-RCpc-one-single-element-structure-to-one-lane-of-one-register-
[2] https://reviews.llvm.org/D126250

tmatheson accepted this revision.Jun 20 2023, 3:21 AM

LGTM. As @LukeGeeson notes these are not for sequentially consistent ops so won't cause something like [1], and they are gated behind +rcpc3.

[1] https://github.com/llvm/llvm-project/issues/62652

This revision is now accepted and ready to land.Jun 20 2023, 3:21 AM

I'd like to see test coverage for load/store of:

  • a plain non-vector double value
  • 64-bit vectors
  • insert/extract from SVE vectors
  • inserting into lane zero of a <2 x i64> vector

(I'm not expecting any particular result for the above, but I'd like to see what happens beyond the exact patterns you've implemented.)

Matt added a subscriber: Matt.Jun 20 2023, 2:08 PM
pratlucas updated this revision to Diff 535432.Jun 28 2023, 8:46 AM

Increasing test coverage, with contributions from @tmatheson.

efriedma accepted this revision.Jul 5 2023, 2:16 PM

LGTM

Not implementing optimized lowering for "double" and SVE seems fine for now, but you might want to consider it for the future.

llvm/test/CodeGen/AArch64/rcpc3.ll
248

The tests for a non-atomic load of a vector don't seem necessary. (I forgot you can't do an atomic load of a vector.)

Maybe worth testing "load atomic i64" followed by a bitcast to <2 x i32>, instead.

pratlucas updated this revision to Diff 538075.Jul 7 2023, 4:24 AM

Adding tests for atomic load/store of i64 + bitcast to/from <2 x i32>.

I'm keeping the non-atomic load/store tests to make sure we have minimal
covereage of those operations with rcpc3 enabled, as I couldn't find it
in any other test.

This revision was landed with ongoing or failed builds.Jul 7 2023, 4:33 AM
This revision was automatically updated to reflect the committed changes.