This is an archive of the discontinued LLVM Phabricator instance.

[AArch64][SVE] Implement structured load intrinsics
ClosedPublic

Authored by c-rhodes on Mar 6 2020, 8:08 AM.

Details

Summary

This patch adds initial support for the following intrinsics:

  • llvm.aarch64.sve.ld2
  • llvm.aarch64.sve.ld3
  • llvm.aarch64.sve.ld4

These load two, three and four vectors' worth of data, respectively. Basic
codegen is implemented here; the reg+reg and reg+imm addressing modes will
be addressed in a later patch.

The types returned by these intrinsics have N times as many elements as fit
in a 128-bit vector of the given element type, where N is the number of
vectors being loaded, i.e. 2, 3 or 4. Thus, for 32-bit elements the types
are:

LD2 : <vscale x 8 x i32>
LD3 : <vscale x 12 x i32>
LD4 : <vscale x 16 x i32>

This is implemented with target-specific intrinsics for each variant that
take the same operands as the IR intrinsic but return N values, where the
type of each value is a full vector, i.e. <vscale x 4 x i32> in the above
example. These values are then concatenated using the standard
concat_vectors operation so the result matches the wide type returned by
the IR intrinsic, maintaining type legality in the IR.
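
As a minimal sketch (function and value names hypothetical, fully mangled
intrinsic suffixes assumed), the LD2 form for 32-bit elements looks like:

declare <vscale x 8 x i32> @llvm.aarch64.sve.ld2.nxv8i32.nxv4i1.p0nxv4i32(<vscale x 4 x i1>, <vscale x 4 x i32>*)

define <vscale x 8 x i32> @ld2w_i32(<vscale x 4 x i1> %pred, <vscale x 4 x i32>* %addr) {
  ; Loads two consecutive <vscale x 4 x i32> vectors' worth of data and
  ; returns them concatenated as a single <vscale x 8 x i32>.
  %res = call <vscale x 8 x i32> @llvm.aarch64.sve.ld2.nxv8i32.nxv4i1.p0nxv4i32(<vscale x 4 x i1> %pred,
                                                                                <vscale x 4 x i32>* %addr)
  ret <vscale x 8 x i32> %res
}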

These intrinsics are intended for use in the Arm C Language
Extension (ACLE).

Diff Detail

Event Timeline

c-rhodes created this revision.Mar 6 2020, 8:08 AM
Herald added a project: Restricted Project.Mar 6 2020, 8:08 AM

Hi @c-rhodes ,

thank you for working on this. I am basing the addressing mode optimization for ldN on this patch, so I just wanted to make a couple of minor remarks!

Thanks,

Francesco

[Nit] Should use vscale instead of n in the commit message:

LD2 : <n x 8 x i32>
LD3 : <n x 12 x i32>
LD4 : <n x 16 x i32>
llvm/lib/Target/AArch64/AArch64ISelDAGToDAG.cpp
1421

SubRegIdx is always set to AArch64::zsub0. Can we remove it from the parameter list of the method and use AArch64::zsub0 directly inside the function?

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
9791

[Nit] Might be worth asserting that VT.getVectorElementCount() % N == 0.

fpetrogalli added inline comments.Mar 18 2020, 7:46 AM
llvm/include/llvm/IR/IntrinsicsAArch64.td
817

Question: you have three overloaded operands here. How come you only need to specify one of them in the intrinsic name?

If I look at one of your tests:

%res = call <vscale x 32 x i8> @llvm.aarch64.sve.ld2.nxv32i8(<vscale x 16 x i1> %pred,
                                                             <vscale x 16 x i8>* %addr)

Based on the definition you have specified here, I was expecting to see the following intrinsic: @llvm.aarch64.sve.ld2.nxv32i8.nxv16i8.p0nxv16i8. Am I missing something?

c-rhodes marked an inline comment as done.Mar 18 2020, 9:21 AM

Thanks for the comments @fpetrogalli! I'll update this patch to address your comments.

llvm/include/llvm/IR/IntrinsicsAArch64.td
817

You're right, there should be a suffix for each overloaded type in the intrinsic name. I initially implemented each load intrinsic separately with special types, where the predicate and address were derived from the return type, e.g. LLVMVectorHalfWidth<0, llvm_i1_ty> and LLVMPointerToHalfWidthVector<0> for the LD2 args. With that definition only the return type was overloaded, and I missed updating the intrinsics in the tests. It's interesting that there are no issues without specifying each overloaded type; I'll fix the tests.
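
For illustration (suffix order assumed: return type, then predicate, then pointer), the fixed call from the test above would look something like:

%res = call <vscale x 32 x i8> @llvm.aarch64.sve.ld2.nxv32i8.nxv16i1.p0nxv16i8(<vscale x 16 x i1> %pred,
                                                                               <vscale x 16 x i8>* %addr)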

fpetrogalli added inline comments.Mar 18 2020, 6:54 PM
llvm/test/CodeGen/AArch64/sve-intrinsics-loads-with-extract.ll
23 ↗(On Diff #248735)

I just realized... do we need these tests? I might have missed something, but it seems to me that none of the code you have added here relates to the interaction between ld<N> and tuple.get. For example, the fact that in this specific test the load writes z31 and z0, in that order, seems to me more of a register allocation matter than anything related to the fact that %res comes from an ld<N> intrinsic.

I think we can simplify these tests by removing the ld<N> and instead writing tests that use tuple.get on an input parameter of the test. For example, in the specific case of @ld2b_i8_1, I think you want to run llc on this code:

define <vscale x 16 x i8> @ld2b_i8_1(<vscale x 32 x i8> %res) {
; CHECK-LABEL: ld2b_i8_1:
; CHECK: mov z0.d, z1.d
; CHECK-NEXT: ret
  %v2 = call <vscale x 16 x i8> @llvm.aarch64.sve.tuple.get.nxv32i8(<vscale x 32 x i8> %res, i32 1)
  ret <vscale x 16 x i8> %v2
}

What you would be testing here is that whatever register is assigned to the input parameter, it is shuffled around so that the result is returned in the correct output register.

If you agree on this approach though, I think this deserves a separate patch.

One last (minor) extra reason for splitting things as I suggested: if we make these tests ld<N>-free, we won't have to bother updating or extending them whenever we do any work around ld<N>.

c-rhodes added inline comments.Mar 19 2020, 3:05 AM
llvm/test/CodeGen/AArch64/sve-intrinsics-loads-with-extract.ll
23 ↗(On Diff #248735)

I implemented these tests before we had calling convention support for passing and returning tuples of scalable vectors to/from functions (downstream), so the extract was necessary. There are already tests in D75674 covering tuple.get, similar to what you suggested, which should be sufficient (?). I can remove these if they're not adding anything.

fpetrogalli added inline comments.Mar 19 2020, 5:05 AM
llvm/test/CodeGen/AArch64/sve-intrinsics-loads-with-extract.ll
23 ↗(On Diff #248735)

I think you can safely remove them. As you said, they were written when we didn't have the tuple calling conventions, but now we do, and they are tested in https://reviews.llvm.org/D75674.

Nit: it might not be a problem, but in https://reviews.llvm.org/D75674 I cannot see tests that check what happens when a tuple like <vscale x 8 x i64> is passed to a function definition. I see tests that take multiple "one-register" parameters as input, build a tuple with an intrinsic, and pass it to a tuple.get intrinsic or to a function call, but no test like the one I extracted, in which the tuple is an input to the define (sketched below). I'll leave a note there and draw @sdesmalen's attention to it, to see whether we should add them.
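
For concreteness, a minimal sketch of such a test (register assignment and CHECK lines assumed, intrinsic suffix following the convention used above):

define <vscale x 2 x i64> @get_tuple_elt2(<vscale x 8 x i64> %tuple) {
; CHECK-LABEL: get_tuple_elt2:
; CHECK: mov z0.d, z2.d
; CHECK-NEXT: ret
  ; The four-register tuple should arrive in z0-z3; extracting element 2
  ; should simply move z2 into the return register z0.
  %elt = call <vscale x 2 x i64> @llvm.aarch64.sve.tuple.get.nxv8i64(<vscale x 8 x i64> %tuple, i32 2)
  ret <vscale x 2 x i64> %elt
}

declare <vscale x 2 x i64> @llvm.aarch64.sve.tuple.get.nxv8i64(<vscale x 8 x i64>, i32)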

Thanks,

Francesco

c-rhodes updated this revision to Diff 266237.May 26 2020, 8:57 AM
c-rhodes edited the summary of this revision. (Show Details)

Changes:

  • Rebased.
  • Move lowering to DAGCombine pre-legalisation.
  • Added assert for VT.getVectorElementCount().Min % N == 0.
  • Removed test/CodeGen/AArch64/sve-intrinsics-loads-with-extract.ll.
  • Added missing overloaded argument types to intrinsics in test/CodeGen/AArch64/sve-intrinsics-loads.ll.
  • Removed SubRegIdx from SelectPredicatedLoad since AArch64::zsub0 is always used.
c-rhodes marked 4 inline comments as done.May 26 2020, 9:01 AM
c-rhodes added inline comments.
llvm/lib/Target/AArch64/AArch64ISelDAGToDAG.cpp
1421

Done. You'll have to rebase D77251 once this is merged.

sdesmalen added inline comments.Jun 8 2020, 9:48 AM
llvm/lib/Target/AArch64/AArch64ISelDAGToDAG.cpp
1461

Is the above statement correct, given that SDValue(N, NumVecs-1) was just replaced in the loop above?

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
9763

nit: call this LowerSVEStructLoad ?

sdesmalen accepted this revision.Jun 8 2020, 9:55 AM

LGTM if you can add the extra comment.

llvm/lib/Target/AArch64/AArch64ISelDAGToDAG.cpp
1461

Sorry, never mind that comment; I now see it is taking the chain from Load, not N.
Could you add a comment clarifying this, though (that it is copying the chain here)?

This revision is now accepted and ready to land.Jun 8 2020, 9:55 AM
c-rhodes updated this revision to Diff 269286.Jun 8 2020, 10:41 AM

Changes:

  • Rename LowerSVEStructuredLoad -> LowerSVEStructLoad.
  • Clarify copy chain in SelectPredicatedLoad.
c-rhodes marked 2 inline comments as done.Jun 8 2020, 10:43 AM
c-rhodes added inline comments.
llvm/lib/Target/AArch64/AArch64ISelDAGToDAG.cpp
1461

Yeah, it's just copying the chain. I agree it's confusing to use NumVecs as the index; I've updated it.

This revision was automatically updated to reflect the committed changes.