Intrinsics and the corresponding codegen have been implemented for the following SVE instructions:

1. PRF<T> <prfop>, <Pg>, [<Xn|SP>, <Zm>.S, <mod>] -> 32-bit scaled offset
2. PRF<T> <prfop>, <Pg>, [<Xn|SP>, <Zm>.D, <mod>] -> 32-bit unpacked scaled offset
3. PRF<T> <prfop>, <Pg>, [<Xn|SP>, <Zm>.D]        -> 64-bit scaled offset
4. PRF<T> <prfop>, <Pg>, [<Zn>.S{, #<imm>}]       -> 32-bit element
5. PRF<T> <prfop>, <Pg>, [<Zn>.D{, #<imm>}]       -> 64-bit element

The instructions are associated with the following intrinsics, respectively:

1. void @llvm.aarch64.sve.gather.prf<T>.scaled.<mod>.nxv4i32(
     i8* %base,
     <vscale x 4 x i32> %offset,
     <vscale x 4 x i1> %Pg,
     i32 %prfop)

2. void @llvm.aarch64.sve.gather.prf<T>.scaled.<mod>.nxv2i32(
     i8* %base,
     <vscale x 2 x i32> %offset,
     <vscale x 2 x i1> %Pg,
     i32 %prfop)

3. void @llvm.aarch64.sve.gather.prf<T>.scaled.nxv2i64(
     i8* %base,
     <vscale x 2 x i64> %offset,
     <vscale x 2 x i1> %Pg,
     i32 %prfop)

4. void @llvm.aarch64.sve.gather.prf<T>.nxv4i32(
     <vscale x 4 x i32> %bases,
     i64 %imm,
     <vscale x 4 x i1> %Pg,
     i32 %prfop)

5. void @llvm.aarch64.sve.gather.prf<T>.nxv2i64(
     <vscale x 2 x i64> %bases,
     i64 %imm,
     <vscale x 2 x i1> %Pg,
     i32 %prfop)

The intrinsics are the IR counterpart of the following SVE ACLE functions:

* void svprf<T>(svbool_t pg, const void *base, svprfop op)
* void svprf<T>_vnum(svbool_t pg, const void *base, int64_t vnum, svprfop op)
* void svprf<T>_gather[_u32base](svbool_t pg, svuint32_t bases, svprfop op)
* void svprf<T>_gather[_u64base](svbool_t pg, svuint64_t bases, svprfop op)
* void svprf<T>_gather_[s32]offset(svbool_t pg, const void *base, svint32_t offsets, svprfop op)
* void svprf<T>_gather_[u32]offset(svbool_t pg, const void *base, svuint32_t offsets, svprfop op)
* void svprf<T>_gather_[s64]offset(svbool_t pg, const void *base, svint64_t offsets, svprfop op)
* void svprf<T>_gather_[u64]offset(svbool_t pg, const void *base, svuint64_t offsets, svprfop op)
* void svprf<T>_gather[_u32base]_offset(svbool_t pg, svuint32_t bases, int64_t offset, svprfop op)
* void svprf<T>_gather[_u64base]_offset(svbool_t pg, svuint64_t bases, int64_t offset, svprfop op)

Reviewers: andwar, sdesmalen, efriedma, rengolin

Subscribers: tschuett, hiraditya, rkruppe, psnobl, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D75580
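For illustration, below is a minimal sketch (not part of this patch) of how one of the ACLE prefetch functions listed above could be used from C/C++ once the corresponding Clang support is in place. The function name and the SV_PLDL1KEEP enumerator follow the ACLE specification; the expected lowering is the 32-bit scaled-offset form from item (1) above, via the @llvm.aarch64.sve.gather.prfb.* intrinsic.

    // Sketch only: requires arm_sve.h and SVE code generation
    // (e.g. -march=armv8.2-a+sve); the Clang/ACLE side is separate
    // from this LLVM-side patch.
    #include <arm_sve.h>

    // For each active lane of pg, prefetch the byte at base + offsets[i].
    // Expected to lower to: PRFB <prfop>, <Pg>, [<Xn|SP>, <Zm>.S, UXTW]
    void prefetch_gather(const void *base, svuint32_t offsets, svbool_t pg) {
      svprfb_gather_u32offset(pg, base, offsets, SV_PLDL1KEEP);
    }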
Hi @fpetrogalli, thank you for working on this!
My main points:
- ACLE allows scalar indices that may be out of range for PRFH <prfop>, <Pg>, [<Zn>.S{, #<imm>}]. Will we be able to cater for those with your approach?
- Could you please remove the extra empty lines from test files? (and make sure that the formatting is consistent)
- We tend to put declarations at the end of the test files (I've not seen a test with a different style, but I've not seen them all either ;-) )
llvm/include/llvm/IR/IntrinsicsAArch64.td

- 1263: This is an unwritten rule, but so far class definitions and intrinsic definitions are kept separately (i.e. all class definitions first, followed by all intrinsic definitions). Some classes are re-used by many intrinsics, so the two definitions (class and intrinsic) can end up being far apart anyway. For consistency's sake I'd separate the two.
- 1266: [nit] I think that this way would be a bit more consistent (one space before the comment):

      class SVE_prf_scaled : Intrinsic<[], [
          llvm_anyptr_ty, // Base address
          llvm_anyvector_ty, // offsets
          LLVMScalarOrSameVectorWidth<1, llvm_i1_ty>, // Predicate
          llvm_i32_ty // prfop
        ],
        [IntrArgMemOnly, NoCapture<0>, ImmArg<3>]>;

- 1280: [nit] Base addresses
llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td

- 13: Please, could this block of code (and its sub-blocks) be documented/commented consistently with the rest of the file? E.g.:

      // Prefetches - pattern definitions

  Having said that, this file is growing organically, so we may want to revisit the current style.
- 18: Why extra indentation?
- 40: IMO this would be easier to read if it was split across two lines.
llvm/test/CodeGen/AArch64/sve-intrinsics-gather-prefetches-scaled-offset.ll

- 263: [nit] many empty lines
llvm/test/CodeGen/AArch64/sve-intrinsics-gather-prefetches-vect-base-imm-offset.ll

- 11: What about indices that are out of range for <prfop>, <Pg>, [<Zn>.S{, #<imm>}]? IIUC, this is similar to what's tested here: Currently this is neither supported nor tested.
- 37: [nit] empty line
llvm/include/llvm/IR/IntrinsicsAArch64.td

- 1285: Because the ACLE intrinsic doesn't require the offset to be an immediate, we don't want ImmArg<1> on the offset in the LLVM IR intrinsic; rather, leave it to CodeGen to make sure it generates the appropriate instruction.

llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td

- 40: nit: format to split the statement across two lines?

llvm/lib/Target/AArch64/SVEInstrFormats.td

- 6463: nit: this is some odd formatting here.

llvm/test/CodeGen/AArch64/sve-intrinsics-gather-prefetches-scaled-offset.ll

- 5: nit: Most tests have the declarations at the bottom.
llvm/test/CodeGen/AArch64/sve-intrinsics-gather-prefetches-scaled-offset.ll

- 133: The offset vector here should be <vscale x 2 x i32>, not <vscale x 2 x i64>. I'll fix this.
- ACLE allows scalar indices that may be out of range for PRFH <prfop>, <Pg>, [<Zn>.S{, #<imm>}]. Will we be able to cater for those with your approach?

Yep, I have added a test to cover all the cases in which the intrinsic cannot be lowered to such an instruction (run-time values or invalid immediates).
- Could you please remove the extra empty lines from test files? (and make sure that the formatting is consistent)
Done! Please double check I haven't missed any.
- We tend to put declarations at the end of the test files (I've not seen a test with a different style, but I've not seen them all either ;-) )
Done.
llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td

- 13: I have removed these PatFrags; they are not needed anymore in the latest patch.
- 18: PatFrags removed from the patch.

llvm/test/CodeGen/AArch64/sve-intrinsics-gather-prefetches-scaled-offset.ll

- 133: This is now working; the offset uses the correct scalable vector <vscale x 2 x i32>.
Hi @sdesmalen and @andwar,
thank you for your review!
I have addressed all your comments; please let me know if I have missed any.
Please note that this patch now handles unpacked vector offsets and invalid immediate scalar offsets.
Grazie,
Francesco
@fpetrogalli Thanks for the update - this has evolved into a very neat patch! I've left a few nits, nothing major. One additional nice-to-have: could the commit message contain the list of intrinsics that you are adding? Personally I think that making references to ACLE there is not needed, but I don't mind it being there.
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp

- 12954: This method is only needed for unpacked offsets, right? Maybe worth stating that for all other cases the input node is already OK? Otherwise I read this as:
- 12960: bail off -> bail out
- 12964: [nit] IMO the comment and the code are inconsistent. What about this:

      // Extend the unpacked offset vector to 64-bit lanes.
      const SDLoc DL(N);
      Offset = DAG.getNode(ISD::ANY_EXTEND, DL, MVT::nxv2i64, Offset);
      SDValue Ops[] = {
        N->getOperand(0), // Chain
        N->getOperand(1), // Intrinsic ID
        N->getOperand(2), // Base
        Offset,           // Offset
        N->getOperand(4), // Pg
        N->getOperand(5)  // prfop
      };

  This way the other comments are not unnecessarily pushed away.
- 12977: [nit] represent -> represents
- 12985: Very nice and useful method :) Could you please generalise the name and use it in performGatherLoadCombine and performScatterStoreCombine:
  It's an identical condition, just written horribly by me :)
- 13005: Hm, it's a bit more like: Check whether we need to remap? Yes - remap, No - all good! The way it's written now suggests that this method will always remap.
- 13013: Broken indentation.
- 13020: ... and by updating the Intrinsic ID ;-)

llvm/lib/Target/AArch64/SVEInstrFormats.td

- 6494: [nit] Unwritten rule - Pat always follows InstAlias.
- 6861: [nit] Pat follows InstAlias.

llvm/test/CodeGen/AArch64/sve-intrinsics-gather-prefetches-scaled-offset.ll

- 69: [nit] Inconsistent formatting
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp

- 12954: nit: this comment is a bit confusing. It doesn't so much 'combine' anything, but rather 'legalizes the offset'.
- 12958: nit: unnecessary use of const (same for other places in this patch).
- 13024: Given that you've gone the route of doing this in ISelLowering rather than having ComplexPatterns, it's probably better to create ISD nodes rather than passing the intrinsics around. This means we can later reuse them if there is ever an llvm.gather.prefetch, and it also allows us to streamline the implementation of prefetches with that of the LD1 gathers. It would also do away with having to pass the operands explicitly like this, and thus simplify these combines.
llvm/include/llvm/IR/IntrinsicsAArch64.td

- 1268: I realise this only now, but having the predicate as the third operand is different from the gather loads. Can you make this the first operand, rather than the third, to streamline their definition with the gather loads?
- 1278: Same here.

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp

- 13024: We discussed this offline and decided it's fine for now to use the intrinsics directly. We may revisit this later when we clean up the logic in this file for mapping the gathers/scatters/prefetches.
llvm/include/llvm/IR/IntrinsicsAArch64.td

- 1268: Good point. I have redefined the class as

      class SVE_gather_prf_scalar_base_vector_offset_scaled : Intrinsic<[], [
          LLVMScalarOrSameVectorWidth<0, llvm_i1_ty>, // Predicate
          llvm_ptr_ty, // Base address
          llvm_anyvector_ty, // Offsets
          llvm_i32_ty // Prfop
        ],
        [IntrInaccessibleMemOrArgMemOnly, NoCapture<0>, ImmArg<3>]>;

  and consequently redefined the intrinsics from, say,

      declare void @llvm.aarch64.sve.gather.prfb.scaled.uxtw.nxv4i32(i8* %base, <vscale x 4 x i32> %offset, <vscale x 4 x i1> %Pg, i32 %prfop)

  to

      declare void @llvm.aarch64.sve.gather.prfb.scaled.uxtw.nxv4i32(<vscale x 4 x i1> %Pg, i8* %base, <vscale x 4 x i32> %offset, i32 %prfop)

  but I get llc runtime failures like the following:

      Wrong types for attribute: inalloca nest noalias nocapture nonnull readnone readonly signext sret zeroext byval dereferenceable(1) dereferenceable_or_null(1)
      void (<vscale x 4 x i1>, i8*, <vscale x 4 x i32>, i32)* @llvm.aarch64.sve.gather.prfb.scaled.uxtw.nxv4i32

  I cannot figure out what I am doing wrong though... Am I right in assuming that llvm_anyvector_ty is still the overloaded type that needs to be mentioned in the name of the intrinsic in IR? (nxv4i32 for the example reported here.)

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp

- 12954: I am not sure what you want me to do here. If it helps, I have updated the comment of the method to the following:

      /// Legalize the gather prefetch (scalar + vector addressing mode) when the
      /// offset vector is an unpacked 32-bit scalable vector. The other cases
      /// (Offset != nxv2i32) do not need legalization.

- 12958: Goodbye, beloved const. :)
- 13005: I have updated the comment of the method to the following; let me know if it is still unclear.

      /// Combines a node carrying the intrinsic `aarch64_sve_gather_prf<T>` into a
      /// node that uses `aarch64_sve_gather_prf<T>_scaled_uxtw` when the scalar
      /// offset passed to `aarch64_sve_gather_prf<T>` is not a valid immediate for
      /// the sve gather prefetch instruction with vector plus immediate addressing
      /// mode.

- 13013: Hum, CI would have caught it. I usually run git clang-format on my patches before submitting them. I'll double check that I haven't missed this one.
- 13024: Thank you!
Hi @sdesmalen, @andwar.
Thank you for your reviews.
@sdesmalen, I ran into trouble when trying to reorder the operands. I have reported the runtime error that llc spits out (see the inline comment). I'll keep looking at it, as the request absolutely makes sense, but let me know if you can already spot what I am doing wrong from the example.
@andwar - I'd like to keep the list of SVE ACLE functions in the commit message. It seems quite a useful piece of information to carry around if someone asks "which ACLE functions are these intrinsics implementing?".
Other than that, I think I have covered all the feedback. Let me know if I missed anything.
Thank you!
Francesco
Thank you for addressing my comments, @fpetrogalli!
One final nice-to-have (not a blocker!). Could this snippet from legalizeSVEGatherPrefetchOffsVec be extracted into a separate function:

    // Not an unpacked vector, bail out.
    if (Offset.getValueType().getSimpleVT().SimpleTy != MVT::nxv2i32)
      return SDValue();

    // Extend the unpacked offset vector to 64-bit lanes.
    SDLoc DL(N);
    Offset = DAG.getNode(ISD::ANY_EXTEND, DL, MVT::nxv2i64, Offset);
and re-used by performGatherLoadCombine and performScatterStoreCombine? We can do it in a separate patch too.
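For illustration, here is a rough sketch of what such an extracted helper could look like; the function name and signature are hypothetical, and it only re-uses the calls already present in the snippet above:

    // Hypothetical helper sketched from the snippet above: if Offset is an
    // unpacked (nxv2i32) vector, any-extend it to nxv2i64 lanes; otherwise
    // return an empty SDValue so the caller can keep the offset unchanged.
    static SDValue extendUnpackedOffsetVector(SDValue Offset, SDNode *N,
                                              SelectionDAG &DAG) {
      if (Offset.getValueType().getSimpleVT().SimpleTy != MVT::nxv2i32)
        return SDValue();
      SDLoc DL(N);
      return DAG.getNode(ISD::ANY_EXTEND, DL, MVT::nxv2i64, Offset);
    }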
llvm/include/llvm/IR/IntrinsicsAArch64.td

- 1268: I think so, but IIUC this (from your example that didn't work):

      LLVMScalarOrSameVectorWidth<0, llvm_i1_ty>, // Predicate

  means the following: this operand has the same number of elements as the 0th operand. Which is why, I think, you hit that problem ^^^.
llvm/include/llvm/IR/IntrinsicsAArch64.td

- 1268: The pointer is now the second argument, and I suspect you forgot to update NoCapture<0> to NoCapture<1>.
llvm/include/llvm/IR/IntrinsicsAArch64.td

- 1268: Thanks! I missed that.
- 1268: Just a clarification: the 0-th operand refers to the 0-th overloaded operand, which is still llvm_anyvector_ty. @sdesmalen was right; it's the NoCapture that was causing problems. All good now :). Thank you both.
Is it worth it? The logic of these methods seems quite different to me. What performGatherLoadCombine and performScatterStoreCombine do is "do something in any case; if the offset is unpacked, extend it". What legalizeSVEGatherPrefetchOffsVec does is "do nothing, unless the offset is unpacked". In the end, the only piece I can see that could be merged is the Offset = DAG.getNode(...) invocation that inserts the ANY_EXTEND. Not much that is worth factoring out.
Francesco
Hello! Thanks for this patch. In release builds, I'm seeing lots of warnings like:

    In file included from ../lib/Target/AArch64/AsmParser/AArch64AsmParser.cpp:9:
    ../lib/Target/AArch64/MCTargetDesc/AArch64AddressingModes.h:850:13: warning: unused function 'isValidImmForSVEVecImmAddrMode' [-Wunused-function]
    static bool isValidImmForSVEVecImmAddrMode(unsigned OffsetInBytes,
                ^
Also, there's one instance:
    ../lib/Target/AArch64/AArch64ISelLowering.cpp:12714:21: warning: unused variable 'OffsetConst' [-Wunused-variable]
      ConstantSDNode *OffsetConst = dyn_cast<ConstantSDNode>(Offset.getNode());
                      ^
Can you please take a look?
Hi @nickdesaulniers,
thank you for pointing this out! Someone was faster than me in reacting to this: https://github.com/llvm/llvm-project/commit/f20dcc31e31fdb94159572af1e2a87dcc5d02bd8 (@vitalybuka, thank you for your patch).
Please let me know if you see any other problems with the patch.
Kind regards,
Francesco