This is an archive of the discontinued LLVM Phabricator instance.

[llvm][CodeGen][SVE] Implement IR intrinsics for gather prefetch.
ClosedPublic

Authored by fpetrogalli on Mar 3 2020, 4:00 PM.

Details

Summary
Intrinsics and the corresponding codegen have been implemented for the following
SVE instructions:

1. PRF<T> <prfop>, <Pg>, [<Xn|SP>, <Zm>.S, <mod>] -> 32-bit          scaled offset
2. PRF<T> <prfop>, <Pg>, [<Xn|SP>, <Zm>.D, <mod>] -> 32-bit unpacked scaled offset
3. PRF<T> <prfop>, <Pg>, [<Xn|SP>, <Zm>.D]        -> 64-bit          scaled offset
4. PRF<T> <prfop>, <Pg>, [<Zn>.S{, #<imm>}]       -> 32-bit element
5. PRF<T> <prfop>, <Pg>, [<Zn>.D{, #<imm>}]       -> 64-bit element

The instructions are associated with the following intrinsics, respectively:

1. void @llvm.aarch64.sve.gather.prf<T>.scaled.<mod>.nxv4i32(
       i8* %base,
       <vscale x 4 x i32> %offset,
       <vscale x 4 x i1> %Pg,
       i32 %prfop)

2. void @llvm.aarch64.sve.gather.prf<T>.scaled.<mod>.nxv2i32(
       i8* %base,
       <vscale x 2 x i32> %offset,
       <vscale x 2 x i1> %Pg,
       i32 %prfop)

3. void @llvm.aarch64.sve.gather.prf<T>.scaled.nxv2i64(
       i8* %base,
       <vscale x 2 x i64> %offset,
       <vscale x 2 x i1> %Pg,
       i32 %prfop)

4. void @llvm.aarch64.sve.gather.prf<T>.nxv4i32(
       <vscale x 4 x i32> %bases,
       i64 %imm,
       <vscale x 4 x i1> %Pg,
       i32 %prfop)

5. void @llvm.aarch64.sve.gather.prf<T>.nxv2i64(
       <vscale x 2 x i64> %bases,
       i64 %imm,
       <vscale x 2 x i1> %Pg,
       i32 %prfop)
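
For illustration, a call to the first of these intrinsics could look as
follows. This is a sketch based on the signatures above; the prfop value
(1, assumed here to encode PLDL1STRM) and the codegen shown in the comment
are assumptions, not committed output.

; Prefetch bytes at %base + zext(%offset), under predicate %pg.
define void @prefetch_bytes(i8* %base, <vscale x 4 x i32> %offset,
                            <vscale x 4 x i1> %pg) {
  call void @llvm.aarch64.sve.gather.prfb.scaled.uxtw.nxv4i32(
      i8* %base, <vscale x 4 x i32> %offset, <vscale x 4 x i1> %pg, i32 1)
  ret void
}
; Expected to lower to something like:
;   prfb pldl1strm, p0, [x0, z0.s, uxtw]
declare void @llvm.aarch64.sve.gather.prfb.scaled.uxtw.nxv4i32(i8*, <vscale x 4 x i32>, <vscale x 4 x i1>, i32)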

The intrinsics are the IR counterparts of the following SVE ACLE functions:

* void svprf<T>(svbool_t pg, const void *base, svprfop op)
* void svprf<T>_vnum(svbool_t pg, const void *base, int64_t vnum, svprfop op)
* void svprf<T>_gather[_u32base](svbool_t pg, svuint32_t bases, svprfop op)
* void svprf<T>_gather[_u64base](svbool_t pg, svuint64_t bases, svprfop op)
* void svprf<T>_gather_[s32]offset(svbool_t pg, const void *base, svint32_t offsets, svprfop op)
* void svprf<T>_gather_[u32]offset(svbool_t pg, const void *base, svuint32_t offsets, svprfop op)
* void svprf<T>_gather_[s64]offset(svbool_t pg, const void *base, svint64_t offsets, svprfop op)
* void svprf<T>_gather_[u64]offset(svbool_t pg, const void *base, svuint64_t offsets, svprfop op)
* void svprf<T>_gather[_u32base]_offset(svbool_t pg, svuint32_t bases, int64_t offset, svprfop op)
* void svprf<T>_gather[_u64base]_offset(svbool_t pg, svuint64_t bases, int64_t offset, svprfop op)

Reviewers: andwar, sdesmalen, efriedma, rengolin

Subscribers: tschuett, hiraditya, rkruppe, psnobl, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D75580

Diff Detail

Event Timeline

fpetrogalli created this revision. Mar 3 2020, 4:00 PM
Herald added a project: Restricted Project.
andwar added a comment. Mar 4 2020, 6:03 AM

Hi @fpetrogalli, thank you for working on this!

My main points:

  • ACLE allows scalar indices that may be out of range for PRFH <prfop>, <Pg>, [<Zn>.S{, #<imm>}]. Will we be able to cater for those with your approach?
  • Could you please remove the extra empty lines from test files? (and make sure that the formatting is consistent)
  • We tend to put declarations at the end of the test files (I've not seen a test with a different style, but I've not seen them all either ;-) )
llvm/include/llvm/IR/IntrinsicsAArch64.td
1267

This is an unwritten rule, but so far class definitions and intrinsic definitions have been kept separate (i.e. all class definitions first, followed by all intrinsic definitions). Some classes are re-used by many intrinsics, so the two definitions (class and intrinsic) can end up far apart anyway.

For consistency's sake I'd separate the two.

1270

[nit] I think that this way would be a bit more consistent (one space before the comment)

class SVE_prf_scaled
    : Intrinsic<[],
                [
                  llvm_anyptr_ty, // Base address
                  llvm_anyvector_ty, // offsets
                  LLVMScalarOrSameVectorWidth<1, llvm_i1_ty>, // Predicate
                  llvm_i32_ty // prfop
                ],
                [IntrArgMemOnly, NoCapture<0>, ImmArg<3>]>;
1284

[nit] Base addresses

llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td
13

Please, could this block of code (and its sub-blocks) be documented/commented consistently with the rest of the file? E.g.:

// Prefetches - pattern definitions
//

Having said that, this file is growing organically so we may want to revisit the current style.

18

Why extra indentation?

40

IMO this would be easier to read if it was split across two lines.

llvm/test/CodeGen/AArch64/sve-intrinsics-gather-prefetches-scaled-offset.ll
263

[nit] many empty lines

llvm/test/CodeGen/AArch64/sve-intrinsics-gather-prefetches-vect-base-imm-offset.ll
11

What about indices that are out of range for <prfop>, <Pg>, [<Zn>.S{, #<imm>}]? IIUC, this is similar to what's tested here:

https://github.com/llvm/llvm-project/blob/e60c28746b0cf30323e736f34048b02ff34688f6/llvm/test/CodeGen/AArch64/sve-intrinsics-gather-loads-vector-base-imm-offset.ll#L284

Currently this is neither supported nor tested.

37

[nit] empty line

sdesmalen added inline comments. Mar 4 2020, 6:07 AM
llvm/include/llvm/IR/IntrinsicsAArch64.td
1289

Because the ACLE intrinsic doesn't require the offset to be an immediate, we don't want ImmArg<1> forcing it to be an immediate in the LLVM IR intrinsic; rather, leave it to CodeGen to make sure it generates the appropriate instruction.

llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td
40

nit: format to split statement on two lines?

llvm/lib/Target/AArch64/SVEInstrFormats.td
6463

nit: this is some odd formatting here.

llvm/test/CodeGen/AArch64/sve-intrinsics-gather-prefetches-scaled-offset.ll
5

nit: Most tests have the declarations at the bottom.
nit: There are some random newlines in this file.

fpetrogalli marked an inline comment as done. Mar 4 2020, 6:58 AM
fpetrogalli added inline comments.
llvm/test/CodeGen/AArch64/sve-intrinsics-gather-prefetches-scaled-offset.ll
133

The offset vector here should be <vscale x 2 x i32>, not <vscale x 2 x i64>. I'll fix this.

fpetrogalli marked 16 inline comments as done. Mar 9 2020, 7:37 AM

Hi @fpetrogalli, thank you for working on this!

My main points:

  • ACLE allows scalar indices that may be out of range for PRFH <prfop>, <Pg>, [<Zn>.S{, #<imm>}]. Will we be able to cater for those with your approach?

Yep, I have added a test to cover all the cases in which the intrinsic cannot be lowered to such an instruction (run-time values or an invalid immediate).
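
For example, something along these lines (a sketch of the kind of case such a test covers; the intrinsic name follows the summary above, and the fallback described in the comment is an assumption):

; #125 is not a multiple of 4, so it is not a valid immediate for
; PRFW <prfop>, <Pg>, [<Zn>.S{, #<imm>}]; codegen must instead
; materialize the offset in a scalar register and use a
; scalar-plus-vector addressing mode.
call void @llvm.aarch64.sve.gather.prfw.nxv4i32(<vscale x 4 x i32> %bases, i64 125, <vscale x 4 x i1> %pg, i32 0)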

  • Could you please remove the extra empty lines from test files? (and make sure that the formatting is consistent)

Done! Please double check I haven't missed any.

  • We tend to put declarations at the end of the test files (I've not seen a test with a different style, but I've not seen them all either ;-) )

Done.

llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td
13

I have removed these PatFrags; they are not needed anymore in the latest patch.

18

PatFrags removed from the patch.

llvm/test/CodeGen/AArch64/sve-intrinsics-gather-prefetches-scaled-offset.ll
133

This is now working; the offset uses the correct scalable vector type, <vscale x 2 x i32>.

fpetrogalli marked 2 inline comments as done.

Hi @sdesmalen and @andwar,

thank you for your review!

I have addressed all your comments; please let me know if I have missed any.

Please note that this patch now handles unpacked vector offsets and invalid immediate scalar offsets.

Grazie,

Francesco

fpetrogalli edited the summary of this revision. Mar 9 2020, 7:43 AM

@fpetrogalli Thanks for the update - this has evolved into a very neat patch! I've left a few nits, nothing major. One additional nice-to-have: could the commit message contain the list of intrinsics that you are adding? Personally I think that making references to ACLE there is not needed, but I don't mind it being there.

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
12957

This method is only needed for unpacked offsets, right? Maybe worth stating that for all other cases the input node is already OK? Otherwise I read this as:

  • do something for unpacked 32-bit offsets; in all other cases "I don't know".

And in reality it's more like:

  • do something for unpacked 32-bit offsets; in all other cases it's already OK.
12963

bail off -> bail out

12967

[nit] IMO the comment and the code are inconsistent. What about this:

const SDLoc DL(N);

// Extend the unpacked offset vector to 64-bit lanes.
Offset = DAG.getNode(ISD::ANY_EXTEND, DL, MVT::nxv2i64, Offset);

SDValue Ops[] = {
    N->getOperand(0),   // Chain
    N->getOperand(1),   // Intrinsic ID
    N->getOperand(2),   // Base
    Offset,             // Offset
    N->getOperand(4),   // Pg
    N->getOperand(5)    // prfop
};

This way the other comments are not unnecessarily pushed away.

12980

[nit] represent -> represents
[nit] Inconsistent comment style - you use // here, but you used /// before. Since these are static methods, I've been using // everywhere, but the file is inconsistent in this respect. So choose the one you like best :)

12988

Very nice and useful method :) Could you please generalise the name and use it in performGatherLoadCombine and performScatterStoreCombine?

It's an identical condition, just written horribly by me :)

13008

Hm, it's a bit more like: Check whether we need to remap? Yes - remap, No - all good! The way it's written now suggests that this method will always remap.

13016

Broken indentation.

13023

... and by updating the Intrinsic ID ;-)

llvm/lib/Target/AArch64/SVEInstrFormats.td
6494

[nit] Unwritten rule - Pat always follows InstAlias.

6861

[nit] Pat follows InstAlias

llvm/test/CodeGen/AArch64/sve-intrinsics-gather-prefetches-scaled-offset.ll
69

[nit] Inconsistent formatting

sdesmalen added inline comments. Mar 10 2020, 4:47 AM
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
12957

nit: this comment is a bit confusing. It doesn't so much 'combine' anything as 'legalize the offset'.

12961

nit: unnecessary use of const (same for other places in this patch).

13027

Given that you've gone the route of doing this in ISelLowering rather than having ComplexPatterns, it's probably better to create ISD nodes rather than passing the intrinsics around. This means we can later reuse them if there is ever a llvm.gather.prefetch, but also to streamline the implementation of prefetches with that of the LD1 gathers. It would also do away with having to pass the operands explicitly like this and thus simplify these combines.
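
(For context, the shape of that suggestion might be something like the sketch below, where AArch64ISD::GATHER_PRF is a hypothetical node name and Chain, Pg, Base, Offset and PrfOp are assumed to have been extracted from the intrinsic's operands; this is not the committed design:)

// Lower the intrinsic to a dedicated target node once, so that later
// combines match on the node instead of re-checking intrinsic IDs.
SDValue Ops[] = {Chain, Pg, Base, Offset, PrfOp};
return DAG.getNode(AArch64ISD::GATHER_PRF, SDLoc(N), MVT::Other, Ops);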

sdesmalen added inline comments. Mar 12 2020, 8:58 AM
llvm/include/llvm/IR/IntrinsicsAArch64.td
1272

I realise this only now, but having the predicate as the third operand is different from the gather loads. Can you make it the first operand, rather than the third, to streamline their definition with the gather loads?

1282

Same here.

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
13027

We discussed this offline and decided it's fine for now to use the intrinsics directly. We may revisit this later when we clean up the logic in this file for mapping the gathers/scatters/prefetches.

fpetrogalli marked 21 inline comments as done. Mar 12 2020, 3:26 PM
fpetrogalli added inline comments.
llvm/include/llvm/IR/IntrinsicsAArch64.td
1272

Good point. I have redefined the class as

class SVE_gather_prf_scalar_base_vector_offset_scaled
    : Intrinsic<[],
                [
                  LLVMScalarOrSameVectorWidth<0, llvm_i1_ty>, // Predicate
                  llvm_ptr_ty,        // Base address
                  llvm_anyvector_ty,  // Offsets
                  llvm_i32_ty         // Prfop
                ],
                [IntrInaccessibleMemOrArgMemOnly, NoCapture<0>, ImmArg<3>]>;

and consequently redefined the intrinsics from, say, declare void @llvm.aarch64.sve.gather.prfb.scaled.uxtw.nxv4i32(i8* %base, <vscale x 4 x i32> %offset, <vscale x 4 x i1> %Pg, i32 %prfop) to declare void @llvm.aarch64.sve.gather.prfb.scaled.uxtw.nxv4i32(<vscale x 4 x i1> %Pg, i8* %base, <vscale x 4 x i32> %offset, i32 %prfop), but I get llc runtime failures like the following:

Wrong types for attribute: inalloca nest noalias nocapture nonnull readnone readonly signext sret zeroext byval dereferenceable(1) dereferenceable_or_null(1)
void (<vscale x 4 x i1>, i8*, <vscale x 4 x i32>, i32)* @llvm.aarch64.sve.gather.prfb.scaled.uxtw.nxv4i32

I cannot figure out what I am doing wrong though... Am I right assuming that llvm_anyvector_ty is still the overloaded type that needs to be mentioned in the name of the intrinsic in IR? (nxv4i32 for the example reported here).
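
For reference, this is the declaration from the error message above, with the overloaded llvm_anyvector_ty mangled into the intrinsic name as the .nxv4i32 suffix:

declare void @llvm.aarch64.sve.gather.prfb.scaled.uxtw.nxv4i32(
    <vscale x 4 x i1>, i8*, <vscale x 4 x i32>, i32)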

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
12957

I am not sure what you want me to do here. If it helps, I have updated the comment of the method to the following:

/// Legalize the gather prefetch (scalar + vector addressing mode) when the
/// offset vector is an unpacked 32-bit scalable vector. The other cases (Offset
/// != nxv2i32) do not need legalization.
12961

Goodbye, beloved const. :)

13008

I have updated the comment of the method to the following; let me know if it is still unclear.

/// Combines a node carrying the intrinsic `aarch64_sve_gather_prf<T>` into a
/// node that uses `aarch64_sve_gather_prf<T>_scaled_uxtw` when the scalar
/// offset passed to `aarch64_sve_gather_prf<T>` is not a valid immediate for
/// the sve gather prefetch instruction with vector plus immediate addressing
/// mode.
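
In outline, the remap might look like this (a simplified sketch: isValidImmForSVEVecImmAddrMode is the helper named in the build warnings quoted at the end of this thread, and ScalarSizeInBytes is an assumed parameter):

// Remap only when the scalar offset is not a constant, or the constant
// is not a valid immediate for vector-plus-immediate addressing.
ConstantSDNode *OffsetConst = dyn_cast<ConstantSDNode>(Offset.getNode());
if (!OffsetConst || !isValidImmForSVEVecImmAddrMode(
                        OffsetConst->getZExtValue(), ScalarSizeInBytes)) {
  // Rebuild the node with the ...scaled.uxtw intrinsic ID, moving the
  // scalar offset into the base position and the vector of bases into
  // the offset position.
}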
13016

Hum, CI would have caught it. I usually run git clang-format on my patches before submitting them. I'll double-check that I haven't missed this one.

13027

Thank you!

fpetrogalli updated this revision to Diff 250075. Edited Mar 12 2020, 3:34 PM
fpetrogalli marked 5 inline comments as done.

Hi @sdesmalen, @andwar.

Thank you for your reviews.

@sdesmalen, I got into trouble when trying to reorder the operands. I have reported a runtime error that llc is spitting out (see inline comment). I'll keep looking at it, as the request absolutely makes sense, but let me know if you can already spot what I am doing wrong from the example.

@andwar - I'd like to keep the list of SVE ACLE functions in the commit message. It seems quite a useful piece of information to carry around if someone asks "which ACLE functions are these intrinsics implementing?".

Other than that, I think I have covered all the feedback. Let me know if I missed anything.

Thank you!

Francesco

fpetrogalli edited the summary of this revision. Mar 12 2020, 3:34 PM

Thank you for addressing my comments @fpetrogalli!

One final nice-to-have (not a blocker!). Could this snippet from legalizeSVEGatherPrefetchOffsVec be extracted into a separate function:

// Not an unpacked vector, bail out.
if (Offset.getValueType().getSimpleVT().SimpleTy != MVT::nxv2i32)
  return SDValue();

// Extend the unpacked offset vector to 64-bit lanes.
SDLoc DL(N);
Offset = DAG.getNode(ISD::ANY_EXTEND, DL, MVT::nxv2i64, Offset);

and re-used by performGatherLoadCombine and performScatterStoreCombine? We can do it in a separate patch too.
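
Such a helper might look like this (a sketch with a hypothetical name; the body is lifted from the snippet above):

// Extend an unpacked nxv2i32 offset vector to 64-bit lanes; return
// whether the offset was unpacked and has been extended.
static bool extendUnpackedOffsetIfNeeded(SDValue &Offset, const SDLoc &DL,
                                         SelectionDAG &DAG) {
  if (Offset.getValueType().getSimpleVT().SimpleTy != MVT::nxv2i32)
    return false;
  Offset = DAG.getNode(ISD::ANY_EXTEND, DL, MVT::nxv2i64, Offset);
  return true;
}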

llvm/include/llvm/IR/IntrinsicsAArch64.td
1272

Am I right assuming that llvm_anyvector_ty is still the overloaded type that needs to be mentioned in the name of the intrinsic in IR? (nxv4i32 for the example reported here).

I think so, but IIUC this (from your example that didn't work):

LLVMScalarOrSameVectorWidth<0, llvm_i1_ty>, // Predicate

means the following: this operand has the same number of elements as the 0th overloaded type. Which is why, I think, you hit that problem ^^^.

sdesmalen added inline comments. Mar 13 2020, 4:23 AM
llvm/include/llvm/IR/IntrinsicsAArch64.td
1272

The pointer is now the second argument, and I suspect you forgot to update NoCapture<0> to NoCapture<1>.

I have reordered the operands as requested by @sdesmalen.

Thank you.

fpetrogalli marked 6 inline comments as done. Mar 13 2020, 8:34 AM
fpetrogalli added inline comments.
llvm/include/llvm/IR/IntrinsicsAArch64.td
1272

Thanks! I missed that.
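
For reference, with that fix the class quoted earlier becomes the following (a sketch of the corrected definition, not necessarily the exact committed code):

class SVE_gather_prf_scalar_base_vector_offset_scaled
    : Intrinsic<[],
                [
                  LLVMScalarOrSameVectorWidth<0, llvm_i1_ty>, // Predicate
                  llvm_ptr_ty,        // Base address (operand 1, hence NoCapture<1>)
                  llvm_anyvector_ty,  // Offsets
                  llvm_i32_ty         // Prfop
                ],
                [IntrInaccessibleMemOrArgMemOnly, NoCapture<1>, ImmArg<3>]>;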

1272

Just a clarification: the 0-th operand refers to the 0-th overloaded operand, which is still llvm_anyvector_ty.

@sdesmalen was right; it's the NoCapture that was causing problems.

All good now :).

Thank you both.

fpetrogalli marked 2 inline comments as done. Mar 13 2020, 11:44 AM

Thank you for addressing my comments @fpetrogalli!

One final nice-to-have (not a blocker!). Could this snippet from legalizeSVEGatherPrefetchOffsVec be extracted into a separate function:

// Not an unpacked vector, bail out.
if (Offset.getValueType().getSimpleVT().SimpleTy != MVT::nxv2i32)
  return SDValue();

// Extend the unpacked offset vector to 64-bit lanes.
SDLoc DL(N);
Offset = DAG.getNode(ISD::ANY_EXTEND, DL, MVT::nxv2i64, Offset);

and re-used by performGatherLoadCombine and performScatterStoreCombine? We can do it in a separate patch too.

Is it worth it? The logic of these methods seems quite different to me. What performGatherLoadCombine and performScatterStoreCombine do is "do something in any case; if the offset is unpacked, extend it". What legalizeSVEGatherPrefetchOffsVec does is "do nothing, unless the offset is unpacked". In the end, the only thing I can see that could be merged is the Offset = DAG.getNode(...) invocation that inserts the ANY_EXTEND. Not much worth factoring out.

Francesco

This revision is now accepted and ready to land. Mar 16 2020, 2:28 AM
This revision was automatically updated to reflect the committed changes.

Hello! Thanks for this patch. In release builds, I'm seeing lots of warnings like:

In file included from ../lib/Target/AArch64/AsmParser/AArch64AsmParser.cpp:9:
../lib/Target/AArch64/MCTargetDesc/AArch64AddressingModes.h:850:13: warning: unused function 'isValidImmForSVEVecImmAddrMode' [-Wunused-function]
static bool isValidImmForSVEVecImmAddrMode(unsigned OffsetInBytes,
            ^

Also, there's one instance:

../lib/Target/AArch64/AArch64ISelLowering.cpp:12714:21: warning: unused variable 'OffsetConst' [-Wunused-variable]
    ConstantSDNode *OffsetConst = dyn_cast<ConstantSDNode>(Offset.getNode());
                    ^

Can you please take a look?

Hi @nickdesaulniers,

thank you for pointing this out! Someone has been faster than me in reacting to this! https://github.com/llvm/llvm-project/commit/f20dcc31e31fdb94159572af1e2a87dcc5d02bd8 (@vitalybuka - thank you for your patch)

Please let me know if you see any other problems with the patch.

Kind regards,

Francesco