Page MenuHomePhabricator

[llvm][CodeGen][SVE] Implement IR intrinsics for gather prefetch.
ClosedPublic

Authored by fpetrogalli on Mar 3 2020, 4:00 PM.

Details

Summary
Intrinsics and relative codegen has been implemented for the following
SVE instructions:

1. PRF<T> <prfop>, <Pg>, [<Xn|SP>, <Zm>.S, <mod>] -> 32-bit          scaled offset
2. PRF<T> <prfop>, <Pg>, [<Xn|SP>, <Zm>.D, <mod>] -> 32-bit unpacked scaled offset
3. PRF<T> <prfop>, <Pg>, [<Xn|SP>, <Zm>.D]        -> 64-bit          scaled offset
4. PRF<T> <prfop>, <Pg>, [<Zn>.S{, #<imm>}]       -> 32-bit element
5. PRF<T> <prfop>, <Pg>, [<Zn>.D{, #<imm>}]       -> 64-bit element

The instructions are associated the following intrinsics, respectively:

1. void @llvm.aarch64.sve.gather.prf<T>.scaled.<mod>.nx4vi32(
          i8* %base,
          <vscale x 4 x i32> %offset,
      <vscale x 4 x i1> %Pg,
      i32 %prfop)

2. void @llvm.aarch64.sve.gather.prf<T>.scaled.<mod>.nx2vi32(
          i8* %base,
          <vscale x 2 x i32> %offset,
      <vscale x 2 x i1> %Pg,
      i32 %prfop)

3. void @llvm.aarch64.sve.gather.prf<T>.scaled.nx2vi64(
          i8* %base,
      <vscale x 2 x i64> %offset,
      <vscale x 2 x i1> %Pg,
      i32 %prfop)

4. void @llvm.aarch64.sve.gather.prf<T>.nx4vi32(
          <vscale x 4 x i32> %bases,
      i64 %imm,
      <vscale x 4 x i1> %Pg,
      i32 %prfop)

5. void @llvm.aarch64.sve.gather.prf<T>.nx2vi64(
          <vscale x 2 x i64> %bases,
      i64 %imm,
      <vscale x 2 x i1> %Pg,
      i32 %prfop)

The intrinsics are the IR counterpart of the following SVE ACLE functions:

* void svprf<T>(svbool_t pg, const void *base, svprfop op)
* void svprf<T>_vnum(svbool_t pg, const void *base, int64_t vnum, svprfop op)
* void svprf<T>_gather[_u32base](svbool_t pg, svuint32_t bases, svprfop op)
* void svprf<T>_gather[_u64base](svbool_t pg, svuint64_t bases, svprfop op)
* void svprf<T>_gather_[s32]offset(svbool_t pg, const void *base, svint32_t offsets, svprfop op)
* void svprf<T>_gather_[u32]offset(svbool_t pg, const void *base, svint32_t offsets, svprfop op)
* void svprf<T>_gather_[s64]offset(svbool_t pg, const void *base, svint64_t offsets, svprfop op)
* void svprf<T>_gather_[u64]offset(svbool_t pg, const void *base, svint64_t offsets, svprfop op)
* void svprf<T>_gather[_u32base]_offset(svbool_t pg, svuint32_t bases, int64_t offset, svprfop op)
* void svprf<T>_gather[_u64base]_offset(svbool_t pg, svuint64_t bases,int64_t offset, svprfop op)

Reviewers: andwar, sdesmalen, efriedma, rengolin

Subscribers: tschuett, hiraditya, rkruppe, psnobl, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D75580

Diff Detail

Event Timeline

fpetrogalli created this revision.Mar 3 2020, 4:00 PM
Herald added a project: Restricted Project. · View Herald Transcript
andwar added a comment.Mar 4 2020, 6:03 AM

Hi @fpetrogalli, thank you for working on this!

My main points:

  • ACLE allows scalar indices that may be out of range for PRFH <prfop>, <Pg>, [<Zn>.S{, #<imm>}]. Will we be able to cater for those with your approach?
  • Could you please remove the extra empty lines from test files? (and make sure that the formatting is consistent)
  • We tend to put declarations at the end of the test files (I've not seen a test with a different style, but I've not seen them all either ;-) )
llvm/include/llvm/IR/IntrinsicsAArch64.td
1267

This is an unwritten rule, but so far class definitions and intrinsics definitions are kept separately (i.e. all class definitions first, followed by all intrinsics definitions). Some classes are re-used by many intrinsics, so the 2 definitions (class and intrinsics) can end up being far apart anyway.

For consistency sake I'd separate the two.

1270

[nit] I think that this way would be a bit more consistent (one space before the comment)

class SVE_prf_scaled
    : Intrinsic<[],
                [
                  llvm_anyptr_ty, // Base address
                  llvm_anyvector_ty, // offsets
                  LLVMScalarOrSameVectorWidth<1, llvm_i1_ty>, // Predicate
                  llvm_i32_ty // prfop
                ],
                [IntrArgMemOnly, NoCapture<0>, ImmArg<3>]>;
1284

[nit] Base addresses

llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td
13

Please, could this block) of code (and it's sunblocks) could be documented/commented consistently with the rest of the file? E.g.:

// Prefetches - pattern definitions
//

Having said that, this file is growing organically so we may want to revisit the current style.

18

Why extra indentation?

40

IMO this would be easier to read if it was split across two lines.

llvm/test/CodeGen/AArch64/sve-intrinsics-gather-prefetches-scaled-offset.ll
263

[nit] many empty lines

llvm/test/CodeGen/AArch64/sve-intrinsics-gather-prefetches-vect-base-imm-offset.ll
11

What about indices that are out of range for <prfop>, <Pg>, [<Zn>.S{, #<imm>}]? IIUC, this is similar to what's tested here:

https://github.com/llvm/llvm-project/blob/e60c28746b0cf30323e736f34048b02ff34688f6/llvm/test/CodeGen/AArch64/sve-intrinsics-gather-loads-vector-base-imm-offset.ll#L284

Currently this is neither supported nor tested.

37

[nit] empty line

sdesmalen added inline comments.Mar 4 2020, 6:07 AM
llvm/include/llvm/IR/IntrinsicsAArch64.td
1289

Because the ACLE intrinsic doesn't require the offset to be an immediate, we don't want ImmArg<1> to be an immediate in the LLVM IR intrinsic, and rather leave it to CodeGen to make sure it generates the appropriate instruction.

llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td
40

nit: format to split statement on two lines?

llvm/lib/Target/AArch64/SVEInstrFormats.td
6463

nit: this is some odd formatting here.

llvm/test/CodeGen/AArch64/sve-intrinsics-gather-prefetches-scaled-offset.ll
5

nit: Most tests have the declarations at the bottom.
nit: There are some random newlines in this file.

fpetrogalli marked an inline comment as done.Mar 4 2020, 6:58 AM
fpetrogalli added inline comments.
llvm/test/CodeGen/AArch64/sve-intrinsics-gather-prefetches-scaled-offset.ll
133

The offset vector here should be <vscale x 2 x i32>, not <vscale x 2 x i64>. I'll fix this.

fpetrogalli marked 16 inline comments as done.Mar 9 2020, 7:37 AM

Hi @fpetrogalli, thank you for working on this!

My main points:

  • ACLE allows scalar indices that may be out of range for PRFH <prfop>, <Pg>, [<Zn>.S{, #<imm>}]. Will we be able to cater for those with your approach?

Yep, I have added a test to cover all cases in which the intrinsic cannot be lowered to such instruction (run time values or invalid immediate).

  • Could you please remove the extra empty lines from test files? (and make sure that the formatting is consistent)

Done! Please double check I haven't missed any.

  • We tend to put declarations at the end of the test files (I've not seen a test with a different style, but I've not seen them all either ;-) )

Done.

llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td
13

I have removed these PatFrag, they are not needed anymore in last patch.

18

PatFrags removed from the patch.

llvm/test/CodeGen/AArch64/sve-intrinsics-gather-prefetches-scaled-offset.ll
133

This is now working, the offset uses the correct scalable vector <vscale x 2 x i32>.

fpetrogalli marked 2 inline comments as done.

Hi @sdesmalen and @andwar,

thank you for your review!

I have addressed all your comments, please let me know if I have missed any.

Please notice that this patch now handles unpacked vector offsets and invalid immediate scalar offset.

Grazie,

Francesco

fpetrogalli edited the summary of this revision. (Show Details)Mar 9 2020, 7:43 AM

@fpetrogalli Thanks for the update - this has evolved into a very neat patch! I've left a few nits, nothing major. One additional nice-to-have: could the commit message contain the list of intrinsics that you are adding? Personally I think that making references to ACLE there is not needed, but I don't mind it being there.

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
12957

This method is only needed for unpacked offset, right? Maybe worth stating that for all other cases the input node is already OK? Otherwise I read this as:

  • do something for unpacked 32-bit offsets, in all other cases "I don't know".

    And in reality it's more like:
  • do something for unpacked 32-bit offsets, in all other cases it's already OK
12963

bail off -> bail out

12967

[nit] IMO the comment and the code are inconsistent. What about this:

// Extend the unpacked offset vector to 64-bit lanes.
Offset = DAG.getNode(ISD::ANY_EXTEND, DL, MVT::nxv2i64, Offset);

const SDLoc DL(N);
SDValue Ops[] = {
    N->getOperand(0),   // Chain
    N->getOperand(1),   // Intrinsic ID
    N->getOperand(2),   // Base
    Offset,             // Offset
    N->getOperand(4),   // Pg
    N->getOperand(5)    // prfop
};

This way the other comments are not unnecessarily pushed away.

12980

[nit] represent -> represents
[nit] Inconsistent comment style - you use // here, but you used /// before. Since these are static methods, I've been using // everywhere, but the file is inconsistent in this respect. So choose the one you like best :)

12988

Very nice and useful method :) Could you please generalise the name and use it in peformGatherLoadCombine and performScatterStoreCombine:

It's an identical condition, just written horribly by me :)

13008

Hm, it's a bit more like: Check whether we need to remap? Yes - remap, No - all good! The way it's written now suggests that this method will always remap.

13016

Broken indentation.

13023

... and by updating the Intrinsic ID ;-)

llvm/lib/Target/AArch64/SVEInstrFormats.td
6494

[nit] Unwritten rule - Pat always follow InstAlias.

6861

[nit] Pat follow InstAlias

llvm/test/CodeGen/AArch64/sve-intrinsics-gather-prefetches-scaled-offset.ll
69

[nit] Inconsistent formatting

sdesmalen added inline comments.Mar 10 2020, 4:47 AM
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
12957

nit: this comment is a bit confusing. It doesn't so much 'combine' anything, but it rather 'legalizes the offset'.

12961

nit: unnecessary use of const (same for other places in this patch).

13027

Given that you've gone the route of doing this in ISelLowering rather than having ComplexPatterns, it's probably better to create ISD nodes rather than passing the intrinsics around. This means we can later reuse them if there is ever a llvm.gather.prefetch, but also to streamline the implementation of prefetches with that of the LD1 gathers. It would also do away with having to pass the operands explicitly like this and thus simplify these combines.

sdesmalen added inline comments.Mar 12 2020, 8:58 AM
llvm/include/llvm/IR/IntrinsicsAArch64.td
1272

I realise this only now, but having the predicate as the third operand is different from the gather loads. Can you make this to be the first operand, rather than the third, to streamline their definition with the gather loads?

1282

Same here.

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
13027

We discussed this offline and decided it's fine for now to use the intrinsics directly. We may revisit this later when we clean up the logic in this file for mapping the gathers/scatters/prefetches.

fpetrogalli marked 21 inline comments as done.Mar 12 2020, 3:26 PM
fpetrogalli added inline comments.
llvm/include/llvm/IR/IntrinsicsAArch64.td
1272

Good point. I have redefined the class as

class SVE_gather_prf_scalar_base_vector_offset_scaled
    : Intrinsic<[],
                [
                  LLVMScalarOrSameVectorWidth<0, llvm_i1_ty>, // Predicate
                  llvm_ptr_ty, // Base address
                  llvm_anyvector_ty, // Offsets
                  llvm_i32_ty // Prfop
		],
                [IntrInaccessibleMemOrArgMemOnly, NoCapture<0>, ImmArg<3>]>;

and consequently redefined the intrinsics from, say, declare void @llvm.aarch64.sve.gather.prfb.scaled.uxtw.nx4vi32(i8* %base, <vscale x 4 x i32> %offset, <vscale x 4 x i1> %Pg, i32 %prfop), to declare void @llvm.aarch64.sve.gather.prfb.scaled.uxtw.nx4vi32(<vscale x 4 x i1> %Pg, i8* %base, <vscale x 4 x i32> %offset, i32 %prfop), but I get llc runtime failures as the following:

Wrong types for attribute: inalloca nest noalias nocapture nonnull readnone readonly signext sret zeroext byval dereferenceable(1) dereferenceable_or_null(1)
void (<vscale x 4 x i1>, i8*, <vscale x 4 x i32>, i32)* @llvm.aarch64.sve.gather.prfb.scaled.uxtw.nxv4i32

I cannot figure out what I am doing wrong though... Am I right assuming that llvm_anyvec_ty is still the overloaded type that needs to be mentioned in the name of the intrinsic in IR? (nxv4i32 for the example reported here).

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
12957

I am not sure what you want me to do here. If it helps, I have updated the comment of the method to the following:

/// Legalize the gather prefetch (scalar + vector addressing mode) when the
/// offset vector is an unpacked 32-bit scalable vector. The other cases (Offset
/// != nxv2i32) do not need legalization.
12961

Goodbye, beloved const. :)

13008

I have updated the comment of the method to the following, let me know if it is still unclear.

/// Combines a node carrying the intrinsic `aarch64_sve_gather_prf<T>` into a
/// node that uses `aarch64_sve_gather_prf<T>_scaled_uxtw` when the scalar
/// offset passed to `aarch64_sve_gather_prf<T>` is not a valid immediate for
/// the sve gather prefetch instruction with vector plus immediate addressing
/// mode.
13016

Hum, CI would have caught it. I usually run git clang-format to my patches before submitting them. I'll double check if I haven't miss this one.

13027

Thank you!

fpetrogalli updated this revision to Diff 250075.EditedMar 12 2020, 3:34 PM
fpetrogalli marked 5 inline comments as done.

Hi @sdesmalen, @andwar.

Thank you for your reviews.

@sdesmalen, I got into troubles when trying to reorder the operands. I have reported a runtime error that llc is spitting out when running (see inline comment). I'll keep looking at it as the request makes absolutely sense, but let me know if you already can spot what I am doing wrong from the example.

@andwar - I'd like to keep the list of SVE ACLE in the commit message. It seems quite a useful piece of information to carry around if someone asks "which ACLEs are these intrinsics implementing?".

Other than that, I thing I have covered all feedback. Let me know if I missed any.

Thank you!

Francesco

fpetrogalli edited the summary of this revision. (Show Details)Mar 12 2020, 3:34 PM

Thanks you for addressing my comments @fpetrogalli !

A one final nice-to-have (not a blocker!). Could this snippet from legalizeSVEGatherPrefetchOffsVec be extracted into a separate function:

// Not an unpacked vector, bail out.
if (Offset.getValueType().getSimpleVT().SimpleTy != MVT::nxv2i32)
  return SDValue();

// Extend the unpacked offset vector to 64-bit lanes.
SDLoc DL(N);
Offset = DAG.getNode(ISD::ANY_EXTEND, DL, MVT::nxv2i64, Offset);

and re-used by performGatherLoadCombine and performScatterStoreCombine? We can do it in a separate patch too.

llvm/include/llvm/IR/IntrinsicsAArch64.td
1272

Am I right assuming that llvm_anyvec_ty is still the overloaded type that needs to be mentioned in the name of the intrinsic in IR? (nxv4i32 for the example reported here).

I think so, but IIUC this (from you example that didn't work):

LLVMScalarOrSameVectorWidth<0, llvm_i1_ty>, // Predicate

means the following: the 0th operand has the same number of elements as the 0th operand. Which is why, I think, you hit that problem ^^^.

sdesmalen added inline comments.Mar 13 2020, 4:23 AM
llvm/include/llvm/IR/IntrinsicsAArch64.td
1272

The pointer is now the second argument, and I suspect you forgot to update NoCapture<0> to NoCapture<1>.

I have reorder the operands as requested by @sdesmalen.

Thank you.

fpetrogalli marked 6 inline comments as done.Mar 13 2020, 8:34 AM
fpetrogalli added inline comments.
llvm/include/llvm/IR/IntrinsicsAArch64.td
1272

Thanks! I missed that.

1272

Just a clarification: the 0-th operand refers to the 0-th overloaded operand, which is still llvm_anyvector_ty.

@sdesmalen was right, it's the NoCapture that was causing problems.

All good now :).

Thank you both.

fpetrogalli marked 2 inline comments as done.Mar 13 2020, 11:44 AM

Thanks you for addressing my comments @fpetrogalli !

A one final nice-to-have (not a blocker!). Could this snippet from legalizeSVEGatherPrefetchOffsVec be extracted into a separate function:

// Not an unpacked vector, bail out.
if (Offset.getValueType().getSimpleVT().SimpleTy != MVT::nxv2i32)
  return SDValue();
 
// Extend the unpacked offset vector to 64-bit lanes.
SDLoc DL(N);
Offset = DAG.getNode(ISD::ANY_EXTEND, DL, MVT::nxv2i64, Offset);

and re-used by performGatherLoadCombine and performScatterStoreCombine? We can do it in a separate patch too.

Is it worth? The logic of these methods seems quite different to me. What performGatherLoadCombine` and performScatterStoreCombine do is "do something in any case, if the offset is unpacked, extend it". What the method legalizeSVEGatherPrefetchOffsVec do is "do nothing, unless the offset is unpacked". At the end, what I can see that can be merged is the Offset = DAG.getNode(... invocation that insert the ANY_EXTEND. Not much that is worth factoring out.

Francesco

This revision is now accepted and ready to land.Mar 16 2020, 2:28 AM
This revision was automatically updated to reflect the committed changes.

Hello! Thanks for this patch. In release builds, I'm seeing lots of warnings that:

In file included from ../lib/Target/AArch64/AsmParser/AArch64AsmParser.cpp:9:
../lib/Target/AArch64/MCTargetDesc/AArch64AddressingModes.h:850:13: warning: unused function 'isValidImmForSVEVecImmAddrMode' [-Wunused-function]
static bool isValidImmForSVEVecImmAddrMode(unsigned OffsetInBytes,
            ^

Also, there's one instance:

../lib/Target/AArch64/AArch64ISelLowering.cpp:12714:21: warning: unused variable 'OffsetConst' [-Wunused-variable]
    ConstantSDNode *OffsetConst = dyn_cast<ConstantSDNode>(Offset.getNode());
                    ^

Can you please take a look?

Hello! Thanks for this patch. In release builds, I'm seeing lots of warnings that:

In file included from ../lib/Target/AArch64/AsmParser/AArch64AsmParser.cpp:9:
../lib/Target/AArch64/MCTargetDesc/AArch64AddressingModes.h:850:13: warning: unused function 'isValidImmForSVEVecImmAddrMode' [-Wunused-function]
static bool isValidImmForSVEVecImmAddrMode(unsigned OffsetInBytes,
            ^

Also, there's one instance:

../lib/Target/AArch64/AArch64ISelLowering.cpp:12714:21: warning: unused variable 'OffsetConst' [-Wunused-variable]
    ConstantSDNode *OffsetConst = dyn_cast<ConstantSDNode>(Offset.getNode());
                    ^

Can you please take a look?

Hi @nickdesaulniers ,

thank you for pointing this out! Someone has been faster than me in reacting to this! https://github.com/llvm/llvm-project/commit/f20dcc31e31fdb94159572af1e2a87dcc5d02bd8 (@vitalybuka - thank you for your patch)

Please let me know if you see any other problems with the patch.

Kind regards,

Francesco