Page MenuHomePhabricator

[AArch64][SVE] Integer reduction instructions pattern/intrinsics.
Needs ReviewPublic

Authored by dancgr on Thu, Nov 7, 10:59 AM.

Details

Summary

Added pattern matching/intrinsics for the following SVE instructions:

  • saddv, uaddv
  • smaxv, sminv, umaxv, uminv
  • orv, eorv, andv

For some instructions (smaxv, sminv, umaxv, uminv, org, eorg, andv) the pattern wasn't implemented for i8 and i16 types.

Since i8 and i16 aren't natural types for the FPR8 and FPR16 register classes, they will need custom lowering and some other modifications in order to function properly. These changes are going to be submitted in a latter patch pending some discussion on what is the best way of implementing it.

Event Timeline

dancgr created this revision.Thu, Nov 7, 10:59 AM
Herald added a project: Restricted Project. · View Herald Transcript

For the FPR8 thing, we've run into it before; see https://reviews.llvm.org/D46851 . We should probably look into adding i8 to FPR8; not sure how hard it is, but it makes sense semantically.

We could select these for llvm.experimental.vector.reduce.*, but that doesn't seem like it's high-priority.

LGTM

llvm/include/llvm/IR/IntrinsicsAArch64.td
843

Did you mean to include this in this patch?

dancgr updated this revision to Diff 228438.Fri, Nov 8, 6:51 AM
dancgr marked an inline comment as done.

Removed unused Intrinsic as requested.

llvm/include/llvm/IR/IntrinsicsAArch64.td
843

Actually no, that is for a different thing. I will be removing this.

dancgr marked an inline comment as done.Fri, Nov 8, 6:52 AM

Thanks for this patch @dancgr!

I'd suggest already doing the extra work to support i8 and i16 in this patch, so that the same mechanism can be used for all the i8, i16, i32 and i64 patterns.

One way to do this would be to lower the intrinsics to a custom AArch64ISD node that returns a fixed-width vector (e.g. AArch64::UMAXV_PRED), from which element 0 can be extracted.
For example:

def SDT_AArch64Reduce : SDTypeProfile<1, 2, [SDTCisVec<1>, SDTCisVec<2>]>;
def AArch64umaxv_pred   : SDNode<"AArch64ISD::UMAXV_PRED",   SDT_AArch64Reduce>;

The pattern would then insert its result into a wider vector:

def : Pat<(v16i8 (op (nxv16i1 PPR3bAny:$Pg), (nxv16i8 ZPR8:$Zn))),
          (INSERT_SUBREG (v16i8 (IMPLICIT_DEF)), (!cast<Instruction>(NAME#_B) PPR3bAny:$Pg, ZPR8:$Zn), bsub)>;

This only requires a simple combine rule in ISelLowering that transforms the umaxv intrinsic into the UMAXV_PRED and adds the EXTRACT_ELEMENT operation to extract the byte value from element 0.

llvm/include/llvm/IR/IntrinsicsAArch64.td
832

The result type should rather be LLVMVectorElementType<0> instead of llvm_anyint_ty.

@sdesmalen, would you have any objections if I implemented it as @efriedma suggested?

For the FPR8 thing, we've run into it before; see https://reviews.llvm.org/D46851 . We should probably look into adding i8 to FPR8; not sure how hard it is, but it makes sense semantically.

We could select these for llvm.experimental.vector.reduce.*, but that doesn't seem like it's high-priority.

LGTM

I think that implementing that way would make it simpler for implementing other patterns that have i8 and i16 outputs in the future.

@sdesmalen, would you have any objections if I implemented it as @efriedma suggested?

No real objections. My only reservation is that we're not sure how much effort it will be to implement that. If our goal is to get these intrinsics supported sooner rather than later, then it might be better to use the mechanisms available to us right now as a first step (i.e. insert_subreg and extract_element), before trying something that is more involved. Thanks for checking!