This is an archive of the discontinued LLVM Phabricator instance.

[AArch64][SVE] Integer reduction instructions pattern/intrinsics.
ClosedPublic

Authored by dancgr on Nov 7 2019, 10:59 AM.

Details

Summary

Added pattern matching/intrinsics for the following SVE instructions:

  • saddv, uaddv
  • smaxv, sminv, umaxv, uminv
  • orv, eorv, andv

For some instructions (smaxv, sminv, umaxv, uminv, org, eorg, andv) the pattern wasn't implemented for i8 and i16 types.

Since i8 and i16 aren't natural types for the FPR8 and FPR16 register classes, they will need custom lowering and some other modifications in order to function properly. These changes are going to be submitted in a latter patch pending some discussion on what is the best way of implementing it.

Event Timeline

dancgr created this revision.Nov 7 2019, 10:59 AM
Herald added a project: Restricted Project. · View Herald Transcript

For the FPR8 thing, we've run into it before; see https://reviews.llvm.org/D46851 . We should probably look into adding i8 to FPR8; not sure how hard it is, but it makes sense semantically.

We could select these for llvm.experimental.vector.reduce.*, but that doesn't seem like it's high-priority.

LGTM

llvm/include/llvm/IR/IntrinsicsAArch64.td
843

Did you mean to include this in this patch?

dancgr updated this revision to Diff 228438.Nov 8 2019, 6:51 AM
dancgr marked an inline comment as done.

Removed unused Intrinsic as requested.

llvm/include/llvm/IR/IntrinsicsAArch64.td
843

Actually no, that is for a different thing. I will be removing this.

dancgr marked an inline comment as done.Nov 8 2019, 6:52 AM

Thanks for this patch @dancgr!

I'd suggest already doing the extra work to support i8 and i16 in this patch, so that the same mechanism can be used for all the i8, i16, i32 and i64 patterns.

One way to do this would be to lower the intrinsics to a custom AArch64ISD node that returns a fixed-width vector (e.g. AArch64::UMAXV_PRED), from which element 0 can be extracted.
For example:

def SDT_AArch64Reduce : SDTypeProfile<1, 2, [SDTCisVec<1>, SDTCisVec<2>]>;
def AArch64umaxv_pred   : SDNode<"AArch64ISD::UMAXV_PRED",   SDT_AArch64Reduce>;

The pattern would then insert its result into a wider vector:

def : Pat<(v16i8 (op (nxv16i1 PPR3bAny:$Pg), (nxv16i8 ZPR8:$Zn))),
          (INSERT_SUBREG (v16i8 (IMPLICIT_DEF)), (!cast<Instruction>(NAME#_B) PPR3bAny:$Pg, ZPR8:$Zn), bsub)>;

This only requires a simple combine rule in ISelLowering that transforms the umaxv intrinsic into the UMAXV_PRED and adds the EXTRACT_ELEMENT operation to extract the byte value from element 0.

llvm/include/llvm/IR/IntrinsicsAArch64.td
832

The result type should rather be LLVMVectorElementType<0> instead of llvm_anyint_ty.

@sdesmalen, would you have any objections if I implemented it as @efriedma suggested?

For the FPR8 thing, we've run into it before; see https://reviews.llvm.org/D46851 . We should probably look into adding i8 to FPR8; not sure how hard it is, but it makes sense semantically.

We could select these for llvm.experimental.vector.reduce.*, but that doesn't seem like it's high-priority.

LGTM

I think that implementing that way would make it simpler for implementing other patterns that have i8 and i16 outputs in the future.

@sdesmalen, would you have any objections if I implemented it as @efriedma suggested?

No real objections. My only reservation is that we're not sure how much effort it will be to implement that. If our goal is to get these intrinsics supported sooner rather than later, then it might be better to use the mechanisms available to us right now as a first step (i.e. insert_subreg and extract_element), before trying something that is more involved. Thanks for checking!

dancgr updated this revision to Diff 230968.Nov 25 2019, 2:08 PM
  • [AArch64][SVE] Add FPR8 and FPR16 types for SVE integer reduction.
dancgr added a comment.EditedNov 25 2019, 2:11 PM

I have added the FPR8 and FPR16 outputs for the SVE Integer reductions.

I have implemented a solution similar to what Sander proposed, but instead I opted for adding v1i8 and v1i16 to the FPR registers in order to simplify the patterns required and make them easier to maintain.

They are handled in the lowering process to get the same result.

I have chosen not to add i8 and i16 types to FPR registers because that would lead to a major refactoring of multiple files.

Sorry about the lack of context, I was getting blocked from uploading the AArch64ISelLowering due to my network maximum upload limit.

dancgr marked 3 inline comments as done.Nov 25 2019, 2:15 PM

Also the change on unrelated patterns it to avoid ambiguities in the FPR8 and FPR16 patterns.

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
166 ↗(On Diff #230968)

I will be removing both those comment lines.

llvm/lib/Target/AArch64/AArch64ISelLowering.h
237 ↗(On Diff #230968)

I will be removing this unnecessary extra line.

llvm/lib/Target/AArch64/AArch64InstrFormats.td
6953 ↗(On Diff #230968)

I will be removing this unnecessary extra line.

Patch uploaded without context.

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
169 ↗(On Diff #230968)

Is this change necessary? Making these legal has other effects I don't really want to think about.

dancgr updated this revision to Diff 230978.Nov 25 2019, 3:18 PM

Add context.

dancgr marked 2 inline comments as done.Nov 25 2019, 3:21 PM
dancgr added inline comments.
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
169 ↗(On Diff #230968)

Its necessary to legalize those types for FPR. I have ran all tests for code-gen and it did not seam to have any side effect. For NEON we are currently legalizing v16i8 for FPR8 and v8i16 for FPR16 in the same way.

efriedma added inline comments.Nov 25 2019, 3:35 PM
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
169 ↗(On Diff #230968)

I just tried this and I see failures (if I enable this for all NEON targets, not just SVE).

huntergr added inline comments.Nov 26 2019, 3:37 AM
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
10533 ↗(On Diff #230978)

This appears to be using the INSERT_SUBREG I mentioned above, but in C++ code. For this case I think it's better to use a tablegen pattern.

The extract will be needed for all element types though.

169 ↗(On Diff #230968)

We never needed to add those as legal types downstream. Instead, we use INSERT_SUBREG in output patterns.

llvm/lib/Target/AArch64/SVEInstrFormats.td
5734

So this is where we implemented the INSERT_SUBREG pattern, like this:

def : Pat<(v16i8 (op (nxv16i1 PPR3bAny:$Pg), (nxv16i8 ZPR8:$Zn))),
          (INSERT_SUBREG (v16i8 (IMPLICIT_DEF)), (!cast<Instruction>(NAME#_B) PPR3bAny:$Pg, ZPR8:$Zn), bsub)>;

Similarly for the other types and reduce multiclasses.

While we prefer to use SVE_2_Op_Pat and similar helpers where possible, sometimes we have to use more involved patterns in this file. Doing so avoids the need to add new legal types just to support the reductions.

dancgr marked 5 inline comments as done.Nov 26 2019, 10:35 AM
dancgr added inline comments.
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
10533 ↗(On Diff #230978)

Will change the pattern according to the following comment, that way I can get rid of this custom lowering.

169 ↗(On Diff #230968)

Will be removing this.

llvm/lib/Target/AArch64/SVEInstrFormats.td
5734

I will update the solution for this. I will be putting that in a re-usable pattern and simplify the lowering.

dancgr updated this revision to Diff 231113.Nov 26 2019, 11:24 AM
dancgr marked 2 inline comments as done.

I think I have addressed all of the comments from the reviewers on this patch. This way we don't have any new legal type and I embedded the insert_subreg in the helper pattern.

dancgr marked 3 inline comments as done.Nov 26 2019, 11:24 AM

Do any of the reviewers have other suggestions for this patch?

Thanks for making these changes @dancgr! The patch is starting to look good, just a few more suggestions.

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
10517 ↗(On Diff #231113)

Given the function name is LowerSVEIntReduction, is it better to assert that DataVT.getVectorElementType().isScalarInteger() ?

10520 ↗(On Diff #231113)

This code only works on legal vectors because of the 128/bitSize, so you'll need to add a

if (!TLI.isTypeLegal(DataVT))
   return SDValue

for when e.g. <vscale x 2 x i32> is passed in as a type.

nit: Rather than using 128 directly, can we create something like AArch64::NeonBitsPerVector?

10525 ↗(On Diff #231113)

nit: these empty spaces between the lines don't improve readability.

llvm/lib/Target/AArch64/AArch64InstrInfo.td
2125 ↗(On Diff #231113)

Are these changes still necessary?

llvm/lib/Target/AArch64/SVEInstrFormats.td
296

nit: Given that the other reduction uses the normal pattern, can we rename this to SVE_2_Op_Pat_Reduce_To_Neon or something?

5734

Thanks, that looks quite neat!

llvm/test/CodeGen/AArch64/sve-int-reduce-pred.ll
34

For the ACLE saddv_i64 is also needed, can you add this case as well? (it will just map directly to a uaddv instruction directly, so that should be a simple change where you call LowerSVEIntReduction

122

nit: strange indentation here.

dancgr marked 9 inline comments as done.Nov 29 2019, 10:26 AM

Done all changes suggested by @sdesmalen. I removed the unnecessary changes to AArch64 patterns, did all of the small details and added nxv2i64 sddv mapping to uaddv.

llvm/test/CodeGen/AArch64/sve-int-reduce-pred.ll
34

For this, I have decided to map it in the SVEInstrFormat pattern instead of the lowering (I added an extra PatternOperator for the uaddv multiclass). Since we don't have custom lowering for SADDV and UADDV (they return i64 FPR128 types so they don't need it).

dancgr updated this revision to Diff 231565.Nov 29 2019, 10:34 AM
dancgr marked an inline comment as done.

Added final touches.

sdesmalen accepted this revision.Dec 2 2019, 9:35 AM

Thanks for making these changes to your patch @dancgr.
Please address the two nits before you commit, but otherwise LGTM.

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
10521 ↗(On Diff #231565)

nit: >80chars. Please use clang-format before you commit.

10522 ↗(On Diff #231565)

nit: BitSize is not very descriptive. Perhaps because it is used only once, just propagate VT.getSizeInBits() into the expression calculating OutputVT?
nit: The first character of variables in this file are capitalized, so this should have been BitSize.

llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td
57

This reads a bit confusing, but I guess it works. You could have done the same thing for all intrinsics. By calling LowerSVEIntReduction and using the same pattern for all reductions, rather than one using the SVE_2_Op_Pat for uaddv/saddv, and using SVE_2_Op_Pat_Reduce_To_Neon for the others. The only thing different is the result type (i64 vs i8/i16/i32/i64), but the same lowering function works for both.

This revision is now accepted and ready to land.Dec 2 2019, 9:35 AM
dancgr marked 5 inline comments as done.Dec 2 2019, 12:18 PM
dancgr added inline comments.
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
10521 ↗(On Diff #231565)

Will do. I had ran clang format for ISelLowering, however there were too many changes in other places. I ran clang-format for the part of code that I have changed.

10522 ↗(On Diff #231565)

Removed bitsize for clarity.

dancgr updated this revision to Diff 231763.Dec 2 2019, 12:41 PM
dancgr marked 2 inline comments as done.
dancgr updated this revision to Diff 232186.Dec 4 2019, 11:43 AM