This is an archive of the discontinued LLVM Phabricator instance.

Differential D69956

[AArch64][SVE] Integer reduction instructions pattern/intrinsics.
Closed · Public

Authored by dancgr on Nov 7 2019, 10:59 AM.

Details

Reviewers

huntergr
sdesmalen
dancgr
mgudim
amehsan
kmclaughlin
rengolin
efriedma

Summary

Added pattern matching/intrinsics for the following SVE instructions:

saddv, uaddv
smaxv, sminv, umaxv, uminv
orv, eorv, andv

For some instructions (smaxv, sminv, umaxv, uminv, orv, eorv, andv) the pattern wasn't implemented for i8 and i16 types.

Since i8 and i16 aren't natural types for the FPR8 and FPR16 register classes, they will need custom lowering and some other modifications in order to function properly. These changes are going to be submitted in a later patch, pending some discussion on the best way of implementing them.
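For readers unfamiliar with the instructions listed in the summary, the following is a minimal scalar sketch of what a predicated reduction computes: only lanes whose predicate bit is set take part, and the fold starts from the operation's identity value. The helper names (`reduceActive`, `uaddvModel`, `umaxvModel`, `andvModel`) are hypothetical and are not part of the patch or of LLVM; the vectors stand in for SVE registers of arbitrary length.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Fold the active lanes (predicate bit set) into an accumulator that starts
// at the operation's identity value. Inactive lanes are skipped entirely.
template <typename T, typename Acc, typename Op>
Acc reduceActive(const std::vector<bool> &pred, const std::vector<T> &data,
                 Acc identity, Op op) {
  assert(pred.size() == data.size() && "one predicate bit per lane");
  Acc acc = identity;
  for (std::size_t i = 0; i < data.size(); ++i)
    if (pred[i])
      acc = op(acc, static_cast<Acc>(data[i]));
  return acc;
}

// uaddv model: zero-extend each active byte lane and sum into 64 bits.
std::uint64_t uaddvModel(const std::vector<bool> &pred,
                         const std::vector<std::uint8_t> &data) {
  return reduceActive<std::uint8_t, std::uint64_t>(
      pred, data, 0, [](std::uint64_t a, std::uint64_t b) { return a + b; });
}

// umaxv model: unsigned maximum of the active lanes; the identity is 0.
std::uint8_t umaxvModel(const std::vector<bool> &pred,
                        const std::vector<std::uint8_t> &data) {
  return reduceActive<std::uint8_t, std::uint8_t>(
      pred, data, 0,
      [](std::uint8_t a, std::uint8_t b) { return std::max(a, b); });
}

// andv model: bitwise AND of the active lanes; the identity is all ones.
std::uint8_t andvModel(const std::vector<bool> &pred,
                       const std::vector<std::uint8_t> &data) {
  return reduceActive<std::uint8_t, std::uint8_t>(
      pred, data, 0xFF, [](std::uint8_t a, std::uint8_t b) {
        return static_cast<std::uint8_t>(a & b);
      });
}
```

Note that the add reductions widen to a 64-bit result, while the min/max and bitwise reductions keep the element type. That difference is why only the latter group runs into the i8/i16 result-type problem described above.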

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

dancgr created this revision. Nov 7 2019, 10:59 AM

Herald added a reviewer: rengolin. Nov 7 2019, 10:59 AM

Herald added a project: Restricted Project.

Herald added subscribers: llvm-commits, psnobl, rkruppe and 3 others.

Harbormaster completed remote builds in B40634: Diff 228279. Nov 7 2019, 11:07 AM

For the FPR8 thing, we've run into it before; see https://reviews.llvm.org/D46851 . We should probably look into adding i8 to FPR8; not sure how hard it is, but it makes sense semantically.

We could select these for llvm.experimental.vector.reduce.*, but that doesn't seem like it's high-priority.

LGTM

llvm/include/llvm/IR/IntrinsicsAArch64.td
1004	Did you mean to include this in this patch?

Removed unused Intrinsic as requested.

llvm/include/llvm/IR/IntrinsicsAArch64.td
1004	Actually no, that is for a different thing. I will be removing this.

dancgr marked an inline comment as done. Nov 8 2019, 6:52 AM

Harbormaster completed remote builds in B40684: Diff 228438. Nov 8 2019, 6:53 AM

Thanks for this patch @dancgr!

I'd suggest already doing the extra work to support i8 and i16 in this patch, so that the same mechanism can be used for all the i8, i16, i32 and i64 patterns.

One way to do this would be to lower the intrinsics to a custom AArch64ISD node that returns a fixed-width vector (e.g. AArch64ISD::UMAXV_PRED), from which element 0 can be extracted.
For example:

def SDT_AArch64Reduce : SDTypeProfile<1, 2, [SDTCisVec<1>, SDTCisVec<2>]>;
def AArch64umaxv_pred   : SDNode<"AArch64ISD::UMAXV_PRED",   SDT_AArch64Reduce>;

The pattern would then insert its result into a wider vector:

def : Pat<(v16i8 (op (nxv16i1 PPR3bAny:$Pg), (nxv16i8 ZPR8:$Zn))),
          (INSERT_SUBREG (v16i8 (IMPLICIT_DEF)), (!cast<Instruction>(NAME#_B) PPR3bAny:$Pg, ZPR8:$Zn), bsub)>;

This only requires a simple combine rule in ISelLowering that transforms the umaxv intrinsic into a UMAXV_PRED node and adds an EXTRACT_VECTOR_ELT operation to extract the byte value from element 0.

llvm/include/llvm/IR/IntrinsicsAArch64.td
993	The result type should rather be `LLVMVectorElementType<0>` instead of `llvm_anyint_ty`.

cameron.mcinally added a subscriber: cameron.mcinally. Nov 11 2019, 7:19 AM

@sdesmalen, would you have any objections if I implemented it as @efriedma suggested?

In D69956#1737938, @efriedma wrote:

For the FPR8 thing, we've run into it before; see https://reviews.llvm.org/D46851 . We should probably look into adding i8 to FPR8; not sure how hard it is, but it makes sense semantically.

We could select these for llvm.experimental.vector.reduce.*, but that doesn't seem like it's high-priority.

LGTM

I think implementing it that way would make it simpler to implement other patterns that have i8 and i16 outputs in the future.

In D69956#1740813, @dancgr wrote:

@sdesmalen, would you have any objections if I implemented it as @efriedma suggested?

No real objections. My only reservation is that we're not sure how much effort it will be to implement that. If our goal is to get these intrinsics supported sooner rather than later, then it might be better to use the mechanisms available to us right now as a first step (i.e. insert_subreg and extract_element), before trying something that is more involved. Thanks for checking!

[AArch64][SVE] Add FPR8 and FPR16 types for SVE integer reduction.

Herald added a reviewer: efriedma. Nov 25 2019, 2:08 PM

Harbormaster completed remote builds in B41473: Diff 230968. Nov 25 2019, 2:10 PM

I have added the FPR8 and FPR16 outputs for the SVE Integer reductions.

I have implemented a solution similar to what Sander proposed, but instead I opted for adding v1i8 and v1i16 to the FPR registers in order to simplify the patterns required and make them easier to maintain.

They are handled in the lowering process to get the same result.

I have chosen not to add i8 and i16 types to FPR registers because that would lead to a major refactoring of multiple files.

Sorry about the lack of context; I was blocked from uploading AArch64ISelLowering due to my network's maximum upload limit.

Also, the change to unrelated patterns is there to avoid ambiguities in the FPR8 and FPR16 patterns.

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
166	I will be removing both those comment lines.
llvm/lib/Target/AArch64/AArch64ISelLowering.h
237	I will be removing this unnecessary extra line.
llvm/lib/Target/AArch64/AArch64InstrFormats.td
6953 ↗	(On Diff #230968)	I will be removing this unnecessary extra line.

Patch uploaded without context.

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
169	Is this change necessary? Making these legal has other effects I don't really want to think about.

Add context.

dancgr marked 2 inline comments as done. Nov 25 2019, 3:21 PM

dancgr added inline comments.

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
169	It's necessary to legalize those types for FPR. I have run all the codegen tests and it did not seem to have any side effects. For NEON we are currently legalizing v16i8 for FPR8 and v8i16 for FPR16 in the same way.

efriedma added inline comments. Nov 25 2019, 3:35 PM

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
169	I just tried this and I see failures (if I enable this for all NEON targets, not just SVE).

huntergr added inline comments. Nov 26 2019, 3:37 AM

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
169	We never needed to add those as legal types downstream. Instead, we use INSERT_SUBREG in output patterns.
10550	This appears to be using the INSERT_SUBREG I mentioned above, but in C++ code. For this case I think it's better to use a tablegen pattern. The extract will be needed for all element types though.
llvm/lib/Target/AArch64/SVEInstrFormats.td
5997	So this is where we implemented the INSERT_SUBREG pattern, like this:

def : Pat<(v16i8 (op (nxv16i1 PPR3bAny:$Pg), (nxv16i8 ZPR8:$Zn))),
          (INSERT_SUBREG (v16i8 (IMPLICIT_DEF)), (!cast<Instruction>(NAME#_B) PPR3bAny:$Pg, ZPR8:$Zn), bsub)>;

Similarly for the other types and reduce multiclasses. While we prefer to use `SVE_2_Op_Pat` and similar helpers where possible, sometimes we have to use more involved patterns in this file. Doing so avoids the need to add new legal types just to support the reductions.

dancgr marked 5 inline comments as done. Nov 26 2019, 10:35 AM

dancgr added inline comments.

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
169	Will be removing this.
10550	Will change the pattern according to the following comment, that way I can get rid of this custom lowering.
llvm/lib/Target/AArch64/SVEInstrFormats.td
5997	I will update the solution for this. I will be putting that in a re-usable pattern and simplify the lowering.

I think I have addressed all of the reviewers' comments on this patch. This way we don't add any new legal types, and I embedded the insert_subreg in the helper pattern.

dancgr marked 3 inline comments as done. Nov 26 2019, 11:24 AM

Do any of the reviewers have other suggestions for this patch?

Thanks for making these changes @dancgr! The patch is starting to look good, just a few more suggestions.

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
10537	Given the function name is `LowerSVEIntReduction`, is it better to assert that `DataVT.getVectorElementType().isScalarInteger()` ?
10540	This code only works on legal vectors because of the `128/bitSize`, so you'll need to add an `if (!TLI.isTypeLegal(DataVT)) return SDValue();` for when e.g. `<vscale x 2 x i32>` is passed in as a type. nit: Rather than using `128` directly, can we create something like `AArch64::NeonBitsPerVector`?
10545	nit: these empty spaces between the lines don't improve readability.
llvm/lib/Target/AArch64/AArch64InstrInfo.td
2125 ↗	(On Diff #231113)	Are these changes still necessary?
llvm/lib/Target/AArch64/SVEInstrFormats.td
296	nit: Given that the other reduction uses the normal pattern, can we rename this to `SVE_2_Op_Pat_Reduce_To_Neon` or something?
5997	Thanks, that looks quite neat!
llvm/test/CodeGen/AArch64/sve-int-reduce-pred.ll
34	For the ACLE, `saddv_i64` is also needed; can you add this case as well? (It will just map directly to a `uaddv` instruction, so that should be a simple change where you call `LowerSVEIntReduction`.)
122	nit: strange indentation here.
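The `128/bitSize` concern in the comment on line 10540 can be illustrated with a small arithmetic sketch. It assumes, as the legal SVE data types registered in this patch suggest (nxv16i8, nxv8i16, nxv4i32, nxv2i64), that a legal scalable integer vector has a minimum size of exactly 128 bits. The names `kNeonBitsPerVector`, `neonLanesFor` and `fillsSveRegister` are hypothetical stand-ins, and the check is only a model of the idea, not of how `isTypeLegal` is implemented.

```cpp
#include <cassert>

// Hypothetical stand-in for the named constant suggested in the review.
const unsigned kNeonBitsPerVector = 128;

// Lane count of the fixed-width NEON vector that carries a reduction result
// of the given element width, e.g. 8 bits -> 16 lanes (v16i8).
unsigned neonLanesFor(unsigned elementBits) {
  assert(elementBits != 0 && kNeonBitsPerVector % elementBits == 0);
  return kNeonBitsPerVector / elementBits;
}

// Model of the legality concern: <vscale x minLanes x iN> matches one of the
// full-register types only when minLanes * N is exactly 128 bits.
bool fillsSveRegister(unsigned minLanes, unsigned elementBits) {
  return minLanes * elementBits == kNeonBitsPerVector;
}
```

Under this model `<vscale x 2 x i32>` has a 64-bit minimum size and fails the check, while the division would still yield 4 lanes (v4i32) for its i32 result. That mismatch is what the early return on illegal types guards against.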

Made all the changes suggested by @sdesmalen. I removed the unnecessary changes to AArch64 patterns, addressed the smaller details, and added the nxv2i64 saddv mapping to uaddv.

llvm/test/CodeGen/AArch64/sve-int-reduce-pred.ll
34	For this, I decided to map it in the SVEInstrFormats pattern instead of the lowering (I added an extra PatternOperator for the uaddv multiclass), since we don't have custom lowering for SADDV and UADDV (they return i64 FPR128 types, so they don't need it).
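Why a signed add reduction over 64-bit lanes can reuse `uaddv` may not be obvious, so here is a hedged scalar sketch. Both reductions extend each active lane to 64 bits before summing; when the lanes are already 64 bits wide, sign extension and zero extension are the same operation, so the sums are bit-identical. The function names are hypothetical, and lanes are passed as raw unsigned bit patterns.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <type_traits>
#include <vector>

// uaddv-style model: zero-extend each active lane to 64 bits and sum,
// wrapping modulo 2^64.
template <typename Lane>
std::uint64_t addvZeroExtended(const std::vector<bool> &pred,
                               const std::vector<Lane> &lanes) {
  assert(pred.size() == lanes.size());
  using U = std::make_unsigned_t<Lane>;
  std::uint64_t sum = 0;
  for (std::size_t i = 0; i < lanes.size(); ++i)
    if (pred[i])
      sum += static_cast<std::uint64_t>(static_cast<U>(lanes[i]));
  return sum;
}

// saddv-style model: sign-extend each active lane to 64 bits and sum. The
// accumulation is done in unsigned arithmetic so wrap-around is well defined.
// The unsigned-to-signed cast is modular (guaranteed from C++20 onwards).
template <typename Lane>
std::uint64_t addvSignExtended(const std::vector<bool> &pred,
                               const std::vector<Lane> &lanes) {
  assert(pred.size() == lanes.size());
  using S = std::make_signed_t<Lane>;
  std::uint64_t sum = 0;
  for (std::size_t i = 0; i < lanes.size(); ++i)
    if (pred[i])
      sum += static_cast<std::uint64_t>(
          static_cast<std::int64_t>(static_cast<S>(lanes[i])));
  return sum;
}
```

With 32-bit lanes the two functions disagree as soon as an active lane has its top bit set, which is why the narrower element types need a distinct signed instruction while the 64-bit case does not.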

Added final touches.

Thanks for making these changes to your patch @dancgr.
Please address the two nits before you commit, but otherwise LGTM.

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
10541	nit: >80chars. Please use clang-format before you commit.
10542	nit: `bitSize` is not very descriptive. Perhaps, because it is used only once, just propagate `VT.getSizeInBits()` into the expression calculating `OutputVT`? nit: The first character of variable names in this file is capitalized, so this should have been `BitSize`.
llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td
85	This reads a bit confusingly, but I guess it works. You could have done the same thing for all intrinsics by calling `LowerSVEIntReduction` and using the same pattern for all reductions, rather than using `SVE_2_Op_Pat` for `uaddv/saddv` and `SVE_2_Op_Pat_Reduce_To_Neon` for the others. The only difference is the result type (i64 vs i8/i16/i32/i64), but the same lowering function works for both.

This revision is now accepted and ready to land. Dec 2 2019, 9:35 AM

dancgr marked 5 inline comments as done. Dec 2 2019, 12:18 PM

dancgr added inline comments.

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
10541	Will do. I had run clang-format on ISelLowering; however, there were too many changes in other places, so I ran clang-format only on the part of the code that I changed.
10542	Removed bitsize for clarity.

dancgr updated this revision to Diff 231763. Dec 2 2019, 12:41 PM

dancgr marked 2 inline comments as done.

dancgr updated this revision to Diff 232186. Dec 4 2019, 11:43 AM

b29916cec3f45e5fb5efff5104acf142f348c724

Revision Contents

Path	Size
llvm/include/llvm/IR/IntrinsicsAArch64.td	27 lines
llvm/lib/Target/AArch64/AArch64ISelLowering.h	8 lines
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp	49 lines
llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td	28 lines
llvm/lib/Target/AArch64/SVEInstrFormats.td	33 lines
llvm/lib/Target/AArch64/Utils/AArch64BaseInfo.h	2 lines
llvm/test/CodeGen/AArch64/sve-int-reduce-pred.ll	400 lines

Diff 232186

llvm/include/llvm/IR/IntrinsicsAArch64.td

[... 983 lines elided ...] class AdvSIMD_GatherLoad_32bitOffset_Intrinsic
     : GCCBuiltin<"__builtin_sve_" # name>,
       Intrinsic<[OUT], [OUT, llvm_nxv16i1_ty, IN], [IntrNoMem]>;
 }

 //===----------------------------------------------------------------------===//
 // SVE

 let TargetPrefix = "aarch64" in { // All intrinsics start with "llvm.aarch64.".

+  class AdvSIMD_SVE_Int_Reduce_Intrinsic
+    : Intrinsic<[LLVMVectorElementType<0>],
+                [LLVMScalarOrSameVectorWidth<0, llvm_i1_ty>,
+                 llvm_anyvector_ty],
+                [IntrNoMem]>;

+  class AdvSIMD_SVE_SADDV_Reduce_Intrinsic
+    : Intrinsic<[llvm_i64_ty],
+                [LLVMScalarOrSameVectorWidth<0, llvm_i1_ty>,
+                 llvm_anyvector_ty],
+                [IntrNoMem]>;

   class AdvSIMD_SVE_WHILE_Intrinsic
     : Intrinsic<[llvm_anyvector_ty],
                 [llvm_anyint_ty, LLVMMatchType<1>],
                 [IntrNoMem]>;

   class AdvSIMD_GatherLoad_VecTorBase_Intrinsic
     : Intrinsic<[llvm_anyvector_ty],
                 [
                   LLVMScalarOrSameVectorWidth<0, llvm_i1_ty>,
[... 32 lines elided ...]
 def int_aarch64_sve_sabd : AdvSIMD_Pred2VectorArg_Intrinsic;
 def int_aarch64_sve_uabd : AdvSIMD_Pred2VectorArg_Intrinsic;

 def int_aarch64_sve_mad : AdvSIMD_Pred3VectorArg_Intrinsic;
 def int_aarch64_sve_msb : AdvSIMD_Pred3VectorArg_Intrinsic;
 def int_aarch64_sve_mla : AdvSIMD_Pred3VectorArg_Intrinsic;
 def int_aarch64_sve_mls : AdvSIMD_Pred3VectorArg_Intrinsic;

+def int_aarch64_sve_saddv : AdvSIMD_SVE_SADDV_Reduce_Intrinsic;
+def int_aarch64_sve_uaddv : AdvSIMD_SVE_SADDV_Reduce_Intrinsic;

+def int_aarch64_sve_smaxv : AdvSIMD_SVE_Int_Reduce_Intrinsic;
+def int_aarch64_sve_umaxv : AdvSIMD_SVE_Int_Reduce_Intrinsic;
+def int_aarch64_sve_sminv : AdvSIMD_SVE_Int_Reduce_Intrinsic;
+def int_aarch64_sve_uminv : AdvSIMD_SVE_Int_Reduce_Intrinsic;

+def int_aarch64_sve_orv : AdvSIMD_SVE_Int_Reduce_Intrinsic;
+def int_aarch64_sve_eorv : AdvSIMD_SVE_Int_Reduce_Intrinsic;
+def int_aarch64_sve_andv : AdvSIMD_SVE_Int_Reduce_Intrinsic;

 def int_aarch64_sve_abs : AdvSIMD_Merged1VectorArg_Intrinsic;
 def int_aarch64_sve_neg : AdvSIMD_Merged1VectorArg_Intrinsic;

 def int_aarch64_sve_sdot : AdvSIMD_SVE_DOT_Intrinsic;
 def int_aarch64_sve_sdot_lane : AdvSIMD_SVE_DOT_Indexed_Intrinsic;

 def int_aarch64_sve_udot : AdvSIMD_SVE_DOT_Intrinsic;
 def int_aarch64_sve_udot_lane : AdvSIMD_SVE_DOT_Indexed_Intrinsic;
[... 250 lines elided ...]

llvm/lib/Target/AArch64/AArch64ISelLowering.h

[... 149 lines elided ...] enum NodeType : unsigned {

   // Vector across-lanes min/max
   // Only the lower result lane is defined.
   SMINV,
   UMINV,
   SMAXV,
   UMAXV,

+  SMAXV_PRED,
+  UMAXV_PRED,
+  SMINV_PRED,
+  UMINV_PRED,
+  ORV_PRED,
+  EORV_PRED,
+  ANDV_PRED,

   // Vector bitwise negation
   NOT,

   // Vector bitwise selection
   BIT,

   // Compare-and-branch
   CBZ,
[... 55 lines elided ...] enum NodeType : unsigned {
   ST1x3post,
   ST1x4post,
   LD1DUPpost,
   LD2DUPpost,
   LD3DUPpost,
   LD4DUPpost,
   LD1LANEpost,
   LD2LANEpost,
   LD3LANEpost,
   LD4LANEpost,
   ST2LANEpost,
   ST3LANEpost,
   ST4LANEpost,

   STG,
   STZG,
   ST2G,
[... 548 lines elided ...]

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp

[... 157 lines elided ...] if (Subtarget->hasNEON()) {
     addQRTypeForNEON(MVT::v16i8);
     addQRTypeForNEON(MVT::v8i16);
     addQRTypeForNEON(MVT::v4i32);
     addQRTypeForNEON(MVT::v2i64);
     addQRTypeForNEON(MVT::v8f16);
   }

   if (Subtarget->hasSVE()) {
     // Add legal sve predicate types
     addRegisterClass(MVT::nxv2i1, &AArch64::PPRRegClass);
     addRegisterClass(MVT::nxv4i1, &AArch64::PPRRegClass);
     addRegisterClass(MVT::nxv8i1, &AArch64::PPRRegClass);
     addRegisterClass(MVT::nxv16i1, &AArch64::PPRRegClass);

     // Add legal sve data types
     addRegisterClass(MVT::nxv16i8, &AArch64::ZPRRegClass);
     addRegisterClass(MVT::nxv8i16, &AArch64::ZPRRegClass);
     addRegisterClass(MVT::nxv4i32, &AArch64::ZPRRegClass);
     addRegisterClass(MVT::nxv2i64, &AArch64::ZPRRegClass);

[... 1,098 lines elided ...] const char *AArch64TargetLowering::getTargetNodeName(unsigned Opcode) const {
   case AArch64ISD::FCMLEz: return "AArch64ISD::FCMLEz";
   case AArch64ISD::FCMLTz: return "AArch64ISD::FCMLTz";
   case AArch64ISD::SADDV: return "AArch64ISD::SADDV";
   case AArch64ISD::UADDV: return "AArch64ISD::UADDV";
   case AArch64ISD::SMINV: return "AArch64ISD::SMINV";
   case AArch64ISD::UMINV: return "AArch64ISD::UMINV";
   case AArch64ISD::SMAXV: return "AArch64ISD::SMAXV";
   case AArch64ISD::UMAXV: return "AArch64ISD::UMAXV";
+  case AArch64ISD::SMAXV_PRED: return "AArch64ISD::SMAXV_PRED";
+  case AArch64ISD::UMAXV_PRED: return "AArch64ISD::UMAXV_PRED";
+  case AArch64ISD::SMINV_PRED: return "AArch64ISD::SMINV_PRED";
+  case AArch64ISD::UMINV_PRED: return "AArch64ISD::UMINV_PRED";
+  case AArch64ISD::ORV_PRED: return "AArch64ISD::ORV_PRED";
+  case AArch64ISD::EORV_PRED: return "AArch64ISD::EORV_PRED";
+  case AArch64ISD::ANDV_PRED: return "AArch64ISD::ANDV_PRED";
   case AArch64ISD::NOT: return "AArch64ISD::NOT";
   case AArch64ISD::BIT: return "AArch64ISD::BIT";
   case AArch64ISD::CBZ: return "AArch64ISD::CBZ";
   case AArch64ISD::CBNZ: return "AArch64ISD::CBNZ";
   case AArch64ISD::TBZ: return "AArch64ISD::TBZ";
   case AArch64ISD::TBNZ: return "AArch64ISD::TBNZ";
   case AArch64ISD::TC_RETURN: return "AArch64ISD::TC_RETURN";
   case AArch64ISD::PREFETCH: return "AArch64ISD::PREFETCH";
[... 9,223 lines elided ...] static SDValue combineAcrossLanesIntrinsic(unsigned Opc, SDNode *N,
   SDLoc dl(N);
   return DAG.getNode(ISD::EXTRACT_VECTOR_ELT, dl, N->getValueType(0),
                      DAG.getNode(Opc, dl,
                                  N->getOperand(1).getSimpleValueType(),
                                  N->getOperand(1)),
                      DAG.getConstant(0, dl, MVT::i64));
 }

+static SDValue LowerSVEIntReduction(SDNode *N, unsigned Opc,
+                                    SelectionDAG &DAG) {
+  SDLoc dl(N);
+  LLVMContext &Ctx = *DAG.getContext();
+  const TargetLowering &TLI = DAG.getTargetLoweringInfo();
+
+  EVT VT = N->getValueType(0);
+  SDValue Pred = N->getOperand(1);
+  SDValue Data = N->getOperand(2);
+  EVT DataVT = Data.getValueType();
+
+  if (DataVT.getVectorElementType().isScalarInteger() &&
+      (VT == MVT::i8 || VT == MVT::i16 || VT == MVT::i32 || VT == MVT::i64)) {
+    if (!TLI.isTypeLegal(DataVT))
+      return SDValue();
+
+    EVT OutputVT = EVT::getVectorVT(Ctx, VT,
+        AArch64::NeonBitsPerVector / VT.getSizeInBits());
+    SDValue Reduce = DAG.getNode(Opc, dl, OutputVT, Pred, Data);
+    SDValue Zero = DAG.getConstant(0, dl, MVT::i64);
+    SDValue Result = DAG.getNode(ISD::EXTRACT_VECTOR_ELT, dl, VT, Reduce, Zero);
+
+    return Result;
+  }
+
+  return SDValue();
+}

 static SDValue performIntrinsicCombine(SDNode *N,
                                        TargetLowering::DAGCombinerInfo &DCI,
                                        const AArch64Subtarget *Subtarget) {
   SelectionDAG &DAG = DCI.DAG;
   unsigned IID = getIntrinsicID(N);
   switch (IID) {
   default:
     break;
[... 38 lines elided ...] static SDValue performIntrinsicCombine(SDNode *N,
   case Intrinsic::aarch64_neon_ushl:
     return tryCombineShiftImm(IID, N, DAG);
   case Intrinsic::aarch64_crc32b:
   case Intrinsic::aarch64_crc32cb:
     return tryCombineCRC32(0xff, N, DAG);
   case Intrinsic::aarch64_crc32h:
   case Intrinsic::aarch64_crc32ch:
     return tryCombineCRC32(0xffff, N, DAG);
+  case Intrinsic::aarch64_sve_smaxv:
+    return LowerSVEIntReduction(N, AArch64ISD::SMAXV_PRED, DAG);
+  case Intrinsic::aarch64_sve_umaxv:
+    return LowerSVEIntReduction(N, AArch64ISD::UMAXV_PRED, DAG);
+  case Intrinsic::aarch64_sve_sminv:
+    return LowerSVEIntReduction(N, AArch64ISD::SMINV_PRED, DAG);
+  case Intrinsic::aarch64_sve_uminv:
+    return LowerSVEIntReduction(N, AArch64ISD::UMINV_PRED, DAG);
+  case Intrinsic::aarch64_sve_orv:
+    return LowerSVEIntReduction(N, AArch64ISD::ORV_PRED, DAG);
+  case Intrinsic::aarch64_sve_eorv:
+    return LowerSVEIntReduction(N, AArch64ISD::EORV_PRED, DAG);
+  case Intrinsic::aarch64_sve_andv:
+    return LowerSVEIntReduction(N, AArch64ISD::ANDV_PRED, DAG);
   }
   return SDValue();
 }

 static SDValue performExtendCombine(SDNode *N,
                                     TargetLowering::DAGCombinerInfo &DCI,
                                     SelectionDAG &DAG) {
   // If we see something like (zext (sabd (extract_high ...), (DUP ...))) then
[... 2,017 lines elided ...]

llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td

Show All 22 Lines
def AArch64ld1_gather : SDNode<"AArch64ISD::GLD1", SDT_AArch64_GLD1, [SDNPHasChain, SDNPMayLoad, SDNPOptInGlue]>;		def AArch64ld1_gather : SDNode<"AArch64ISD::GLD1", SDT_AArch64_GLD1, [SDNPHasChain, SDNPMayLoad, SDNPOptInGlue]>;
def AArch64ld1_gather_scaled : SDNode<"AArch64ISD::GLD1_SCALED", SDT_AArch64_GLD1, [SDNPHasChain, SDNPMayLoad, SDNPOptInGlue]>;		def AArch64ld1_gather_scaled : SDNode<"AArch64ISD::GLD1_SCALED", SDT_AArch64_GLD1, [SDNPHasChain, SDNPMayLoad, SDNPOptInGlue]>;
def AArch64ld1_gather_uxtw : SDNode<"AArch64ISD::GLD1_UXTW", SDT_AArch64_GLD1, [SDNPHasChain, SDNPMayLoad, SDNPOptInGlue]>;		def AArch64ld1_gather_uxtw : SDNode<"AArch64ISD::GLD1_UXTW", SDT_AArch64_GLD1, [SDNPHasChain, SDNPMayLoad, SDNPOptInGlue]>;
def AArch64ld1_gather_sxtw : SDNode<"AArch64ISD::GLD1_SXTW", SDT_AArch64_GLD1, [SDNPHasChain, SDNPMayLoad, SDNPOptInGlue]>;		def AArch64ld1_gather_sxtw : SDNode<"AArch64ISD::GLD1_SXTW", SDT_AArch64_GLD1, [SDNPHasChain, SDNPMayLoad, SDNPOptInGlue]>;
def AArch64ld1_gather_uxtw_scaled : SDNode<"AArch64ISD::GLD1_UXTW_SCALED", SDT_AArch64_GLD1, [SDNPHasChain, SDNPMayLoad, SDNPOptInGlue]>;		def AArch64ld1_gather_uxtw_scaled : SDNode<"AArch64ISD::GLD1_UXTW_SCALED", SDT_AArch64_GLD1, [SDNPHasChain, SDNPMayLoad, SDNPOptInGlue]>;
def AArch64ld1_gather_sxtw_scaled : SDNode<"AArch64ISD::GLD1_SXTW_SCALED", SDT_AArch64_GLD1, [SDNPHasChain, SDNPMayLoad, SDNPOptInGlue]>;		def AArch64ld1_gather_sxtw_scaled : SDNode<"AArch64ISD::GLD1_SXTW_SCALED", SDT_AArch64_GLD1, [SDNPHasChain, SDNPMayLoad, SDNPOptInGlue]>;
def AArch64ld1_gather_imm : SDNode<"AArch64ISD::GLD1_IMM", SDT_AArch64_GLD1_IMM, [SDNPHasChain, SDNPMayLoad, SDNPOptInGlue]>;		def AArch64ld1_gather_imm : SDNode<"AArch64ISD::GLD1_IMM", SDT_AArch64_GLD1_IMM, [SDNPHasChain, SDNPMayLoad, SDNPOptInGlue]>;

		def SDT_AArch64Reduce : SDTypeProfile<1, 2, [SDTCisVec<1>, SDTCisVec<2>]>;

		def AArch64smaxv_pred : SDNode<"AArch64ISD::SMAXV_PRED", SDT_AArch64Reduce>;
		def AArch64umaxv_pred : SDNode<"AArch64ISD::UMAXV_PRED", SDT_AArch64Reduce>;
		def AArch64sminv_pred : SDNode<"AArch64ISD::SMINV_PRED", SDT_AArch64Reduce>;
		def AArch64uminv_pred : SDNode<"AArch64ISD::UMINV_PRED", SDT_AArch64Reduce>;
		def AArch64orv_pred : SDNode<"AArch64ISD::ORV_PRED", SDT_AArch64Reduce>;
		def AArch64eorv_pred : SDNode<"AArch64ISD::EORV_PRED", SDT_AArch64Reduce>;
		def AArch64andv_pred : SDNode<"AArch64ISD::ANDV_PRED", SDT_AArch64Reduce>;

let Predicates = [HasSVE] in {		let Predicates = [HasSVE] in {

def RDFFR_PPz : sve_int_rdffr_pred<0b0, "rdffr">;		def RDFFR_PPz : sve_int_rdffr_pred<0b0, "rdffr">;
def RDFFRS_PPz : sve_int_rdffr_pred<0b1, "rdffrs">;		def RDFFRS_PPz : sve_int_rdffr_pred<0b1, "rdffrs">;
def RDFFR_P : sve_int_rdffr_unpred<"rdffr">;		def RDFFR_P : sve_int_rdffr_unpred<"rdffr">;
def SETFFR : sve_int_setffr<"setffr">;		def SETFFR : sve_int_setffr<"setffr">;
def WRFFR : sve_int_wrffr<"wrffr">;		def WRFFR : sve_int_wrffr<"wrffr">;

Show All 27 Lines	let Predicates = [HasSVE] in {
defm UQSUB_ZI : sve_int_arith_imm0<0b111, "uqsub">;		defm UQSUB_ZI : sve_int_arith_imm0<0b111, "uqsub">;

defm MAD_ZPmZZ : sve_int_mladdsub_vvv_pred<0b0, "mad", int_aarch64_sve_mad>;		defm MAD_ZPmZZ : sve_int_mladdsub_vvv_pred<0b0, "mad", int_aarch64_sve_mad>;
defm MSB_ZPmZZ : sve_int_mladdsub_vvv_pred<0b1, "msb", int_aarch64_sve_msb>;		defm MSB_ZPmZZ : sve_int_mladdsub_vvv_pred<0b1, "msb", int_aarch64_sve_msb>;
defm MLA_ZPmZZ : sve_int_mlas_vvv_pred<0b0, "mla", int_aarch64_sve_mla>;		defm MLA_ZPmZZ : sve_int_mlas_vvv_pred<0b0, "mla", int_aarch64_sve_mla>;
defm MLS_ZPmZZ : sve_int_mlas_vvv_pred<0b1, "mls", int_aarch64_sve_mls>;		defm MLS_ZPmZZ : sve_int_mlas_vvv_pred<0b1, "mls", int_aarch64_sve_mls>;

// SVE predicated integer reductions.		// SVE predicated integer reductions.
defm SADDV_VPZ : sve_int_reduce_0_saddv<0b000, "saddv">;		defm SADDV_VPZ : sve_int_reduce_0_saddv<0b000, "saddv", int_aarch64_sve_saddv>;
defm UADDV_VPZ : sve_int_reduce_0_uaddv<0b001, "uaddv">;		defm UADDV_VPZ : sve_int_reduce_0_uaddv<0b001, "uaddv", int_aarch64_sve_uaddv, int_aarch64_sve_saddv>;
		sdesmalen: This reads a bit confusing, but I guess it works. You could have done the same thing for all intrinsics by calling `LowerSVEIntReduction` and using the same pattern for all reductions, rather than using `SVE_2_Op_Pat` for `uaddv/saddv` and `SVE_2_Op_Pat_Reduce_To_Neon` for the others. The only difference is the result type (i64 vs i8/i16/i32/i64), but the same lowering function works for both.
defm SMAXV_VPZ : sve_int_reduce_1<0b000, "smaxv">;		defm SMAXV_VPZ : sve_int_reduce_1<0b000, "smaxv", AArch64smaxv_pred>;
defm UMAXV_VPZ : sve_int_reduce_1<0b001, "umaxv">;		defm UMAXV_VPZ : sve_int_reduce_1<0b001, "umaxv", AArch64umaxv_pred>;
defm SMINV_VPZ : sve_int_reduce_1<0b010, "sminv">;		defm SMINV_VPZ : sve_int_reduce_1<0b010, "sminv", AArch64sminv_pred>;
defm UMINV_VPZ : sve_int_reduce_1<0b011, "uminv">;		defm UMINV_VPZ : sve_int_reduce_1<0b011, "uminv", AArch64uminv_pred>;
defm ORV_VPZ : sve_int_reduce_2<0b000, "orv">;		defm ORV_VPZ : sve_int_reduce_2<0b000, "orv", AArch64orv_pred>;
defm EORV_VPZ : sve_int_reduce_2<0b001, "eorv">;		defm EORV_VPZ : sve_int_reduce_2<0b001, "eorv", AArch64eorv_pred>;
defm ANDV_VPZ : sve_int_reduce_2<0b010, "andv">;		defm ANDV_VPZ : sve_int_reduce_2<0b010, "andv", AArch64andv_pred>;

defm ORR_ZI : sve_int_log_imm<0b00, "orr", "orn">;		defm ORR_ZI : sve_int_log_imm<0b00, "orr", "orn">;
defm EOR_ZI : sve_int_log_imm<0b01, "eor", "eon">;		defm EOR_ZI : sve_int_log_imm<0b01, "eor", "eon">;
defm AND_ZI : sve_int_log_imm<0b10, "and", "bic">;		defm AND_ZI : sve_int_log_imm<0b10, "and", "bic">;

defm SMAX_ZI : sve_int_arith_imm1<0b00, "smax", simm8>;		defm SMAX_ZI : sve_int_arith_imm1<0b00, "smax", simm8>;
defm SMIN_ZI : sve_int_arith_imm1<0b10, "smin", simm8>;		defm SMIN_ZI : sve_int_arith_imm1<0b10, "smin", simm8>;
defm UMAX_ZI : sve_int_arith_imm1<0b01, "umax", imm0_255>;		defm UMAX_ZI : sve_int_arith_imm1<0b01, "umax", imm0_255>;
▲ Show 20 Lines • Show All 1,475 Lines • Show Last 20 Lines

llvm/lib/Target/AArch64/SVEInstrFormats.td

Show First 20 Lines • Show All 287 Lines • ▼ Show 20 Lines
: Pat<(vtd (op vt1:$Op1)),		: Pat<(vtd (op vt1:$Op1)),
(inst $Op1)>;		(inst $Op1)>;

class SVE_2_Op_Pat<ValueType vtd, SDPatternOperator op, ValueType vt1,		class SVE_2_Op_Pat<ValueType vtd, SDPatternOperator op, ValueType vt1,
ValueType vt2, Instruction inst>		ValueType vt2, Instruction inst>
: Pat<(vtd (op vt1:$Op1, vt2:$Op2)),		: Pat<(vtd (op vt1:$Op1, vt2:$Op2)),
(inst $Op1, $Op2)>;		(inst $Op1, $Op2)>;

		class SVE_2_Op_Pat_Reduce_To_Neon<ValueType vtd, SDPatternOperator op, ValueType vt1,
		sdesmalen: nit: Given that the other reduction uses the normal pattern, can we rename this to `SVE_2_Op_Pat_Reduce_To_Neon` or something?
		ValueType vt2, Instruction inst, SubRegIndex sub>
		: Pat<(vtd (op vt1:$Op1, vt2:$Op2)),
		(INSERT_SUBREG (vtd (IMPLICIT_DEF)), (inst $Op1, $Op2), sub)>;

class SVE_3_Op_Pat<ValueType vtd, SDPatternOperator op, ValueType vt1,		class SVE_3_Op_Pat<ValueType vtd, SDPatternOperator op, ValueType vt1,
ValueType vt2, ValueType vt3, Instruction inst>		ValueType vt2, ValueType vt3, Instruction inst>
: Pat<(vtd (op vt1:$Op1, vt2:$Op2, vt3:$Op3)),		: Pat<(vtd (op vt1:$Op1, vt2:$Op2, vt3:$Op3)),
(inst $Op1, $Op2, $Op3)>;		(inst $Op1, $Op2, $Op3)>;

class SVE_4_Op_Pat<ValueType vtd, SDPatternOperator op, ValueType vt1,		class SVE_4_Op_Pat<ValueType vtd, SDPatternOperator op, ValueType vt1,
ValueType vt2, ValueType vt3, ValueType vt4,		ValueType vt2, ValueType vt3, ValueType vt4,
Instruction inst>		Instruction inst>
▲ Show 20 Lines • Show All 5,639 Lines • ▼ Show 20 Lines	: I<(outs regtype:$Vd), (ins PPR3bAny:$Pg, zprty:$Zn),
let Inst{20-19} = fmt;		let Inst{20-19} = fmt;
let Inst{18-16} = opc;		let Inst{18-16} = opc;
let Inst{15-13} = 0b001;		let Inst{15-13} = 0b001;
let Inst{12-10} = Pg;		let Inst{12-10} = Pg;
let Inst{9-5} = Zn;		let Inst{9-5} = Zn;
let Inst{4-0} = Vd;		let Inst{4-0} = Vd;
}		}

multiclass sve_int_reduce_0_saddv<bits<3> opc, string asm> {		multiclass sve_int_reduce_0_saddv<bits<3> opc, string asm, SDPatternOperator op> {
def _B : sve_int_reduce<0b00, 0b00, opc, asm, ZPR8, FPR64>;		def _B : sve_int_reduce<0b00, 0b00, opc, asm, ZPR8, FPR64>;
def _H : sve_int_reduce<0b01, 0b00, opc, asm, ZPR16, FPR64>;		def _H : sve_int_reduce<0b01, 0b00, opc, asm, ZPR16, FPR64>;
def _S : sve_int_reduce<0b10, 0b00, opc, asm, ZPR32, FPR64>;		def _S : sve_int_reduce<0b10, 0b00, opc, asm, ZPR32, FPR64>;

		def : SVE_2_Op_Pat<i64, op, nxv16i1, nxv16i8, !cast<Instruction>(NAME # _B)>;
		def : SVE_2_Op_Pat<i64, op, nxv8i1, nxv8i16, !cast<Instruction>(NAME # _H)>;
		def : SVE_2_Op_Pat<i64, op, nxv4i1, nxv4i32, !cast<Instruction>(NAME # _S)>;
}		}

multiclass sve_int_reduce_0_uaddv<bits<3> opc, string asm> {		multiclass sve_int_reduce_0_uaddv<bits<3> opc, string asm, SDPatternOperator op, SDPatternOperator opSaddv> {
def _B : sve_int_reduce<0b00, 0b00, opc, asm, ZPR8, FPR64>;		def _B : sve_int_reduce<0b00, 0b00, opc, asm, ZPR8, FPR64>;
def _H : sve_int_reduce<0b01, 0b00, opc, asm, ZPR16, FPR64>;		def _H : sve_int_reduce<0b01, 0b00, opc, asm, ZPR16, FPR64>;
def _S : sve_int_reduce<0b10, 0b00, opc, asm, ZPR32, FPR64>;		def _S : sve_int_reduce<0b10, 0b00, opc, asm, ZPR32, FPR64>;
def _D : sve_int_reduce<0b11, 0b00, opc, asm, ZPR64, FPR64>;		def _D : sve_int_reduce<0b11, 0b00, opc, asm, ZPR64, FPR64>;

		def : SVE_2_Op_Pat<i64, op, nxv16i1, nxv16i8, !cast<Instruction>(NAME # _B)>;
		def : SVE_2_Op_Pat<i64, op, nxv8i1, nxv8i16, !cast<Instruction>(NAME # _H)>;
		def : SVE_2_Op_Pat<i64, op, nxv4i1, nxv4i32, !cast<Instruction>(NAME # _S)>;
		def : SVE_2_Op_Pat<i64, op, nxv2i1, nxv2i64, !cast<Instruction>(NAME # _D)>;
		def : SVE_2_Op_Pat<i64, opSaddv, nxv2i1, nxv2i64, !cast<Instruction>(NAME # _D)>;
}		}

multiclass sve_int_reduce_1<bits<3> opc, string asm> {		multiclass sve_int_reduce_1<bits<3> opc, string asm, SDPatternOperator op> {
def _B : sve_int_reduce<0b00, 0b01, opc, asm, ZPR8, FPR8>;		def _B : sve_int_reduce<0b00, 0b01, opc, asm, ZPR8, FPR8>;
def _H : sve_int_reduce<0b01, 0b01, opc, asm, ZPR16, FPR16>;		def _H : sve_int_reduce<0b01, 0b01, opc, asm, ZPR16, FPR16>;
def _S : sve_int_reduce<0b10, 0b01, opc, asm, ZPR32, FPR32>;		def _S : sve_int_reduce<0b10, 0b01, opc, asm, ZPR32, FPR32>;
def _D : sve_int_reduce<0b11, 0b01, opc, asm, ZPR64, FPR64>;		def _D : sve_int_reduce<0b11, 0b01, opc, asm, ZPR64, FPR64>;

		def : SVE_2_Op_Pat_Reduce_To_Neon<v16i8, op, nxv16i1, nxv16i8, !cast<Instruction>(NAME # _B), bsub>;
		def : SVE_2_Op_Pat_Reduce_To_Neon<v8i16, op, nxv8i1, nxv8i16, !cast<Instruction>(NAME # _H), hsub>;
		def : SVE_2_Op_Pat_Reduce_To_Neon<v4i32, op, nxv4i1, nxv4i32, !cast<Instruction>(NAME # _S), ssub>;
		def : SVE_2_Op_Pat_Reduce_To_Neon<v2i64, op, nxv2i1, nxv2i64, !cast<Instruction>(NAME # _D), dsub>;
}		}

multiclass sve_int_reduce_2<bits<3> opc, string asm> {		multiclass sve_int_reduce_2<bits<3> opc, string asm, SDPatternOperator op> {
def _B : sve_int_reduce<0b00, 0b11, opc, asm, ZPR8, FPR8>;		def _B : sve_int_reduce<0b00, 0b11, opc, asm, ZPR8, FPR8>;
def _H : sve_int_reduce<0b01, 0b11, opc, asm, ZPR16, FPR16>;		def _H : sve_int_reduce<0b01, 0b11, opc, asm, ZPR16, FPR16>;
def _S : sve_int_reduce<0b10, 0b11, opc, asm, ZPR32, FPR32>;		def _S : sve_int_reduce<0b10, 0b11, opc, asm, ZPR32, FPR32>;
def _D : sve_int_reduce<0b11, 0b11, opc, asm, ZPR64, FPR64>;		def _D : sve_int_reduce<0b11, 0b11, opc, asm, ZPR64, FPR64>;

		def : SVE_2_Op_Pat_Reduce_To_Neon<v16i8, op, nxv16i1, nxv16i8, !cast<Instruction>(NAME # _B), bsub>;
		huntergr: So this is where we implemented the INSERT_SUBREG pattern, like this:
		    def : Pat<(v16i8 (op (nxv16i1 PPR3bAny:$Pg), (nxv16i8 ZPR8:$Zn))),
		              (INSERT_SUBREG (v16i8 (IMPLICIT_DEF)),
		                  (!cast<Instruction>(NAME#_B) PPR3bAny:$Pg, ZPR8:$Zn), bsub)>;
		Similarly for the other types and reduce multiclasses. While we prefer to use `SVE_2_Op_Pat` and similar helpers where possible, sometimes we have to use more involved patterns in this file. Doing so avoids the need to add new legal types just to support the reductions.
		dancgr (author): I will update the solution for this. I will put that in a reusable pattern and simplify the lowering.
		sdesmalen: Thanks, that looks quite neat!
		def : SVE_2_Op_Pat_Reduce_To_Neon<v8i16, op, nxv8i1, nxv8i16, !cast<Instruction>(NAME # _H), hsub>;
		def : SVE_2_Op_Pat_Reduce_To_Neon<v4i32, op, nxv4i1, nxv4i32, !cast<Instruction>(NAME # _S), ssub>;
		def : SVE_2_Op_Pat_Reduce_To_Neon<v2i64, op, nxv2i1, nxv2i64, !cast<Instruction>(NAME # _D), dsub>;
}		}

class sve_int_movprfx_pred<bits<2> sz8_32, bits<3> opc, string asm,		class sve_int_movprfx_pred<bits<2> sz8_32, bits<3> opc, string asm,
ZPRRegOp zprty, string pg_suffix, dag iops>		ZPRRegOp zprty, string pg_suffix, dag iops>
: I<(outs zprty:$Zd), iops,		: I<(outs zprty:$Zd), iops,
asm, "\t$Zd, $Pg"#pg_suffix#", $Zn",		asm, "\t$Zd, $Pg"#pg_suffix#", $Zn",
"",		"",
[]>, Sched<[]> {		[]>, Sched<[]> {
▲ Show 20 Lines • Show All 252 Lines • Show Last 20 Lines
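
The `SVE_2_Op_Pat_Reduce_To_Neon` patterns above place the reduction result in the lowest sub-register (`bsub`, `hsub`, `ssub` or `dsub`) of an otherwise undefined NEON vector, and the tests later read it back from lane 0. A minimal C++ sketch of that idea for the byte case follows. It is only a scalar model, not the code generator's implementation: `umaxvModel`, `insertIntoBsub`, `extractLane0` and the `0xAA` filler are hypothetical names and values chosen for illustration.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// A 128-bit NEON register viewed as sixteen byte lanes (models v16i8).
using V16i8 = std::array<uint8_t, 16>;

// Scalar model of a predicated unsigned-max reduction over byte elements.
// Inactive lanes are treated as 0, the identity for unsigned max.
uint8_t umaxvModel(const std::vector<bool> &Pg, const std::vector<uint8_t> &Zn) {
  assert(Pg.size() == Zn.size());
  uint8_t Max = 0;
  for (std::size_t I = 0; I < Zn.size(); ++I)
    if (Pg[I] && Zn[I] > Max)
      Max = Zn[I];
  return Max;
}

// Models INSERT_SUBREG (v16i8 (IMPLICIT_DEF)), Result, bsub: lane 0 holds the
// reduction result and the other lanes are undefined. The 0xAA filler is an
// arbitrary stand-in for "undefined"; nothing may rely on its value.
V16i8 insertIntoBsub(uint8_t Result) {
  V16i8 V;
  V.fill(0xAA);
  V[0] = Result;
  return V;
}

// Models reading the scalar back from lane 0, as `umov w0, v0.b[0]` does.
uint8_t extractLane0(const V16i8 &V) { return V[0]; }
```

Composing the three helpers, `extractLane0(insertIntoBsub(umaxvModel(pg, zn)))`, returns the same value as `umaxvModel(pg, zn)`, which is why only lane 0 of the vector needs to be defined.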

llvm/lib/Target/AArch64/Utils/AArch64BaseInfo.h

	Show First 20 Lines • Show All 646 Lines • ▼ Show 20 Lines
	// The number of bits in a SVE register is architecturally defined			// The number of bits in a SVE register is architecturally defined
	// to be a multiple of this value. If <M x t> has this number of bits,			// to be a multiple of this value. If <M x t> has this number of bits,
	// a <n x M x t> vector can be stored in a SVE register without any			// a <n x M x t> vector can be stored in a SVE register without any
	// redundant bits. If <M x t> has this number of bits divided by P,			// redundant bits. If <M x t> has this number of bits divided by P,
	// a <n x M x t> vector is stored in a SVE register by placing index i			// a <n x M x t> vector is stored in a SVE register by placing index i
	// in index iP of a <n x (MP) x t> vector. The other elements of the			// in index iP of a <n x (MP) x t> vector. The other elements of the
	// <n x (M*P) x t> vector (such as index 1) are undefined.			// <n x (M*P) x t> vector (such as index 1) are undefined.
	static constexpr unsigned SVEBitsPerBlock = 128;			static constexpr unsigned SVEBitsPerBlock = 128;
				const unsigned NeonBitsPerVector = 128;
	} // end namespace AArch64			} // end namespace AArch64

	} // end namespace llvm			} // end namespace llvm

	#endif			#endif
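
The comment on `SVEBitsPerBlock` above can be made concrete with a little arithmetic. The sketch below redeclares the constant so it stands alone; `packedElementsPerBlock`, `unpackedStride` and `unpackedSlot` are hypothetical helper names, not part of AArch64BaseInfo.h. It shows why the packed types used in this patch have 16, 8, 4 and 2 elements per block, and where element `i` lands when a narrower type such as `<vscale x 2 x i32>` is stored unpacked.

```cpp
#include <cassert>

// Mirrors the constant from the hunk above, redeclared so the sketch stands alone.
constexpr unsigned SVEBitsPerBlock = 128;

// M for a packed type: how many ElemBits-wide elements fill one block exactly.
constexpr unsigned packedElementsPerBlock(unsigned ElemBits) {
  return SVEBitsPerBlock / ElemBits;
}

// P for an unpacked type <n x M x t>: the block is P times wider than M elements.
unsigned unpackedStride(unsigned M, unsigned ElemBits) {
  assert(M * ElemBits != 0 && SVEBitsPerBlock % (M * ElemBits) == 0);
  return SVEBitsPerBlock / (M * ElemBits);
}

// Index that element I occupies in the wider <n x (M*P) x t> layout.
constexpr unsigned unpackedSlot(unsigned I, unsigned P) { return I * P; }

static_assert(packedElementsPerBlock(8) == 16, "matches nxv16i8");
static_assert(packedElementsPerBlock(16) == 8, "matches nxv8i16");
static_assert(packedElementsPerBlock(32) == 4, "matches nxv4i32");
static_assert(packedElementsPerBlock(64) == 2, "matches nxv2i64");
```

For example, two 32-bit elements cover only 64 bits, so `unpackedStride(2, 32)` gives P = 2 and element 1 sits at index 2 of the four-element layout, leaving index 1 undefined as the comment describes.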

llvm/test/CodeGen/AArch64/sve-int-reduce-pred.ll

This file was added.

				; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sve < %s | FileCheck %s

				define i64 @saddv_i8(<vscale x 16 x i1> %pg, <vscale x 16 x i8> %a) {
				; CHECK-LABEL: saddv_i8:
				; CHECK: saddv d[[REDUCE:[0-9]+]], p0, z0.b
				; CHECK: fmov x0, d[[REDUCE]]
				; CHECK-NEXT: ret
				%out = call i64 @llvm.aarch64.sve.saddv.nxv16i8(<vscale x 16 x i1> %pg,
				<vscale x 16 x i8> %a)
				ret i64 %out
				}

				define i64 @saddv_i16(<vscale x 8 x i1> %pg, <vscale x 8 x i16> %a) {
				; CHECK-LABEL: saddv_i16:
				; CHECK: saddv d[[REDUCE:[0-9]+]], p0, z0.h
				; CHECK: fmov x0, d[[REDUCE]]
				; CHECK-NEXT: ret
				%out = call i64 @llvm.aarch64.sve.saddv.nxv8i16(<vscale x 8 x i1> %pg,
				<vscale x 8 x i16> %a)
				ret i64 %out
				}


				define i64 @saddv_i32(<vscale x 4 x i1> %pg, <vscale x 4 x i32> %a) {
				; CHECK-LABEL: saddv_i32:
				; CHECK: saddv d[[REDUCE:[0-9]+]], p0, z0.s
				; CHECK: fmov x0, d[[REDUCE]]
				; CHECK-NEXT: ret
				%out = call i64 @llvm.aarch64.sve.saddv.nxv4i32(<vscale x 4 x i1> %pg,
				<vscale x 4 x i32> %a)
				ret i64 %out
				}

				define i64 @saddv_i64(<vscale x 2 x i1> %pg, <vscale x 2 x i64> %a) {
				sdesmalen: For the ACLE, `saddv_i64` is also needed; can you add this case as well? (It will just map directly to a `uaddv` instruction, so that should be a simple change where you call `LowerSVEIntReduction`.)
				dancgr (author): For this, I have decided to map it in the SVEInstrFormat pattern instead of the lowering (I added an extra PatternOperator for the uaddv multiclass), since we don't have custom lowering for SADDV and UADDV (they return i64 FPR128 types so they don't need it).
				; CHECK-LABEL: saddv_i64:
				; CHECK: uaddv d[[REDUCE:[0-9]+]], p0, z0.d
				; CHECK: fmov x0, d[[REDUCE]]
				; CHECK-NEXT: ret
				%out = call i64 @llvm.aarch64.sve.saddv.nxv2i64(<vscale x 2 x i1> %pg,
				<vscale x 2 x i64> %a)
				ret i64 %out
				}

				define i64 @uaddv_i8(<vscale x 16 x i1> %pg, <vscale x 16 x i8> %a) {
				; CHECK-LABEL: uaddv_i8:
				; CHECK: uaddv d[[REDUCE:[0-9]+]], p0, z0.b
				; CHECK: fmov x0, d[[REDUCE]]
				; CHECK-NEXT: ret
				%out = call i64 @llvm.aarch64.sve.uaddv.nxv16i8(<vscale x 16 x i1> %pg,
				<vscale x 16 x i8> %a)
				ret i64 %out
				}

				define i64 @uaddv_i16(<vscale x 8 x i1> %pg, <vscale x 8 x i16> %a) {
				; CHECK-LABEL: uaddv_i16:
				; CHECK: uaddv d[[REDUCE:[0-9]+]], p0, z0.h
				; CHECK: fmov x0, d[[REDUCE]]
				; CHECK-NEXT: ret
				%out = call i64 @llvm.aarch64.sve.uaddv.nxv8i16(<vscale x 8 x i1> %pg,
				<vscale x 8 x i16> %a)
				ret i64 %out
				}


				define i64 @uaddv_i32(<vscale x 4 x i1> %pg, <vscale x 4 x i32> %a) {
				; CHECK-LABEL: uaddv_i32:
				; CHECK: uaddv d[[REDUCE:[0-9]+]], p0, z0.s
				; CHECK: fmov x0, d[[REDUCE]]
				; CHECK-NEXT: ret
				%out = call i64 @llvm.aarch64.sve.uaddv.nxv4i32(<vscale x 4 x i1> %pg,
				<vscale x 4 x i32> %a)
				ret i64 %out
				}

				define i64 @uaddv_i64(<vscale x 2 x i1> %pg, <vscale x 2 x i64> %a) {
				; CHECK-LABEL: uaddv_i64:
				; CHECK: uaddv d[[REDUCE:[0-9]+]], p0, z0.d
				; CHECK: fmov x0, d[[REDUCE]]
				; CHECK-NEXT: ret
				%out = call i64 @llvm.aarch64.sve.uaddv.nxv2i64(<vscale x 2 x i1> %pg,
				<vscale x 2 x i64> %a)
				ret i64 %out
				}

				define i8 @smaxv_i8(<vscale x 16 x i1> %pg, <vscale x 16 x i8> %a) {
				; CHECK-LABEL: smaxv_i8:
				; CHECK: smaxv b[[REDUCE:[0-9]+]], p0, z0.b
				; CHECK: umov w0, v[[REDUCE]].b[0]
				; CHECK-NEXT: ret
				%out = call i8 @llvm.aarch64.sve.smaxv.nxv16i8(<vscale x 16 x i1> %pg,
				<vscale x 16 x i8> %a)
				ret i8 %out
				}

				define i16 @smaxv_i16(<vscale x 8 x i1> %pg, <vscale x 8 x i16> %a) {
				; CHECK-LABEL: smaxv_i16:
				; CHECK: smaxv h[[REDUCE:[0-9]+]], p0, z0.h
				; CHECK: umov w0, v[[REDUCE]].h[0]
				; CHECK-NEXT: ret
				%out = call i16 @llvm.aarch64.sve.smaxv.nxv8i16(<vscale x 8 x i1> %pg,
				<vscale x 8 x i16> %a)
				ret i16 %out
				}

				define i32 @smaxv_i32(<vscale x 4 x i1> %pg, <vscale x 4 x i32> %a) {
				; CHECK-LABEL: smaxv_i32:
				; CHECK: smaxv s[[REDUCE:[0-9]+]], p0, z0.s
				; CHECK: fmov w0, s[[REDUCE]]
				; CHECK-NEXT: ret
				%out = call i32 @llvm.aarch64.sve.smaxv.nxv4i32(<vscale x 4 x i1> %pg,
				<vscale x 4 x i32> %a)
				ret i32 %out
				}

				define i64 @smaxv_i64(<vscale x 2 x i1> %pg, <vscale x 2 x i64> %a) {
				; CHECK-LABEL: smaxv_i64:
				; CHECK: smaxv d[[REDUCE:[0-9]+]], p0, z0.d
				; CHECK: fmov x0, d[[REDUCE]]
				; CHECK-NEXT: ret
				%out = call i64 @llvm.aarch64.sve.smaxv.nxv2i64(<vscale x 2 x i1> %pg,
				<vscale x 2 x i64> %a)
				ret i64 %out
				sdesmalen: nit: strange indentation here.
				}

				define i8 @umaxv_i8(<vscale x 16 x i1> %pg, <vscale x 16 x i8> %a) {
				; CHECK-LABEL: umaxv_i8:
				; CHECK: umaxv b[[REDUCE:[0-9]+]], p0, z0.b
				; CHECK: umov w0, v[[REDUCE]].b[0]
				; CHECK-NEXT: ret
				%out = call i8 @llvm.aarch64.sve.umaxv.nxv16i8(<vscale x 16 x i1> %pg,
				<vscale x 16 x i8> %a)
				ret i8 %out
				}

				define i16 @umaxv_i16(<vscale x 8 x i1> %pg, <vscale x 8 x i16> %a) {
				; CHECK-LABEL: umaxv_i16:
				; CHECK: umaxv h[[REDUCE:[0-9]+]], p0, z0.h
				; CHECK: umov w0, v[[REDUCE]].h[0]
				; CHECK-NEXT: ret
				%out = call i16 @llvm.aarch64.sve.umaxv.nxv8i16(<vscale x 8 x i1> %pg,
				<vscale x 8 x i16> %a)
				ret i16 %out
				}

				define i32 @umaxv_i32(<vscale x 4 x i1> %pg, <vscale x 4 x i32> %a) {
				; CHECK-LABEL: umaxv_i32:
				; CHECK: umaxv s[[REDUCE:[0-9]+]], p0, z0.s
				; CHECK: fmov w0, s[[REDUCE]]
				; CHECK-NEXT: ret
				%out = call i32 @llvm.aarch64.sve.umaxv.nxv4i32(<vscale x 4 x i1> %pg,
				<vscale x 4 x i32> %a)
				ret i32 %out
				}

				define i64 @umaxv_i64(<vscale x 2 x i1> %pg, <vscale x 2 x i64> %a) {
				; CHECK-LABEL: umaxv_i64:
				; CHECK: umaxv d[[REDUCE:[0-9]+]], p0, z0.d
				; CHECK: fmov x0, d[[REDUCE]]
				; CHECK-NEXT: ret
				%out = call i64 @llvm.aarch64.sve.umaxv.nxv2i64(<vscale x 2 x i1> %pg,
				<vscale x 2 x i64> %a)
				ret i64 %out
				}

				define i8 @sminv_i8(<vscale x 16 x i1> %pg, <vscale x 16 x i8> %a) {
				; CHECK-LABEL: sminv_i8:
				; CHECK: sminv b[[REDUCE:[0-9]+]], p0, z0.b
				; CHECK: umov w0, v[[REDUCE]].b[0]
				; CHECK-NEXT: ret
				%out = call i8 @llvm.aarch64.sve.sminv.nxv16i8(<vscale x 16 x i1> %pg,
				<vscale x 16 x i8> %a)
				ret i8 %out
				}

				define i16 @sminv_i16(<vscale x 8 x i1> %pg, <vscale x 8 x i16> %a) {
				; CHECK-LABEL: sminv_i16:
				; CHECK: sminv h[[REDUCE:[0-9]+]], p0, z0.h
				; CHECK: umov w0, v[[REDUCE]].h[0]
				; CHECK-NEXT: ret
				%out = call i16 @llvm.aarch64.sve.sminv.nxv8i16(<vscale x 8 x i1> %pg,
				<vscale x 8 x i16> %a)
				ret i16 %out
				}

				define i32 @sminv_i32(<vscale x 4 x i1> %pg, <vscale x 4 x i32> %a) {
				; CHECK-LABEL: sminv_i32:
				; CHECK: sminv s[[REDUCE:[0-9]+]], p0, z0.s
				; CHECK: fmov w0, s[[REDUCE]]
				; CHECK-NEXT: ret
				%out = call i32 @llvm.aarch64.sve.sminv.nxv4i32(<vscale x 4 x i1> %pg,
				<vscale x 4 x i32> %a)
				ret i32 %out
				}

				define i64 @sminv_i64(<vscale x 2 x i1> %pg, <vscale x 2 x i64> %a) {
				; CHECK-LABEL: sminv_i64:
				; CHECK: sminv d[[REDUCE:[0-9]+]], p0, z0.d
				; CHECK: fmov x0, d[[REDUCE]]
				; CHECK-NEXT: ret
				%out = call i64 @llvm.aarch64.sve.sminv.nxv2i64(<vscale x 2 x i1> %pg,
				<vscale x 2 x i64> %a)
				ret i64 %out
				}

				define i8 @uminv_i8(<vscale x 16 x i1> %pg, <vscale x 16 x i8> %a) {
				; CHECK-LABEL: uminv_i8:
				; CHECK: uminv b[[REDUCE:[0-9]+]], p0, z0.b
				; CHECK: umov w0, v[[REDUCE]].b[0]
				; CHECK-NEXT: ret
				%out = call i8 @llvm.aarch64.sve.uminv.nxv16i8(<vscale x 16 x i1> %pg,
				<vscale x 16 x i8> %a)
				ret i8 %out
				}

				define i16 @uminv_i16(<vscale x 8 x i1> %pg, <vscale x 8 x i16> %a) {
				; CHECK-LABEL: uminv_i16:
				; CHECK: uminv h[[REDUCE:[0-9]+]], p0, z0.h
				; CHECK: umov w0, v[[REDUCE]].h[0]
				; CHECK-NEXT: ret
				%out = call i16 @llvm.aarch64.sve.uminv.nxv8i16(<vscale x 8 x i1> %pg,
				<vscale x 8 x i16> %a)
				ret i16 %out
				}

				define i32 @uminv_i32(<vscale x 4 x i1> %pg, <vscale x 4 x i32> %a) {
				; CHECK-LABEL: uminv_i32:
				; CHECK: uminv s[[REDUCE:[0-9]+]], p0, z0.s
				; CHECK: fmov w0, s[[REDUCE]]
				; CHECK-NEXT: ret
				%out = call i32 @llvm.aarch64.sve.uminv.nxv4i32(<vscale x 4 x i1> %pg,
				<vscale x 4 x i32> %a)
				ret i32 %out
				}

				define i64 @uminv_i64(<vscale x 2 x i1> %pg, <vscale x 2 x i64> %a) {
				; CHECK-LABEL: uminv_i64:
				; CHECK: uminv d[[REDUCE:[0-9]+]], p0, z0.d
				; CHECK: fmov x0, d[[REDUCE]]
				; CHECK-NEXT: ret
				%out = call i64 @llvm.aarch64.sve.uminv.nxv2i64(<vscale x 2 x i1> %pg,
				<vscale x 2 x i64> %a)
				ret i64 %out
				}

				define i8 @orv_i8(<vscale x 16 x i1> %pg, <vscale x 16 x i8> %a) {
				; CHECK-LABEL: orv_i8:
				; CHECK: orv b[[REDUCE:[0-9]+]], p0, z0.b
				; CHECK: umov w0, v[[REDUCE]].b[0]
				; CHECK-NEXT: ret
				%out = call i8 @llvm.aarch64.sve.orv.nxv16i8(<vscale x 16 x i1> %pg,
				<vscale x 16 x i8> %a)
				ret i8 %out
				}

				define i16 @orv_i16(<vscale x 8 x i1> %pg, <vscale x 8 x i16> %a) {
				; CHECK-LABEL: orv_i16:
				; CHECK: orv h[[REDUCE:[0-9]+]], p0, z0.h
				; CHECK: umov w0, v[[REDUCE]].h[0]
				; CHECK-NEXT: ret
				%out = call i16 @llvm.aarch64.sve.orv.nxv8i16(<vscale x 8 x i1> %pg,
				<vscale x 8 x i16> %a)
				ret i16 %out
				}

				define i32 @orv_i32(<vscale x 4 x i1> %pg, <vscale x 4 x i32> %a) {
				; CHECK-LABEL: orv_i32:
				; CHECK: orv s[[REDUCE:[0-9]+]], p0, z0.s
				; CHECK: fmov w0, s[[REDUCE]]
				; CHECK-NEXT: ret
				%out = call i32 @llvm.aarch64.sve.orv.nxv4i32(<vscale x 4 x i1> %pg,
				<vscale x 4 x i32> %a)
				ret i32 %out
				}

				define i64 @orv_i64(<vscale x 2 x i1> %pg, <vscale x 2 x i64> %a) {
				; CHECK-LABEL: orv_i64:
				; CHECK: orv d[[REDUCE:[0-9]+]], p0, z0.d
				; CHECK: fmov x0, d[[REDUCE]]
				; CHECK-NEXT: ret
				%out = call i64 @llvm.aarch64.sve.orv.nxv2i64(<vscale x 2 x i1> %pg,
				<vscale x 2 x i64> %a)
				ret i64 %out
				}

				define i8 @eorv_i8(<vscale x 16 x i1> %pg, <vscale x 16 x i8> %a) {
				; CHECK-LABEL: eorv_i8:
				; CHECK: eorv b[[REDUCE:[0-9]+]], p0, z0.b
				; CHECK: umov w0, v[[REDUCE]].b[0]
				; CHECK-NEXT: ret
				%out = call i8 @llvm.aarch64.sve.eorv.nxv16i8(<vscale x 16 x i1> %pg,
				<vscale x 16 x i8> %a)
				ret i8 %out
				}

				define i16 @eorv_i16(<vscale x 8 x i1> %pg, <vscale x 8 x i16> %a) {
				; CHECK-LABEL: eorv_i16:
				; CHECK: eorv h[[REDUCE:[0-9]+]], p0, z0.h
				; CHECK: umov w0, v[[REDUCE]].h[0]
				; CHECK-NEXT: ret
				%out = call i16 @llvm.aarch64.sve.eorv.nxv8i16(<vscale x 8 x i1> %pg,
				<vscale x 8 x i16> %a)
				ret i16 %out
				}

				define i32 @eorv_i32(<vscale x 4 x i1> %pg, <vscale x 4 x i32> %a) {
				; CHECK-LABEL: eorv_i32:
				; CHECK: eorv s[[REDUCE:[0-9]+]], p0, z0.s
				; CHECK: fmov w0, s[[REDUCE]]
				; CHECK-NEXT: ret
				%out = call i32 @llvm.aarch64.sve.eorv.nxv4i32(<vscale x 4 x i1> %pg,
				<vscale x 4 x i32> %a)
				ret i32 %out
				}

				define i64 @eorv_i64(<vscale x 2 x i1> %pg, <vscale x 2 x i64> %a) {
				; CHECK-LABEL: eorv_i64:
				; CHECK: eorv d[[REDUCE:[0-9]+]], p0, z0.d
				; CHECK: fmov x0, d[[REDUCE]]
				; CHECK-NEXT: ret
				%out = call i64 @llvm.aarch64.sve.eorv.nxv2i64(<vscale x 2 x i1> %pg,
				<vscale x 2 x i64> %a)
				ret i64 %out
				}

				define i8 @andv_i8(<vscale x 16 x i1> %pg, <vscale x 16 x i8> %a) {
				; CHECK-LABEL: andv_i8:
				; CHECK: andv b[[REDUCE:[0-9]+]], p0, z0.b
				; CHECK: umov w0, v[[REDUCE]].b[0]
				; CHECK-NEXT: ret
				%out = call i8 @llvm.aarch64.sve.andv.nxv16i8(<vscale x 16 x i1> %pg,
				<vscale x 16 x i8> %a)
				ret i8 %out
				}

				define i16 @andv_i16(<vscale x 8 x i1> %pg, <vscale x 8 x i16> %a) {
				; CHECK-LABEL: andv_i16:
				; CHECK: andv h[[REDUCE:[0-9]+]], p0, z0.h
				; CHECK: umov w0, v[[REDUCE]].h[0]
				; CHECK-NEXT: ret
				%out = call i16 @llvm.aarch64.sve.andv.nxv8i16(<vscale x 8 x i1> %pg,
				<vscale x 8 x i16> %a)
				ret i16 %out
				}

				define i32 @andv_i32(<vscale x 4 x i1> %pg, <vscale x 4 x i32> %a) {
				; CHECK-LABEL: andv_i32:
				; CHECK: andv s[[REDUCE:[0-9]+]], p0, z0.s
				; CHECK: fmov w0, s[[REDUCE]]
				; CHECK-NEXT: ret
				%out = call i32 @llvm.aarch64.sve.andv.nxv4i32(<vscale x 4 x i1> %pg,
				<vscale x 4 x i32> %a)
				ret i32 %out
				}

				define i64 @andv_i64(<vscale x 2 x i1> %pg, <vscale x 2 x i64> %a) {
				; CHECK-LABEL: andv_i64:
				; CHECK: andv d[[REDUCE:[0-9]+]], p0, z0.d
				; CHECK: fmov x0, d[[REDUCE]]
				; CHECK-NEXT: ret
				%out = call i64 @llvm.aarch64.sve.andv.nxv2i64(<vscale x 2 x i1> %pg,
				<vscale x 2 x i64> %a)
				ret i64 %out
				}

				declare i64 @llvm.aarch64.sve.saddv.nxv16i8(<vscale x 16 x i1>, <vscale x 16 x i8>)
				declare i64 @llvm.aarch64.sve.saddv.nxv8i16(<vscale x 8 x i1>, <vscale x 8 x i16>)
				declare i64 @llvm.aarch64.sve.saddv.nxv4i32(<vscale x 4 x i1>, <vscale x 4 x i32>)
				declare i64 @llvm.aarch64.sve.saddv.nxv2i64(<vscale x 2 x i1>, <vscale x 2 x i64>)
				declare i64 @llvm.aarch64.sve.uaddv.nxv16i8(<vscale x 16 x i1>, <vscale x 16 x i8>)
				declare i64 @llvm.aarch64.sve.uaddv.nxv8i16(<vscale x 8 x i1>, <vscale x 8 x i16>)
				declare i64 @llvm.aarch64.sve.uaddv.nxv4i32(<vscale x 4 x i1>, <vscale x 4 x i32>)
				declare i64 @llvm.aarch64.sve.uaddv.nxv2i64(<vscale x 2 x i1>, <vscale x 2 x i64>)
				declare i8 @llvm.aarch64.sve.smaxv.nxv16i8(<vscale x 16 x i1>, <vscale x 16 x i8>)
				declare i16 @llvm.aarch64.sve.smaxv.nxv8i16(<vscale x 8 x i1>, <vscale x 8 x i16>)
				declare i32 @llvm.aarch64.sve.smaxv.nxv4i32(<vscale x 4 x i1>, <vscale x 4 x i32>)
				declare i64 @llvm.aarch64.sve.smaxv.nxv2i64(<vscale x 2 x i1>, <vscale x 2 x i64>)
				declare i8 @llvm.aarch64.sve.umaxv.nxv16i8(<vscale x 16 x i1>, <vscale x 16 x i8>)
				declare i16 @llvm.aarch64.sve.umaxv.nxv8i16(<vscale x 8 x i1>, <vscale x 8 x i16>)
				declare i32 @llvm.aarch64.sve.umaxv.nxv4i32(<vscale x 4 x i1>, <vscale x 4 x i32>)
				declare i64 @llvm.aarch64.sve.umaxv.nxv2i64(<vscale x 2 x i1>, <vscale x 2 x i64>)
				declare i8 @llvm.aarch64.sve.sminv.nxv16i8(<vscale x 16 x i1>, <vscale x 16 x i8>)
				declare i16 @llvm.aarch64.sve.sminv.nxv8i16(<vscale x 8 x i1>, <vscale x 8 x i16>)
				declare i32 @llvm.aarch64.sve.sminv.nxv4i32(<vscale x 4 x i1>, <vscale x 4 x i32>)
				declare i64 @llvm.aarch64.sve.sminv.nxv2i64(<vscale x 2 x i1>, <vscale x 2 x i64>)
				declare i8 @llvm.aarch64.sve.uminv.nxv16i8(<vscale x 16 x i1>, <vscale x 16 x i8>)
				declare i16 @llvm.aarch64.sve.uminv.nxv8i16(<vscale x 8 x i1>, <vscale x 8 x i16>)
				declare i32 @llvm.aarch64.sve.uminv.nxv4i32(<vscale x 4 x i1>, <vscale x 4 x i32>)
				declare i64 @llvm.aarch64.sve.uminv.nxv2i64(<vscale x 2 x i1>, <vscale x 2 x i64>)
				declare i8 @llvm.aarch64.sve.orv.nxv16i8(<vscale x 16 x i1>, <vscale x 16 x i8>)
				declare i16 @llvm.aarch64.sve.orv.nxv8i16(<vscale x 8 x i1>, <vscale x 8 x i16>)
				declare i32 @llvm.aarch64.sve.orv.nxv4i32 (<vscale x 4 x i1>, <vscale x 4 x i32>)
				declare i64 @llvm.aarch64.sve.orv.nxv2i64 (<vscale x 2 x i1>, <vscale x 2 x i64>)
				declare i8 @llvm.aarch64.sve.eorv.nxv16i8(<vscale x 16 x i1>, <vscale x 16 x i8>)
				declare i16 @llvm.aarch64.sve.eorv.nxv8i16(<vscale x 8 x i1>, <vscale x 8 x i16>)
				declare i32 @llvm.aarch64.sve.eorv.nxv4i32 (<vscale x 4 x i1>, <vscale x 4 x i32>)
				declare i64 @llvm.aarch64.sve.eorv.nxv2i64 (<vscale x 2 x i1>, <vscale x 2 x i64>)
				declare i8 @llvm.aarch64.sve.andv.nxv16i8(<vscale x 16 x i1>, <vscale x 16 x i8>)
				declare i16 @llvm.aarch64.sve.andv.nxv8i16(<vscale x 8 x i1>, <vscale x 8 x i16>)
				declare i32 @llvm.aarch64.sve.andv.nxv4i32 (<vscale x 4 x i1>, <vscale x 4 x i32>)
				declare i64 @llvm.aarch64.sve.andv.nxv2i64 (<vscale x 2 x i1>, <vscale x 2 x i64>)
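
The `saddv_i64` test in this file expects a `uaddv` instruction. A plausible reading, sketched below under the assumption that the reduction widens each active element to 64 bits and sums modulo 2^64, is that signed and unsigned widening only differ for elements narrower than 64 bits. `addvModel` is a hypothetical scalar model written for this illustration, not an LLVM or ACLE function.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Scalar model of a predicated add reduction. Inactive lanes contribute 0.
// Each active lane is widened to 64 bits (sign-extended when ElemT is signed,
// zero-extended when it is unsigned) and the sum wraps modulo 2^64.
template <typename ElemT>
uint64_t addvModel(const std::vector<bool> &Pg, const std::vector<ElemT> &Zn) {
  assert(Pg.size() == Zn.size());
  uint64_t Sum = 0;
  for (std::size_t I = 0; I < Zn.size(); ++I)
    if (Pg[I])
      Sum += static_cast<uint64_t>(Zn[I]); // conversion performs the widening
  return Sum;
}
```

With byte elements, the bit pattern `0xFF` contributes -1 in the signed model and 255 in the unsigned one, so the two reductions differ. With 64-bit elements no widening takes place, so `addvModel<int64_t>` and `addvModel<uint64_t>` agree on identical bit patterns, which is consistent with the extra `opSaddv` pattern on the `_D` form of the `uaddv` multiclass.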


Revision Contents

Diff 232186
