This is an archive of the discontinued LLVM Phabricator instance.

Differential D131875

[AArch64][SVE] Add hadd and rhadd support
ClosedPublic

Authored by dmgreen on Aug 15 2022, 12:10 AM.

Download Raw Diff

Details

Reviewers

efriedma
paulwalker-arm
david-arm
SjoerdMeijer

Commits

rG1da4d5aafad7: [AArch64][SVE] Add hadd and rhadd support

Summary

This adds basic HADD and RHADD support for SVE, by marking the AVGFLOOR and AVGCEIL as custom and converting those to HADD_PRED/RHADD_PRED AArch64 nodes. Both the existing intrinsics and the _PRED nodes are then lowered to the _ZPmZ instructions.
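As a rough per-lane illustration of what the four nodes compute, the 16-bit case can be modelled in scalar C++ by widening to 32 bits, mirroring the extend, add, shift, truncate sequence used in the tests. This is only a sketch: the function names are hypothetical and none of this code comes from the patch.

```cpp
#include <cstdint>

// Hypothetical scalar reference models for one 16-bit lane.

// AVGFLOORS (shadd): floor((a + b) / 2), signed.
// The tests use a logical shift after sign extension; that is fine because
// the truncation discards the one bit where logical and arithmetic shifts
// would differ. The narrowing cast is modular (guaranteed since C++20).
std::int16_t avg_floor_s16(std::int16_t a, std::int16_t b) {
  std::uint32_t wide =
      static_cast<std::uint32_t>(std::int32_t{a} + std::int32_t{b});
  return static_cast<std::int16_t>(wide >> 1);
}

// AVGFLOORU (uhadd): floor((a + b) / 2), unsigned.
std::uint16_t avg_floor_u16(std::uint16_t a, std::uint16_t b) {
  std::uint32_t wide = std::uint32_t{a} + std::uint32_t{b};
  return static_cast<std::uint16_t>(wide >> 1);
}

// AVGCEILS (srhadd): ceil((a + b) / 2), signed, by adding one before halving.
std::int16_t avg_ceil_s16(std::int16_t a, std::int16_t b) {
  std::uint32_t wide =
      static_cast<std::uint32_t>(std::int32_t{a} + std::int32_t{b} + 1);
  return static_cast<std::int16_t>(wide >> 1);
}

// AVGCEILU (urhadd): ceil((a + b) / 2), unsigned.
std::uint16_t avg_ceil_u16(std::uint16_t a, std::uint16_t b) {
  std::uint32_t wide = std::uint32_t{a} + std::uint32_t{b} + 1;
  return static_cast<std::uint16_t>(wide >> 1);
}
```

The floor and ceil variants differ only for odd sums, for example averaging -3 and 0 gives -2 with the floor form and -1 with the ceil form.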

The <vscale x 2 x i64> tests are added here as the underlying code extends to a <vscale x 2 x i128>, which fails to lower. The others are precommitted to show the differences.
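The 64-bit case is awkward in widened form because the intermediate sum needs 128-bit lanes. The average itself never needs the wider type, though. The sketch below uses two well-known bit identities to show this for unsigned 64-bit values; it is an illustration of why a dedicated halving-add node is useful, not code from the patch, and the function names are hypothetical.

```cpp
#include <cstdint>

// a + b == 2 * (a & b) + (a ^ b), so the floor of the average is the shared
// bits plus half of the differing bits. No intermediate value overflows.
std::uint64_t avg_floor_u64(std::uint64_t a, std::uint64_t b) {
  return (a & b) + ((a ^ b) >> 1);
}

// a + b == 2 * (a | b) - (a ^ b), so the ceiling of the average is the union
// of the bits minus half of the differing bits (rounded down).
std::uint64_t avg_ceil_u64(std::uint64_t a, std::uint64_t b) {
  return (a | b) - ((a ^ b) >> 1);
}
```

For instance, averaging `UINT64_MAX` with itself returns `UINT64_MAX`, where a naive `(a + b) >> 1` in 64 bits would wrap.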

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

dmgreen created this revision. Aug 15 2022, 12:10 AM

dmgreen created this object with visibility "No One".

Herald added a project: Restricted Project. Aug 15 2022, 12:10 AM

dmgreen requested review of this revision. Aug 15 2022, 12:10 AM

Herald added a project: Restricted Project. Aug 15 2022, 12:10 AM

dmgreen added a parent revision: D128919: [DAG] Teach isConstOrConstSplat about SPLAT_VECTORs. Aug 15 2022, 12:10 AM

dmgreen changed the visibility from "No One" to "Public (No Login Required)".

Herald added a reviewer: efriedma. Aug 15 2022, 12:11 AM

Herald added subscribers: psnobl, hiraditya, kristof.beyls, tschuett.

Matt added a subscriber: Matt. Aug 15 2022, 1:20 PM

I think this now has enough in tree to allow it to work. I've given it a rebase, but it has been a very long time since I wrote it.

Herald added a subscriber: ctetreau. Dec 8 2022, 4:05 AM

Harbormaster completed remote builds in B201932: Diff 481229. Dec 8 2022, 4:05 AM

paulwalker-arm added inline comments. Dec 8 2022, 4:32 AM

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
5188–5202	We use the `_PRED` suffix for nodes where the result of inactive lanes does not matter. This means this transform is not strictly safe (as in somebody might do something erroneous based on the name). We have `convertMergedOpToPredOp` to convert from predicated intrinsics to `_PRED` nodes when it is safe to do so. The alternative is to change the ISD name but that seems wrong because for the IR side of things you really do want `_PRED` nodes to free up isel/register allocation by selecting.
llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td
3308–3309	This ties in with my previous comment because typically here we'd create `SHADD_ZPZZ_UNDEF_B` pseudo instructions to free up the register allocator. You don't need to bother with this for this patch if you don't want to, but otherwise you'll need to handle `AArch64shadd_p` and `int_aarch64_sve_shadd` as separate patterns because they have different behaviour. For an example of how we normally handle this, see `MUL_ZPmZ` versus `MUL_ZPZZ`.

Ah I see. Thanks. This alters the intrinsics to go via a PatFrag instead.
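The distinction raised in the inline comments can be modelled in a few lines. In the sketch below (hypothetical names, a fixed four-lane vector standing in for a scalable one), the merging form keeps the first operand in inactive lanes, which is the behaviour assumed here for the intrinsic and the `_ZPmZ` instruction, while a `_PRED`-style node only constrains the active lanes.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

constexpr std::size_t kLanes = 4;  // toy stand-in for a scalable vector
using Vec = std::array<std::uint8_t, kLanes>;
using Pred = std::array<bool, kLanes>;

// Unsigned halving add for a single lane.
std::uint8_t uhadd_lane(std::uint8_t a, std::uint8_t b) {
  return static_cast<std::uint8_t>((a & b) + ((a ^ b) >> 1));
}

// Merging semantics: inactive lanes return the first operand unchanged.
Vec uhadd_merging(const Pred &pg, const Vec &a, const Vec &b) {
  Vec r = a;
  for (std::size_t i = 0; i < kLanes; ++i)
    if (pg[i])
      r[i] = uhadd_lane(a[i], b[i]);
  return r;
}

// _PRED-style semantics: any result is acceptable as long as the active
// lanes are correct; inactive lanes are "don't care".
bool satisfies_pred_node(const Pred &pg, const Vec &a, const Vec &b,
                         const Vec &got) {
  for (std::size_t i = 0; i < kLanes; ++i)
    if (pg[i] && got[i] != uhadd_lane(a[i], b[i]))
      return false;
  return true;
}
```

Under this model a merging result always satisfies the `_PRED` contract, so selecting a `_PRED` node to a merging instruction is fine. The reverse does not hold: a result that is valid for the `_PRED` node may differ from the merging result in inactive lanes, which is why rewriting a merging intrinsic into a `_PRED` node is only safe when those lanes cannot be observed.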

Harbormaster completed remote builds in B202027: Diff 481352. Dec 8 2022, 10:29 AM

Small code placement issue but otherwise looks good.

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
1297–1304	I suspect you didn't mean to put this within the `{MVT::nxv16i1, MVT::nxv8i1...` loop? Perhaps it's worth placing within the existing loop for `{MVT::nxv16i8, MVT::nxv8i16, MVT::nxv4i32, MVT::nxv2i64}` and protecting with `hasSVE2` there?
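The placement being suggested can be pictured with a toy action table: the custom actions are registered inside the loop over the packed integer types and guarded by the feature check. Everything in this sketch is hypothetical (the `ToyLowering` type, the enums, and the default of `Expand` for unset entries are assumptions made for illustration), and it only mimics the shape of the real `setOperationAction` calls.

```cpp
#include <initializer_list>
#include <map>
#include <utility>

enum class Op { AvgFloorS, AvgFloorU, AvgCeilS, AvgCeilU };
enum class VT { nxv16i8, nxv8i16, nxv4i32, nxv2i64, nxv16i1 };
enum class Action { Expand, Custom };

// Toy stand-in for a target lowering object with a per-(op, type) table.
struct ToyLowering {
  std::map<std::pair<Op, VT>, Action> actions;

  void setOperationAction(Op op, VT vt, Action a) { actions[{op, vt}] = a; }

  // Assumption of this toy: anything not set explicitly is expanded.
  Action getOperationAction(Op op, VT vt) const {
    auto it = actions.find({op, vt});
    return it == actions.end() ? Action::Expand : it->second;
  }
};

ToyLowering configure(bool hasSVE2) {
  ToyLowering tl;
  // Only the packed integer vector types get the custom lowering, and only
  // when the SVE2 feature is available.
  for (VT vt : {VT::nxv16i8, VT::nxv8i16, VT::nxv4i32, VT::nxv2i64}) {
    if (hasSVE2) {
      for (Op op : {Op::AvgFloorS, Op::AvgFloorU, Op::AvgCeilS, Op::AvgCeilU})
        tl.setOperationAction(op, vt, Action::Custom);
    }
  }
  return tl;
}
```

With this layout a predicate type such as `nxv16i1` is never given a custom action for the averaging operations, which is the point of moving the calls out of the predicate-type loop.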

This revision is now accepted and ready to land. Dec 12 2022, 2:15 AM

This revision was landed with ongoing or failed builds. Dec 14 2022, 1:25 AM

Closed by commit rG1da4d5aafad7: [AArch64][SVE] Add hadd and rhadd support (authored by dmgreen).

This revision was automatically updated to reflect the committed changes.

dmgreen added a commit: rG1da4d5aafad7: [AArch64][SVE] Add hadd and rhadd support.

Revision Contents

Path

Size

llvm/

lib/

Target/

AArch64/

AArch64ISelLowering.h

4 lines

AArch64ISelLowering.cpp

19 lines

AArch64SVEInstrInfo.td

25 lines

test/

CodeGen/

AArch64/

sve2-hadd.ll

212 lines

Diff 482753

llvm/lib/Target/AArch64/AArch64ISelLowering.h

Show First 20 Lines • Show All 96 Lines • ▼ Show 20 Lines	enum NodeType : unsigned {
FDIV_PRED,		FDIV_PRED,
FMA_PRED,		FMA_PRED,
FMAX_PRED,		FMAX_PRED,
FMAXNM_PRED,		FMAXNM_PRED,
FMIN_PRED,		FMIN_PRED,
FMINNM_PRED,		FMINNM_PRED,
FMUL_PRED,		FMUL_PRED,
FSUB_PRED,		FSUB_PRED,
		HADDS_PRED,
		HADDU_PRED,
MUL_PRED,		MUL_PRED,
MULHS_PRED,		MULHS_PRED,
MULHU_PRED,		MULHU_PRED,
		RHADDS_PRED,
		RHADDU_PRED,
SDIV_PRED,		SDIV_PRED,
SHL_PRED,		SHL_PRED,
SMAX_PRED,		SMAX_PRED,
SMIN_PRED,		SMIN_PRED,
SRA_PRED,		SRA_PRED,
SRL_PRED,		SRL_PRED,
UDIV_PRED,		UDIV_PRED,
UMAX_PRED,		UMAX_PRED,
▲ Show 20 Lines • Show All 1,104 Lines • Show Last 20 Lines

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp

Show First 20 Lines • Show All 1,243 Lines • ▼ Show 20 Lines	for (auto VT : {MVT::nxv16i8, MVT::nxv8i16, MVT::nxv4i32, MVT::nxv2i64}) {
setOperationAction(ISD::SADDSAT, VT, Legal);		setOperationAction(ISD::SADDSAT, VT, Legal);
setOperationAction(ISD::UADDSAT, VT, Legal);		setOperationAction(ISD::UADDSAT, VT, Legal);
setOperationAction(ISD::SSUBSAT, VT, Legal);		setOperationAction(ISD::SSUBSAT, VT, Legal);
setOperationAction(ISD::USUBSAT, VT, Legal);		setOperationAction(ISD::USUBSAT, VT, Legal);
setOperationAction(ISD::UREM, VT, Expand);		setOperationAction(ISD::UREM, VT, Expand);
setOperationAction(ISD::SREM, VT, Expand);		setOperationAction(ISD::SREM, VT, Expand);
setOperationAction(ISD::SDIVREM, VT, Expand);		setOperationAction(ISD::SDIVREM, VT, Expand);
setOperationAction(ISD::UDIVREM, VT, Expand);		setOperationAction(ISD::UDIVREM, VT, Expand);

		if (Subtarget->hasSVE2()) {
		setOperationAction(ISD::AVGFLOORS, VT, Custom);
		setOperationAction(ISD::AVGFLOORU, VT, Custom);
		setOperationAction(ISD::AVGCEILS, VT, Custom);
		setOperationAction(ISD::AVGCEILU, VT, Custom);
		}
}		}

// Illegal unpacked integer vector types.		// Illegal unpacked integer vector types.
for (auto VT : {MVT::nxv8i8, MVT::nxv4i16, MVT::nxv2i32}) {		for (auto VT : {MVT::nxv8i8, MVT::nxv4i16, MVT::nxv2i32}) {
setOperationAction(ISD::EXTRACT_SUBVECTOR, VT, Custom);		setOperationAction(ISD::EXTRACT_SUBVECTOR, VT, Custom);
setOperationAction(ISD::INSERT_SUBVECTOR, VT, Custom);		setOperationAction(ISD::INSERT_SUBVECTOR, VT, Custom);
}		}

Show All 22 Lines	for (auto VT :
setOperationAction(ISD::INSERT_SUBVECTOR, VT, Custom);		setOperationAction(ISD::INSERT_SUBVECTOR, VT, Custom);

// There are no legal MVT::nxv16f## based types.		// There are no legal MVT::nxv16f## based types.
if (VT != MVT::nxv16i1) {		if (VT != MVT::nxv16i1) {
setOperationAction(ISD::SINT_TO_FP, VT, Custom);		setOperationAction(ISD::SINT_TO_FP, VT, Custom);
setOperationAction(ISD::UINT_TO_FP, VT, Custom);		setOperationAction(ISD::UINT_TO_FP, VT, Custom);
}		}
}		}

// NEON doesn't support masked loads/stores/gathers/scatters, but SVE does		// NEON doesn't support masked loads/stores/gathers/scatters, but SVE does
for (auto VT : {MVT::v4f16, MVT::v8f16, MVT::v2f32, MVT::v4f32, MVT::v1f64,		for (auto VT : {MVT::v4f16, MVT::v8f16, MVT::v2f32, MVT::v4f32, MVT::v1f64,
MVT::v2f64, MVT::v8i8, MVT::v16i8, MVT::v4i16, MVT::v8i16,		MVT::v2f64, MVT::v8i8, MVT::v16i8, MVT::v4i16, MVT::v8i16,
MVT::v2i32, MVT::v4i32, MVT::v1i64, MVT::v2i64}) {		MVT::v2i32, MVT::v4i32, MVT::v1i64, MVT::v2i64}) {
setOperationAction(ISD::MLOAD, VT, Custom);		setOperationAction(ISD::MLOAD, VT, Custom);
setOperationAction(ISD::MSTORE, VT, Custom);		setOperationAction(ISD::MSTORE, VT, Custom);
setOperationAction(ISD::MGATHER, VT, Custom);		setOperationAction(ISD::MGATHER, VT, Custom);
setOperationAction(ISD::MSCATTER, VT, Custom);		setOperationAction(ISD::MSCATTER, VT, Custom);
}		}

// Firstly, exclude all scalable vector extending loads/truncating stores,		// Firstly, exclude all scalable vector extending loads/truncating stores,
// include both integer and floating scalable vector.		// include both integer and floating scalable vector.
for (MVT VT : MVT::scalable_vector_valuetypes()) {		for (MVT VT : MVT::scalable_vector_valuetypes()) {
for (MVT InnerVT : MVT::scalable_vector_valuetypes()) {		for (MVT InnerVT : MVT::scalable_vector_valuetypes()) {
setTruncStoreAction(VT, InnerVT, Expand);		setTruncStoreAction(VT, InnerVT, Expand);
▲ Show 20 Lines • Show All 908 Lines • ▼ Show 20 Lines	case AArch64ISD::FIRST_NUMBER:
MAKE_CASE(AArch64ISD::CSEL)		MAKE_CASE(AArch64ISD::CSEL)
MAKE_CASE(AArch64ISD::CSINV)		MAKE_CASE(AArch64ISD::CSINV)
MAKE_CASE(AArch64ISD::CSNEG)		MAKE_CASE(AArch64ISD::CSNEG)
MAKE_CASE(AArch64ISD::CSINC)		MAKE_CASE(AArch64ISD::CSINC)
MAKE_CASE(AArch64ISD::THREAD_POINTER)		MAKE_CASE(AArch64ISD::THREAD_POINTER)
MAKE_CASE(AArch64ISD::TLSDESC_CALLSEQ)		MAKE_CASE(AArch64ISD::TLSDESC_CALLSEQ)
MAKE_CASE(AArch64ISD::ABDS_PRED)		MAKE_CASE(AArch64ISD::ABDS_PRED)
MAKE_CASE(AArch64ISD::ABDU_PRED)		MAKE_CASE(AArch64ISD::ABDU_PRED)
		MAKE_CASE(AArch64ISD::HADDS_PRED)
		MAKE_CASE(AArch64ISD::HADDU_PRED)
MAKE_CASE(AArch64ISD::MUL_PRED)		MAKE_CASE(AArch64ISD::MUL_PRED)
MAKE_CASE(AArch64ISD::MULHS_PRED)		MAKE_CASE(AArch64ISD::MULHS_PRED)
MAKE_CASE(AArch64ISD::MULHU_PRED)		MAKE_CASE(AArch64ISD::MULHU_PRED)
		MAKE_CASE(AArch64ISD::RHADDS_PRED)
		MAKE_CASE(AArch64ISD::RHADDU_PRED)
MAKE_CASE(AArch64ISD::SDIV_PRED)		MAKE_CASE(AArch64ISD::SDIV_PRED)
MAKE_CASE(AArch64ISD::SHL_PRED)		MAKE_CASE(AArch64ISD::SHL_PRED)
MAKE_CASE(AArch64ISD::SMAX_PRED)		MAKE_CASE(AArch64ISD::SMAX_PRED)
MAKE_CASE(AArch64ISD::SMIN_PRED)		MAKE_CASE(AArch64ISD::SMIN_PRED)
MAKE_CASE(AArch64ISD::SRA_PRED)		MAKE_CASE(AArch64ISD::SRA_PRED)
MAKE_CASE(AArch64ISD::SRL_PRED)		MAKE_CASE(AArch64ISD::SRL_PRED)
MAKE_CASE(AArch64ISD::UDIV_PRED)		MAKE_CASE(AArch64ISD::UDIV_PRED)
MAKE_CASE(AArch64ISD::UMAX_PRED)		MAKE_CASE(AArch64ISD::UMAX_PRED)
▲ Show 20 Lines • Show All 2,936 Lines • ▼ Show 20 Lines	SDValue AArch64TargetLowering::LowerINTRINSIC_WO_CHAIN(SDValue Op,
case Intrinsic::aarch64_sve_udot: {		case Intrinsic::aarch64_sve_udot: {
unsigned Opcode = (IntNo == Intrinsic::aarch64_neon_udot ||		unsigned Opcode = (IntNo == Intrinsic::aarch64_neon_udot ||
IntNo == Intrinsic::aarch64_sve_udot)		IntNo == Intrinsic::aarch64_sve_udot)
? AArch64ISD::UDOT		? AArch64ISD::UDOT
: AArch64ISD::SDOT;		: AArch64ISD::SDOT;
return DAG.getNode(Opcode, dl, Op.getValueType(), Op.getOperand(1),		return DAG.getNode(Opcode, dl, Op.getValueType(), Op.getOperand(1),
Op.getOperand(2), Op.getOperand(3));		Op.getOperand(2), Op.getOperand(3));
}		}
case Intrinsic::get_active_lane_mask: {		case Intrinsic::get_active_lane_mask: {
SDValue ID =		SDValue ID =
DAG.getTargetConstant(Intrinsic::aarch64_sve_whilelo, dl, MVT::i64);		DAG.getTargetConstant(Intrinsic::aarch64_sve_whilelo, dl, MVT::i64);
return DAG.getNode(ISD::INTRINSIC_WO_CHAIN, dl, Op.getValueType(), ID,		return DAG.getNode(ISD::INTRINSIC_WO_CHAIN, dl, Op.getValueType(), ID,
Op.getOperand(1), Op.getOperand(2));		Op.getOperand(1), Op.getOperand(2));
}		}
}		}
}		}

bool AArch64TargetLowering::shouldExtendGSIndex(EVT VT, EVT &EltTy) const {		bool AArch64TargetLowering::shouldExtendGSIndex(EVT VT, EVT &EltTy) const {
if (VT.getVectorElementType() == MVT::i8 ||		if (VT.getVectorElementType() == MVT::i8 ||
VT.getVectorElementType() == MVT::i16) {		VT.getVectorElementType() == MVT::i16) {
EltTy = MVT::i32;		EltTy = MVT::i32;
return true;		return true;
}		}
return false;		return false;
}		}

bool AArch64TargetLowering::shouldRemoveExtendFromGSIndex(EVT IndexVT,		bool AArch64TargetLowering::shouldRemoveExtendFromGSIndex(EVT IndexVT,
EVT DataVT) const {		EVT DataVT) const {
// SVE only supports implicit extension of 32-bit indices.		// SVE only supports implicit extension of 32-bit indices.
if (!Subtarget->hasSVE() || IndexVT.getVectorElementType() != MVT::i32)		if (!Subtarget->hasSVE() || IndexVT.getVectorElementType() != MVT::i32)
return false;		return false;
▲ Show 20 Lines • Show All 740 Lines • ▼ Show 20 Lines	SDValue AArch64TargetLowering::LowerOperation(SDValue Op,
case ISD::VSELECT:		case ISD::VSELECT:
return LowerFixedLengthVectorSelectToSVE(Op, DAG);		return LowerFixedLengthVectorSelectToSVE(Op, DAG);
case ISD::ABS:		case ISD::ABS:
return LowerABS(Op, DAG);		return LowerABS(Op, DAG);
case ISD::ABDS:		case ISD::ABDS:
return LowerToPredicatedOp(Op, DAG, AArch64ISD::ABDS_PRED);		return LowerToPredicatedOp(Op, DAG, AArch64ISD::ABDS_PRED);
case ISD::ABDU:		case ISD::ABDU:
return LowerToPredicatedOp(Op, DAG, AArch64ISD::ABDU_PRED);		return LowerToPredicatedOp(Op, DAG, AArch64ISD::ABDU_PRED);
		case ISD::AVGFLOORS:
		return LowerToPredicatedOp(Op, DAG, AArch64ISD::HADDS_PRED);
		case ISD::AVGFLOORU:
		return LowerToPredicatedOp(Op, DAG, AArch64ISD::HADDU_PRED);
		case ISD::AVGCEILS:
		return LowerToPredicatedOp(Op, DAG, AArch64ISD::RHADDS_PRED);
		case ISD::AVGCEILU:
		return LowerToPredicatedOp(Op, DAG, AArch64ISD::RHADDU_PRED);
case ISD::BITREVERSE:		case ISD::BITREVERSE:
return LowerBitreverse(Op, DAG);		return LowerBitreverse(Op, DAG);
case ISD::BSWAP:		case ISD::BSWAP:
return LowerToPredicatedOp(Op, DAG, AArch64ISD::BSWAP_MERGE_PASSTHRU);		return LowerToPredicatedOp(Op, DAG, AArch64ISD::BSWAP_MERGE_PASSTHRU);
case ISD::CTLZ:		case ISD::CTLZ:
return LowerToPredicatedOp(Op, DAG, AArch64ISD::CTLZ_MERGE_PASSTHRU);		return LowerToPredicatedOp(Op, DAG, AArch64ISD::CTLZ_MERGE_PASSTHRU);
case ISD::CTTZ:		case ISD::CTTZ:
return LowerCTTZ(Op, DAG);		return LowerCTTZ(Op, DAG);
▲ Show 20 Lines • Show All 17,617 Lines • Show Last 20 Lines
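The new `LowerOperation` cases all follow the same shape: a generic averaging opcode is mapped to its predicated AArch64 counterpart. The toy model below sketches that dispatch. It assumes, based on the `ptrue` that appears in the updated test output, that the lowering supplies an all-active predicate as the first operand; the types and the string operands are hypothetical and are not LLVM's `SDValue` API.

```cpp
#include <optional>
#include <string>
#include <vector>

enum class GenericOp { AVGFLOORS, AVGFLOORU, AVGCEILS, AVGCEILU, ADD };
enum class PredOp { HADDS_PRED, HADDU_PRED, RHADDS_PRED, RHADDU_PRED };

// Toy node: an opcode plus named operands (strings instead of DAG values).
struct ToyNode {
  PredOp opcode;
  std::vector<std::string> operands;
};

// Map a generic opcode to its predicated form, prepending an all-active
// predicate. Opcodes this sketch does not handle return std::nullopt.
std::optional<ToyNode> lowerOperation(GenericOp op, const std::string &lhs,
                                      const std::string &rhs) {
  PredOp pred;
  switch (op) {
  case GenericOp::AVGFLOORS: pred = PredOp::HADDS_PRED; break;
  case GenericOp::AVGFLOORU: pred = PredOp::HADDU_PRED; break;
  case GenericOp::AVGCEILS:  pred = PredOp::RHADDS_PRED; break;
  case GenericOp::AVGCEILU:  pred = PredOp::RHADDU_PRED; break;
  default:
    return std::nullopt;
  }
  return ToyNode{pred, {"ptrue", lhs, rhs}};
}
```

Because every lane is active in the predicate supplied here, the "inactive lanes do not matter" property of the `_PRED` nodes is trivially satisfied for nodes created this way.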

llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td

Show First 20 Lines • Show All 183 Lines • ▼ Show 20 Lines
def AArch64fmin_p : SDNode<"AArch64ISD::FMIN_PRED", SDT_AArch64Arith>;		def AArch64fmin_p : SDNode<"AArch64ISD::FMIN_PRED", SDT_AArch64Arith>;
def AArch64fminnm_p : SDNode<"AArch64ISD::FMINNM_PRED", SDT_AArch64Arith>;		def AArch64fminnm_p : SDNode<"AArch64ISD::FMINNM_PRED", SDT_AArch64Arith>;
def AArch64fmul_p : SDNode<"AArch64ISD::FMUL_PRED", SDT_AArch64Arith>;		def AArch64fmul_p : SDNode<"AArch64ISD::FMUL_PRED", SDT_AArch64Arith>;
def AArch64fsub_p : SDNode<"AArch64ISD::FSUB_PRED", SDT_AArch64Arith>;		def AArch64fsub_p : SDNode<"AArch64ISD::FSUB_PRED", SDT_AArch64Arith>;
def AArch64lsl_p : SDNode<"AArch64ISD::SHL_PRED", SDT_AArch64Arith>;		def AArch64lsl_p : SDNode<"AArch64ISD::SHL_PRED", SDT_AArch64Arith>;
def AArch64lsr_p : SDNode<"AArch64ISD::SRL_PRED", SDT_AArch64Arith>;		def AArch64lsr_p : SDNode<"AArch64ISD::SRL_PRED", SDT_AArch64Arith>;
def AArch64mul_p : SDNode<"AArch64ISD::MUL_PRED", SDT_AArch64Arith>;		def AArch64mul_p : SDNode<"AArch64ISD::MUL_PRED", SDT_AArch64Arith>;
def AArch64sabd_p : SDNode<"AArch64ISD::ABDS_PRED", SDT_AArch64Arith>;		def AArch64sabd_p : SDNode<"AArch64ISD::ABDS_PRED", SDT_AArch64Arith>;
		def AArch64shadd_p : SDNode<"AArch64ISD::HADDS_PRED", SDT_AArch64Arith>;
		def AArch64srhadd_p : SDNode<"AArch64ISD::RHADDS_PRED", SDT_AArch64Arith>;
def AArch64sdiv_p : SDNode<"AArch64ISD::SDIV_PRED", SDT_AArch64Arith>;		def AArch64sdiv_p : SDNode<"AArch64ISD::SDIV_PRED", SDT_AArch64Arith>;
def AArch64smax_p : SDNode<"AArch64ISD::SMAX_PRED", SDT_AArch64Arith>;		def AArch64smax_p : SDNode<"AArch64ISD::SMAX_PRED", SDT_AArch64Arith>;
def AArch64smin_p : SDNode<"AArch64ISD::SMIN_PRED", SDT_AArch64Arith>;		def AArch64smin_p : SDNode<"AArch64ISD::SMIN_PRED", SDT_AArch64Arith>;
def AArch64smulh_p : SDNode<"AArch64ISD::MULHS_PRED", SDT_AArch64Arith>;		def AArch64smulh_p : SDNode<"AArch64ISD::MULHS_PRED", SDT_AArch64Arith>;
def AArch64uabd_p : SDNode<"AArch64ISD::ABDU_PRED", SDT_AArch64Arith>;		def AArch64uabd_p : SDNode<"AArch64ISD::ABDU_PRED", SDT_AArch64Arith>;
		def AArch64uhadd_p : SDNode<"AArch64ISD::HADDU_PRED", SDT_AArch64Arith>;
		def AArch64urhadd_p : SDNode<"AArch64ISD::RHADDU_PRED", SDT_AArch64Arith>;
def AArch64udiv_p : SDNode<"AArch64ISD::UDIV_PRED", SDT_AArch64Arith>;		def AArch64udiv_p : SDNode<"AArch64ISD::UDIV_PRED", SDT_AArch64Arith>;
def AArch64umax_p : SDNode<"AArch64ISD::UMAX_PRED", SDT_AArch64Arith>;		def AArch64umax_p : SDNode<"AArch64ISD::UMAX_PRED", SDT_AArch64Arith>;
def AArch64umin_p : SDNode<"AArch64ISD::UMIN_PRED", SDT_AArch64Arith>;		def AArch64umin_p : SDNode<"AArch64ISD::UMIN_PRED", SDT_AArch64Arith>;
def AArch64umulh_p : SDNode<"AArch64ISD::MULHU_PRED", SDT_AArch64Arith>;		def AArch64umulh_p : SDNode<"AArch64ISD::MULHU_PRED", SDT_AArch64Arith>;

def AArch64fadd_p_nsz : PatFrag<(ops node:$op1, node:$op2, node:$op3),		def AArch64fadd_p_nsz : PatFrag<(ops node:$op1, node:$op2, node:$op3),
(AArch64fadd_p node:$op1, node:$op2, node:$op3), [{		(AArch64fadd_p node:$op1, node:$op2, node:$op3), [{
return N->getFlags().hasNoSignedZeros();		return N->getFlags().hasNoSignedZeros();
▲ Show 20 Lines • Show All 48 Lines • ▼ Show 20 Lines	def AArch64fadd_m1 : PatFrags<(ops node:$pg, node:$op1, node:$op2), [
(AArch64fadd_p_nsz (SVEAllActive), node:$op1, (vselect node:$pg, node:$op2, (SVEDup0)))		(AArch64fadd_p_nsz (SVEAllActive), node:$op1, (vselect node:$pg, node:$op2, (SVEDup0)))
]>;		]>;
def AArch64fsub_m1 : PatFrags<(ops node:$pg, node:$op1, node:$op2), [		def AArch64fsub_m1 : PatFrags<(ops node:$pg, node:$op1, node:$op2), [
(int_aarch64_sve_fsub node:$pg, node:$op1, node:$op2),		(int_aarch64_sve_fsub node:$pg, node:$op1, node:$op2),
(vselect node:$pg, (AArch64fsub_p (SVEAllActive), node:$op1, node:$op2), node:$op1),		(vselect node:$pg, (AArch64fsub_p (SVEAllActive), node:$op1, node:$op2), node:$op1),
(AArch64fsub_p (SVEAllActive), node:$op1, (vselect node:$pg, node:$op2, (SVEDup0)))		(AArch64fsub_p (SVEAllActive), node:$op1, (vselect node:$pg, node:$op2, (SVEDup0)))
]>;		]>;

		def AArch64shadd : PatFrags<(ops node:$pg, node:$op1, node:$op2),
		[(int_aarch64_sve_shadd node:$pg, node:$op1, node:$op2),
		(AArch64shadd_p node:$pg, node:$op1, node:$op2)]>;
		def AArch64uhadd : PatFrags<(ops node:$pg, node:$op1, node:$op2),
		[(int_aarch64_sve_uhadd node:$pg, node:$op1, node:$op2),
		(AArch64uhadd_p node:$pg, node:$op1, node:$op2)]>;
		def AArch64srhadd : PatFrags<(ops node:$pg, node:$op1, node:$op2),
		[(int_aarch64_sve_srhadd node:$pg, node:$op1, node:$op2),
		(AArch64srhadd_p node:$pg, node:$op1, node:$op2)]>;
		def AArch64urhadd : PatFrags<(ops node:$pg, node:$op1, node:$op2),
		[(int_aarch64_sve_urhadd node:$pg, node:$op1, node:$op2),
		(AArch64urhadd_p node:$pg, node:$op1, node:$op2)]>;

def AArch64saba : PatFrags<(ops node:$op1, node:$op2, node:$op3),		def AArch64saba : PatFrags<(ops node:$op1, node:$op2, node:$op3),
[(int_aarch64_sve_saba node:$op1, node:$op2, node:$op3),		[(int_aarch64_sve_saba node:$op1, node:$op2, node:$op3),
(add node:$op1, (AArch64sabd_p (SVEAllActive), node:$op2, node:$op3))]>;		(add node:$op1, (AArch64sabd_p (SVEAllActive), node:$op2, node:$op3))]>;

def AArch64uaba : PatFrags<(ops node:$op1, node:$op2, node:$op3),		def AArch64uaba : PatFrags<(ops node:$op1, node:$op2, node:$op3),
[(int_aarch64_sve_uaba node:$op1, node:$op2, node:$op3),		[(int_aarch64_sve_uaba node:$op1, node:$op2, node:$op3),
(add node:$op1, (AArch64uabd_p (SVEAllActive), node:$op2, node:$op3))]>;		(add node:$op1, (AArch64uabd_p (SVEAllActive), node:$op2, node:$op3))]>;

▲ Show 20 Lines • Show All 3,014 Lines • ▼ Show 20 Lines	let Predicates = [HasSVE2orSME] in {
defm SQDMLSLB_ZZZ : sve2_int_mla_long<0b11010, "sqdmlslb", int_aarch64_sve_sqdmlslb>;		defm SQDMLSLB_ZZZ : sve2_int_mla_long<0b11010, "sqdmlslb", int_aarch64_sve_sqdmlslb>;
defm SQDMLSLT_ZZZ : sve2_int_mla_long<0b11011, "sqdmlslt", int_aarch64_sve_sqdmlslt>;		defm SQDMLSLT_ZZZ : sve2_int_mla_long<0b11011, "sqdmlslt", int_aarch64_sve_sqdmlslt>;

// SVE2 saturating multiply-add interleaved long		// SVE2 saturating multiply-add interleaved long
defm SQDMLALBT_ZZZ : sve2_int_mla_long<0b00010, "sqdmlalbt", int_aarch64_sve_sqdmlalbt>;		defm SQDMLALBT_ZZZ : sve2_int_mla_long<0b00010, "sqdmlalbt", int_aarch64_sve_sqdmlalbt>;
defm SQDMLSLBT_ZZZ : sve2_int_mla_long<0b00011, "sqdmlslbt", int_aarch64_sve_sqdmlslbt>;		defm SQDMLSLBT_ZZZ : sve2_int_mla_long<0b00011, "sqdmlslbt", int_aarch64_sve_sqdmlslbt>;

// SVE2 integer halving add/subtract (predicated)		// SVE2 integer halving add/subtract (predicated)
defm SHADD_ZPmZ : sve2_int_arith_pred<0b100000, "shadd", int_aarch64_sve_shadd>;		defm SHADD_ZPmZ : sve2_int_arith_pred<0b100000, "shadd", AArch64shadd>;
defm UHADD_ZPmZ : sve2_int_arith_pred<0b100010, "uhadd", int_aarch64_sve_uhadd>;		defm UHADD_ZPmZ : sve2_int_arith_pred<0b100010, "uhadd", AArch64uhadd>;
defm SHSUB_ZPmZ : sve2_int_arith_pred<0b100100, "shsub", int_aarch64_sve_shsub>;		defm SHSUB_ZPmZ : sve2_int_arith_pred<0b100100, "shsub", int_aarch64_sve_shsub>;
defm UHSUB_ZPmZ : sve2_int_arith_pred<0b100110, "uhsub", int_aarch64_sve_uhsub>;		defm UHSUB_ZPmZ : sve2_int_arith_pred<0b100110, "uhsub", int_aarch64_sve_uhsub>;
defm SRHADD_ZPmZ : sve2_int_arith_pred<0b101000, "srhadd", int_aarch64_sve_srhadd>;		defm SRHADD_ZPmZ : sve2_int_arith_pred<0b101000, "srhadd", AArch64srhadd>;
defm URHADD_ZPmZ : sve2_int_arith_pred<0b101010, "urhadd", int_aarch64_sve_urhadd>;		defm URHADD_ZPmZ : sve2_int_arith_pred<0b101010, "urhadd", AArch64urhadd>;
defm SHSUBR_ZPmZ : sve2_int_arith_pred<0b101100, "shsubr", int_aarch64_sve_shsubr>;		defm SHSUBR_ZPmZ : sve2_int_arith_pred<0b101100, "shsubr", int_aarch64_sve_shsubr>;
defm UHSUBR_ZPmZ : sve2_int_arith_pred<0b101110, "uhsubr", int_aarch64_sve_uhsubr>;		defm UHSUBR_ZPmZ : sve2_int_arith_pred<0b101110, "uhsubr", int_aarch64_sve_uhsubr>;

// SVE2 integer pairwise add and accumulate long		// SVE2 integer pairwise add and accumulate long
defm SADALP_ZPmZ : sve2_int_sadd_long_accum_pairwise<0, "sadalp", int_aarch64_sve_sadalp>;		defm SADALP_ZPmZ : sve2_int_sadd_long_accum_pairwise<0, "sadalp", int_aarch64_sve_sadalp>;
defm UADALP_ZPmZ : sve2_int_sadd_long_accum_pairwise<1, "uadalp", int_aarch64_sve_uadalp>;		defm UADALP_ZPmZ : sve2_int_sadd_long_accum_pairwise<1, "uadalp", int_aarch64_sve_uadalp>;

// SVE2 integer pairwise arithmetic		// SVE2 integer pairwise arithmetic
▲ Show 20 Lines • Show All 526 Lines • Show Last 20 Lines

llvm/test/CodeGen/AArch64/sve2-hadd.ll

; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py		; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
; RUN: llc < %s -mtriple aarch64-none-eabi -mattr=+sve2 -o - | FileCheck %s		; RUN: llc < %s -mtriple aarch64-none-eabi -mattr=+sve2 -o - | FileCheck %s

		define <vscale x 2 x i64> @hadds_v2i64(<vscale x 2 x i64> %s0, <vscale x 2 x i64> %s1) {
		; CHECK-LABEL: hadds_v2i64:
		; CHECK: // %bb.0: // %entry
		; CHECK-NEXT: ptrue p0.d
		; CHECK-NEXT: shadd z0.d, p0/m, z0.d, z1.d
		; CHECK-NEXT: ret
		entry:
		%s0s = sext <vscale x 2 x i64> %s0 to <vscale x 2 x i128>
		%s1s = sext <vscale x 2 x i64> %s1 to <vscale x 2 x i128>
		%m = add <vscale x 2 x i128> %s0s, %s1s
		%s = lshr <vscale x 2 x i128> %m, shufflevector (<vscale x 2 x i128> insertelement (<vscale x 2 x i128> poison, i128 1, i32 0), <vscale x 2 x i128> poison, <vscale x 2 x i32> zeroinitializer)
		%s2 = trunc <vscale x 2 x i128> %s to <vscale x 2 x i64>
		ret <vscale x 2 x i64> %s2
		}

		define <vscale x 2 x i64> @haddu_v2i64(<vscale x 2 x i64> %s0, <vscale x 2 x i64> %s1) {
		; CHECK-LABEL: haddu_v2i64:
		; CHECK: // %bb.0: // %entry
		; CHECK-NEXT: ptrue p0.d
		; CHECK-NEXT: uhadd z0.d, p0/m, z0.d, z1.d
		; CHECK-NEXT: ret
		entry:
		%s0s = zext <vscale x 2 x i64> %s0 to <vscale x 2 x i128>
		%s1s = zext <vscale x 2 x i64> %s1 to <vscale x 2 x i128>
		%m = add <vscale x 2 x i128> %s0s, %s1s
		%s = lshr <vscale x 2 x i128> %m, shufflevector (<vscale x 2 x i128> insertelement (<vscale x 2 x i128> poison, i128 1, i32 0), <vscale x 2 x i128> poison, <vscale x 2 x i32> zeroinitializer)
		%s2 = trunc <vscale x 2 x i128> %s to <vscale x 2 x i64>
		ret <vscale x 2 x i64> %s2
		}

define <vscale x 2 x i32> @hadds_v2i32(<vscale x 2 x i32> %s0, <vscale x 2 x i32> %s1) {		define <vscale x 2 x i32> @hadds_v2i32(<vscale x 2 x i32> %s0, <vscale x 2 x i32> %s1) {
; CHECK-LABEL: hadds_v2i32:		; CHECK-LABEL: hadds_v2i32:
; CHECK: // %bb.0: // %entry		; CHECK: // %bb.0: // %entry
; CHECK-NEXT: ptrue p0.d		; CHECK-NEXT: ptrue p0.d
; CHECK-NEXT: sxtw z0.d, p0/m, z0.d		; CHECK-NEXT: sxtw z0.d, p0/m, z0.d
; CHECK-NEXT: adr z0.d, [z0.d, z1.d, sxtw]		; CHECK-NEXT: adr z0.d, [z0.d, z1.d, sxtw]
; CHECK-NEXT: lsr z0.d, z0.d, #1		; CHECK-NEXT: lsr z0.d, z0.d, #1
; CHECK-NEXT: ret		; CHECK-NEXT: ret
Show All 20 Lines	entry:
%s = lshr <vscale x 2 x i64> %m, shufflevector (<vscale x 2 x i64> insertelement (<vscale x 2 x i64> poison, i64 1, i32 0), <vscale x 2 x i64> poison, <vscale x 2 x i32> zeroinitializer)		%s = lshr <vscale x 2 x i64> %m, shufflevector (<vscale x 2 x i64> insertelement (<vscale x 2 x i64> poison, i64 1, i32 0), <vscale x 2 x i64> poison, <vscale x 2 x i32> zeroinitializer)
%s2 = trunc <vscale x 2 x i64> %s to <vscale x 2 x i32>		%s2 = trunc <vscale x 2 x i64> %s to <vscale x 2 x i32>
ret <vscale x 2 x i32> %s2		ret <vscale x 2 x i32> %s2
}		}

define <vscale x 4 x i32> @hadds_v4i32(<vscale x 4 x i32> %s0, <vscale x 4 x i32> %s1) {		define <vscale x 4 x i32> @hadds_v4i32(<vscale x 4 x i32> %s0, <vscale x 4 x i32> %s1) {
; CHECK-LABEL: hadds_v4i32:		; CHECK-LABEL: hadds_v4i32:
; CHECK: // %bb.0: // %entry		; CHECK: // %bb.0: // %entry
; CHECK-NEXT: sunpkhi z2.d, z0.s		; CHECK-NEXT: ptrue p0.s
; CHECK-NEXT: sunpklo z0.d, z0.s		; CHECK-NEXT: shadd z0.s, p0/m, z0.s, z1.s
; CHECK-NEXT: sunpkhi z3.d, z1.s
; CHECK-NEXT: sunpklo z1.d, z1.s
; CHECK-NEXT: add z0.d, z0.d, z1.d
; CHECK-NEXT: add z1.d, z2.d, z3.d
; CHECK-NEXT: lsr z1.d, z1.d, #1
; CHECK-NEXT: lsr z0.d, z0.d, #1
; CHECK-NEXT: uzp1 z0.s, z0.s, z1.s
; CHECK-NEXT: ret		; CHECK-NEXT: ret
entry:		entry:
%s0s = sext <vscale x 4 x i32> %s0 to <vscale x 4 x i64>		%s0s = sext <vscale x 4 x i32> %s0 to <vscale x 4 x i64>
%s1s = sext <vscale x 4 x i32> %s1 to <vscale x 4 x i64>		%s1s = sext <vscale x 4 x i32> %s1 to <vscale x 4 x i64>
%m = add <vscale x 4 x i64> %s0s, %s1s		%m = add <vscale x 4 x i64> %s0s, %s1s
%s = lshr <vscale x 4 x i64> %m, shufflevector (<vscale x 4 x i64> insertelement (<vscale x 4 x i64> poison, i64 1, i32 0), <vscale x 4 x i64> poison, <vscale x 4 x i32> zeroinitializer)		%s = lshr <vscale x 4 x i64> %m, shufflevector (<vscale x 4 x i64> insertelement (<vscale x 4 x i64> poison, i64 1, i32 0), <vscale x 4 x i64> poison, <vscale x 4 x i32> zeroinitializer)
%s2 = trunc <vscale x 4 x i64> %s to <vscale x 4 x i32>		%s2 = trunc <vscale x 4 x i64> %s to <vscale x 4 x i32>
ret <vscale x 4 x i32> %s2		ret <vscale x 4 x i32> %s2
}		}

define <vscale x 4 x i32> @haddu_v4i32(<vscale x 4 x i32> %s0, <vscale x 4 x i32> %s1) {		define <vscale x 4 x i32> @haddu_v4i32(<vscale x 4 x i32> %s0, <vscale x 4 x i32> %s1) {
; CHECK-LABEL: haddu_v4i32:		; CHECK-LABEL: haddu_v4i32:
; CHECK: // %bb.0: // %entry		; CHECK: // %bb.0: // %entry
; CHECK-NEXT: uunpkhi z2.d, z0.s		; CHECK-NEXT: ptrue p0.s
; CHECK-NEXT: uunpklo z0.d, z0.s		; CHECK-NEXT: uhadd z0.s, p0/m, z0.s, z1.s
; CHECK-NEXT: uunpkhi z3.d, z1.s
; CHECK-NEXT: uunpklo z1.d, z1.s
; CHECK-NEXT: add z0.d, z0.d, z1.d
; CHECK-NEXT: add z1.d, z2.d, z3.d
; CHECK-NEXT: lsr z1.d, z1.d, #1
; CHECK-NEXT: lsr z0.d, z0.d, #1
; CHECK-NEXT: uzp1 z0.s, z0.s, z1.s
; CHECK-NEXT: ret		; CHECK-NEXT: ret
entry:		entry:
%s0s = zext <vscale x 4 x i32> %s0 to <vscale x 4 x i64>		%s0s = zext <vscale x 4 x i32> %s0 to <vscale x 4 x i64>
%s1s = zext <vscale x 4 x i32> %s1 to <vscale x 4 x i64>		%s1s = zext <vscale x 4 x i32> %s1 to <vscale x 4 x i64>
%m = add <vscale x 4 x i64> %s0s, %s1s		%m = add <vscale x 4 x i64> %s0s, %s1s
%s = lshr <vscale x 4 x i64> %m, shufflevector (<vscale x 4 x i64> insertelement (<vscale x 4 x i64> poison, i64 1, i32 0), <vscale x 4 x i64> poison, <vscale x 4 x i32> zeroinitializer)		%s = lshr <vscale x 4 x i64> %m, shufflevector (<vscale x 4 x i64> insertelement (<vscale x 4 x i64> poison, i64 1, i32 0), <vscale x 4 x i64> poison, <vscale x 4 x i32> zeroinitializer)
%s2 = trunc <vscale x 4 x i64> %s to <vscale x 4 x i32>		%s2 = trunc <vscale x 4 x i64> %s to <vscale x 4 x i32>
ret <vscale x 4 x i32> %s2		ret <vscale x 4 x i32> %s2
▲ Show 20 Lines • Show All 68 Lines • ▼ Show 20 Lines	entry:
%s = lshr <vscale x 4 x i32> %m, shufflevector (<vscale x 4 x i32> insertelement (<vscale x 4 x i32> poison, i32 1, i32 0), <vscale x 4 x i32> poison, <vscale x 4 x i32> zeroinitializer)		%s = lshr <vscale x 4 x i32> %m, shufflevector (<vscale x 4 x i32> insertelement (<vscale x 4 x i32> poison, i32 1, i32 0), <vscale x 4 x i32> poison, <vscale x 4 x i32> zeroinitializer)
%s2 = trunc <vscale x 4 x i32> %s to <vscale x 4 x i16>		%s2 = trunc <vscale x 4 x i32> %s to <vscale x 4 x i16>
ret <vscale x 4 x i16> %s2		ret <vscale x 4 x i16> %s2
}		}

define <vscale x 8 x i16> @hadds_v8i16(<vscale x 8 x i16> %s0, <vscale x 8 x i16> %s1) {		define <vscale x 8 x i16> @hadds_v8i16(<vscale x 8 x i16> %s0, <vscale x 8 x i16> %s1) {
; CHECK-LABEL: hadds_v8i16:		; CHECK-LABEL: hadds_v8i16:
; CHECK: // %bb.0: // %entry		; CHECK: // %bb.0: // %entry
; CHECK-NEXT: sunpkhi z2.s, z0.h		; CHECK-NEXT: ptrue p0.h
; CHECK-NEXT: sunpklo z0.s, z0.h		; CHECK-NEXT: shadd z0.h, p0/m, z0.h, z1.h
; CHECK-NEXT: sunpkhi z3.s, z1.h
; CHECK-NEXT: sunpklo z1.s, z1.h
; CHECK-NEXT: add z0.s, z0.s, z1.s
; CHECK-NEXT: add z1.s, z2.s, z3.s
; CHECK-NEXT: lsr z1.s, z1.s, #1
; CHECK-NEXT: lsr z0.s, z0.s, #1
; CHECK-NEXT: uzp1 z0.h, z0.h, z1.h
; CHECK-NEXT: ret		; CHECK-NEXT: ret
entry:		entry:
%s0s = sext <vscale x 8 x i16> %s0 to <vscale x 8 x i32>		%s0s = sext <vscale x 8 x i16> %s0 to <vscale x 8 x i32>
%s1s = sext <vscale x 8 x i16> %s1 to <vscale x 8 x i32>		%s1s = sext <vscale x 8 x i16> %s1 to <vscale x 8 x i32>
%m = add <vscale x 8 x i32> %s0s, %s1s		%m = add <vscale x 8 x i32> %s0s, %s1s
%s = lshr <vscale x 8 x i32> %m, shufflevector (<vscale x 8 x i32> insertelement (<vscale x 8 x i32> poison, i32 1, i32 0), <vscale x 8 x i32> poison, <vscale x 8 x i32> zeroinitializer)		%s = lshr <vscale x 8 x i32> %m, shufflevector (<vscale x 8 x i32> insertelement (<vscale x 8 x i32> poison, i32 1, i32 0), <vscale x 8 x i32> poison, <vscale x 8 x i32> zeroinitializer)
%s2 = trunc <vscale x 8 x i32> %s to <vscale x 8 x i16>		%s2 = trunc <vscale x 8 x i32> %s to <vscale x 8 x i16>
ret <vscale x 8 x i16> %s2		ret <vscale x 8 x i16> %s2
}		}

define <vscale x 8 x i16> @haddu_v8i16(<vscale x 8 x i16> %s0, <vscale x 8 x i16> %s1) {		define <vscale x 8 x i16> @haddu_v8i16(<vscale x 8 x i16> %s0, <vscale x 8 x i16> %s1) {
; CHECK-LABEL: haddu_v8i16:		; CHECK-LABEL: haddu_v8i16:
; CHECK: // %bb.0: // %entry		; CHECK: // %bb.0: // %entry
; CHECK-NEXT: uunpkhi z2.s, z0.h		; CHECK-NEXT: ptrue p0.h
; CHECK-NEXT: uunpklo z0.s, z0.h		; CHECK-NEXT: uhadd z0.h, p0/m, z0.h, z1.h
; CHECK-NEXT: uunpkhi z3.s, z1.h
; CHECK-NEXT: uunpklo z1.s, z1.h
; CHECK-NEXT: add z0.s, z0.s, z1.s
; CHECK-NEXT: add z1.s, z2.s, z3.s
; CHECK-NEXT: lsr z1.s, z1.s, #1
; CHECK-NEXT: lsr z0.s, z0.s, #1
; CHECK-NEXT: uzp1 z0.h, z0.h, z1.h
; CHECK-NEXT: ret		; CHECK-NEXT: ret
entry:		entry:
%s0s = zext <vscale x 8 x i16> %s0 to <vscale x 8 x i32>		%s0s = zext <vscale x 8 x i16> %s0 to <vscale x 8 x i32>
%s1s = zext <vscale x 8 x i16> %s1 to <vscale x 8 x i32>		%s1s = zext <vscale x 8 x i16> %s1 to <vscale x 8 x i32>
%m = add <vscale x 8 x i32> %s0s, %s1s		%m = add <vscale x 8 x i32> %s0s, %s1s
%s = lshr <vscale x 8 x i32> %m, shufflevector (<vscale x 8 x i32> insertelement (<vscale x 8 x i32> poison, i32 1, i32 0), <vscale x 8 x i32> poison, <vscale x 8 x i32> zeroinitializer)		%s = lshr <vscale x 8 x i32> %m, shufflevector (<vscale x 8 x i32> insertelement (<vscale x 8 x i32> poison, i32 1, i32 0), <vscale x 8 x i32> poison, <vscale x 8 x i32> zeroinitializer)
%s2 = trunc <vscale x 8 x i32> %s to <vscale x 8 x i16>		%s2 = trunc <vscale x 8 x i32> %s to <vscale x 8 x i16>
ret <vscale x 8 x i16> %s2		ret <vscale x 8 x i16> %s2
[... 68 lines not shown ...]
%s = lshr <vscale x 8 x i16> %m, shufflevector (<vscale x 8 x i16> insertelement (<vscale x 8 x i16> poison, i16 1, i32 0), <vscale x 8 x i16> poison, <vscale x 8 x i32> zeroinitializer)		%s = lshr <vscale x 8 x i16> %m, shufflevector (<vscale x 8 x i16> insertelement (<vscale x 8 x i16> poison, i16 1, i32 0), <vscale x 8 x i16> poison, <vscale x 8 x i32> zeroinitializer)
%s2 = trunc <vscale x 8 x i16> %s to <vscale x 8 x i8>		%s2 = trunc <vscale x 8 x i16> %s to <vscale x 8 x i8>
ret <vscale x 8 x i8> %s2		ret <vscale x 8 x i8> %s2
}		}

define <vscale x 16 x i8> @hadds_v16i8(<vscale x 16 x i8> %s0, <vscale x 16 x i8> %s1) {		define <vscale x 16 x i8> @hadds_v16i8(<vscale x 16 x i8> %s0, <vscale x 16 x i8> %s1) {
; CHECK-LABEL: hadds_v16i8:		; CHECK-LABEL: hadds_v16i8:
; CHECK: // %bb.0: // %entry		; CHECK: // %bb.0: // %entry
; CHECK-NEXT: sunpkhi z2.h, z0.b		; CHECK-NEXT: ptrue p0.b
; CHECK-NEXT: sunpklo z0.h, z0.b		; CHECK-NEXT: shadd z0.b, p0/m, z0.b, z1.b
; CHECK-NEXT: sunpkhi z3.h, z1.b
; CHECK-NEXT: sunpklo z1.h, z1.b
; CHECK-NEXT: add z0.h, z0.h, z1.h
; CHECK-NEXT: add z1.h, z2.h, z3.h
; CHECK-NEXT: lsr z1.h, z1.h, #1
; CHECK-NEXT: lsr z0.h, z0.h, #1
; CHECK-NEXT: uzp1 z0.b, z0.b, z1.b
; CHECK-NEXT: ret		; CHECK-NEXT: ret
entry:		entry:
%s0s = sext <vscale x 16 x i8> %s0 to <vscale x 16 x i16>		%s0s = sext <vscale x 16 x i8> %s0 to <vscale x 16 x i16>
%s1s = sext <vscale x 16 x i8> %s1 to <vscale x 16 x i16>		%s1s = sext <vscale x 16 x i8> %s1 to <vscale x 16 x i16>
%m = add <vscale x 16 x i16> %s0s, %s1s		%m = add <vscale x 16 x i16> %s0s, %s1s
%s = lshr <vscale x 16 x i16> %m, shufflevector (<vscale x 16 x i16> insertelement (<vscale x 16 x i16> poison, i16 1, i32 0), <vscale x 16 x i16> poison, <vscale x 16 x i32> zeroinitializer)		%s = lshr <vscale x 16 x i16> %m, shufflevector (<vscale x 16 x i16> insertelement (<vscale x 16 x i16> poison, i16 1, i32 0), <vscale x 16 x i16> poison, <vscale x 16 x i32> zeroinitializer)
%s2 = trunc <vscale x 16 x i16> %s to <vscale x 16 x i8>		%s2 = trunc <vscale x 16 x i16> %s to <vscale x 16 x i8>
ret <vscale x 16 x i8> %s2		ret <vscale x 16 x i8> %s2
}		}

define <vscale x 16 x i8> @haddu_v16i8(<vscale x 16 x i8> %s0, <vscale x 16 x i8> %s1) {		define <vscale x 16 x i8> @haddu_v16i8(<vscale x 16 x i8> %s0, <vscale x 16 x i8> %s1) {
; CHECK-LABEL: haddu_v16i8:		; CHECK-LABEL: haddu_v16i8:
; CHECK: // %bb.0: // %entry		; CHECK: // %bb.0: // %entry
; CHECK-NEXT: uunpkhi z2.h, z0.b		; CHECK-NEXT: ptrue p0.b
; CHECK-NEXT: uunpklo z0.h, z0.b		; CHECK-NEXT: uhadd z0.b, p0/m, z0.b, z1.b
; CHECK-NEXT: uunpkhi z3.h, z1.b
; CHECK-NEXT: uunpklo z1.h, z1.b
; CHECK-NEXT: add z0.h, z0.h, z1.h
; CHECK-NEXT: add z1.h, z2.h, z3.h
; CHECK-NEXT: lsr z1.h, z1.h, #1
; CHECK-NEXT: lsr z0.h, z0.h, #1
; CHECK-NEXT: uzp1 z0.b, z0.b, z1.b
; CHECK-NEXT: ret		; CHECK-NEXT: ret
entry:		entry:
%s0s = zext <vscale x 16 x i8> %s0 to <vscale x 16 x i16>		%s0s = zext <vscale x 16 x i8> %s0 to <vscale x 16 x i16>
%s1s = zext <vscale x 16 x i8> %s1 to <vscale x 16 x i16>		%s1s = zext <vscale x 16 x i8> %s1 to <vscale x 16 x i16>
%m = add <vscale x 16 x i16> %s0s, %s1s		%m = add <vscale x 16 x i16> %s0s, %s1s
%s = lshr <vscale x 16 x i16> %m, shufflevector (<vscale x 16 x i16> insertelement (<vscale x 16 x i16> poison, i16 1, i32 0), <vscale x 16 x i16> poison, <vscale x 16 x i32> zeroinitializer)		%s = lshr <vscale x 16 x i16> %m, shufflevector (<vscale x 16 x i16> insertelement (<vscale x 16 x i16> poison, i16 1, i32 0), <vscale x 16 x i16> poison, <vscale x 16 x i32> zeroinitializer)
%s2 = trunc <vscale x 16 x i16> %s to <vscale x 16 x i8>		%s2 = trunc <vscale x 16 x i16> %s to <vscale x 16 x i8>
ret <vscale x 16 x i8> %s2		ret <vscale x 16 x i8> %s2
}		}

		define <vscale x 2 x i64> @rhadds_v2i64(<vscale x 2 x i64> %s0, <vscale x 2 x i64> %s1) {
		; CHECK-LABEL: rhadds_v2i64:
		; CHECK: // %bb.0: // %entry
		; CHECK-NEXT: ptrue p0.d
		; CHECK-NEXT: srhadd z0.d, p0/m, z0.d, z1.d
		; CHECK-NEXT: ret
		entry:
		%s0s = sext <vscale x 2 x i64> %s0 to <vscale x 2 x i128>
		%s1s = sext <vscale x 2 x i64> %s1 to <vscale x 2 x i128>
		%add = add <vscale x 2 x i128> %s0s, %s1s
		%add2 = add <vscale x 2 x i128> %add, shufflevector (<vscale x 2 x i128> insertelement (<vscale x 2 x i128> poison, i128 1, i32 0), <vscale x 2 x i128> poison, <vscale x 2 x i32> zeroinitializer)
		%s = lshr <vscale x 2 x i128> %add2, shufflevector (<vscale x 2 x i128> insertelement (<vscale x 2 x i128> poison, i128 1, i32 0), <vscale x 2 x i128> poison, <vscale x 2 x i32> zeroinitializer)
		%result = trunc <vscale x 2 x i128> %s to <vscale x 2 x i64>
		ret <vscale x 2 x i64> %result
		}

		define <vscale x 2 x i64> @rhaddu_v2i64(<vscale x 2 x i64> %s0, <vscale x 2 x i64> %s1) {
		; CHECK-LABEL: rhaddu_v2i64:
		; CHECK: // %bb.0: // %entry
		; CHECK-NEXT: ptrue p0.d
		; CHECK-NEXT: urhadd z0.d, p0/m, z0.d, z1.d
		; CHECK-NEXT: ret
		entry:
		%s0s = zext <vscale x 2 x i64> %s0 to <vscale x 2 x i128>
		%s1s = zext <vscale x 2 x i64> %s1 to <vscale x 2 x i128>
		%add = add <vscale x 2 x i128> %s0s, %s1s
		%add2 = add <vscale x 2 x i128> %add, shufflevector (<vscale x 2 x i128> insertelement (<vscale x 2 x i128> poison, i128 1, i32 0), <vscale x 2 x i128> poison, <vscale x 2 x i32> zeroinitializer)
		%s = lshr <vscale x 2 x i128> %add2, shufflevector (<vscale x 2 x i128> insertelement (<vscale x 2 x i128> poison, i128 1, i32 0), <vscale x 2 x i128> poison, <vscale x 2 x i32> zeroinitializer)
		%result = trunc <vscale x 2 x i128> %s to <vscale x 2 x i64>
		ret <vscale x 2 x i64> %result
		}

define <vscale x 2 x i32> @rhadds_v2i32(<vscale x 2 x i32> %s0, <vscale x 2 x i32> %s1) {		define <vscale x 2 x i32> @rhadds_v2i32(<vscale x 2 x i32> %s0, <vscale x 2 x i32> %s1) {
; CHECK-LABEL: rhadds_v2i32:		; CHECK-LABEL: rhadds_v2i32:
; CHECK: // %bb.0: // %entry		; CHECK: // %bb.0: // %entry
; CHECK-NEXT: ptrue p0.d		; CHECK-NEXT: ptrue p0.d
; CHECK-NEXT: mov z2.d, #-1 // =0xffffffffffffffff		; CHECK-NEXT: mov z2.d, #-1 // =0xffffffffffffffff
; CHECK-NEXT: sxtw z0.d, p0/m, z0.d		; CHECK-NEXT: sxtw z0.d, p0/m, z0.d
; CHECK-NEXT: sxtw z1.d, p0/m, z1.d		; CHECK-NEXT: sxtw z1.d, p0/m, z1.d
; CHECK-NEXT: eor z0.d, z0.d, z2.d		; CHECK-NEXT: eor z0.d, z0.d, z2.d
[... 28 lines not shown ...]
%s = lshr <vscale x 2 x i64> %add2, shufflevector (<vscale x 2 x i64> insertelement (<vscale x 2 x i64> poison, i64 1, i32 0), <vscale x 2 x i64> poison, <vscale x 2 x i32> zeroinitializer)		%s = lshr <vscale x 2 x i64> %add2, shufflevector (<vscale x 2 x i64> insertelement (<vscale x 2 x i64> poison, i64 1, i32 0), <vscale x 2 x i64> poison, <vscale x 2 x i32> zeroinitializer)
%result = trunc <vscale x 2 x i64> %s to <vscale x 2 x i32>		%result = trunc <vscale x 2 x i64> %s to <vscale x 2 x i32>
ret <vscale x 2 x i32> %result		ret <vscale x 2 x i32> %result
}		}

define <vscale x 4 x i32> @rhadds_v4i32(<vscale x 4 x i32> %s0, <vscale x 4 x i32> %s1) {		define <vscale x 4 x i32> @rhadds_v4i32(<vscale x 4 x i32> %s0, <vscale x 4 x i32> %s1) {
; CHECK-LABEL: rhadds_v4i32:		; CHECK-LABEL: rhadds_v4i32:
; CHECK: // %bb.0: // %entry		; CHECK: // %bb.0: // %entry
; CHECK-NEXT: mov z2.d, #-1 // =0xffffffffffffffff		; CHECK-NEXT: ptrue p0.s
; CHECK-NEXT: sunpkhi z3.d, z0.s		; CHECK-NEXT: srhadd z0.s, p0/m, z0.s, z1.s
; CHECK-NEXT: sunpklo z0.d, z0.s
; CHECK-NEXT: sunpkhi z4.d, z1.s
; CHECK-NEXT: sunpklo z1.d, z1.s
; CHECK-NEXT: eor z0.d, z0.d, z2.d
; CHECK-NEXT: eor z2.d, z3.d, z2.d
; CHECK-NEXT: sub z0.d, z1.d, z0.d
; CHECK-NEXT: sub z1.d, z4.d, z2.d
; CHECK-NEXT: lsr z0.d, z0.d, #1
; CHECK-NEXT: lsr z1.d, z1.d, #1
; CHECK-NEXT: uzp1 z0.s, z0.s, z1.s
; CHECK-NEXT: ret		; CHECK-NEXT: ret
entry:		entry:
%s0s = sext <vscale x 4 x i32> %s0 to <vscale x 4 x i64>		%s0s = sext <vscale x 4 x i32> %s0 to <vscale x 4 x i64>
%s1s = sext <vscale x 4 x i32> %s1 to <vscale x 4 x i64>		%s1s = sext <vscale x 4 x i32> %s1 to <vscale x 4 x i64>
%add = add <vscale x 4 x i64> %s0s, %s1s		%add = add <vscale x 4 x i64> %s0s, %s1s
%add2 = add <vscale x 4 x i64> %add, shufflevector (<vscale x 4 x i64> insertelement (<vscale x 4 x i64> poison, i64 1, i32 0), <vscale x 4 x i64> poison, <vscale x 4 x i32> zeroinitializer)		%add2 = add <vscale x 4 x i64> %add, shufflevector (<vscale x 4 x i64> insertelement (<vscale x 4 x i64> poison, i64 1, i32 0), <vscale x 4 x i64> poison, <vscale x 4 x i32> zeroinitializer)
%s = lshr <vscale x 4 x i64> %add2, shufflevector (<vscale x 4 x i64> insertelement (<vscale x 4 x i64> poison, i64 1, i32 0), <vscale x 4 x i64> poison, <vscale x 4 x i32> zeroinitializer)		%s = lshr <vscale x 4 x i64> %add2, shufflevector (<vscale x 4 x i64> insertelement (<vscale x 4 x i64> poison, i64 1, i32 0), <vscale x 4 x i64> poison, <vscale x 4 x i32> zeroinitializer)
%result = trunc <vscale x 4 x i64> %s to <vscale x 4 x i32>		%result = trunc <vscale x 4 x i64> %s to <vscale x 4 x i32>
ret <vscale x 4 x i32> %result		ret <vscale x 4 x i32> %result
}		}

define <vscale x 4 x i32> @rhaddu_v4i32(<vscale x 4 x i32> %s0, <vscale x 4 x i32> %s1) {		define <vscale x 4 x i32> @rhaddu_v4i32(<vscale x 4 x i32> %s0, <vscale x 4 x i32> %s1) {
; CHECK-LABEL: rhaddu_v4i32:		; CHECK-LABEL: rhaddu_v4i32:
; CHECK: // %bb.0: // %entry		; CHECK: // %bb.0: // %entry
; CHECK-NEXT: mov z2.d, #-1 // =0xffffffffffffffff		; CHECK-NEXT: ptrue p0.s
; CHECK-NEXT: uunpkhi z3.d, z0.s		; CHECK-NEXT: urhadd z0.s, p0/m, z0.s, z1.s
; CHECK-NEXT: uunpklo z0.d, z0.s
; CHECK-NEXT: uunpkhi z4.d, z1.s
; CHECK-NEXT: uunpklo z1.d, z1.s
; CHECK-NEXT: eor z0.d, z0.d, z2.d
; CHECK-NEXT: eor z2.d, z3.d, z2.d
; CHECK-NEXT: sub z0.d, z1.d, z0.d
; CHECK-NEXT: sub z1.d, z4.d, z2.d
; CHECK-NEXT: lsr z0.d, z0.d, #1
; CHECK-NEXT: lsr z1.d, z1.d, #1
; CHECK-NEXT: uzp1 z0.s, z0.s, z1.s
; CHECK-NEXT: ret		; CHECK-NEXT: ret
entry:		entry:
%s0s = zext <vscale x 4 x i32> %s0 to <vscale x 4 x i64>		%s0s = zext <vscale x 4 x i32> %s0 to <vscale x 4 x i64>
%s1s = zext <vscale x 4 x i32> %s1 to <vscale x 4 x i64>		%s1s = zext <vscale x 4 x i32> %s1 to <vscale x 4 x i64>
%add = add <vscale x 4 x i64> %s0s, %s1s		%add = add <vscale x 4 x i64> %s0s, %s1s
%add2 = add <vscale x 4 x i64> %add, shufflevector (<vscale x 4 x i64> insertelement (<vscale x 4 x i64> poison, i64 1, i32 0), <vscale x 4 x i64> poison, <vscale x 4 x i32> zeroinitializer)		%add2 = add <vscale x 4 x i64> %add, shufflevector (<vscale x 4 x i64> insertelement (<vscale x 4 x i64> poison, i64 1, i32 0), <vscale x 4 x i64> poison, <vscale x 4 x i32> zeroinitializer)
%s = lshr <vscale x 4 x i64> %add2, shufflevector (<vscale x 4 x i64> insertelement (<vscale x 4 x i64> poison, i64 1, i32 0), <vscale x 4 x i64> poison, <vscale x 4 x i32> zeroinitializer)		%s = lshr <vscale x 4 x i64> %add2, shufflevector (<vscale x 4 x i64> insertelement (<vscale x 4 x i64> poison, i64 1, i32 0), <vscale x 4 x i64> poison, <vscale x 4 x i32> zeroinitializer)
%result = trunc <vscale x 4 x i64> %s to <vscale x 4 x i32>		%result = trunc <vscale x 4 x i64> %s to <vscale x 4 x i32>
[... 81 lines not shown ...]
%s = lshr <vscale x 4 x i32> %add2, shufflevector (<vscale x 4 x i32> insertelement (<vscale x 4 x i32> poison, i32 1, i32 0), <vscale x 4 x i32> poison, <vscale x 4 x i32> zeroinitializer)		%s = lshr <vscale x 4 x i32> %add2, shufflevector (<vscale x 4 x i32> insertelement (<vscale x 4 x i32> poison, i32 1, i32 0), <vscale x 4 x i32> poison, <vscale x 4 x i32> zeroinitializer)
%result = trunc <vscale x 4 x i32> %s to <vscale x 4 x i16>		%result = trunc <vscale x 4 x i32> %s to <vscale x 4 x i16>
ret <vscale x 4 x i16> %result		ret <vscale x 4 x i16> %result
}		}

define <vscale x 8 x i16> @rhadds_v8i16(<vscale x 8 x i16> %s0, <vscale x 8 x i16> %s1) {		define <vscale x 8 x i16> @rhadds_v8i16(<vscale x 8 x i16> %s0, <vscale x 8 x i16> %s1) {
; CHECK-LABEL: rhadds_v8i16:		; CHECK-LABEL: rhadds_v8i16:
; CHECK: // %bb.0: // %entry		; CHECK: // %bb.0: // %entry
; CHECK-NEXT: mov z2.s, #-1 // =0xffffffffffffffff		; CHECK-NEXT: ptrue p0.h
; CHECK-NEXT: sunpkhi z3.s, z0.h		; CHECK-NEXT: srhadd z0.h, p0/m, z0.h, z1.h
; CHECK-NEXT: sunpklo z0.s, z0.h
; CHECK-NEXT: sunpkhi z4.s, z1.h
; CHECK-NEXT: sunpklo z1.s, z1.h
; CHECK-NEXT: eor z0.d, z0.d, z2.d
; CHECK-NEXT: eor z2.d, z3.d, z2.d
; CHECK-NEXT: sub z0.s, z1.s, z0.s
; CHECK-NEXT: sub z1.s, z4.s, z2.s
; CHECK-NEXT: lsr z0.s, z0.s, #1
; CHECK-NEXT: lsr z1.s, z1.s, #1
; CHECK-NEXT: uzp1 z0.h, z0.h, z1.h
; CHECK-NEXT: ret		; CHECK-NEXT: ret
entry:		entry:
%s0s = sext <vscale x 8 x i16> %s0 to <vscale x 8 x i32>		%s0s = sext <vscale x 8 x i16> %s0 to <vscale x 8 x i32>
%s1s = sext <vscale x 8 x i16> %s1 to <vscale x 8 x i32>		%s1s = sext <vscale x 8 x i16> %s1 to <vscale x 8 x i32>
%add = add <vscale x 8 x i32> %s0s, %s1s		%add = add <vscale x 8 x i32> %s0s, %s1s
%add2 = add <vscale x 8 x i32> %add, shufflevector (<vscale x 8 x i32> insertelement (<vscale x 8 x i32> poison, i32 1, i32 0), <vscale x 8 x i32> poison, <vscale x 8 x i32> zeroinitializer)		%add2 = add <vscale x 8 x i32> %add, shufflevector (<vscale x 8 x i32> insertelement (<vscale x 8 x i32> poison, i32 1, i32 0), <vscale x 8 x i32> poison, <vscale x 8 x i32> zeroinitializer)
%s = lshr <vscale x 8 x i32> %add2, shufflevector (<vscale x 8 x i32> insertelement (<vscale x 8 x i32> poison, i32 1, i32 0), <vscale x 8 x i32> poison, <vscale x 8 x i32> zeroinitializer)		%s = lshr <vscale x 8 x i32> %add2, shufflevector (<vscale x 8 x i32> insertelement (<vscale x 8 x i32> poison, i32 1, i32 0), <vscale x 8 x i32> poison, <vscale x 8 x i32> zeroinitializer)
%result = trunc <vscale x 8 x i32> %s to <vscale x 8 x i16>		%result = trunc <vscale x 8 x i32> %s to <vscale x 8 x i16>
ret <vscale x 8 x i16> %result		ret <vscale x 8 x i16> %result
}		}

define <vscale x 8 x i16> @rhaddu_v8i16(<vscale x 8 x i16> %s0, <vscale x 8 x i16> %s1) {		define <vscale x 8 x i16> @rhaddu_v8i16(<vscale x 8 x i16> %s0, <vscale x 8 x i16> %s1) {
; CHECK-LABEL: rhaddu_v8i16:		; CHECK-LABEL: rhaddu_v8i16:
; CHECK: // %bb.0: // %entry		; CHECK: // %bb.0: // %entry
; CHECK-NEXT: mov z2.s, #-1 // =0xffffffffffffffff		; CHECK-NEXT: ptrue p0.h
; CHECK-NEXT: uunpkhi z3.s, z0.h		; CHECK-NEXT: urhadd z0.h, p0/m, z0.h, z1.h
; CHECK-NEXT: uunpklo z0.s, z0.h
; CHECK-NEXT: uunpkhi z4.s, z1.h
; CHECK-NEXT: uunpklo z1.s, z1.h
; CHECK-NEXT: eor z0.d, z0.d, z2.d
; CHECK-NEXT: eor z2.d, z3.d, z2.d
; CHECK-NEXT: sub z0.s, z1.s, z0.s
; CHECK-NEXT: sub z1.s, z4.s, z2.s
; CHECK-NEXT: lsr z0.s, z0.s, #1
; CHECK-NEXT: lsr z1.s, z1.s, #1
; CHECK-NEXT: uzp1 z0.h, z0.h, z1.h
; CHECK-NEXT: ret		; CHECK-NEXT: ret
entry:		entry:
%s0s = zext <vscale x 8 x i16> %s0 to <vscale x 8 x i32>		%s0s = zext <vscale x 8 x i16> %s0 to <vscale x 8 x i32>
%s1s = zext <vscale x 8 x i16> %s1 to <vscale x 8 x i32>		%s1s = zext <vscale x 8 x i16> %s1 to <vscale x 8 x i32>
%add = add <vscale x 8 x i32> %s0s, %s1s		%add = add <vscale x 8 x i32> %s0s, %s1s
%add2 = add <vscale x 8 x i32> %add, shufflevector (<vscale x 8 x i32> insertelement (<vscale x 8 x i32> poison, i32 1, i32 0), <vscale x 8 x i32> poison, <vscale x 8 x i32> zeroinitializer)		%add2 = add <vscale x 8 x i32> %add, shufflevector (<vscale x 8 x i32> insertelement (<vscale x 8 x i32> poison, i32 1, i32 0), <vscale x 8 x i32> poison, <vscale x 8 x i32> zeroinitializer)
%s = lshr <vscale x 8 x i32> %add2, shufflevector (<vscale x 8 x i32> insertelement (<vscale x 8 x i32> poison, i32 1, i32 0), <vscale x 8 x i32> poison, <vscale x 8 x i32> zeroinitializer)		%s = lshr <vscale x 8 x i32> %add2, shufflevector (<vscale x 8 x i32> insertelement (<vscale x 8 x i32> poison, i32 1, i32 0), <vscale x 8 x i32> poison, <vscale x 8 x i32> zeroinitializer)
%result = trunc <vscale x 8 x i32> %s to <vscale x 8 x i16>		%result = trunc <vscale x 8 x i32> %s to <vscale x 8 x i16>
[... 81 lines not shown ...]
%s = lshr <vscale x 8 x i16> %add2, shufflevector (<vscale x 8 x i16> insertelement (<vscale x 8 x i16> poison, i16 1, i32 0), <vscale x 8 x i16> poison, <vscale x 8 x i32> zeroinitializer)		%s = lshr <vscale x 8 x i16> %add2, shufflevector (<vscale x 8 x i16> insertelement (<vscale x 8 x i16> poison, i16 1, i32 0), <vscale x 8 x i16> poison, <vscale x 8 x i32> zeroinitializer)
%result = trunc <vscale x 8 x i16> %s to <vscale x 8 x i8>		%result = trunc <vscale x 8 x i16> %s to <vscale x 8 x i8>
ret <vscale x 8 x i8> %result		ret <vscale x 8 x i8> %result
}		}

define <vscale x 16 x i8> @rhadds_v16i8(<vscale x 16 x i8> %s0, <vscale x 16 x i8> %s1) {		define <vscale x 16 x i8> @rhadds_v16i8(<vscale x 16 x i8> %s0, <vscale x 16 x i8> %s1) {
; CHECK-LABEL: rhadds_v16i8:		; CHECK-LABEL: rhadds_v16i8:
; CHECK: // %bb.0: // %entry		; CHECK: // %bb.0: // %entry
; CHECK-NEXT: mov z2.h, #-1 // =0xffffffffffffffff		; CHECK-NEXT: ptrue p0.b
; CHECK-NEXT: sunpkhi z3.h, z0.b		; CHECK-NEXT: srhadd z0.b, p0/m, z0.b, z1.b
; CHECK-NEXT: sunpklo z0.h, z0.b
; CHECK-NEXT: sunpkhi z4.h, z1.b
; CHECK-NEXT: sunpklo z1.h, z1.b
; CHECK-NEXT: eor z0.d, z0.d, z2.d
; CHECK-NEXT: eor z2.d, z3.d, z2.d
; CHECK-NEXT: sub z0.h, z1.h, z0.h
; CHECK-NEXT: sub z1.h, z4.h, z2.h
; CHECK-NEXT: lsr z0.h, z0.h, #1
; CHECK-NEXT: lsr z1.h, z1.h, #1
; CHECK-NEXT: uzp1 z0.b, z0.b, z1.b
; CHECK-NEXT: ret		; CHECK-NEXT: ret
entry:		entry:
%s0s = sext <vscale x 16 x i8> %s0 to <vscale x 16 x i16>		%s0s = sext <vscale x 16 x i8> %s0 to <vscale x 16 x i16>
%s1s = sext <vscale x 16 x i8> %s1 to <vscale x 16 x i16>		%s1s = sext <vscale x 16 x i8> %s1 to <vscale x 16 x i16>
%add = add <vscale x 16 x i16> %s0s, %s1s		%add = add <vscale x 16 x i16> %s0s, %s1s
%add2 = add <vscale x 16 x i16> %add, shufflevector (<vscale x 16 x i16> insertelement (<vscale x 16 x i16> poison, i16 1, i32 0), <vscale x 16 x i16> poison, <vscale x 16 x i32> zeroinitializer)		%add2 = add <vscale x 16 x i16> %add, shufflevector (<vscale x 16 x i16> insertelement (<vscale x 16 x i16> poison, i16 1, i32 0), <vscale x 16 x i16> poison, <vscale x 16 x i32> zeroinitializer)
%s = lshr <vscale x 16 x i16> %add2, shufflevector (<vscale x 16 x i16> insertelement (<vscale x 16 x i16> poison, i16 1, i32 0), <vscale x 16 x i16> poison, <vscale x 16 x i32> zeroinitializer)		%s = lshr <vscale x 16 x i16> %add2, shufflevector (<vscale x 16 x i16> insertelement (<vscale x 16 x i16> poison, i16 1, i32 0), <vscale x 16 x i16> poison, <vscale x 16 x i32> zeroinitializer)
%result = trunc <vscale x 16 x i16> %s to <vscale x 16 x i8>		%result = trunc <vscale x 16 x i16> %s to <vscale x 16 x i8>
ret <vscale x 16 x i8> %result		ret <vscale x 16 x i8> %result
}		}

define <vscale x 16 x i8> @rhaddu_v16i8(<vscale x 16 x i8> %s0, <vscale x 16 x i8> %s1) {		define <vscale x 16 x i8> @rhaddu_v16i8(<vscale x 16 x i8> %s0, <vscale x 16 x i8> %s1) {
; CHECK-LABEL: rhaddu_v16i8:		; CHECK-LABEL: rhaddu_v16i8:
; CHECK: // %bb.0: // %entry		; CHECK: // %bb.0: // %entry
; CHECK-NEXT: mov z2.h, #-1 // =0xffffffffffffffff		; CHECK-NEXT: ptrue p0.b
; CHECK-NEXT: uunpkhi z3.h, z0.b		; CHECK-NEXT: urhadd z0.b, p0/m, z0.b, z1.b
; CHECK-NEXT: uunpklo z0.h, z0.b
; CHECK-NEXT: uunpkhi z4.h, z1.b
; CHECK-NEXT: uunpklo z1.h, z1.b
; CHECK-NEXT: eor z0.d, z0.d, z2.d
; CHECK-NEXT: eor z2.d, z3.d, z2.d
; CHECK-NEXT: sub z0.h, z1.h, z0.h
; CHECK-NEXT: sub z1.h, z4.h, z2.h
; CHECK-NEXT: lsr z0.h, z0.h, #1
; CHECK-NEXT: lsr z1.h, z1.h, #1
; CHECK-NEXT: uzp1 z0.b, z0.b, z1.b
; CHECK-NEXT: ret		; CHECK-NEXT: ret
entry:		entry:
%s0s = zext <vscale x 16 x i8> %s0 to <vscale x 16 x i16>		%s0s = zext <vscale x 16 x i8> %s0 to <vscale x 16 x i16>
%s1s = zext <vscale x 16 x i8> %s1 to <vscale x 16 x i16>		%s1s = zext <vscale x 16 x i8> %s1 to <vscale x 16 x i16>
%add = add <vscale x 16 x i16> %s0s, %s1s		%add = add <vscale x 16 x i16> %s0s, %s1s
%add2 = add <vscale x 16 x i16> %add, shufflevector (<vscale x 16 x i16> insertelement (<vscale x 16 x i16> poison, i16 1, i32 0), <vscale x 16 x i16> poison, <vscale x 16 x i32> zeroinitializer)		%add2 = add <vscale x 16 x i16> %add, shufflevector (<vscale x 16 x i16> insertelement (<vscale x 16 x i16> poison, i16 1, i32 0), <vscale x 16 x i16> poison, <vscale x 16 x i32> zeroinitializer)
%s = lshr <vscale x 16 x i16> %add2, shufflevector (<vscale x 16 x i16> insertelement (<vscale x 16 x i16> poison, i16 1, i32 0), <vscale x 16 x i16> poison, <vscale x 16 x i32> zeroinitializer)		%s = lshr <vscale x 16 x i16> %add2, shufflevector (<vscale x 16 x i16> insertelement (<vscale x 16 x i16> poison, i16 1, i32 0), <vscale x 16 x i16> poison, <vscale x 16 x i32> zeroinitializer)
%result = trunc <vscale x 16 x i16> %s to <vscale x 16 x i8>		%result = trunc <vscale x 16 x i16> %s to <vscale x 16 x i8>
ret <vscale x 16 x i8> %result		ret <vscale x 16 x i8> %result
}		}
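
The tests above write each halving add as extend, add (plus one for the rounding forms), shift right by one, truncate, and the updated CHECK lines expect a single predicated shadd/uhadd/srhadd/urhadd instead of the unpack/add/lsr/uzp1 sequence. As a scalar illustration of why the narrow form can give the same result without the wider type, the sketch below compares the widened pattern for one unsigned 8-bit lane against two standard overflow-free identities. It is not code from this patch: the function names are hypothetical, only the unsigned case is shown, and it says nothing about how the backend actually performs the match.

```cpp
#include <cstdint>

// Widened reference for one lane: zext to 16 bits, add, lshr by 1, trunc.
// This mirrors the shape of the IR in the haddu_* tests (hypothetical name).
inline uint8_t haddu_ref(uint8_t a, uint8_t b) {
  uint16_t wide = static_cast<uint16_t>(a + b);  // operands promote to int, max 510
  return static_cast<uint8_t>(wide >> 1);
}

// Same pattern with the extra +1 before the shift, as in the rhaddu_* tests.
inline uint8_t rhaddu_ref(uint8_t a, uint8_t b) {
  uint16_t wide = static_cast<uint16_t>(a + b + 1);  // max 511
  return static_cast<uint8_t>(wide >> 1);
}

// Narrow floor average: a + b == 2*(a & b) + (a ^ b), so halving never
// needs more than 8 bits.
inline uint8_t haddu_narrow(uint8_t a, uint8_t b) {
  return static_cast<uint8_t>((a & b) + ((a ^ b) >> 1));
}

// Narrow ceiling average: a + b == 2*(a | b) - (a ^ b).
inline uint8_t rhaddu_narrow(uint8_t a, uint8_t b) {
  return static_cast<uint8_t>((a | b) - ((a ^ b) >> 1));
}

// Exhaustively compare the widened and narrow forms over all 8-bit inputs.
inline bool all_lanes_agree() {
  for (int a = 0; a < 256; ++a) {
    for (int b = 0; b < 256; ++b) {
      uint8_t x = static_cast<uint8_t>(a);
      uint8_t y = static_cast<uint8_t>(b);
      if (haddu_ref(x, y) != haddu_narrow(x, y)) return false;
      if (rhaddu_ref(x, y) != rhaddu_narrow(x, y)) return false;
    }
  }
  return true;
}
```

The same argument is what makes the new <vscale x 2 x i64> tests interesting: the widened form there needs i128 lanes, while the narrow form stays within the element type.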