This is an archive of the discontinued LLVM Phabricator instance.

Paths

llvm/lib/Target/AArch64/
    AArch64ISelLowering.h (1/1)
    AArch64ISelLowering.cpp
    AArch64SVEInstrInfo.td (1/1)
llvm/test/CodeGen/AArch64/
    sve-expand-div.ll
    sve-fixed-length-int-div.ll
    sve-fixed-length-int-mulh.ll (5/10)
    sve-int-arith-imm.ll
    sve-int-mulh-pred.ll
    sve2-int-mulh.ll

Differential D100487

[AArch64][SVE] Lower MULHU/MULHS nodes to umulh/smulh instructions
Closed, Public

Authored by bsmith on Apr 14 2021, 8:28 AM.

Details

Reviewers

joechrisellis
peterwaller-arm
paulwalker-arm
SanderSpies
efriedma

Commits

rGb8b075d8d744: [AArch64][SVE] Lower MULHU/MULHS nodes to umulh/smulh instructions

Summary

Mark MULHS/MULHU nodes as legal for both scalable and fixed-length SVE types,
and lower them to the appropriate SVE instructions.

Additionally, now that the MULH nodes are legal, integer divides by a constant
can be expanded into a more performant multiply-high based code sequence.
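For reference, ISD::MULHS and ISD::MULHU return the high half of the double-width product of their operands (signed and unsigned respectively), which is what SVE smulh/umulh compute per lane. The following is a minimal scalar sketch of that per-lane semantics, not LLVM code; the function names are hypothetical. The 64-bit case needs a 128-bit intermediate, here GCC/Clang's `unsigned __int128` extension, which illustrates why i64 lanes have no cheap NEON equivalent.

```cpp
#include <cstdint>

// Hypothetical scalar model of one lane of ISD::MULHS / ISD::MULHU:
// widen both operands, multiply, and keep the upper half of the product.
constexpr int32_t mulhs32(int32_t a, int32_t b) {
  // Right-shifting a negative value is arithmetic on mainstream compilers
  // and guaranteed to be so since C++20.
  return static_cast<int32_t>((static_cast<int64_t>(a) * b) >> 32);
}

constexpr uint32_t mulhu32(uint32_t a, uint32_t b) {
  return static_cast<uint32_t>((static_cast<uint64_t>(a) * b) >> 32);
}

// A 64-bit lane needs a 128-bit product; unsigned __int128 is a GCC/Clang
// extension used here only for illustration.
constexpr uint64_t mulhu64(uint64_t a, uint64_t b) {
  return static_cast<uint64_t>((static_cast<unsigned __int128>(a) * b) >> 64);
}

static_assert(mulhu32(0xFFFFFFFFu, 0xFFFFFFFFu) == 0xFFFFFFFEu, "high half of (2^32-1)^2");
static_assert(mulhs32(-1, 1) == -1, "the sign extends into the high half");
static_assert(mulhu64(UINT64_MAX, 2) == 1, "carry out of the low 64 bits");
```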

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

bsmith created this revision. Apr 14 2021, 8:28 AM

Herald added a reviewer: efriedma. Apr 14 2021, 8:28 AM

Herald added subscribers: psnobl, hiraditya, kristof.beyls, tschuett.

bsmith requested review of this revision. Apr 14 2021, 8:28 AM

Herald added a project: Restricted Project. Apr 14 2021, 8:28 AM

Herald added a subscriber: llvm-commits.

paulwalker-arm added inline comments. Apr 15 2021, 7:58 AM

llvm/lib/Target/AArch64/AArch64ISelLowering.h
92–93	As the general rule we use the common node name suffixed with `_PRED`, so in this case these should be `MULHS_PRED` and `MULHU_PRED`.
llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td
192	Just a note, because of my other comment, to say that this is fine, here we typically prefer the AArch64 names as you've used here, although it'll be nicer still if you maintained the existing alphabetical ordering :)
llvm/test/CodeGen/AArch64/sve-fixed-length-int-mulh.ll
6	Should this be `VBITS_EQ_2048`?
68	Given you're using `EQ_256` I doubt the `#min` logic means anything. That said, why not VBITS_GE_256, shouldn't the code be the same for larger vectors?
394	As there's no NEON instruction for i64 based vectors, I'm wondering if it's worth using SVE for this case as well, much like we do for ISD::MUL.

Rename ISD nodes to match common names
Preserve some alphabetical ordering
Fix 2048 vector width tests
Update fixed width tests to test vector widths smaller than the given value
Allow generation of SVE mulh instructions for neon sized i64 vectors
Add test to check divide expansion that can now happen
Add minsize attribute to some SVE divide tests to prevent divide expansion from happening

Harbormaster completed remote builds in B99150: Diff 338074. Apr 16 2021, 6:58 AM

Matt added a subscriber: Matt. Apr 17 2021, 9:15 AM

paulwalker-arm accepted this revision. Apr 19 2021, 4:06 AM

paulwalker-arm added inline comments.

llvm/test/CodeGen/AArch64/sve-fixed-length-int-mulh.ll
3–6	Please add `RUN` lines for all the supported vector lengths. Also, just in case somebody wonders why, can you add a comment saying there's no validation for splitting vector operations because the necessary MULH DAG combine does not apply to illegally typed operations.
292	Weirdly, SVE is not being used here. Is the output different when SVE is disabled?
780	Same comment as smulh_v4i32.

This revision is now accepted and ready to land. Apr 19 2021, 4:06 AM

bsmith added inline comments. Apr 20 2021, 5:42 AM

llvm/test/CodeGen/AArch64/sve-fixed-length-int-mulh.ll
3–6	There already is a comment as such, on line 12 (23 in the new patch)
292	No, it's the same without SVE enabled. NEON has patterns to match 128-bit mulh nodes (but not 64-bit as above), as it can use the smull2+smull pattern below. Perhaps we should still fall back to SVE instead of this sequence? (Or I just fix the comment..)

paulwalker-arm added inline comments. Apr 20 2021, 5:59 AM

llvm/test/CodeGen/AArch64/sve-fixed-length-int-mulh.ll
3–6	That must have been where "my" idea came from :)
292	Thanks. In which case fixing the comment works for me.
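bsmith's reply says NEON can match MULH on 128-bit vectors such as v4i32 using a widening smull2+smull pair. A minimal scalar sketch of that idea follows, assuming the usual lane semantics (smull widens lanes 0-1, smull2 widens lanes 2-3); the final gather of each product's high 32 bits is modelled with plain shifts rather than a specific instruction, and the names are hypothetical. The same trick does not extend to i64 elements, since that would need 64x64->128-bit lane products.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

using V4i32 = std::array<int32_t, 4>;

// Hypothetical scalar model of MULHS on a 128-bit v4i32 vector using two
// widening multiplies, one per 64-bit half of the vector.
V4i32 mulhs_v4i32(const V4i32& a, const V4i32& b) {
  std::array<int64_t, 2> lo_products{};  // lanes 0-1 widened (smull)
  std::array<int64_t, 2> hi_products{};  // lanes 2-3 widened (smull2)
  for (int i = 0; i < 2; ++i) {
    lo_products[i] = static_cast<int64_t>(a[i]) * b[i];
    hi_products[i] = static_cast<int64_t>(a[i + 2]) * b[i + 2];
  }
  // Keep the upper 32 bits of each 64-bit product, in lane order.
  return {static_cast<int32_t>(lo_products[0] >> 32), static_cast<int32_t>(lo_products[1] >> 32),
          static_cast<int32_t>(hi_products[0] >> 32), static_cast<int32_t>(hi_products[1] >> 32)};
}

// Spot checks: 2^30 * 4 = 2^32, (-1) * 1 sign-extends,
// (2^31 - 1)^2 = 0x3FFFFFFF00000001, and 2 * 3 has no high part.
void check_mulhs_v4i32() {
  V4i32 r = mulhs_v4i32({1 << 30, -1, 0x7FFFFFFF, 2}, {4, 1, 0x7FFFFFFF, 3});
  assert((r == V4i32{1, -1, 0x3FFFFFFF, 0}));
}
```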

Closed by commit rGb8b075d8d744: [AArch64][SVE] Lower MULHU/MULHS nodes to umulh/smulh instructions (authored by bsmith). Apr 20 2021, 7:18 AM

This revision was automatically updated to reflect the committed changes.

bsmith added a commit: rGb8b075d8d744: [AArch64][SVE] Lower MULHU/MULHS nodes to umulh/smulh instructions.

Revision Contents

Path                                                    Size
llvm/lib/Target/AArch64/AArch64ISelLowering.h           2 lines
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp         18 lines
llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td          8 lines
llvm/test/CodeGen/AArch64/sve-expand-div.ll             144 lines
llvm/test/CodeGen/AArch64/sve-fixed-length-int-div.ll   3 lines
llvm/test/CodeGen/AArch64/sve-fixed-length-int-mulh.ll  1006 lines
llvm/test/CodeGen/AArch64/sve-int-arith-imm.ll          6 lines
llvm/test/CodeGen/AArch64/sve-int-mulh-pred.ll          140 lines
llvm/test/CodeGen/AArch64/sve2-int-mulh.ll              132 lines

Diff 338860

llvm/lib/Target/AArch64/AArch64ISelLowering.h

@@ ... @@ enum NodeType : unsigned {
 FMA_PRED,
 FMAXNM_PRED,
 FMINNM_PRED,
 FMAX_PRED,
 FMIN_PRED,
 FMUL_PRED,
 FSUB_PRED,
 MUL_PRED,
+MULHS_PRED,
+MULHU_PRED,
 SDIV_PRED,
 SHL_PRED,
 SMAX_PRED,
 SMIN_PRED,
 SRA_PRED,
 SRL_PRED,
 SUB_PRED,
 UDIV_PRED,

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp

@@ ... @@ for (auto VT : {MVT::nxv16i8, MVT::nxv8i16, MVT::nxv4i32, MVT::nxv2i64}) {
 setOperationAction(ISD::INSERT_SUBVECTOR, VT, Custom);
 setOperationAction(ISD::UINT_TO_FP, VT, Custom);
 setOperationAction(ISD::SINT_TO_FP, VT, Custom);
 setOperationAction(ISD::FP_TO_UINT, VT, Custom);
 setOperationAction(ISD::FP_TO_SINT, VT, Custom);
 setOperationAction(ISD::MGATHER, VT, Custom);
 setOperationAction(ISD::MSCATTER, VT, Custom);
 setOperationAction(ISD::MUL, VT, Custom);
+setOperationAction(ISD::MULHS, VT, Custom);
+setOperationAction(ISD::MULHU, VT, Custom);
 setOperationAction(ISD::SPLAT_VECTOR, VT, Custom);
 setOperationAction(ISD::SELECT, VT, Custom);
 setOperationAction(ISD::SETCC, VT, Custom);
 setOperationAction(ISD::SDIV, VT, Custom);
 setOperationAction(ISD::UDIV, VT, Custom);
 setOperationAction(ISD::SMIN, VT, Custom);
 setOperationAction(ISD::UMIN, VT, Custom);
 setOperationAction(ISD::SMAX, VT, Custom);
 setOperationAction(ISD::UMAX, VT, Custom);
 setOperationAction(ISD::SHL, VT, Custom);
 setOperationAction(ISD::SRL, VT, Custom);
 setOperationAction(ISD::SRA, VT, Custom);
 setOperationAction(ISD::ABS, VT, Custom);
 setOperationAction(ISD::VECREDUCE_ADD, VT, Custom);
 setOperationAction(ISD::VECREDUCE_AND, VT, Custom);
 setOperationAction(ISD::VECREDUCE_OR, VT, Custom);
 setOperationAction(ISD::VECREDUCE_XOR, VT, Custom);
 setOperationAction(ISD::VECREDUCE_UMIN, VT, Custom);
 setOperationAction(ISD::VECREDUCE_UMAX, VT, Custom);
 setOperationAction(ISD::VECREDUCE_SMIN, VT, Custom);
 setOperationAction(ISD::VECREDUCE_SMAX, VT, Custom);
 setOperationAction(ISD::STEP_VECTOR, VT, Custom);
 
-setOperationAction(ISD::MULHU, VT, Expand);
-setOperationAction(ISD::MULHS, VT, Expand);
 setOperationAction(ISD::UMUL_LOHI, VT, Expand);
 setOperationAction(ISD::SMUL_LOHI, VT, Expand);
 }
 
 // Illegal unpacked integer vector types.
 for (auto VT : {MVT::nxv8i8, MVT::nxv4i16, MVT::nxv2i32}) {
 setOperationAction(ISD::EXTRACT_SUBVECTOR, VT, Custom);
 setOperationAction(ISD::INSERT_SUBVECTOR, VT, Custom);
@@ ... @@ if (Subtarget->useSVEForFixedLengthVectors()) {
 
 // These operations are not supported on NEON but SVE can do them.
 setOperationAction(ISD::BITREVERSE, MVT::v1i64, Custom);
 setOperationAction(ISD::CTLZ, MVT::v1i64, Custom);
 setOperationAction(ISD::CTLZ, MVT::v2i64, Custom);
 setOperationAction(ISD::CTTZ, MVT::v1i64, Custom);
 setOperationAction(ISD::MUL, MVT::v1i64, Custom);
 setOperationAction(ISD::MUL, MVT::v2i64, Custom);
+setOperationAction(ISD::MULHS, MVT::v1i64, Custom);
+setOperationAction(ISD::MULHS, MVT::v2i64, Custom);
+setOperationAction(ISD::MULHU, MVT::v1i64, Custom);
+setOperationAction(ISD::MULHU, MVT::v2i64, Custom);
 setOperationAction(ISD::SDIV, MVT::v8i8, Custom);
 setOperationAction(ISD::SDIV, MVT::v16i8, Custom);
 setOperationAction(ISD::SDIV, MVT::v4i16, Custom);
 setOperationAction(ISD::SDIV, MVT::v8i16, Custom);
 setOperationAction(ISD::SDIV, MVT::v2i32, Custom);
 setOperationAction(ISD::SDIV, MVT::v4i32, Custom);
 setOperationAction(ISD::SDIV, MVT::v1i64, Custom);
 setOperationAction(ISD::SDIV, MVT::v2i64, Custom);
@@ ... @@ void AArch64TargetLowering::addTypeForFixedLengthSVE(MVT VT) {
 setOperationAction(ISD::FRINT, VT, Custom);
 setOperationAction(ISD::FROUND, VT, Custom);
 setOperationAction(ISD::FROUNDEVEN, VT, Custom);
 setOperationAction(ISD::FSQRT, VT, Custom);
 setOperationAction(ISD::FSUB, VT, Custom);
 setOperationAction(ISD::FTRUNC, VT, Custom);
 setOperationAction(ISD::LOAD, VT, Custom);
 setOperationAction(ISD::MUL, VT, Custom);
+setOperationAction(ISD::MULHS, VT, Custom);
+setOperationAction(ISD::MULHU, VT, Custom);
 setOperationAction(ISD::OR, VT, Custom);
 setOperationAction(ISD::SDIV, VT, Custom);
 setOperationAction(ISD::SELECT, VT, Custom);
 setOperationAction(ISD::SETCC, VT, Custom);
 setOperationAction(ISD::SHL, VT, Custom);
 setOperationAction(ISD::SIGN_EXTEND, VT, Custom);
 setOperationAction(ISD::SIGN_EXTEND_INREG, VT, Custom);
 setOperationAction(ISD::SMAX, VT, Custom);
@@ ... @@ case AArch64ISD::FIRST_NUMBER:
 MAKE_CASE(AArch64ISD::FCSEL)
 MAKE_CASE(AArch64ISD::CSINV)
 MAKE_CASE(AArch64ISD::CSNEG)
 MAKE_CASE(AArch64ISD::CSINC)
 MAKE_CASE(AArch64ISD::THREAD_POINTER)
 MAKE_CASE(AArch64ISD::TLSDESC_CALLSEQ)
 MAKE_CASE(AArch64ISD::ADD_PRED)
 MAKE_CASE(AArch64ISD::MUL_PRED)
+MAKE_CASE(AArch64ISD::MULHS_PRED)
+MAKE_CASE(AArch64ISD::MULHU_PRED)
 MAKE_CASE(AArch64ISD::SDIV_PRED)
 MAKE_CASE(AArch64ISD::SHL_PRED)
 MAKE_CASE(AArch64ISD::SMAX_PRED)
 MAKE_CASE(AArch64ISD::SMIN_PRED)
 MAKE_CASE(AArch64ISD::SRA_PRED)
 MAKE_CASE(AArch64ISD::SRL_PRED)
 MAKE_CASE(AArch64ISD::SUB_PRED)
 MAKE_CASE(AArch64ISD::UDIV_PRED)
@@ ... @@ SDValue AArch64TargetLowering::LowerOperation(SDValue Op,
 case ISD::FSINCOS:
 return LowerFSINCOS(Op, DAG);
 case ISD::FLT_ROUNDS_:
 return LowerFLT_ROUNDS_(Op, DAG);
 case ISD::SET_ROUNDING:
 return LowerSET_ROUNDING(Op, DAG);
 case ISD::MUL:
 return LowerMUL(Op, DAG);
+case ISD::MULHS:
+return LowerToPredicatedOp(Op, DAG, AArch64ISD::MULHS_PRED,
+/*OverrideNEON=*/true);
+case ISD::MULHU:
+return LowerToPredicatedOp(Op, DAG, AArch64ISD::MULHU_PRED,
+/*OverrideNEON=*/true);
 case ISD::INTRINSIC_WO_CHAIN:
 return LowerINTRINSIC_WO_CHAIN(Op, DAG);
 case ISD::STORE:
 return LowerSTORE(Op, DAG);
 case ISD::MGATHER:
 return LowerMGATHER(Op, DAG);
 case ISD::MSCATTER:
 return LowerMSCATTER(Op, DAG);

llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td

@@ ... @@
 def AArch64fminnm_p : SDNode<"AArch64ISD::FMINNM_PRED", SDT_AArch64Arith>;
 def AArch64fmax_p : SDNode<"AArch64ISD::FMAX_PRED", SDT_AArch64Arith>;
 def AArch64fmin_p : SDNode<"AArch64ISD::FMIN_PRED", SDT_AArch64Arith>;
 def AArch64fmul_p : SDNode<"AArch64ISD::FMUL_PRED", SDT_AArch64Arith>;
 def AArch64fsub_p : SDNode<"AArch64ISD::FSUB_PRED", SDT_AArch64Arith>;
 def AArch64lsl_p : SDNode<"AArch64ISD::SHL_PRED", SDT_AArch64Arith>;
 def AArch64lsr_p : SDNode<"AArch64ISD::SRL_PRED", SDT_AArch64Arith>;
 def AArch64mul_p : SDNode<"AArch64ISD::MUL_PRED", SDT_AArch64Arith>;
 def AArch64sdiv_p : SDNode<"AArch64ISD::SDIV_PRED", SDT_AArch64Arith>;
 def AArch64smax_p : SDNode<"AArch64ISD::SMAX_PRED", SDT_AArch64Arith>;
 def AArch64smin_p : SDNode<"AArch64ISD::SMIN_PRED", SDT_AArch64Arith>;
+def AArch64smulh_p : SDNode<"AArch64ISD::MULHS_PRED", SDT_AArch64Arith>;
 def AArch64sub_p : SDNode<"AArch64ISD::SUB_PRED", SDT_AArch64Arith>;
 def AArch64udiv_p : SDNode<"AArch64ISD::UDIV_PRED", SDT_AArch64Arith>;
 def AArch64umax_p : SDNode<"AArch64ISD::UMAX_PRED", SDT_AArch64Arith>;
 def AArch64umin_p : SDNode<"AArch64ISD::UMIN_PRED", SDT_AArch64Arith>;
+def AArch64umulh_p : SDNode<"AArch64ISD::MULHU_PRED", SDT_AArch64Arith>;
 
 def SDT_AArch64IntExtend : SDTypeProfile<1, 4, [
 SDTCisVec<0>, SDTCisVec<1>, SDTCisVec<2>, SDTCisVT<3, OtherVT>, SDTCisVec<4>,
 SDTCVecEltisVT<1,i1>, SDTCisSameAs<0,2>, SDTCisVTSmallerThanOp<3, 2>, SDTCisSameAs<0,4>
 ]>;
 
 // Predicated operations with the result of inactive lanes provided by the last operand.
 def AArch64clz_mt : SDNode<"AArch64ISD::CTLZ_MERGE_PASSTHRU", SDT_AArch64Arith>;
@@ ... @@ let Predicates = [HasSVE] in {
 defm UMIN_ZI : sve_int_arith_imm1_unsigned<0b11, "umin", AArch64umin_p>;
 
 defm MUL_ZI : sve_int_arith_imm2<"mul", AArch64mul_p>;
 defm MUL_ZPmZ : sve_int_bin_pred_arit_2<0b000, "mul", "MUL_ZPZZ", int_aarch64_sve_mul, DestructiveBinaryComm>;
 defm SMULH_ZPmZ : sve_int_bin_pred_arit_2<0b010, "smulh", "SMULH_ZPZZ", int_aarch64_sve_smulh, DestructiveBinaryComm>;
 defm UMULH_ZPmZ : sve_int_bin_pred_arit_2<0b011, "umulh", "UMULH_ZPZZ", int_aarch64_sve_umulh, DestructiveBinaryComm>;
 
 defm MUL_ZPZZ : sve_int_bin_pred_bhsd<AArch64mul_p>;
+defm SMULH_ZPZZ : sve_int_bin_pred_bhsd<AArch64smulh_p>;
+defm UMULH_ZPZZ : sve_int_bin_pred_bhsd<AArch64umulh_p>;
 
 defm SDIV_ZPmZ : sve_int_bin_pred_arit_2_div<0b100, "sdiv", "SDIV_ZPZZ", int_aarch64_sve_sdiv, DestructiveBinaryCommWithRev, "SDIVR_ZPmZ">;
 defm UDIV_ZPmZ : sve_int_bin_pred_arit_2_div<0b101, "udiv", "UDIV_ZPZZ", int_aarch64_sve_udiv, DestructiveBinaryCommWithRev, "UDIVR_ZPmZ">;
 defm SDIVR_ZPmZ : sve_int_bin_pred_arit_2_div<0b110, "sdivr", "SDIVR_ZPZZ", int_aarch64_sve_sdivr, DestructiveBinaryCommWithRev, "SDIV_ZPmZ", /*isReverseInstr*/ 1>;
 defm UDIVR_ZPmZ : sve_int_bin_pred_arit_2_div<0b111, "udivr", "UDIVR_ZPZZ", int_aarch64_sve_udivr, DestructiveBinaryCommWithRev, "UDIV_ZPmZ", /*isReverseInstr*/ 1>;
 
 defm SDIV_ZPZZ : sve_int_bin_pred_sd<AArch64sdiv_p>;
 defm UDIV_ZPZZ : sve_int_bin_pred_sd<AArch64udiv_p>;
@@ ... @@ let Predicates = [HasSVE2] in {
 defm SQRDMULH_ZZZI : sve2_int_mul_by_indexed_elem<0b1101, "sqrdmulh", int_aarch64_sve_sqrdmulh_lane>;
 
 // SVE2 signed saturating doubling multiply high (unpredicated)
 defm SQDMULH_ZZZ : sve2_int_mul<0b100, "sqdmulh", int_aarch64_sve_sqdmulh>;
 defm SQRDMULH_ZZZ : sve2_int_mul<0b101, "sqrdmulh", int_aarch64_sve_sqrdmulh>;
 
 // SVE2 integer multiply vectors (unpredicated)
 defm MUL_ZZZ : sve2_int_mul<0b000, "mul", null_frag, AArch64mul_p>;
-defm SMULH_ZZZ : sve2_int_mul<0b010, "smulh", null_frag>;
-defm UMULH_ZZZ : sve2_int_mul<0b011, "umulh", null_frag>;
+defm SMULH_ZZZ : sve2_int_mul<0b010, "smulh", null_frag, AArch64smulh_p>;
+defm UMULH_ZZZ : sve2_int_mul<0b011, "umulh", null_frag, AArch64umulh_p>;
 defm PMUL_ZZZ : sve2_int_mul_single<0b001, "pmul", int_aarch64_sve_pmul>;
 
 // Add patterns for unpredicated version of smulh and umulh.
 def : Pat<(nxv16i8 (int_aarch64_sve_smulh (nxv16i1 (AArch64ptrue 31)), nxv16i8:$Op1, nxv16i8:$Op2)),
 (SMULH_ZZZ_B $Op1, $Op2)>;
 def : Pat<(nxv8i16 (int_aarch64_sve_smulh (nxv8i1 (AArch64ptrue 31)), nxv8i16:$Op1, nxv8i16:$Op2)),
 (SMULH_ZZZ_H $Op1, $Op2)>;
 def : Pat<(nxv4i32 (int_aarch64_sve_smulh (nxv4i1 (AArch64ptrue 31)), nxv4i32:$Op1, nxv4i32:$Op2)),

llvm/test/CodeGen/AArch64/sve-expand-div.ll

This file was added.

				; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
				; RUN: llc -mtriple=aarch64-linux-gnu < %s | FileCheck %s

				; Check that expensive divides are expanded into a more performant sequence

				;
				; SDIV
				;

				define <vscale x 16 x i8> @sdiv_i8(<vscale x 16 x i8> %a) #0 {
				; CHECK-LABEL: sdiv_i8:
				; CHECK: // %bb.0:
				; CHECK-NEXT: mov z1.b, #86 // =0x56
				; CHECK-NEXT: ptrue p0.b
				; CHECK-NEXT: smulh z0.b, p0/m, z0.b, z1.b
				; CHECK-NEXT: lsr z1.b, z0.b, #7
				; CHECK-NEXT: mov z2.b, #-1 // =0xffffffffffffffff
				; CHECK-NEXT: and z1.d, z1.d, z2.d
				; CHECK-NEXT: add z0.b, z0.b, z1.b
				; CHECK-NEXT: ret
				%div = sdiv <vscale x 16 x i8> %a, shufflevector (<vscale x 16 x i8> insertelement (<vscale x 16 x i8> undef, i8 3, i32 0), <vscale x 16 x i8> undef, <vscale x 16 x i32> zeroinitializer)
				ret <vscale x 16 x i8> %div
				}

				define <vscale x 8 x i16> @sdiv_i16(<vscale x 8 x i16> %a) #0 {
				; CHECK-LABEL: sdiv_i16:
				; CHECK: // %bb.0:
				; CHECK-NEXT: mov w8, #21846
				; CHECK-NEXT: ptrue p0.h
				; CHECK-NEXT: mov z1.h, w8
				; CHECK-NEXT: smulh z0.h, p0/m, z0.h, z1.h
				; CHECK-NEXT: lsr z1.h, z0.h, #15
				; CHECK-NEXT: mov z2.h, #-1 // =0xffffffffffffffff
				; CHECK-NEXT: and z1.d, z1.d, z2.d
				; CHECK-NEXT: add z0.h, z0.h, z1.h
				; CHECK-NEXT: ret
				%div = sdiv <vscale x 8 x i16> %a, shufflevector (<vscale x 8 x i16> insertelement (<vscale x 8 x i16> undef, i16 3, i32 0), <vscale x 8 x i16> undef, <vscale x 8 x i32> zeroinitializer)
				ret <vscale x 8 x i16> %div
				}

				define <vscale x 4 x i32> @sdiv_i32(<vscale x 4 x i32> %a) #0 {
				; CHECK-LABEL: sdiv_i32:
				; CHECK: // %bb.0:
				; CHECK-NEXT: mov w8, #21846
				; CHECK-NEXT: movk w8, #21845, lsl #16
				; CHECK-NEXT: ptrue p0.s
				; CHECK-NEXT: mov z1.s, w8
				; CHECK-NEXT: smulh z0.s, p0/m, z0.s, z1.s
				; CHECK-NEXT: lsr z1.s, z0.s, #31
				; CHECK-NEXT: mov z2.s, #-1 // =0xffffffffffffffff
				; CHECK-NEXT: and z1.d, z1.d, z2.d
				; CHECK-NEXT: add z0.s, z0.s, z1.s
				; CHECK-NEXT: ret
				%div = sdiv <vscale x 4 x i32> %a, shufflevector (<vscale x 4 x i32> insertelement (<vscale x 4 x i32> undef, i32 3, i32 0), <vscale x 4 x i32> undef, <vscale x 4 x i32> zeroinitializer)
				ret <vscale x 4 x i32> %div
				}

				define <vscale x 2 x i64> @sdiv_i64(<vscale x 2 x i64> %a) #0 {
				; CHECK-LABEL: sdiv_i64:
				; CHECK: // %bb.0:
				; CHECK-NEXT: mov x8, #6148914691236517205
				; CHECK-NEXT: movk x8, #21846
				; CHECK-NEXT: ptrue p0.d
				; CHECK-NEXT: mov z1.d, x8
				; CHECK-NEXT: smulh z0.d, p0/m, z0.d, z1.d
				; CHECK-NEXT: lsr z1.d, z0.d, #63
				; CHECK-NEXT: mov z2.d, #-1 // =0xffffffffffffffff
				; CHECK-NEXT: and z1.d, z1.d, z2.d
				; CHECK-NEXT: add z0.d, z0.d, z1.d
				; CHECK-NEXT: ret
				%div = sdiv <vscale x 2 x i64> %a, shufflevector (<vscale x 2 x i64> insertelement (<vscale x 2 x i64> undef, i64 3, i32 0), <vscale x 2 x i64> undef, <vscale x 2 x i32> zeroinitializer)
				ret <vscale x 2 x i64> %div
				}

				;
				; UDIV
				;

				define <vscale x 16 x i8> @udiv_i8(<vscale x 16 x i8> %a) #0 {
				; CHECK-LABEL: udiv_i8:
				; CHECK: // %bb.0:
				; CHECK-NEXT: mov z1.b, #-85 // =0xffffffffffffffab
				; CHECK-NEXT: ptrue p0.b
				; CHECK-NEXT: mov z2.b, #1 // =0x1
				; CHECK-NEXT: umulh z1.b, p0/m, z1.b, z0.b
				; CHECK-NEXT: lsr z1.b, z1.b, #1
				; CHECK-NEXT: cmpeq p0.b, p0/z, z2.b, #3
				; CHECK-NEXT: sel z0.b, p0, z0.b, z1.b
				; CHECK-NEXT: ret
				%div = udiv <vscale x 16 x i8> %a, shufflevector (<vscale x 16 x i8> insertelement (<vscale x 16 x i8> undef, i8 3, i32 0), <vscale x 16 x i8> undef, <vscale x 16 x i32> zeroinitializer)
				ret <vscale x 16 x i8> %div
				}

				define <vscale x 8 x i16> @udiv_i16(<vscale x 8 x i16> %a) #0 {
				; CHECK-LABEL: udiv_i16:
				; CHECK: // %bb.0:
				; CHECK-NEXT: mov w8, #-21845
				; CHECK-NEXT: ptrue p0.h
				; CHECK-NEXT: mov z2.h, w8
				; CHECK-NEXT: mov z1.h, #1 // =0x1
				; CHECK-NEXT: umulh z2.h, p0/m, z2.h, z0.h
				; CHECK-NEXT: lsr z2.h, z2.h, #1
				; CHECK-NEXT: cmpeq p0.h, p0/z, z1.h, #3
				; CHECK-NEXT: sel z0.h, p0, z0.h, z2.h
				; CHECK-NEXT: ret
				%div = udiv <vscale x 8 x i16> %a, shufflevector (<vscale x 8 x i16> insertelement (<vscale x 8 x i16> undef, i16 3, i32 0), <vscale x 8 x i16> undef, <vscale x 8 x i32> zeroinitializer)
				ret <vscale x 8 x i16> %div
				}

				define <vscale x 4 x i32> @udiv_i32(<vscale x 4 x i32> %a) #0 {
				; CHECK-LABEL: udiv_i32:
				; CHECK: // %bb.0:
				; CHECK-NEXT: mov w8, #43691
				; CHECK-NEXT: movk w8, #43690, lsl #16
				; CHECK-NEXT: ptrue p0.s
				; CHECK-NEXT: mov z2.s, w8
				; CHECK-NEXT: mov z1.s, #3 // =0x3
				; CHECK-NEXT: umulh z2.s, p0/m, z2.s, z0.s
				; CHECK-NEXT: lsr z2.s, z2.s, #1
				; CHECK-NEXT: cmpeq p0.s, p0/z, z1.s, #1
				; CHECK-NEXT: sel z0.s, p0, z0.s, z2.s
				; CHECK-NEXT: ret
				%div = udiv <vscale x 4 x i32> %a, shufflevector (<vscale x 4 x i32> insertelement (<vscale x 4 x i32> undef, i32 3, i32 0), <vscale x 4 x i32> undef, <vscale x 4 x i32> zeroinitializer)
				ret <vscale x 4 x i32> %div
				}

				define <vscale x 2 x i64> @udiv_i64(<vscale x 2 x i64> %a) #0 {
				; CHECK-LABEL: udiv_i64:
				; CHECK: // %bb.0:
				; CHECK-NEXT: mov x8, #-6148914691236517206
				; CHECK-NEXT: movk x8, #43691
				; CHECK-NEXT: ptrue p0.d
				; CHECK-NEXT: mov z2.d, x8
				; CHECK-NEXT: mov z1.d, #3 // =0x3
				; CHECK-NEXT: umulh z2.d, p0/m, z2.d, z0.d
				; CHECK-NEXT: lsr z2.d, z2.d, #1
				; CHECK-NEXT: cmpeq p0.d, p0/z, z1.d, #1
				; CHECK-NEXT: sel z0.d, p0, z0.d, z2.d
				; CHECK-NEXT: ret
				%div = udiv <vscale x 2 x i64> %a, shufflevector (<vscale x 2 x i64> insertelement (<vscale x 2 x i64> undef, i64 3, i32 0), <vscale x 2 x i64> undef, <vscale x 2 x i32> zeroinitializer)
				ret <vscale x 2 x i64> %div
				}

				attributes #0 = { "target-features"="+sve" }
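The CHECK lines above replace each divide by 3 with a multiply-high by a "magic" constant plus a small fix-up. As a sanity check, here is a minimal scalar C++ sketch, assuming the per-lane semantics of SVE smulh/umulh/lsr, that mirrors the i8 and i16 sequences and compares them exhaustively against native division. The function names are hypothetical; the constants (86, 0xab, 21846, 0xaaab) are the ones materialized in the tests.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical one-lane models of the SVE multiply-high instructions.
int8_t smulh8(int8_t a, int8_t b) { return static_cast<int8_t>((a * b) >> 8); }
uint8_t umulh8(uint8_t a, uint8_t b) { return static_cast<uint8_t>((a * b) >> 8); }
int16_t smulh16(int16_t a, int16_t b) {
  return static_cast<int16_t>((static_cast<int32_t>(a) * b) >> 16);
}
uint16_t umulh16(uint16_t a, uint16_t b) {
  // Widen first: uint16_t * uint16_t promotes to int and could overflow.
  return static_cast<uint16_t>((static_cast<uint32_t>(a) * b) >> 16);
}

// sdiv by 3: smulh by the magic constant, then add the sign bit of the
// result (lsr #7 / #15) so negative quotients round toward zero. The "and"
// with all-ones in the CHECK lines is a no-op and is omitted.
int8_t sdiv3_i8(int8_t x) {
  int8_t q = smulh8(x, 86);
  return static_cast<int8_t>(q + (static_cast<uint8_t>(q) >> 7));
}
int16_t sdiv3_i16(int16_t x) {
  int16_t q = smulh16(x, 21846);
  return static_cast<int16_t>(q + (static_cast<uint16_t>(q) >> 15));
}

// udiv by 3: umulh by the magic constant, then lsr #1. The cmpeq/sel pair
// appears to guard a divisor of 1; for a divisor of 3 it always selects
// the shifted product.
uint8_t udiv3_i8(uint8_t x) { return static_cast<uint8_t>(umulh8(0xAB, x) >> 1); }
uint16_t udiv3_i16(uint16_t x) { return static_cast<uint16_t>(umulh16(0xAAAB, x) >> 1); }

// Compare every possible input against the compiler's own division.
void verify_against_native_division() {
  for (int x = INT8_MIN; x <= INT8_MAX; ++x) assert(sdiv3_i8(static_cast<int8_t>(x)) == x / 3);
  for (int x = 0; x <= UINT8_MAX; ++x) assert(udiv3_i8(static_cast<uint8_t>(x)) == x / 3);
  for (int x = INT16_MIN; x <= INT16_MAX; ++x) assert(sdiv3_i16(static_cast<int16_t>(x)) == x / 3);
  for (int x = 0; x <= UINT16_MAX; ++x) assert(udiv3_i16(static_cast<uint16_t>(x)) == x / 3);
}
```

The i32 and i64 variants follow the same shape with the wider constants shown in the CHECK lines.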

llvm/test/CodeGen/AArch64/sve-fixed-length-int-div.ll

@@ ... @@ ; VBITS_GE_2048-NEXT: ret
 %op2 = load <32 x i64>, <32 x i64>* %b
 %res = udiv <32 x i64> %op1, %op2
 store <32 x i64> %res, <32 x i64>* %a
 ret void
 }
 
 ; This used to crash because isUnaryPredicate and BuildUDIV don't know how
 ; a SPLAT_VECTOR of fixed vector type should be handled.
-define void @udiv_constantsplat_v8i32(<8 x i32>* %a) #0 {
+define void @udiv_constantsplat_v8i32(<8 x i32>* %a) #1 {
 ; CHECK-LABEL: udiv_constantsplat_v8i32:
 ; CHECK: ptrue [[PG:p[0-9]+]].s, vl[[#min(div(VBYTES,4),8)]]
 ; CHECK-NEXT: ld1w { [[OP1:z[0-9]+]].s }, [[PG]]/z, [x0]
 ; CHECK-NEXT: mov [[OP2:z[0-9]+]].s, #95
 ; CHECK-NEXT: udiv [[RES:z[0-9]+]].s, [[PG]]/m, [[OP1]].s, [[OP2]].s
 ; CHECK-NEXT: st1w { [[RES]].s }, [[PG]], [x0]
 ; CHECK-NEXT: ret
 %op1 = load <8 x i32>, <8 x i32>* %a
 %res = udiv <8 x i32> %op1, <i32 95, i32 95, i32 95, i32 95, i32 95, i32 95, i32 95, i32 95>
 store <8 x i32> %res, <8 x i32>* %a
 ret void
 }
 
 attributes #0 = { "target-features"="+sve" }
+attributes #1 = { "target-features"="+sve" minsize }

llvm/test/CodeGen/AArch64/sve-fixed-length-int-mulh.ll

This file was added.

				; RUN: llc -aarch64-sve-vector-bits-min=128 < %s | FileCheck %s -D#VBYTES=16 -check-prefix=NO_SVE
				; RUN: llc -aarch64-sve-vector-bits-min=256 < %s | FileCheck %s -D#VBYTES=32 -check-prefixes=CHECK,VBITS_GE_256
				; RUN: llc -aarch64-sve-vector-bits-min=384 < %s | FileCheck %s -D#VBYTES=32 -check-prefixes=CHECK,VBITS_GE_256
				; RUN: llc -aarch64-sve-vector-bits-min=512 < %s | FileCheck %s -D#VBYTES=64 -check-prefixes=CHECK,VBITS_GE_512,VBITS_GE_256
				; RUN: llc -aarch64-sve-vector-bits-min=640 < %s | FileCheck %s -D#VBYTES=64 -check-prefixes=CHECK,VBITS_GE_512,VBITS_GE_256
				; RUN: llc -aarch64-sve-vector-bits-min=768 < %s | FileCheck %s -D#VBYTES=64 -check-prefixes=CHECK,VBITS_GE_512,VBITS_GE_256
				paulwalker-arm: Should this be `VBITS_EQ_2048`?
				paulwalker-arm: Please add `RUN` lines for all the supported vector lengths. Also, just in case somebody wonders why, can you add a comment saying there's no validation for splitting vector operations because the necessary MULH DAG combine does not apply to illegally typed operations?
				bsmith: There is already such a comment, on line 12 (23 in the new patch).
				paulwalker-arm: That must have been where "my" idea came from :)
				; RUN: llc -aarch64-sve-vector-bits-min=896 < %s | FileCheck %s -D#VBYTES=64 -check-prefixes=CHECK,VBITS_GE_512,VBITS_GE_256
				; RUN: llc -aarch64-sve-vector-bits-min=1024 < %s | FileCheck %s -D#VBYTES=128 -check-prefixes=CHECK,VBITS_GE_1024,VBITS_GE_512,VBITS_GE_256
				; RUN: llc -aarch64-sve-vector-bits-min=1152 < %s | FileCheck %s -D#VBYTES=128 -check-prefixes=CHECK,VBITS_GE_1024,VBITS_GE_512,VBITS_GE_256
				; RUN: llc -aarch64-sve-vector-bits-min=1280 < %s | FileCheck %s -D#VBYTES=128 -check-prefixes=CHECK,VBITS_GE_1024,VBITS_GE_512,VBITS_GE_256
				; RUN: llc -aarch64-sve-vector-bits-min=1408 < %s | FileCheck %s -D#VBYTES=128 -check-prefixes=CHECK,VBITS_GE_1024,VBITS_GE_512,VBITS_GE_256
				; RUN: llc -aarch64-sve-vector-bits-min=1536 < %s | FileCheck %s -D#VBYTES=128 -check-prefixes=CHECK,VBITS_GE_1024,VBITS_GE_512,VBITS_GE_256
				; RUN: llc -aarch64-sve-vector-bits-min=1664 < %s | FileCheck %s -D#VBYTES=128 -check-prefixes=CHECK,VBITS_GE_1024,VBITS_GE_512,VBITS_GE_256
				; RUN: llc -aarch64-sve-vector-bits-min=1792 < %s | FileCheck %s -D#VBYTES=128 -check-prefixes=CHECK,VBITS_GE_1024,VBITS_GE_512,VBITS_GE_256
				; RUN: llc -aarch64-sve-vector-bits-min=1920 < %s | FileCheck %s -D#VBYTES=128 -check-prefixes=CHECK,VBITS_GE_1024,VBITS_GE_512,VBITS_GE_256
				; RUN: llc -aarch64-sve-vector-bits-min=2048 < %s | FileCheck %s -D#VBYTES=256 -check-prefixes=CHECK,VBITS_GE_2048,VBITS_GE_1024,VBITS_GE_512,VBITS_GE_256

				; VBYTES represents the useful byte size of a vector register from the code
				; generator's point of view. It is clamped to power-of-2 values because
				; only power-of-2 vector lengths are considered legal, regardless of the
				; user-specified vector length.
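As a cross-check of the clamping rule described in the comment above, here is a minimal Python sketch. The helper `vbytes` is hypothetical and is not part of `llc`. It rounds a `-aarch64-sve-vector-bits-min` value down to a power of two and converts the result to bytes, and it should reproduce the `-D#VBYTES` value passed on each `RUN` line.

```python
def vbytes(vector_bits_min: int) -> int:
    """Hypothetical helper: clamp a minimum SVE vector length (in bits) down
    to a power of two, then convert it to bytes."""
    # SVE vector lengths are multiples of 128 bits, up to 2048 bits.
    if vector_bits_min < 128 or vector_bits_min > 2048 or vector_bits_min % 128:
        raise ValueError("expected a multiple of 128 in [128, 2048]")
    largest_pow2 = 1 << (vector_bits_min.bit_length() - 1)  # largest power of two <= input
    return largest_pow2 // 8


# Print the VBYTES value implied by each RUN line's -aarch64-sve-vector-bits-min.
for bits in range(128, 2049, 128):
    print(f"-aarch64-sve-vector-bits-min={bits} -> VBYTES={vbytes(bits)}")
```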

				; This test only tests the legal types for a given vector width, as mulh nodes
				; do not get generated for non-legal types.

				target triple = "aarch64-unknown-linux-gnu"

				; Don't use SVE when its registers are no bigger than NEON.
				; NO_SVE-NOT: ptrue

				;
				; SMULH
				;

				; Don't use SVE for 64-bit vectors.
				define <8 x i8> @smulh_v8i8(<8 x i8> %op1, <8 x i8> %op2) #0 {
				; CHECK-LABEL: smulh_v8i8:
				; CHECK: smull v0.8h, v0.8b, v1.8b
				; CHECK: ushr v1.8h, v0.8h, #8
				; CHECK: umov w8, v1.h[0]
				; CHECK: fmov s0, w8
				; CHECK: umov w8, v1.h[1]
				; CHECK: mov v0.b[1], w8
				; CHECK: umov w8, v1.h[2]
				; CHECK: mov v0.b[2], w8
				; CHECK: umov w8, v1.h[3]
				; CHECK: mov v0.b[3], w8
				; CHECK: ret
				%insert = insertelement <8 x i16> undef, i16 8, i64 0
				%splat = shufflevector <8 x i16> %insert, <8 x i16> undef, <8 x i32> zeroinitializer
				%1 = sext <8 x i8> %op1 to <8 x i16>
				%2 = sext <8 x i8> %op2 to <8 x i16>
				%mul = mul <8 x i16> %1, %2
				%shr = lshr <8 x i16> %mul, %splat
				%res = trunc <8 x i16> %shr to <8 x i8>
				ret <8 x i8> %res
				}
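Each test in this file spells out a multiply-high in IR: it extends both operands to twice the element width, multiplies, applies `lshr` by the element width, and truncates. The following is a minimal Python model of that pattern, where `mulh` is a hypothetical helper working on raw bit patterns. Because the result is truncated to `bits` bits, the shift selects bits `bits..2*bits-1` of the wide product. For that reason, using `lshr` rather than `ashr` after the `sext` multiply does not change the result.

```python
def mulh(a: int, b: int, bits: int, signed: bool) -> int:
    """Model of the IR idiom: ext to 2*bits, mul, lshr by `bits`, trunc to `bits`.
    Inputs and result are raw bit patterns in [0, 2**bits)."""
    mask = (1 << bits) - 1
    a, b = a & mask, b & mask
    if signed:
        # sext: treat the top bit as a sign bit (subtract 2**bits when it is set)
        a -= (a >> (bits - 1)) << bits
        b -= (b >> (bits - 1)) << bits
    wide = (a * b) & ((1 << (2 * bits)) - 1)  # `mul` in the 2*bits type wraps
    return (wide >> bits) & mask              # lshr, then trunc
```

For example, `mulh(0x80, 0x02, 8, True)` is `0xFF` (-128 * 2 = -256, whose high byte is -1), while `mulh(0x80, 0x02, 8, False)` is `0x01` (128 * 2 = 256).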

				; Don't use SVE for 128-bit vectors.
				define <16 x i8> @smulh_v16i8(<16 x i8> %op1, <16 x i8> %op2) #0 {
				; CHECK-LABEL: smulh_v16i8:
				; CHECK: smull2 v2.8h, v0.16b, v1.16b
				; CHECK: smull v0.8h, v0.8b, v1.8b
				; CHECK: uzp2 v0.16b, v0.16b, v2.16b
				; CHECK: ret
				%insert = insertelement <16 x i16> undef, i16 8, i64 0
				%splat = shufflevector <16 x i16> %insert, <16 x i16> undef, <16 x i32> zeroinitializer
				%1 = sext <16 x i8> %op1 to <16 x i16>
				paulwalker-arm: Given you're using `EQ_256`, I doubt the `#min` logic means anything. That said, why not `VBITS_GE_256`? Shouldn't the code be the same for larger vectors?
				%2 = sext <16 x i8> %op2 to <16 x i16>
				%mul = mul <16 x i16> %1, %2
				%shr = lshr <16 x i16> %mul, %splat
				%res = trunc <16 x i16> %shr to <16 x i8>
				ret <16 x i8> %res
				}
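The 128-bit NEON sequence checked above (`smull2`, `smull`, `uzp2`) computes the high halves without SVE. `smull` and `smull2` form the full 16-bit products of lanes 0-7 and 8-15. Within a register, byte `2k` is the low half of 16-bit lane `k` and byte `2k+1` is the high half. `uzp2 .16b` therefore gathers exactly the high bytes. The following minimal Python model shows this, assuming signed inputs in [-128, 127]; `neon_smulh_v16i8` is a hypothetical name.

```python
import struct


def neon_smulh_v16i8(a, b):
    """Model smull + smull2 + uzp2 on 16 signed i8 lanes.
    Returns the 16 high bytes as unsigned bit patterns."""
    assert len(a) == len(b) == 16
    # smull / smull2: widening multiply of lanes 0-7 and 8-15 into 8 x i16 each
    lo = [(a[i] * b[i]) & 0xFFFF for i in range(8)]
    hi = [(a[i] * b[i]) & 0xFFFF for i in range(8, 16)]
    # View each 8 x i16 register as 16 bytes: byte 2k is the low half of lane k
    lo_bytes = struct.pack("<8H", *lo)
    hi_bytes = struct.pack("<8H", *hi)
    # uzp2 .16b: odd-numbered bytes of the first source, then of the second
    return list(lo_bytes[1::2] + hi_bytes[1::2])
```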

				define void @smulh_v32i8(<32 x i8>* %a, <32 x i8>* %b) #0 {
				; CHECK-LABEL: smulh_v32i8:
				; VBITS_GE_256: ptrue [[PG:p[0-9]+]].b, vl[[#min(VBYTES,32)]]
				; VBITS_GE_256-DAG: ld1b { [[OP1:z[0-9]+]].b }, [[PG]]/z, [x0]
				; VBITS_GE_256-DAG: ld1b { [[OP2:z[0-9]+]].b }, [[PG]]/z, [x1]
				; VBITS_GE_256: smulh [[RES:z[0-9]+]].b, [[PG]]/m, [[OP1]].b, [[OP2]].b
				; VBITS_GE_256: st1b { [[RES]].b }, [[PG]], [x0]
				; VBITS_GE_256: ret
				%op1 = load <32 x i8>, <32 x i8>* %a
				%op2 = load <32 x i8>, <32 x i8>* %b
				%insert = insertelement <32 x i16> undef, i16 8, i64 0
				%splat = shufflevector <32 x i16> %insert, <32 x i16> undef, <32 x i32> zeroinitializer
				%1 = sext <32 x i8> %op1 to <32 x i16>
				%2 = sext <32 x i8> %op2 to <32 x i16>
				%mul = mul <32 x i16> %1, %2
				%shr = lshr <32 x i16> %mul, %splat
				%res = trunc <32 x i16> %shr to <32 x i8>
				store <32 x i8> %res, <32 x i8>* %a
				ret void
				}

				define void @smulh_v64i8(<64 x i8>* %a, <64 x i8>* %b) #0 {
				; CHECK-LABEL: smulh_v64i8:
				; VBITS_GE_512: ptrue [[PG:p[0-9]+]].b, vl[[#min(VBYTES,64)]]
				; VBITS_GE_512-DAG: ld1b { [[OP1:z[0-9]+]].b }, [[PG]]/z, [x0]
				; VBITS_GE_512-DAG: ld1b { [[OP2:z[0-9]+]].b }, [[PG]]/z, [x1]
				; VBITS_GE_512: smulh [[RES:z[0-9]+]].b, [[PG]]/m, [[OP1]].b, [[OP2]].b
				; VBITS_GE_512: st1b { [[RES]].b }, [[PG]], [x0]
				; VBITS_GE_512: ret
				%op1 = load <64 x i8>, <64 x i8>* %a
				%op2 = load <64 x i8>, <64 x i8>* %b
				%insert = insertelement <64 x i16> undef, i16 8, i64 0
				%splat = shufflevector <64 x i16> %insert, <64 x i16> undef, <64 x i32> zeroinitializer
				%1 = sext <64 x i8> %op1 to <64 x i16>
				%2 = sext <64 x i8> %op2 to <64 x i16>
				%mul = mul <64 x i16> %1, %2
				%shr = lshr <64 x i16> %mul, %splat
				%res = trunc <64 x i16> %shr to <64 x i8>
				store <64 x i8> %res, <64 x i8>* %a
				ret void
				}

				define void @smulh_v128i8(<128 x i8>* %a, <128 x i8>* %b) #0 {
				; CHECK-LABEL: smulh_v128i8:
				; VBITS_GE_1024: ptrue [[PG:p[0-9]+]].b, vl[[#min(VBYTES,128)]]
				; VBITS_GE_1024-DAG: ld1b { [[OP1:z[0-9]+]].b }, [[PG]]/z, [x0]
				; VBITS_GE_1024-DAG: ld1b { [[OP2:z[0-9]+]].b }, [[PG]]/z, [x1]
				; VBITS_GE_1024: smulh [[RES:z[0-9]+]].b, [[PG]]/m, [[OP1]].b, [[OP2]].b
				; VBITS_GE_1024: st1b { [[RES]].b }, [[PG]], [x0]
				; VBITS_GE_1024: ret
				%op1 = load <128 x i8>, <128 x i8>* %a
				%op2 = load <128 x i8>, <128 x i8>* %b
				%insert = insertelement <128 x i16> undef, i16 8, i64 0
				%splat = shufflevector <128 x i16> %insert, <128 x i16> undef, <128 x i32> zeroinitializer
				%1 = sext <128 x i8> %op1 to <128 x i16>
				%2 = sext <128 x i8> %op2 to <128 x i16>
				%mul = mul <128 x i16> %1, %2
				%shr = lshr <128 x i16> %mul, %splat
				%res = trunc <128 x i16> %shr to <128 x i8>
				store <128 x i8> %res, <128 x i8>* %a
				ret void
				}

				define void @smulh_v256i8(<256 x i8>* %a, <256 x i8>* %b) #0 {
				; CHECK-LABEL: smulh_v256i8:
				; VBITS_GE_2048: ptrue [[PG:p[0-9]+]].b, vl[[#min(VBYTES,256)]]
				; VBITS_GE_2048-DAG: ld1b { [[OP1:z[0-9]+]].b }, [[PG]]/z, [x0]
				; VBITS_GE_2048-DAG: ld1b { [[OP2:z[0-9]+]].b }, [[PG]]/z, [x1]
				; VBITS_GE_2048: smulh [[RES:z[0-9]+]].b, [[PG]]/m, [[OP1]].b, [[OP2]].b
				; VBITS_GE_2048: st1b { [[RES]].b }, [[PG]], [x0]
				; VBITS_GE_2048: ret
				%op1 = load <256 x i8>, <256 x i8>* %a
				%op2 = load <256 x i8>, <256 x i8>* %b
				%insert = insertelement <256 x i16> undef, i16 8, i64 0
				%splat = shufflevector <256 x i16> %insert, <256 x i16> undef, <256 x i32> zeroinitializer
				%1 = sext <256 x i8> %op1 to <256 x i16>
				%2 = sext <256 x i8> %op2 to <256 x i16>
				%mul = mul <256 x i16> %1, %2
				%shr = lshr <256 x i16> %mul, %splat
				%res = trunc <256 x i16> %shr to <256 x i8>
				store <256 x i8> %res, <256 x i8>* %a
				ret void
				}

				; Don't use SVE for 64-bit vectors.
				define <4 x i16> @smulh_v4i16(<4 x i16> %op1, <4 x i16> %op2) #0 {
				; CHECK-LABEL: smulh_v4i16:
				; CHECK: smull v0.4s, v0.4h, v1.4h
				; CHECK: ushr v0.4s, v0.4s, #16
				; CHECK: mov w8, v0.s[1]
				; CHECK: mov w9, v0.s[2]
				; CHECK: mov w10, v0.s[3]
				; CHECK: mov v0.h[1], w8
				; CHECK: mov v0.h[2], w9
				; CHECK: mov v0.h[3], w10
				; CHECK: ret
				%insert = insertelement <4 x i32> undef, i32 16, i64 0
				%splat = shufflevector <4 x i32> %insert, <4 x i32> undef, <4 x i32> zeroinitializer
				%1 = sext <4 x i16> %op1 to <4 x i32>
				%2 = sext <4 x i16> %op2 to <4 x i32>
				%mul = mul <4 x i32> %1, %2
				%shr = lshr <4 x i32> %mul, %splat
				%res = trunc <4 x i32> %shr to <4 x i16>
				ret <4 x i16> %res
				}

				; Don't use SVE for 128-bit vectors.
				define <8 x i16> @smulh_v8i16(<8 x i16> %op1, <8 x i16> %op2) #0 {
				; CHECK-LABEL: smulh_v8i16:
				; CHECK: smull2 v2.4s, v0.8h, v1.8h
				; CHECK: smull v0.4s, v0.4h, v1.4h
				; CHECK: uzp2 v0.8h, v0.8h, v2.8h
				; CHECK: ret
				%insert = insertelement <8 x i32> undef, i32 16, i64 0
				%splat = shufflevector <8 x i32> %insert, <8 x i32> undef, <8 x i32> zeroinitializer
				%1 = sext <8 x i16> %op1 to <8 x i32>
				%2 = sext <8 x i16> %op2 to <8 x i32>
				%mul = mul <8 x i32> %1, %2
				%shr = lshr <8 x i32> %mul, %splat
				%res = trunc <8 x i32> %shr to <8 x i16>
				ret <8 x i16> %res
				}

				define void @smulh_v16i16(<16 x i16>* %a, <16 x i16>* %b) #0 {
				; CHECK-LABEL: smulh_v16i16:
				; VBITS_GE_256: ptrue [[PG:p[0-9]+]].h, vl[[#min(VBYTES,16)]]
				; VBITS_GE_256-DAG: ld1h { [[OP1:z[0-9]+]].h }, [[PG]]/z, [x0]
				; VBITS_GE_256-DAG: ld1h { [[OP2:z[0-9]+]].h }, [[PG]]/z, [x1]
				; VBITS_GE_256: smulh [[RES:z[0-9]+]].h, [[PG]]/m, [[OP1]].h, [[OP2]].h
				; VBITS_GE_256: st1h { [[RES]].h }, [[PG]], [x0]
				; VBITS_GE_256: ret
				%op1 = load <16 x i16>, <16 x i16>* %a
				%op2 = load <16 x i16>, <16 x i16>* %b
				%insert = insertelement <16 x i32> undef, i32 16, i64 0
				%splat = shufflevector <16 x i32> %insert, <16 x i32> undef, <16 x i32> zeroinitializer
				%1 = sext <16 x i16> %op1 to <16 x i32>
				%2 = sext <16 x i16> %op2 to <16 x i32>
				%mul = mul <16 x i32> %1, %2
				%shr = lshr <16 x i32> %mul, %splat
				%res = trunc <16 x i32> %shr to <16 x i16>
				store <16 x i16> %res, <16 x i16>* %a
				ret void
				}

				define void @smulh_v32i16(<32 x i16>* %a, <32 x i16>* %b) #0 {
				; CHECK-LABEL: smulh_v32i16:
				; VBITS_GE_512: ptrue [[PG:p[0-9]+]].h, vl[[#min(VBYTES,32)]]
				; VBITS_GE_512-DAG: ld1h { [[OP1:z[0-9]+]].h }, [[PG]]/z, [x0]
				; VBITS_GE_512-DAG: ld1h { [[OP2:z[0-9]+]].h }, [[PG]]/z, [x1]
				; VBITS_GE_512: smulh [[RES:z[0-9]+]].h, [[PG]]/m, [[OP1]].h, [[OP2]].h
				; VBITS_GE_512: st1h { [[RES]].h }, [[PG]], [x0]
				; VBITS_GE_512: ret
				%op1 = load <32 x i16>, <32 x i16>* %a
				%op2 = load <32 x i16>, <32 x i16>* %b
				%insert = insertelement <32 x i32> undef, i32 16, i64 0
				%splat = shufflevector <32 x i32> %insert, <32 x i32> undef, <32 x i32> zeroinitializer
				%1 = sext <32 x i16> %op1 to <32 x i32>
				%2 = sext <32 x i16> %op2 to <32 x i32>
				%mul = mul <32 x i32> %1, %2
				%shr = lshr <32 x i32> %mul, %splat
				%res = trunc <32 x i32> %shr to <32 x i16>
				store <32 x i16> %res, <32 x i16>* %a
				ret void
				}

				define void @smulh_v64i16(<64 x i16>* %a, <64 x i16>* %b) #0 {
				; CHECK-LABEL: smulh_v64i16:
				; VBITS_GE_1024: ptrue [[PG:p[0-9]+]].h, vl[[#min(VBYTES,64)]]
				; VBITS_GE_1024-DAG: ld1h { [[OP1:z[0-9]+]].h }, [[PG]]/z, [x0]
				; VBITS_GE_1024-DAG: ld1h { [[OP2:z[0-9]+]].h }, [[PG]]/z, [x1]
				; VBITS_GE_1024: smulh [[RES:z[0-9]+]].h, [[PG]]/m, [[OP1]].h, [[OP2]].h
				; VBITS_GE_1024: st1h { [[RES]].h }, [[PG]], [x0]
				; VBITS_GE_1024: ret
				%op1 = load <64 x i16>, <64 x i16>* %a
				%op2 = load <64 x i16>, <64 x i16>* %b
				%insert = insertelement <64 x i32> undef, i32 16, i64 0
				%splat = shufflevector <64 x i32> %insert, <64 x i32> undef, <64 x i32> zeroinitializer
				%1 = sext <64 x i16> %op1 to <64 x i32>
				%2 = sext <64 x i16> %op2 to <64 x i32>
				%mul = mul <64 x i32> %1, %2
				%shr = lshr <64 x i32> %mul, %splat
				%res = trunc <64 x i32> %shr to <64 x i16>
				store <64 x i16> %res, <64 x i16>* %a
				ret void
				}

				define void @smulh_v128i16(<128 x i16>* %a, <128 x i16>* %b) #0 {
				; CHECK-LABEL: smulh_v128i16:
				; VBITS_GE_2048: ptrue [[PG:p[0-9]+]].h, vl[[#min(VBYTES,128)]]
				; VBITS_GE_2048-DAG: ld1h { [[OP1:z[0-9]+]].h }, [[PG]]/z, [x0]
				; VBITS_GE_2048-DAG: ld1h { [[OP2:z[0-9]+]].h }, [[PG]]/z, [x1]
				; VBITS_GE_2048: smulh [[RES:z[0-9]+]].h, [[PG]]/m, [[OP1]].h, [[OP2]].h
				; VBITS_GE_2048: st1h { [[RES]].h }, [[PG]], [x0]
				; VBITS_GE_2048: ret
				%op1 = load <128 x i16>, <128 x i16>* %a
				%op2 = load <128 x i16>, <128 x i16>* %b
				%insert = insertelement <128 x i32> undef, i32 16, i64 0
				%splat = shufflevector <128 x i32> %insert, <128 x i32> undef, <128 x i32> zeroinitializer
				%1 = sext <128 x i16> %op1 to <128 x i32>
				%2 = sext <128 x i16> %op2 to <128 x i32>
				%mul = mul <128 x i32> %1, %2
				%shr = lshr <128 x i32> %mul, %splat
				%res = trunc <128 x i32> %shr to <128 x i16>
				store <128 x i16> %res, <128 x i16>* %a
				ret void
				}

				; Vector i64 multiplications are not legal for NEON so use SVE when available.
				define <2 x i32> @smulh_v2i32(<2 x i32> %op1, <2 x i32> %op2) #0 {
				; CHECK-LABEL: smulh_v2i32:
				; CHECK: sshll v0.2d, v0.2s, #0
				; CHECK: sshll v1.2d, v1.2s, #0
				; CHECK: ptrue p0.d, vl2
				; CHECK: mul z0.d, p0/m, z0.d, z1.d
				; CHECK: shrn v0.2s, v0.2d, #32
				; CHECK: ret
				%insert = insertelement <2 x i64> undef, i64 32, i64 0
				paulwalker-arm: Weirdly, SVE is not being used here. Is the output different when SVE is disabled?
				bsmith: No, it's the same without SVE enabled. NEON has patterns to match 128-bit mulh nodes (but not 64-bit ones, as above), as it can use the smull2+smull pattern below. Perhaps we should still fall back to SVE instead of this sequence? (Or I just fix the comment...)
				paulwalker-arm: Thanks. In that case, fixing the comment works for me.
				%splat = shufflevector <2 x i64> %insert, <2 x i64> undef, <2 x i32> zeroinitializer
				%1 = sext <2 x i32> %op1 to <2 x i64>
				%2 = sext <2 x i32> %op2 to <2 x i64>
				%mul = mul <2 x i64> %1, %2
				%shr = lshr <2 x i64> %mul, %splat
				%res = trunc <2 x i64> %shr to <2 x i32>
				ret <2 x i32> %res
				}

				; Don't use SVE for 128-bit vectors.
				define <4 x i32> @smulh_v4i32(<4 x i32> %op1, <4 x i32> %op2) #0 {
				; CHECK-LABEL: smulh_v4i32:
				; CHECK: smull2 v2.2d, v0.4s, v1.4s
				; CHECK: smull v0.2d, v0.2s, v1.2s
				; CHECK: uzp2 v0.4s, v0.4s, v2.4s
				; CHECK: ret
				%insert = insertelement <4 x i64> undef, i64 32, i64 0
				%splat = shufflevector <4 x i64> %insert, <4 x i64> undef, <4 x i32> zeroinitializer
				%1 = sext <4 x i32> %op1 to <4 x i64>
				%2 = sext <4 x i32> %op2 to <4 x i64>
				%mul = mul <4 x i64> %1, %2
				%shr = lshr <4 x i64> %mul, %splat
				%res = trunc <4 x i64> %shr to <4 x i32>
				ret <4 x i32> %res
				}

				define void @smulh_v8i32(<8 x i32>* %a, <8 x i32>* %b) #0 {
				; CHECK-LABEL: smulh_v8i32:
				; VBITS_GE_256: ptrue [[PG:p[0-9]+]].s, vl[[#min(VBYTES,8)]]
				; VBITS_GE_256-DAG: ld1w { [[OP1:z[0-9]+]].s }, [[PG]]/z, [x0]
				; VBITS_GE_256-DAG: ld1w { [[OP2:z[0-9]+]].s }, [[PG]]/z, [x1]
				; VBITS_GE_256: smulh [[RES:z[0-9]+]].s, [[PG]]/m, [[OP1]].s, [[OP2]].s
				; VBITS_GE_256: st1w { [[RES]].s }, [[PG]], [x0]
				; VBITS_GE_256: ret
				%op1 = load <8 x i32>, <8 x i32>* %a
				%op2 = load <8 x i32>, <8 x i32>* %b
				%insert = insertelement <8 x i64> undef, i64 32, i64 0
				%splat = shufflevector <8 x i64> %insert, <8 x i64> undef, <8 x i32> zeroinitializer
				%1 = sext <8 x i32> %op1 to <8 x i64>
				%2 = sext <8 x i32> %op2 to <8 x i64>
				%mul = mul <8 x i64> %1, %2
				%shr = lshr <8 x i64> %mul, %splat
				%res = trunc <8 x i64> %shr to <8 x i32>
				store <8 x i32> %res, <8 x i32>* %a
				ret void
				}

				define void @smulh_v16i32(<16 x i32>* %a, <16 x i32>* %b) #0 {
				; CHECK-LABEL: smulh_v16i32:
				; VBITS_GE_512: ptrue [[PG:p[0-9]+]].s, vl[[#min(VBYTES,16)]]
				; VBITS_GE_512-DAG: ld1w { [[OP1:z[0-9]+]].s }, [[PG]]/z, [x0]
				; VBITS_GE_512-DAG: ld1w { [[OP2:z[0-9]+]].s }, [[PG]]/z, [x1]
				; VBITS_GE_512: smulh [[RES:z[0-9]+]].s, [[PG]]/m, [[OP1]].s, [[OP2]].s
				; VBITS_GE_512: st1w { [[RES]].s }, [[PG]], [x0]
				; VBITS_GE_512: ret
				%op1 = load <16 x i32>, <16 x i32>* %a
				%op2 = load <16 x i32>, <16 x i32>* %b
				%insert = insertelement <16 x i64> undef, i64 32, i64 0
				%splat = shufflevector <16 x i64> %insert, <16 x i64> undef, <16 x i32> zeroinitializer
				%1 = sext <16 x i32> %op1 to <16 x i64>
				%2 = sext <16 x i32> %op2 to <16 x i64>
				%mul = mul <16 x i64> %1, %2
				%shr = lshr <16 x i64> %mul, %splat
				%res = trunc <16 x i64> %shr to <16 x i32>
				store <16 x i32> %res, <16 x i32>* %a
				ret void
				}

				define void @smulh_v32i32(<32 x i32>* %a, <32 x i32>* %b) #0 {
				; CHECK-LABEL: smulh_v32i32:
				; VBITS_GE_1024: ptrue [[PG:p[0-9]+]].s, vl[[#min(VBYTES,32)]]
				; VBITS_GE_1024-DAG: ld1w { [[OP1:z[0-9]+]].s }, [[PG]]/z, [x0]
				; VBITS_GE_1024-DAG: ld1w { [[OP2:z[0-9]+]].s }, [[PG]]/z, [x1]
				; VBITS_GE_1024: smulh [[RES:z[0-9]+]].s, [[PG]]/m, [[OP1]].s, [[OP2]].s
				; VBITS_GE_1024: st1w { [[RES]].s }, [[PG]], [x0]
				; VBITS_GE_1024: ret
				%op1 = load <32 x i32>, <32 x i32>* %a
				%op2 = load <32 x i32>, <32 x i32>* %b
				%insert = insertelement <32 x i64> undef, i64 32, i64 0
				%splat = shufflevector <32 x i64> %insert, <32 x i64> undef, <32 x i32> zeroinitializer
				%1 = sext <32 x i32> %op1 to <32 x i64>
				%2 = sext <32 x i32> %op2 to <32 x i64>
				%mul = mul <32 x i64> %1, %2
				%shr = lshr <32 x i64> %mul, %splat
				%res = trunc <32 x i64> %shr to <32 x i32>
				store <32 x i32> %res, <32 x i32>* %a
				ret void
				}

				define void @smulh_v64i32(<64 x i32>* %a, <64 x i32>* %b) #0 {
				; CHECK-LABEL: smulh_v64i32:
				; VBITS_GE_2048: ptrue [[PG:p[0-9]+]].s, vl[[#min(VBYTES,64)]]
				; VBITS_GE_2048-DAG: ld1w { [[OP1:z[0-9]+]].s }, [[PG]]/z, [x0]
				; VBITS_GE_2048-DAG: ld1w { [[OP2:z[0-9]+]].s }, [[PG]]/z, [x1]
				; VBITS_GE_2048: smulh [[RES:z[0-9]+]].s, [[PG]]/m, [[OP1]].s, [[OP2]].s
				; VBITS_GE_2048: st1w { [[RES]].s }, [[PG]], [x0]
				; VBITS_GE_2048: ret
				%op1 = load <64 x i32>, <64 x i32>* %a
				%op2 = load <64 x i32>, <64 x i32>* %b
				%insert = insertelement <64 x i64> undef, i64 32, i64 0
				%splat = shufflevector <64 x i64> %insert, <64 x i64> undef, <64 x i32> zeroinitializer
				%1 = sext <64 x i32> %op1 to <64 x i64>
				paulwalker-arm: As there's no NEON instruction for i64-based vectors, I'm wondering if it's worth using SVE for this case as well, much like we do for ISD::MUL?
				%2 = sext <64 x i32> %op2 to <64 x i64>
				%mul = mul <64 x i64> %1, %2
				%shr = lshr <64 x i64> %mul, %splat
				%res = trunc <64 x i64> %shr to <64 x i32>
				store <64 x i32> %res, <64 x i32>* %a
				ret void
				}

				; Vector i64 multiplications are not legal for NEON so use SVE when available.
				define <1 x i64> @smulh_v1i64(<1 x i64> %op1, <1 x i64> %op2) #0 {
				; CHECK-LABEL: smulh_v1i64:
				; CHECK: ptrue p0.d, vl1
				; CHECK: smulh z0.d, p0/m, z0.d, z1.d
				; CHECK: ret
				%insert = insertelement <1 x i128> undef, i128 64, i128 0
				%splat = shufflevector <1 x i128> %insert, <1 x i128> undef, <1 x i32> zeroinitializer
				%1 = sext <1 x i64> %op1 to <1 x i128>
				%2 = sext <1 x i64> %op2 to <1 x i128>
				%mul = mul <1 x i128> %1, %2
				%shr = lshr <1 x i128> %mul, %splat
				%res = trunc <1 x i128> %shr to <1 x i64>
				ret <1 x i64> %res
				}

				; Vector i64 multiplications are not legal for NEON so use SVE when available.
				define <2 x i64> @smulh_v2i64(<2 x i64> %op1, <2 x i64> %op2) #0 {
				; CHECK-LABEL: smulh_v2i64:
				; CHECK: ptrue p0.d, vl2
				; CHECK: smulh z0.d, p0/m, z0.d, z1.d
				; CHECK: ret
				%insert = insertelement <2 x i128> undef, i128 64, i128 0
				%splat = shufflevector <2 x i128> %insert, <2 x i128> undef, <2 x i32> zeroinitializer
				%1 = sext <2 x i64> %op1 to <2 x i128>
				%2 = sext <2 x i64> %op2 to <2 x i128>
				%mul = mul <2 x i128> %1, %2
				%shr = lshr <2 x i128> %mul, %splat
				%res = trunc <2 x i128> %shr to <2 x i64>
				ret <2 x i64> %res
				}

				define void @smulh_v4i64(<4 x i64>* %a, <4 x i64>* %b) #0 {
				; CHECK-LABEL: smulh_v4i64:
				; VBITS_GE_256: ptrue [[PG:p[0-9]+]].d, vl[[#min(VBYTES,4)]]
				; VBITS_GE_256-DAG: ld1d { [[OP1:z[0-9]+]].d }, [[PG]]/z, [x0]
				; VBITS_GE_256-DAG: ld1d { [[OP2:z[0-9]+]].d }, [[PG]]/z, [x1]
				; VBITS_GE_256: smulh [[RES:z[0-9]+]].d, [[PG]]/m, [[OP1]].d, [[OP2]].d
				; VBITS_GE_256: st1d { [[RES]].d }, [[PG]], [x0]
				; VBITS_GE_256: ret
				%op1 = load <4 x i64>, <4 x i64>* %a
				%op2 = load <4 x i64>, <4 x i64>* %b
				%insert = insertelement <4 x i128> undef, i128 64, i128 0
				%splat = shufflevector <4 x i128> %insert, <4 x i128> undef, <4 x i32> zeroinitializer
				%1 = sext <4 x i64> %op1 to <4 x i128>
				%2 = sext <4 x i64> %op2 to <4 x i128>
				%mul = mul <4 x i128> %1, %2
				%shr = lshr <4 x i128> %mul, %splat
				%res = trunc <4 x i128> %shr to <4 x i64>
				store <4 x i64> %res, <4 x i64>* %a
				ret void
				}

				define void @smulh_v8i64(<8 x i64>* %a, <8 x i64>* %b) #0 {
				; CHECK-LABEL: smulh_v8i64:
				; VBITS_GE_512: ptrue [[PG:p[0-9]+]].d, vl[[#min(VBYTES,8)]]
				; VBITS_GE_512-DAG: ld1d { [[OP1:z[0-9]+]].d }, [[PG]]/z, [x0]
				; VBITS_GE_512-DAG: ld1d { [[OP2:z[0-9]+]].d }, [[PG]]/z, [x1]
				; VBITS_GE_512: smulh [[RES:z[0-9]+]].d, [[PG]]/m, [[OP1]].d, [[OP2]].d
				; VBITS_GE_512: st1d { [[RES]].d }, [[PG]], [x0]
				; VBITS_GE_512: ret
				%op1 = load <8 x i64>, <8 x i64>* %a
				%op2 = load <8 x i64>, <8 x i64>* %b
				%insert = insertelement <8 x i128> undef, i128 64, i128 0
				%splat = shufflevector <8 x i128> %insert, <8 x i128> undef, <8 x i32> zeroinitializer
				%1 = sext <8 x i64> %op1 to <8 x i128>
				%2 = sext <8 x i64> %op2 to <8 x i128>
				%mul = mul <8 x i128> %1, %2
				%shr = lshr <8 x i128> %mul, %splat
				%res = trunc <8 x i128> %shr to <8 x i64>
				store <8 x i64> %res, <8 x i64>* %a
				ret void
				}

				define void @smulh_v16i64(<16 x i64>* %a, <16 x i64>* %b) #0 {
				; CHECK-LABEL: smulh_v16i64:
				; VBITS_GE_1024: ptrue [[PG:p[0-9]+]].d, vl[[#min(VBYTES,16)]]
				; VBITS_GE_1024-DAG: ld1d { [[OP1:z[0-9]+]].d }, [[PG]]/z, [x0]
				; VBITS_GE_1024-DAG: ld1d { [[OP2:z[0-9]+]].d }, [[PG]]/z, [x1]
				; VBITS_GE_1024: smulh [[RES:z[0-9]+]].d, [[PG]]/m, [[OP1]].d, [[OP2]].d
				; VBITS_GE_1024: st1d { [[RES]].d }, [[PG]], [x0]
				; VBITS_GE_1024: ret
				%op1 = load <16 x i64>, <16 x i64>* %a
				%op2 = load <16 x i64>, <16 x i64>* %b
				%insert = insertelement <16 x i128> undef, i128 64, i128 0
				%splat = shufflevector <16 x i128> %insert, <16 x i128> undef, <16 x i32> zeroinitializer
				%1 = sext <16 x i64> %op1 to <16 x i128>
				%2 = sext <16 x i64> %op2 to <16 x i128>
				%mul = mul <16 x i128> %1, %2
				%shr = lshr <16 x i128> %mul, %splat
				%res = trunc <16 x i128> %shr to <16 x i64>
				store <16 x i64> %res, <16 x i64>* %a
				ret void
				}

				define void @smulh_v32i64(<32 x i64>* %a, <32 x i64>* %b) #0 {
				; CHECK-LABEL: smulh_v32i64:
				; VBITS_GE_2048: ptrue [[PG:p[0-9]+]].d, vl[[#min(VBYTES,32)]]
				; VBITS_GE_2048-DAG: ld1d { [[OP1:z[0-9]+]].d }, [[PG]]/z, [x0]
				; VBITS_GE_2048-DAG: ld1d { [[OP2:z[0-9]+]].d }, [[PG]]/z, [x1]
				; VBITS_GE_2048: smulh [[RES:z[0-9]+]].d, [[PG]]/m, [[OP1]].d, [[OP2]].d
				; VBITS_GE_2048: st1d { [[RES]].d }, [[PG]], [x0]
				; VBITS_GE_2048: ret
				%op1 = load <32 x i64>, <32 x i64>* %a
				%op2 = load <32 x i64>, <32 x i64>* %b
				%insert = insertelement <32 x i128> undef, i128 64, i128 0
				%splat = shufflevector <32 x i128> %insert, <32 x i128> undef, <32 x i32> zeroinitializer
				%1 = sext <32 x i64> %op1 to <32 x i128>
				%2 = sext <32 x i64> %op2 to <32 x i128>
				%mul = mul <32 x i128> %1, %2
				%shr = lshr <32 x i128> %mul, %splat
				%res = trunc <32 x i128> %shr to <32 x i64>
				store <32 x i64> %res, <32 x i64>* %a
				ret void
				}

				;
				; UMULH
				;

				; Don't use SVE for 64-bit vectors.
				define <8 x i8> @umulh_v8i8(<8 x i8> %op1, <8 x i8> %op2) #0 {
				; CHECK-LABEL: umulh_v8i8:
				; CHECK: umull v0.8h, v0.8b, v1.8b
				; CHECK: ushr v1.8h, v0.8h, #8
				; CHECK: umov w8, v1.h[0]
				; CHECK: fmov s0, w8
				; CHECK: umov w8, v1.h[1]
				; CHECK: mov v0.b[1], w8
				; CHECK: umov w8, v1.h[2]
				; CHECK: mov v0.b[2], w8
				; CHECK: umov w8, v1.h[3]
				; CHECK: mov v0.b[3], w8
				; CHECK: ret
				%insert = insertelement <8 x i16> undef, i16 8, i64 0
				%splat = shufflevector <8 x i16> %insert, <8 x i16> undef, <8 x i32> zeroinitializer
				%1 = zext <8 x i8> %op1 to <8 x i16>
				%2 = zext <8 x i8> %op2 to <8 x i16>
				%mul = mul <8 x i16> %1, %2
				%shr = lshr <8 x i16> %mul, %splat
				%res = trunc <8 x i16> %shr to <8 x i8>
				ret <8 x i8> %res
				}

				; Don't use SVE for 128-bit vectors.
				define <16 x i8> @umulh_v16i8(<16 x i8> %op1, <16 x i8> %op2) #0 {
				; CHECK-LABEL: umulh_v16i8:
				; CHECK: umull2 v2.8h, v0.16b, v1.16b
				; CHECK: umull v0.8h, v0.8b, v1.8b
				; CHECK: uzp2 v0.16b, v0.16b, v2.16b
				; CHECK: ret
				%insert = insertelement <16 x i16> undef, i16 8, i64 0
				%splat = shufflevector <16 x i16> %insert, <16 x i16> undef, <16 x i32> zeroinitializer
				%1 = zext <16 x i8> %op1 to <16 x i16>
				%2 = zext <16 x i8> %op2 to <16 x i16>
				%mul = mul <16 x i16> %1, %2
				%shr = lshr <16 x i16> %mul, %splat
				%res = trunc <16 x i16> %shr to <16 x i8>
				ret <16 x i8> %res
				}

				define void @umulh_v32i8(<32 x i8>* %a, <32 x i8>* %b) #0 {
				; CHECK-LABEL: umulh_v32i8:
				; VBITS_GE_256: ptrue [[PG:p[0-9]+]].b, vl[[#min(VBYTES,32)]]
				; VBITS_GE_256-DAG: ld1b { [[OP1:z[0-9]+]].b }, [[PG]]/z, [x0]
				; VBITS_GE_256-DAG: ld1b { [[OP2:z[0-9]+]].b }, [[PG]]/z, [x1]
				; VBITS_GE_256: umulh [[RES:z[0-9]+]].b, [[PG]]/m, [[OP1]].b, [[OP2]].b
				; VBITS_GE_256: st1b { [[RES]].b }, [[PG]], [x0]
				; VBITS_GE_256: ret
				%op1 = load <32 x i8>, <32 x i8>* %a
				%op2 = load <32 x i8>, <32 x i8>* %b
				%insert = insertelement <32 x i16> undef, i16 8, i64 0
				%splat = shufflevector <32 x i16> %insert, <32 x i16> undef, <32 x i32> zeroinitializer
				%1 = zext <32 x i8> %op1 to <32 x i16>
				%2 = zext <32 x i8> %op2 to <32 x i16>
				%mul = mul <32 x i16> %1, %2
				%shr = lshr <32 x i16> %mul, %splat
				%res = trunc <32 x i16> %shr to <32 x i8>
				store <32 x i8> %res, <32 x i8>* %a
				ret void
				}

				define void @umulh_v64i8(<64 x i8>* %a, <64 x i8>* %b) #0 {
				; CHECK-LABEL: umulh_v64i8:
				; VBITS_GE_512: ptrue [[PG:p[0-9]+]].b, vl[[#min(VBYTES,64)]]
				; VBITS_GE_512-DAG: ld1b { [[OP1:z[0-9]+]].b }, [[PG]]/z, [x0]
				; VBITS_GE_512-DAG: ld1b { [[OP2:z[0-9]+]].b }, [[PG]]/z, [x1]
				; VBITS_GE_512: umulh [[RES:z[0-9]+]].b, [[PG]]/m, [[OP1]].b, [[OP2]].b
				; VBITS_GE_512: st1b { [[RES]].b }, [[PG]], [x0]
				; VBITS_GE_512: ret
				%op1 = load <64 x i8>, <64 x i8>* %a
				%op2 = load <64 x i8>, <64 x i8>* %b
				%insert = insertelement <64 x i16> undef, i16 8, i64 0
				%splat = shufflevector <64 x i16> %insert, <64 x i16> undef, <64 x i32> zeroinitializer
				%1 = zext <64 x i8> %op1 to <64 x i16>
				%2 = zext <64 x i8> %op2 to <64 x i16>
				%mul = mul <64 x i16> %1, %2
				%shr = lshr <64 x i16> %mul, %splat
				%res = trunc <64 x i16> %shr to <64 x i8>
				store <64 x i8> %res, <64 x i8>* %a
				ret void
				}

				define void @umulh_v128i8(<128 x i8>* %a, <128 x i8>* %b) #0 {
				; CHECK-LABEL: umulh_v128i8:
				; VBITS_GE_1024: ptrue [[PG:p[0-9]+]].b, vl[[#min(VBYTES,128)]]
				; VBITS_GE_1024-DAG: ld1b { [[OP1:z[0-9]+]].b }, [[PG]]/z, [x0]
				; VBITS_GE_1024-DAG: ld1b { [[OP2:z[0-9]+]].b }, [[PG]]/z, [x1]
				; VBITS_GE_1024: umulh [[RES:z[0-9]+]].b, [[PG]]/m, [[OP1]].b, [[OP2]].b
				; VBITS_GE_1024: st1b { [[RES]].b }, [[PG]], [x0]
				; VBITS_GE_1024: ret
				%op1 = load <128 x i8>, <128 x i8>* %a
				%op2 = load <128 x i8>, <128 x i8>* %b
				%insert = insertelement <128 x i16> undef, i16 8, i64 0
				%splat = shufflevector <128 x i16> %insert, <128 x i16> undef, <128 x i32> zeroinitializer
				%1 = zext <128 x i8> %op1 to <128 x i16>
				%2 = zext <128 x i8> %op2 to <128 x i16>
				%mul = mul <128 x i16> %1, %2
				%shr = lshr <128 x i16> %mul, %splat
				%res = trunc <128 x i16> %shr to <128 x i8>
				store <128 x i8> %res, <128 x i8>* %a
				ret void
				}

				define void @umulh_v256i8(<256 x i8>* %a, <256 x i8>* %b) #0 {
				; CHECK-LABEL: umulh_v256i8:
				; VBITS_GE_2048: ptrue [[PG:p[0-9]+]].b, vl[[#min(VBYTES,256)]]
				; VBITS_GE_2048-DAG: ld1b { [[OP1:z[0-9]+]].b }, [[PG]]/z, [x0]
				; VBITS_GE_2048-DAG: ld1b { [[OP2:z[0-9]+]].b }, [[PG]]/z, [x1]
				; VBITS_GE_2048: umulh [[RES:z[0-9]+]].b, [[PG]]/m, [[OP1]].b, [[OP2]].b
				; VBITS_GE_2048: st1b { [[RES]].b }, [[PG]], [x0]
				; VBITS_GE_2048: ret
				%op1 = load <256 x i8>, <256 x i8>* %a
				%op2 = load <256 x i8>, <256 x i8>* %b
				%insert = insertelement <256 x i16> undef, i16 8, i64 0
				%splat = shufflevector <256 x i16> %insert, <256 x i16> undef, <256 x i32> zeroinitializer
				%1 = zext <256 x i8> %op1 to <256 x i16>
				%2 = zext <256 x i8> %op2 to <256 x i16>
				%mul = mul <256 x i16> %1, %2
				%shr = lshr <256 x i16> %mul, %splat
				%res = trunc <256 x i16> %shr to <256 x i8>
				store <256 x i8> %res, <256 x i8>* %a
				ret void
				}

				; Don't use SVE for 64-bit vectors.
				define <4 x i16> @umulh_v4i16(<4 x i16> %op1, <4 x i16> %op2) #0 {
				; CHECK-LABEL: umulh_v4i16:
				; CHECK: umull v0.4s, v0.4h, v1.4h
				; CHECK: ushr v0.4s, v0.4s, #16
				; CHECK: mov w8, v0.s[1]
				; CHECK: mov w9, v0.s[2]
				; CHECK: mov w10, v0.s[3]
				; CHECK: mov v0.h[1], w8
				; CHECK: mov v0.h[2], w9
				; CHECK: mov v0.h[3], w10
				; CHECK: ret
				%insert = insertelement <4 x i32> undef, i32 16, i64 0
				%splat = shufflevector <4 x i32> %insert, <4 x i32> undef, <4 x i32> zeroinitializer
				%1 = zext <4 x i16> %op1 to <4 x i32>
				%2 = zext <4 x i16> %op2 to <4 x i32>
				%mul = mul <4 x i32> %1, %2
				%shr = lshr <4 x i32> %mul, %splat
				%res = trunc <4 x i32> %shr to <4 x i16>
				ret <4 x i16> %res
				}

				; Don't use SVE for 128-bit vectors.
				define <8 x i16> @umulh_v8i16(<8 x i16> %op1, <8 x i16> %op2) #0 {
				; CHECK-LABEL: umulh_v8i16:
				; CHECK: umull2 v2.4s, v0.8h, v1.8h
				; CHECK: umull v0.4s, v0.4h, v1.4h
				; CHECK: uzp2 v0.8h, v0.8h, v2.8h
				; CHECK: ret
				%insert = insertelement <8 x i32> undef, i32 16, i64 0
				%splat = shufflevector <8 x i32> %insert, <8 x i32> undef, <8 x i32> zeroinitializer
				%1 = zext <8 x i16> %op1 to <8 x i32>
				%2 = zext <8 x i16> %op2 to <8 x i32>
				%mul = mul <8 x i32> %1, %2
				%shr = lshr <8 x i32> %mul, %splat
				%res = trunc <8 x i32> %shr to <8 x i16>
				ret <8 x i16> %res
				}

				define void @umulh_v16i16(<16 x i16>* %a, <16 x i16>* %b) #0 {
				; CHECK-LABEL: umulh_v16i16:
				; VBITS_GE_256: ptrue [[PG:p[0-9]+]].h, vl[[#min(VBYTES,16)]]
				; VBITS_GE_256-DAG: ld1h { [[OP1:z[0-9]+]].h }, [[PG]]/z, [x0]
				; VBITS_GE_256-DAG: ld1h { [[OP2:z[0-9]+]].h }, [[PG]]/z, [x1]
				; VBITS_GE_256: umulh [[RES:z[0-9]+]].h, [[PG]]/m, [[OP1]].h, [[OP2]].h
				; VBITS_GE_256: st1h { [[RES]].h }, [[PG]], [x0]
				; VBITS_GE_256: ret
				%op1 = load <16 x i16>, <16 x i16>* %a
				%op2 = load <16 x i16>, <16 x i16>* %b
				%insert = insertelement <16 x i32> undef, i32 16, i64 0
				%splat = shufflevector <16 x i32> %insert, <16 x i32> undef, <16 x i32> zeroinitializer
				%1 = zext <16 x i16> %op1 to <16 x i32>
				%2 = zext <16 x i16> %op2 to <16 x i32>
				%mul = mul <16 x i32> %1, %2
				%shr = lshr <16 x i32> %mul, %splat
				%res = trunc <16 x i32> %shr to <16 x i16>
				store <16 x i16> %res, <16 x i16>* %a
				ret void
				}

				define void @umulh_v32i16(<32 x i16>* %a, <32 x i16>* %b) #0 {
				; CHECK-LABEL: umulh_v32i16:
				; VBITS_GE_512: ptrue [[PG:p[0-9]+]].h, vl[[#min(VBYTES,32)]]
				; VBITS_GE_512-DAG: ld1h { [[OP1:z[0-9]+]].h }, [[PG]]/z, [x0]
				; VBITS_GE_512-DAG: ld1h { [[OP2:z[0-9]+]].h }, [[PG]]/z, [x1]
				; VBITS_GE_512: umulh [[RES:z[0-9]+]].h, [[PG]]/m, [[OP1]].h, [[OP2]].h
				; VBITS_GE_512: st1h { [[RES]].h }, [[PG]], [x0]
				; VBITS_GE_512: ret
				%op1 = load <32 x i16>, <32 x i16>* %a
				%op2 = load <32 x i16>, <32 x i16>* %b
				%insert = insertelement <32 x i32> undef, i32 16, i64 0
				%splat = shufflevector <32 x i32> %insert, <32 x i32> undef, <32 x i32> zeroinitializer
				%1 = zext <32 x i16> %op1 to <32 x i32>
				%2 = zext <32 x i16> %op2 to <32 x i32>
				%mul = mul <32 x i32> %1, %2
				%shr = lshr <32 x i32> %mul, %splat
				%res = trunc <32 x i32> %shr to <32 x i16>
				store <32 x i16> %res, <32 x i16>* %a
				ret void
				}

				define void @umulh_v64i16(<64 x i16>* %a, <64 x i16>* %b) #0 {
				; CHECK-LABEL: umulh_v64i16:
				; VBITS_GE_1024: ptrue [[PG:p[0-9]+]].h, vl[[#min(VBYTES,64)]]
				; VBITS_GE_1024-DAG: ld1h { [[OP1:z[0-9]+]].h }, [[PG]]/z, [x0]
				; VBITS_GE_1024-DAG: ld1h { [[OP2:z[0-9]+]].h }, [[PG]]/z, [x1]
				; VBITS_GE_1024: umulh [[RES:z[0-9]+]].h, [[PG]]/m, [[OP1]].h, [[OP2]].h
				; VBITS_GE_1024: st1h { [[RES]].h }, [[PG]], [x0]
				; VBITS_GE_1024: ret
				%op1 = load <64 x i16>, <64 x i16>* %a
				%op2 = load <64 x i16>, <64 x i16>* %b
				%insert = insertelement <64 x i32> undef, i32 16, i64 0
				%splat = shufflevector <64 x i32> %insert, <64 x i32> undef, <64 x i32> zeroinitializer
				%1 = zext <64 x i16> %op1 to <64 x i32>
				%2 = zext <64 x i16> %op2 to <64 x i32>
				%mul = mul <64 x i32> %1, %2
				%shr = lshr <64 x i32> %mul, %splat
				%res = trunc <64 x i32> %shr to <64 x i16>
				store <64 x i16> %res, <64 x i16>* %a
				ret void
				}

				define void @umulh_v128i16(<128 x i16>* %a, <128 x i16>* %b) #0 {
				; CHECK-LABEL: umulh_v128i16:
				; VBITS_GE_2048: ptrue [[PG:p[0-9]+]].h, vl[[#min(VBYTES,128)]]
				; VBITS_GE_2048-DAG: ld1h { [[OP1:z[0-9]+]].h }, [[PG]]/z, [x0]
				; VBITS_GE_2048-DAG: ld1h { [[OP2:z[0-9]+]].h }, [[PG]]/z, [x1]
				; VBITS_GE_2048: umulh [[RES:z[0-9]+]].h, [[PG]]/m, [[OP1]].h, [[OP2]].h
				; VBITS_GE_2048: st1h { [[RES]].h }, [[PG]], [x0]
				; VBITS_GE_2048: ret
				%op1 = load <128 x i16>, <128 x i16>* %a
				%op2 = load <128 x i16>, <128 x i16>* %b
				%insert = insertelement <128 x i32> undef, i32 16, i64 0
				%splat = shufflevector <128 x i32> %insert, <128 x i32> undef, <128 x i32> zeroinitializer
				%1 = zext <128 x i16> %op1 to <128 x i32>
				%2 = zext <128 x i16> %op2 to <128 x i32>
				%mul = mul <128 x i32> %1, %2
				%shr = lshr <128 x i32> %mul, %splat
				%res = trunc <128 x i32> %shr to <128 x i16>
				store <128 x i16> %res, <128 x i16>* %a
				ret void
				}

				; Vector i64 multiplications are not legal for NEON so use SVE when available.
				define <2 x i32> @umulh_v2i32(<2 x i32> %op1, <2 x i32> %op2) #0 {
				; CHECK-LABEL: umulh_v2i32:
				; CHECK: ushll v0.2d, v0.2s, #0
				; CHECK: ushll v1.2d, v1.2s, #0
				; CHECK: ptrue p0.d, vl2
				; CHECK: mul z0.d, p0/m, z0.d, z1.d
				; CHECK: shrn v0.2s, v0.2d, #32
				; CHECK: ret
				%insert = insertelement <2 x i64> undef, i64 32, i64 0
				paulwalker-arm: Same comment as smulh_v4i32.
				%splat = shufflevector <2 x i64> %insert, <2 x i64> undef, <2 x i32> zeroinitializer
				%1 = zext <2 x i32> %op1 to <2 x i64>
				%2 = zext <2 x i32> %op2 to <2 x i64>
				%mul = mul <2 x i64> %1, %2
				%shr = lshr <2 x i64> %mul, %splat
				%res = trunc <2 x i64> %shr to <2 x i32>
				ret <2 x i32> %res
				}

				; Don't use SVE for 128-bit vectors.
				define <4 x i32> @umulh_v4i32(<4 x i32> %op1, <4 x i32> %op2) #0 {
				; CHECK-LABEL: umulh_v4i32:
				; CHECK: umull2 v2.2d, v0.4s, v1.4s
				; CHECK: umull v0.2d, v0.2s, v1.2s
				; CHECK: uzp2 v0.4s, v0.4s, v2.4s
				; CHECK: ret
				%insert = insertelement <4 x i64> undef, i64 32, i64 0
				%splat = shufflevector <4 x i64> %insert, <4 x i64> undef, <4 x i32> zeroinitializer
				%1 = zext <4 x i32> %op1 to <4 x i64>
				%2 = zext <4 x i32> %op2 to <4 x i64>
				%mul = mul <4 x i64> %1, %2
				%shr = lshr <4 x i64> %mul, %splat
				%res = trunc <4 x i64> %shr to <4 x i32>
				ret <4 x i32> %res
				}

				define void @umulh_v8i32(<8 x i32>* %a, <8 x i32>* %b) #0 {
				; CHECK-LABEL: umulh_v8i32:
				; VBITS_GE_256: ptrue [[PG:p[0-9]+]].s, vl[[#min(VBYTES,8)]]
				; VBITS_GE_256-DAG: ld1w { [[OP1:z[0-9]+]].s }, [[PG]]/z, [x0]
				; VBITS_GE_256-DAG: ld1w { [[OP2:z[0-9]+]].s }, [[PG]]/z, [x1]
				; VBITS_GE_256: umulh [[RES:z[0-9]+]].s, [[PG]]/m, [[OP1]].s, [[OP2]].s
				; VBITS_GE_256: st1w { [[RES]].s }, [[PG]], [x0]
				; VBITS_GE_256: ret
				%op1 = load <8 x i32>, <8 x i32>* %a
				%op2 = load <8 x i32>, <8 x i32>* %b
				%insert = insertelement <8 x i64> undef, i64 32, i64 0
				%splat = shufflevector <8 x i64> %insert, <8 x i64> undef, <8 x i32> zeroinitializer
				%1 = zext <8 x i32> %op1 to <8 x i64>
				%2 = zext <8 x i32> %op2 to <8 x i64>
				%mul = mul <8 x i64> %1, %2
				%shr = lshr <8 x i64> %mul, %splat
				%res = trunc <8 x i64> %shr to <8 x i32>
				store <8 x i32> %res, <8 x i32>* %a
				ret void
				}

				define void @umulh_v16i32(<16 x i32>* %a, <16 x i32>* %b) #0 {
				; CHECK-LABEL: umulh_v16i32:
				; VBITS_GE_512: ptrue [[PG:p[0-9]+]].s, vl[[#min(VBYTES,16)]]
				; VBITS_GE_512-DAG: ld1w { [[OP1:z[0-9]+]].s }, [[PG]]/z, [x0]
				; VBITS_GE_512-DAG: ld1w { [[OP2:z[0-9]+]].s }, [[PG]]/z, [x1]
				; VBITS_GE_512: umulh [[RES:z[0-9]+]].s, [[PG]]/m, [[OP1]].s, [[OP2]].s
				; VBITS_GE_512: st1w { [[RES]].s }, [[PG]], [x0]
				; VBITS_GE_512: ret
				%op1 = load <16 x i32>, <16 x i32>* %a
				%op2 = load <16 x i32>, <16 x i32>* %b
				%insert = insertelement <16 x i64> undef, i64 32, i64 0
				%splat = shufflevector <16 x i64> %insert, <16 x i64> undef, <16 x i32> zeroinitializer
				%1 = zext <16 x i32> %op1 to <16 x i64>
				%2 = zext <16 x i32> %op2 to <16 x i64>
				%mul = mul <16 x i64> %1, %2
				%shr = lshr <16 x i64> %mul, %splat
				%res = trunc <16 x i64> %shr to <16 x i32>
				store <16 x i32> %res, <16 x i32>* %a
				ret void
				}

				define void @umulh_v32i32(<32 x i32>* %a, <32 x i32>* %b) #0 {
				; CHECK-LABEL: umulh_v32i32:
				; VBITS_GE_1024: ptrue [[PG:p[0-9]+]].s, vl[[#min(VBYTES,32)]]
				; VBITS_GE_1024-DAG: ld1w { [[OP1:z[0-9]+]].s }, [[PG]]/z, [x0]
				; VBITS_GE_1024-DAG: ld1w { [[OP2:z[0-9]+]].s }, [[PG]]/z, [x1]
				; VBITS_GE_1024: umulh [[RES:z[0-9]+]].s, [[PG]]/m, [[OP1]].s, [[OP2]].s
				; VBITS_GE_1024: st1w { [[RES]].s }, [[PG]], [x0]
				; VBITS_GE_1024: ret
				%op1 = load <32 x i32>, <32 x i32>* %a
				%op2 = load <32 x i32>, <32 x i32>* %b
				%insert = insertelement <32 x i64> undef, i64 32, i64 0
				%splat = shufflevector <32 x i64> %insert, <32 x i64> undef, <32 x i32> zeroinitializer
				%1 = zext <32 x i32> %op1 to <32 x i64>
				%2 = zext <32 x i32> %op2 to <32 x i64>
				%mul = mul <32 x i64> %1, %2
				%shr = lshr <32 x i64> %mul, %splat
				%res = trunc <32 x i64> %shr to <32 x i32>
				store <32 x i32> %res, <32 x i32>* %a
				ret void
				}

				define void @umulh_v64i32(<64 x i32>* %a, <64 x i32>* %b) #0 {
				; CHECK-LABEL: umulh_v64i32:
				; VBITS_GE_2048: ptrue [[PG:p[0-9]+]].s, vl[[#min(VBYTES,64)]]
				; VBITS_GE_2048-DAG: ld1w { [[OP1:z[0-9]+]].s }, [[PG]]/z, [x0]
				; VBITS_GE_2048-DAG: ld1w { [[OP2:z[0-9]+]].s }, [[PG]]/z, [x1]
				; VBITS_GE_2048: umulh [[RES:z[0-9]+]].s, [[PG]]/m, [[OP1]].s, [[OP2]].s
				; VBITS_GE_2048: st1w { [[RES]].s }, [[PG]], [x0]
				; VBITS_GE_2048: ret
				%op1 = load <64 x i32>, <64 x i32>* %a
				%op2 = load <64 x i32>, <64 x i32>* %b
				%insert = insertelement <64 x i64> undef, i64 32, i64 0
				%splat = shufflevector <64 x i64> %insert, <64 x i64> undef, <64 x i32> zeroinitializer
				%1 = zext <64 x i32> %op1 to <64 x i64>
				%2 = zext <64 x i32> %op2 to <64 x i64>
				%mul = mul <64 x i64> %1, %2
				%shr = lshr <64 x i64> %mul, %splat
				%res = trunc <64 x i64> %shr to <64 x i32>
				store <64 x i32> %res, <64 x i32>* %a
				ret void
				}

				; Vector i64 multiplications are not legal for NEON so use SVE when available.
				define <1 x i64> @umulh_v1i64(<1 x i64> %op1, <1 x i64> %op2) #0 {
				; CHECK-LABEL: umulh_v1i64:
				; CHECK: ptrue p0.d, vl1
				; CHECK: umulh z0.d, p0/m, z0.d, z1.d
				; CHECK: ret
				%insert = insertelement <1 x i128> undef, i128 64, i128 0
				%splat = shufflevector <1 x i128> %insert, <1 x i128> undef, <1 x i32> zeroinitializer
				%1 = zext <1 x i64> %op1 to <1 x i128>
				%2 = zext <1 x i64> %op2 to <1 x i128>
				%mul = mul <1 x i128> %1, %2
				%shr = lshr <1 x i128> %mul, %splat
				%res = trunc <1 x i128> %shr to <1 x i64>
				ret <1 x i64> %res
				}

				; Vector i64 multiplications are not legal for NEON so use SVE when available.
				define <2 x i64> @umulh_v2i64(<2 x i64> %op1, <2 x i64> %op2) #0 {
				; CHECK-LABEL: umulh_v2i64:
				; CHECK: ptrue p0.d, vl2
				; CHECK: umulh z0.d, p0/m, z0.d, z1.d
				; CHECK: ret
				%insert = insertelement <2 x i128> undef, i128 64, i128 0
				%splat = shufflevector <2 x i128> %insert, <2 x i128> undef, <2 x i32> zeroinitializer
				%1 = zext <2 x i64> %op1 to <2 x i128>
				%2 = zext <2 x i64> %op2 to <2 x i128>
				%mul = mul <2 x i128> %1, %2
				%shr = lshr <2 x i128> %mul, %splat
				%res = trunc <2 x i128> %shr to <2 x i64>
				ret <2 x i64> %res
				}

				define void @umulh_v4i64(<4 x i64>* %a, <4 x i64>* %b) #0 {
				; CHECK-LABEL: umulh_v4i64:
				; VBITS_GE_256: ptrue [[PG:p[0-9]+]].d, vl[[#min(VBYTES,4)]]
				; VBITS_GE_256-DAG: ld1d { [[OP1:z[0-9]+]].d }, [[PG]]/z, [x0]
				; VBITS_GE_256-DAG: ld1d { [[OP2:z[0-9]+]].d }, [[PG]]/z, [x1]
				; VBITS_GE_256: umulh [[RES:z[0-9]+]].d, [[PG]]/m, [[OP1]].d, [[OP2]].d
				; VBITS_GE_256: st1d { [[RES]].d }, [[PG]], [x0]
				; VBITS_GE_256: ret
				%op1 = load <4 x i64>, <4 x i64>* %a
				%op2 = load <4 x i64>, <4 x i64>* %b
				%insert = insertelement <4 x i128> undef, i128 64, i128 0
				%splat = shufflevector <4 x i128> %insert, <4 x i128> undef, <4 x i32> zeroinitializer
				%1 = zext <4 x i64> %op1 to <4 x i128>
				%2 = zext <4 x i64> %op2 to <4 x i128>
				%mul = mul <4 x i128> %1, %2
				%shr = lshr <4 x i128> %mul, %splat
				%res = trunc <4 x i128> %shr to <4 x i64>
				store <4 x i64> %res, <4 x i64>* %a
				ret void
				}

				define void @umulh_v8i64(<8 x i64>* %a, <8 x i64>* %b) #0 {
				; CHECK-LABEL: umulh_v8i64:
				; VBITS_GE_512: ptrue [[PG:p[0-9]+]].d, vl[[#min(VBYTES,8)]]
				; VBITS_GE_512-DAG: ld1d { [[OP1:z[0-9]+]].d }, [[PG]]/z, [x0]
				; VBITS_GE_512-DAG: ld1d { [[OP2:z[0-9]+]].d }, [[PG]]/z, [x1]
				; VBITS_GE_512: umulh [[RES:z[0-9]+]].d, [[PG]]/m, [[OP1]].d, [[OP2]].d
				; VBITS_GE_512: st1d { [[RES]].d }, [[PG]], [x0]
				; VBITS_GE_512: ret
				%op1 = load <8 x i64>, <8 x i64>* %a
				%op2 = load <8 x i64>, <8 x i64>* %b
				%insert = insertelement <8 x i128> undef, i128 64, i128 0
				%splat = shufflevector <8 x i128> %insert, <8 x i128> undef, <8 x i32> zeroinitializer
				%1 = zext <8 x i64> %op1 to <8 x i128>
				%2 = zext <8 x i64> %op2 to <8 x i128>
				%mul = mul <8 x i128> %1, %2
				%shr = lshr <8 x i128> %mul, %splat
				%res = trunc <8 x i128> %shr to <8 x i64>
				store <8 x i64> %res, <8 x i64>* %a
				ret void
				}

				define void @umulh_v16i64(<16 x i64>* %a, <16 x i64>* %b) #0 {
				; CHECK-LABEL: umulh_v16i64:
				; VBITS_GE_1024: ptrue [[PG:p[0-9]+]].d, vl[[#min(VBYTES,16)]]
				; VBITS_GE_1024-DAG: ld1d { [[OP1:z[0-9]+]].d }, [[PG]]/z, [x0]
				; VBITS_GE_1024-DAG: ld1d { [[OP2:z[0-9]+]].d }, [[PG]]/z, [x1]
				; VBITS_GE_1024: umulh [[RES:z[0-9]+]].d, [[PG]]/m, [[OP1]].d, [[OP2]].d
				; VBITS_GE_1024: st1d { [[RES]].d }, [[PG]], [x0]
				; VBITS_GE_1024: ret
				%op1 = load <16 x i64>, <16 x i64>* %a
				%op2 = load <16 x i64>, <16 x i64>* %b
				%insert = insertelement <16 x i128> undef, i128 64, i128 0
				%splat = shufflevector <16 x i128> %insert, <16 x i128> undef, <16 x i32> zeroinitializer
				%1 = zext <16 x i64> %op1 to <16 x i128>
				%2 = zext <16 x i64> %op2 to <16 x i128>
				%mul = mul <16 x i128> %1, %2
				%shr = lshr <16 x i128> %mul, %splat
				%res = trunc <16 x i128> %shr to <16 x i64>
				store <16 x i64> %res, <16 x i64>* %a
				ret void
				}

				define void @umulh_v32i64(<32 x i64>* %a, <32 x i64>* %b) #0 {
				; CHECK-LABEL: umulh_v32i64:
				; VBITS_GE_2048: ptrue [[PG:p[0-9]+]].d, vl[[#min(VBYTES,32)]]
				; VBITS_GE_2048-DAG: ld1d { [[OP1:z[0-9]+]].d }, [[PG]]/z, [x0]
				; VBITS_GE_2048-DAG: ld1d { [[OP2:z[0-9]+]].d }, [[PG]]/z, [x1]
				; VBITS_GE_2048: umulh [[RES:z[0-9]+]].d, [[PG]]/m, [[OP1]].d, [[OP2]].d
				; VBITS_GE_2048: st1d { [[RES]].d }, [[PG]], [x0]
				; VBITS_GE_2048: ret
				%op1 = load <32 x i64>, <32 x i64>* %a
				%op2 = load <32 x i64>, <32 x i64>* %b
				%insert = insertelement <32 x i128> undef, i128 64, i128 0
				%splat = shufflevector <32 x i128> %insert, <32 x i128> undef, <32 x i32> zeroinitializer
				%1 = zext <32 x i64> %op1 to <32 x i128>
				%2 = zext <32 x i64> %op2 to <32 x i128>
				%mul = mul <32 x i128> %1, %2
				%shr = lshr <32 x i128> %mul, %splat
				%res = trunc <32 x i128> %shr to <32 x i64>
				store <32 x i64> %res, <32 x i64>* %a
				ret void
				}
				attributes #0 = { "target-features"="+sve" }
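Every `umulh_*` test above expresses an unsigned high multiply the same way: zero-extend both N-bit lanes to 2N bits, multiply, shift right by N, then truncate back to N bits. The Python model below sketches that semantics. It is not LLVM code, and the function names are hypothetical. It also models the NEON fallback checked in `umulh_v8i16` and `umulh_v4i32`. There, `umull`/`umull2` produce the full 2N-bit products, and `uzp2` keeps the odd-numbered N-bit elements, which are the high halves of those products.

```python
# Illustrative reference model of the zext/mul/lshr/trunc idiom (not LLVM code).

def umulh(a: int, b: int, bits: int) -> int:
    """High half of the 2*bits-wide unsigned product of two bits-wide lanes."""
    mask = (1 << bits) - 1
    wide = (a & mask) * (b & mask)  # zext both lanes; the 2N-bit mul cannot overflow
    return (wide >> bits) & mask    # lshr by N, then trunc to N bits


def umulh_lanes(xs, ys, bits):
    """Apply umulh lane by lane, like the fixed-length vector tests."""
    return [umulh(x, y, bits) for x, y in zip(xs, ys)]


def uzp2_high_halves(products, bits):
    """Model of the NEON sequence: split each 2N-bit product into two N-bit
    elements (low half first), then keep the odd-numbered elements as uzp2 does."""
    mask = (1 << bits) - 1
    narrow = []
    for p in products:
        narrow += [p & mask, (p >> bits) & mask]
    return narrow[1::2]  # odd elements are the high halves
```

For example, `umulh(255, 255, 8)` returns 254, because 255 * 255 = 0xFE01 and the high byte is 0xFE.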

llvm/test/CodeGen/AArch64/sve-int-arith-imm.ll

	; CHECK-NEXT: lsr z0.d, z0.d, #63
	; CHECK-NEXT: ret
	%elt = insertelement <vscale x 2 x i64> undef, i64 63, i32 0
	%splat = shufflevector <vscale x 2 x i64> %elt, <vscale x 2 x i64> undef, <vscale x 2 x i32> zeroinitializer
	%lshr = lshr <vscale x 2 x i64> %a, %splat
	ret <vscale x 2 x i64> %lshr
	}

-	define <vscale x 4 x i32> @sdiv_const(<vscale x 4 x i32> %a) {
+	define <vscale x 4 x i32> @sdiv_const(<vscale x 4 x i32> %a) #0 {
	; CHECK-LABEL: sdiv_const:
	; CHECK: // %bb.0: // %entry
	; CHECK-NEXT: mov z1.s, #3 // =0x3
	; CHECK-NEXT: ptrue p0.s
	; CHECK-NEXT: sdiv z0.s, p0/m, z0.s, z1.s
	; CHECK-NEXT: ret
	entry:
	%div = sdiv <vscale x 4 x i32> %a, shufflevector (<vscale x 4 x i32> insertelement (<vscale x 4 x i32> undef, i32 3, i32 0), <vscale x 4 x i32> undef, <vscale x 4 x i32> zeroinitializer)
	ret <vscale x 4 x i32> %div
	}

-	define <vscale x 4 x i32> @udiv_const(<vscale x 4 x i32> %a) {
+	define <vscale x 4 x i32> @udiv_const(<vscale x 4 x i32> %a) #0 {
	; CHECK-LABEL: udiv_const:
	; CHECK: // %bb.0: // %entry
	; CHECK-NEXT: mov z1.s, #3 // =0x3
	; CHECK-NEXT: ptrue p0.s
	; CHECK-NEXT: udiv z0.s, p0/m, z0.s, z1.s
	; CHECK-NEXT: ret
	entry:
	%div = udiv <vscale x 4 x i32> %a, shufflevector (<vscale x 4 x i32> insertelement (<vscale x 4 x i32> undef, i32 3, i32 0), <vscale x 4 x i32> undef, <vscale x 4 x i32> zeroinitializer)
	ret <vscale x 4 x i32> %div
	}

+	attributes #0 = { minsize }
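In `sdiv_const` and `udiv_const` the functions now carry `#0 = { minsize }`, and the CHECK lines still expect a single predicated `sdiv`/`udiv`. Outside of `minsize`, the summary notes that legal MULH nodes let integer divides be expanded into a more performant sequence built around a multiply-high. The sketch below shows the textbook "magic number" form of that expansion for 32-bit lanes divided by 3. The constants are the standard ones for this divisor and are assumptions for illustration, not values read from LLVM's output. The helper names are hypothetical.

```python
# Sketch of division by a constant via multiply-high ("magic number" method).
# Constants are the textbook ones for dividing 32-bit values by 3.

MASK32 = (1 << 32) - 1


def to_signed32(v):
    """Reinterpret the low 32 bits of v as a two's-complement integer."""
    v &= MASK32
    return v - (1 << 32) if v >> 31 else v


def umulh32(a, b):
    """Unsigned high 32 bits of a 32x32-bit product (what umulh computes)."""
    return ((a & MASK32) * (b & MASK32)) >> 32


def smulh32(a, b):
    """Signed high 32 bits of a 32x32-bit product (what smulh computes).
    Python's >> floors negative values, matching an arithmetic shift."""
    return to_signed32((to_signed32(a) * to_signed32(b)) >> 32)


def udiv3(x):
    # 0xAAAAAAAB == ceil(2**33 / 3): take the high half, then shift by one more bit.
    return umulh32(x, 0xAAAAAAAB) >> 1


def sdiv3(x):
    # 0x55555556 == (2**32 + 2) / 3. The high half rounds toward minus infinity,
    # so add 1 when the estimate is negative to round toward zero like sdiv.
    q = smulh32(x, 0x55555556)
    return to_signed32(q + ((q & MASK32) >> 31))
```

Each quotient costs one multiply-high plus a shift or an add, which is the kind of sequence that legal MULHS/MULHU nodes make available to the expansion.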

llvm/test/CodeGen/AArch64/sve-int-mulh-pred.ll

This file was added.

				; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
				; RUN: llc -mtriple=aarch64-linux-gnu < %s | FileCheck %s

				;
				; SMULH
				;

				define <vscale x 16 x i8> @smulh_i8(<vscale x 16 x i8> %a, <vscale x 16 x i8> %b) #0 {
				; CHECK-LABEL: smulh_i8:
				; CHECK: // %bb.0:
				; CHECK-NEXT: ptrue p0.b
				; CHECK-NEXT: smulh z0.b, p0/m, z0.b, z1.b
				; CHECK-NEXT: ret
				%insert = insertelement <vscale x 16 x i16> undef, i16 8, i64 0
				%splat = shufflevector <vscale x 16 x i16> %insert, <vscale x 16 x i16> undef, <vscale x 16 x i32> zeroinitializer
				%1 = sext <vscale x 16 x i8> %a to <vscale x 16 x i16>
				%2 = sext <vscale x 16 x i8> %b to <vscale x 16 x i16>
				%mul = mul <vscale x 16 x i16> %1, %2
				%shr = lshr <vscale x 16 x i16> %mul, %splat
				%tr = trunc <vscale x 16 x i16> %shr to <vscale x 16 x i8>
				ret <vscale x 16 x i8> %tr
				}

				define <vscale x 8 x i16> @smulh_i16(<vscale x 8 x i16> %a, <vscale x 8 x i16> %b) #0 {
				; CHECK-LABEL: smulh_i16:
				; CHECK: // %bb.0:
				; CHECK-NEXT: ptrue p0.h
				; CHECK-NEXT: smulh z0.h, p0/m, z0.h, z1.h
				; CHECK-NEXT: ret
				%insert = insertelement <vscale x 8 x i32> undef, i32 16, i64 0
				%splat = shufflevector <vscale x 8 x i32> %insert, <vscale x 8 x i32> undef, <vscale x 8 x i32> zeroinitializer
				%1 = sext <vscale x 8 x i16> %a to <vscale x 8 x i32>
				%2 = sext <vscale x 8 x i16> %b to <vscale x 8 x i32>
				%mul = mul <vscale x 8 x i32> %1, %2
				%shr = lshr <vscale x 8 x i32> %mul, %splat
				%tr = trunc <vscale x 8 x i32> %shr to <vscale x 8 x i16>
				ret <vscale x 8 x i16> %tr
				}

				define <vscale x 4 x i32> @smulh_i32(<vscale x 4 x i32> %a, <vscale x 4 x i32> %b) #0 {
				; CHECK-LABEL: smulh_i32:
				; CHECK: // %bb.0:
				; CHECK-NEXT: ptrue p0.s
				; CHECK-NEXT: smulh z0.s, p0/m, z0.s, z1.s
				; CHECK-NEXT: ret
				%insert = insertelement <vscale x 4 x i64> undef, i64 32, i64 0
				%splat = shufflevector <vscale x 4 x i64> %insert, <vscale x 4 x i64> undef, <vscale x 4 x i32> zeroinitializer
				%1 = sext <vscale x 4 x i32> %a to <vscale x 4 x i64>
				%2 = sext <vscale x 4 x i32> %b to <vscale x 4 x i64>
				%mul = mul <vscale x 4 x i64> %1, %2
				%shr = lshr <vscale x 4 x i64> %mul, %splat
				%tr = trunc <vscale x 4 x i64> %shr to <vscale x 4 x i32>
				ret <vscale x 4 x i32> %tr
				}

				define <vscale x 2 x i64> @smulh_i64(<vscale x 2 x i64> %a, <vscale x 2 x i64> %b) #0 {
				; CHECK-LABEL: smulh_i64:
				; CHECK: // %bb.0:
				; CHECK-NEXT: ptrue p0.d
				; CHECK-NEXT: smulh z0.d, p0/m, z0.d, z1.d
				; CHECK-NEXT: ret
				%insert = insertelement <vscale x 2 x i128> undef, i128 64, i64 0
				%splat = shufflevector <vscale x 2 x i128> %insert, <vscale x 2 x i128> undef, <vscale x 2 x i32> zeroinitializer
				%1 = sext <vscale x 2 x i64> %a to <vscale x 2 x i128>
				%2 = sext <vscale x 2 x i64> %b to <vscale x 2 x i128>
				%mul = mul <vscale x 2 x i128> %1, %2
				%shr = lshr <vscale x 2 x i128> %mul, %splat
				%tr = trunc <vscale x 2 x i128> %shr to <vscale x 2 x i64>
				ret <vscale x 2 x i64> %tr
				}

				;
				; UMULH
				;

				define <vscale x 16 x i8> @umulh_i8(<vscale x 16 x i8> %a, <vscale x 16 x i8> %b) #0 {
				; CHECK-LABEL: umulh_i8:
				; CHECK: // %bb.0:
				; CHECK-NEXT: ptrue p0.b
				; CHECK-NEXT: umulh z0.b, p0/m, z0.b, z1.b
				; CHECK-NEXT: ret
				%insert = insertelement <vscale x 16 x i16> undef, i16 8, i64 0
				%splat = shufflevector <vscale x 16 x i16> %insert, <vscale x 16 x i16> undef, <vscale x 16 x i32> zeroinitializer
				%1 = zext <vscale x 16 x i8> %a to <vscale x 16 x i16>
				%2 = zext <vscale x 16 x i8> %b to <vscale x 16 x i16>
				%mul = mul <vscale x 16 x i16> %1, %2
				%shr = lshr <vscale x 16 x i16> %mul, %splat
				%tr = trunc <vscale x 16 x i16> %shr to <vscale x 16 x i8>
				ret <vscale x 16 x i8> %tr
				}

				define <vscale x 8 x i16> @umulh_i16(<vscale x 8 x i16> %a, <vscale x 8 x i16> %b) #0 {
				; CHECK-LABEL: umulh_i16:
				; CHECK: // %bb.0:
				; CHECK-NEXT: ptrue p0.h
				; CHECK-NEXT: umulh z0.h, p0/m, z0.h, z1.h
				; CHECK-NEXT: ret
				%insert = insertelement <vscale x 8 x i32> undef, i32 16, i64 0
				%splat = shufflevector <vscale x 8 x i32> %insert, <vscale x 8 x i32> undef, <vscale x 8 x i32> zeroinitializer
				%1 = zext <vscale x 8 x i16> %a to <vscale x 8 x i32>
				%2 = zext <vscale x 8 x i16> %b to <vscale x 8 x i32>
				%mul = mul <vscale x 8 x i32> %1, %2
				%shr = lshr <vscale x 8 x i32> %mul, %splat
				%tr = trunc <vscale x 8 x i32> %shr to <vscale x 8 x i16>
				ret <vscale x 8 x i16> %tr
				}

				define <vscale x 4 x i32> @umulh_i32(<vscale x 4 x i32> %a, <vscale x 4 x i32> %b) #0 {
				; CHECK-LABEL: umulh_i32:
				; CHECK: // %bb.0:
				; CHECK-NEXT: ptrue p0.s
				; CHECK-NEXT: umulh z0.s, p0/m, z0.s, z1.s
				; CHECK-NEXT: ret
				%insert = insertelement <vscale x 4 x i64> undef, i64 32, i64 0
				%splat = shufflevector <vscale x 4 x i64> %insert, <vscale x 4 x i64> undef, <vscale x 4 x i32> zeroinitializer
				%1 = zext <vscale x 4 x i32> %a to <vscale x 4 x i64>
				%2 = zext <vscale x 4 x i32> %b to <vscale x 4 x i64>
				%mul = mul <vscale x 4 x i64> %1, %2
				%shr = lshr <vscale x 4 x i64> %mul, %splat
				%tr = trunc <vscale x 4 x i64> %shr to <vscale x 4 x i32>
				ret <vscale x 4 x i32> %tr
				}

				define <vscale x 2 x i64> @umulh_i64(<vscale x 2 x i64> %a, <vscale x 2 x i64> %b) #0 {
				; CHECK-LABEL: umulh_i64:
				; CHECK: // %bb.0:
				; CHECK-NEXT: ptrue p0.d
				; CHECK-NEXT: umulh z0.d, p0/m, z0.d, z1.d
				; CHECK-NEXT: ret
				%insert = insertelement <vscale x 2 x i128> undef, i128 64, i64 0
				%splat = shufflevector <vscale x 2 x i128> %insert, <vscale x 2 x i128> undef, <vscale x 2 x i32> zeroinitializer
				%1 = zext <vscale x 2 x i64> %a to <vscale x 2 x i128>
				%2 = zext <vscale x 2 x i64> %b to <vscale x 2 x i128>
				%mul = mul <vscale x 2 x i128> %1, %2
				%shr = lshr <vscale x 2 x i128> %mul, %splat
				%tr = trunc <vscale x 2 x i128> %shr to <vscale x 2 x i64>
				ret <vscale x 2 x i64> %tr
				}

				attributes #0 = { "target-features"="+sve" }
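The `smulh_*` tests build the signed idiom with `sext` but shift with `lshr` rather than `ashr`. That still describes a signed high multiply. The final `trunc` keeps only bits N..2N-1 of the product, and those bits are identical for either shift. A small Python sketch checks this exhaustively for 8-bit lanes. It is not LLVM code, and the helper names are hypothetical.

```python
# Illustrative model of the sext/mul/lshr/trunc idiom used by the smulh tests.

def sext(v, bits):
    """Interpret the low `bits` bits of v as a two's-complement value."""
    v &= (1 << bits) - 1
    return v - (1 << bits) if v >> (bits - 1) else v


def smulh(a, b, bits):
    """As written in the IR: sext, mul in 2N bits, lshr by N, trunc to N bits."""
    mask = (1 << bits) - 1
    wide = sext(a, bits) * sext(b, bits)
    wide_bits = wide & ((1 << (2 * bits)) - 1)  # 2N-bit register pattern
    return (wide_bits >> bits) & mask           # logical shift, then truncate


def smulh_ashr(a, b, bits):
    """Same computation with an arithmetic shift (Python's >> on signed ints)."""
    mask = (1 << bits) - 1
    wide = sext(a, bits) * sext(b, bits)
    return (wide >> bits) & mask
```

The same bit patterns can give different answers for the signed and unsigned forms. For `0xFF * 0x02`, the signed high half is 0xFF (that is, -1), while the unsigned high half is 0x01. That difference is why MULHS and MULHU lower to distinct `smulh` and `umulh` instructions.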

llvm/test/CodeGen/AArch64/sve2-int-mulh.ll

This file was added.

				; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
				; RUN: llc -mtriple=aarch64-linux-gnu < %s | FileCheck %s

				;
				; SMULH
				;

				define <vscale x 16 x i8> @smulh_i8(<vscale x 16 x i8> %a, <vscale x 16 x i8> %b) #0 {
				; CHECK-LABEL: smulh_i8:
				; CHECK: // %bb.0:
				; CHECK-NEXT: smulh z0.b, z0.b, z1.b
				; CHECK-NEXT: ret
				%insert = insertelement <vscale x 16 x i16> undef, i16 8, i64 0
				%splat = shufflevector <vscale x 16 x i16> %insert, <vscale x 16 x i16> undef, <vscale x 16 x i32> zeroinitializer
				%1 = sext <vscale x 16 x i8> %a to <vscale x 16 x i16>
				%2 = sext <vscale x 16 x i8> %b to <vscale x 16 x i16>
				%mul = mul <vscale x 16 x i16> %1, %2
				%shr = lshr <vscale x 16 x i16> %mul, %splat
				%tr = trunc <vscale x 16 x i16> %shr to <vscale x 16 x i8>
				ret <vscale x 16 x i8> %tr
				}

				define <vscale x 8 x i16> @smulh_i16(<vscale x 8 x i16> %a, <vscale x 8 x i16> %b) #0 {
				; CHECK-LABEL: smulh_i16:
				; CHECK: // %bb.0:
				; CHECK-NEXT: smulh z0.h, z0.h, z1.h
				; CHECK-NEXT: ret
				%insert = insertelement <vscale x 8 x i32> undef, i32 16, i64 0
				%splat = shufflevector <vscale x 8 x i32> %insert, <vscale x 8 x i32> undef, <vscale x 8 x i32> zeroinitializer
				%1 = sext <vscale x 8 x i16> %a to <vscale x 8 x i32>
				%2 = sext <vscale x 8 x i16> %b to <vscale x 8 x i32>
				%mul = mul <vscale x 8 x i32> %1, %2
				%shr = lshr <vscale x 8 x i32> %mul, %splat
				%tr = trunc <vscale x 8 x i32> %shr to <vscale x 8 x i16>
				ret <vscale x 8 x i16> %tr
				}

				define <vscale x 4 x i32> @smulh_i32(<vscale x 4 x i32> %a, <vscale x 4 x i32> %b) #0 {
				; CHECK-LABEL: smulh_i32:
				; CHECK: // %bb.0:
				; CHECK-NEXT: smulh z0.s, z0.s, z1.s
				; CHECK-NEXT: ret
				%insert = insertelement <vscale x 4 x i64> undef, i64 32, i64 0
				%splat = shufflevector <vscale x 4 x i64> %insert, <vscale x 4 x i64> undef, <vscale x 4 x i32> zeroinitializer
				%1 = sext <vscale x 4 x i32> %a to <vscale x 4 x i64>
				%2 = sext <vscale x 4 x i32> %b to <vscale x 4 x i64>
				%mul = mul <vscale x 4 x i64> %1, %2
				%shr = lshr <vscale x 4 x i64> %mul, %splat
				%tr = trunc <vscale x 4 x i64> %shr to <vscale x 4 x i32>
				ret <vscale x 4 x i32> %tr
				}

				define <vscale x 2 x i64> @smulh_i64(<vscale x 2 x i64> %a, <vscale x 2 x i64> %b) #0 {
				; CHECK-LABEL: smulh_i64:
				; CHECK: // %bb.0:
				; CHECK-NEXT: smulh z0.d, z0.d, z1.d
				; CHECK-NEXT: ret
				%insert = insertelement <vscale x 2 x i128> undef, i128 64, i64 0
				%splat = shufflevector <vscale x 2 x i128> %insert, <vscale x 2 x i128> undef, <vscale x 2 x i32> zeroinitializer
				%1 = sext <vscale x 2 x i64> %a to <vscale x 2 x i128>
				%2 = sext <vscale x 2 x i64> %b to <vscale x 2 x i128>
				%mul = mul <vscale x 2 x i128> %1, %2
				%shr = lshr <vscale x 2 x i128> %mul, %splat
				%tr = trunc <vscale x 2 x i128> %shr to <vscale x 2 x i64>
				ret <vscale x 2 x i64> %tr
				}

				;
				; UMULH
				;

				define <vscale x 16 x i8> @umulh_i8(<vscale x 16 x i8> %a, <vscale x 16 x i8> %b) #0 {
				; CHECK-LABEL: umulh_i8:
				; CHECK: // %bb.0:
				; CHECK-NEXT: umulh z0.b, z0.b, z1.b
				; CHECK-NEXT: ret
				%insert = insertelement <vscale x 16 x i16> undef, i16 8, i64 0
				%splat = shufflevector <vscale x 16 x i16> %insert, <vscale x 16 x i16> undef, <vscale x 16 x i32> zeroinitializer
				%1 = zext <vscale x 16 x i8> %a to <vscale x 16 x i16>
				%2 = zext <vscale x 16 x i8> %b to <vscale x 16 x i16>
				%mul = mul <vscale x 16 x i16> %1, %2
				%shr = lshr <vscale x 16 x i16> %mul, %splat
				%tr = trunc <vscale x 16 x i16> %shr to <vscale x 16 x i8>
				ret <vscale x 16 x i8> %tr
				}

				define <vscale x 8 x i16> @umulh_i16(<vscale x 8 x i16> %a, <vscale x 8 x i16> %b) #0 {
				; CHECK-LABEL: umulh_i16:
				; CHECK: // %bb.0:
				; CHECK-NEXT: umulh z0.h, z0.h, z1.h
				; CHECK-NEXT: ret
				%insert = insertelement <vscale x 8 x i32> undef, i32 16, i64 0
				%splat = shufflevector <vscale x 8 x i32> %insert, <vscale x 8 x i32> undef, <vscale x 8 x i32> zeroinitializer
				%1 = zext <vscale x 8 x i16> %a to <vscale x 8 x i32>
				%2 = zext <vscale x 8 x i16> %b to <vscale x 8 x i32>
				%mul = mul <vscale x 8 x i32> %1, %2
				%shr = lshr <vscale x 8 x i32> %mul, %splat
				%tr = trunc <vscale x 8 x i32> %shr to <vscale x 8 x i16>
				ret <vscale x 8 x i16> %tr
				}

				define <vscale x 4 x i32> @umulh_i32(<vscale x 4 x i32> %a, <vscale x 4 x i32> %b) #0 {
				; CHECK-LABEL: umulh_i32:
				; CHECK: // %bb.0:
				; CHECK-NEXT: umulh z0.s, z0.s, z1.s
				; CHECK-NEXT: ret
				%insert = insertelement <vscale x 4 x i64> undef, i64 32, i64 0
				%splat = shufflevector <vscale x 4 x i64> %insert, <vscale x 4 x i64> undef, <vscale x 4 x i32> zeroinitializer
				%1 = zext <vscale x 4 x i32> %a to <vscale x 4 x i64>
				%2 = zext <vscale x 4 x i32> %b to <vscale x 4 x i64>
				%mul = mul <vscale x 4 x i64> %1, %2
				%shr = lshr <vscale x 4 x i64> %mul, %splat
				%tr = trunc <vscale x 4 x i64> %shr to <vscale x 4 x i32>
				ret <vscale x 4 x i32> %tr
				}

				define <vscale x 2 x i64> @umulh_i64(<vscale x 2 x i64> %a, <vscale x 2 x i64> %b) #0 {
				; CHECK-LABEL: umulh_i64:
				; CHECK: // %bb.0:
				; CHECK-NEXT: umulh z0.d, z0.d, z1.d
				; CHECK-NEXT: ret
				%insert = insertelement <vscale x 2 x i128> undef, i128 64, i64 0
				%splat = shufflevector <vscale x 2 x i128> %insert, <vscale x 2 x i128> undef, <vscale x 2 x i32> zeroinitializer
				%1 = zext <vscale x 2 x i64> %a to <vscale x 2 x i128>
				%2 = zext <vscale x 2 x i64> %b to <vscale x 2 x i128>
				%mul = mul <vscale x 2 x i128> %1, %2
				%shr = lshr <vscale x 2 x i128> %mul, %splat
				%tr = trunc <vscale x 2 x i128> %shr to <vscale x 2 x i64>
				ret <vscale x 2 x i64> %tr
				}

				attributes #0 = { "target-features"="+sve2" }