This is an archive of the discontinued LLVM Phabricator instance.

Differential D135324

[AArch64-SVE]: force using SVE in streaming mode to lower arithmetic and logical fixed-width vector ops.
ClosedPublic

Authored by hassnaa-arm on Oct 5 2022, 3:05 PM.

Details

Reviewers

david-arm
kmclaughlin
sdesmalen
paulwalker-arm

Commits

rG956489700e73: [AArch64-SVE]: Force generating code compatible to streaming mode.

Summary

Force using SVE in streaming mode to lower arithmetic and logical fixed-width vector ops.

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

hassnaa-arm created this revision. · Oct 5 2022, 3:05 PM

Herald added a project: Restricted Project. · Oct 5 2022, 3:05 PM

Herald added subscribers: ctetreau, hiraditya, kristof.beyls, tschuett.

hassnaa-arm requested review of this revision. · Oct 5 2022, 3:06 PM

Herald added a project: Restricted Project. · Oct 5 2022, 3:06 PM

Herald added a subscriber: llvm-commits.

Harbormaster completed remote builds in B190614: Diff 465569. · Oct 5 2022, 3:06 PM

hassnaa-arm added reviewers: david-arm, kmclaughlin. · Oct 5 2022, 3:06 PM

Matt added a subscriber: Matt. · Oct 5 2022, 7:55 PM

Hi @hassnaa-arm, could you rename the title to something that describes the patch a little more? I think something like

[AArch64][SVE]: Force the use of SVE to lower fixed-width arithmetic ops in streaming mode

would be a bit clearer. What do you think?

Hi @hassnaa-arm, it looks like this patch is based off D133433. Can you add that as a parent revision so it's obvious to the reviewer please? You can do this by clicking on "Edit Related Revisions" -> "Edit Parent Revisions" at the top-right corner of the page. Thanks!

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
4462	Hi @hassnaa-arm, this change doesn't look right. I would expect it to break some tests? When we're not in streaming mode we also want to override NEON for 64-bit element types. Can you put the OverrideNEON flag back in, perhaps something like:
  // If SVE is available then i64 vector multiplications can also be made legal.
  bool OverrideNEON = VT == MVT::v2i64 || VT == MVT::v1i64 || Subtarget->forceSVEInStreamingMode();
llvm/test/CodeGen/AArch64/sve-streaming-mode-fixed-length-int-arith.ll
92	Again, this is illegal in streaming mode.
1588	This looks like a NEON instruction - can you investigate where this is coming from?
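
The condition suggested in the comment on line 4462 can be illustrated with a minimal standalone sketch. It does not use the LLVM API: `VecTy`, `SubtargetModel`, `overrideNEONForMul`, and `checkOverrideNEON` are hypothetical names invented here, and the model only captures the shape of the boolean expression, not the real lowering code.

```cpp
#include <cassert>

// Hypothetical stand-ins for MVT and AArch64Subtarget; not the LLVM API.
enum class VecTy { v8i8, v16i8, v4i16, v8i16, v2i32, v4i32, v1i64, v2i64 };

struct SubtargetModel {
  bool ForceStreamingSVE; // models the "force SVE in streaming mode" option
};

// Returns true when a fixed-length multiply should bypass NEON lowering.
// 64-bit element types always override NEON (NEON has no i64 vector
// multiply); every other NEON-sized type does so only when
// streaming-compatible code is being forced.
bool overrideNEONForMul(VecTy VT, const SubtargetModel &ST) {
  bool Is64BitElt = VT == VecTy::v2i64 || VT == VecTy::v1i64;
  return Is64BitElt || ST.ForceStreamingSVE;
}

// Self-check of the three cases raised in the review comment.
void checkOverrideNEON() {
  assert(overrideNEONForMul(VecTy::v2i64, SubtargetModel{false}));  // i64, not forced
  assert(!overrideNEONForMul(VecTy::v4i32, SubtargetModel{false})); // i32, not forced
  assert(overrideNEONForMul(VecTy::v4i32, SubtargetModel{true}));   // i32, forced
}
```

The point of keeping the `v2i64`/`v1i64` terms is visible in the first check: dropping them would change behaviour outside streaming mode, which is the regression the reviewer expected.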

Fix LowerMUL to override NEON for v2i64 and v1i64 even if SVE is not forced

Harbormaster completed remote builds in B190701: Diff 465695. · Oct 6 2022, 3:52 AM

hassnaa-arm marked an inline comment as done. · Oct 6 2022, 3:53 AM

hassnaa-arm added a parent revision: D133433: [AArch64]: Force generating code compatible to streaming mode. · Oct 6 2022, 3:57 AM

hassnaa-arm retitled this revision from "[AArch64-SVE]: force using SVE in streaming mode" to "[AArch64-SVE]: force using SVE in streaming mode to lower arithmetic and logical fixed-width vector ops." · Oct 6 2022, 4:02 AM

Thanks for this @hassnaa-arm! I had some comments about how to tidy up the tests a bit. I also think there are some load/store test changes that shouldn't be part of this patch.

llvm/test/CodeGen/AArch64/sve-fixed-length-masked-stores.ll
420 ↗	(On Diff #465695)	nit: whitespace
llvm/test/CodeGen/AArch64/sve-streaming-mode-fixed-length-ext-loads.ll
8 ↗	(On Diff #465695)	I don't think these changes should be part of this patch, since it's not changing loads and stores?
llvm/test/CodeGen/AArch64/sve-streaming-mode-fixed-length-int-arith.ll
13	Could you also add a test for an illegal NEON type too, i.e. `<4 x i8>` or `<2 x i16>`?
92	Please ignore this comment! `stp q0, q1` is legal - my mistake!
473	Again, could you add at least one illegal type - `<4 x i8>` or `<2 x i16>`?
1410	Can you add an illegal NEON type such as `<2 x i16>`?
1588	Please ignore this - my mistake!
llvm/test/CodeGen/AArch64/sve-streaming-mode-fixed-length-int-div.ll
299	This still has the `vscale_range(16,0)` attribute. Can you remove it and recreate the CHECK lines please?
1063	Again, this still has the `vscale_range(16,0)` attribute. Can you remove it and regenerate the CHECK lines?
llvm/test/CodeGen/AArch64/sve-streaming-mode-fixed-length-int-mulh.ll
17	Can you add a test for an illegal type such as `<4 x i8>` too?
218	Wow, this code surely gets an award for being so impressively bad?!
llvm/test/CodeGen/AArch64/sve-streaming-mode-fixed-length-int-rem.ll
472	I think that you can remove the tests greater than 512 bits, i.e. <128 x i8>. If the tests already work for <64 x i8> they are likely to work for anything larger too.
774	Again, maybe remove this test since I'm not sure what extra value it gives us?
1608	Again, maybe remove this test?
1775	Again, maybe remove this test?
2240	Again, maybe remove this test?
2335	Again, maybe remove this test?
2652	Again, maybe remove this test?
2747	Again, maybe remove this test?

david-arm added inline comments. · Oct 6 2022, 7:33 AM

llvm/test/CodeGen/AArch64/sve-streaming-mode-fixed-length-int-rem.ll
3384	Again, maybe remove this test?
3686	Again, maybe remove this test?
4520	Again, maybe remove this test?
4687	Again, maybe remove this test?
5152	Again, maybe remove this test?
5247	Again, maybe remove this test?
5564	Again, maybe remove this test?
5659	Again, maybe remove this test?
llvm/test/CodeGen/AArch64/sve-streaming-mode-fixed-length-loads.ll
2 ↗	(On Diff #465695)	Not part of this patch?
llvm/test/CodeGen/AArch64/sve-streaming-mode-fixed-length-masked-store.ll
2 ↗	(On Diff #465695)	Not part of this patch?

Revert changes related to load/store as they are not related to this patch

Harbormaster completed remote builds in B190757: Diff 465775. · Oct 6 2022, 10:48 AM

Add test cases for some illegal NEON types and remove unnecessary tests

hassnaa-arm added inline comments. · Oct 6 2022, 12:01 PM

llvm/test/CodeGen/AArch64/sve-streaming-mode-fixed-length-ext-loads.ll
8 ↗	(On Diff #465695)	Sorry, that was included by mistake. I will correct it.

Harbormaster completed remote builds in B190784: Diff 465818. · Oct 6 2022, 12:55 PM

This looks a lot better now, thanks @hassnaa-arm, and the test files are much smaller too. I spotted two issues in a couple of tests where we are using illegal NEON instructions, e.g. bic. Would you be able to investigate these please?

llvm/test/CodeGen/AArch64/sve-streaming-mode-fixed-length-int-div.ll
977	This is a NEON vector instruction - this is definitely illegal in streaming mode. Can you try to find out why this is being inserted please?
llvm/test/CodeGen/AArch64/sve-streaming-mode-fixed-length-int-mulh.ll
1212	Again, these `bic` instructions are illegal in streaming mode.

Fix invalid bic instruction in streaming mode.
Fix the invalid bic instruction that was generated during the 'and' combine by converting the fixed-length vector to a scalable one, so that the SVE 'and' is combined instead of the fixed-length 'and'.

Harbormaster completed remote builds in B191653: Diff 467017. · Oct 11 2022, 10:14 PM

hassnaa-arm edited parent revisions, added: D135564: [AArch64-SVE]: Force generating code compatible to streaming mode.; removed: D133433: [AArch64]: Force generating code compatible to streaming mode. · Oct 11 2022, 10:17 PM

Remove unrelated changes

Harbormaster completed remote builds in B191925: Diff 467414. · Oct 13 2022, 2:54 AM

Remove unrelated changes

Harbormaster completed remote builds in B191926: Diff 467416. · Oct 13 2022, 3:00 AM

Remove unrelated changes

Harbormaster completed remote builds in B191928: Diff 467417. · Oct 13 2022, 3:03 AM

Update to the latest changes of the parent patch

Harbormaster completed remote builds in B192013: Diff 467545. · Oct 13 2022, 12:39 PM

Update to the changes of the parent patch

Harbormaster completed remote builds in B192186: Diff 467788. · Oct 14 2022, 9:17 AM

hassnaa-arm added reviewers: sdesmalen, paulwalker-arm. · Oct 17 2022, 2:06 AM

Update to the parent patch

Harbormaster completed remote builds in B192462: Diff 468154. · Oct 17 2022, 3:56 AM

sdesmalen added inline comments. · Oct 18 2022, 5:32 AM

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
15756	nit: In LLVM the style is to start local variables with an upper-case, i.e. ScalableLHS.
llvm/test/CodeGen/AArch64/sve-streaming-mode-fixed-length-int-arith.ll
66–93	I think this test can be removed, because you've already covered the "twice as wide" case (32 x i8), which ensures we don't emit any other instructions not valid in streaming mode. The "four times as wide" case should already be covered by `sve-fixed-length-int-arith.ll`.
151	This test can be removed for the same reason as mentioned above.
224	This test can be removed for the same reason as mentioned above.
297	This test can be removed for the same reason as mentioned above. (same for all other 4 x as wide instances in the remainder of this file and other files in this patch)

hassnaa-arm marked 5 inline comments as done. · Oct 18 2022, 7:34 AM

Remove unneeded test cases

Harbormaster completed remote builds in B192745: Diff 468535. · Oct 18 2022, 7:35 AM

paulwalker-arm added inline comments. · Oct 19 2022, 5:47 AM

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
15753–15759	This looks odd. We shouldn't really be doing lowering within DAGCombine. What happens if you just exit the combine for the invalid case? That said, I can see functions like `tryAdvSIMDModImm32()` are used in other parts of codegen, so I'm wondering if the prevention logic is best placed within such functions so all use cases are covered.
22384	When we hit a similar issue with `LowerToPredicatedOp()` we decided to drop the calls to `useSVEForFixedLengthVectorVT()` in favour of just using `VT.isFixedLengthVector() && isTypeLegal(VT)`. Would the same work in your case?
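
The first suggestion above, placing the prevention logic inside the shared helper rather than in each caller, can be sketched as follows. This is a toy model under stated assumptions, not LLVM code: `SubtargetModel`, `tryEncodeNeonImm`, `combineAndWithImm`, and `lowerOrWithImm` are hypothetical names, and the "fits in 8 bits" rule is invented purely so the helper has something to accept or reject.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for AArch64Subtarget; not the LLVM API.
struct SubtargetModel {
  bool ForceStreamingSVE;
};

// Toy stand-in for a NEON-only immediate helper. Returns true and sets
// Encoded when Imm is "encodable"; the 8-bit rule is invented for this sketch.
bool tryEncodeNeonImm(uint32_t Imm, bool IsFixedLengthVT, uint32_t &Encoded,
                      const SubtargetModel *ST = nullptr) {
  // Guard first: never produce a NEON-only node when streaming-compatible
  // code is being forced. Bailing out here covers every caller at once.
  if (ST && IsFixedLengthVT && ST->ForceStreamingSVE)
    return false;
  if (Imm > 0xFF)
    return false;
  Encoded = Imm;
  return true;
}

// Two hypothetical callers; neither needs its own streaming-mode check.
bool combineAndWithImm(uint32_t Imm, const SubtargetModel &ST) {
  uint32_t Enc = 0;
  return tryEncodeNeonImm(~Imm & 0xFFFFu, /*IsFixedLengthVT=*/true, Enc, &ST);
}

bool lowerOrWithImm(uint32_t Imm, const SubtargetModel &ST) {
  uint32_t Enc = 0;
  return tryEncodeNeonImm(Imm, /*IsFixedLengthVT=*/true, Enc, &ST);
}
```

With the guard in the helper, a caller added later inherits the restriction automatically, which is the "all use cases are covered" property the comment asks about. The trade-off is that the helper now needs access to subtarget state.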

hassnaa-arm added inline comments. · Oct 19 2022, 8:33 AM

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
22384	Sorry, I don't understand. Do you mean dropping the call to `useSVEForFixedLengthVectorVT(...)`, or using `useSVEForFixedLengthVectorVT(VT)` without passing the OverrideNEON parameter?

paulwalker-arm added inline comments. · Oct 19 2022, 8:42 AM

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
22384	The former, so you can drop the call to `useSVEForFixedLengthVectorVT()` and instead have `assert(VT.isFixedLengthVector() && isTypeLegal(VT) && ...`. By this point we should be working with only legal types and there's no harm in handling any of them.

hassnaa-arm added inline comments. · Oct 19 2022, 9:26 AM

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
22384	Why do you suggest that instead of using `useSVEForFixedLengthVectorVT()`? And why do you suggest it only for `LowerToPredicatedOp()` and not for the other lowering functions?

paulwalker-arm added inline comments. · Oct 19 2022, 9:47 AM

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
22384	`useSVEForFixedLengthVectorVT()` is a semi-complex function that exists to choose which path to take during code generation and is thusly used to determine how to lower an `ISD::ADD` for example. However, by calling `LowerToPredicatedOp()` you've already made that decision and so you only need to detect scenarios that would result in broken code generation. For the case of `LowerToPredicatedOp()` this just means ensuring the input is a legal fixed length vector.
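
The distinction drawn in this reply, a policy function that chooses the lowering path versus a helper that only asserts its preconditions, can be sketched in isolation. Everything below is a heavily simplified, hypothetical model (`VecType`, `TargetModel`, `shouldUseSVE`, and this `lowerToPredicatedOp` are not the LLVM functions, and the size rules are assumptions made for illustration).

```cpp
#include <cassert>

// Hypothetical, simplified model; not the LLVM API.
struct VecType {
  bool IsFixedLength;
  unsigned Bits; // total vector width in bits
};

struct TargetModel {
  unsigned MinSVEBits; // assumed minimum SVE register width
};

// Policy: decides WHICH path lowering should take. NEON-sized vectors go
// to SVE only on request; wider ones go to SVE if they fit a register.
bool shouldUseSVE(VecType VT, const TargetModel &T, bool OverrideNEON) {
  if (!VT.IsFixedLength)
    return false;
  if (VT.Bits == 64 || VT.Bits == 128)
    return OverrideNEON;
  return VT.Bits > 128 && VT.Bits <= T.MinSVEBits;
}

enum class Lowered { PredicatedOp };

// Precondition: this helper runs only after the policy decision was made,
// so it merely rejects inputs that would yield broken code generation.
Lowered lowerToPredicatedOp(VecType VT, bool TypeIsLegal) {
  assert(VT.IsFixedLength && TypeIsLegal &&
         "expected a legal fixed-length vector");
  (void)VT;
  (void)TypeIsLegal;
  return Lowered::PredicatedOp;
}
```

Re-running the policy inside the helper would risk the two disagreeing, for example when a caller passed `OverrideNEON` and the helper did not. Asserting only legality avoids that.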

hassnaa-arm marked 4 inline comments as done. · Oct 20 2022, 4:26 AM

Remove the step of converting to a scalable vector that was added within DAGCombine (performANDCombine)

Harbormaster completed remote builds in B193186: Diff 469161. · Oct 20 2022, 4:29 AM

hassnaa-arm added a child revision: D136147: [AArch64-SVE]: Test enabling streaming mode for tests of: shifts, extract subverter, build vector, concat, and extract vector elt. · Oct 20 2022, 4:47 AM

Update to the latest changes of the parent patch.

Harbormaster completed remote builds in B193514: Diff 469597. · Oct 21 2022, 8:02 AM

hassnaa-arm removed a child revision: D136147: [AArch64-SVE]: Test enabling streaming mode for tests of: shifts, extract subverter, build vector, concat, and extract vector elt. · Oct 21 2022, 8:56 AM

Update to the parent patch

Harbormaster completed remote builds in B193554: Diff 469650. · Oct 21 2022, 10:30 AM

Update to the parent patch

Harbormaster completed remote builds in B194673: Diff 471183. · Oct 27 2022, 10:01 AM

Update to the parent patch

Harbormaster completed remote builds in B194899: Diff 471493. · Oct 28 2022, 4:52 AM

hassnaa-arm added a child revision: D136858: [AArch64-SVE]: Force generating code compatible to streaming mode for sve-fixed-length tests. · Nov 1 2022, 3:59 AM

hassnaa-arm added a child revision: D137093: [AArch64][SVE][NFC] Add streaming mode SVE tests.

Hi @hassnaa-arm, I think this patch is very close to being ready! However, do you know why the test file sve-streaming-fixed-length-int-shifts.ll was deleted?

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
1394	I remember in one of your previous patches that @sdesmalen mentioned you shouldn't need to add `v1i64` as it should be treated as a scalar. What happens if you remove it? I imagine your v1i64 tests might just generate scalar code?

Update by parent patch

Harbormaster completed remote builds in B195464: Diff 472290. · Nov 1 2022, 7:04 AM

In D135324#3898875, @david-arm wrote:

Hi @hassnaa-arm, I think this patch is very close to being ready! However, do you know why the test file sve-streaming-fixed-length-int-shifts.ll was deleted?

It was a mistake made while rebasing against the parent patch.
In the parent patch, the deleted file was replaced by sve-streaming-mode-fixed-length-int-shifts.ll

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
1394	Yes, in that patch there were no tests needing custom lowering for v1i64. But in this patch, the test file sve-streaming-mode-fixed-length-int-log.ll has invalid instructions for these test cases:
  define <1 x i64> @and_v1i64(<1 x i64> %op1, <1 x i64> %op2)
  define <1 x i64> @xor_v1i64(<1 x i64> %op1, <1 x i64> %op2)

hassnaa-arm marked an inline comment as done and an inline comment as not done. · Nov 1 2022, 7:42 AM

paulwalker-arm accepted this revision. · Nov 1 2022, 6:07 PM

paulwalker-arm added inline comments.

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
1394	I'll also add that in general you don't want to revert such integer types to scalar because that'll cause GPR-VPR transfers that can be expensive, perhaps even more so when it comes to streaming mode. You can also see that within `LowerMUL` we have special handling for `v1i64` to keep it in the vector unit.

This revision is now accepted and ready to land. · Nov 1 2022, 6:07 PM

sdesmalen added inline comments. · Nov 2 2022, 1:38 AM

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
1394	Does that mean we need coverage as well for v1i32? (and perhaps also v1i8, v1i16). If so, I wonder if that might warrant a separate patch rather than support the odd case here?

paulwalker-arm added inline comments. · Nov 2 2022, 6:52 AM

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
1394	Discussed offline but for completeness the answer is no. The other MVTs you list are not type legal (only 64/128-bit vectors are legal for NEON) and so they'll not make it into operation legalisation.
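
The size rule stated in this answer (only 64-bit and 128-bit vectors are type legal for NEON) explains why `v1i64` needs handling while `v1i32`, `v1i16`, and `v1i8` do not. A minimal sketch of that rule, assuming a size-only model restricted to the integer element widths discussed in this review; `isNeonLegalIntVector` is a hypothetical helper, not an LLVM function:

```cpp
#include <cassert>

// Size-only model of NEON type legality for integer vectors. This is a
// simplification for illustration: it checks the element width and the
// total vector width, nothing else.
constexpr bool isNeonLegalIntVector(unsigned NumElts, unsigned EltBits) {
  return (EltBits == 8 || EltBits == 16 || EltBits == 32 || EltBits == 64) &&
         (NumElts * EltBits == 64 || NumElts * EltBits == 128);
}

static_assert(isNeonLegalIntVector(1, 64), "v1i64 is 64 bits wide");
static_assert(isNeonLegalIntVector(2, 64), "v2i64 is 128 bits wide");
static_assert(!isNeonLegalIntVector(1, 32), "v1i32 is only 32 bits wide");
static_assert(!isNeonLegalIntVector(1, 16), "v1i16 is only 16 bits wide");
static_assert(!isNeonLegalIntVector(1, 8), "v1i8 is only 8 bits wide");
```

The same rule accounts for the "illegal NEON type" tests requested earlier in the review: `<4 x i8>` and `<2 x i16>` are both 32 bits wide, so they are not type legal and are changed by type legalisation before operation legalisation runs.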

This revision was landed with ongoing or failed builds. · Nov 10 2022, 4:38 AM

Closed by commit rG956489700e73: [AArch64-SVE]: Force generating code compatible to streaming mode. (authored by Hassnaa Hamdi <hassnaa.hamdi@arm.com>).

This revision was automatically updated to reflect the committed changes.

hassnaa-arm added a commit: rG956489700e73: [AArch64-SVE]: Force generating code compatible to streaming mode..

Revision Contents

Path	Size
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp	53 lines
llvm/test/CodeGen/AArch64/sve-streaming-mode-fixed-length-int-arith.ll	791 lines
llvm/test/CodeGen/AArch64/sve-streaming-mode-fixed-length-int-div.ll	737 lines
llvm/test/CodeGen/AArch64/sve-streaming-mode-fixed-length-int-log.ll	498 lines
llvm/test/CodeGen/AArch64/sve-streaming-mode-fixed-length-int-mulh.ll	893 lines
llvm/test/CodeGen/AArch64/sve-streaming-mode-fixed-length-int-rem.ll	742 lines

Diff 469161

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp

@@ 1,385 lines not shown @@ if (Subtarget->hasSVE()) {
     setOperationAction(ISD::MUL, MVT::v2i64, Custom);

     // NEON doesn't support across-vector reductions, but SVE does.
     for (auto VT : {MVT::v4f16, MVT::v8f16, MVT::v2f32, MVT::v4f32, MVT::v2f64})
       setOperationAction(ISD::VECREDUCE_SEQ_FADD, VT, Custom);

     if (Subtarget->forceStreamingCompatibleSVE()) {
       for (MVT VT : {MVT::v8i8, MVT::v16i8, MVT::v4i16, MVT::v8i16, MVT::v2i32,
                      MVT::v4i32, MVT::v1i64, MVT::v2i64})
         addTypeForStreamingSVE(VT);

       for (MVT VT : {MVT::v4f16, MVT::v8f16, MVT::v2f32, MVT::v4f32, MVT::v1f64,
                      MVT::v2f64})
         addTypeForStreamingSVE(VT);
     }

     // NOTE: Currently this has to happen after computeRegisterProperties rather
@@ 202 lines not shown @@ bool AArch64TargetLowering::shouldExpandGetActiveLaneMask(EVT ResVT,
   return false;
 }

 void AArch64TargetLowering::addTypeForStreamingSVE(MVT VT) {
   setOperationAction(ISD::ANY_EXTEND, VT, Custom);
   setOperationAction(ISD::ZERO_EXTEND, VT, Custom);
   setOperationAction(ISD::SIGN_EXTEND, VT, Custom);
   setOperationAction(ISD::CONCAT_VECTORS, VT, Custom);
+  setOperationAction(ISD::ADD, VT, Custom);
+  setOperationAction(ISD::SUB, VT, Custom);
+  setOperationAction(ISD::MUL, VT, Custom);
+  setOperationAction(ISD::MULHS, VT, Custom);
+  setOperationAction(ISD::MULHU, VT, Custom);
+  setOperationAction(ISD::ABS, VT, Custom);
+  setOperationAction(ISD::AND, VT, Custom);
+  setOperationAction(ISD::XOR, VT, Custom);
 }

 void AArch64TargetLowering::addTypeForFixedLengthSVE(MVT VT) {
   assert(VT.isFixedLengthVector() && "Expected fixed length vector type!");

   // By default everything must be expanded.
   for (unsigned Op = 0; Op < ISD::BUILTIN_OP_END; ++Op)
     setOperationAction(Op, VT, Expand);
@@ 1,908 lines not shown @@ if (Opc) {
     // Emit the AArch64 operation with overflow check.
     Value = DAG.getNode(Opc, DL, VTs, LHS, RHS);
     Overflow = Value.getValue(1);
   }
   return std::make_pair(Value, Overflow);
 }

 SDValue AArch64TargetLowering::LowerXOR(SDValue Op, SelectionDAG &DAG) const {
-  if (useSVEForFixedLengthVectorVT(Op.getValueType()))
+  if (useSVEForFixedLengthVectorVT(Op.getValueType(),
+                                   Subtarget->forceStreamingCompatibleSVE()))
     return LowerToScalableOp(Op, DAG);

   SDValue Sel = Op.getOperand(0);
   SDValue Other = Op.getOperand(1);
   SDLoc dl(Sel);

   // If the operand is an overflow checking operation, invert the condition
   // code and kill the Not operation. I.e., transform:
@@ 895 lines not shown @@ static unsigned selectUmullSmull(SDNode &N0, SDNode &N1, SelectionDAG &DAG,
   }
   return 0;
 }

 SDValue AArch64TargetLowering::LowerMUL(SDValue Op, SelectionDAG &DAG) const {
   EVT VT = Op.getValueType();

   // If SVE is available then i64 vector multiplications can also be made legal.
-  bool OverrideNEON = VT == MVT::v2i64 || VT == MVT::v1i64;
+  bool OverrideNEON = VT == MVT::v2i64 || VT == MVT::v1i64 ||
+                      Subtarget->forceStreamingCompatibleSVE();

   if (VT.isScalableVector() || useSVEForFixedLengthVectorVT(VT, OverrideNEON))
     return LowerToPredicatedOp(Op, DAG, AArch64ISD::MUL_PRED);

   // Multiplications are only custom-lowered for 128-bit vectors so that
   // VMULL can be detected. Otherwise v2i64 multiplications are not legal.
   assert(VT.is128BitVector() && VT.isInteger() &&
          "unexpected type for custom-lowering ISD::MUL");
   SDNode *N0 = Op.getOperand(0).getNode();
   SDNode *N1 = Op.getOperand(1).getNode();
   bool isMLA = false;
@@ 6,899 lines not shown @@ static SDValue tryAdvSIMDModImm64(unsigned NewOp, SDValue Op, SelectionDAG &DAG,
   }

   return SDValue();
 }

 // Try 32-bit splatted SIMD immediate.
 static SDValue tryAdvSIMDModImm32(unsigned NewOp, SDValue Op, SelectionDAG &DAG,
                                   const APInt &Bits,
-                                  const SDValue *LHS = nullptr) {
+                                  const SDValue *LHS = nullptr,
+                                  const AArch64Subtarget *const Subtarget = nullptr) {
+
+  EVT VT = Op.getValueType();
+  if(Subtarget && VT.isFixedLengthVector() && Subtarget->forceStreamingCompatibleSVE())
+    return SDValue();
+
   if (Bits.getHiBits(64) == Bits.getLoBits(64)) {
     uint64_t Value = Bits.zextOrTrunc(64).getZExtValue();
-    EVT VT = Op.getValueType();
     MVT MovTy = (VT.getSizeInBits() == 128) ? MVT::v4i32 : MVT::v2i32;
     bool isAdvSIMDModImm = false;
     uint64_t Shift;

     if ((isAdvSIMDModImm = AArch64_AM::isAdvSIMDModImmType1(Value))) {
       Value = AArch64_AM::encodeAdvSIMDModImmType1(Value);
       Shift = 0;
     }
@@ 28 lines not shown @@ static SDValue tryAdvSIMDModImm32(unsigned NewOp, SDValue Op, SelectionDAG &DAG,
   }

   return SDValue();
 }

 // Try 16-bit splatted SIMD immediate.
 static SDValue tryAdvSIMDModImm16(unsigned NewOp, SDValue Op, SelectionDAG &DAG,
                                   const APInt &Bits,
-                                  const SDValue *LHS = nullptr) {
+                                  const SDValue *LHS = nullptr,
+                                  const AArch64Subtarget *const Subtarget = nullptr) {
+  EVT VT = Op.getValueType();
+  if(Subtarget && VT.isFixedLengthVector() && Subtarget->forceStreamingCompatibleSVE())
+    return SDValue();
+
   if (Bits.getHiBits(64) == Bits.getLoBits(64)) {
     uint64_t Value = Bits.zextOrTrunc(64).getZExtValue();
-    EVT VT = Op.getValueType();
     MVT MovTy = (VT.getSizeInBits() == 128) ? MVT::v8i16 : MVT::v4i16;
     bool isAdvSIMDModImm = false;
     uint64_t Shift;

     if ((isAdvSIMDModImm = AArch64_AM::isAdvSIMDModImmType5(Value))) {
       Value = AArch64_AM::encodeAdvSIMDModImmType5(Value);
       Shift = 0;
     }
@@ 210 lines not shown @@ static SDValue tryLowerToSLI(SDNode *N, SelectionDAG &DAG) {
   LLVM_DEBUG(ResultSLI->dump(&DAG));

   ++NumShiftInserts;
   return ResultSLI;
 }

 SDValue AArch64TargetLowering::LowerVectorOR(SDValue Op,
                                              SelectionDAG &DAG) const {
-  if (useSVEForFixedLengthVectorVT(Op.getValueType()))
+  if (useSVEForFixedLengthVectorVT(Op.getValueType(),
+                                   Subtarget->forceStreamingCompatibleSVE()))
     return LowerToScalableOp(Op, DAG);

   // Attempt to form a vector S[LR]I from (or (and X, C1), (lsl Y, C2))
   if (SDValue Res = tryLowerToSLI(Op.getNode(), DAG))
     return Res;

   EVT VT = Op.getValueType();

▲ Show 20 Lines • Show All 4,055 Lines • ▼ Show 20 Lines	static SDValue performSVEAndCombine(SDNode *N,

if (isConstantSplatVectorMaskForType(Mask.getNode(), MemVT))		if (isConstantSplatVectorMaskForType(Mask.getNode(), MemVT))
return Src;		return Src;

return SDValue();		return SDValue();
}		}

static SDValue performANDCombine(SDNode *N,		static SDValue performANDCombine(SDNode *N,
TargetLowering::DAGCombinerInfo &DCI) {		TargetLowering::DAGCombinerInfo &DCI,
		const AArch64Subtarget *const Subtarget) {
SelectionDAG &DAG = DCI.DAG;		SelectionDAG &DAG = DCI.DAG;
SDValue LHS = N->getOperand(0);		SDValue LHS = N->getOperand(0);
SDValue RHS = N->getOperand(1);		SDValue RHS = N->getOperand(1);
EVT VT = N->getValueType(0);		EVT VT = N->getValueType(0);

if (SDValue R = performANDORCSELCombine(N, DAG))		if (SDValue R = performANDORCSELCombine(N, DAG))
return R;		return R;

if (!DAG.getTargetLoweringInfo().isTypeLegal(VT))		if (!DAG.getTargetLoweringInfo().isTypeLegal(VT))
return SDValue();		return SDValue();

if (VT.isScalableVector())		if (VT.isScalableVector())
return performSVEAndCombine(N, DCI);		return performSVEAndCombine(N, DCI);

// The combining code below works only for NEON vectors. In particular, it		// The combining code below works only for NEON vectors. In particular, it
// does not work for SVE when dealing with vectors wider than 128 bits.		// does not work for SVE when dealing with vectors wider than 128 bits.
if (!VT.is64BitVector() && !VT.is128BitVector())		if (!VT.is64BitVector() && !VT.is128BitVector())
return SDValue();		return SDValue();
		sdesmalenUnsubmitted Done Reply Inline Actions nit: In LLVM the style is to start local variables with an upper-case, i.e. ScalableLHS. sdesmalen: nit: In LLVM the style is to start local variables with an upper-case, i.e. ScalableLHS.

BuildVectorSDNode *BVN = dyn_cast<BuildVectorSDNode>(RHS.getNode());		BuildVectorSDNode *BVN = dyn_cast<BuildVectorSDNode>(RHS.getNode());
if (!BVN)		if (!BVN)
		paulwalker-armUnsubmitted Done Reply Inline Actions This looks odd. We shouldn't really be doing lower within DAGCombine. What happens if you just exit the combine for the invalid case? That said, I can see functions like `tryAdvSIMDModImm32()` are used in other part of codegen so I'm wondering if the prevention logic is best place within such functions so all use cases are covered. paulwalker-arm: This looks odd. We shouldn't really be doing lower within DAGCombine. What happens if you…
return SDValue();		return SDValue();

// AND does not accept an immediate, so check if we can use a BIC immediate		// AND does not accept an immediate, so check if we can use a BIC immediate
// instruction instead. We do this here instead of using a (and x, (mvni imm))		// instruction instead. We do this here instead of using a (and x, (mvni imm))
// pattern in isel, because some immediates may be lowered to the preferred		// pattern in isel, because some immediates may be lowered to the preferred
// (and x, (movi imm)) form, even though an mvni representation also exists.		// (and x, (movi imm)) form, even though an mvni representation also exists.
APInt DefBits(VT.getSizeInBits(), 0);		APInt DefBits(VT.getSizeInBits(), 0);
APInt UndefBits(VT.getSizeInBits(), 0);		APInt UndefBits(VT.getSizeInBits(), 0);
if (resolveBuildVector(BVN, DefBits, UndefBits)) {		if (resolveBuildVector(BVN, DefBits, UndefBits)) {
SDValue NewOp;		SDValue NewOp;

DefBits = ~DefBits;		DefBits = ~DefBits;
if ((NewOp = tryAdvSIMDModImm32(AArch64ISD::BICi, SDValue(N, 0), DAG,		if ((NewOp = tryAdvSIMDModImm32(AArch64ISD::BICi, SDValue(N, 0), DAG,
DefBits, &LHS)) ||		DefBits, &LHS, Subtarget)) ||
(NewOp = tryAdvSIMDModImm16(AArch64ISD::BICi, SDValue(N, 0), DAG,		(NewOp = tryAdvSIMDModImm16(AArch64ISD::BICi, SDValue(N, 0), DAG,
DefBits, &LHS)))		DefBits, &LHS, Subtarget)))
return NewOp;		return NewOp;

UndefBits = ~UndefBits;		UndefBits = ~UndefBits;
if ((NewOp = tryAdvSIMDModImm32(AArch64ISD::BICi, SDValue(N, 0), DAG,		if ((NewOp = tryAdvSIMDModImm32(AArch64ISD::BICi, SDValue(N, 0), DAG,
UndefBits, &LHS)) ||		UndefBits, &LHS, Subtarget)) ||
(NewOp = tryAdvSIMDModImm16(AArch64ISD::BICi, SDValue(N, 0), DAG,		(NewOp = tryAdvSIMDModImm16(AArch64ISD::BICi, SDValue(N, 0), DAG,
UndefBits, &LHS)))		UndefBits, &LHS, Subtarget)))
return NewOp;		return NewOp;
}		}

return SDValue();		return SDValue();
}		}
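
A note on the BIC rewrite in the combine above: `AND x, Mask` gives the same result as `BIC x, ~Mask`, which is why the code inverts `DefBits` before trying to encode it. The following is a minimal standalone sketch of that idea for a single 32-bit lane. It assumes the immediate form is an 8-bit value shifted left by 0, 8, 16 or 24 bits. The names `BicImm32`, `encodeShiftedImm8`, `andMaskAsBic` and `applyBic` are hypothetical and are not LLVM APIs; the real `tryAdvSIMDModImm32()` operates on `APInt` and `SelectionDAG` nodes.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical stand-in for an encoded BIC (vector, immediate) operand:
// an 8-bit payload shifted left by 0, 8, 16 or 24 bits within a 32-bit lane.
struct BicImm32 {
  uint8_t Imm8;
  unsigned Shift;
};

// Returns the encoding when Bits has exactly the shape "imm8 << shift".
std::optional<BicImm32> encodeShiftedImm8(uint32_t Bits) {
  for (unsigned Shift = 0; Shift < 32; Shift += 8) {
    if ((Bits & ~(uint32_t{0xFF} << Shift)) == 0)
      return BicImm32{static_cast<uint8_t>(Bits >> Shift), Shift};
  }
  return std::nullopt;
}

// AND x, Mask == BIC x, ~Mask, so invert the mask before trying to encode it.
std::optional<BicImm32> andMaskAsBic(uint32_t Mask) {
  return encodeShiftedImm8(~Mask);
}

// Reference semantics of BIC on one lane: clear the bits set in the immediate.
uint32_t applyBic(uint32_t X, BicImm32 Imm) {
  assert(Imm.Shift < 32 && Imm.Shift % 8 == 0 && "unexpected shift amount");
  return X & ~(uint32_t{Imm.Imm8} << Imm.Shift);
}
```

In this sketch a mask such as `0xFFFF00FF` can be expressed as a BIC immediate, because its inverse is `0xFF << 8`. A mask such as `0x0000FFFF` cannot, because its inverse spans two bytes, and the combine would then fall through and return an empty `SDValue`.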

static bool hasPairwiseAdd(unsigned Opcode, EVT VT, bool FullFP16) {		static bool hasPairwiseAdd(unsigned Opcode, EVT VT, bool FullFP16) {
switch (Opcode) {		switch (Opcode) {
[4,777 lines not shown]	SDValue AArch64TargetLowering::PerformDAGCombine(SDNode *N,
case ISD::FP_TO_SINT_SAT:		case ISD::FP_TO_SINT_SAT:
case ISD::FP_TO_UINT_SAT:		case ISD::FP_TO_UINT_SAT:
return performFpToIntCombine(N, DAG, DCI, Subtarget);		return performFpToIntCombine(N, DAG, DCI, Subtarget);
case ISD::FDIV:		case ISD::FDIV:
return performFDivCombine(N, DAG, DCI, Subtarget);		return performFDivCombine(N, DAG, DCI, Subtarget);
case ISD::OR:		case ISD::OR:
return performORCombine(N, DCI, Subtarget);		return performORCombine(N, DCI, Subtarget);
case ISD::AND:		case ISD::AND:
return performANDCombine(N, DCI);		return performANDCombine(N, DCI, Subtarget);
case ISD::INTRINSIC_WO_CHAIN:		case ISD::INTRINSIC_WO_CHAIN:
return performIntrinsicCombine(N, DCI, Subtarget);		return performIntrinsicCombine(N, DCI, Subtarget);
case ISD::ANY_EXTEND:		case ISD::ANY_EXTEND:
case ISD::ZERO_EXTEND:		case ISD::ZERO_EXTEND:
case ISD::SIGN_EXTEND:		case ISD::SIGN_EXTEND:
return performExtendCombine(N, DCI, DAG);		return performExtendCombine(N, DCI, DAG);
case ISD::SIGN_EXTEND_INREG:		case ISD::SIGN_EXTEND_INREG:
return performSignExtendInRegCombine(N, DCI, DAG);		return performSignExtendInRegCombine(N, DCI, DAG);
[1,790 lines not shown]
}		}

// If a fixed length vector operation has no side effects when applied to		// If a fixed length vector operation has no side effects when applied to
// undefined elements, we can safely use scalable vectors to perform the same		// undefined elements, we can safely use scalable vectors to perform the same
// operation without needing to worry about predication.		// operation without needing to worry about predication.
SDValue AArch64TargetLowering::LowerToScalableOp(SDValue Op,		SDValue AArch64TargetLowering::LowerToScalableOp(SDValue Op,
SelectionDAG &DAG) const {		SelectionDAG &DAG) const {
EVT VT = Op.getValueType();		EVT VT = Op.getValueType();
assert(useSVEForFixedLengthVectorVT(VT) &&		assert(VT.isFixedLengthVector() && isTypeLegal(VT) &&
"Only expected to lower fixed length vector operation!");		"Only expected to lower fixed length vector operation!");
		paulwalker-arm: When we hit a similar issue with `LowerToPredicatedOp()` we decided to drop the calls to `useSVEForFixedLengthVectorVT()` in favour of just using `VT.isFixedLengthVector() && isTypeLegal(VT)`. Would the same work in your case?
		hassnaa-arm (Author): Sorry, I don't understand. Do you mean dropping the call to `useSVEForFixedLengthVectorVT(...)`, or do you mean using `useSVEForFixedLengthVectorVT(VT)` without passing the overrideNEON parameter?
		paulwalker-arm: The former, so you can drop the call to `useSVEForFixedLengthVectorVT()` and instead have `assert(VT.isFixedLengthVector() && isTypeLegal(VT) && ...`. By this point we should be working with only legal types and there's no harm in handling any of them.
		hassnaa-arm (Author): Why do you suggest that instead of using `useSVEForFixedLengthVectorVT()`? And why do you suggest it only for `LowerToPredicatedOp()` and not for the other lowering functions?
		paulwalker-arm: `useSVEForFixedLengthVectorVT()` is a semi-complex function that exists to choose which path to take during code generation, and is thus used to determine how to lower an `ISD::ADD`, for example. However, by calling `LowerToPredicatedOp()` you've already made that decision, so you only need to detect scenarios that would result in broken code generation. For `LowerToPredicatedOp()` this just means ensuring the input is a legal fixed length vector.
EVT ContainerVT = getContainerForFixedLengthVector(DAG, VT);		EVT ContainerVT = getContainerForFixedLengthVector(DAG, VT);

// Create list of operands by converting existing ones to scalable types.		// Create list of operands by converting existing ones to scalable types.
SmallVector<SDValue, 4> Ops;		SmallVector<SDValue, 4> Ops;
for (const SDValue &V : Op->op_values()) {		for (const SDValue &V : Op->op_values()) {
assert(!isa<VTSDNode>(V) && "Unexpected VTSDNode node!");		assert(!isa<VTSDNode>(V) && "Unexpected VTSDNode node!");

// Pass through non-vector operands.		// Pass through non-vector operands.
if (!V.getValueType().isVector()) {		if (!V.getValueType().isVector()) {
Ops.push_back(V);		Ops.push_back(V);
continue;		continue;
}		}

// "cast" fixed length vector to a scalable vector.		// "cast" fixed length vector to a scalable vector.
assert(useSVEForFixedLengthVectorVT(V.getValueType()) &&		assert(useSVEForFixedLengthVectorVT(
		V.getValueType(), Subtarget->forceStreamingCompatibleSVE()) &&
"Only fixed length vectors are supported!");		"Only fixed length vectors are supported!");
Ops.push_back(convertToScalableVector(DAG, ContainerVT, V));		Ops.push_back(convertToScalableVector(DAG, ContainerVT, V));
}		}

auto ScalableRes = DAG.getNode(Op.getOpcode(), SDLoc(Op), ContainerVT, Ops);		auto ScalableRes = DAG.getNode(Op.getOpcode(), SDLoc(Op), ContainerVT, Ops);
return convertFromScalableVector(DAG, VT, ScalableRes);		return convertFromScalableVector(DAG, VT, ScalableRes);
}		}
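
The comment on `LowerToScalableOp` says that an operation with no side effects on undefined elements can run on a scalable container without predication. A rough way to see this is the standalone sketch below, which models the container as a wider array whose upper lanes hold arbitrary values. The lane counts and the names `toContainer`, `fromContainer`, `addAllLanes` and `addViaContainer` are assumptions made for illustration only and do not correspond to LLVM functions.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr std::size_t FixedLanes = 4;     // e.g. a <4 x i16> value
constexpr std::size_t ContainerLanes = 8; // hypothetical wider register
static_assert(FixedLanes <= ContainerLanes, "container must hold the value");

using Fixed = std::array<uint16_t, FixedLanes>;
using Container = std::array<uint16_t, ContainerLanes>;

// Place the fixed-length value in the low lanes; the remaining lanes are
// filled with Junk to stand in for "undefined" contents.
Container toContainer(const Fixed &V, uint16_t Junk) {
  Container C;
  C.fill(Junk);
  for (std::size_t I = 0; I < FixedLanes; ++I)
    C[I] = V[I];
  return C;
}

// Read back only the low lanes; whatever was computed above them is dropped.
Fixed fromContainer(const Container &C) {
  Fixed V{};
  for (std::size_t I = 0; I < FixedLanes; ++I)
    V[I] = C[I];
  return V;
}

// Unpredicated add across every lane of the container.
Container addAllLanes(const Container &A, const Container &B) {
  Container R{};
  for (std::size_t I = 0; I < ContainerLanes; ++I)
    R[I] = static_cast<uint16_t>(A[I] + B[I]);
  return R;
}

// Fixed-length add expressed as: widen, operate on all lanes, narrow.
Fixed addViaContainer(const Fixed &A, const Fixed &B, uint16_t Junk) {
  return fromContainer(addAllLanes(toContainer(A, Junk), toContainer(B, Junk)));
}
```

Because each result lane of an add depends only on the matching input lanes, the contents of the upper lanes never reach the extracted result, whatever value `Junk` takes. An operation that could trap or otherwise observe the extra lanes would not fit this pattern and would need a predicate instead.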

[570 lines not shown]

llvm/test/CodeGen/AArch64/sve-streaming-mode-fixed-length-int-arith.ll

This file was added.

				; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
				; RUN: llc -force-streaming-compatible-sve < %s | FileCheck %s

				target triple = "aarch64-unknown-linux-gnu"

				;
				; ADD
				;
				define <4 x i8> @add_v4i8(<4 x i8> %op1, <4 x i8> %op2) #0 {
				; CHECK-LABEL: add_v4i8:
				; CHECK: // %bb.0:
				; CHECK-NEXT: // kill: def $d0 killed $d0 def $z0
				; CHECK-NEXT: // kill: def $d1 killed $d1 def $z1
				david-arm: Could you also add a test for an illegal NEON type too, i.e. `<4 x i8>` or `<2 x i16>`?
				; CHECK-NEXT: add z0.h, z0.h, z1.h
				; CHECK-NEXT: // kill: def $d0 killed $d0 killed $z0
				; CHECK-NEXT: ret
				%res = add <4 x i8> %op1, %op2
				ret <4 x i8> %res
				}

				define <8 x i8> @add_v8i8(<8 x i8> %op1, <8 x i8> %op2) #0 {
				; CHECK-LABEL: add_v8i8:
				; CHECK: // %bb.0:
				; CHECK-NEXT: // kill: def $d0 killed $d0 def $z0
				; CHECK-NEXT: // kill: def $d1 killed $d1 def $z1
				; CHECK-NEXT: add z0.b, z0.b, z1.b
				; CHECK-NEXT: // kill: def $d0 killed $d0 killed $z0
				; CHECK-NEXT: ret
				%res = add <8 x i8> %op1, %op2
				ret <8 x i8> %res
				}

				define <16 x i8> @add_v16i8(<16 x i8> %op1, <16 x i8> %op2) #0 {
				; CHECK-LABEL: add_v16i8:
				; CHECK: // %bb.0:
				; CHECK-NEXT: // kill: def $q0 killed $q0 def $z0
				; CHECK-NEXT: // kill: def $q1 killed $q1 def $z1
				; CHECK-NEXT: add z0.b, z0.b, z1.b
				; CHECK-NEXT: // kill: def $q0 killed $q0 killed $z0
				; CHECK-NEXT: ret
				%res = add <16 x i8> %op1, %op2
				ret <16 x i8> %res
				}

				define void @add_v32i8(<32 x i8>* %a, <32 x i8>* %b) #0 {
				; CHECK-LABEL: add_v32i8:
				; CHECK: // %bb.0:
				; CHECK-NEXT: ldp q0, q1, [x0]
				; CHECK-NEXT: ldp q2, q3, [x1]
				; CHECK-NEXT: add z0.b, z0.b, z2.b
				; CHECK-NEXT: add z1.b, z1.b, z3.b
				; CHECK-NEXT: stp q0, q1, [x0]
				; CHECK-NEXT: ret
				%op1 = load <32 x i8>, <32 x i8>* %a
				%op2 = load <32 x i8>, <32 x i8>* %b
				%res = add <32 x i8> %op1, %op2
				store <32 x i8> %res, <32 x i8>* %a
				ret void
				}

				define <2 x i16> @add_v2i16(<2 x i16> %op1, <2 x i16> %op2) #0 {
				; CHECK-LABEL: add_v2i16:
				; CHECK: // %bb.0:
				; CHECK-NEXT: // kill: def $d0 killed $d0 def $z0
				; CHECK-NEXT: // kill: def $d1 killed $d1 def $z1
				; CHECK-NEXT: add z0.s, z0.s, z1.s
				; CHECK-NEXT: // kill: def $d0 killed $d0 killed $z0
				; CHECK-NEXT: ret
				%res = add <2 x i16> %op1, %op2
				ret <2 x i16> %res
				}

				define <4 x i16> @add_v4i16(<4 x i16> %op1, <4 x i16> %op2) #0 {
				; CHECK-LABEL: add_v4i16:
				; CHECK: // %bb.0:
				; CHECK-NEXT: // kill: def $d0 killed $d0 def $z0
				; CHECK-NEXT: // kill: def $d1 killed $d1 def $z1
				; CHECK-NEXT: add z0.h, z0.h, z1.h
				; CHECK-NEXT: // kill: def $d0 killed $d0 killed $z0
				; CHECK-NEXT: ret
				%res = add <4 x i16> %op1, %op2
				ret <4 x i16> %res
				}

				define <8 x i16> @add_v8i16(<8 x i16> %op1, <8 x i16> %op2) #0 {
				; CHECK-LABEL: add_v8i16:
				; CHECK: // %bb.0:
				; CHECK-NEXT: // kill: def $q0 killed $q0 def $z0
				; CHECK-NEXT: // kill: def $q1 killed $q1 def $z1
				; CHECK-NEXT: add z0.h, z0.h, z1.h
				; CHECK-NEXT: // kill: def $q0 killed $q0 killed $z0
				; CHECK-NEXT: ret
				david-arm: Again, this is illegal in streaming mode.
				david-arm: Please ignore this comment! `stp q0, q1` is legal - my mistake!
				%res = add <8 x i16> %op1, %op2
				sdesmalen: I think this test can be removed, because you've already covered the "twice as wide" case (32 x i8), which ensures we don't emit any other instructions not valid in streaming mode. The "four times as wide" case should already be covered by `sve-fixed-length-int-arith.ll`.
				ret <8 x i16> %res
				}

				define void @add_v16i16(<16 x i16>* %a, <16 x i16>* %b) #0 {
				; CHECK-LABEL: add_v16i16:
				; CHECK: // %bb.0:
				; CHECK-NEXT: ldp q0, q1, [x0]
				; CHECK-NEXT: ldp q2, q3, [x1]
				; CHECK-NEXT: add z0.h, z0.h, z2.h
				; CHECK-NEXT: add z1.h, z1.h, z3.h
				; CHECK-NEXT: stp q0, q1, [x0]
				; CHECK-NEXT: ret
				%op1 = load <16 x i16>, <16 x i16>* %a
				%op2 = load <16 x i16>, <16 x i16>* %b
				%res = add <16 x i16> %op1, %op2
				store <16 x i16> %res, <16 x i16>* %a
				ret void
				}

				define <2 x i32> @add_v2i32(<2 x i32> %op1, <2 x i32> %op2) #0 {
				; CHECK-LABEL: add_v2i32:
				; CHECK: // %bb.0:
				; CHECK-NEXT: // kill: def $d0 killed $d0 def $z0
				; CHECK-NEXT: // kill: def $d1 killed $d1 def $z1
				; CHECK-NEXT: add z0.s, z0.s, z1.s
				; CHECK-NEXT: // kill: def $d0 killed $d0 killed $z0
				; CHECK-NEXT: ret
				%res = add <2 x i32> %op1, %op2
				ret <2 x i32> %res
				}

				define <4 x i32> @add_v4i32(<4 x i32> %op1, <4 x i32> %op2) #0 {
				; CHECK-LABEL: add_v4i32:
				; CHECK: // %bb.0:
				; CHECK-NEXT: // kill: def $q0 killed $q0 def $z0
				; CHECK-NEXT: // kill: def $q1 killed $q1 def $z1
				; CHECK-NEXT: add z0.s, z0.s, z1.s
				; CHECK-NEXT: // kill: def $q0 killed $q0 killed $z0
				; CHECK-NEXT: ret
				%res = add <4 x i32> %op1, %op2
				ret <4 x i32> %res
				}

				define void @add_v8i32(<8 x i32>* %a, <8 x i32>* %b) #0 {
				; CHECK-LABEL: add_v8i32:
				; CHECK: // %bb.0:
				; CHECK-NEXT: ldp q0, q1, [x0]
				; CHECK-NEXT: ldp q2, q3, [x1]
				; CHECK-NEXT: add z0.s, z0.s, z2.s
				; CHECK-NEXT: add z1.s, z1.s, z3.s
				; CHECK-NEXT: stp q0, q1, [x0]
				; CHECK-NEXT: ret
				%op1 = load <8 x i32>, <8 x i32>* %a
				%op2 = load <8 x i32>, <8 x i32>* %b
				%res = add <8 x i32> %op1, %op2
				store <8 x i32> %res, <8 x i32>* %a
				ret void
				}
				sdesmalen: This test can be removed for the same reason as mentioned above.

				define <1 x i64> @add_v1i64(<1 x i64> %op1, <1 x i64> %op2) #0 {
				; CHECK-LABEL: add_v1i64:
				; CHECK: // %bb.0:
				; CHECK-NEXT: // kill: def $d0 killed $d0 def $z0
				; CHECK-NEXT: // kill: def $d1 killed $d1 def $z1
				; CHECK-NEXT: add z0.d, z0.d, z1.d
				; CHECK-NEXT: // kill: def $d0 killed $d0 killed $z0
				; CHECK-NEXT: ret
				%res = add <1 x i64> %op1, %op2
				ret <1 x i64> %res
				}

				define <2 x i64> @add_v2i64(<2 x i64> %op1, <2 x i64> %op2) #0 {
				; CHECK-LABEL: add_v2i64:
				; CHECK: // %bb.0:
				; CHECK-NEXT: // kill: def $q0 killed $q0 def $z0
				; CHECK-NEXT: // kill: def $q1 killed $q1 def $z1
				; CHECK-NEXT: add z0.d, z0.d, z1.d
				; CHECK-NEXT: // kill: def $q0 killed $q0 killed $z0
				; CHECK-NEXT: ret
				%res = add <2 x i64> %op1, %op2
				ret <2 x i64> %res
				}

				define void @add_v4i64(<4 x i64>* %a, <4 x i64>* %b) #0 {
				; CHECK-LABEL: add_v4i64:
				; CHECK: // %bb.0:
				; CHECK-NEXT: ldp q0, q1, [x0]
				; CHECK-NEXT: ldp q2, q3, [x1]
				; CHECK-NEXT: add z0.d, z0.d, z2.d
				; CHECK-NEXT: add z1.d, z1.d, z3.d
				; CHECK-NEXT: stp q0, q1, [x0]
				; CHECK-NEXT: ret
				%op1 = load <4 x i64>, <4 x i64>* %a
				%op2 = load <4 x i64>, <4 x i64>* %b
				%res = add <4 x i64> %op1, %op2
				store <4 x i64> %res, <4 x i64>* %a
				ret void
				}

				;
				; MUL
				;

				define <4 x i8> @mul_v4i8(<4 x i8> %op1, <4 x i8> %op2) #0 {
				; CHECK-LABEL: mul_v4i8:
				; CHECK: // %bb.0:
				; CHECK-NEXT: // kill: def $d0 killed $d0 def $z0
				; CHECK-NEXT: ptrue p0.h, vl4
				; CHECK-NEXT: // kill: def $d1 killed $d1 def $z1
				; CHECK-NEXT: mul z0.h, p0/m, z0.h, z1.h
				; CHECK-NEXT: // kill: def $d0 killed $d0 killed $z0
				; CHECK-NEXT: ret
				%res = mul <4 x i8> %op1, %op2
				ret <4 x i8> %res
				}

				define <8 x i8> @mul_v8i8(<8 x i8> %op1, <8 x i8> %op2) #0 {
				; CHECK-LABEL: mul_v8i8:
				; CHECK: // %bb.0:
				; CHECK-NEXT: // kill: def $d0 killed $d0 def $z0
				; CHECK-NEXT: ptrue p0.b, vl8
				; CHECK-NEXT: // kill: def $d1 killed $d1 def $z1
				; CHECK-NEXT: mul z0.b, p0/m, z0.b, z1.b
				; CHECK-NEXT: // kill: def $d0 killed $d0 killed $z0
				; CHECK-NEXT: ret
				%res = mul <8 x i8> %op1, %op2
				ret <8 x i8> %res
				}

				define <16 x i8> @mul_v16i8(<16 x i8> %op1, <16 x i8> %op2) #0 {
				; CHECK-LABEL: mul_v16i8:
				sdesmalen: This test can be removed for the same reason as mentioned above.
				; CHECK: // %bb.0:
				; CHECK-NEXT: // kill: def $q0 killed $q0 def $z0
				; CHECK-NEXT: ptrue p0.b, vl16
				; CHECK-NEXT: // kill: def $q1 killed $q1 def $z1
				; CHECK-NEXT: mul z0.b, p0/m, z0.b, z1.b
				; CHECK-NEXT: // kill: def $q0 killed $q0 killed $z0
				; CHECK-NEXT: ret
				%res = mul <16 x i8> %op1, %op2
				ret <16 x i8> %res
				}

				define void @mul_v32i8(<32 x i8>* %a, <32 x i8>* %b) #0 {
				; CHECK-LABEL: mul_v32i8:
				; CHECK: // %bb.0:
				; CHECK-NEXT: ldp q0, q1, [x0]
				; CHECK-NEXT: ptrue p0.b, vl16
				; CHECK-NEXT: ldp q2, q3, [x1]
				; CHECK-NEXT: mul z0.b, p0/m, z0.b, z2.b
				; CHECK-NEXT: mul z1.b, p0/m, z1.b, z3.b
				; CHECK-NEXT: stp q0, q1, [x0]
				; CHECK-NEXT: ret
				%op1 = load <32 x i8>, <32 x i8>* %a
				%op2 = load <32 x i8>, <32 x i8>* %b
				%res = mul <32 x i8> %op1, %op2
				store <32 x i8> %res, <32 x i8>* %a
				ret void
				}

				define <2 x i16> @mul_v2i16(<2 x i16> %op1, <2 x i16> %op2) #0 {
				; CHECK-LABEL: mul_v2i16:
				; CHECK: // %bb.0:
				; CHECK-NEXT: // kill: def $d0 killed $d0 def $z0
				; CHECK-NEXT: ptrue p0.s, vl2
				; CHECK-NEXT: // kill: def $d1 killed $d1 def $z1
				; CHECK-NEXT: mul z0.s, p0/m, z0.s, z1.s
				; CHECK-NEXT: // kill: def $d0 killed $d0 killed $z0
				; CHECK-NEXT: ret
				%res = mul <2 x i16> %op1, %op2
				ret <2 x i16> %res
				}

				define <4 x i16> @mul_v4i16(<4 x i16> %op1, <4 x i16> %op2) #0 {
				; CHECK-LABEL: mul_v4i16:
				; CHECK: // %bb.0:
				; CHECK-NEXT: // kill: def $d0 killed $d0 def $z0
				; CHECK-NEXT: ptrue p0.h, vl4
				; CHECK-NEXT: // kill: def $d1 killed $d1 def $z1
				; CHECK-NEXT: mul z0.h, p0/m, z0.h, z1.h
				; CHECK-NEXT: // kill: def $d0 killed $d0 killed $z0
				; CHECK-NEXT: ret
				%res = mul <4 x i16> %op1, %op2
				ret <4 x i16> %res
				}

				define <8 x i16> @mul_v8i16(<8 x i16> %op1, <8 x i16> %op2) #0 {
				; CHECK-LABEL: mul_v8i16:
				; CHECK: // %bb.0:
				; CHECK-NEXT: // kill: def $q0 killed $q0 def $z0
				; CHECK-NEXT: ptrue p0.h, vl8
				; CHECK-NEXT: // kill: def $q1 killed $q1 def $z1
				; CHECK-NEXT: mul z0.h, p0/m, z0.h, z1.h
				; CHECK-NEXT: // kill: def $q0 killed $q0 killed $z0
				; CHECK-NEXT: ret
				%res = mul <8 x i16> %op1, %op2
				ret <8 x i16> %res
				}

				define void @mul_v16i16(<16 x i16>* %a, <16 x i16>* %b) #0 {
				; CHECK-LABEL: mul_v16i16:
				; CHECK: // %bb.0:
				; CHECK-NEXT: ldp q0, q1, [x0]
				; CHECK-NEXT: ptrue p0.h, vl8
				; CHECK-NEXT: ldp q2, q3, [x1]
				sdesmalen: This test can be removed for the same reason as mentioned above. (The same applies to all other "4 x as wide" instances in the remainder of this file and in the other files in this patch.)
				; CHECK-NEXT: mul z0.h, p0/m, z0.h, z2.h
				; CHECK-NEXT: mul z1.h, p0/m, z1.h, z3.h
				; CHECK-NEXT: stp q0, q1, [x0]
				; CHECK-NEXT: ret
				%op1 = load <16 x i16>, <16 x i16>* %a
				%op2 = load <16 x i16>, <16 x i16>* %b
				%res = mul <16 x i16> %op1, %op2
				store <16 x i16> %res, <16 x i16>* %a
				ret void
				}

				define <2 x i32> @mul_v2i32(<2 x i32> %op1, <2 x i32> %op2) #0 {
				; CHECK-LABEL: mul_v2i32:
				; CHECK: // %bb.0:
				; CHECK-NEXT: // kill: def $d0 killed $d0 def $z0
				; CHECK-NEXT: ptrue p0.s, vl2
				; CHECK-NEXT: // kill: def $d1 killed $d1 def $z1
				; CHECK-NEXT: mul z0.s, p0/m, z0.s, z1.s
				; CHECK-NEXT: // kill: def $d0 killed $d0 killed $z0
				; CHECK-NEXT: ret
				%res = mul <2 x i32> %op1, %op2
				ret <2 x i32> %res
				}

				define <4 x i32> @mul_v4i32(<4 x i32> %op1, <4 x i32> %op2) #0 {
				; CHECK-LABEL: mul_v4i32:
				; CHECK: // %bb.0:
				; CHECK-NEXT: // kill: def $q0 killed $q0 def $z0
				; CHECK-NEXT: ptrue p0.s, vl4
				; CHECK-NEXT: // kill: def $q1 killed $q1 def $z1
				; CHECK-NEXT: mul z0.s, p0/m, z0.s, z1.s
				; CHECK-NEXT: // kill: def $q0 killed $q0 killed $z0
				; CHECK-NEXT: ret
				%res = mul <4 x i32> %op1, %op2
				ret <4 x i32> %res
				}

				define void @mul_v8i32(<8 x i32>* %a, <8 x i32>* %b) #0 {
				; CHECK-LABEL: mul_v8i32:
				; CHECK: // %bb.0:
				; CHECK-NEXT: ldp q0, q1, [x0]
				; CHECK-NEXT: ptrue p0.s, vl4
				; CHECK-NEXT: ldp q2, q3, [x1]
				; CHECK-NEXT: mul z0.s, p0/m, z0.s, z2.s
				; CHECK-NEXT: mul z1.s, p0/m, z1.s, z3.s
				; CHECK-NEXT: stp q0, q1, [x0]
				; CHECK-NEXT: ret
				%op1 = load <8 x i32>, <8 x i32>* %a
				%op2 = load <8 x i32>, <8 x i32>* %b
				%res = mul <8 x i32> %op1, %op2
				store <8 x i32> %res, <8 x i32>* %a
				ret void
				}

				define <1 x i64> @mul_v1i64(<1 x i64> %op1, <1 x i64> %op2) #0 {
				; CHECK-LABEL: mul_v1i64:
				; CHECK: // %bb.0:
				; CHECK-NEXT: // kill: def $d0 killed $d0 def $z0
				; CHECK-NEXT: ptrue p0.d, vl1
				; CHECK-NEXT: // kill: def $d1 killed $d1 def $z1
				; CHECK-NEXT: mul z0.d, p0/m, z0.d, z1.d
				; CHECK-NEXT: // kill: def $d0 killed $d0 killed $z0
				; CHECK-NEXT: ret
				%res = mul <1 x i64> %op1, %op2
				ret <1 x i64> %res
				}

				define <2 x i64> @mul_v2i64(<2 x i64> %op1, <2 x i64> %op2) #0 {
				; CHECK-LABEL: mul_v2i64:
				; CHECK: // %bb.0:
				; CHECK-NEXT: // kill: def $q0 killed $q0 def $z0
				; CHECK-NEXT: ptrue p0.d, vl2
				; CHECK-NEXT: // kill: def $q1 killed $q1 def $z1
				; CHECK-NEXT: mul z0.d, p0/m, z0.d, z1.d
				; CHECK-NEXT: // kill: def $q0 killed $q0 killed $z0
				; CHECK-NEXT: ret
				%res = mul <2 x i64> %op1, %op2
				ret <2 x i64> %res
				}

				define void @mul_v4i64(<4 x i64>* %a, <4 x i64>* %b) #0 {
				; CHECK-LABEL: mul_v4i64:
				; CHECK: // %bb.0:
				; CHECK-NEXT: ldp q0, q1, [x0]
				; CHECK-NEXT: ptrue p0.d, vl2
				; CHECK-NEXT: ldp q2, q3, [x1]
				; CHECK-NEXT: mul z0.d, p0/m, z0.d, z2.d
				; CHECK-NEXT: mul z1.d, p0/m, z1.d, z3.d
				; CHECK-NEXT: stp q0, q1, [x0]
				; CHECK-NEXT: ret
				%op1 = load <4 x i64>, <4 x i64>* %a
				%op2 = load <4 x i64>, <4 x i64>* %b
				%res = mul <4 x i64> %op1, %op2
				store <4 x i64> %res, <4 x i64>* %a
				ret void
				}

				;
				; SUB
				;

				define <4 x i8> @sub_v4i8(<4 x i8> %op1, <4 x i8> %op2) #0 {
				; CHECK-LABEL: sub_v4i8:
				; CHECK: // %bb.0:
				; CHECK-NEXT: // kill: def $d0 killed $d0 def $z0
				; CHECK-NEXT: // kill: def $d1 killed $d1 def $z1
				; CHECK-NEXT: sub z0.h, z0.h, z1.h
				; CHECK-NEXT: // kill: def $d0 killed $d0 killed $z0
				; CHECK-NEXT: ret
				%res = sub <4 x i8> %op1, %op2
				ret <4 x i8> %res
				}

				define <8 x i8> @sub_v8i8(<8 x i8> %op1, <8 x i8> %op2) #0 {
				; CHECK-LABEL: sub_v8i8:
				; CHECK: // %bb.0:
				; CHECK-NEXT: // kill: def $d0 killed $d0 def $z0
				; CHECK-NEXT: // kill: def $d1 killed $d1 def $z1
				; CHECK-NEXT: sub z0.b, z0.b, z1.b
				; CHECK-NEXT: // kill: def $d0 killed $d0 killed $z0
				; CHECK-NEXT: ret
				%res = sub <8 x i8> %op1, %op2
				ret <8 x i8> %res
				}

				define <16 x i8> @sub_v16i8(<16 x i8> %op1, <16 x i8> %op2) #0 {
				; CHECK-LABEL: sub_v16i8:
				; CHECK: // %bb.0:
				; CHECK-NEXT: // kill: def $q0 killed $q0 def $z0
				; CHECK-NEXT: // kill: def $q1 killed $q1 def $z1
				; CHECK-NEXT: sub z0.b, z0.b, z1.b
				; CHECK-NEXT: // kill: def $q0 killed $q0 killed $z0
				; CHECK-NEXT: ret
				%res = sub <16 x i8> %op1, %op2
				ret <16 x i8> %res
				}

				define void @sub_v32i8(<32 x i8>* %a, <32 x i8>* %b) #0 {
				; CHECK-LABEL: sub_v32i8:
				; CHECK: // %bb.0:
				; CHECK-NEXT: ldp q0, q1, [x0]
				; CHECK-NEXT: ldp q2, q3, [x1]
				; CHECK-NEXT: sub z0.b, z0.b, z2.b
				; CHECK-NEXT: sub z1.b, z1.b, z3.b
				; CHECK-NEXT: stp q0, q1, [x0]
				; CHECK-NEXT: ret
				%op1 = load <32 x i8>, <32 x i8>* %a
				%op2 = load <32 x i8>, <32 x i8>* %b
				%res = sub <32 x i8> %op1, %op2
				store <32 x i8> %res, <32 x i8>* %a
				ret void
				}

				define <2 x i16> @sub_v2i16(<2 x i16> %op1, <2 x i16> %op2) #0 {
				; CHECK-LABEL: sub_v2i16:
				; CHECK: // %bb.0:
				; CHECK-NEXT: // kill: def $d0 killed $d0 def $z0
				; CHECK-NEXT: // kill: def $d1 killed $d1 def $z1
				; CHECK-NEXT: sub z0.s, z0.s, z1.s
				; CHECK-NEXT: // kill: def $d0 killed $d0 killed $z0
				; CHECK-NEXT: ret
				%res = sub <2 x i16> %op1, %op2
				ret <2 x i16> %res
				}

				define <4 x i16> @sub_v4i16(<4 x i16> %op1, <4 x i16> %op2) #0 {
				; CHECK-LABEL: sub_v4i16:
				; CHECK: // %bb.0:
				; CHECK-NEXT: // kill: def $d0 killed $d0 def $z0
				; CHECK-NEXT: // kill: def $d1 killed $d1 def $z1
				; CHECK-NEXT: sub z0.h, z0.h, z1.h
				; CHECK-NEXT: // kill: def $d0 killed $d0 killed $z0
				; CHECK-NEXT: ret
				%res = sub <4 x i16> %op1, %op2
				ret <4 x i16> %res
				}
				david-arm: Again, could you add at least one illegal type - `<4 x i8>` or `<2 x i16>`?

				define <8 x i16> @sub_v8i16(<8 x i16> %op1, <8 x i16> %op2) #0 {
				; CHECK-LABEL: sub_v8i16:
				; CHECK: // %bb.0:
				; CHECK-NEXT: // kill: def $q0 killed $q0 def $z0
				; CHECK-NEXT: // kill: def $q1 killed $q1 def $z1
				; CHECK-NEXT: sub z0.h, z0.h, z1.h
				; CHECK-NEXT: // kill: def $q0 killed $q0 killed $z0
				; CHECK-NEXT: ret
				%res = sub <8 x i16> %op1, %op2
				ret <8 x i16> %res
				}

				define void @sub_v16i16(<16 x i16>* %a, <16 x i16>* %b) #0 {
				; CHECK-LABEL: sub_v16i16:
				; CHECK: // %bb.0:
				; CHECK-NEXT: ldp q0, q1, [x0]
				; CHECK-NEXT: ldp q2, q3, [x1]
				; CHECK-NEXT: sub z0.h, z0.h, z2.h
				; CHECK-NEXT: sub z1.h, z1.h, z3.h
				; CHECK-NEXT: stp q0, q1, [x0]
				; CHECK-NEXT: ret
				%op1 = load <16 x i16>, <16 x i16>* %a
				%op2 = load <16 x i16>, <16 x i16>* %b
				%res = sub <16 x i16> %op1, %op2
				store <16 x i16> %res, <16 x i16>* %a
				ret void
				}

				define <2 x i32> @sub_v2i32(<2 x i32> %op1, <2 x i32> %op2) #0 {
				; CHECK-LABEL: sub_v2i32:
				; CHECK: // %bb.0:
				; CHECK-NEXT: // kill: def $d0 killed $d0 def $z0
				; CHECK-NEXT: // kill: def $d1 killed $d1 def $z1
				; CHECK-NEXT: sub z0.s, z0.s, z1.s
				; CHECK-NEXT: // kill: def $d0 killed $d0 killed $z0
				; CHECK-NEXT: ret
				%res = sub <2 x i32> %op1, %op2
				ret <2 x i32> %res
				}

				define <4 x i32> @sub_v4i32(<4 x i32> %op1, <4 x i32> %op2) #0 {
				; CHECK-LABEL: sub_v4i32:
				; CHECK: // %bb.0:
				; CHECK-NEXT: // kill: def $q0 killed $q0 def $z0
				; CHECK-NEXT: // kill: def $q1 killed $q1 def $z1
				; CHECK-NEXT: sub z0.s, z0.s, z1.s
				; CHECK-NEXT: // kill: def $q0 killed $q0 killed $z0
				; CHECK-NEXT: ret
				%res = sub <4 x i32> %op1, %op2
				ret <4 x i32> %res
				}

				define void @sub_v8i32(<8 x i32>* %a, <8 x i32>* %b) #0 {
				; CHECK-LABEL: sub_v8i32:
				; CHECK: // %bb.0:
				; CHECK-NEXT: ldp q0, q1, [x0]
				; CHECK-NEXT: ldp q2, q3, [x1]
				; CHECK-NEXT: sub z0.s, z0.s, z2.s
				; CHECK-NEXT: sub z1.s, z1.s, z3.s
				; CHECK-NEXT: stp q0, q1, [x0]
				; CHECK-NEXT: ret
				%op1 = load <8 x i32>, <8 x i32>* %a
				%op2 = load <8 x i32>, <8 x i32>* %b
				%res = sub <8 x i32> %op1, %op2
				store <8 x i32> %res, <8 x i32>* %a
				ret void
				}

				define <1 x i64> @sub_v1i64(<1 x i64> %op1, <1 x i64> %op2) #0 {
				; CHECK-LABEL: sub_v1i64:
				; CHECK: // %bb.0:
				; CHECK-NEXT: // kill: def $d0 killed $d0 def $z0
				; CHECK-NEXT: // kill: def $d1 killed $d1 def $z1
				; CHECK-NEXT: sub z0.d, z0.d, z1.d
				; CHECK-NEXT: // kill: def $d0 killed $d0 killed $z0
				; CHECK-NEXT: ret
				%res = sub <1 x i64> %op1, %op2
				ret <1 x i64> %res
				}

				define <2 x i64> @sub_v2i64(<2 x i64> %op1, <2 x i64> %op2) #0 {
				; CHECK-LABEL: sub_v2i64:
				; CHECK: // %bb.0:
				; CHECK-NEXT: // kill: def $q0 killed $q0 def $z0
				; CHECK-NEXT: // kill: def $q1 killed $q1 def $z1
				; CHECK-NEXT: sub z0.d, z0.d, z1.d
				; CHECK-NEXT: // kill: def $q0 killed $q0 killed $z0
				; CHECK-NEXT: ret
				%res = sub <2 x i64> %op1, %op2
				ret <2 x i64> %res
				}

				define void @sub_v4i64(<4 x i64>* %a, <4 x i64>* %b) #0 {
				; CHECK-LABEL: sub_v4i64:
				; CHECK: // %bb.0:
				; CHECK-NEXT: ldp q0, q1, [x0]
				; CHECK-NEXT: ldp q2, q3, [x1]
				; CHECK-NEXT: sub z0.d, z0.d, z2.d
				; CHECK-NEXT: sub z1.d, z1.d, z3.d
				; CHECK-NEXT: stp q0, q1, [x0]
				; CHECK-NEXT: ret
				%op1 = load <4 x i64>, <4 x i64>* %a
				%op2 = load <4 x i64>, <4 x i64>* %b
				%res = sub <4 x i64> %op1, %op2
				store <4 x i64> %res, <4 x i64>* %a
				ret void
				}

				;
				; ABS
				;

				define <4 x i8> @abs_v4i8(<4 x i8> %op1) #0 {
				; CHECK-LABEL: abs_v4i8:
				; CHECK: // %bb.0:
				; CHECK-NEXT: adrp x8, .LCPI42_0
				; CHECK-NEXT: // kill: def $d0 killed $d0 def $z0
				; CHECK-NEXT: ptrue p0.h, vl4
				; CHECK-NEXT: ldr d1, [x8, :lo12:.LCPI42_0]
				; CHECK-NEXT: lsl z0.h, p0/m, z0.h, z1.h
				; CHECK-NEXT: asr z0.h, p0/m, z0.h, z1.h
				; CHECK-NEXT: abs z0.h, p0/m, z0.h
				; CHECK-NEXT: // kill: def $d0 killed $d0 killed $z0
				; CHECK-NEXT: ret
				%res = call <4 x i8> @llvm.abs.v4i8(<4 x i8> %op1, i1 false)
				ret <4 x i8> %res
				}

				define <8 x i8> @abs_v8i8(<8 x i8> %op1) #0 {
				; CHECK-LABEL: abs_v8i8:
				; CHECK: // %bb.0:
				; CHECK-NEXT: // kill: def $d0 killed $d0 def $z0
				; CHECK-NEXT: ptrue p0.b, vl8
				; CHECK-NEXT: abs z0.b, p0/m, z0.b
				; CHECK-NEXT: // kill: def $d0 killed $d0 killed $z0
				; CHECK-NEXT: ret
				%res = call <8 x i8> @llvm.abs.v8i8(<8 x i8> %op1, i1 false)
				ret <8 x i8> %res
				}

				define <16 x i8> @abs_v16i8(<16 x i8> %op1) #0 {
				; CHECK-LABEL: abs_v16i8:
				; CHECK: // %bb.0:
				; CHECK-NEXT: // kill: def $q0 killed $q0 def $z0
				; CHECK-NEXT: ptrue p0.b, vl16
				; CHECK-NEXT: abs z0.b, p0/m, z0.b
				; CHECK-NEXT: // kill: def $q0 killed $q0 killed $z0
				; CHECK-NEXT: ret
				%res = call <16 x i8> @llvm.abs.v16i8(<16 x i8> %op1, i1 false)
				ret <16 x i8> %res
				}

				define void @abs_v32i8(<32 x i8>* %a) #0 {
				; CHECK-LABEL: abs_v32i8:
				; CHECK: // %bb.0:
				; CHECK-NEXT: ldp q0, q1, [x0]
				; CHECK-NEXT: ptrue p0.b, vl16
				; CHECK-NEXT: abs z0.b, p0/m, z0.b
				; CHECK-NEXT: abs z1.b, p0/m, z1.b
				; CHECK-NEXT: stp q0, q1, [x0]
				; CHECK-NEXT: ret
				%op1 = load <32 x i8>, <32 x i8>* %a
				%res = call <32 x i8> @llvm.abs.v32i8(<32 x i8> %op1, i1 false)
				store <32 x i8> %res, <32 x i8>* %a
				ret void
				}

				define <2 x i16> @abs_v2i16(<2 x i16> %op1) #0 {
				; CHECK-LABEL: abs_v2i16:
				; CHECK: // %bb.0:
				; CHECK-NEXT: adrp x8, .LCPI46_0
				; CHECK-NEXT: // kill: def $d0 killed $d0 def $z0
				; CHECK-NEXT: ptrue p0.s, vl2
				; CHECK-NEXT: ldr d1, [x8, :lo12:.LCPI46_0]
				; CHECK-NEXT: lsl z0.s, p0/m, z0.s, z1.s
				; CHECK-NEXT: asr z0.s, p0/m, z0.s, z1.s
				; CHECK-NEXT: abs z0.s, p0/m, z0.s
				; CHECK-NEXT: // kill: def $d0 killed $d0 killed $z0
				; CHECK-NEXT: ret
				%res = call <2 x i16> @llvm.abs.v2i16(<2 x i16> %op1, i1 false)
				ret <2 x i16> %res
				}

				define <4 x i16> @abs_v4i16(<4 x i16> %op1) #0 {
				; CHECK-LABEL: abs_v4i16:
				; CHECK: // %bb.0:
				; CHECK-NEXT: // kill: def $d0 killed $d0 def $z0
				; CHECK-NEXT: ptrue p0.h, vl4
				; CHECK-NEXT: abs z0.h, p0/m, z0.h
				; CHECK-NEXT: // kill: def $d0 killed $d0 killed $z0
				; CHECK-NEXT: ret
				%res = call <4 x i16> @llvm.abs.v4i16(<4 x i16> %op1, i1 false)
				ret <4 x i16> %res
				}

				define <8 x i16> @abs_v8i16(<8 x i16> %op1) #0 {
				; CHECK-LABEL: abs_v8i16:
				; CHECK: // %bb.0:
				; CHECK-NEXT: // kill: def $q0 killed $q0 def $z0
				; CHECK-NEXT: ptrue p0.h, vl8
				; CHECK-NEXT: abs z0.h, p0/m, z0.h
				; CHECK-NEXT: // kill: def $q0 killed $q0 killed $z0
				; CHECK-NEXT: ret
				%res = call <8 x i16> @llvm.abs.v8i16(<8 x i16> %op1, i1 false)
				ret <8 x i16> %res
				}

				define void @abs_v16i16(<16 x i16>* %a) #0 {
				; CHECK-LABEL: abs_v16i16:
				; CHECK: // %bb.0:
				; CHECK-NEXT: ldp q0, q1, [x0]
				; CHECK-NEXT: ptrue p0.h, vl8
				; CHECK-NEXT: abs z0.h, p0/m, z0.h
				; CHECK-NEXT: abs z1.h, p0/m, z1.h
				; CHECK-NEXT: stp q0, q1, [x0]
				; CHECK-NEXT: ret
				%op1 = load <16 x i16>, <16 x i16>* %a
				%res = call <16 x i16> @llvm.abs.v16i16(<16 x i16> %op1, i1 false)
				store <16 x i16> %res, <16 x i16>* %a
				ret void
				}

				define <2 x i32> @abs_v2i32(<2 x i32> %op1) #0 {
				; CHECK-LABEL: abs_v2i32:
				; CHECK: // %bb.0:
				; CHECK-NEXT: // kill: def $d0 killed $d0 def $z0
				; CHECK-NEXT: ptrue p0.s, vl2
				; CHECK-NEXT: abs z0.s, p0/m, z0.s
				; CHECK-NEXT: // kill: def $d0 killed $d0 killed $z0
				; CHECK-NEXT: ret
				%res = call <2 x i32> @llvm.abs.v2i32(<2 x i32> %op1, i1 false)
				ret <2 x i32> %res
				}

				define <4 x i32> @abs_v4i32(<4 x i32> %op1) #0 {
				; CHECK-LABEL: abs_v4i32:
				; CHECK: // %bb.0:
				; CHECK-NEXT: // kill: def $q0 killed $q0 def $z0
				; CHECK-NEXT: ptrue p0.s, vl4
				; CHECK-NEXT: abs z0.s, p0/m, z0.s
				; CHECK-NEXT: // kill: def $q0 killed $q0 killed $z0
				; CHECK-NEXT: ret
				%res = call <4 x i32> @llvm.abs.v4i32(<4 x i32> %op1, i1 false)
				ret <4 x i32> %res
				}

				define void @abs_v8i32(<8 x i32>* %a) #0 {
				; CHECK-LABEL: abs_v8i32:
				; CHECK: // %bb.0:
				; CHECK-NEXT: ldp q0, q1, [x0]
				; CHECK-NEXT: ptrue p0.s, vl4
				; CHECK-NEXT: abs z0.s, p0/m, z0.s
				; CHECK-NEXT: abs z1.s, p0/m, z1.s
				; CHECK-NEXT: stp q0, q1, [x0]
				; CHECK-NEXT: ret
				%op1 = load <8 x i32>, <8 x i32>* %a
				%res = call <8 x i32> @llvm.abs.v8i32(<8 x i32> %op1, i1 false)
				store <8 x i32> %res, <8 x i32>* %a
				ret void
				}

				define <1 x i64> @abs_v1i64(<1 x i64> %op1) #0 {
				; CHECK-LABEL: abs_v1i64:
				; CHECK: // %bb.0:
				; CHECK-NEXT: // kill: def $d0 killed $d0 def $z0
				; CHECK-NEXT: ptrue p0.d, vl1
				; CHECK-NEXT: abs z0.d, p0/m, z0.d
				; CHECK-NEXT: // kill: def $d0 killed $d0 killed $z0
				; CHECK-NEXT: ret
				%res = call <1 x i64> @llvm.abs.v1i64(<1 x i64> %op1, i1 false)
				ret <1 x i64> %res
				}

				define <2 x i64> @abs_v2i64(<2 x i64> %op1) #0 {
				; CHECK-LABEL: abs_v2i64:
				; CHECK: // %bb.0:
				; CHECK-NEXT: // kill: def $q0 killed $q0 def $z0
				; CHECK-NEXT: ptrue p0.d, vl2
				; CHECK-NEXT: abs z0.d, p0/m, z0.d
				; CHECK-NEXT: // kill: def $q0 killed $q0 killed $z0
				; CHECK-NEXT: ret
				%res = call <2 x i64> @llvm.abs.v2i64(<2 x i64> %op1, i1 false)
				ret <2 x i64> %res
				}

				define void @abs_v4i64(<4 x i64>* %a) #0 {
				; CHECK-LABEL: abs_v4i64:
				; CHECK: // %bb.0:
				; CHECK-NEXT: ldp q0, q1, [x0]
				; CHECK-NEXT: ptrue p0.d, vl2
				; CHECK-NEXT: abs z0.d, p0/m, z0.d
				; CHECK-NEXT: abs z1.d, p0/m, z1.d
				; CHECK-NEXT: stp q0, q1, [x0]
				; CHECK-NEXT: ret
				%op1 = load <4 x i64>, <4 x i64>* %a
				%res = call <4 x i64> @llvm.abs.v4i64(<4 x i64> %op1, i1 false)
				store <4 x i64> %res, <4 x i64>* %a
				ret void
				}

				declare <4 x i8> @llvm.abs.v4i8(<4 x i8>, i1)
				declare <8 x i8> @llvm.abs.v8i8(<8 x i8>, i1)
				declare <16 x i8> @llvm.abs.v16i8(<16 x i8>, i1)
				declare <32 x i8> @llvm.abs.v32i8(<32 x i8>, i1)
				declare <4 x i16> @llvm.abs.v4i16(<4 x i16>, i1)
				declare <2 x i16> @llvm.abs.v2i16(<2 x i16>, i1)
				declare <8 x i16> @llvm.abs.v8i16(<8 x i16>, i1)
				declare <16 x i16> @llvm.abs.v16i16(<16 x i16>, i1)
				declare <2 x i32> @llvm.abs.v2i32(<2 x i32>, i1)
				declare <4 x i32> @llvm.abs.v4i32(<4 x i32>, i1)
				declare <8 x i32> @llvm.abs.v8i32(<8 x i32>, i1)
				declare <1 x i64> @llvm.abs.v1i64(<1 x i64>, i1)
				declare <2 x i64> @llvm.abs.v2i64(<2 x i64>, i1)
				declare <4 x i64> @llvm.abs.v4i64(<4 x i64>, i1)


				attributes #0 = { "target-features"="+sve" }
				david-arm: This looks like a NEON instruction - can you investigate where this is coming from?
				david-arm: Please ignore this - my mistake!
				david-arm: Can you add an illegal NEON type such as `<2 x i16>`?

llvm/test/CodeGen/AArch64/sve-streaming-mode-fixed-length-int-div.ll

This file was added.

				; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
				; RUN: llc -force-streaming-compatible-sve < %s | FileCheck %s

				target triple = "aarch64-unknown-linux-gnu"

				;
				; SDIV
				;

				define <4 x i8> @sdiv_v4i8(<4 x i8> %op1, <4 x i8> %op2) #0 {
				; CHECK-LABEL: sdiv_v4i8:
				; CHECK: // %bb.0:
				; CHECK-NEXT: sub sp, sp, #16
				; CHECK-NEXT: .cfi_def_cfa_offset 16
				; CHECK-NEXT: adrp x8, .LCPI0_0
				; CHECK-NEXT: // kill: def $d1 killed $d1 def $z1
				; CHECK-NEXT: // kill: def $d0 killed $d0 def $z0
				; CHECK-NEXT: ptrue p0.h, vl4
				; CHECK-NEXT: ldr d2, [x8, :lo12:.LCPI0_0]
				; CHECK-NEXT: lsl z0.h, p0/m, z0.h, z2.h
				; CHECK-NEXT: lsl z1.h, p0/m, z1.h, z2.h
				; CHECK-NEXT: asr z0.h, p0/m, z0.h, z2.h
				; CHECK-NEXT: asr z1.h, p0/m, z1.h, z2.h
				; CHECK-NEXT: ptrue p0.s, vl4
				; CHECK-NEXT: sunpklo z1.s, z1.h
				; CHECK-NEXT: sunpklo z0.s, z0.h
				; CHECK-NEXT: sdiv z0.s, p0/m, z0.s, z1.s
				; CHECK-NEXT: fmov w8, s0
				; CHECK-NEXT: mov z1.s, z0.s[3]
				; CHECK-NEXT: mov z2.s, z0.s[2]
				; CHECK-NEXT: mov z0.s, z0.s[1]
				; CHECK-NEXT: fmov w9, s1
				; CHECK-NEXT: fmov w10, s2
				; CHECK-NEXT: strh w8, [sp, #8]
				; CHECK-NEXT: fmov w8, s0
				; CHECK-NEXT: strh w9, [sp, #14]
				; CHECK-NEXT: strh w10, [sp, #12]
				; CHECK-NEXT: strh w8, [sp, #10]
				; CHECK-NEXT: ldr d0, [sp, #8]
				; CHECK-NEXT: add sp, sp, #16
				; CHECK-NEXT: ret
				%res = sdiv <4 x i8> %op1, %op2
				ret <4 x i8> %res
				}

				define <8 x i8> @sdiv_v8i8(<8 x i8> %op1, <8 x i8> %op2) #0 {
				; CHECK-LABEL: sdiv_v8i8:
				; CHECK: // %bb.0:
				; CHECK-NEXT: sub sp, sp, #16
				; CHECK-NEXT: .cfi_def_cfa_offset 16
				; CHECK-NEXT: // kill: def $d1 killed $d1 def $z1
				; CHECK-NEXT: // kill: def $d0 killed $d0 def $z0
				; CHECK-NEXT: ptrue p0.s, vl4
				; CHECK-NEXT: sunpklo z1.h, z1.b
				; CHECK-NEXT: sunpklo z0.h, z0.b
				; CHECK-NEXT: sunpkhi z2.s, z1.h
				; CHECK-NEXT: sunpkhi z3.s, z0.h
				; CHECK-NEXT: sunpklo z1.s, z1.h
				; CHECK-NEXT: sunpklo z0.s, z0.h
				; CHECK-NEXT: sdivr z2.s, p0/m, z2.s, z3.s
				; CHECK-NEXT: sdiv z0.s, p0/m, z0.s, z1.s
				; CHECK-NEXT: uzp1 z0.h, z0.h, z2.h
				; CHECK-NEXT: mov z1.h, z0.h[7]
				; CHECK-NEXT: fmov w8, s0
				; CHECK-NEXT: fmov w9, s1
				; CHECK-NEXT: mov z2.h, z0.h[6]
				; CHECK-NEXT: mov z3.h, z0.h[5]
				; CHECK-NEXT: mov z4.h, z0.h[4]
				; CHECK-NEXT: fmov w10, s2
				; CHECK-NEXT: strb w8, [sp, #8]
				; CHECK-NEXT: fmov w8, s3
				; CHECK-NEXT: strb w9, [sp, #15]
				; CHECK-NEXT: fmov w9, s4
				; CHECK-NEXT: mov z5.h, z0.h[3]
				; CHECK-NEXT: mov z6.h, z0.h[2]
				; CHECK-NEXT: mov z0.h, z0.h[1]
				; CHECK-NEXT: strb w10, [sp, #14]
				; CHECK-NEXT: fmov w10, s5
				; CHECK-NEXT: strb w8, [sp, #13]
				; CHECK-NEXT: fmov w8, s6
				; CHECK-NEXT: strb w9, [sp, #12]
				; CHECK-NEXT: fmov w9, s0
				; CHECK-NEXT: strb w10, [sp, #11]
				; CHECK-NEXT: strb w8, [sp, #10]
				; CHECK-NEXT: strb w9, [sp, #9]
				; CHECK-NEXT: ldr d0, [sp, #8]
				; CHECK-NEXT: add sp, sp, #16
				; CHECK-NEXT: ret
				%res = sdiv <8 x i8> %op1, %op2
				ret <8 x i8> %res
				}

				define <16 x i8> @sdiv_v16i8(<16 x i8> %op1, <16 x i8> %op2) #0 {
				; CHECK-LABEL: sdiv_v16i8:
				; CHECK: // %bb.0:
				; CHECK-NEXT: // kill: def $q1 killed $q1 def $z1
				; CHECK-NEXT: // kill: def $q0 killed $q0 def $z0
				; CHECK-NEXT: sunpkhi z2.h, z1.b
				; CHECK-NEXT: sunpkhi z3.h, z0.b
				; CHECK-NEXT: ptrue p0.s, vl4
				; CHECK-NEXT: sunpklo z1.h, z1.b
				; CHECK-NEXT: sunpkhi z4.s, z2.h
				; CHECK-NEXT: sunpkhi z5.s, z3.h
				; CHECK-NEXT: sunpklo z2.s, z2.h
				; CHECK-NEXT: sunpklo z3.s, z3.h
				; CHECK-NEXT: sunpklo z0.h, z0.b
				; CHECK-NEXT: sdivr z4.s, p0/m, z4.s, z5.s
				; CHECK-NEXT: sdivr z2.s, p0/m, z2.s, z3.s
				; CHECK-NEXT: sunpkhi z3.s, z1.h
				; CHECK-NEXT: sunpkhi z5.s, z0.h
				; CHECK-NEXT: sunpklo z1.s, z1.h
				; CHECK-NEXT: sunpklo z0.s, z0.h
				; CHECK-NEXT: sdivr z3.s, p0/m, z3.s, z5.s
				; CHECK-NEXT: sdiv z0.s, p0/m, z0.s, z1.s
				; CHECK-NEXT: uzp1 z1.h, z2.h, z4.h
				; CHECK-NEXT: uzp1 z0.h, z0.h, z3.h
				; CHECK-NEXT: uzp1 z0.b, z0.b, z1.b
				; CHECK-NEXT: // kill: def $q0 killed $q0 killed $z0
				; CHECK-NEXT: ret
				%res = sdiv <16 x i8> %op1, %op2
				ret <16 x i8> %res
				}

				define void @sdiv_v32i8(<32 x i8>* %a, <32 x i8>* %b) #0 {
				; CHECK-LABEL: sdiv_v32i8:
				; CHECK: // %bb.0:
				; CHECK-NEXT: ldp q3, q0, [x1]
				; CHECK-NEXT: ptrue p0.s, vl4
				; CHECK-NEXT: ldp q2, q1, [x0]
				; CHECK-NEXT: sunpkhi z4.h, z0.b
				; CHECK-NEXT: sunpklo z0.h, z0.b
				; CHECK-NEXT: sunpkhi z6.s, z4.h
				; CHECK-NEXT: sunpklo z4.s, z4.h
				; CHECK-NEXT: sunpkhi z16.s, z0.h
				; CHECK-NEXT: sunpklo z0.s, z0.h
				; CHECK-NEXT: sunpkhi z5.h, z1.b
				; CHECK-NEXT: sunpklo z1.h, z1.b
				; CHECK-NEXT: sunpkhi z7.s, z5.h
				; CHECK-NEXT: sunpklo z5.s, z5.h
				; CHECK-NEXT: sdivr z6.s, p0/m, z6.s, z7.s
				; CHECK-NEXT: sdivr z4.s, p0/m, z4.s, z5.s
				; CHECK-NEXT: sunpkhi z5.s, z1.h
				; CHECK-NEXT: sunpklo z1.s, z1.h
				; CHECK-NEXT: uzp1 z4.h, z4.h, z6.h
				; CHECK-NEXT: sdivr z0.s, p0/m, z0.s, z1.s
				; CHECK-NEXT: sunpkhi z1.h, z3.b
				; CHECK-NEXT: sunpkhi z6.h, z2.b
				; CHECK-NEXT: sdiv z5.s, p0/m, z5.s, z16.s
				; CHECK-NEXT: sunpkhi z7.s, z1.h
				; CHECK-NEXT: sunpkhi z16.s, z6.h
				; CHECK-NEXT: sunpklo z1.s, z1.h
				; CHECK-NEXT: sunpklo z6.s, z6.h
				; CHECK-NEXT: sunpklo z3.h, z3.b
				; CHECK-NEXT: sunpklo z2.h, z2.b
				; CHECK-NEXT: sdivr z7.s, p0/m, z7.s, z16.s
				; CHECK-NEXT: sdivr z1.s, p0/m, z1.s, z6.s
				; CHECK-NEXT: sunpkhi z6.s, z3.h
				; CHECK-NEXT: sunpkhi z16.s, z2.h
				; CHECK-NEXT: sunpklo z3.s, z3.h
				; CHECK-NEXT: sunpklo z2.s, z2.h
				; CHECK-NEXT: sdivr z6.s, p0/m, z6.s, z16.s
				; CHECK-NEXT: sdiv z2.s, p0/m, z2.s, z3.s
				; CHECK-NEXT: uzp1 z1.h, z1.h, z7.h
				; CHECK-NEXT: uzp1 z2.h, z2.h, z6.h
				; CHECK-NEXT: uzp1 z0.h, z0.h, z5.h
				; CHECK-NEXT: uzp1 z1.b, z2.b, z1.b
				; CHECK-NEXT: uzp1 z0.b, z0.b, z4.b
				; CHECK-NEXT: stp q1, q0, [x0]
				; CHECK-NEXT: ret
				%op1 = load <32 x i8>, <32 x i8>* %a
				%op2 = load <32 x i8>, <32 x i8>* %b
				%res = sdiv <32 x i8> %op1, %op2
				store <32 x i8> %res, <32 x i8>* %a
				ret void
				}

				define <2 x i16> @sdiv_v2i16(<2 x i16> %op1, <2 x i16> %op2) #0 {
				; CHECK-LABEL: sdiv_v2i16:
				; CHECK: // %bb.0:
				; CHECK-NEXT: adrp x8, .LCPI4_0
				; CHECK-NEXT: // kill: def $d1 killed $d1 def $z1
				; CHECK-NEXT: // kill: def $d0 killed $d0 def $z0
				; CHECK-NEXT: ptrue p0.s, vl2
				; CHECK-NEXT: ldr d2, [x8, :lo12:.LCPI4_0]
				; CHECK-NEXT: lsl z1.s, p0/m, z1.s, z2.s
				; CHECK-NEXT: lsl z0.s, p0/m, z0.s, z2.s
				; CHECK-NEXT: asr z1.s, p0/m, z1.s, z2.s
				; CHECK-NEXT: asr z0.s, p0/m, z0.s, z2.s
				; CHECK-NEXT: sdiv z0.s, p0/m, z0.s, z1.s
				; CHECK-NEXT: // kill: def $d0 killed $d0 killed $z0
				; CHECK-NEXT: ret
				%res = sdiv <2 x i16> %op1, %op2
				ret <2 x i16> %res
				}

				define <4 x i16> @sdiv_v4i16(<4 x i16> %op1, <4 x i16> %op2) #0 {
				; CHECK-LABEL: sdiv_v4i16:
				; CHECK: // %bb.0:
				; CHECK-NEXT: sub sp, sp, #16
				; CHECK-NEXT: .cfi_def_cfa_offset 16
				; CHECK-NEXT: // kill: def $d1 killed $d1 def $z1
				; CHECK-NEXT: // kill: def $d0 killed $d0 def $z0
				; CHECK-NEXT: ptrue p0.s, vl4
				; CHECK-NEXT: sunpklo z1.s, z1.h
				; CHECK-NEXT: sunpklo z0.s, z0.h
				; CHECK-NEXT: sdiv z0.s, p0/m, z0.s, z1.s
				; CHECK-NEXT: fmov w8, s0
				; CHECK-NEXT: mov z1.s, z0.s[3]
				; CHECK-NEXT: mov z2.s, z0.s[2]
				; CHECK-NEXT: mov z0.s, z0.s[1]
				; CHECK-NEXT: fmov w9, s1
				; CHECK-NEXT: fmov w10, s2
				; CHECK-NEXT: strh w8, [sp, #8]
				; CHECK-NEXT: fmov w8, s0
				; CHECK-NEXT: strh w9, [sp, #14]
				; CHECK-NEXT: strh w10, [sp, #12]
				; CHECK-NEXT: strh w8, [sp, #10]
				; CHECK-NEXT: ldr d0, [sp, #8]
				; CHECK-NEXT: add sp, sp, #16
				; CHECK-NEXT: ret
				%res = sdiv <4 x i16> %op1, %op2
				ret <4 x i16> %res
				}

				define <8 x i16> @sdiv_v8i16(<8 x i16> %op1, <8 x i16> %op2) #0 {
				; CHECK-LABEL: sdiv_v8i16:
				; CHECK: // %bb.0:
				; CHECK-NEXT: // kill: def $q1 killed $q1 def $z1
				; CHECK-NEXT: // kill: def $q0 killed $q0 def $z0
				; CHECK-NEXT: ptrue p0.s, vl4
				; CHECK-NEXT: sunpkhi z2.s, z1.h
				; CHECK-NEXT: sunpkhi z3.s, z0.h
				; CHECK-NEXT: sunpklo z1.s, z1.h
				; CHECK-NEXT: sunpklo z0.s, z0.h
				; CHECK-NEXT: sdivr z2.s, p0/m, z2.s, z3.s
				; CHECK-NEXT: sdiv z0.s, p0/m, z0.s, z1.s
				; CHECK-NEXT: uzp1 z0.h, z0.h, z2.h
				; CHECK-NEXT: // kill: def $q0 killed $q0 killed $z0
				; CHECK-NEXT: ret
				%res = sdiv <8 x i16> %op1, %op2
				ret <8 x i16> %res
				}

				define void @sdiv_v16i16(<16 x i16>* %a, <16 x i16>* %b) #0 {
				; CHECK-LABEL: sdiv_v16i16:
				; CHECK: // %bb.0:
				; CHECK-NEXT: ldp q0, q1, [x1]
				; CHECK-NEXT: ptrue p0.s, vl4
				; CHECK-NEXT: sunpkhi z6.s, z0.h
				; CHECK-NEXT: sunpklo z0.s, z0.h
				; CHECK-NEXT: ldp q3, q2, [x0]
				; CHECK-NEXT: sunpkhi z4.s, z1.h
				; CHECK-NEXT: sunpklo z1.s, z1.h
				; CHECK-NEXT: sunpkhi z5.s, z2.h
				; CHECK-NEXT: sunpklo z2.s, z2.h
				; CHECK-NEXT: sdivr z4.s, p0/m, z4.s, z5.s
				; CHECK-NEXT: sunpkhi z5.s, z3.h
				; CHECK-NEXT: sunpklo z3.s, z3.h
				; CHECK-NEXT: sdiv z5.s, p0/m, z5.s, z6.s
				; CHECK-NEXT: sdivr z0.s, p0/m, z0.s, z3.s
				; CHECK-NEXT: sdivr z1.s, p0/m, z1.s, z2.s
				; CHECK-NEXT: uzp1 z0.h, z0.h, z5.h
				; CHECK-NEXT: uzp1 z1.h, z1.h, z4.h
				; CHECK-NEXT: stp q0, q1, [x0]
				; CHECK-NEXT: ret
				%op1 = load <16 x i16>, <16 x i16>* %a
				%op2 = load <16 x i16>, <16 x i16>* %b
				%res = sdiv <16 x i16> %op1, %op2
				store <16 x i16> %res, <16 x i16>* %a
				ret void
				}

				define <2 x i32> @sdiv_v2i32(<2 x i32> %op1, <2 x i32> %op2) #0 {
				; CHECK-LABEL: sdiv_v2i32:
				; CHECK: // %bb.0:
				; CHECK-NEXT: // kill: def $d0 killed $d0 def $z0
				; CHECK-NEXT: ptrue p0.s, vl2
				; CHECK-NEXT: // kill: def $d1 killed $d1 def $z1
				; CHECK-NEXT: sdiv z0.s, p0/m, z0.s, z1.s
				; CHECK-NEXT: // kill: def $d0 killed $d0 killed $z0
				; CHECK-NEXT: ret
				%res = sdiv <2 x i32> %op1, %op2
				ret <2 x i32> %res
				}

				define <4 x i32> @sdiv_v4i32(<4 x i32> %op1, <4 x i32> %op2) #0 {
				; CHECK-LABEL: sdiv_v4i32:
				; CHECK: // %bb.0:
				; CHECK-NEXT: // kill: def $q0 killed $q0 def $z0
				; CHECK-NEXT: ptrue p0.s, vl4
				; CHECK-NEXT: // kill: def $q1 killed $q1 def $z1
				; CHECK-NEXT: sdiv z0.s, p0/m, z0.s, z1.s
				; CHECK-NEXT: // kill: def $q0 killed $q0 killed $z0
				; CHECK-NEXT: ret
				%res = sdiv <4 x i32> %op1, %op2
				ret <4 x i32> %res
				}

				define void @sdiv_v8i32(<8 x i32>* %a, <8 x i32>* %b) #0 {
				david-arm: This still has the `vscale_range(16,0)` attribute. Can you remove it and recreate the CHECK lines please?
				; CHECK-LABEL: sdiv_v8i32:
				; CHECK: // %bb.0:
				; CHECK-NEXT: ldp q0, q1, [x0]
				; CHECK-NEXT: ptrue p0.s, vl4
				; CHECK-NEXT: ldp q2, q3, [x1]
				; CHECK-NEXT: sdiv z0.s, p0/m, z0.s, z2.s
				; CHECK-NEXT: sdiv z1.s, p0/m, z1.s, z3.s
				; CHECK-NEXT: stp q0, q1, [x0]
				; CHECK-NEXT: ret
				%op1 = load <8 x i32>, <8 x i32>* %a
				%op2 = load <8 x i32>, <8 x i32>* %b
				%res = sdiv <8 x i32> %op1, %op2
				store <8 x i32> %res, <8 x i32>* %a
				ret void
				}

				define <1 x i64> @sdiv_v1i64(<1 x i64> %op1, <1 x i64> %op2) #0 {
				; CHECK-LABEL: sdiv_v1i64:
				; CHECK: // %bb.0:
				; CHECK-NEXT: // kill: def $d0 killed $d0 def $z0
				; CHECK-NEXT: ptrue p0.d, vl1
				; CHECK-NEXT: // kill: def $d1 killed $d1 def $z1
				; CHECK-NEXT: sdiv z0.d, p0/m, z0.d, z1.d
				; CHECK-NEXT: // kill: def $d0 killed $d0 killed $z0
				; CHECK-NEXT: ret
				%res = sdiv <1 x i64> %op1, %op2
				ret <1 x i64> %res
				}

				define <2 x i64> @sdiv_v2i64(<2 x i64> %op1, <2 x i64> %op2) #0 {
				; CHECK-LABEL: sdiv_v2i64:
				; CHECK: // %bb.0:
				; CHECK-NEXT: // kill: def $q0 killed $q0 def $z0
				; CHECK-NEXT: ptrue p0.d, vl2
				; CHECK-NEXT: // kill: def $q1 killed $q1 def $z1
				; CHECK-NEXT: sdiv z0.d, p0/m, z0.d, z1.d
				; CHECK-NEXT: // kill: def $q0 killed $q0 killed $z0
				; CHECK-NEXT: ret
				%res = sdiv <2 x i64> %op1, %op2
				ret <2 x i64> %res
				}

				define void @sdiv_v4i64(<4 x i64>* %a, <4 x i64>* %b) #0 {
				; CHECK-LABEL: sdiv_v4i64:
				; CHECK: // %bb.0:
				; CHECK-NEXT: ldp q0, q1, [x0]
				; CHECK-NEXT: ptrue p0.d, vl2
				; CHECK-NEXT: ldp q2, q3, [x1]
				; CHECK-NEXT: sdiv z0.d, p0/m, z0.d, z2.d
				; CHECK-NEXT: sdiv z1.d, p0/m, z1.d, z3.d
				; CHECK-NEXT: stp q0, q1, [x0]
				; CHECK-NEXT: ret
				%op1 = load <4 x i64>, <4 x i64>* %a
				%op2 = load <4 x i64>, <4 x i64>* %b
				%res = sdiv <4 x i64> %op1, %op2
				store <4 x i64> %res, <4 x i64>* %a
				ret void
				}

				;
				; UDIV
				;

				define <4 x i8> @udiv_v4i8(<4 x i8> %op1, <4 x i8> %op2) #0 {
				; CHECK-LABEL: udiv_v4i8:
				; CHECK: // %bb.0:
				; CHECK-NEXT: sub sp, sp, #16
				; CHECK-NEXT: .cfi_def_cfa_offset 16
				; CHECK-NEXT: adrp x8, .LCPI14_0
				; CHECK-NEXT: // kill: def $d1 killed $d1 def $z1
				; CHECK-NEXT: // kill: def $d0 killed $d0 def $z0
				; CHECK-NEXT: ptrue p0.s, vl4
				; CHECK-NEXT: ldr d2, [x8, :lo12:.LCPI14_0]
				; CHECK-NEXT: and z0.d, z0.d, z2.d
				; CHECK-NEXT: and z1.d, z1.d, z2.d
				; CHECK-NEXT: uunpklo z1.s, z1.h
				; CHECK-NEXT: uunpklo z0.s, z0.h
				; CHECK-NEXT: udiv z0.s, p0/m, z0.s, z1.s
				; CHECK-NEXT: fmov w8, s0
				; CHECK-NEXT: mov z1.s, z0.s[3]
				; CHECK-NEXT: mov z2.s, z0.s[2]
				; CHECK-NEXT: mov z0.s, z0.s[1]
				; CHECK-NEXT: fmov w9, s1
				; CHECK-NEXT: fmov w10, s2
				; CHECK-NEXT: strh w8, [sp, #8]
				; CHECK-NEXT: fmov w8, s0
				; CHECK-NEXT: strh w9, [sp, #14]
				; CHECK-NEXT: strh w10, [sp, #12]
				; CHECK-NEXT: strh w8, [sp, #10]
				; CHECK-NEXT: ldr d0, [sp, #8]
				; CHECK-NEXT: add sp, sp, #16
				; CHECK-NEXT: ret
				%res = udiv <4 x i8> %op1, %op2
				ret <4 x i8> %res
				}

				define <8 x i8> @udiv_v8i8(<8 x i8> %op1, <8 x i8> %op2) #0 {
				; CHECK-LABEL: udiv_v8i8:
				; CHECK: // %bb.0:
				; CHECK-NEXT: sub sp, sp, #16
				; CHECK-NEXT: .cfi_def_cfa_offset 16
				; CHECK-NEXT: // kill: def $d1 killed $d1 def $z1
				; CHECK-NEXT: // kill: def $d0 killed $d0 def $z0
				; CHECK-NEXT: ptrue p0.s, vl4
				; CHECK-NEXT: uunpklo z1.h, z1.b
				; CHECK-NEXT: uunpklo z0.h, z0.b
				; CHECK-NEXT: uunpkhi z2.s, z1.h
				; CHECK-NEXT: uunpkhi z3.s, z0.h
				; CHECK-NEXT: uunpklo z1.s, z1.h
				; CHECK-NEXT: uunpklo z0.s, z0.h
				; CHECK-NEXT: udivr z2.s, p0/m, z2.s, z3.s
				; CHECK-NEXT: udiv z0.s, p0/m, z0.s, z1.s
				; CHECK-NEXT: uzp1 z0.h, z0.h, z2.h
				; CHECK-NEXT: mov z1.h, z0.h[7]
				; CHECK-NEXT: fmov w8, s0
				; CHECK-NEXT: fmov w9, s1
				; CHECK-NEXT: mov z2.h, z0.h[6]
				; CHECK-NEXT: mov z3.h, z0.h[5]
				; CHECK-NEXT: mov z4.h, z0.h[4]
				; CHECK-NEXT: fmov w10, s2
				; CHECK-NEXT: strb w8, [sp, #8]
				; CHECK-NEXT: fmov w8, s3
				; CHECK-NEXT: strb w9, [sp, #15]
				; CHECK-NEXT: fmov w9, s4
				; CHECK-NEXT: mov z5.h, z0.h[3]
				; CHECK-NEXT: mov z6.h, z0.h[2]
				; CHECK-NEXT: mov z0.h, z0.h[1]
				; CHECK-NEXT: strb w10, [sp, #14]
				; CHECK-NEXT: fmov w10, s5
				; CHECK-NEXT: strb w8, [sp, #13]
				; CHECK-NEXT: fmov w8, s6
				; CHECK-NEXT: strb w9, [sp, #12]
				; CHECK-NEXT: fmov w9, s0
				; CHECK-NEXT: strb w10, [sp, #11]
				; CHECK-NEXT: strb w8, [sp, #10]
				; CHECK-NEXT: strb w9, [sp, #9]
				; CHECK-NEXT: ldr d0, [sp, #8]
				; CHECK-NEXT: add sp, sp, #16
				; CHECK-NEXT: ret
				%res = udiv <8 x i8> %op1, %op2
				ret <8 x i8> %res
				}

				define <16 x i8> @udiv_v16i8(<16 x i8> %op1, <16 x i8> %op2) #0 {
				; CHECK-LABEL: udiv_v16i8:
				; CHECK: // %bb.0:
				; CHECK-NEXT: // kill: def $q1 killed $q1 def $z1
				; CHECK-NEXT: // kill: def $q0 killed $q0 def $z0
				; CHECK-NEXT: uunpkhi z2.h, z1.b
				; CHECK-NEXT: uunpkhi z3.h, z0.b
				; CHECK-NEXT: ptrue p0.s, vl4
				; CHECK-NEXT: uunpklo z1.h, z1.b
				; CHECK-NEXT: uunpkhi z4.s, z2.h
				; CHECK-NEXT: uunpkhi z5.s, z3.h
				; CHECK-NEXT: uunpklo z2.s, z2.h
				; CHECK-NEXT: uunpklo z3.s, z3.h
				; CHECK-NEXT: uunpklo z0.h, z0.b
				; CHECK-NEXT: udivr z4.s, p0/m, z4.s, z5.s
				; CHECK-NEXT: udivr z2.s, p0/m, z2.s, z3.s
				; CHECK-NEXT: uunpkhi z3.s, z1.h
				; CHECK-NEXT: uunpkhi z5.s, z0.h
				; CHECK-NEXT: uunpklo z1.s, z1.h
				; CHECK-NEXT: uunpklo z0.s, z0.h
				; CHECK-NEXT: udivr z3.s, p0/m, z3.s, z5.s
				; CHECK-NEXT: udiv z0.s, p0/m, z0.s, z1.s
				; CHECK-NEXT: uzp1 z1.h, z2.h, z4.h
				; CHECK-NEXT: uzp1 z0.h, z0.h, z3.h
				; CHECK-NEXT: uzp1 z0.b, z0.b, z1.b
				; CHECK-NEXT: // kill: def $q0 killed $q0 killed $z0
				; CHECK-NEXT: ret
				%res = udiv <16 x i8> %op1, %op2
				ret <16 x i8> %res
				}

				define void @udiv_v32i8(<32 x i8>* %a, <32 x i8>* %b) #0 {
				; CHECK-LABEL: udiv_v32i8:
				; CHECK: // %bb.0:
				; CHECK-NEXT: ldp q3, q0, [x1]
				; CHECK-NEXT: ptrue p0.s, vl4
				; CHECK-NEXT: ldp q2, q1, [x0]
				; CHECK-NEXT: uunpkhi z4.h, z0.b
				; CHECK-NEXT: uunpklo z0.h, z0.b
				; CHECK-NEXT: uunpkhi z6.s, z4.h
				; CHECK-NEXT: uunpklo z4.s, z4.h
				; CHECK-NEXT: uunpkhi z16.s, z0.h
				; CHECK-NEXT: uunpklo z0.s, z0.h
				; CHECK-NEXT: uunpkhi z5.h, z1.b
				; CHECK-NEXT: uunpklo z1.h, z1.b
				; CHECK-NEXT: uunpkhi z7.s, z5.h
				; CHECK-NEXT: uunpklo z5.s, z5.h
				; CHECK-NEXT: udivr z6.s, p0/m, z6.s, z7.s
				; CHECK-NEXT: udivr z4.s, p0/m, z4.s, z5.s
				; CHECK-NEXT: uunpkhi z5.s, z1.h
				; CHECK-NEXT: uunpklo z1.s, z1.h
				; CHECK-NEXT: uzp1 z4.h, z4.h, z6.h
				; CHECK-NEXT: udivr z0.s, p0/m, z0.s, z1.s
				; CHECK-NEXT: uunpkhi z1.h, z3.b
				; CHECK-NEXT: uunpkhi z6.h, z2.b
				; CHECK-NEXT: udiv z5.s, p0/m, z5.s, z16.s
				; CHECK-NEXT: uunpkhi z7.s, z1.h
				; CHECK-NEXT: uunpkhi z16.s, z6.h
				; CHECK-NEXT: uunpklo z1.s, z1.h
				; CHECK-NEXT: uunpklo z6.s, z6.h
				; CHECK-NEXT: uunpklo z3.h, z3.b
				; CHECK-NEXT: uunpklo z2.h, z2.b
				; CHECK-NEXT: udivr z7.s, p0/m, z7.s, z16.s
				; CHECK-NEXT: udivr z1.s, p0/m, z1.s, z6.s
				; CHECK-NEXT: uunpkhi z6.s, z3.h
				; CHECK-NEXT: uunpkhi z16.s, z2.h
				; CHECK-NEXT: uunpklo z3.s, z3.h
				; CHECK-NEXT: uunpklo z2.s, z2.h
				; CHECK-NEXT: udivr z6.s, p0/m, z6.s, z16.s
				; CHECK-NEXT: udiv z2.s, p0/m, z2.s, z3.s
				; CHECK-NEXT: uzp1 z1.h, z1.h, z7.h
				; CHECK-NEXT: uzp1 z2.h, z2.h, z6.h
				; CHECK-NEXT: uzp1 z0.h, z0.h, z5.h
				; CHECK-NEXT: uzp1 z1.b, z2.b, z1.b
				; CHECK-NEXT: uzp1 z0.b, z0.b, z4.b
				; CHECK-NEXT: stp q1, q0, [x0]
				; CHECK-NEXT: ret
				%op1 = load <32 x i8>, <32 x i8>* %a
				%op2 = load <32 x i8>, <32 x i8>* %b
				%res = udiv <32 x i8> %op1, %op2
				store <32 x i8> %res, <32 x i8>* %a
				ret void
				}

				define <2 x i16> @udiv_v2i16(<2 x i16> %op1, <2 x i16> %op2) #0 {
				; CHECK-LABEL: udiv_v2i16:
				; CHECK: // %bb.0:
				; CHECK-NEXT: adrp x8, .LCPI18_0
				; CHECK-NEXT: // kill: def $d1 killed $d1 def $z1
				; CHECK-NEXT: // kill: def $d0 killed $d0 def $z0
				; CHECK-NEXT: ptrue p0.s, vl2
				; CHECK-NEXT: ldr d2, [x8, :lo12:.LCPI18_0]
				; CHECK-NEXT: and z1.d, z1.d, z2.d
				; CHECK-NEXT: and z0.d, z0.d, z2.d
				; CHECK-NEXT: udiv z0.s, p0/m, z0.s, z1.s
				; CHECK-NEXT: // kill: def $d0 killed $d0 killed $z0
				; CHECK-NEXT: ret
				%res = udiv <2 x i16> %op1, %op2
				ret <2 x i16> %res
				}

				define <4 x i16> @udiv_v4i16(<4 x i16> %op1, <4 x i16> %op2) #0 {
				; CHECK-LABEL: udiv_v4i16:
				; CHECK: // %bb.0:
				; CHECK-NEXT: sub sp, sp, #16
				; CHECK-NEXT: .cfi_def_cfa_offset 16
				; CHECK-NEXT: // kill: def $d1 killed $d1 def $z1
				; CHECK-NEXT: // kill: def $d0 killed $d0 def $z0
				; CHECK-NEXT: ptrue p0.s, vl4
				; CHECK-NEXT: uunpklo z1.s, z1.h
				; CHECK-NEXT: uunpklo z0.s, z0.h
				; CHECK-NEXT: udiv z0.s, p0/m, z0.s, z1.s
				; CHECK-NEXT: fmov w8, s0
				; CHECK-NEXT: mov z1.s, z0.s[3]
				; CHECK-NEXT: mov z2.s, z0.s[2]
				; CHECK-NEXT: mov z0.s, z0.s[1]
				; CHECK-NEXT: fmov w9, s1
				; CHECK-NEXT: fmov w10, s2
				; CHECK-NEXT: strh w8, [sp, #8]
				; CHECK-NEXT: fmov w8, s0
				; CHECK-NEXT: strh w9, [sp, #14]
				; CHECK-NEXT: strh w10, [sp, #12]
				; CHECK-NEXT: strh w8, [sp, #10]
				; CHECK-NEXT: ldr d0, [sp, #8]
				; CHECK-NEXT: add sp, sp, #16
				; CHECK-NEXT: ret
				%res = udiv <4 x i16> %op1, %op2
				ret <4 x i16> %res
				}

				define <8 x i16> @udiv_v8i16(<8 x i16> %op1, <8 x i16> %op2) #0 {
				; CHECK-LABEL: udiv_v8i16:
				; CHECK: // %bb.0:
				; CHECK-NEXT: // kill: def $q1 killed $q1 def $z1
				; CHECK-NEXT: // kill: def $q0 killed $q0 def $z0
				; CHECK-NEXT: ptrue p0.s, vl4
				; CHECK-NEXT: uunpkhi z2.s, z1.h
				; CHECK-NEXT: uunpkhi z3.s, z0.h
				; CHECK-NEXT: uunpklo z1.s, z1.h
				; CHECK-NEXT: uunpklo z0.s, z0.h
				; CHECK-NEXT: udivr z2.s, p0/m, z2.s, z3.s
				; CHECK-NEXT: udiv z0.s, p0/m, z0.s, z1.s
				; CHECK-NEXT: uzp1 z0.h, z0.h, z2.h
				; CHECK-NEXT: // kill: def $q0 killed $q0 killed $z0
				; CHECK-NEXT: ret
				%res = udiv <8 x i16> %op1, %op2
				ret <8 x i16> %res
				}

				define void @udiv_v16i16(<16 x i16>* %a, <16 x i16>* %b) #0 {
				; CHECK-LABEL: udiv_v16i16:
				; CHECK: // %bb.0:
				; CHECK-NEXT: ldp q0, q1, [x1]
				; CHECK-NEXT: ptrue p0.s, vl4
				; CHECK-NEXT: uunpkhi z6.s, z0.h
				; CHECK-NEXT: uunpklo z0.s, z0.h
				; CHECK-NEXT: ldp q3, q2, [x0]
				; CHECK-NEXT: uunpkhi z4.s, z1.h
				; CHECK-NEXT: uunpklo z1.s, z1.h
				; CHECK-NEXT: uunpkhi z5.s, z2.h
				; CHECK-NEXT: uunpklo z2.s, z2.h
				; CHECK-NEXT: udivr z4.s, p0/m, z4.s, z5.s
				; CHECK-NEXT: uunpkhi z5.s, z3.h
				; CHECK-NEXT: uunpklo z3.s, z3.h
				; CHECK-NEXT: udiv z5.s, p0/m, z5.s, z6.s
				; CHECK-NEXT: udivr z0.s, p0/m, z0.s, z3.s
				; CHECK-NEXT: udivr z1.s, p0/m, z1.s, z2.s
				; CHECK-NEXT: uzp1 z0.h, z0.h, z5.h
				; CHECK-NEXT: uzp1 z1.h, z1.h, z4.h
				; CHECK-NEXT: stp q0, q1, [x0]
				; CHECK-NEXT: ret
				%op1 = load <16 x i16>, <16 x i16>* %a
				%op2 = load <16 x i16>, <16 x i16>* %b
				%res = udiv <16 x i16> %op1, %op2
				store <16 x i16> %res, <16 x i16>* %a
				ret void
				}

				define <2 x i32> @udiv_v2i32(<2 x i32> %op1, <2 x i32> %op2) #0 {
				; CHECK-LABEL: udiv_v2i32:
				; CHECK: // %bb.0:
				; CHECK-NEXT: // kill: def $d0 killed $d0 def $z0
				; CHECK-NEXT: ptrue p0.s, vl2
				; CHECK-NEXT: // kill: def $d1 killed $d1 def $z1
				; CHECK-NEXT: udiv z0.s, p0/m, z0.s, z1.s
				; CHECK-NEXT: // kill: def $d0 killed $d0 killed $z0
				; CHECK-NEXT: ret
				%res = udiv <2 x i32> %op1, %op2
				ret <2 x i32> %res
				}

				define <4 x i32> @udiv_v4i32(<4 x i32> %op1, <4 x i32> %op2) #0 {
				; CHECK-LABEL: udiv_v4i32:
				; CHECK: // %bb.0:
				; CHECK-NEXT: // kill: def $q0 killed $q0 def $z0
				; CHECK-NEXT: ptrue p0.s, vl4
				; CHECK-NEXT: // kill: def $q1 killed $q1 def $z1
				; CHECK-NEXT: udiv z0.s, p0/m, z0.s, z1.s
				; CHECK-NEXT: // kill: def $q0 killed $q0 killed $z0
				; CHECK-NEXT: ret
				%res = udiv <4 x i32> %op1, %op2
				ret <4 x i32> %res
				}

				define void @udiv_v8i32(<8 x i32>* %a, <8 x i32>* %b) #0 {
				; CHECK-LABEL: udiv_v8i32:
				; CHECK: // %bb.0:
				; CHECK-NEXT: ldp q0, q1, [x0]
				; CHECK-NEXT: ptrue p0.s, vl4
				; CHECK-NEXT: ldp q2, q3, [x1]
				; CHECK-NEXT: udiv z0.s, p0/m, z0.s, z2.s
				; CHECK-NEXT: udiv z1.s, p0/m, z1.s, z3.s
				; CHECK-NEXT: stp q0, q1, [x0]
				; CHECK-NEXT: ret
				%op1 = load <8 x i32>, <8 x i32>* %a
				%op2 = load <8 x i32>, <8 x i32>* %b
				%res = udiv <8 x i32> %op1, %op2
				store <8 x i32> %res, <8 x i32>* %a
				ret void
				}

				define <1 x i64> @udiv_v1i64(<1 x i64> %op1, <1 x i64> %op2) #0 {
				; CHECK-LABEL: udiv_v1i64:
				; CHECK: // %bb.0:
				; CHECK-NEXT: // kill: def $d0 killed $d0 def $z0
				; CHECK-NEXT: ptrue p0.d, vl1
				; CHECK-NEXT: // kill: def $d1 killed $d1 def $z1
				; CHECK-NEXT: udiv z0.d, p0/m, z0.d, z1.d
				; CHECK-NEXT: // kill: def $d0 killed $d0 killed $z0
				; CHECK-NEXT: ret
				%res = udiv <1 x i64> %op1, %op2
				ret <1 x i64> %res
				}

				define <2 x i64> @udiv_v2i64(<2 x i64> %op1, <2 x i64> %op2) #0 {
				; CHECK-LABEL: udiv_v2i64:
				; CHECK: // %bb.0:
				; CHECK-NEXT: // kill: def $q0 killed $q0 def $z0
				; CHECK-NEXT: ptrue p0.d, vl2
				; CHECK-NEXT: // kill: def $q1 killed $q1 def $z1
				; CHECK-NEXT: udiv z0.d, p0/m, z0.d, z1.d
				; CHECK-NEXT: // kill: def $q0 killed $q0 killed $z0
				; CHECK-NEXT: ret
				%res = udiv <2 x i64> %op1, %op2
				ret <2 x i64> %res
				}

				define void @udiv_v4i64(<4 x i64>* %a, <4 x i64>* %b) #0 {
				; CHECK-LABEL: udiv_v4i64:
				; CHECK: // %bb.0:
				; CHECK-NEXT: ldp q0, q1, [x0]
				; CHECK-NEXT: ptrue p0.d, vl2
				; CHECK-NEXT: ldp q2, q3, [x1]
				; CHECK-NEXT: udiv z0.d, p0/m, z0.d, z2.d
				; CHECK-NEXT: udiv z1.d, p0/m, z1.d, z3.d
				; CHECK-NEXT: stp q0, q1, [x0]
				; CHECK-NEXT: ret
				%op1 = load <4 x i64>, <4 x i64>* %a
				%op2 = load <4 x i64>, <4 x i64>* %b
				%res = udiv <4 x i64> %op1, %op2
				store <4 x i64> %res, <4 x i64>* %a
				ret void
				}

				define void @udiv_constantsplat_v8i32(<8 x i32>* %a) #0 {
				; CHECK-LABEL: udiv_constantsplat_v8i32:
				; CHECK: // %bb.0:
				; CHECK-NEXT: adrp x8, .LCPI28_0
				; CHECK-NEXT: adrp x9, .LCPI28_1
				; CHECK-NEXT: ldp q1, q2, [x0]
				; CHECK-NEXT: ptrue p0.s, vl4
				; CHECK-NEXT: ldr q0, [x8, :lo12:.LCPI28_0]
				; CHECK-NEXT: adrp x8, .LCPI28_2
				; CHECK-NEXT: ldr q3, [x9, :lo12:.LCPI28_1]
				; CHECK-NEXT: movprfx z5, z1
				; CHECK-NEXT: umulh z5.s, p0/m, z5.s, z0.s
				; CHECK-NEXT: sub z1.s, z1.s, z5.s
				; CHECK-NEXT: umulh z0.s, p0/m, z0.s, z2.s
				; CHECK-NEXT: ldr q4, [x8, :lo12:.LCPI28_2]
				; CHECK-NEXT: sub z2.s, z2.s, z0.s
				; CHECK-NEXT: lsr z1.s, p0/m, z1.s, z3.s
				; CHECK-NEXT: lsr z2.s, p0/m, z2.s, z3.s
				; CHECK-NEXT: add z1.s, z1.s, z5.s
				; CHECK-NEXT: add z0.s, z2.s, z0.s
				; CHECK-NEXT: lsr z1.s, p0/m, z1.s, z4.s
				; CHECK-NEXT: lsr z0.s, p0/m, z0.s, z4.s
				; CHECK-NEXT: stp q1, q0, [x0]
				; CHECK-NEXT: ret
				%op1 = load <8 x i32>, <8 x i32>* %a
				%res = udiv <8 x i32> %op1, <i32 95, i32 95, i32 95, i32 95, i32 95, i32 95, i32 95, i32 95>
				store <8 x i32> %res, <8 x i32>* %a
				ret void
				}

				attributes #0 = { "target-features"="+sve" }
				david-arm: Again, this still has the `vscale_range(16,0)` attribute. Can you remove it and regenerate the CHECK lines?
				david-arm: This is a NEON vector instruction - this is definitely illegal in streaming mode. Can you try to find out why this is being inserted please?

llvm/test/CodeGen/AArch64/sve-streaming-mode-fixed-length-int-log.ll

This file was added.

				; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
				; RUN: llc -force-streaming-compatible-sve < %s | FileCheck %s

				target triple = "aarch64-unknown-linux-gnu"

				;
				; AND
				;

				define <8 x i8> @and_v8i8(<8 x i8> %op1, <8 x i8> %op2) #0 {
				; CHECK-LABEL: and_v8i8:
				; CHECK: // %bb.0:
				; CHECK-NEXT: // kill: def $d0 killed $d0 def $z0
				; CHECK-NEXT: // kill: def $d1 killed $d1 def $z1
				; CHECK-NEXT: and z0.d, z0.d, z1.d
				; CHECK-NEXT: // kill: def $d0 killed $d0 killed $z0
				; CHECK-NEXT: ret
				%res = and <8 x i8> %op1, %op2
				ret <8 x i8> %res
				}

				define <16 x i8> @and_v16i8(<16 x i8> %op1, <16 x i8> %op2) #0 {
				; CHECK-LABEL: and_v16i8:
				; CHECK: // %bb.0:
				; CHECK-NEXT: // kill: def $q0 killed $q0 def $z0
				; CHECK-NEXT: // kill: def $q1 killed $q1 def $z1
				; CHECK-NEXT: and z0.d, z0.d, z1.d
				; CHECK-NEXT: // kill: def $q0 killed $q0 killed $z0
				; CHECK-NEXT: ret
				%res = and <16 x i8> %op1, %op2
				ret <16 x i8> %res
				}

				define void @and_v32i8(<32 x i8>* %a, <32 x i8>* %b) #0 {
				; CHECK-LABEL: and_v32i8:
				; CHECK: // %bb.0:
				; CHECK-NEXT: ldp q0, q1, [x0]
				; CHECK-NEXT: ldp q2, q3, [x1]
				; CHECK-NEXT: and z0.d, z0.d, z2.d
				; CHECK-NEXT: and z1.d, z1.d, z3.d
				; CHECK-NEXT: stp q0, q1, [x0]
				; CHECK-NEXT: ret
				%op1 = load <32 x i8>, <32 x i8>* %a
				%op2 = load <32 x i8>, <32 x i8>* %b
				%res = and <32 x i8> %op1, %op2
				store <32 x i8> %res, <32 x i8>* %a
				ret void
				}

				define <4 x i16> @and_v4i16(<4 x i16> %op1, <4 x i16> %op2) #0 {
				; CHECK-LABEL: and_v4i16:
				; CHECK: // %bb.0:
				; CHECK-NEXT: // kill: def $d0 killed $d0 def $z0
				; CHECK-NEXT: // kill: def $d1 killed $d1 def $z1
				; CHECK-NEXT: and z0.d, z0.d, z1.d
				; CHECK-NEXT: // kill: def $d0 killed $d0 killed $z0
				; CHECK-NEXT: ret
				%res = and <4 x i16> %op1, %op2
				ret <4 x i16> %res
				}

				define <8 x i16> @and_v8i16(<8 x i16> %op1, <8 x i16> %op2) #0 {
				; CHECK-LABEL: and_v8i16:
				; CHECK: // %bb.0:
				; CHECK-NEXT: // kill: def $q0 killed $q0 def $z0
				; CHECK-NEXT: // kill: def $q1 killed $q1 def $z1
				; CHECK-NEXT: and z0.d, z0.d, z1.d
				; CHECK-NEXT: // kill: def $q0 killed $q0 killed $z0
				; CHECK-NEXT: ret
				%res = and <8 x i16> %op1, %op2
				ret <8 x i16> %res
				}

				define void @and_v16i16(<16 x i16>* %a, <16 x i16>* %b) #0 {
				; CHECK-LABEL: and_v16i16:
				; CHECK: // %bb.0:
				; CHECK-NEXT: ldp q0, q1, [x0]
				; CHECK-NEXT: ldp q2, q3, [x1]
				; CHECK-NEXT: and z0.d, z0.d, z2.d
				; CHECK-NEXT: and z1.d, z1.d, z3.d
				; CHECK-NEXT: stp q0, q1, [x0]
				; CHECK-NEXT: ret
				%op1 = load <16 x i16>, <16 x i16>* %a
				%op2 = load <16 x i16>, <16 x i16>* %b
				%res = and <16 x i16> %op1, %op2
				store <16 x i16> %res, <16 x i16>* %a
				ret void
				}

				define <2 x i32> @and_v2i32(<2 x i32> %op1, <2 x i32> %op2) #0 {
				; CHECK-LABEL: and_v2i32:
				; CHECK: // %bb.0:
				; CHECK-NEXT: // kill: def $d0 killed $d0 def $z0
				; CHECK-NEXT: // kill: def $d1 killed $d1 def $z1
				; CHECK-NEXT: and z0.d, z0.d, z1.d
				; CHECK-NEXT: // kill: def $d0 killed $d0 killed $z0
				; CHECK-NEXT: ret
				%res = and <2 x i32> %op1, %op2
				ret <2 x i32> %res
				}

				define <4 x i32> @and_v4i32(<4 x i32> %op1, <4 x i32> %op2) #0 {
				; CHECK-LABEL: and_v4i32:
				; CHECK: // %bb.0:
				; CHECK-NEXT: // kill: def $q0 killed $q0 def $z0
				; CHECK-NEXT: // kill: def $q1 killed $q1 def $z1
				; CHECK-NEXT: and z0.d, z0.d, z1.d
				; CHECK-NEXT: // kill: def $q0 killed $q0 killed $z0
				; CHECK-NEXT: ret
				%res = and <4 x i32> %op1, %op2
				ret <4 x i32> %res
				}

				define void @and_v8i32(<8 x i32>* %a, <8 x i32>* %b) #0 {
				; CHECK-LABEL: and_v8i32:
				; CHECK: // %bb.0:
				; CHECK-NEXT: ldp q0, q1, [x0]
				; CHECK-NEXT: ldp q2, q3, [x1]
				; CHECK-NEXT: and z0.d, z0.d, z2.d
				; CHECK-NEXT: and z1.d, z1.d, z3.d
				; CHECK-NEXT: stp q0, q1, [x0]
				; CHECK-NEXT: ret
				%op1 = load <8 x i32>, <8 x i32>* %a
				%op2 = load <8 x i32>, <8 x i32>* %b
				%res = and <8 x i32> %op1, %op2
				store <8 x i32> %res, <8 x i32>* %a
				ret void
				}

				define <1 x i64> @and_v1i64(<1 x i64> %op1, <1 x i64> %op2) #0 {
				; CHECK-LABEL: and_v1i64:
				; CHECK: // %bb.0:
				; CHECK-NEXT: // kill: def $d0 killed $d0 def $z0
				; CHECK-NEXT: // kill: def $d1 killed $d1 def $z1
				; CHECK-NEXT: and z0.d, z0.d, z1.d
				; CHECK-NEXT: // kill: def $d0 killed $d0 killed $z0
				; CHECK-NEXT: ret
				%res = and <1 x i64> %op1, %op2
				ret <1 x i64> %res
				}

				define <2 x i64> @and_v2i64(<2 x i64> %op1, <2 x i64> %op2) #0 {
				; CHECK-LABEL: and_v2i64:
				; CHECK: // %bb.0:
				; CHECK-NEXT: // kill: def $q0 killed $q0 def $z0
				; CHECK-NEXT: // kill: def $q1 killed $q1 def $z1
				; CHECK-NEXT: and z0.d, z0.d, z1.d
				; CHECK-NEXT: // kill: def $q0 killed $q0 killed $z0
				; CHECK-NEXT: ret
				%res = and <2 x i64> %op1, %op2
				ret <2 x i64> %res
				}

				define void @and_v4i64(<4 x i64>* %a, <4 x i64>* %b) #0 {
				; CHECK-LABEL: and_v4i64:
				; CHECK: // %bb.0:
				; CHECK-NEXT: ldp q0, q1, [x0]
				; CHECK-NEXT: ldp q2, q3, [x1]
				; CHECK-NEXT: and z0.d, z0.d, z2.d
				; CHECK-NEXT: and z1.d, z1.d, z3.d
				; CHECK-NEXT: stp q0, q1, [x0]
				; CHECK-NEXT: ret
				%op1 = load <4 x i64>, <4 x i64>* %a
				%op2 = load <4 x i64>, <4 x i64>* %b
				%res = and <4 x i64> %op1, %op2
				store <4 x i64> %res, <4 x i64>* %a
				ret void
				}

				;
				; OR
				;

				define <8 x i8> @or_v8i8(<8 x i8> %op1, <8 x i8> %op2) #0 {
				; CHECK-LABEL: or_v8i8:
				; CHECK: // %bb.0:
				; CHECK-NEXT: // kill: def $d0 killed $d0 def $z0
				; CHECK-NEXT: // kill: def $d1 killed $d1 def $z1
				; CHECK-NEXT: orr z0.d, z0.d, z1.d
				; CHECK-NEXT: // kill: def $d0 killed $d0 killed $z0
				; CHECK-NEXT: ret
				%res = or <8 x i8> %op1, %op2
				ret <8 x i8> %res
				}

				define <16 x i8> @or_v16i8(<16 x i8> %op1, <16 x i8> %op2) #0 {
				; CHECK-LABEL: or_v16i8:
				; CHECK: // %bb.0:
				; CHECK-NEXT: // kill: def $q0 killed $q0 def $z0
				; CHECK-NEXT: // kill: def $q1 killed $q1 def $z1
				; CHECK-NEXT: orr z0.d, z0.d, z1.d
				; CHECK-NEXT: // kill: def $q0 killed $q0 killed $z0
				; CHECK-NEXT: ret
				%res = or <16 x i8> %op1, %op2
				ret <16 x i8> %res
				}

				define void @or_v32i8(<32 x i8>* %a, <32 x i8>* %b) #0 {
				; CHECK-LABEL: or_v32i8:
				; CHECK: // %bb.0:
				; CHECK-NEXT: ldp q0, q1, [x0]
				; CHECK-NEXT: ldp q2, q3, [x1]
				; CHECK-NEXT: orr z0.d, z0.d, z2.d
				; CHECK-NEXT: orr z1.d, z1.d, z3.d
				; CHECK-NEXT: stp q0, q1, [x0]
				; CHECK-NEXT: ret
				%op1 = load <32 x i8>, <32 x i8>* %a
				%op2 = load <32 x i8>, <32 x i8>* %b
				%res = or <32 x i8> %op1, %op2
				store <32 x i8> %res, <32 x i8>* %a
				ret void
				}

				define <4 x i16> @or_v4i16(<4 x i16> %op1, <4 x i16> %op2) #0 {
				; CHECK-LABEL: or_v4i16:
				; CHECK: // %bb.0:
				; CHECK-NEXT: // kill: def $d0 killed $d0 def $z0
				; CHECK-NEXT: // kill: def $d1 killed $d1 def $z1
				; CHECK-NEXT: orr z0.d, z0.d, z1.d
				; CHECK-NEXT: // kill: def $d0 killed $d0 killed $z0
				; CHECK-NEXT: ret
				%res = or <4 x i16> %op1, %op2
				ret <4 x i16> %res
				}

				define <8 x i16> @or_v8i16(<8 x i16> %op1, <8 x i16> %op2) #0 {
				; CHECK-LABEL: or_v8i16:
				; CHECK: // %bb.0:
				; CHECK-NEXT: // kill: def $q0 killed $q0 def $z0
				; CHECK-NEXT: // kill: def $q1 killed $q1 def $z1
				; CHECK-NEXT: orr z0.d, z0.d, z1.d
				; CHECK-NEXT: // kill: def $q0 killed $q0 killed $z0
				; CHECK-NEXT: ret
				%res = or <8 x i16> %op1, %op2
				ret <8 x i16> %res
				}

				define void @or_v16i16(<16 x i16>* %a, <16 x i16>* %b) #0 {
				; CHECK-LABEL: or_v16i16:
				; CHECK: // %bb.0:
				; CHECK-NEXT: ldp q0, q1, [x0]
				; CHECK-NEXT: ldp q2, q3, [x1]
				; CHECK-NEXT: orr z0.d, z0.d, z2.d
				; CHECK-NEXT: orr z1.d, z1.d, z3.d
				; CHECK-NEXT: stp q0, q1, [x0]
				; CHECK-NEXT: ret
				%op1 = load <16 x i16>, <16 x i16>* %a
				%op2 = load <16 x i16>, <16 x i16>* %b
				%res = or <16 x i16> %op1, %op2
				store <16 x i16> %res, <16 x i16>* %a
				ret void
				}

				define <2 x i32> @or_v2i32(<2 x i32> %op1, <2 x i32> %op2) #0 {
				; CHECK-LABEL: or_v2i32:
				; CHECK: // %bb.0:
				; CHECK-NEXT: // kill: def $d0 killed $d0 def $z0
				; CHECK-NEXT: // kill: def $d1 killed $d1 def $z1
				; CHECK-NEXT: orr z0.d, z0.d, z1.d
				; CHECK-NEXT: // kill: def $d0 killed $d0 killed $z0
				; CHECK-NEXT: ret
				%res = or <2 x i32> %op1, %op2
				ret <2 x i32> %res
				}

				define <4 x i32> @or_v4i32(<4 x i32> %op1, <4 x i32> %op2) #0 {
				; CHECK-LABEL: or_v4i32:
				; CHECK: // %bb.0:
				; CHECK-NEXT: // kill: def $q0 killed $q0 def $z0
				; CHECK-NEXT: // kill: def $q1 killed $q1 def $z1
				; CHECK-NEXT: orr z0.d, z0.d, z1.d
				; CHECK-NEXT: // kill: def $q0 killed $q0 killed $z0
				; CHECK-NEXT: ret
				%res = or <4 x i32> %op1, %op2
				ret <4 x i32> %res
				}

				define void @or_v8i32(<8 x i32>* %a, <8 x i32>* %b) #0 {
				; CHECK-LABEL: or_v8i32:
				; CHECK: // %bb.0:
				; CHECK-NEXT: ldp q0, q1, [x0]
				; CHECK-NEXT: ldp q2, q3, [x1]
				; CHECK-NEXT: orr z0.d, z0.d, z2.d
				; CHECK-NEXT: orr z1.d, z1.d, z3.d
				; CHECK-NEXT: stp q0, q1, [x0]
				; CHECK-NEXT: ret
				%op1 = load <8 x i32>, <8 x i32>* %a
				%op2 = load <8 x i32>, <8 x i32>* %b
				%res = or <8 x i32> %op1, %op2
				store <8 x i32> %res, <8 x i32>* %a
				ret void
				}

				define <1 x i64> @or_v1i64(<1 x i64> %op1, <1 x i64> %op2) #0 {
				; CHECK-LABEL: or_v1i64:
				; CHECK: // %bb.0:
				; CHECK-NEXT: // kill: def $d0 killed $d0 def $z0
				; CHECK-NEXT: // kill: def $d1 killed $d1 def $z1
				; CHECK-NEXT: orr z0.d, z0.d, z1.d
				; CHECK-NEXT: // kill: def $d0 killed $d0 killed $z0
				; CHECK-NEXT: ret
				%res = or <1 x i64> %op1, %op2
				ret <1 x i64> %res
				}

				define <2 x i64> @or_v2i64(<2 x i64> %op1, <2 x i64> %op2) #0 {
				; CHECK-LABEL: or_v2i64:
				; CHECK: // %bb.0:
				; CHECK-NEXT: // kill: def $q0 killed $q0 def $z0
				; CHECK-NEXT: // kill: def $q1 killed $q1 def $z1
				; CHECK-NEXT: orr z0.d, z0.d, z1.d
				; CHECK-NEXT: // kill: def $q0 killed $q0 killed $z0
				; CHECK-NEXT: ret
				%res = or <2 x i64> %op1, %op2
				ret <2 x i64> %res
				}

				define void @or_v4i64(<4 x i64>* %a, <4 x i64>* %b) #0 {
				; CHECK-LABEL: or_v4i64:
				; CHECK: // %bb.0:
				; CHECK-NEXT: ldp q0, q1, [x0]
				; CHECK-NEXT: ldp q2, q3, [x1]
				; CHECK-NEXT: orr z0.d, z0.d, z2.d
				; CHECK-NEXT: orr z1.d, z1.d, z3.d
				; CHECK-NEXT: stp q0, q1, [x0]
				; CHECK-NEXT: ret
				%op1 = load <4 x i64>, <4 x i64>* %a
				%op2 = load <4 x i64>, <4 x i64>* %b
				%res = or <4 x i64> %op1, %op2
				store <4 x i64> %res, <4 x i64>* %a
				ret void
				}

				;
				; XOR
				;

				define <8 x i8> @xor_v8i8(<8 x i8> %op1, <8 x i8> %op2) #0 {
				; CHECK-LABEL: xor_v8i8:
				; CHECK: // %bb.0:
				; CHECK-NEXT: // kill: def $d0 killed $d0 def $z0
				; CHECK-NEXT: // kill: def $d1 killed $d1 def $z1
				; CHECK-NEXT: eor z0.d, z0.d, z1.d
				; CHECK-NEXT: // kill: def $d0 killed $d0 killed $z0
				; CHECK-NEXT: ret
				%res = xor <8 x i8> %op1, %op2
				ret <8 x i8> %res
				}

				define <16 x i8> @xor_v16i8(<16 x i8> %op1, <16 x i8> %op2) #0 {
				; CHECK-LABEL: xor_v16i8:
				; CHECK: // %bb.0:
				; CHECK-NEXT: // kill: def $q0 killed $q0 def $z0
				; CHECK-NEXT: // kill: def $q1 killed $q1 def $z1
				; CHECK-NEXT: eor z0.d, z0.d, z1.d
				; CHECK-NEXT: // kill: def $q0 killed $q0 killed $z0
				; CHECK-NEXT: ret
				%res = xor <16 x i8> %op1, %op2
				ret <16 x i8> %res
				}

				define void @xor_v32i8(<32 x i8>* %a, <32 x i8>* %b) #0 {
				; CHECK-LABEL: xor_v32i8:
				; CHECK: // %bb.0:
				; CHECK-NEXT: ldp q0, q1, [x0]
				; CHECK-NEXT: ldp q2, q3, [x1]
				; CHECK-NEXT: eor z0.d, z0.d, z2.d
				; CHECK-NEXT: eor z1.d, z1.d, z3.d
				; CHECK-NEXT: stp q0, q1, [x0]
				; CHECK-NEXT: ret
				%op1 = load <32 x i8>, <32 x i8>* %a
				%op2 = load <32 x i8>, <32 x i8>* %b
				%res = xor <32 x i8> %op1, %op2
				store <32 x i8> %res, <32 x i8>* %a
				ret void
				}

				define <4 x i16> @xor_v4i16(<4 x i16> %op1, <4 x i16> %op2) #0 {
				; CHECK-LABEL: xor_v4i16:
				; CHECK: // %bb.0:
				; CHECK-NEXT: // kill: def $d0 killed $d0 def $z0
				; CHECK-NEXT: // kill: def $d1 killed $d1 def $z1
				; CHECK-NEXT: eor z0.d, z0.d, z1.d
				; CHECK-NEXT: // kill: def $d0 killed $d0 killed $z0
				; CHECK-NEXT: ret
				%res = xor <4 x i16> %op1, %op2
				ret <4 x i16> %res
				}

				define <8 x i16> @xor_v8i16(<8 x i16> %op1, <8 x i16> %op2) #0 {
				; CHECK-LABEL: xor_v8i16:
				; CHECK: // %bb.0:
				; CHECK-NEXT: // kill: def $q0 killed $q0 def $z0
				; CHECK-NEXT: // kill: def $q1 killed $q1 def $z1
				; CHECK-NEXT: eor z0.d, z0.d, z1.d
				; CHECK-NEXT: // kill: def $q0 killed $q0 killed $z0
				; CHECK-NEXT: ret
				%res = xor <8 x i16> %op1, %op2
				ret <8 x i16> %res
				}

				define void @xor_v16i16(<16 x i16>* %a, <16 x i16>* %b) #0 {
				; CHECK-LABEL: xor_v16i16:
				; CHECK: // %bb.0:
				; CHECK-NEXT: ldp q0, q1, [x0]
				; CHECK-NEXT: ldp q2, q3, [x1]
				; CHECK-NEXT: eor z0.d, z0.d, z2.d
				; CHECK-NEXT: eor z1.d, z1.d, z3.d
				; CHECK-NEXT: stp q0, q1, [x0]
				; CHECK-NEXT: ret
				%op1 = load <16 x i16>, <16 x i16>* %a
				%op2 = load <16 x i16>, <16 x i16>* %b
				%res = xor <16 x i16> %op1, %op2
				store <16 x i16> %res, <16 x i16>* %a
				ret void
				}

				define <2 x i32> @xor_v2i32(<2 x i32> %op1, <2 x i32> %op2) #0 {
				; CHECK-LABEL: xor_v2i32:
				; CHECK: // %bb.0:
				; CHECK-NEXT: // kill: def $d0 killed $d0 def $z0
				; CHECK-NEXT: // kill: def $d1 killed $d1 def $z1
				; CHECK-NEXT: eor z0.d, z0.d, z1.d
				; CHECK-NEXT: // kill: def $d0 killed $d0 killed $z0
				; CHECK-NEXT: ret
				%res = xor <2 x i32> %op1, %op2
				ret <2 x i32> %res
				}

				define <4 x i32> @xor_v4i32(<4 x i32> %op1, <4 x i32> %op2) #0 {
				; CHECK-LABEL: xor_v4i32:
				; CHECK: // %bb.0:
				; CHECK-NEXT: // kill: def $q0 killed $q0 def $z0
				; CHECK-NEXT: // kill: def $q1 killed $q1 def $z1
				; CHECK-NEXT: eor z0.d, z0.d, z1.d
				; CHECK-NEXT: // kill: def $q0 killed $q0 killed $z0
				; CHECK-NEXT: ret
				%res = xor <4 x i32> %op1, %op2
				ret <4 x i32> %res
				}

				define void @xor_v8i32(<8 x i32>* %a, <8 x i32>* %b) #0 {
				; CHECK-LABEL: xor_v8i32:
				; CHECK: // %bb.0:
				; CHECK-NEXT: ldp q0, q1, [x0]
				; CHECK-NEXT: ldp q2, q3, [x1]
				; CHECK-NEXT: eor z0.d, z0.d, z2.d
				; CHECK-NEXT: eor z1.d, z1.d, z3.d
				; CHECK-NEXT: stp q0, q1, [x0]
				; CHECK-NEXT: ret
				%op1 = load <8 x i32>, <8 x i32>* %a
				%op2 = load <8 x i32>, <8 x i32>* %b
				%res = xor <8 x i32> %op1, %op2
				store <8 x i32> %res, <8 x i32>* %a
				ret void
				}

				define <1 x i64> @xor_v1i64(<1 x i64> %op1, <1 x i64> %op2) #0 {
				; CHECK-LABEL: xor_v1i64:
				; CHECK: // %bb.0:
				; CHECK-NEXT: // kill: def $d0 killed $d0 def $z0
				; CHECK-NEXT: // kill: def $d1 killed $d1 def $z1
				; CHECK-NEXT: eor z0.d, z0.d, z1.d
				; CHECK-NEXT: // kill: def $d0 killed $d0 killed $z0
				; CHECK-NEXT: ret
				%res = xor <1 x i64> %op1, %op2
				ret <1 x i64> %res
				}

				define <2 x i64> @xor_v2i64(<2 x i64> %op1, <2 x i64> %op2) #0 {
				; CHECK-LABEL: xor_v2i64:
				; CHECK: // %bb.0:
				; CHECK-NEXT: // kill: def $q0 killed $q0 def $z0
				; CHECK-NEXT: // kill: def $q1 killed $q1 def $z1
				; CHECK-NEXT: eor z0.d, z0.d, z1.d
				; CHECK-NEXT: // kill: def $q0 killed $q0 killed $z0
				; CHECK-NEXT: ret
				%res = xor <2 x i64> %op1, %op2
				ret <2 x i64> %res
				}

				define void @xor_v4i64(<4 x i64>* %a, <4 x i64>* %b) #0 {
				; CHECK-LABEL: xor_v4i64:
				; CHECK: // %bb.0:
				; CHECK-NEXT: ldp q0, q1, [x0]
				; CHECK-NEXT: ldp q2, q3, [x1]
				; CHECK-NEXT: eor z0.d, z0.d, z2.d
				; CHECK-NEXT: eor z1.d, z1.d, z3.d
				; CHECK-NEXT: stp q0, q1, [x0]
				; CHECK-NEXT: ret
				%op1 = load <4 x i64>, <4 x i64>* %a
				%op2 = load <4 x i64>, <4 x i64>* %b
				%res = xor <4 x i64> %op1, %op2
				store <4 x i64> %res, <4 x i64>* %a
				ret void
				}

				attributes #0 = { "target-features"="+sve" }

llvm/test/CodeGen/AArch64/sve-streaming-mode-fixed-length-int-mulh.ll

This file was added.
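
The tests in this file write a high-half multiply in IR as an extend to twice the element width, a `mul`, an `lshr`, and a `trunc` (for example, a shift by 8 for the i8 elements in `smulh_v8i8`). A minimal C++ model of that pattern for a single 8-bit lane is sketched below. The function names are hypothetical, results are returned as raw 8-bit patterns, and this illustrates the arithmetic only, not the lowering code in the patch.

```cpp
#include <cassert>
#include <cstdint>

// Signed case: sext -> mul -> lshr 8 -> trunc, returned as a bit pattern.
std::uint8_t smulh8(std::int8_t a, std::int8_t b) {
  // The product of two int8 values always fits in 16 bits.
  std::int16_t wide = static_cast<std::int16_t>(a * b);
  // Logical shift right on the 16-bit pattern, then truncate to 8 bits.
  std::uint16_t shifted =
      static_cast<std::uint16_t>(static_cast<std::uint16_t>(wide) >> 8);
  return static_cast<std::uint8_t>(shifted);
}

// Unsigned case: zext -> mul -> lshr 8 -> trunc.
std::uint8_t umulh8(std::uint8_t a, std::uint8_t b) {
  std::uint16_t wide = static_cast<std::uint16_t>(a * b);
  return static_cast<std::uint8_t>(wide >> 8);
}

// Exhaustively compares smulh8 with floor(product / 256), showing that a
// logical shift followed by truncation still yields the signed high half.
void check_mulh_models() {
  for (int a = -128; a <= 127; ++a) {
    for (int b = -128; b <= 127; ++b) {
      int p = a * b;
      int low = ((p % 256) + 256) % 256; // non-negative low byte
      int high = (p - low) / 256;        // exact floor division
      assert(smulh8(static_cast<std::int8_t>(a), static_cast<std::int8_t>(b)) ==
             static_cast<std::uint8_t>(high));
    }
  }
}
```

The logical shift is enough here because the truncation keeps only bits 8 to 15 of the product, which are the same whether the shift is logical or arithmetic.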

				; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
				; RUN: llc -force-streaming-compatible-sve < %s | FileCheck %s

				; This test only tests the legal types for a given vector width, as mulh nodes
				; do not get generated for non-legal types.

				target triple = "aarch64-unknown-linux-gnu"

				;
				; SMULH
				;

				define <4 x i8> @smulh_v4i8(<4 x i8> %op1, <4 x i8> %op2) #0 {
				; CHECK-LABEL: smulh_v4i8:
				; CHECK: // %bb.0:
				; CHECK-NEXT: adrp x8, .LCPI0_0
				; CHECK-NEXT: // kill: def $d1 killed $d1 def $z1
				david-arm (inline comment, done): Can you add a test for an illegal type such as `<4 x i8>` too?
				; CHECK-NEXT: // kill: def $d0 killed $d0 def $z0
				; CHECK-NEXT: ptrue p0.h, vl4
				; CHECK-NEXT: ldr d2, [x8, :lo12:.LCPI0_0]
				; CHECK-NEXT: adrp x8, .LCPI0_1
				; CHECK-NEXT: lsl z0.h, p0/m, z0.h, z2.h
				; CHECK-NEXT: lsl z1.h, p0/m, z1.h, z2.h
				; CHECK-NEXT: ldr d3, [x8, :lo12:.LCPI0_1]
				; CHECK-NEXT: asr z0.h, p0/m, z0.h, z2.h
				; CHECK-NEXT: asr z1.h, p0/m, z1.h, z2.h
				; CHECK-NEXT: mul z0.h, p0/m, z0.h, z1.h
				; CHECK-NEXT: lsr z0.h, p0/m, z0.h, z3.h
				; CHECK-NEXT: // kill: def $d0 killed $d0 killed $z0
				; CHECK-NEXT: ret
				%insert = insertelement <4 x i16> undef, i16 4, i64 0
				%splat = shufflevector <4 x i16> %insert, <4 x i16> undef, <4 x i32> zeroinitializer
				%1 = sext <4 x i8> %op1 to <4 x i16>
				%2 = sext <4 x i8> %op2 to <4 x i16>
				%mul = mul <4 x i16> %1, %2
				%shr = lshr <4 x i16> %mul, <i16 4, i16 4, i16 4, i16 4>
				%res = trunc <4 x i16> %shr to <4 x i8>
				ret <4 x i8> %res
				}

				define <8 x i8> @smulh_v8i8(<8 x i8> %op1, <8 x i8> %op2) #0 {
				; CHECK-LABEL: smulh_v8i8:
				; CHECK: // %bb.0:
				; CHECK-NEXT: // kill: def $d0 killed $d0 def $z0
				; CHECK-NEXT: ptrue p0.b, vl8
				; CHECK-NEXT: // kill: def $d1 killed $d1 def $z1
				; CHECK-NEXT: smulh z0.b, p0/m, z0.b, z1.b
				; CHECK-NEXT: // kill: def $d0 killed $d0 killed $z0
				; CHECK-NEXT: ret
				%insert = insertelement <8 x i16> undef, i16 8, i64 0
				%splat = shufflevector <8 x i16> %insert, <8 x i16> undef, <8 x i32> zeroinitializer
				%1 = sext <8 x i8> %op1 to <8 x i16>
				%2 = sext <8 x i8> %op2 to <8 x i16>
				%mul = mul <8 x i16> %1, %2
				%shr = lshr <8 x i16> %mul, <i16 8, i16 8, i16 8, i16 8, i16 8, i16 8, i16 8, i16 8>
				%res = trunc <8 x i16> %shr to <8 x i8>
				ret <8 x i8> %res
				}

				define <16 x i8> @smulh_v16i8(<16 x i8> %op1, <16 x i8> %op2) #0 {
				; CHECK-LABEL: smulh_v16i8:
				; CHECK: // %bb.0:
				; CHECK-NEXT: // kill: def $q0 killed $q0 def $z0
				; CHECK-NEXT: ptrue p0.b, vl16
				; CHECK-NEXT: // kill: def $q1 killed $q1 def $z1
				; CHECK-NEXT: smulh z0.b, p0/m, z0.b, z1.b
				; CHECK-NEXT: // kill: def $q0 killed $q0 killed $z0
				; CHECK-NEXT: ret
				%1 = sext <16 x i8> %op1 to <16 x i16>
				%2 = sext <16 x i8> %op2 to <16 x i16>
				%mul = mul <16 x i16> %1, %2
				%shr = lshr <16 x i16> %mul, <i16 8, i16 8, i16 8, i16 8, i16 8, i16 8, i16 8, i16 8, i16 8, i16 8, i16 8, i16 8, i16 8, i16 8, i16 8, i16 8>
				%res = trunc <16 x i16> %shr to <16 x i8>
				ret <16 x i8> %res
				}

				define void @smulh_v32i8(<32 x i8>* %a, <32 x i8>* %b) #0 {
				; CHECK-LABEL: smulh_v32i8:
				; CHECK: // %bb.0:
				; CHECK-NEXT: sub sp, sp, #32
				; CHECK-NEXT: .cfi_def_cfa_offset 32
				; CHECK-NEXT: ldp q2, q3, [x0]
				; CHECK-NEXT: adrp x8, .LCPI3_0
				; CHECK-NEXT: ptrue p0.h, vl8
				; CHECK-NEXT: sunpklo z0.h, z2.b
				; CHECK-NEXT: ext z2.b, z2.b, z2.b, #8
				; CHECK-NEXT: sunpklo z2.h, z2.b
				; CHECK-NEXT: ldp q4, q5, [x1]
				; CHECK-NEXT: sunpklo z6.h, z3.b
				; CHECK-NEXT: ext z3.b, z3.b, z3.b, #8
				; CHECK-NEXT: sunpklo z3.h, z3.b
				; CHECK-NEXT: sunpklo z1.h, z4.b
				; CHECK-NEXT: ext z4.b, z4.b, z4.b, #8
				; CHECK-NEXT: sunpklo z4.h, z4.b
				; CHECK-NEXT: mul z0.h, p0/m, z0.h, z1.h
				; CHECK-NEXT: sunpklo z7.h, z5.b
				; CHECK-NEXT: ext z5.b, z5.b, z5.b, #8
				; CHECK-NEXT: ldr q16, [x8, :lo12:.LCPI3_0]
				; CHECK-NEXT: sunpklo z5.h, z5.b
				; CHECK-NEXT: mul z3.h, p0/m, z3.h, z5.h
				; CHECK-NEXT: movprfx z5, z6
				; CHECK-NEXT: mul z5.h, p0/m, z5.h, z7.h
				; CHECK-NEXT: mul z2.h, p0/m, z2.h, z4.h
				; CHECK-NEXT: movprfx z4, z5
				; CHECK-NEXT: lsr z4.h, p0/m, z4.h, z16.h
				; CHECK-NEXT: lsr z3.h, p0/m, z3.h, z16.h
				; CHECK-NEXT: fmov w9, s4
				; CHECK-NEXT: fmov w8, s3
				; CHECK-NEXT: mov z5.h, z3.h[7]
				; CHECK-NEXT: mov z6.h, z3.h[6]
				; CHECK-NEXT: mov z7.h, z3.h[5]
				; CHECK-NEXT: fmov w10, s5
				; CHECK-NEXT: strb w9, [sp, #16]
				; CHECK-NEXT: strb w8, [sp, #24]
				; CHECK-NEXT: fmov w8, s6
				; CHECK-NEXT: fmov w9, s7
				; CHECK-NEXT: mov z17.h, z3.h[4]
				; CHECK-NEXT: mov z18.h, z3.h[3]
				; CHECK-NEXT: mov z19.h, z3.h[2]
				; CHECK-NEXT: strb w10, [sp, #31]
				; CHECK-NEXT: fmov w10, s17
				; CHECK-NEXT: strb w8, [sp, #30]
				; CHECK-NEXT: fmov w8, s18
				; CHECK-NEXT: strb w9, [sp, #29]
				; CHECK-NEXT: fmov w9, s19
				; CHECK-NEXT: mov z20.h, z3.h[1]
				; CHECK-NEXT: mov z3.h, z4.h[7]
				; CHECK-NEXT: mov z21.h, z4.h[6]
				; CHECK-NEXT: strb w10, [sp, #28]
				; CHECK-NEXT: fmov w10, s20
				; CHECK-NEXT: strb w8, [sp, #27]
				; CHECK-NEXT: fmov w8, s3
				; CHECK-NEXT: strb w9, [sp, #26]
				; CHECK-NEXT: fmov w9, s21
				; CHECK-NEXT: mov z22.h, z4.h[5]
				; CHECK-NEXT: mov z23.h, z4.h[4]
				; CHECK-NEXT: mov z24.h, z4.h[3]
				; CHECK-NEXT: strb w10, [sp, #25]
				; CHECK-NEXT: fmov w10, s22
				; CHECK-NEXT: strb w8, [sp, #23]
				; CHECK-NEXT: fmov w8, s23
				; CHECK-NEXT: strb w9, [sp, #22]
				; CHECK-NEXT: fmov w9, s24
				; CHECK-NEXT: mov z25.h, z4.h[2]
				; CHECK-NEXT: mov z26.h, z4.h[1]
				; CHECK-NEXT: strb w10, [sp, #21]
				; CHECK-NEXT: fmov w10, s25
				; CHECK-NEXT: strb w8, [sp, #20]
				; CHECK-NEXT: movprfx z1, z2
				; CHECK-NEXT: lsr z1.h, p0/m, z1.h, z16.h
				; CHECK-NEXT: strb w9, [sp, #19]
				; CHECK-NEXT: fmov w8, s26
				; CHECK-NEXT: fmov w9, s1
				; CHECK-NEXT: lsr z0.h, p0/m, z0.h, z16.h
				; CHECK-NEXT: mov z2.h, z1.h[7]
				; CHECK-NEXT: mov z3.h, z1.h[6]
				; CHECK-NEXT: strb w10, [sp, #18]
				; CHECK-NEXT: fmov w10, s0
				; CHECK-NEXT: strb w8, [sp, #17]
				; CHECK-NEXT: fmov w8, s2
				; CHECK-NEXT: strb w9, [sp, #8]
				; CHECK-NEXT: fmov w9, s3
				; CHECK-NEXT: mov z4.h, z1.h[5]
				; CHECK-NEXT: mov z5.h, z1.h[4]
				; CHECK-NEXT: mov z6.h, z1.h[3]
				; CHECK-NEXT: strb w10, [sp]
				; CHECK-NEXT: fmov w10, s4
				; CHECK-NEXT: strb w8, [sp, #15]
				; CHECK-NEXT: fmov w8, s5
				; CHECK-NEXT: strb w9, [sp, #14]
				; CHECK-NEXT: fmov w9, s6
				; CHECK-NEXT: mov z7.h, z1.h[2]
				; CHECK-NEXT: mov z16.h, z1.h[1]
				; CHECK-NEXT: mov z1.h, z0.h[7]
				; CHECK-NEXT: strb w10, [sp, #13]
				; CHECK-NEXT: fmov w10, s7
				; CHECK-NEXT: strb w8, [sp, #12]
				; CHECK-NEXT: fmov w8, s16
				; CHECK-NEXT: strb w9, [sp, #11]
				; CHECK-NEXT: fmov w9, s1
				; CHECK-NEXT: mov z17.h, z0.h[6]
				; CHECK-NEXT: mov z18.h, z0.h[5]
				; CHECK-NEXT: mov z19.h, z0.h[4]
				; CHECK-NEXT: strb w10, [sp, #10]
				; CHECK-NEXT: fmov w10, s17
				; CHECK-NEXT: strb w8, [sp, #9]
				; CHECK-NEXT: fmov w8, s18
				; CHECK-NEXT: strb w9, [sp, #7]
				; CHECK-NEXT: fmov w9, s19
				; CHECK-NEXT: mov z20.h, z0.h[3]
				; CHECK-NEXT: mov z21.h, z0.h[2]
				; CHECK-NEXT: mov z22.h, z0.h[1]
				; CHECK-NEXT: strb w10, [sp, #6]
				; CHECK-NEXT: fmov w10, s20
				; CHECK-NEXT: strb w8, [sp, #5]
				; CHECK-NEXT: fmov w8, s21
				; CHECK-NEXT: strb w9, [sp, #4]
				; CHECK-NEXT: fmov w9, s22
				; CHECK-NEXT: strb w10, [sp, #3]
				; CHECK-NEXT: strb w8, [sp, #2]
				; CHECK-NEXT: strb w9, [sp, #1]
				; CHECK-NEXT: ldp q0, q1, [sp]
				; CHECK-NEXT: stp q0, q1, [x0]
				; CHECK-NEXT: add sp, sp, #32
				; CHECK-NEXT: ret
				%op1 = load <32 x i8>, <32 x i8>* %a
				%op2 = load <32 x i8>, <32 x i8>* %b
				%1 = sext <32 x i8> %op1 to <32 x i16>
				%2 = sext <32 x i8> %op2 to <32 x i16>
				%mul = mul <32 x i16> %1, %2
				%shr = lshr <32 x i16> %mul, <i16 8, i16 8, i16 8, i16 8, i16 8, i16 8, i16 8, i16 8, i16 8, i16 8, i16 8, i16 8, i16 8, i16 8, i16 8, i16 8, i16 8, i16 8, i16 8, i16 8, i16 8, i16 8, i16 8, i16 8, i16 8, i16 8, i16 8, i16 8, i16 8, i16 8, i16 8, i16 8>
				%res = trunc <32 x i16> %shr to <32 x i8>
				store <32 x i8> %res, <32 x i8>* %a
				ret void
				}

				define <2 x i16> @smulh_v2i16(<2 x i16> %op1, <2 x i16> %op2) #0 {
				; CHECK-LABEL: smulh_v2i16:
				david-arm (inline comment, not done): Wow, this code surely gets an award for being so impressively bad?!
				; CHECK: // %bb.0:
				; CHECK-NEXT: adrp x8, .LCPI4_0
				; CHECK-NEXT: // kill: def $d1 killed $d1 def $z1
				; CHECK-NEXT: // kill: def $d0 killed $d0 def $z0
				; CHECK-NEXT: ptrue p0.s, vl2
				; CHECK-NEXT: ldr d2, [x8, :lo12:.LCPI4_0]
				; CHECK-NEXT: lsl z0.s, p0/m, z0.s, z2.s
				; CHECK-NEXT: lsl z1.s, p0/m, z1.s, z2.s
				; CHECK-NEXT: asr z0.s, p0/m, z0.s, z2.s
				; CHECK-NEXT: asr z1.s, p0/m, z1.s, z2.s
				; CHECK-NEXT: mul z0.s, p0/m, z0.s, z1.s
				; CHECK-NEXT: lsr z0.s, p0/m, z0.s, z2.s
				; CHECK-NEXT: // kill: def $d0 killed $d0 killed $z0
				; CHECK-NEXT: ret
				%1 = sext <2 x i16> %op1 to <2 x i32>
				%2 = sext <2 x i16> %op2 to <2 x i32>
				%mul = mul <2 x i32> %1, %2
				%shr = lshr <2 x i32> %mul, <i32 16, i32 16>
				%res = trunc <2 x i32> %shr to <2 x i16>
				ret <2 x i16> %res
				}

				define <4 x i16> @smulh_v4i16(<4 x i16> %op1, <4 x i16> %op2) #0 {
				; CHECK-LABEL: smulh_v4i16:
				; CHECK: // %bb.0:
				; CHECK-NEXT: // kill: def $d0 killed $d0 def $z0
				; CHECK-NEXT: ptrue p0.h, vl4
				; CHECK-NEXT: // kill: def $d1 killed $d1 def $z1
				; CHECK-NEXT: smulh z0.h, p0/m, z0.h, z1.h
				; CHECK-NEXT: // kill: def $d0 killed $d0 killed $z0
				; CHECK-NEXT: ret
				%1 = sext <4 x i16> %op1 to <4 x i32>
				%2 = sext <4 x i16> %op2 to <4 x i32>
				%mul = mul <4 x i32> %1, %2
				%shr = lshr <4 x i32> %mul, <i32 16, i32 16, i32 16, i32 16>
				%res = trunc <4 x i32> %shr to <4 x i16>
				ret <4 x i16> %res
				}

				define <8 x i16> @smulh_v8i16(<8 x i16> %op1, <8 x i16> %op2) #0 {
				; CHECK-LABEL: smulh_v8i16:
				; CHECK: // %bb.0:
				; CHECK-NEXT: // kill: def $q0 killed $q0 def $z0
				; CHECK-NEXT: ptrue p0.h, vl8
				; CHECK-NEXT: // kill: def $q1 killed $q1 def $z1
				; CHECK-NEXT: smulh z0.h, p0/m, z0.h, z1.h
				; CHECK-NEXT: // kill: def $q0 killed $q0 killed $z0
				; CHECK-NEXT: ret
				%1 = sext <8 x i16> %op1 to <8 x i32>
				%2 = sext <8 x i16> %op2 to <8 x i32>
				%mul = mul <8 x i32> %1, %2
				%shr = lshr <8 x i32> %mul, <i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16>
				%res = trunc <8 x i32> %shr to <8 x i16>
				ret <8 x i16> %res
				}

				define void @smulh_v16i16(<16 x i16>* %a, <16 x i16>* %b) #0 {
				; CHECK-LABEL: smulh_v16i16:
				; CHECK: // %bb.0:
				; CHECK-NEXT: ldp q0, q1, [x0]
				; CHECK-NEXT: ptrue p0.h, vl4
				; CHECK-NEXT: mov z5.d, z0.d
				; CHECK-NEXT: ext z5.b, z5.b, z5.b, #8
				; CHECK-NEXT: ldp q2, q3, [x1]
				; CHECK-NEXT: mov z4.d, z1.d
				; CHECK-NEXT: ext z4.b, z4.b, z4.b, #8
				; CHECK-NEXT: mov z6.d, z2.d
				; CHECK-NEXT: smulh z0.h, p0/m, z0.h, z2.h
				; CHECK-NEXT: ext z6.b, z6.b, z6.b, #8
				; CHECK-NEXT: mov z2.d, z3.d
				; CHECK-NEXT: smulh z1.h, p0/m, z1.h, z3.h
				; CHECK-NEXT: ext z2.b, z2.b, z3.b, #8
				; CHECK-NEXT: movprfx z3, z5
				; CHECK-NEXT: smulh z3.h, p0/m, z3.h, z6.h
				; CHECK-NEXT: smulh z2.h, p0/m, z2.h, z4.h
				; CHECK-NEXT: splice z0.h, p0, z0.h, z3.h
				; CHECK-NEXT: splice z1.h, p0, z1.h, z2.h
				; CHECK-NEXT: stp q0, q1, [x0]
				; CHECK-NEXT: ret
				%op1 = load <16 x i16>, <16 x i16>* %a
				%op2 = load <16 x i16>, <16 x i16>* %b
				%1 = sext <16 x i16> %op1 to <16 x i32>
				%2 = sext <16 x i16> %op2 to <16 x i32>
				%mul = mul <16 x i32> %1, %2
				%shr = lshr <16 x i32> %mul, <i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16>
				%res = trunc <16 x i32> %shr to <16 x i16>
				store <16 x i16> %res, <16 x i16>* %a
				ret void
				}

				define <2 x i32> @smulh_v2i32(<2 x i32> %op1, <2 x i32> %op2) #0 {
				; CHECK-LABEL: smulh_v2i32:
				; CHECK: // %bb.0:
				; CHECK-NEXT: // kill: def $d0 killed $d0 def $z0
				; CHECK-NEXT: ptrue p0.s, vl2
				; CHECK-NEXT: // kill: def $d1 killed $d1 def $z1
				; CHECK-NEXT: smulh z0.s, p0/m, z0.s, z1.s
				; CHECK-NEXT: // kill: def $d0 killed $d0 killed $z0
				; CHECK-NEXT: ret
				%1 = sext <2 x i32> %op1 to <2 x i64>
				%2 = sext <2 x i32> %op2 to <2 x i64>
				%mul = mul <2 x i64> %1, %2
				%shr = lshr <2 x i64> %mul, <i64 32, i64 32>
				%res = trunc <2 x i64> %shr to <2 x i32>
				ret <2 x i32> %res
				}

				define <4 x i32> @smulh_v4i32(<4 x i32> %op1, <4 x i32> %op2) #0 {
				; CHECK-LABEL: smulh_v4i32:
				; CHECK: // %bb.0:
				; CHECK-NEXT: // kill: def $q0 killed $q0 def $z0
				; CHECK-NEXT: ptrue p0.s, vl4
				; CHECK-NEXT: // kill: def $q1 killed $q1 def $z1
				; CHECK-NEXT: smulh z0.s, p0/m, z0.s, z1.s
				; CHECK-NEXT: // kill: def $q0 killed $q0 killed $z0
				; CHECK-NEXT: ret
				%1 = sext <4 x i32> %op1 to <4 x i64>
				%2 = sext <4 x i32> %op2 to <4 x i64>
				%mul = mul <4 x i64> %1, %2
				%shr = lshr <4 x i64> %mul, <i64 32, i64 32, i64 32, i64 32>
				%res = trunc <4 x i64> %shr to <4 x i32>
				ret <4 x i32> %res
				}

				define void @smulh_v8i32(<8 x i32>* %a, <8 x i32>* %b) #0 {
				; CHECK-LABEL: smulh_v8i32:
				; CHECK: // %bb.0:
				; CHECK-NEXT: ldp q0, q1, [x0]
				; CHECK-NEXT: ptrue p0.s, vl2
				; CHECK-NEXT: mov z5.d, z0.d
				; CHECK-NEXT: ext z5.b, z5.b, z5.b, #8
				; CHECK-NEXT: ldp q2, q3, [x1]
				; CHECK-NEXT: mov z4.d, z1.d
				; CHECK-NEXT: ext z4.b, z4.b, z4.b, #8
				; CHECK-NEXT: mov z6.d, z2.d
				; CHECK-NEXT: smulh z0.s, p0/m, z0.s, z2.s
				; CHECK-NEXT: ext z6.b, z6.b, z6.b, #8
				; CHECK-NEXT: mov z2.d, z3.d
				; CHECK-NEXT: smulh z1.s, p0/m, z1.s, z3.s
				; CHECK-NEXT: ext z2.b, z2.b, z3.b, #8
				; CHECK-NEXT: movprfx z3, z5
				; CHECK-NEXT: smulh z3.s, p0/m, z3.s, z6.s
				; CHECK-NEXT: smulh z2.s, p0/m, z2.s, z4.s
				; CHECK-NEXT: splice z0.s, p0, z0.s, z3.s
				; CHECK-NEXT: splice z1.s, p0, z1.s, z2.s
				; CHECK-NEXT: stp q0, q1, [x0]
				; CHECK-NEXT: ret
				%op1 = load <8 x i32>, <8 x i32>* %a
				%op2 = load <8 x i32>, <8 x i32>* %b
				%1 = sext <8 x i32> %op1 to <8 x i64>
				%2 = sext <8 x i32> %op2 to <8 x i64>
				%mul = mul <8 x i64> %1, %2
				%shr = lshr <8 x i64> %mul, <i64 32, i64 32, i64 32, i64 32, i64 32, i64 32, i64 32, i64 32>
				%res = trunc <8 x i64> %shr to <8 x i32>
				store <8 x i32> %res, <8 x i32>* %a
				ret void
				}

				define <1 x i64> @smulh_v1i64(<1 x i64> %op1, <1 x i64> %op2) #0 {
				; CHECK-LABEL: smulh_v1i64:
				; CHECK: // %bb.0:
				; CHECK-NEXT: // kill: def $d0 killed $d0 def $z0
				; CHECK-NEXT: ptrue p0.d, vl1
				; CHECK-NEXT: // kill: def $d1 killed $d1 def $z1
				; CHECK-NEXT: smulh z0.d, p0/m, z0.d, z1.d
				; CHECK-NEXT: // kill: def $d0 killed $d0 killed $z0
				; CHECK-NEXT: ret
				%insert = insertelement <1 x i128> undef, i128 64, i128 0
				%splat = shufflevector <1 x i128> %insert, <1 x i128> undef, <1 x i32> zeroinitializer
				%1 = sext <1 x i64> %op1 to <1 x i128>
				%2 = sext <1 x i64> %op2 to <1 x i128>
				%mul = mul <1 x i128> %1, %2
				%shr = lshr <1 x i128> %mul, %splat
				%res = trunc <1 x i128> %shr to <1 x i64>
				ret <1 x i64> %res
				}

				define <2 x i64> @smulh_v2i64(<2 x i64> %op1, <2 x i64> %op2) #0 {
				; CHECK-LABEL: smulh_v2i64:
				; CHECK: // %bb.0:
				; CHECK-NEXT: // kill: def $q0 killed $q0 def $z0
				; CHECK-NEXT: ptrue p0.d, vl2
				; CHECK-NEXT: // kill: def $q1 killed $q1 def $z1
				; CHECK-NEXT: smulh z0.d, p0/m, z0.d, z1.d
				; CHECK-NEXT: // kill: def $q0 killed $q0 killed $z0
				; CHECK-NEXT: ret
				%1 = sext <2 x i64> %op1 to <2 x i128>
				%2 = sext <2 x i64> %op2 to <2 x i128>
				%mul = mul <2 x i128> %1, %2
				%shr = lshr <2 x i128> %mul, <i128 64, i128 64>
				%res = trunc <2 x i128> %shr to <2 x i64>
				ret <2 x i64> %res
				}

				define void @smulh_v4i64(<4 x i64>* %a, <4 x i64>* %b) #0 {
				; CHECK-LABEL: smulh_v4i64:
				; CHECK: // %bb.0:
				; CHECK-NEXT: ldp q0, q1, [x0]
				; CHECK-NEXT: ptrue p0.d, vl1
				; CHECK-NEXT: fmov x9, d0
				; CHECK-NEXT: ldp q2, q3, [x1]
				; CHECK-NEXT: mov z4.d, z1.d[1]
				; CHECK-NEXT: fmov x8, d1
				; CHECK-NEXT: mov z1.d, z0.d[1]
				; CHECK-NEXT: fmov x13, d4
				; CHECK-NEXT: fmov x10, d1
				; CHECK-NEXT: mov z0.d, z2.d[1]
				; CHECK-NEXT: fmov x12, d2
				; CHECK-NEXT: fmov x11, d0
				; CHECK-NEXT: mov z0.d, z3.d[1]
				; CHECK-NEXT: fmov x14, d0
				; CHECK-NEXT: smulh x9, x9, x12
				; CHECK-NEXT: smulh x10, x10, x11
				; CHECK-NEXT: fmov x11, d3
				; CHECK-NEXT: smulh x12, x13, x14
				; CHECK-NEXT: smulh x8, x8, x11
				; CHECK-NEXT: fmov d0, x9
				; CHECK-NEXT: fmov d1, x10
				; CHECK-NEXT: fmov d3, x12
				; CHECK-NEXT: fmov d2, x8
				; CHECK-NEXT: splice z0.d, p0, z0.d, z1.d
				; CHECK-NEXT: splice z2.d, p0, z2.d, z3.d
				; CHECK-NEXT: stp q0, q2, [x0]
				; CHECK-NEXT: ret
				%op1 = load <4 x i64>, <4 x i64>* %a
				%op2 = load <4 x i64>, <4 x i64>* %b
				%1 = sext <4 x i64> %op1 to <4 x i128>
				%2 = sext <4 x i64> %op2 to <4 x i128>
				%mul = mul <4 x i128> %1, %2
				%shr = lshr <4 x i128> %mul, <i128 64, i128 64, i128 64, i128 64>
				%res = trunc <4 x i128> %shr to <4 x i64>
				store <4 x i64> %res, <4 x i64>* %a
				ret void
				}

				;
				; UMULH
				;

				define <4 x i8> @umulh_v4i8(<4 x i8> %op1, <4 x i8> %op2) #0 {
				; CHECK-LABEL: umulh_v4i8:
				; CHECK: // %bb.0:
				; CHECK-NEXT: adrp x8, .LCPI14_0
				; CHECK-NEXT: adrp x9, .LCPI14_1
				; CHECK-NEXT: // kill: def $d1 killed $d1 def $z1
				; CHECK-NEXT: // kill: def $d0 killed $d0 def $z0
				; CHECK-NEXT: ptrue p0.h, vl4
				; CHECK-NEXT: ldr d2, [x8, :lo12:.LCPI14_0]
				; CHECK-NEXT: ldr d3, [x9, :lo12:.LCPI14_1]
				; CHECK-NEXT: and z0.d, z0.d, z2.d
				; CHECK-NEXT: and z1.d, z1.d, z2.d
				; CHECK-NEXT: mul z0.h, p0/m, z0.h, z1.h
				; CHECK-NEXT: lsr z0.h, p0/m, z0.h, z3.h
				; CHECK-NEXT: // kill: def $d0 killed $d0 killed $z0
				; CHECK-NEXT: ret
				%1 = zext <4 x i8> %op1 to <4 x i16>
				%2 = zext <4 x i8> %op2 to <4 x i16>
				%mul = mul <4 x i16> %1, %2
				%shr = lshr <4 x i16> %mul, <i16 4, i16 4, i16 4, i16 4>
				%res = trunc <4 x i16> %shr to <4 x i8>
				ret <4 x i8> %res
				}

				define <8 x i8> @umulh_v8i8(<8 x i8> %op1, <8 x i8> %op2) #0 {
				; CHECK-LABEL: umulh_v8i8:
				; CHECK: // %bb.0:
				; CHECK-NEXT: // kill: def $d0 killed $d0 def $z0
				; CHECK-NEXT: ptrue p0.b, vl8
				; CHECK-NEXT: // kill: def $d1 killed $d1 def $z1
				; CHECK-NEXT: umulh z0.b, p0/m, z0.b, z1.b
				; CHECK-NEXT: // kill: def $d0 killed $d0 killed $z0
				; CHECK-NEXT: ret
				%1 = zext <8 x i8> %op1 to <8 x i16>
				%2 = zext <8 x i8> %op2 to <8 x i16>
				%mul = mul <8 x i16> %1, %2
				%shr = lshr <8 x i16> %mul, <i16 8, i16 8, i16 8, i16 8, i16 8, i16 8, i16 8, i16 8>
				%res = trunc <8 x i16> %shr to <8 x i8>
				ret <8 x i8> %res
				}

				define <16 x i8> @umulh_v16i8(<16 x i8> %op1, <16 x i8> %op2) #0 {
				; CHECK-LABEL: umulh_v16i8:
				; CHECK: // %bb.0:
				; CHECK-NEXT: // kill: def $q0 killed $q0 def $z0
				; CHECK-NEXT: ptrue p0.b, vl16
				; CHECK-NEXT: // kill: def $q1 killed $q1 def $z1
				; CHECK-NEXT: umulh z0.b, p0/m, z0.b, z1.b
				; CHECK-NEXT: // kill: def $q0 killed $q0 killed $z0
				; CHECK-NEXT: ret
				%1 = zext <16 x i8> %op1 to <16 x i16>
				%2 = zext <16 x i8> %op2 to <16 x i16>
				%mul = mul <16 x i16> %1, %2
				%shr = lshr <16 x i16> %mul, <i16 8, i16 8, i16 8, i16 8, i16 8, i16 8, i16 8, i16 8, i16 8, i16 8, i16 8, i16 8, i16 8, i16 8, i16 8, i16 8>
				%res = trunc <16 x i16> %shr to <16 x i8>
				ret <16 x i8> %res
				}

				define void @umulh_v32i8(<32 x i8>* %a, <32 x i8>* %b) #0 {
				; CHECK-LABEL: umulh_v32i8:
				; CHECK: // %bb.0:
				; CHECK-NEXT: sub sp, sp, #32
				; CHECK-NEXT: .cfi_def_cfa_offset 32
				; CHECK-NEXT: ldp q2, q3, [x0]
				; CHECK-NEXT: adrp x8, .LCPI17_0
				; CHECK-NEXT: ptrue p0.h, vl8
				; CHECK-NEXT: uunpklo z0.h, z2.b
				; CHECK-NEXT: ext z2.b, z2.b, z2.b, #8
				; CHECK-NEXT: uunpklo z2.h, z2.b
				; CHECK-NEXT: ldp q4, q5, [x1]
				; CHECK-NEXT: uunpklo z6.h, z3.b
				; CHECK-NEXT: ext z3.b, z3.b, z3.b, #8
				; CHECK-NEXT: uunpklo z3.h, z3.b
				; CHECK-NEXT: uunpklo z1.h, z4.b
				; CHECK-NEXT: ext z4.b, z4.b, z4.b, #8
				; CHECK-NEXT: uunpklo z4.h, z4.b
				; CHECK-NEXT: mul z0.h, p0/m, z0.h, z1.h
				; CHECK-NEXT: uunpklo z7.h, z5.b
				; CHECK-NEXT: ext z5.b, z5.b, z5.b, #8
				; CHECK-NEXT: ldr q16, [x8, :lo12:.LCPI17_0]
				; CHECK-NEXT: uunpklo z5.h, z5.b
				; CHECK-NEXT: mul z3.h, p0/m, z3.h, z5.h
				; CHECK-NEXT: movprfx z5, z6
				; CHECK-NEXT: mul z5.h, p0/m, z5.h, z7.h
				; CHECK-NEXT: mul z2.h, p0/m, z2.h, z4.h
				; CHECK-NEXT: movprfx z4, z5
				; CHECK-NEXT: lsr z4.h, p0/m, z4.h, z16.h
				; CHECK-NEXT: lsr z3.h, p0/m, z3.h, z16.h
				; CHECK-NEXT: fmov w9, s4
				; CHECK-NEXT: fmov w8, s3
				; CHECK-NEXT: mov z5.h, z3.h[7]
				; CHECK-NEXT: mov z6.h, z3.h[6]
				; CHECK-NEXT: mov z7.h, z3.h[5]
				; CHECK-NEXT: fmov w10, s5
				; CHECK-NEXT: strb w9, [sp, #16]
				; CHECK-NEXT: strb w8, [sp, #24]
				; CHECK-NEXT: fmov w8, s6
				; CHECK-NEXT: fmov w9, s7
				; CHECK-NEXT: mov z17.h, z3.h[4]
				; CHECK-NEXT: mov z18.h, z3.h[3]
				; CHECK-NEXT: mov z19.h, z3.h[2]
				; CHECK-NEXT: strb w10, [sp, #31]
				; CHECK-NEXT: fmov w10, s17
				; CHECK-NEXT: strb w8, [sp, #30]
				; CHECK-NEXT: fmov w8, s18
				; CHECK-NEXT: strb w9, [sp, #29]
				; CHECK-NEXT: fmov w9, s19
				; CHECK-NEXT: mov z20.h, z3.h[1]
				; CHECK-NEXT: mov z3.h, z4.h[7]
				; CHECK-NEXT: mov z21.h, z4.h[6]
				; CHECK-NEXT: strb w10, [sp, #28]
				; CHECK-NEXT: fmov w10, s20
				; CHECK-NEXT: strb w8, [sp, #27]
				; CHECK-NEXT: fmov w8, s3
				; CHECK-NEXT: strb w9, [sp, #26]
				; CHECK-NEXT: fmov w9, s21
				; CHECK-NEXT: mov z22.h, z4.h[5]
				; CHECK-NEXT: mov z23.h, z4.h[4]
				; CHECK-NEXT: mov z24.h, z4.h[3]
				; CHECK-NEXT: strb w10, [sp, #25]
				; CHECK-NEXT: fmov w10, s22
				; CHECK-NEXT: strb w8, [sp, #23]
				; CHECK-NEXT: fmov w8, s23
				; CHECK-NEXT: strb w9, [sp, #22]
				; CHECK-NEXT: fmov w9, s24
				; CHECK-NEXT: mov z25.h, z4.h[2]
				; CHECK-NEXT: mov z26.h, z4.h[1]
				; CHECK-NEXT: strb w10, [sp, #21]
				; CHECK-NEXT: fmov w10, s25
				; CHECK-NEXT: strb w8, [sp, #20]
				; CHECK-NEXT: movprfx z1, z2
				; CHECK-NEXT: lsr z1.h, p0/m, z1.h, z16.h
				; CHECK-NEXT: strb w9, [sp, #19]
				; CHECK-NEXT: fmov w8, s26
				; CHECK-NEXT: fmov w9, s1
				; CHECK-NEXT: lsr z0.h, p0/m, z0.h, z16.h
				; CHECK-NEXT: mov z2.h, z1.h[7]
				; CHECK-NEXT: mov z3.h, z1.h[6]
				; CHECK-NEXT: strb w10, [sp, #18]
				; CHECK-NEXT: fmov w10, s0
				; CHECK-NEXT: strb w8, [sp, #17]
				; CHECK-NEXT: fmov w8, s2
				; CHECK-NEXT: strb w9, [sp, #8]
				; CHECK-NEXT: fmov w9, s3
				; CHECK-NEXT: mov z4.h, z1.h[5]
				; CHECK-NEXT: mov z5.h, z1.h[4]
				; CHECK-NEXT: mov z6.h, z1.h[3]
				; CHECK-NEXT: strb w10, [sp]
				; CHECK-NEXT: fmov w10, s4
				; CHECK-NEXT: strb w8, [sp, #15]
				; CHECK-NEXT: fmov w8, s5
				; CHECK-NEXT: strb w9, [sp, #14]
				; CHECK-NEXT: fmov w9, s6
				; CHECK-NEXT: mov z7.h, z1.h[2]
				; CHECK-NEXT: mov z16.h, z1.h[1]
				; CHECK-NEXT: mov z1.h, z0.h[7]
				; CHECK-NEXT: strb w10, [sp, #13]
				; CHECK-NEXT: fmov w10, s7
				; CHECK-NEXT: strb w8, [sp, #12]
				; CHECK-NEXT: fmov w8, s16
				; CHECK-NEXT: strb w9, [sp, #11]
				; CHECK-NEXT: fmov w9, s1
				; CHECK-NEXT: mov z17.h, z0.h[6]
				; CHECK-NEXT: mov z18.h, z0.h[5]
				; CHECK-NEXT: mov z19.h, z0.h[4]
				; CHECK-NEXT: strb w10, [sp, #10]
				; CHECK-NEXT: fmov w10, s17
				; CHECK-NEXT: strb w8, [sp, #9]
				; CHECK-NEXT: fmov w8, s18
				; CHECK-NEXT: strb w9, [sp, #7]
				; CHECK-NEXT: fmov w9, s19
				; CHECK-NEXT: mov z20.h, z0.h[3]
				; CHECK-NEXT: mov z21.h, z0.h[2]
				; CHECK-NEXT: mov z22.h, z0.h[1]
				; CHECK-NEXT: strb w10, [sp, #6]
				; CHECK-NEXT: fmov w10, s20
				; CHECK-NEXT: strb w8, [sp, #5]
				; CHECK-NEXT: fmov w8, s21
				; CHECK-NEXT: strb w9, [sp, #4]
				; CHECK-NEXT: fmov w9, s22
				; CHECK-NEXT: strb w10, [sp, #3]
				; CHECK-NEXT: strb w8, [sp, #2]
				; CHECK-NEXT: strb w9, [sp, #1]
				; CHECK-NEXT: ldp q0, q1, [sp]
				; CHECK-NEXT: stp q0, q1, [x0]
				; CHECK-NEXT: add sp, sp, #32
				; CHECK-NEXT: ret
				%op1 = load <32 x i8>, <32 x i8>* %a
				%op2 = load <32 x i8>, <32 x i8>* %b
				%1 = zext <32 x i8> %op1 to <32 x i16>
				%2 = zext <32 x i8> %op2 to <32 x i16>
				%mul = mul <32 x i16> %1, %2
				%shr = lshr <32 x i16> %mul, <i16 8, i16 8, i16 8, i16 8, i16 8, i16 8, i16 8, i16 8, i16 8, i16 8, i16 8, i16 8, i16 8, i16 8, i16 8, i16 8, i16 8, i16 8, i16 8, i16 8, i16 8, i16 8, i16 8, i16 8, i16 8, i16 8, i16 8, i16 8, i16 8, i16 8, i16 8, i16 8>
				%res = trunc <32 x i16> %shr to <32 x i8>
				store <32 x i8> %res, <32 x i8>* %a
				ret void
				}

				define <2 x i16> @umulh_v2i16(<2 x i16> %op1, <2 x i16> %op2) #0 {
				; CHECK-LABEL: umulh_v2i16:
				; CHECK: // %bb.0:
				; CHECK-NEXT: adrp x8, .LCPI18_0
				; CHECK-NEXT: adrp x9, .LCPI18_1
				; CHECK-NEXT: // kill: def $d1 killed $d1 def $z1
				; CHECK-NEXT: // kill: def $d0 killed $d0 def $z0
				; CHECK-NEXT: ptrue p0.s, vl2
				; CHECK-NEXT: ldr d2, [x8, :lo12:.LCPI18_0]
				; CHECK-NEXT: ldr d3, [x9, :lo12:.LCPI18_1]
				; CHECK-NEXT: and z0.d, z0.d, z2.d
				; CHECK-NEXT: and z1.d, z1.d, z2.d
				; CHECK-NEXT: mul z0.s, p0/m, z0.s, z1.s
				; CHECK-NEXT: lsr z0.s, p0/m, z0.s, z3.s
				; CHECK-NEXT: // kill: def $d0 killed $d0 killed $z0
				; CHECK-NEXT: ret
				%1 = zext <2 x i16> %op1 to <2 x i32>
				%2 = zext <2 x i16> %op2 to <2 x i32>
				%mul = mul <2 x i32> %1, %2
				%shr = lshr <2 x i32> %mul, <i32 16, i32 16>
				%res = trunc <2 x i32> %shr to <2 x i16>
				ret <2 x i16> %res
				}

				define <4 x i16> @umulh_v4i16(<4 x i16> %op1, <4 x i16> %op2) #0 {
				; CHECK-LABEL: umulh_v4i16:
				; CHECK: // %bb.0:
				; CHECK-NEXT: // kill: def $d0 killed $d0 def $z0
				; CHECK-NEXT: ptrue p0.h, vl4
				; CHECK-NEXT: // kill: def $d1 killed $d1 def $z1
				; CHECK-NEXT: umulh z0.h, p0/m, z0.h, z1.h
				; CHECK-NEXT: // kill: def $d0 killed $d0 killed $z0
				; CHECK-NEXT: ret
				%1 = zext <4 x i16> %op1 to <4 x i32>
				%2 = zext <4 x i16> %op2 to <4 x i32>
				%mul = mul <4 x i32> %1, %2
				%shr = lshr <4 x i32> %mul, <i32 16, i32 16, i32 16, i32 16>
				%res = trunc <4 x i32> %shr to <4 x i16>
				ret <4 x i16> %res
				}

				define <8 x i16> @umulh_v8i16(<8 x i16> %op1, <8 x i16> %op2) #0 {
				; CHECK-LABEL: umulh_v8i16:
				; CHECK: // %bb.0:
				; CHECK-NEXT: // kill: def $q0 killed $q0 def $z0
				; CHECK-NEXT: ptrue p0.h, vl8
				; CHECK-NEXT: // kill: def $q1 killed $q1 def $z1
				; CHECK-NEXT: umulh z0.h, p0/m, z0.h, z1.h
				; CHECK-NEXT: // kill: def $q0 killed $q0 killed $z0
				; CHECK-NEXT: ret
				%1 = zext <8 x i16> %op1 to <8 x i32>
				%2 = zext <8 x i16> %op2 to <8 x i32>
				%mul = mul <8 x i32> %1, %2
				%shr = lshr <8 x i32> %mul, <i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16>
				%res = trunc <8 x i32> %shr to <8 x i16>
				ret <8 x i16> %res
				}

				define void @umulh_v16i16(<16 x i16>* %a, <16 x i16>* %b) #0 {
				; CHECK-LABEL: umulh_v16i16:
				; CHECK: // %bb.0:
				; CHECK-NEXT: ldp q0, q1, [x0]
				; CHECK-NEXT: ptrue p0.h, vl4
				; CHECK-NEXT: mov z5.d, z0.d
				; CHECK-NEXT: ext z5.b, z5.b, z5.b, #8
				; CHECK-NEXT: ldp q2, q3, [x1]
				; CHECK-NEXT: mov z4.d, z1.d
				; CHECK-NEXT: ext z4.b, z4.b, z4.b, #8
				; CHECK-NEXT: mov z6.d, z2.d
				; CHECK-NEXT: umulh z0.h, p0/m, z0.h, z2.h
				; CHECK-NEXT: ext z6.b, z6.b, z6.b, #8
				; CHECK-NEXT: mov z2.d, z3.d
				; CHECK-NEXT: umulh z1.h, p0/m, z1.h, z3.h
				; CHECK-NEXT: ext z2.b, z2.b, z3.b, #8
				; CHECK-NEXT: movprfx z3, z5
				; CHECK-NEXT: umulh z3.h, p0/m, z3.h, z6.h
				; CHECK-NEXT: umulh z2.h, p0/m, z2.h, z4.h
				; CHECK-NEXT: splice z0.h, p0, z0.h, z3.h
				; CHECK-NEXT: splice z1.h, p0, z1.h, z2.h
				; CHECK-NEXT: stp q0, q1, [x0]
				; CHECK-NEXT: ret
				%op1 = load <16 x i16>, <16 x i16>* %a
				%op2 = load <16 x i16>, <16 x i16>* %b
				%1 = zext <16 x i16> %op1 to <16 x i32>
				%2 = zext <16 x i16> %op2 to <16 x i32>
				%mul = mul <16 x i32> %1, %2
				%shr = lshr <16 x i32> %mul, <i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16>
				%res = trunc <16 x i32> %shr to <16 x i16>
				store <16 x i16> %res, <16 x i16>* %a
				ret void
				}

				define <2 x i32> @umulh_v2i32(<2 x i32> %op1, <2 x i32> %op2) #0 {
				; CHECK-LABEL: umulh_v2i32:
				; CHECK: // %bb.0:
				; CHECK-NEXT: // kill: def $d0 killed $d0 def $z0
				; CHECK-NEXT: ptrue p0.s, vl2
				; CHECK-NEXT: // kill: def $d1 killed $d1 def $z1
				; CHECK-NEXT: umulh z0.s, p0/m, z0.s, z1.s
				; CHECK-NEXT: // kill: def $d0 killed $d0 killed $z0
				; CHECK-NEXT: ret
				%1 = zext <2 x i32> %op1 to <2 x i64>
				%2 = zext <2 x i32> %op2 to <2 x i64>
				%mul = mul <2 x i64> %1, %2
				%shr = lshr <2 x i64> %mul, <i64 32, i64 32>
				%res = trunc <2 x i64> %shr to <2 x i32>
				ret <2 x i32> %res
				}

				define <4 x i32> @umulh_v4i32(<4 x i32> %op1, <4 x i32> %op2) #0 {
				; CHECK-LABEL: umulh_v4i32:
				; CHECK: // %bb.0:
				; CHECK-NEXT: // kill: def $q0 killed $q0 def $z0
				; CHECK-NEXT: ptrue p0.s, vl4
				; CHECK-NEXT: // kill: def $q1 killed $q1 def $z1
				; CHECK-NEXT: umulh z0.s, p0/m, z0.s, z1.s
				; CHECK-NEXT: // kill: def $q0 killed $q0 killed $z0
				; CHECK-NEXT: ret
				%1 = zext <4 x i32> %op1 to <4 x i64>
				%2 = zext <4 x i32> %op2 to <4 x i64>
				%mul = mul <4 x i64> %1, %2
				%shr = lshr <4 x i64> %mul, <i64 32, i64 32, i64 32, i64 32>
				%res = trunc <4 x i64> %shr to <4 x i32>
				ret <4 x i32> %res
				}

				define void @umulh_v8i32(<8 x i32>* %a, <8 x i32>* %b) #0 {
				; CHECK-LABEL: umulh_v8i32:
				; CHECK: // %bb.0:
				; CHECK-NEXT: ldp q0, q1, [x0]
				; CHECK-NEXT: ptrue p0.s, vl2
				; CHECK-NEXT: mov z5.d, z0.d
				; CHECK-NEXT: ext z5.b, z5.b, z5.b, #8
				; CHECK-NEXT: ldp q2, q3, [x1]
				; CHECK-NEXT: mov z4.d, z1.d
				; CHECK-NEXT: ext z4.b, z4.b, z4.b, #8
				; CHECK-NEXT: mov z6.d, z2.d
				; CHECK-NEXT: umulh z0.s, p0/m, z0.s, z2.s
				; CHECK-NEXT: ext z6.b, z6.b, z6.b, #8
				; CHECK-NEXT: mov z2.d, z3.d
				; CHECK-NEXT: umulh z1.s, p0/m, z1.s, z3.s
				; CHECK-NEXT: ext z2.b, z2.b, z3.b, #8
				; CHECK-NEXT: movprfx z3, z5
				; CHECK-NEXT: umulh z3.s, p0/m, z3.s, z6.s
				; CHECK-NEXT: umulh z2.s, p0/m, z2.s, z4.s
				; CHECK-NEXT: splice z0.s, p0, z0.s, z3.s
				; CHECK-NEXT: splice z1.s, p0, z1.s, z2.s
				; CHECK-NEXT: stp q0, q1, [x0]
				; CHECK-NEXT: ret
				%op1 = load <8 x i32>, <8 x i32>* %a
				%op2 = load <8 x i32>, <8 x i32>* %b
				%insert = insertelement <8 x i64> undef, i64 32, i64 0
				%splat = shufflevector <8 x i64> %insert, <8 x i64> undef, <8 x i32> zeroinitializer
				%1 = zext <8 x i32> %op1 to <8 x i64>
				%2 = zext <8 x i32> %op2 to <8 x i64>
				%mul = mul <8 x i64> %1, %2
				%shr = lshr <8 x i64> %mul, <i64 32, i64 32, i64 32, i64 32, i64 32, i64 32, i64 32, i64 32>
				%res = trunc <8 x i64> %shr to <8 x i32>
				store <8 x i32> %res, <8 x i32>* %a
				ret void
				}

				define <1 x i64> @umulh_v1i64(<1 x i64> %op1, <1 x i64> %op2) #0 {
				; CHECK-LABEL: umulh_v1i64:
				; CHECK: // %bb.0:
				; CHECK-NEXT: // kill: def $d0 killed $d0 def $z0
				; CHECK-NEXT: ptrue p0.d, vl1
				; CHECK-NEXT: // kill: def $d1 killed $d1 def $z1
				; CHECK-NEXT: umulh z0.d, p0/m, z0.d, z1.d
				; CHECK-NEXT: // kill: def $d0 killed $d0 killed $z0
				; CHECK-NEXT: ret
				%1 = zext <1 x i64> %op1 to <1 x i128>
				%2 = zext <1 x i64> %op2 to <1 x i128>
				%mul = mul <1 x i128> %1, %2
				%shr = lshr <1 x i128> %mul, <i128 64>
				%res = trunc <1 x i128> %shr to <1 x i64>
				ret <1 x i64> %res
				}

				define <2 x i64> @umulh_v2i64(<2 x i64> %op1, <2 x i64> %op2) #0 {
				; CHECK-LABEL: umulh_v2i64:
				; CHECK: // %bb.0:
				; CHECK-NEXT: // kill: def $q0 killed $q0 def $z0
				; CHECK-NEXT: ptrue p0.d, vl2
				; CHECK-NEXT: // kill: def $q1 killed $q1 def $z1
				; CHECK-NEXT: umulh z0.d, p0/m, z0.d, z1.d
				; CHECK-NEXT: // kill: def $q0 killed $q0 killed $z0
				; CHECK-NEXT: ret
				%1 = zext <2 x i64> %op1 to <2 x i128>
				%2 = zext <2 x i64> %op2 to <2 x i128>
				%mul = mul <2 x i128> %1, %2
				%shr = lshr <2 x i128> %mul, <i128 64, i128 64>
				%res = trunc <2 x i128> %shr to <2 x i64>
				ret <2 x i64> %res
				}

				define void @umulh_v4i64(<4 x i64>* %a, <4 x i64>* %b) #0 {
				; CHECK-LABEL: umulh_v4i64:
				; CHECK: // %bb.0:
				; CHECK-NEXT: ldp q0, q1, [x0]
				; CHECK-NEXT: ptrue p0.d, vl1
				; CHECK-NEXT: fmov x9, d0
				; CHECK-NEXT: ldp q2, q3, [x1]
				; CHECK-NEXT: mov z4.d, z1.d[1]
				; CHECK-NEXT: fmov x8, d1
				; CHECK-NEXT: mov z1.d, z0.d[1]
				; CHECK-NEXT: fmov x13, d4
				; CHECK-NEXT: fmov x10, d1
				; CHECK-NEXT: mov z0.d, z2.d[1]
				; CHECK-NEXT: fmov x12, d2
				; CHECK-NEXT: fmov x11, d0
				; CHECK-NEXT: mov z0.d, z3.d[1]
				; CHECK-NEXT: fmov x14, d0
				; CHECK-NEXT: umulh x9, x9, x12
				; CHECK-NEXT: umulh x10, x10, x11
				; CHECK-NEXT: fmov x11, d3
				; CHECK-NEXT: umulh x12, x13, x14
				; CHECK-NEXT: umulh x8, x8, x11
				; CHECK-NEXT: fmov d0, x9
				; CHECK-NEXT: fmov d1, x10
				; CHECK-NEXT: fmov d3, x12
				; CHECK-NEXT: fmov d2, x8
				; CHECK-NEXT: splice z0.d, p0, z0.d, z1.d
				; CHECK-NEXT: splice z2.d, p0, z2.d, z3.d
				; CHECK-NEXT: stp q0, q2, [x0]
				; CHECK-NEXT: ret
				%op1 = load <4 x i64>, <4 x i64>* %a
				%op2 = load <4 x i64>, <4 x i64>* %b
				%1 = zext <4 x i64> %op1 to <4 x i128>
				%2 = zext <4 x i64> %op2 to <4 x i128>
				%mul = mul <4 x i128> %1, %2
				%shr = lshr <4 x i128> %mul, <i128 64, i128 64, i128 64, i128 64>
				%res = trunc <4 x i128> %shr to <4 x i64>
				store <4 x i64> %res, <4 x i64>* %a
				ret void
				}

				attributes #0 = { "target-features"="+sve" }
				david-arm: Again, these `bic` instructions are illegal in streaming mode.
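For readers skimming the mulh tests above: most of the functions express the high-half multiply as extend (`sext`/`zext`), multiply in the doubled width, `lshr` by the element width, then `trunc`, and the CHECK lines show this being selected as a single `smulh`/`umulh` for the legal element types. The following is a minimal Python sketch of that per-lane arithmetic only. The function name `mulh` and its parameters are hypothetical and are not part of the patch.

```python
def mulh(a, b, bits, signed):
    """Model one lane of the IR pattern: ext -> mul -> lshr bits -> trunc.

    `a` and `b` are treated as `bits`-wide lane values; the result is the
    high `bits` bits of their 2*bits-wide product, as an unsigned bit pattern.
    """
    mask = (1 << bits) - 1
    wide_mask = (1 << (2 * bits)) - 1

    def extend(x):
        # zext keeps the low `bits` bits; sext additionally reinterprets
        # a set top bit as a negative value.
        x &= mask
        if signed and (x >> (bits - 1)):
            x -= 1 << bits
        return x

    product = (extend(a) * extend(b)) & wide_mask  # mul in the widened type
    shifted = product >> bits                      # lshr by the element width
    return shifted & mask                          # trunc back to `bits` bits


# Unsigned 32-bit lanes: (2**32 - 1)**2 has 0xFFFFFFFE in its high half.
print(hex(mulh(0xFFFFFFFF, 0xFFFFFFFF, 32, signed=False)))
# Signed 8-bit lanes: -1 * 1 = -1, whose high byte is 0xFF.
print(hex(mulh(-1, 1, 8, signed=True)))
```

Note that the signed tests also use `lshr` rather than `ashr`. Because the result is truncated back to the original width, the bits that differ between the two shifts are discarded, which the sketch reproduces by masking at the end.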

llvm/test/CodeGen/AArch64/sve-streaming-mode-fixed-length-int-rem.ll

This file was added.

				; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
				; RUN: llc -force-streaming-compatible-sve < %s | FileCheck %s

				target triple = "aarch64-unknown-linux-gnu"

				;
				; SREM
				;

				define <4 x i8> @srem_v4i8(<4 x i8> %op1, <4 x i8> %op2) #0 {
				; CHECK-LABEL: srem_v4i8:
				; CHECK: // %bb.0:
				; CHECK-NEXT: sub sp, sp, #16
				; CHECK-NEXT: .cfi_def_cfa_offset 16
				; CHECK-NEXT: adrp x8, .LCPI0_0
				; CHECK-NEXT: // kill: def $d1 killed $d1 def $z1
				; CHECK-NEXT: // kill: def $d0 killed $d0 def $z0
				; CHECK-NEXT: ptrue p0.h, vl4
				; CHECK-NEXT: ptrue p1.s, vl4
				; CHECK-NEXT: ldr d2, [x8, :lo12:.LCPI0_0]
				; CHECK-NEXT: lsl z0.h, p0/m, z0.h, z2.h
				; CHECK-NEXT: lsl z1.h, p0/m, z1.h, z2.h
				; CHECK-NEXT: asr z0.h, p0/m, z0.h, z2.h
				; CHECK-NEXT: asr z1.h, p0/m, z1.h, z2.h
				; CHECK-NEXT: sunpklo z2.s, z1.h
				; CHECK-NEXT: sunpklo z3.s, z0.h
				; CHECK-NEXT: sdivr z2.s, p1/m, z2.s, z3.s
				; CHECK-NEXT: fmov w8, s2
				; CHECK-NEXT: mov z3.s, z2.s[3]
				; CHECK-NEXT: mov z4.s, z2.s[2]
				; CHECK-NEXT: mov z2.s, z2.s[1]
				; CHECK-NEXT: fmov w9, s3
				; CHECK-NEXT: fmov w10, s4
				; CHECK-NEXT: strh w8, [sp, #8]
				; CHECK-NEXT: fmov w8, s2
				; CHECK-NEXT: strh w9, [sp, #14]
				; CHECK-NEXT: strh w10, [sp, #12]
				; CHECK-NEXT: strh w8, [sp, #10]
				; CHECK-NEXT: ldr d2, [sp, #8]
				; CHECK-NEXT: mls z0.h, p0/m, z2.h, z1.h
				; CHECK-NEXT: // kill: def $d0 killed $d0 killed $z0
				; CHECK-NEXT: add sp, sp, #16
				; CHECK-NEXT: ret
				%res = srem <4 x i8> %op1, %op2
				ret <4 x i8> %res
				}

				define <8 x i8> @srem_v8i8(<8 x i8> %op1, <8 x i8> %op2) #0 {
				; CHECK-LABEL: srem_v8i8:
				; CHECK: // %bb.0:
				; CHECK-NEXT: sub sp, sp, #16
				; CHECK-NEXT: .cfi_def_cfa_offset 16
				; CHECK-NEXT: // kill: def $d1 killed $d1 def $z1
				; CHECK-NEXT: // kill: def $d0 killed $d0 def $z0
				; CHECK-NEXT: sunpklo z2.h, z1.b
				; CHECK-NEXT: sunpklo z3.h, z0.b
				; CHECK-NEXT: ptrue p0.s, vl4
				; CHECK-NEXT: sunpkhi z4.s, z2.h
				; CHECK-NEXT: sunpkhi z5.s, z3.h
				; CHECK-NEXT: sunpklo z2.s, z2.h
				; CHECK-NEXT: sunpklo z3.s, z3.h
				; CHECK-NEXT: sdivr z4.s, p0/m, z4.s, z5.s
				; CHECK-NEXT: sdivr z2.s, p0/m, z2.s, z3.s
				; CHECK-NEXT: ptrue p0.b, vl8
				; CHECK-NEXT: uzp1 z2.h, z2.h, z4.h
				; CHECK-NEXT: mov z3.h, z2.h[7]
				; CHECK-NEXT: fmov w8, s2
				; CHECK-NEXT: fmov w9, s3
				; CHECK-NEXT: mov z4.h, z2.h[6]
				; CHECK-NEXT: mov z5.h, z2.h[5]
				; CHECK-NEXT: mov z6.h, z2.h[4]
				; CHECK-NEXT: fmov w10, s4
				; CHECK-NEXT: strb w8, [sp, #8]
				; CHECK-NEXT: fmov w8, s5
				; CHECK-NEXT: strb w9, [sp, #15]
				; CHECK-NEXT: fmov w9, s6
				; CHECK-NEXT: mov z7.h, z2.h[3]
				; CHECK-NEXT: mov z16.h, z2.h[2]
				; CHECK-NEXT: mov z2.h, z2.h[1]
				; CHECK-NEXT: strb w10, [sp, #14]
				; CHECK-NEXT: fmov w10, s7
				; CHECK-NEXT: strb w8, [sp, #13]
				; CHECK-NEXT: fmov w8, s16
				; CHECK-NEXT: strb w9, [sp, #12]
				; CHECK-NEXT: fmov w9, s2
				; CHECK-NEXT: strb w10, [sp, #11]
				; CHECK-NEXT: strb w8, [sp, #10]
				; CHECK-NEXT: strb w9, [sp, #9]
				; CHECK-NEXT: ldr d2, [sp, #8]
				; CHECK-NEXT: mls z0.b, p0/m, z2.b, z1.b
				; CHECK-NEXT: // kill: def $d0 killed $d0 killed $z0
				; CHECK-NEXT: add sp, sp, #16
				; CHECK-NEXT: ret
				%res = srem <8 x i8> %op1, %op2
				ret <8 x i8> %res
				}

				define <16 x i8> @srem_v16i8(<16 x i8> %op1, <16 x i8> %op2) #0 {
				; CHECK-LABEL: srem_v16i8:
				; CHECK: // %bb.0:
				; CHECK-NEXT: // kill: def $q1 killed $q1 def $z1
				; CHECK-NEXT: // kill: def $q0 killed $q0 def $z0
				; CHECK-NEXT: sunpkhi z2.h, z1.b
				; CHECK-NEXT: sunpkhi z3.h, z0.b
				; CHECK-NEXT: ptrue p0.s, vl4
				; CHECK-NEXT: sunpkhi z5.s, z2.h
				; CHECK-NEXT: sunpkhi z6.s, z3.h
				; CHECK-NEXT: sunpklo z2.s, z2.h
				; CHECK-NEXT: sunpklo z3.s, z3.h
				; CHECK-NEXT: sunpklo z4.h, z1.b
				; CHECK-NEXT: sdivr z2.s, p0/m, z2.s, z3.s
				; CHECK-NEXT: sunpklo z3.h, z0.b
				; CHECK-NEXT: sdivr z5.s, p0/m, z5.s, z6.s
				; CHECK-NEXT: sunpkhi z6.s, z4.h
				; CHECK-NEXT: sunpkhi z7.s, z3.h
				; CHECK-NEXT: sunpklo z4.s, z4.h
				; CHECK-NEXT: sunpklo z3.s, z3.h
				; CHECK-NEXT: sdivr z6.s, p0/m, z6.s, z7.s
				; CHECK-NEXT: sdiv z3.s, p0/m, z3.s, z4.s
				; CHECK-NEXT: uzp1 z2.h, z2.h, z5.h
				; CHECK-NEXT: uzp1 z3.h, z3.h, z6.h
				; CHECK-NEXT: ptrue p0.b, vl16
				; CHECK-NEXT: uzp1 z2.b, z3.b, z2.b
				; CHECK-NEXT: mls z0.b, p0/m, z2.b, z1.b
				; CHECK-NEXT: // kill: def $q0 killed $q0 killed $z0
				; CHECK-NEXT: ret
				%res = srem <16 x i8> %op1, %op2
				ret <16 x i8> %res
				}

				define void @srem_v32i8(<32 x i8>* %a, <32 x i8>* %b) #0 {
				; CHECK-LABEL: srem_v32i8:
				; CHECK: // %bb.0:
				; CHECK-NEXT: ldp q2, q0, [x0]
				; CHECK-NEXT: ptrue p0.s, vl4
				; CHECK-NEXT: ldp q3, q1, [x1]
				; CHECK-NEXT: sunpkhi z5.h, z0.b
				; CHECK-NEXT: sunpklo z7.h, z0.b
				; CHECK-NEXT: sunpkhi z17.s, z5.h
				; CHECK-NEXT: sunpklo z5.s, z5.h
				; CHECK-NEXT: sunpkhi z4.h, z1.b
				; CHECK-NEXT: sunpklo z6.h, z1.b
				; CHECK-NEXT: sunpkhi z16.s, z4.h
				; CHECK-NEXT: sunpklo z4.s, z4.h
				; CHECK-NEXT: sunpkhi z18.s, z6.h
				; CHECK-NEXT: sdivr z4.s, p0/m, z4.s, z5.s
				; CHECK-NEXT: sunpkhi z5.s, z7.h
				; CHECK-NEXT: sunpklo z6.s, z6.h
				; CHECK-NEXT: sunpklo z7.s, z7.h
				; CHECK-NEXT: sdiv z5.s, p0/m, z5.s, z18.s
				; CHECK-NEXT: sdivr z6.s, p0/m, z6.s, z7.s
				; CHECK-NEXT: sdivr z16.s, p0/m, z16.s, z17.s
				; CHECK-NEXT: uzp1 z5.h, z6.h, z5.h
				; CHECK-NEXT: sunpkhi z6.h, z3.b
				; CHECK-NEXT: sunpkhi z7.h, z2.b
				; CHECK-NEXT: uzp1 z4.h, z4.h, z16.h
				; CHECK-NEXT: sunpkhi z16.s, z6.h
				; CHECK-NEXT: sunpkhi z17.s, z7.h
				; CHECK-NEXT: sunpklo z6.s, z6.h
				; CHECK-NEXT: sunpklo z7.s, z7.h
				; CHECK-NEXT: sdivr z16.s, p0/m, z16.s, z17.s
				; CHECK-NEXT: sdivr z6.s, p0/m, z6.s, z7.s
				; CHECK-NEXT: sunpklo z7.h, z3.b
				; CHECK-NEXT: sunpklo z17.h, z2.b
				; CHECK-NEXT: sunpkhi z18.s, z7.h
				; CHECK-NEXT: sunpkhi z19.s, z17.h
				; CHECK-NEXT: sunpklo z7.s, z7.h
				; CHECK-NEXT: sunpklo z17.s, z17.h
				; CHECK-NEXT: sdivr z18.s, p0/m, z18.s, z19.s
				; CHECK-NEXT: sdivr z7.s, p0/m, z7.s, z17.s
				; CHECK-NEXT: uzp1 z6.h, z6.h, z16.h
				; CHECK-NEXT: uzp1 z7.h, z7.h, z18.h
				; CHECK-NEXT: ptrue p0.b, vl16
				; CHECK-NEXT: uzp1 z6.b, z7.b, z6.b
				; CHECK-NEXT: uzp1 z4.b, z5.b, z4.b
				; CHECK-NEXT: mls z2.b, p0/m, z6.b, z3.b
				; CHECK-NEXT: mls z0.b, p0/m, z4.b, z1.b
				; CHECK-NEXT: stp q2, q0, [x0]
				; CHECK-NEXT: ret
				%op1 = load <32 x i8>, <32 x i8>* %a
				%op2 = load <32 x i8>, <32 x i8>* %b
				%res = srem <32 x i8> %op1, %op2
				store <32 x i8> %res, <32 x i8>* %a
				ret void
				}

				define <4 x i16> @srem_v4i16(<4 x i16> %op1, <4 x i16> %op2) #0 {
				; CHECK-LABEL: srem_v4i16:
				; CHECK: // %bb.0:
				; CHECK-NEXT: sub sp, sp, #16
				; CHECK-NEXT: .cfi_def_cfa_offset 16
				; CHECK-NEXT: // kill: def $d1 killed $d1 def $z1
				; CHECK-NEXT: // kill: def $d0 killed $d0 def $z0
				; CHECK-NEXT: ptrue p0.s, vl4
				; CHECK-NEXT: sunpklo z2.s, z1.h
				; CHECK-NEXT: sunpklo z3.s, z0.h
				; CHECK-NEXT: sdivr z2.s, p0/m, z2.s, z3.s
				; CHECK-NEXT: ptrue p0.h, vl4
				; CHECK-NEXT: fmov w8, s2
				; CHECK-NEXT: mov z3.s, z2.s[3]
				; CHECK-NEXT: mov z4.s, z2.s[2]
				; CHECK-NEXT: mov z2.s, z2.s[1]
				; CHECK-NEXT: fmov w9, s3
				; CHECK-NEXT: fmov w10, s4
				; CHECK-NEXT: strh w8, [sp, #8]
				; CHECK-NEXT: fmov w8, s2
				; CHECK-NEXT: strh w9, [sp, #14]
				; CHECK-NEXT: strh w10, [sp, #12]
				; CHECK-NEXT: strh w8, [sp, #10]
				; CHECK-NEXT: ldr d2, [sp, #8]
				; CHECK-NEXT: mls z0.h, p0/m, z2.h, z1.h
				; CHECK-NEXT: // kill: def $d0 killed $d0 killed $z0
				; CHECK-NEXT: add sp, sp, #16
				; CHECK-NEXT: ret
				%res = srem <4 x i16> %op1, %op2
				ret <4 x i16> %res
				}

				define <8 x i16> @srem_v8i16(<8 x i16> %op1, <8 x i16> %op2) #0 {
				; CHECK-LABEL: srem_v8i16:
				; CHECK: // %bb.0:
				; CHECK-NEXT: // kill: def $q1 killed $q1 def $z1
				; CHECK-NEXT: // kill: def $q0 killed $q0 def $z0
				; CHECK-NEXT: ptrue p0.s, vl4
				; CHECK-NEXT: sunpkhi z2.s, z1.h
				; CHECK-NEXT: sunpkhi z3.s, z0.h
				; CHECK-NEXT: sunpklo z4.s, z1.h
				; CHECK-NEXT: sdivr z2.s, p0/m, z2.s, z3.s
				; CHECK-NEXT: sunpklo z5.s, z0.h
				; CHECK-NEXT: movprfx z3, z5
				; CHECK-NEXT: sdiv z3.s, p0/m, z3.s, z4.s
				; CHECK-NEXT: uzp1 z2.h, z3.h, z2.h
				; CHECK-NEXT: ptrue p0.h, vl8
				; CHECK-NEXT: mls z0.h, p0/m, z2.h, z1.h
				; CHECK-NEXT: // kill: def $q0 killed $q0 killed $z0
				; CHECK-NEXT: ret
				%res = srem <8 x i16> %op1, %op2
				ret <8 x i16> %res
				}

				define void @srem_v16i16(<16 x i16>* %a, <16 x i16>* %b) #0 {
				; CHECK-LABEL: srem_v16i16:
				; CHECK: // %bb.0:
				; CHECK-NEXT: ldp q2, q0, [x0]
				; CHECK-NEXT: ptrue p0.s, vl4
				; CHECK-NEXT: sunpkhi z17.s, z2.h
				; CHECK-NEXT: ldp q3, q1, [x1]
				; CHECK-NEXT: sunpkhi z5.s, z0.h
				; CHECK-NEXT: sunpklo z7.s, z0.h
				; CHECK-NEXT: sunpkhi z16.s, z3.h
				; CHECK-NEXT: sdivr z16.s, p0/m, z16.s, z17.s
				; CHECK-NEXT: sunpkhi z4.s, z1.h
				; CHECK-NEXT: sunpklo z6.s, z1.h
				; CHECK-NEXT: sdivr z4.s, p0/m, z4.s, z5.s
				; CHECK-NEXT: sunpklo z5.s, z3.h
				; CHECK-NEXT: sdivr z6.s, p0/m, z6.s, z7.s
				; CHECK-NEXT: sunpklo z7.s, z2.h
				; CHECK-NEXT: sdivr z5.s, p0/m, z5.s, z7.s
				; CHECK-NEXT: ptrue p0.h, vl8
				; CHECK-NEXT: uzp1 z5.h, z5.h, z16.h
				; CHECK-NEXT: uzp1 z4.h, z6.h, z4.h
				; CHECK-NEXT: mls z2.h, p0/m, z5.h, z3.h
				; CHECK-NEXT: mls z0.h, p0/m, z4.h, z1.h
				; CHECK-NEXT: stp q2, q0, [x0]
				; CHECK-NEXT: ret
				%op1 = load <16 x i16>, <16 x i16>* %a
				%op2 = load <16 x i16>, <16 x i16>* %b
				%res = srem <16 x i16> %op1, %op2
				store <16 x i16> %res, <16 x i16>* %a
				ret void
				}

				define <2 x i32> @srem_v2i32(<2 x i32> %op1, <2 x i32> %op2) #0 {
				; CHECK-LABEL: srem_v2i32:
				; CHECK: // %bb.0:
				; CHECK-NEXT: // kill: def $d0 killed $d0 def $z0
				; CHECK-NEXT: ptrue p0.s, vl2
				; CHECK-NEXT: // kill: def $d1 killed $d1 def $z1
				; CHECK-NEXT: movprfx z2, z0
				; CHECK-NEXT: sdiv z2.s, p0/m, z2.s, z1.s
				; CHECK-NEXT: mls z0.s, p0/m, z2.s, z1.s
				; CHECK-NEXT: // kill: def $d0 killed $d0 killed $z0
				; CHECK-NEXT: ret
				%res = srem <2 x i32> %op1, %op2
				ret <2 x i32> %res
				}

				define <4 x i32> @srem_v4i32(<4 x i32> %op1, <4 x i32> %op2) #0 {
				; CHECK-LABEL: srem_v4i32:
				; CHECK: // %bb.0:
				; CHECK-NEXT: // kill: def $q0 killed $q0 def $z0
				; CHECK-NEXT: ptrue p0.s, vl4
				; CHECK-NEXT: // kill: def $q1 killed $q1 def $z1
				; CHECK-NEXT: movprfx z2, z0
				; CHECK-NEXT: sdiv z2.s, p0/m, z2.s, z1.s
				; CHECK-NEXT: mls z0.s, p0/m, z2.s, z1.s
				; CHECK-NEXT: // kill: def $q0 killed $q0 killed $z0
				; CHECK-NEXT: ret
				%res = srem <4 x i32> %op1, %op2
				ret <4 x i32> %res
				}

				define void @srem_v8i32(<8 x i32>* %a, <8 x i32>* %b) #0 {
				; CHECK-LABEL: srem_v8i32:
				; CHECK: // %bb.0:
				; CHECK-NEXT: ldp q0, q1, [x0]
				; CHECK-NEXT: ptrue p0.s, vl4
				; CHECK-NEXT: ldp q2, q3, [x1]
				; CHECK-NEXT: movprfx z4, z0
				; CHECK-NEXT: sdiv z4.s, p0/m, z4.s, z2.s
				; CHECK-NEXT: movprfx z5, z1
				; CHECK-NEXT: sdiv z5.s, p0/m, z5.s, z3.s
				; CHECK-NEXT: mls z0.s, p0/m, z4.s, z2.s
				; CHECK-NEXT: mls z1.s, p0/m, z5.s, z3.s
				; CHECK-NEXT: stp q0, q1, [x0]
				; CHECK-NEXT: ret
				%op1 = load <8 x i32>, <8 x i32>* %a
				%op2 = load <8 x i32>, <8 x i32>* %b
				%res = srem <8 x i32> %op1, %op2
				store <8 x i32> %res, <8 x i32>* %a
				ret void
				}

				define <1 x i64> @srem_v1i64(<1 x i64> %op1, <1 x i64> %op2) #0 {
				; CHECK-LABEL: srem_v1i64:
				; CHECK: // %bb.0:
				; CHECK-NEXT: // kill: def $d0 killed $d0 def $z0
				; CHECK-NEXT: ptrue p0.d, vl1
				; CHECK-NEXT: // kill: def $d1 killed $d1 def $z1
				; CHECK-NEXT: movprfx z2, z0
				; CHECK-NEXT: sdiv z2.d, p0/m, z2.d, z1.d
				; CHECK-NEXT: mls z0.d, p0/m, z2.d, z1.d
				; CHECK-NEXT: // kill: def $d0 killed $d0 killed $z0
				; CHECK-NEXT: ret
				%res = srem <1 x i64> %op1, %op2
				ret <1 x i64> %res
				}

				define <2 x i64> @srem_v2i64(<2 x i64> %op1, <2 x i64> %op2) #0 {
				; CHECK-LABEL: srem_v2i64:
				; CHECK: // %bb.0:
				; CHECK-NEXT: // kill: def $q0 killed $q0 def $z0
				; CHECK-NEXT: ptrue p0.d, vl2
				; CHECK-NEXT: // kill: def $q1 killed $q1 def $z1
				; CHECK-NEXT: movprfx z2, z0
				; CHECK-NEXT: sdiv z2.d, p0/m, z2.d, z1.d
				; CHECK-NEXT: mls z0.d, p0/m, z2.d, z1.d
				; CHECK-NEXT: // kill: def $q0 killed $q0 killed $z0
				; CHECK-NEXT: ret
				%res = srem <2 x i64> %op1, %op2
				ret <2 x i64> %res
				}

				define void @srem_v4i64(<4 x i64>* %a, <4 x i64>* %b) #0 {
				; CHECK-LABEL: srem_v4i64:
				; CHECK: // %bb.0:
				; CHECK-NEXT: ldp q0, q1, [x0]
				; CHECK-NEXT: ptrue p0.d, vl2
				; CHECK-NEXT: ldp q2, q3, [x1]
				; CHECK-NEXT: movprfx z4, z0
				; CHECK-NEXT: sdiv z4.d, p0/m, z4.d, z2.d
				; CHECK-NEXT: movprfx z5, z1
				; CHECK-NEXT: sdiv z5.d, p0/m, z5.d, z3.d
				; CHECK-NEXT: mls z0.d, p0/m, z4.d, z2.d
				; CHECK-NEXT: mls z1.d, p0/m, z5.d, z3.d
				; CHECK-NEXT: stp q0, q1, [x0]
				; CHECK-NEXT: ret
				%op1 = load <4 x i64>, <4 x i64>* %a
				%op2 = load <4 x i64>, <4 x i64>* %b
				%res = srem <4 x i64> %op1, %op2
				store <4 x i64> %res, <4 x i64>* %a
				ret void
				}
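The SREM CHECK lines above all end in the same shape: a divide (`sdiv` or `sdivr`) produces the quotient, and `mls` then computes `op1 - quotient * op2`. For the i8 and i16 cases the operands are first unpacked to 32-bit lanes (`sunpklo`/`sunpkhi`) and the quotients are narrowed again with `uzp1`. A minimal per-lane Python sketch of that remainder identity, assuming LLVM's round-toward-zero `sdiv` semantics, is below. The helper names are hypothetical, and integer overflow cases such as `INT_MIN / -1` are not modeled.

```python
def sdiv_lane(a, b):
    """Signed division rounding toward zero, as LLVM's `sdiv` does."""
    q = abs(a) // abs(b)
    return q if (a < 0) == (b < 0) else -q


def srem_lane(a, b):
    """Remainder built the way the CHECK lines do it: divide, then mls."""
    q = sdiv_lane(a, b)   # sdiv / sdivr
    return a - q * b      # mls: accumulator minus product


def srem_vector(op1, op2):
    """Apply the lane-wise remainder to two equal-length lists of lanes."""
    return [srem_lane(a, b) for a, b in zip(op1, op2)]


# The sign of each remainder follows the dividend, matching `srem`.
print(srem_vector([7, -7, 7, -7], [3, 3, -3, -3]))
```

Python's own `%` operator rounds toward negative infinity, so it would give different answers for mixed-sign operands. That is why the sketch builds the remainder from a truncating divide instead.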

				;
				; UREM
				;

				define <4 x i8> @urem_v4i8(<4 x i8> %op1, <4 x i8> %op2) #0 {
				; CHECK-LABEL: urem_v4i8:
				; CHECK: // %bb.0:
				; CHECK-NEXT: sub sp, sp, #16
				; CHECK-NEXT: .cfi_def_cfa_offset 16
				; CHECK-NEXT: adrp x8, .LCPI13_0
				; CHECK-NEXT: // kill: def $d1 killed $d1 def $z1
				; CHECK-NEXT: // kill: def $d0 killed $d0 def $z0
				; CHECK-NEXT: ptrue p0.s, vl4
				; CHECK-NEXT: ldr d2, [x8, :lo12:.LCPI13_0]
				; CHECK-NEXT: and z0.d, z0.d, z2.d
				; CHECK-NEXT: and z1.d, z1.d, z2.d
				; CHECK-NEXT: uunpklo z2.s, z1.h
				; CHECK-NEXT: uunpklo z3.s, z0.h
				; CHECK-NEXT: udivr z2.s, p0/m, z2.s, z3.s
				; CHECK-NEXT: ptrue p0.h, vl4
				; CHECK-NEXT: fmov w8, s2
				; CHECK-NEXT: mov z3.s, z2.s[3]
				; CHECK-NEXT: mov z4.s, z2.s[2]
				; CHECK-NEXT: mov z2.s, z2.s[1]
				; CHECK-NEXT: fmov w9, s3
				; CHECK-NEXT: fmov w10, s4
				; CHECK-NEXT: strh w8, [sp, #8]
				; CHECK-NEXT: fmov w8, s2
				; CHECK-NEXT: strh w9, [sp, #14]
				; CHECK-NEXT: strh w10, [sp, #12]
				; CHECK-NEXT: strh w8, [sp, #10]
				; CHECK-NEXT: ldr d2, [sp, #8]
				; CHECK-NEXT: mls z0.h, p0/m, z2.h, z1.h
				; CHECK-NEXT: // kill: def $d0 killed $d0 killed $z0
				; CHECK-NEXT: add sp, sp, #16
				; CHECK-NEXT: ret
				%res = urem <4 x i8> %op1, %op2
				ret <4 x i8> %res
				}

				define <8 x i8> @urem_v8i8(<8 x i8> %op1, <8 x i8> %op2) #0 {
				; CHECK-LABEL: urem_v8i8:
				; CHECK: // %bb.0:
				; CHECK-NEXT: sub sp, sp, #16
				; CHECK-NEXT: .cfi_def_cfa_offset 16
				; CHECK-NEXT: // kill: def $d1 killed $d1 def $z1
				; CHECK-NEXT: // kill: def $d0 killed $d0 def $z0
				; CHECK-NEXT: uunpklo z2.h, z1.b
				; CHECK-NEXT: uunpklo z3.h, z0.b
				; CHECK-NEXT: ptrue p0.s, vl4
				; CHECK-NEXT: uunpkhi z4.s, z2.h
				; CHECK-NEXT: uunpkhi z5.s, z3.h
				; CHECK-NEXT: uunpklo z2.s, z2.h
				; CHECK-NEXT: uunpklo z3.s, z3.h
				; CHECK-NEXT: udivr z4.s, p0/m, z4.s, z5.s
				; CHECK-NEXT: udivr z2.s, p0/m, z2.s, z3.s
				; CHECK-NEXT: ptrue p0.b, vl8
				; CHECK-NEXT: uzp1 z2.h, z2.h, z4.h
				; CHECK-NEXT: mov z3.h, z2.h[7]
				; CHECK-NEXT: fmov w8, s2
				; CHECK-NEXT: fmov w9, s3
				; CHECK-NEXT: mov z4.h, z2.h[6]
				; CHECK-NEXT: mov z5.h, z2.h[5]
				; CHECK-NEXT: mov z6.h, z2.h[4]
				; CHECK-NEXT: fmov w10, s4
				; CHECK-NEXT: strb w8, [sp, #8]
				; CHECK-NEXT: fmov w8, s5
				; CHECK-NEXT: strb w9, [sp, #15]
				; CHECK-NEXT: fmov w9, s6
				; CHECK-NEXT: mov z7.h, z2.h[3]
				; CHECK-NEXT: mov z16.h, z2.h[2]
				; CHECK-NEXT: mov z2.h, z2.h[1]
				; CHECK-NEXT: strb w10, [sp, #14]
				; CHECK-NEXT: fmov w10, s7
				; CHECK-NEXT: strb w8, [sp, #13]
				; CHECK-NEXT: fmov w8, s16
				; CHECK-NEXT: strb w9, [sp, #12]
				; CHECK-NEXT: fmov w9, s2
				; CHECK-NEXT: strb w10, [sp, #11]
				; CHECK-NEXT: strb w8, [sp, #10]
				; CHECK-NEXT: strb w9, [sp, #9]
				; CHECK-NEXT: ldr d2, [sp, #8]
				; CHECK-NEXT: mls z0.b, p0/m, z2.b, z1.b
				; CHECK-NEXT: // kill: def $d0 killed $d0 killed $z0
				; CHECK-NEXT: add sp, sp, #16
				; CHECK-NEXT: ret
				%res = urem <8 x i8> %op1, %op2
				ret <8 x i8> %res
				}

				define <16 x i8> @urem_v16i8(<16 x i8> %op1, <16 x i8> %op2) #0 {
				; CHECK-LABEL: urem_v16i8:
				; CHECK: // %bb.0:
				; CHECK-NEXT: // kill: def $q1 killed $q1 def $z1
				; CHECK-NEXT: // kill: def $q0 killed $q0 def $z0
				; CHECK-NEXT: uunpkhi z2.h, z1.b
				; CHECK-NEXT: uunpkhi z3.h, z0.b
				; CHECK-NEXT: ptrue p0.s, vl4
				david-arm (Done): I think that you can remove the tests greater than 512 bits, i.e. <128 x i8>. If the tests already work for <64 x i8> they are likely to work for anything larger too.
				; CHECK-NEXT: uunpkhi z5.s, z2.h
				; CHECK-NEXT: uunpkhi z6.s, z3.h
				; CHECK-NEXT: uunpklo z2.s, z2.h
				; CHECK-NEXT: uunpklo z3.s, z3.h
				; CHECK-NEXT: uunpklo z4.h, z1.b
				; CHECK-NEXT: udivr z2.s, p0/m, z2.s, z3.s
				; CHECK-NEXT: uunpklo z3.h, z0.b
				; CHECK-NEXT: udivr z5.s, p0/m, z5.s, z6.s
				; CHECK-NEXT: uunpkhi z6.s, z4.h
				; CHECK-NEXT: uunpkhi z7.s, z3.h
				; CHECK-NEXT: uunpklo z4.s, z4.h
				; CHECK-NEXT: uunpklo z3.s, z3.h
				; CHECK-NEXT: udivr z6.s, p0/m, z6.s, z7.s
				; CHECK-NEXT: udiv z3.s, p0/m, z3.s, z4.s
				; CHECK-NEXT: uzp1 z2.h, z2.h, z5.h
				; CHECK-NEXT: uzp1 z3.h, z3.h, z6.h
				; CHECK-NEXT: ptrue p0.b, vl16
				; CHECK-NEXT: uzp1 z2.b, z3.b, z2.b
				; CHECK-NEXT: mls z0.b, p0/m, z2.b, z1.b
				; CHECK-NEXT: // kill: def $q0 killed $q0 killed $z0
				; CHECK-NEXT: ret
				%res = urem <16 x i8> %op1, %op2
				ret <16 x i8> %res
				}

				define void @urem_v32i8(<32 x i8>* %a, <32 x i8>* %b) #0 {
				; CHECK-LABEL: urem_v32i8:
				; CHECK: // %bb.0:
				; CHECK-NEXT: ldp q2, q0, [x0]
				; CHECK-NEXT: ptrue p0.s, vl4
				; CHECK-NEXT: ldp q3, q1, [x1]
				; CHECK-NEXT: uunpkhi z5.h, z0.b
				; CHECK-NEXT: uunpklo z7.h, z0.b
				; CHECK-NEXT: uunpkhi z17.s, z5.h
				; CHECK-NEXT: uunpklo z5.s, z5.h
				; CHECK-NEXT: uunpkhi z4.h, z1.b
				; CHECK-NEXT: uunpklo z6.h, z1.b
				; CHECK-NEXT: uunpkhi z16.s, z4.h
				; CHECK-NEXT: uunpklo z4.s, z4.h
				; CHECK-NEXT: uunpkhi z18.s, z6.h
				; CHECK-NEXT: udivr z4.s, p0/m, z4.s, z5.s
				; CHECK-NEXT: uunpkhi z5.s, z7.h
				; CHECK-NEXT: uunpklo z6.s, z6.h
				; CHECK-NEXT: uunpklo z7.s, z7.h
				; CHECK-NEXT: udiv z5.s, p0/m, z5.s, z18.s
				; CHECK-NEXT: udivr z6.s, p0/m, z6.s, z7.s
				; CHECK-NEXT: udivr z16.s, p0/m, z16.s, z17.s
				; CHECK-NEXT: uzp1 z5.h, z6.h, z5.h
				; CHECK-NEXT: uunpkhi z6.h, z3.b
				; CHECK-NEXT: uunpkhi z7.h, z2.b
				; CHECK-NEXT: uzp1 z4.h, z4.h, z16.h
				; CHECK-NEXT: uunpkhi z16.s, z6.h
				; CHECK-NEXT: uunpkhi z17.s, z7.h
				; CHECK-NEXT: uunpklo z6.s, z6.h
				; CHECK-NEXT: uunpklo z7.s, z7.h
				; CHECK-NEXT: udivr z16.s, p0/m, z16.s, z17.s
				; CHECK-NEXT: udivr z6.s, p0/m, z6.s, z7.s
				; CHECK-NEXT: uunpklo z7.h, z3.b
				; CHECK-NEXT: uunpklo z17.h, z2.b
				; CHECK-NEXT: uunpkhi z18.s, z7.h
				; CHECK-NEXT: uunpkhi z19.s, z17.h
				; CHECK-NEXT: uunpklo z7.s, z7.h
				; CHECK-NEXT: uunpklo z17.s, z17.h
				; CHECK-NEXT: udivr z18.s, p0/m, z18.s, z19.s
				; CHECK-NEXT: udivr z7.s, p0/m, z7.s, z17.s
				; CHECK-NEXT: uzp1 z6.h, z6.h, z16.h
				; CHECK-NEXT: uzp1 z7.h, z7.h, z18.h
				; CHECK-NEXT: ptrue p0.b, vl16
				; CHECK-NEXT: uzp1 z6.b, z7.b, z6.b
				; CHECK-NEXT: uzp1 z4.b, z5.b, z4.b
				; CHECK-NEXT: mls z2.b, p0/m, z6.b, z3.b
				; CHECK-NEXT: mls z0.b, p0/m, z4.b, z1.b
				; CHECK-NEXT: stp q2, q0, [x0]
				; CHECK-NEXT: ret
				%op1 = load <32 x i8>, <32 x i8>* %a
				%op2 = load <32 x i8>, <32 x i8>* %b
				%res = urem <32 x i8> %op1, %op2
				store <32 x i8> %res, <32 x i8>* %a
				ret void
				}

				define <4 x i16> @urem_v4i16(<4 x i16> %op1, <4 x i16> %op2) #0 {
				; CHECK-LABEL: urem_v4i16:
				; CHECK: // %bb.0:
				; CHECK-NEXT: sub sp, sp, #16
				; CHECK-NEXT: .cfi_def_cfa_offset 16
				; CHECK-NEXT: // kill: def $d1 killed $d1 def $z1
				; CHECK-NEXT: // kill: def $d0 killed $d0 def $z0
				; CHECK-NEXT: ptrue p0.s, vl4
				; CHECK-NEXT: uunpklo z2.s, z1.h
				; CHECK-NEXT: uunpklo z3.s, z0.h
				; CHECK-NEXT: udivr z2.s, p0/m, z2.s, z3.s
				; CHECK-NEXT: ptrue p0.h, vl4
				; CHECK-NEXT: fmov w8, s2
				; CHECK-NEXT: mov z3.s, z2.s[3]
				; CHECK-NEXT: mov z4.s, z2.s[2]
				; CHECK-NEXT: mov z2.s, z2.s[1]
				; CHECK-NEXT: fmov w9, s3
				; CHECK-NEXT: fmov w10, s4
				; CHECK-NEXT: strh w8, [sp, #8]
				; CHECK-NEXT: fmov w8, s2
				; CHECK-NEXT: strh w9, [sp, #14]
				; CHECK-NEXT: strh w10, [sp, #12]
				; CHECK-NEXT: strh w8, [sp, #10]
				; CHECK-NEXT: ldr d2, [sp, #8]
				; CHECK-NEXT: mls z0.h, p0/m, z2.h, z1.h
				; CHECK-NEXT: // kill: def $d0 killed $d0 killed $z0
				; CHECK-NEXT: add sp, sp, #16
				; CHECK-NEXT: ret
				%res = urem <4 x i16> %op1, %op2
				ret <4 x i16> %res
				}

				define <8 x i16> @urem_v8i16(<8 x i16> %op1, <8 x i16> %op2) #0 {
				; CHECK-LABEL: urem_v8i16:
				; CHECK: // %bb.0:
				; CHECK-NEXT: // kill: def $q1 killed $q1 def $z1
				; CHECK-NEXT: // kill: def $q0 killed $q0 def $z0
				; CHECK-NEXT: ptrue p0.s, vl4
				; CHECK-NEXT: uunpkhi z2.s, z1.h
				; CHECK-NEXT: uunpkhi z3.s, z0.h
				; CHECK-NEXT: uunpklo z4.s, z1.h
				; CHECK-NEXT: udivr z2.s, p0/m, z2.s, z3.s
				; CHECK-NEXT: uunpklo z5.s, z0.h
				; CHECK-NEXT: movprfx z3, z5
				; CHECK-NEXT: udiv z3.s, p0/m, z3.s, z4.s
				; CHECK-NEXT: uzp1 z2.h, z3.h, z2.h
				; CHECK-NEXT: ptrue p0.h, vl8
				; CHECK-NEXT: mls z0.h, p0/m, z2.h, z1.h
				; CHECK-NEXT: // kill: def $q0 killed $q0 killed $z0
				; CHECK-NEXT: ret
				%res = urem <8 x i16> %op1, %op2
				ret <8 x i16> %res
				}

				define void @urem_v16i16(<16 x i16>* %a, <16 x i16>* %b) #0 {
				; CHECK-LABEL: urem_v16i16:
				; CHECK: // %bb.0:
				; CHECK-NEXT: ldp q2, q0, [x0]
				; CHECK-NEXT: ptrue p0.s, vl4
				; CHECK-NEXT: uunpkhi z17.s, z2.h
				; CHECK-NEXT: ldp q3, q1, [x1]
				; CHECK-NEXT: uunpkhi z5.s, z0.h
				; CHECK-NEXT: uunpklo z7.s, z0.h
				; CHECK-NEXT: uunpkhi z16.s, z3.h
				; CHECK-NEXT: udivr z16.s, p0/m, z16.s, z17.s
				; CHECK-NEXT: uunpkhi z4.s, z1.h
				; CHECK-NEXT: uunpklo z6.s, z1.h
				; CHECK-NEXT: udivr z4.s, p0/m, z4.s, z5.s
				; CHECK-NEXT: uunpklo z5.s, z3.h
				; CHECK-NEXT: udivr z6.s, p0/m, z6.s, z7.s
				; CHECK-NEXT: uunpklo z7.s, z2.h
				; CHECK-NEXT: udivr z5.s, p0/m, z5.s, z7.s
				; CHECK-NEXT: ptrue p0.h, vl8
				; CHECK-NEXT: uzp1 z5.h, z5.h, z16.h
				; CHECK-NEXT: uzp1 z4.h, z6.h, z4.h
				; CHECK-NEXT: mls z2.h, p0/m, z5.h, z3.h
				; CHECK-NEXT: mls z0.h, p0/m, z4.h, z1.h
				; CHECK-NEXT: stp q2, q0, [x0]
				; CHECK-NEXT: ret
				%op1 = load <16 x i16>, <16 x i16>* %a
				%op2 = load <16 x i16>, <16 x i16>* %b
				%res = urem <16 x i16> %op1, %op2
				store <16 x i16> %res, <16 x i16>* %a
				ret void
				}

				define <2 x i32> @urem_v2i32(<2 x i32> %op1, <2 x i32> %op2) #0 {
				; CHECK-LABEL: urem_v2i32:
				; CHECK: // %bb.0:
				; CHECK-NEXT: // kill: def $d0 killed $d0 def $z0
				; CHECK-NEXT: ptrue p0.s, vl2
				; CHECK-NEXT: // kill: def $d1 killed $d1 def $z1
				; CHECK-NEXT: movprfx z2, z0
				; CHECK-NEXT: udiv z2.s, p0/m, z2.s, z1.s
				; CHECK-NEXT: mls z0.s, p0/m, z2.s, z1.s
				; CHECK-NEXT: // kill: def $d0 killed $d0 killed $z0
				; CHECK-NEXT: ret
				%res = urem <2 x i32> %op1, %op2
				ret <2 x i32> %res
				}

				define <4 x i32> @urem_v4i32(<4 x i32> %op1, <4 x i32> %op2) #0 {
				; CHECK-LABEL: urem_v4i32:
				; CHECK: // %bb.0:
				; CHECK-NEXT: // kill: def $q0 killed $q0 def $z0
				; CHECK-NEXT: ptrue p0.s, vl4
				; CHECK-NEXT: // kill: def $q1 killed $q1 def $z1
				; CHECK-NEXT: movprfx z2, z0
				; CHECK-NEXT: udiv z2.s, p0/m, z2.s, z1.s
				; CHECK-NEXT: mls z0.s, p0/m, z2.s, z1.s
				; CHECK-NEXT: // kill: def $q0 killed $q0 killed $z0
				; CHECK-NEXT: ret
				%res = urem <4 x i32> %op1, %op2
				ret <4 x i32> %res
				}

				define void @urem_v8i32(<8 x i32>* %a, <8 x i32>* %b) #0 {
				; CHECK-LABEL: urem_v8i32:
				; CHECK: // %bb.0:
				; CHECK-NEXT: ldp q0, q1, [x0]
				; CHECK-NEXT: ptrue p0.s, vl4
				; CHECK-NEXT: ldp q2, q3, [x1]
				; CHECK-NEXT: movprfx z4, z0
				; CHECK-NEXT: udiv z4.s, p0/m, z4.s, z2.s
				; CHECK-NEXT: movprfx z5, z1
				; CHECK-NEXT: udiv z5.s, p0/m, z5.s, z3.s
				; CHECK-NEXT: mls z0.s, p0/m, z4.s, z2.s
				; CHECK-NEXT: mls z1.s, p0/m, z5.s, z3.s
				; CHECK-NEXT: stp q0, q1, [x0]
				; CHECK-NEXT: ret
				%op1 = load <8 x i32>, <8 x i32>* %a
				%op2 = load <8 x i32>, <8 x i32>* %b
				%res = urem <8 x i32> %op1, %op2
				store <8 x i32> %res, <8 x i32>* %a
				ret void
				}

				define <1 x i64> @urem_v1i64(<1 x i64> %op1, <1 x i64> %op2) #0 {
				; CHECK-LABEL: urem_v1i64:
				; CHECK: // %bb.0:
				; CHECK-NEXT: // kill: def $d0 killed $d0 def $z0
				; CHECK-NEXT: ptrue p0.d, vl1
				; CHECK-NEXT: // kill: def $d1 killed $d1 def $z1
				; CHECK-NEXT: movprfx z2, z0
				; CHECK-NEXT: udiv z2.d, p0/m, z2.d, z1.d
				; CHECK-NEXT: mls z0.d, p0/m, z2.d, z1.d
				; CHECK-NEXT: // kill: def $d0 killed $d0 killed $z0
				; CHECK-NEXT: ret
				%res = urem <1 x i64> %op1, %op2
				ret <1 x i64> %res
				}

				define <2 x i64> @urem_v2i64(<2 x i64> %op1, <2 x i64> %op2) #0 {
				; CHECK-LABEL: urem_v2i64:
				; CHECK: // %bb.0:
				; CHECK-NEXT: // kill: def $q0 killed $q0 def $z0
				; CHECK-NEXT: ptrue p0.d, vl2
				; CHECK-NEXT: // kill: def $q1 killed $q1 def $z1
				; CHECK-NEXT: movprfx z2, z0
				; CHECK-NEXT: udiv z2.d, p0/m, z2.d, z1.d
				; CHECK-NEXT: mls z0.d, p0/m, z2.d, z1.d
				; CHECK-NEXT: // kill: def $q0 killed $q0 killed $z0
				; CHECK-NEXT: ret
				%res = urem <2 x i64> %op1, %op2
				ret <2 x i64> %res
				}

				define void @urem_v4i64(<4 x i64>* %a, <4 x i64>* %b) #0 {
				; CHECK-LABEL: urem_v4i64:
				; CHECK: // %bb.0:
				; CHECK-NEXT: ldp q0, q1, [x0]
				; CHECK-NEXT: ptrue p0.d, vl2
				; CHECK-NEXT: ldp q2, q3, [x1]
				; CHECK-NEXT: movprfx z4, z0
				; CHECK-NEXT: udiv z4.d, p0/m, z4.d, z2.d
				; CHECK-NEXT: movprfx z5, z1
				; CHECK-NEXT: udiv z5.d, p0/m, z5.d, z3.d
				; CHECK-NEXT: mls z0.d, p0/m, z4.d, z2.d
				; CHECK-NEXT: mls z1.d, p0/m, z5.d, z3.d
				; CHECK-NEXT: stp q0, q1, [x0]
				; CHECK-NEXT: ret
				%op1 = load <4 x i64>, <4 x i64>* %a
				%op2 = load <4 x i64>, <4 x i64>* %b
				%res = urem <4 x i64> %op1, %op2
				store <4 x i64> %res, <4 x i64>* %a
				ret void
				}

				attributes #0 = { "target-features"="+sve" }
				david-arm (Done): Again, maybe remove this test since I'm not sure what extra value it gives us?
				david-arm (Done): Again, maybe remove this test?
				david-arm (Done): Again, maybe remove this test?
				david-arm (Done): Again, maybe remove this test?
				david-arm (Done): Again, maybe remove this test?
				david-arm (Done): Again, maybe remove this test?
				david-arm (Done): Again, maybe remove this test?
				david-arm (Done): Again, maybe remove this test?
				david-arm (Done): Again, maybe remove this test?
				david-arm (Done): Again, maybe remove this test?
				david-arm (Done): Again, maybe remove this test?
				david-arm (Done): Again, maybe remove this test?
				david-arm (Not Done): Again, maybe remove this test?
				david-arm (Done): Again, maybe remove this test?
				david-arm (Done): Again, maybe remove this test?

Revision Contents

Diff 469161

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp

llvm/test/CodeGen/AArch64/sve-streaming-mode-fixed-length-int-arith.ll

llvm/test/CodeGen/AArch64/sve-streaming-mode-fixed-length-int-div.ll

llvm/test/CodeGen/AArch64/sve-streaming-mode-fixed-length-int-log.ll

llvm/test/CodeGen/AArch64/sve-streaming-mode-fixed-length-int-mulh.ll

llvm/test/CodeGen/AArch64/sve-streaming-mode-fixed-length-int-rem.ll
