This is an archive of the discontinued LLVM Phabricator instance.

Differential D145488

[CodeGen][AArch64] Generate Pseudo instructions for integer MLA/MAD/MLS/MSB
Closed, Public

Authored by sushgokh on Mar 7 2023, 4:16 AM.

Details

Reviewers

paulwalker-arm
SjoerdMeijer
dmgreen

Commits

rG7b338a691ec9: [CodeGen][AArch64] Generate Pseudo instructions for integer MLA/MAD/MLS/MSB

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

sushgokh created this revision. Mar 7 2023, 4:16 AM

Herald added a project: Restricted Project. Mar 7 2023, 4:16 AM

Herald added subscribers: hiraditya, kristof.beyls.

sushgokh requested review of this revision. Mar 7 2023, 4:16 AM

Herald added a project: Restricted Project. Mar 7 2023, 4:16 AM

Herald added a subscriber: llvm-commits.

Harbormaster completed remote builds in B217838: Diff 502980. Mar 7 2023, 5:15 AM

paulwalker-arm added inline comments. Mar 7 2023, 8:23 AM

llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td
409–410	The `add(a, select(...` pattern has specific requirements for the result of inactive lanes that matches `AArch64mla_m1`. Did you mean to move it into `AArch64mla_p`?
411–414	As above I think the `sub(a, select(...` pattern should remain assigned to AArch64mls_m1.
494	Please delete redundant whitespace.

sushgokh added inline comments. Mar 8 2023, 2:54 AM

llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td
409–410	The current semantics of add(a, select(...)) result in

    mla za, p0/m, zn * zm

The patch, even after moving this pattern to AArch64mla_p, should generate the pseudo with the same predicate semantics (i.e. p0/m), right? I assume that is the meaning of FalseLanesUndef. Correct me if I am wrong.

If this understanding is incorrect, is there any other way to map multiple patterns with different inactive-lane requirements to a single pseudo?
One option is to define a pattern that explicitly maps add(a, select(...)) to a specific pseudo.
The other option is to let this pattern remain with AArch64mla_m1 rather than AArch64mla_p.

paulwalker-arm added inline comments. Mar 8 2023, 5:59 AM

llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td
409–410	> The current semantics of add(a, select(...) ) is resulting in
> mla za, p0/m, zn * zm

The output is only valid because of how the operation has been register allocated. Remember the point of this patch is that `MLA_ZPZZZ` can emit an `MLA` or `MAD` based on what the register allocator chooses to do. You can see the bug the current implementation will introduce by running the following through llc:

    define <vscale x 16 x i8> @good(<vscale x 16 x i8> %a, <vscale x 16 x i8> %b, <vscale x 16 x i8> %c, <vscale x 16 x i1> %mask) {
      %mul = mul nsw <vscale x 16 x i8> %b, %c
      %sel = select <vscale x 16 x i1> %mask, <vscale x 16 x i8> %mul, <vscale x 16 x i8> zeroinitializer
      %add = add <vscale x 16 x i8> %a, %sel
      ret <vscale x 16 x i8> %add
    }

    define <vscale x 16 x i8> @bad(<vscale x 16 x i8> %a, <vscale x 16 x i8> %b, <vscale x 16 x i8> %c, <vscale x 16 x i1> %mask) {
      %mul = mul nsw <vscale x 16 x i8> %a, %b
      %sel = select <vscale x 16 x i1> %mask, <vscale x 16 x i8> %mul, <vscale x 16 x i8> zeroinitializer
      %add = add <vscale x 16 x i8> %c, %sel
      ret <vscale x 16 x i8> %add
    }

    good:
      mla z0.b, p0/m, z1.b, z2.b
      ret
    bad:
      mad z0.b, p0/m, z1.b, z2.b
      ret

Both functions expect the inactive lanes to return the original value of the addend. However, the second example will incorrectly set the result of inactive lanes to the original value of the multiplicand. Essentially, both sets of IR are synonymous with the AArch64mla_m1 operation.

> The patch, even after moving this pattern to AArch64mla_p, should generate the pseudo with same predicate semantics (i.e. p0/m), right? I assume that thats the meaning of FalseLanesUndef. Correct me if wrong.

`FalseLanesUndef` means the inactive lanes have no defined value but, as explained above, the `add(a, select(...) )` pattern sets the inactive lanes to a defined value, namely `addend + 0`.

> If this understanding is incorrect, is there any other way to specify multiple patterns with different inactive lane requirements to a single pseudo?
> One option is define pattern that explicitly maps add(a, select(...) ) to specific pseudo. Other option is let this pattern remain with AArch64mla_m1 rather than AArch64mla_p

Based on the above I think option 3 is the only correct answer.
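The difference between the two outputs can be checked with a small lane-wise model. The following is only an illustrative sketch in Python, not anything taken from the patch: the helper names `reference`, `mla` and `mad` are hypothetical, vectors are modelled as fixed-length lists of 8-bit values, and the operand orders follow the assembly shown in the comment above (`mla Zda, Pg/M, Zn, Zm` and `mad Zdn, Pg/M, Zm, Za`).

```python
# Lane-wise model of merging-predicated SVE integer multiply-add.
# Vectors are plain lists of ints; MASK8 wraps results to 8-bit lanes.
MASK8 = 0xFF


def reference(a, b, c, mask):
    """add(a, select(mask, mul(b, c), 0)): inactive lanes yield a + 0."""
    return [(x + (y * z if m else 0)) & MASK8
            for x, y, z, m in zip(a, b, c, mask)]


def mla(zda, zn, zm, pg):
    """MLA Zda, Pg/M, Zn, Zm: inactive lanes keep the addend Zda."""
    return [(acc + n * m) & MASK8 if p else acc
            for acc, n, m, p in zip(zda, zn, zm, pg)]


def mad(zdn, zm, za, pg):
    """MAD Zdn, Pg/M, Zm, Za: inactive lanes keep the multiplicand Zdn."""
    return [(add + dn * m) & MASK8 if p else dn
            for dn, m, add, p in zip(zdn, zm, za, pg)]


a = [10, 20, 30, 40]                  # addend
b = [1, 2, 3, 4]                      # first multiplicand
c = [5, 6, 7, 8]                      # second multiplicand
mask = [True, False, True, False]

expected = reference(a, b, c, mask)
via_mla = mla(a, b, c, mask)          # destination tied to the addend
via_mad = mad(b, c, a, mask)          # destination tied to a multiplicand

print(expected)   # [15, 20, 51, 40]
print(via_mla)    # [15, 20, 51, 40]
print(via_mad)    # [15, 2, 51, 4]
```

In this model the active lanes agree in all three cases, but the `mad` form leaves the multiplicand in the inactive lanes, which is the mismatch described for the `@bad` function.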

@paulwalker-arm

for e = 0 to elements-1
    if ElemP[mask, e, esize] == '1' then
        integer element1 = UInt(Elem[operand1, e, esize]);
        integer element2 = UInt(Elem[operand2, e, esize]);
        integer product = element1 * element2;
        if sub_op then
            Elem[result, e, esize] = Elem[operand3, e, esize] - product;
        else
            Elem[result, e, esize] = Elem[operand3, e, esize] + product;
    else
        Elem[result, e, esize] = Elem[**operand1**, e, esize];

Ah! I didn't notice the semantics highlighted above; I thought it was operand3. Thanks for pointing that out. I will make the appropriate changes and upload the patch.
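Because a pseudo marked `FalseLanesUndef` places no requirement on inactive lanes, the form can be chosen purely from which input register the destination reuses. The sketch below illustrates that choice in Python. It is not the expansion code in LLVM: the function name `lower_mla_pseudo` and the string register names are hypothetical, element-size suffixes are omitted, and the `movprfx` fallback for a destination that overlaps no input is an assumption.

```python
def lower_mla_pseudo(dst, addend, mul1, mul2, pg="p0"):
    """Pick an instruction sequence for dst = addend + mul1 * mul2.

    Inactive lanes of dst are treated as undefined, so any form that is
    correct on the active lanes is acceptable.
    """
    if dst == addend:
        # Destination already holds the addend: accumulate in place.
        return [f"mla {dst}, {pg}/m, {mul1}, {mul2}"]
    if dst == mul1:
        # Destination holds a multiplicand: use the reversed form.
        return [f"mad {dst}, {pg}/m, {mul2}, {addend}"]
    if dst == mul2:
        return [f"mad {dst}, {pg}/m, {mul1}, {addend}"]
    # Assumed fallback: copy the addend into dst, then accumulate.
    return [f"movprfx {dst}, {addend}",
            f"mla {dst}, {pg}/m, {mul1}, {mul2}"]


# Result register is the addend: MLA.
print(lower_mla_pseudo("z0", "z0", "z1", "z2"))
# Result register is a multiplicand (a in z0, b in z1, c in z2, c + a * b): MAD.
print(lower_mla_pseudo("z0", "z2", "z0", "z1"))
```

This choice is only safe when inactive lanes really are undefined. For the `add(a, select(...))` pattern the inactive lanes must equal the addend, so the `mad` branches would produce the wrong result.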

sushgokh updated this revision to Diff 503627. Mar 8 2023, 10:26 PM

Harbormaster completed remote builds in B218298: Diff 503627. Mar 8 2023, 11:20 PM

paulwalker-arm accepted this revision. Mar 9 2023, 5:37 AM

This revision is now accepted and ready to land. Mar 9 2023, 5:37 AM

Closed by commit rG7b338a691ec9: [CodeGen][AArch64] Generate Pseudo instructions for integer MLA/MAD/MLS/MSB (authored by sushgokh). Mar 9 2023, 9:29 PM

This revision was automatically updated to reflect the committed changes.

sushgokh added a commit: rG7b338a691ec9: [CodeGen][AArch64] Generate Pseudo instructions for integer MLA/MAD/MLS/MSB.

Revision Contents

Path	Size
llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td	18 lines
llvm/lib/Target/AArch64/SVEInstrFormats.td	45 lines
llvm/test/CodeGen/AArch64/sve-gather-scatter-addr-opts.ll	15 lines
llvm/test/CodeGen/AArch64/sve-int-arith.ll	45 lines
llvm/test/CodeGen/AArch64/sve-pseudos-expand-undef.mir	20 lines
llvm/test/CodeGen/AArch64/sve-streaming-mode-fixed-length-int-rem.ll	44 lines

Diff 504021

llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td

[...]
def AArch64add_m1 : PatFrags<(ops node:$pred, node:$op1, node:$op2),		def AArch64add_m1 : PatFrags<(ops node:$pred, node:$op1, node:$op2),
[(int_aarch64_sve_add node:$pred, node:$op1, node:$op2),		[(int_aarch64_sve_add node:$pred, node:$op1, node:$op2),
(add node:$op1, (vselect node:$pred, node:$op2, (SVEDup0)))]>;		(add node:$op1, (vselect node:$pred, node:$op2, (SVEDup0)))]>;
def AArch64sub_m1 : PatFrags<(ops node:$pred, node:$op1, node:$op2),		def AArch64sub_m1 : PatFrags<(ops node:$pred, node:$op1, node:$op2),
[(int_aarch64_sve_sub node:$pred, node:$op1, node:$op2),		[(int_aarch64_sve_sub node:$pred, node:$op1, node:$op2),
(sub node:$op1, (vselect node:$pred, node:$op2, (SVEDup0)))]>;		(sub node:$op1, (vselect node:$pred, node:$op2, (SVEDup0)))]>;
def AArch64mla_m1 : PatFrags<(ops node:$pred, node:$op1, node:$op2, node:$op3),		def AArch64mla_m1 : PatFrags<(ops node:$pred, node:$op1, node:$op2, node:$op3),
[(int_aarch64_sve_mla node:$pred, node:$op1, node:$op2, node:$op3),		[(int_aarch64_sve_mla node:$pred, node:$op1, node:$op2, node:$op3),
(add node:$op1, (AArch64mul_p_oneuse node:$pred, node:$op2, node:$op3)),
// add(a, select(mask, mul(b, c), splat(0))) -> mla(a, mask, b, c)		// add(a, select(mask, mul(b, c), splat(0))) -> mla(a, mask, b, c)
(add node:$op1, (vselect node:$pred, (AArch64mul_p_oneuse (SVEAllActive), node:$op2, node:$op3), (SVEDup0)))]>;		(add node:$op1, (vselect node:$pred, (AArch64mul_p_oneuse (SVEAllActive), node:$op2, node:$op3), (SVEDup0)))]>;
		// pattern for generating pseudo for MLA_ZPmZZ/MAD_ZPmZZ
		def AArch64mla_p : PatFrags<(ops node:$pred, node:$op1, node:$op2, node:$op3),
		[(add node:$op1, (AArch64mul_p_oneuse node:$pred, node:$op2, node:$op3))]>;
def AArch64mls_m1 : PatFrags<(ops node:$pred, node:$op1, node:$op2, node:$op3),		def AArch64mls_m1 : PatFrags<(ops node:$pred, node:$op1, node:$op2, node:$op3),
[(int_aarch64_sve_mls node:$pred, node:$op1, node:$op2, node:$op3),		[(int_aarch64_sve_mls node:$pred, node:$op1, node:$op2, node:$op3),
(sub node:$op1, (AArch64mul_p_oneuse node:$pred, node:$op2, node:$op3)),
// sub(a, select(mask, mul(b, c), splat(0))) -> mls(a, mask, b, c)		// sub(a, select(mask, mul(b, c), splat(0))) -> mls(a, mask, b, c)
(sub node:$op1, (vselect node:$pred, (AArch64mul_p_oneuse (SVEAllActive), node:$op2, node:$op3), (SVEDup0)))]>;		(sub node:$op1, (vselect node:$pred, (AArch64mul_p_oneuse (SVEAllActive), node:$op2, node:$op3), (SVEDup0)))]>;
		def AArch64mls_p : PatFrags<(ops node:$pred, node:$op1, node:$op2, node:$op3),
		[(sub node:$op1, (AArch64mul_p_oneuse node:$pred, node:$op2, node:$op3))]>;
def AArch64eor3 : PatFrags<(ops node:$op1, node:$op2, node:$op3),		def AArch64eor3 : PatFrags<(ops node:$op1, node:$op2, node:$op3),
[(int_aarch64_sve_eor3 node:$op1, node:$op2, node:$op3),		[(int_aarch64_sve_eor3 node:$op1, node:$op2, node:$op3),
(xor node:$op1, (xor node:$op2, node:$op3))]>;		(xor node:$op1, (xor node:$op2, node:$op3))]>;

class fma_patfrags<SDPatternOperator intrinsic, SDPatternOperator sdnode>		class fma_patfrags<SDPatternOperator intrinsic, SDPatternOperator sdnode>
: PatFrags<(ops node:$pred, node:$op1, node:$op2, node:$op3),		: PatFrags<(ops node:$pred, node:$op1, node:$op2, node:$op3),
[(intrinsic node:$pred, node:$op1, node:$op2, node:$op3),		[(intrinsic node:$pred, node:$op1, node:$op2, node:$op3),
(sdnode (SVEAllActive), node:$op1, (vselect node:$pred, (AArch64fmul_p_oneuse (SVEAllActive), node:$op2, node:$op3), (SVEDup0)))],		(sdnode (SVEAllActive), node:$op1, (vselect node:$pred, (AArch64fmul_p_oneuse (SVEAllActive), node:$op2, node:$op3), (SVEDup0)))],
[...]	let Predicates = [HasSVEorSME] in {
defm ADD_ZI : sve_int_arith_imm0<0b000, "add", add>;		defm ADD_ZI : sve_int_arith_imm0<0b000, "add", add>;
defm SUB_ZI : sve_int_arith_imm0<0b001, "sub", sub>;		defm SUB_ZI : sve_int_arith_imm0<0b001, "sub", sub>;
defm SUBR_ZI : sve_int_arith_imm0<0b011, "subr", AArch64subr>;		defm SUBR_ZI : sve_int_arith_imm0<0b011, "subr", AArch64subr>;
defm SQADD_ZI : sve_int_arith_imm0<0b100, "sqadd", saddsat>;		defm SQADD_ZI : sve_int_arith_imm0<0b100, "sqadd", saddsat>;
defm UQADD_ZI : sve_int_arith_imm0<0b101, "uqadd", uaddsat>;		defm UQADD_ZI : sve_int_arith_imm0<0b101, "uqadd", uaddsat>;
defm SQSUB_ZI : sve_int_arith_imm0<0b110, "sqsub", ssubsat>;		defm SQSUB_ZI : sve_int_arith_imm0<0b110, "sqsub", ssubsat>;
defm UQSUB_ZI : sve_int_arith_imm0<0b111, "uqsub", usubsat>;		defm UQSUB_ZI : sve_int_arith_imm0<0b111, "uqsub", usubsat>;

defm MAD_ZPmZZ : sve_int_mladdsub_vvv_pred<0b0, "mad", int_aarch64_sve_mad>;		defm MAD_ZPmZZ : sve_int_mladdsub_vvv_pred<0b0, "mad", int_aarch64_sve_mad, "MLA_ZPmZZ", /*isReverseInstr*/ 1>;
defm MSB_ZPmZZ : sve_int_mladdsub_vvv_pred<0b1, "msb", int_aarch64_sve_msb>;		defm MSB_ZPmZZ : sve_int_mladdsub_vvv_pred<0b1, "msb", int_aarch64_sve_msb, "MLS_ZPmZZ", /*isReverseInstr*/ 1>;
defm MLA_ZPmZZ : sve_int_mlas_vvv_pred<0b0, "mla", AArch64mla_m1>;		defm MLA_ZPmZZ : sve_int_mlas_vvv_pred<0b0, "mla", AArch64mla_m1, "MLA_ZPZZZ", "MAD_ZPmZZ">;
defm MLS_ZPmZZ : sve_int_mlas_vvv_pred<0b1, "mls", AArch64mls_m1>;		defm MLS_ZPmZZ : sve_int_mlas_vvv_pred<0b1, "mls", AArch64mls_m1, "MLS_ZPZZZ", "MSB_ZPmZZ">;

		defm MLA_ZPZZZ : sve_int_3op_p_mladdsub<AArch64mla_p>;
		defm MLS_ZPZZZ : sve_int_3op_p_mladdsub<AArch64mls_p>;

// SVE predicated integer reductions.		// SVE predicated integer reductions.
defm SADDV_VPZ : sve_int_reduce_0_saddv<0b000, "saddv", AArch64saddv_p>;		defm SADDV_VPZ : sve_int_reduce_0_saddv<0b000, "saddv", AArch64saddv_p>;
defm UADDV_VPZ : sve_int_reduce_0_uaddv<0b001, "uaddv", AArch64uaddv_p>;		defm UADDV_VPZ : sve_int_reduce_0_uaddv<0b001, "uaddv", AArch64uaddv_p>;
defm SMAXV_VPZ : sve_int_reduce_1<0b000, "smaxv", AArch64smaxv_p>;		defm SMAXV_VPZ : sve_int_reduce_1<0b000, "smaxv", AArch64smaxv_p>;
defm UMAXV_VPZ : sve_int_reduce_1<0b001, "umaxv", AArch64umaxv_p>;		defm UMAXV_VPZ : sve_int_reduce_1<0b001, "umaxv", AArch64umaxv_p>;
defm SMINV_VPZ : sve_int_reduce_1<0b010, "sminv", AArch64sminv_p>;		defm SMINV_VPZ : sve_int_reduce_1<0b010, "sminv", AArch64sminv_p>;
defm UMINV_VPZ : sve_int_reduce_1<0b011, "uminv", AArch64uminv_p>;		defm UMINV_VPZ : sve_int_reduce_1<0b011, "uminv", AArch64uminv_p>;
[...]

llvm/lib/Target/AArch64/SVEInstrFormats.td

[...]	: I<(outs zprty:$Zdn), (ins PPR3bAny:$Pg, zprty:$_Zdn, zprty:$Zm, zprty:$Za),
let Inst{4-0} = Zdn;		let Inst{4-0} = Zdn;

let Constraints = "$Zdn = $_Zdn";		let Constraints = "$Zdn = $_Zdn";
let DestructiveInstType = DestructiveOther;		let DestructiveInstType = DestructiveOther;
let ElementSize = zprty.ElementSize;		let ElementSize = zprty.ElementSize;
let hasSideEffects = 0;		let hasSideEffects = 0;
}		}

multiclass sve_int_mladdsub_vvv_pred<bits<1> opc, string asm, SDPatternOperator op> {		multiclass sve_int_mladdsub_vvv_pred<bits<1> opc, string asm, SDPatternOperator op,
def _B : sve_int_mladdsub_vvv_pred<0b00, opc, asm, ZPR8>;		string revname, bit isReverseInstr=0> {
def _H : sve_int_mladdsub_vvv_pred<0b01, opc, asm, ZPR16>;		def _B : sve_int_mladdsub_vvv_pred<0b00, opc, asm, ZPR8>,
def _S : sve_int_mladdsub_vvv_pred<0b10, opc, asm, ZPR32>;		SVEInstr2Rev<NAME # _B, revname # _B, isReverseInstr>;
def _D : sve_int_mladdsub_vvv_pred<0b11, opc, asm, ZPR64>;		def _H : sve_int_mladdsub_vvv_pred<0b01, opc, asm, ZPR16>,
		SVEInstr2Rev<NAME # _H, revname # _H, isReverseInstr>;
		def _S : sve_int_mladdsub_vvv_pred<0b10, opc, asm, ZPR32>,
		SVEInstr2Rev<NAME # _S, revname # _S, isReverseInstr>;
		def _D : sve_int_mladdsub_vvv_pred<0b11, opc, asm, ZPR64>,
		SVEInstr2Rev<NAME # _D, revname # _D, isReverseInstr>;

def : SVE_4_Op_Pat<nxv16i8, op, nxv16i1, nxv16i8, nxv16i8, nxv16i8, !cast<Instruction>(NAME # _B)>;		def : SVE_4_Op_Pat<nxv16i8, op, nxv16i1, nxv16i8, nxv16i8, nxv16i8, !cast<Instruction>(NAME # _B)>;
def : SVE_4_Op_Pat<nxv8i16, op, nxv8i1, nxv8i16, nxv8i16, nxv8i16, !cast<Instruction>(NAME # _H)>;		def : SVE_4_Op_Pat<nxv8i16, op, nxv8i1, nxv8i16, nxv8i16, nxv8i16, !cast<Instruction>(NAME # _H)>;
def : SVE_4_Op_Pat<nxv4i32, op, nxv4i1, nxv4i32, nxv4i32, nxv4i32, !cast<Instruction>(NAME # _S)>;		def : SVE_4_Op_Pat<nxv4i32, op, nxv4i1, nxv4i32, nxv4i32, nxv4i32, !cast<Instruction>(NAME # _S)>;
def : SVE_4_Op_Pat<nxv2i64, op, nxv2i1, nxv2i64, nxv2i64, nxv2i64, !cast<Instruction>(NAME # _D)>;		def : SVE_4_Op_Pat<nxv2i64, op, nxv2i1, nxv2i64, nxv2i64, nxv2i64, !cast<Instruction>(NAME # _D)>;
}		}

class sve_int_mlas_vvv_pred<bits<2> sz8_64, bits<1> opc, string asm,		class sve_int_mlas_vvv_pred<bits<2> sz8_64, bits<1> opc, string asm,
[...]	: I<(outs zprty:$Zda), (ins PPR3bAny:$Pg, zprty:$_Zda, zprty:$Zn, zprty:$Zm),
let Inst{20-16} = Zm;		let Inst{20-16} = Zm;
let Inst{15-14} = 0b01;		let Inst{15-14} = 0b01;
let Inst{13} = opc;		let Inst{13} = opc;
let Inst{12-10} = Pg;		let Inst{12-10} = Pg;
let Inst{9-5} = Zn;		let Inst{9-5} = Zn;
let Inst{4-0} = Zda;		let Inst{4-0} = Zda;

let Constraints = "$Zda = $_Zda";		let Constraints = "$Zda = $_Zda";
let DestructiveInstType = DestructiveOther;		let DestructiveInstType = DestructiveTernaryCommWithRev;
let ElementSize = zprty.ElementSize;		let ElementSize = zprty.ElementSize;
let hasSideEffects = 0;		let hasSideEffects = 0;
}		}

multiclass sve_int_mlas_vvv_pred<bits<1> opc, string asm, SDPatternOperator op> {		multiclass sve_int_mlas_vvv_pred<bits<1> opc, string asm, SDPatternOperator op,
def _B : sve_int_mlas_vvv_pred<0b00, opc, asm, ZPR8>;		string Ps, string revname, bit isReverseInstr=0> {
def _H : sve_int_mlas_vvv_pred<0b01, opc, asm, ZPR16>;		def _B : sve_int_mlas_vvv_pred<0b00, opc, asm, ZPR8>,
def _S : sve_int_mlas_vvv_pred<0b10, opc, asm, ZPR32>;		SVEPseudo2Instr<Ps # _B, 1>, SVEInstr2Rev<NAME # _B, revname # _B, isReverseInstr>;
def _D : sve_int_mlas_vvv_pred<0b11, opc, asm, ZPR64>;		def _H : sve_int_mlas_vvv_pred<0b01, opc, asm, ZPR16>,
		SVEPseudo2Instr<Ps # _H, 1>, SVEInstr2Rev<NAME # _H, revname # _H, isReverseInstr>;
		def _S : sve_int_mlas_vvv_pred<0b10, opc, asm, ZPR32>,
		SVEPseudo2Instr<Ps # _S, 1>, SVEInstr2Rev<NAME # _S, revname # _S, isReverseInstr>;
		def _D : sve_int_mlas_vvv_pred<0b11, opc, asm, ZPR64>,
		SVEPseudo2Instr<Ps # _D, 1>, SVEInstr2Rev<NAME # _D, revname # _D, isReverseInstr>;

def : SVE_4_Op_Pat<nxv16i8, op, nxv16i1, nxv16i8, nxv16i8, nxv16i8, !cast<Instruction>(NAME # _B)>;		def : SVE_4_Op_Pat<nxv16i8, op, nxv16i1, nxv16i8, nxv16i8, nxv16i8, !cast<Instruction>(NAME # _B)>;
def : SVE_4_Op_Pat<nxv8i16, op, nxv8i1, nxv8i16, nxv8i16, nxv8i16, !cast<Instruction>(NAME # _H)>;		def : SVE_4_Op_Pat<nxv8i16, op, nxv8i1, nxv8i16, nxv8i16, nxv8i16, !cast<Instruction>(NAME # _H)>;
def : SVE_4_Op_Pat<nxv4i32, op, nxv4i1, nxv4i32, nxv4i32, nxv4i32, !cast<Instruction>(NAME # _S)>;		def : SVE_4_Op_Pat<nxv4i32, op, nxv4i1, nxv4i32, nxv4i32, nxv4i32, !cast<Instruction>(NAME # _S)>;
def : SVE_4_Op_Pat<nxv2i64, op, nxv2i1, nxv2i64, nxv2i64, nxv2i64, !cast<Instruction>(NAME # _D)>;		def : SVE_4_Op_Pat<nxv2i64, op, nxv2i1, nxv2i64, nxv2i64, nxv2i64, !cast<Instruction>(NAME # _D)>;
}		}

		//class for generating pseudo for SVE MLA/MAD/MLS/MSB
		multiclass sve_int_3op_p_mladdsub<SDPatternOperator op> {
		def _UNDEF_B : PredThreeOpPseudo<NAME # _B, ZPR8, FalseLanesUndef>;
		def _UNDEF_H : PredThreeOpPseudo<NAME # _H, ZPR16, FalseLanesUndef>;
		def _UNDEF_S : PredThreeOpPseudo<NAME # _S, ZPR32, FalseLanesUndef>;
		def _UNDEF_D : PredThreeOpPseudo<NAME # _D, ZPR64, FalseLanesUndef>;

		def : SVE_4_Op_Pat<nxv16i8, op, nxv16i1, nxv16i8, nxv16i8, nxv16i8, !cast<Instruction>(NAME # _UNDEF_B)>;
		def : SVE_4_Op_Pat<nxv8i16, op, nxv8i1, nxv8i16, nxv8i16, nxv8i16, !cast<Instruction>(NAME # _UNDEF_H)>;
		def : SVE_4_Op_Pat<nxv4i32, op, nxv4i1, nxv4i32, nxv4i32, nxv4i32, !cast<Instruction>(NAME # _UNDEF_S)>;
		def : SVE_4_Op_Pat<nxv2i64, op, nxv2i1, nxv2i64, nxv2i64, nxv2i64, !cast<Instruction>(NAME # _UNDEF_D)>;
		}

//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
// SVE2 Integer Multiply-Add - Unpredicated Group		// SVE2 Integer Multiply-Add - Unpredicated Group
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

class sve2_int_mla<bits<2> sz, bits<5> opc, string asm,		class sve2_int_mla<bits<2> sz, bits<5> opc, string asm,
ZPRRegOp zprty1, ZPRRegOp zprty2>		ZPRRegOp zprty1, ZPRRegOp zprty2>
: I<(outs zprty1:$Zda), (ins zprty1:$_Zda, zprty2:$Zn, zprty2:$Zm),		: I<(outs zprty1:$Zda), (ins zprty1:$_Zda, zprty2:$Zn, zprty2:$Zm),
asm, "\t$Zda, $Zn, $Zm", "", []>, Sched<[]> {		asm, "\t$Zda, $Zn, $Zm", "", []>, Sched<[]> {
[...]

llvm/test/CodeGen/AArch64/sve-gather-scatter-addr-opts.ll

	[...]
	;; Negative tests			;; Negative tests

	; Ensure we don't use a "vscale x 4" scatter. Cannot prove that variable stride			; Ensure we don't use a "vscale x 4" scatter. Cannot prove that variable stride
	; will not wrap when shrunk to be i32 based.			; will not wrap when shrunk to be i32 based.
	define void @scatter_f16_index_offset_var(ptr %base, i64 %offset, i64 %scale, <vscale x 4 x i1> %pg, <vscale x 4 x half> %data) #0 {			define void @scatter_f16_index_offset_var(ptr %base, i64 %offset, i64 %scale, <vscale x 4 x i1> %pg, <vscale x 4 x half> %data) #0 {
	; CHECK-LABEL: scatter_f16_index_offset_var:			; CHECK-LABEL: scatter_f16_index_offset_var:
	; CHECK: // %bb.0:			; CHECK: // %bb.0:
	; CHECK-NEXT: index z1.d, #0, #1			; CHECK-NEXT: index z1.d, #0, #1
	; CHECK-NEXT: mov z3.d, x1
	; CHECK-NEXT: mov z2.d, z1.d
	; CHECK-NEXT: mov z4.d, z3.d
	; CHECK-NEXT: ptrue p1.d			; CHECK-NEXT: ptrue p1.d
				; CHECK-NEXT: mov z2.d, z1.d
				; CHECK-NEXT: mov z3.d, x1
	; CHECK-NEXT: incd z2.d			; CHECK-NEXT: incd z2.d
	; CHECK-NEXT: mla z3.d, p1/m, z1.d, z3.d			; CHECK-NEXT: mad z1.d, p1/m, z3.d, z3.d
	; CHECK-NEXT: mla z4.d, p1/m, z2.d, z4.d			; CHECK-NEXT: mad z2.d, p1/m, z3.d, z3.d
	; CHECK-NEXT: punpklo p1.h, p0.b			; CHECK-NEXT: punpklo p1.h, p0.b
	; CHECK-NEXT: uunpklo z1.d, z0.s			; CHECK-NEXT: uunpklo z3.d, z0.s
	; CHECK-NEXT: punpkhi p0.h, p0.b			; CHECK-NEXT: punpkhi p0.h, p0.b
	; CHECK-NEXT: uunpkhi z0.d, z0.s			; CHECK-NEXT: uunpkhi z0.d, z0.s
	; CHECK-NEXT: st1h { z1.d }, p1, [x0, z3.d, lsl #1]			; CHECK-NEXT: st1h { z3.d }, p1, [x0, z1.d, lsl #1]
	; CHECK-NEXT: st1h { z0.d }, p0, [x0, z4.d, lsl #1]			; CHECK-NEXT: st1h { z0.d }, p0, [x0, z2.d, lsl #1]
	; CHECK-NEXT: ret			; CHECK-NEXT: ret
	%t0 = insertelement <vscale x 4 x i64> undef, i64 %offset, i32 0			%t0 = insertelement <vscale x 4 x i64> undef, i64 %offset, i32 0
	%t1 = shufflevector <vscale x 4 x i64> %t0, <vscale x 4 x i64> undef, <vscale x 4 x i32> zeroinitializer			%t1 = shufflevector <vscale x 4 x i64> %t0, <vscale x 4 x i64> undef, <vscale x 4 x i32> zeroinitializer
	%t2 = insertelement <vscale x 4 x i64> undef, i64 %scale, i32 0			%t2 = insertelement <vscale x 4 x i64> undef, i64 %scale, i32 0
	%t3 = shufflevector <vscale x 4 x i64> %t0, <vscale x 4 x i64> undef, <vscale x 4 x i32> zeroinitializer			%t3 = shufflevector <vscale x 4 x i64> %t0, <vscale x 4 x i64> undef, <vscale x 4 x i32> zeroinitializer
	%step = call <vscale x 4 x i64> @llvm.experimental.stepvector.nxv4i64()			%step = call <vscale x 4 x i64> @llvm.experimental.stepvector.nxv4i64()
	%t4 = mul <vscale x 4 x i64> %t3, %step			%t4 = mul <vscale x 4 x i64> %t3, %step
	%t5 = add <vscale x 4 x i64> %t1, %t4			%t5 = add <vscale x 4 x i64> %t1, %t4
	[...]

llvm/test/CodeGen/AArch64/sve-int-arith.ll

; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py		; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sve < %s | FileCheck %s		; RUN: llc -mtriple=aarch64 -mattr=+sve < %s | FileCheck %s

define <vscale x 2 x i64> @add_i64(<vscale x 2 x i64> %a, <vscale x 2 x i64> %b) {		define <vscale x 2 x i64> @add_i64(<vscale x 2 x i64> %a, <vscale x 2 x i64> %b) {
; CHECK-LABEL: add_i64:		; CHECK-LABEL: add_i64:
; CHECK: // %bb.0:		; CHECK: // %bb.0:
; CHECK-NEXT: add z0.d, z0.d, z1.d		; CHECK-NEXT: add z0.d, z0.d, z1.d
; CHECK-NEXT: ret		; CHECK-NEXT: ret
%res = add <vscale x 2 x i64> %a, %b		%res = add <vscale x 2 x i64> %a, %b
ret <vscale x 2 x i64> %res		ret <vscale x 2 x i64> %res
[...]
; CHECK-LABEL: uqsub_i8:		; CHECK-LABEL: uqsub_i8:
; CHECK: // %bb.0:		; CHECK: // %bb.0:
; CHECK-NEXT: uqsub z0.b, z0.b, z1.b		; CHECK-NEXT: uqsub z0.b, z0.b, z1.b
; CHECK-NEXT: ret		; CHECK-NEXT: ret
%res = call <vscale x 16 x i8> @llvm.usub.sat.nxv16i8(<vscale x 16 x i8> %a, <vscale x 16 x i8> %b)		%res = call <vscale x 16 x i8> @llvm.usub.sat.nxv16i8(<vscale x 16 x i8> %a, <vscale x 16 x i8> %b)
ret <vscale x 16 x i8> %res		ret <vscale x 16 x i8> %res
}		}

; Next four cases should generate mad instruction once pseudo instructions are emitted for MLA/MAD

define <vscale x 16 x i8> @mad_i8(<vscale x 16 x i8> %a, <vscale x 16 x i8> %b, <vscale x 16 x i8> %c) {		define <vscale x 16 x i8> @mad_i8(<vscale x 16 x i8> %a, <vscale x 16 x i8> %b, <vscale x 16 x i8> %c) {
; CHECK-LABEL: mad_i8:		; CHECK-LABEL: mad_i8:
; CHECK: // %bb.0:		; CHECK: // %bb.0:
; CHECK-NEXT: ptrue p0.b		; CHECK-NEXT: ptrue p0.b
; CHECK-NEXT: mla z2.b, p0/m, z0.b, z1.b		; CHECK-NEXT: mad z0.b, p0/m, z1.b, z2.b
; CHECK-NEXT: mov z0.d, z2.d
; CHECK-NEXT: ret		; CHECK-NEXT: ret
%prod = mul <vscale x 16 x i8> %a, %b		%prod = mul <vscale x 16 x i8> %a, %b
%res = add <vscale x 16 x i8> %c, %prod		%res = add <vscale x 16 x i8> %c, %prod
ret <vscale x 16 x i8> %res		ret <vscale x 16 x i8> %res
}		}

define <vscale x 8 x i16> @mad_i16(<vscale x 8 x i16> %a, <vscale x 8 x i16> %b, <vscale x 8 x i16> %c) {		define <vscale x 8 x i16> @mad_i16(<vscale x 8 x i16> %a, <vscale x 8 x i16> %b, <vscale x 8 x i16> %c) {
; CHECK-LABEL: mad_i16:		; CHECK-LABEL: mad_i16:
; CHECK: // %bb.0:		; CHECK: // %bb.0:
; CHECK-NEXT: ptrue p0.h		; CHECK-NEXT: ptrue p0.h
; CHECK-NEXT: mla z2.h, p0/m, z0.h, z1.h		; CHECK-NEXT: mad z0.h, p0/m, z1.h, z2.h
; CHECK-NEXT: mov z0.d, z2.d
; CHECK-NEXT: ret		; CHECK-NEXT: ret
%prod = mul <vscale x 8 x i16> %a, %b		%prod = mul <vscale x 8 x i16> %a, %b
%res = add <vscale x 8 x i16> %c, %prod		%res = add <vscale x 8 x i16> %c, %prod
ret <vscale x 8 x i16> %res		ret <vscale x 8 x i16> %res
}		}

define <vscale x 4 x i32> @mad_i32(<vscale x 4 x i32> %a, <vscale x 4 x i32> %b, <vscale x 4 x i32> %c) {		define <vscale x 4 x i32> @mad_i32(<vscale x 4 x i32> %a, <vscale x 4 x i32> %b, <vscale x 4 x i32> %c) {
; CHECK-LABEL: mad_i32:		; CHECK-LABEL: mad_i32:
; CHECK: // %bb.0:		; CHECK: // %bb.0:
; CHECK-NEXT: ptrue p0.s		; CHECK-NEXT: ptrue p0.s
; CHECK-NEXT: mla z2.s, p0/m, z0.s, z1.s		; CHECK-NEXT: mad z0.s, p0/m, z1.s, z2.s
; CHECK-NEXT: mov z0.d, z2.d
; CHECK-NEXT: ret		; CHECK-NEXT: ret
%prod = mul <vscale x 4 x i32> %a, %b		%prod = mul <vscale x 4 x i32> %a, %b
%res = add <vscale x 4 x i32> %c, %prod		%res = add <vscale x 4 x i32> %c, %prod
ret <vscale x 4 x i32> %res		ret <vscale x 4 x i32> %res
}		}

define <vscale x 2 x i64> @mad_i64(<vscale x 2 x i64> %a, <vscale x 2 x i64> %b, <vscale x 2 x i64> %c) {		define <vscale x 2 x i64> @mad_i64(<vscale x 2 x i64> %a, <vscale x 2 x i64> %b, <vscale x 2 x i64> %c) {
; CHECK-LABEL: mad_i64:		; CHECK-LABEL: mad_i64:
; CHECK: // %bb.0:		; CHECK: // %bb.0:
; CHECK-NEXT: ptrue p0.d		; CHECK-NEXT: ptrue p0.d
; CHECK-NEXT: mla z2.d, p0/m, z0.d, z1.d		; CHECK-NEXT: mad z0.d, p0/m, z1.d, z2.d
; CHECK-NEXT: mov z0.d, z2.d
; CHECK-NEXT: ret		; CHECK-NEXT: ret
%prod = mul <vscale x 2 x i64> %a, %b		%prod = mul <vscale x 2 x i64> %a, %b
%res = add <vscale x 2 x i64> %c, %prod		%res = add <vscale x 2 x i64> %c, %prod
ret <vscale x 2 x i64> %res		ret <vscale x 2 x i64> %res
}		}

define <vscale x 16 x i8> @mla_i8(<vscale x 16 x i8> %a, <vscale x 16 x i8> %b, <vscale x 16 x i8> %c) {		define <vscale x 16 x i8> @mla_i8(<vscale x 16 x i8> %a, <vscale x 16 x i8> %b, <vscale x 16 x i8> %c) {
; CHECK-LABEL: mla_i8:		; CHECK-LABEL: mla_i8:
[...]
; CHECK-NEXT: st1b { z1.b }, p0, [x0]		; CHECK-NEXT: st1b { z1.b }, p0, [x0]
; CHECK-NEXT: ret		; CHECK-NEXT: ret
%prod = mul <vscale x 16 x i8> %a, %b		%prod = mul <vscale x 16 x i8> %a, %b
store <vscale x 16 x i8> %prod, <vscale x 16 x i8>* %p		store <vscale x 16 x i8> %prod, <vscale x 16 x i8>* %p
%res = add <vscale x 16 x i8> %c, %prod		%res = add <vscale x 16 x i8> %c, %prod
ret <vscale x 16 x i8> %res		ret <vscale x 16 x i8> %res
}		}

; Next four cases should generate msb instruction once psuedo instruction is emitted for MLS/MSB

define <vscale x 16 x i8> @msb_i8(<vscale x 16 x i8> %a, <vscale x 16 x i8> %b, <vscale x 16 x i8> %c) {		define <vscale x 16 x i8> @msb_i8(<vscale x 16 x i8> %a, <vscale x 16 x i8> %b, <vscale x 16 x i8> %c) {
; CHECK-LABEL: msb_i8:		; CHECK-LABEL: msb_i8:
; CHECK: // %bb.0:		; CHECK: // %bb.0:
; CHECK-NEXT: ptrue p0.b		; CHECK-NEXT: ptrue p0.b
; CHECK-NEXT: mls z2.b, p0/m, z0.b, z1.b		; CHECK-NEXT: msb z0.b, p0/m, z1.b, z2.b
; CHECK-NEXT: mov z0.d, z2.d
; CHECK-NEXT: ret		; CHECK-NEXT: ret
%prod = mul <vscale x 16 x i8> %a, %b		%prod = mul <vscale x 16 x i8> %a, %b
%res = sub <vscale x 16 x i8> %c, %prod		%res = sub <vscale x 16 x i8> %c, %prod
ret <vscale x 16 x i8> %res		ret <vscale x 16 x i8> %res
}		}

define <vscale x 8 x i16> @msb_i16(<vscale x 8 x i16> %a, <vscale x 8 x i16> %b, <vscale x 8 x i16> %c) {		define <vscale x 8 x i16> @msb_i16(<vscale x 8 x i16> %a, <vscale x 8 x i16> %b, <vscale x 8 x i16> %c) {
; CHECK-LABEL: msb_i16:		; CHECK-LABEL: msb_i16:
; CHECK: // %bb.0:		; CHECK: // %bb.0:
; CHECK-NEXT: ptrue p0.h		; CHECK-NEXT: ptrue p0.h
; CHECK-NEXT: mls z2.h, p0/m, z0.h, z1.h		; CHECK-NEXT: msb z0.h, p0/m, z1.h, z2.h
; CHECK-NEXT: mov z0.d, z2.d
; CHECK-NEXT: ret		; CHECK-NEXT: ret
%prod = mul <vscale x 8 x i16> %a, %b		%prod = mul <vscale x 8 x i16> %a, %b
%res = sub <vscale x 8 x i16> %c, %prod		%res = sub <vscale x 8 x i16> %c, %prod
ret <vscale x 8 x i16> %res		ret <vscale x 8 x i16> %res
}		}

define <vscale x 4 x i32> @msb_i32(<vscale x 4 x i32> %a, <vscale x 4 x i32> %b, <vscale x 4 x i32> %c) {		define <vscale x 4 x i32> @msb_i32(<vscale x 4 x i32> %a, <vscale x 4 x i32> %b, <vscale x 4 x i32> %c) {
; CHECK-LABEL: msb_i32:		; CHECK-LABEL: msb_i32:
; CHECK: // %bb.0:		; CHECK: // %bb.0:
; CHECK-NEXT: ptrue p0.s		; CHECK-NEXT: ptrue p0.s
; CHECK-NEXT: mls z2.s, p0/m, z0.s, z1.s		; CHECK-NEXT: msb z0.s, p0/m, z1.s, z2.s
; CHECK-NEXT: mov z0.d, z2.d
; CHECK-NEXT: ret		; CHECK-NEXT: ret
%prod = mul <vscale x 4 x i32> %a, %b		%prod = mul <vscale x 4 x i32> %a, %b
%res = sub <vscale x 4 x i32> %c, %prod		%res = sub <vscale x 4 x i32> %c, %prod
ret <vscale x 4 x i32> %res		ret <vscale x 4 x i32> %res
}		}

define <vscale x 2 x i64> @msb_i64(<vscale x 2 x i64> %a, <vscale x 2 x i64> %b, <vscale x 2 x i64> %c) {		define <vscale x 2 x i64> @msb_i64(<vscale x 2 x i64> %a, <vscale x 2 x i64> %b, <vscale x 2 x i64> %c) {
; CHECK-LABEL: msb_i64:		; CHECK-LABEL: msb_i64:
; CHECK: // %bb.0:		; CHECK: // %bb.0:
; CHECK-NEXT: ptrue p0.d		; CHECK-NEXT: ptrue p0.d
; CHECK-NEXT: mls z2.d, p0/m, z0.d, z1.d		; CHECK-NEXT: msb z0.d, p0/m, z1.d, z2.d
; CHECK-NEXT: mov z0.d, z2.d
; CHECK-NEXT: ret		; CHECK-NEXT: ret
%prod = mul <vscale x 2 x i64> %a, %b		%prod = mul <vscale x 2 x i64> %a, %b
%res = sub <vscale x 2 x i64> %c, %prod		%res = sub <vscale x 2 x i64> %c, %prod
ret <vscale x 2 x i64> %res		ret <vscale x 2 x i64> %res
}		}

define <vscale x 16 x i8> @mls_i8(<vscale x 16 x i8> %a, <vscale x 16 x i8> %b, <vscale x 16 x i8> %c) {		define <vscale x 16 x i8> @mls_i8(<vscale x 16 x i8> %a, <vscale x 16 x i8> %b, <vscale x 16 x i8> %c) {
; CHECK-LABEL: mls_i8:		; CHECK-LABEL: mls_i8:
[... 41 lines not shown ...]

; Test cases below have one of the add/sub operands as constant splat		; Test cases below have one of the add/sub operands as constant splat

define <vscale x 2 x i64> @muladd_i64_positiveAddend(<vscale x 2 x i64> %a, <vscale x 2 x i64> %b)		define <vscale x 2 x i64> @muladd_i64_positiveAddend(<vscale x 2 x i64> %a, <vscale x 2 x i64> %b)
; CHECK-LABEL: muladd_i64_positiveAddend:		; CHECK-LABEL: muladd_i64_positiveAddend:
; CHECK: // %bb.0:		; CHECK: // %bb.0:
; CHECK-NEXT: ptrue p0.d		; CHECK-NEXT: ptrue p0.d
; CHECK-NEXT: mov z2.d, #0xffffffff		; CHECK-NEXT: mov z2.d, #0xffffffff
; CHECK-NEXT: mla z2.d, p0/m, z0.d, z1.d		; CHECK-NEXT: mad z0.d, p0/m, z1.d, z2.d
; CHECK-NEXT: mov z0.d, z2.d
; CHECK-NEXT: ret		; CHECK-NEXT: ret
{		{
%1 = mul <vscale x 2 x i64> %a, %b		%1 = mul <vscale x 2 x i64> %a, %b
%2 = add <vscale x 2 x i64> %1, shufflevector (<vscale x 2 x i64> insertelement (<vscale x 2 x i64> poison, i64 4294967295, i64 0), <vscale x 2 x i64> poison, <vscale x 2 x i32> zeroinitializer)		%2 = add <vscale x 2 x i64> %1, shufflevector (<vscale x 2 x i64> insertelement (<vscale x 2 x i64> poison, i64 4294967295, i64 0), <vscale x 2 x i64> poison, <vscale x 2 x i32> zeroinitializer)
ret <vscale x 2 x i64> %2		ret <vscale x 2 x i64> %2
}		}

define <vscale x 2 x i64> @muladd_i64_negativeAddend(<vscale x 2 x i64> %a, <vscale x 2 x i64> %b)		define <vscale x 2 x i64> @muladd_i64_negativeAddend(<vscale x 2 x i64> %a, <vscale x 2 x i64> %b)
; CHECK-LABEL: muladd_i64_negativeAddend:		; CHECK-LABEL: muladd_i64_negativeAddend:
; CHECK: // %bb.0:		; CHECK: // %bb.0:
; CHECK-NEXT: ptrue p0.d		; CHECK-NEXT: ptrue p0.d
; CHECK-NEXT: mov z2.d, #0xffffffff00000001		; CHECK-NEXT: mov z2.d, #0xffffffff00000001
; CHECK-NEXT: mla z2.d, p0/m, z0.d, z1.d		; CHECK-NEXT: mad z0.d, p0/m, z1.d, z2.d
; CHECK-NEXT: mov z0.d, z2.d
; CHECK-NEXT: ret		; CHECK-NEXT: ret
{		{
%1 = mul <vscale x 2 x i64> %a, %b		%1 = mul <vscale x 2 x i64> %a, %b
%2 = add <vscale x 2 x i64> %1, shufflevector (<vscale x 2 x i64> insertelement (<vscale x 2 x i64> poison, i64 -4294967295, i64 0), <vscale x 2 x i64> poison, <vscale x 2 x i32> zeroinitializer)		%2 = add <vscale x 2 x i64> %1, shufflevector (<vscale x 2 x i64> insertelement (<vscale x 2 x i64> poison, i64 -4294967295, i64 0), <vscale x 2 x i64> poison, <vscale x 2 x i32> zeroinitializer)
ret <vscale x 2 x i64> %2		ret <vscale x 2 x i64> %2
}		}


define <vscale x 4 x i32> @muladd_i32_positiveAddend(<vscale x 4 x i32> %a, <vscale x 4 x i32> %b)		define <vscale x 4 x i32> @muladd_i32_positiveAddend(<vscale x 4 x i32> %a, <vscale x 4 x i32> %b)
; CHECK-LABEL: muladd_i32_positiveAddend:		; CHECK-LABEL: muladd_i32_positiveAddend:
; CHECK: // %bb.0:		; CHECK: // %bb.0:
; CHECK-NEXT: ptrue p0.s		; CHECK-NEXT: ptrue p0.s
; CHECK-NEXT: mov z2.s, #0x10000		; CHECK-NEXT: mov z2.s, #0x10000
; CHECK-NEXT: mla z2.s, p0/m, z0.s, z1.s		; CHECK-NEXT: mad z0.s, p0/m, z1.s, z2.s
; CHECK-NEXT: mov z0.d, z2.d
; CHECK-NEXT: ret		; CHECK-NEXT: ret
{		{
%1 = mul <vscale x 4 x i32> %a, %b		%1 = mul <vscale x 4 x i32> %a, %b
%2 = add <vscale x 4 x i32> %1, shufflevector (<vscale x 4 x i32> insertelement (<vscale x 4 x i32> poison, i32 65536, i32 0), <vscale x 4 x i32> poison, <vscale x 4 x i32> zeroinitializer)		%2 = add <vscale x 4 x i32> %1, shufflevector (<vscale x 4 x i32> insertelement (<vscale x 4 x i32> poison, i32 65536, i32 0), <vscale x 4 x i32> poison, <vscale x 4 x i32> zeroinitializer)
ret <vscale x 4 x i32> %2		ret <vscale x 4 x i32> %2
}		}

define <vscale x 4 x i32> @muladd_i32_negativeAddend(<vscale x 4 x i32> %a, <vscale x 4 x i32> %b)		define <vscale x 4 x i32> @muladd_i32_negativeAddend(<vscale x 4 x i32> %a, <vscale x 4 x i32> %b)
; CHECK-LABEL: muladd_i32_negativeAddend:		; CHECK-LABEL: muladd_i32_negativeAddend:
; CHECK: // %bb.0:		; CHECK: // %bb.0:
; CHECK-NEXT: ptrue p0.s		; CHECK-NEXT: ptrue p0.s
; CHECK-NEXT: mov z2.s, #0xffff0000		; CHECK-NEXT: mov z2.s, #0xffff0000
; CHECK-NEXT: mla z2.s, p0/m, z0.s, z1.s		; CHECK-NEXT: mad z0.s, p0/m, z1.s, z2.s
; CHECK-NEXT: mov z0.d, z2.d
; CHECK-NEXT: ret		; CHECK-NEXT: ret
{		{
%1 = mul <vscale x 4 x i32> %a, %b		%1 = mul <vscale x 4 x i32> %a, %b
%2 = add <vscale x 4 x i32> %1, shufflevector (<vscale x 4 x i32> insertelement (<vscale x 4 x i32> poison, i32 -65536, i32 0), <vscale x 4 x i32> poison, <vscale x 4 x i32> zeroinitializer)		%2 = add <vscale x 4 x i32> %1, shufflevector (<vscale x 4 x i32> insertelement (<vscale x 4 x i32> poison, i32 -65536, i32 0), <vscale x 4 x i32> poison, <vscale x 4 x i32> zeroinitializer)
ret <vscale x 4 x i32> %2		ret <vscale x 4 x i32> %2
}		}

define <vscale x 8 x i16> @muladd_i16_positiveAddend(<vscale x 8 x i16> %a, <vscale x 8 x i16> %b)		define <vscale x 8 x i16> @muladd_i16_positiveAddend(<vscale x 8 x i16> %a, <vscale x 8 x i16> %b)
[... 9 lines not shown ...]	; CHECK-NEXT: ret
ret <vscale x 8 x i16> %2		ret <vscale x 8 x i16> %2
}		}

define <vscale x 8 x i16> @muladd_i16_negativeAddend(<vscale x 8 x i16> %a, <vscale x 8 x i16> %b)		define <vscale x 8 x i16> @muladd_i16_negativeAddend(<vscale x 8 x i16> %a, <vscale x 8 x i16> %b)
; CHECK-LABEL: muladd_i16_negativeAddend:		; CHECK-LABEL: muladd_i16_negativeAddend:
; CHECK: // %bb.0:		; CHECK: // %bb.0:
; CHECK-NEXT: ptrue p0.h		; CHECK-NEXT: ptrue p0.h
; CHECK-NEXT: mov z2.h, #-255 // =0xffffffffffffff01		; CHECK-NEXT: mov z2.h, #-255 // =0xffffffffffffff01
; CHECK-NEXT: mla z2.h, p0/m, z0.h, z1.h		; CHECK-NEXT: mad z0.h, p0/m, z1.h, z2.h
; CHECK-NEXT: mov z0.d, z2.d
; CHECK-NEXT: ret		; CHECK-NEXT: ret
{		{
%1 = mul <vscale x 8 x i16> %a, %b		%1 = mul <vscale x 8 x i16> %a, %b
%2 = add <vscale x 8 x i16> %1, shufflevector (<vscale x 8 x i16> insertelement (<vscale x 8 x i16> poison, i16 -255, i16 0), <vscale x 8 x i16> poison, <vscale x 8 x i32> zeroinitializer)		%2 = add <vscale x 8 x i16> %1, shufflevector (<vscale x 8 x i16> insertelement (<vscale x 8 x i16> poison, i16 -255, i16 0), <vscale x 8 x i16> poison, <vscale x 8 x i32> zeroinitializer)
ret <vscale x 8 x i16> %2		ret <vscale x 8 x i16> %2
}		}

define <vscale x 16 x i8> @muladd_i8_positiveAddend(<vscale x 16 x i8> %a, <vscale x 16 x i8> %b)		define <vscale x 16 x i8> @muladd_i8_positiveAddend(<vscale x 16 x i8> %a, <vscale x 16 x i8> %b)
[... 162 lines not shown ...]

llvm/test/CodeGen/AArch64/sve-pseudos-expand-undef.mir

[... 14 lines not shown ...]	bb.0:
; CHECK: add_x		; CHECK: add_x
; CHECK-NOT: MOVPRFX		; CHECK-NOT: MOVPRFX
; CHECK: $z0 = FADD_ZPmZ_S renamable $p0, killed $z0, renamable $z0		; CHECK: $z0 = FADD_ZPmZ_S renamable $p0, killed $z0, renamable $z0
; CHECK-NEXT: RET		; CHECK-NEXT: RET
renamable $z0 = FADD_ZPZZ_UNDEF_S renamable $p0, renamable $z0, killed renamable $z0		renamable $z0 = FADD_ZPZZ_UNDEF_S renamable $p0, renamable $z0, killed renamable $z0
RET_ReallyLR		RET_ReallyLR

...		...

		# CHECK: {{.*}} MSB_ZPmZZ_B {{.*}}
		---
		name: expand_mls_to_msb
		body: |
		bb.0:
		renamable $p0 = PTRUE_B 31
		renamable $z0 = MLS_ZPZZZ_UNDEF_B killed renamable $p0, killed renamable $z2, killed renamable $z0, killed renamable $z1
		RET_ReallyLR implicit $z0
		...

		# CHECK: {{.*}} MAD_ZPmZZ_B {{.*}}
		---
		name: expand_mla_to_mad
		body: |
		bb.0:
		renamable $p0 = PTRUE_B 31
		renamable $z0 = MLA_ZPZZZ_UNDEF_B killed renamable $p0, killed renamable $z2, killed renamable $z0, killed renamable $z1
		RET_ReallyLR implicit $z0
		...

llvm/test/CodeGen/AArch64/sve-streaming-mode-fixed-length-int-rem.ll

	[... 103 lines not shown ...]
	; CHECK-NEXT: ret			; CHECK-NEXT: ret
	%res = srem <16 x i8> %op1, %op2			%res = srem <16 x i8> %op1, %op2
	ret <16 x i8> %res			ret <16 x i8> %res
	}			}

	define void @srem_v32i8(ptr %a, ptr %b) #0 {			define void @srem_v32i8(ptr %a, ptr %b) #0 {
	; CHECK-LABEL: srem_v32i8:			; CHECK-LABEL: srem_v32i8:
	; CHECK: // %bb.0:			; CHECK: // %bb.0:
	; CHECK-NEXT: ldp q1, q0, [x0]			; CHECK-NEXT: ldp q2, q0, [x0]
	; CHECK-NEXT: ptrue p0.s, vl4			; CHECK-NEXT: ptrue p0.s, vl4
	; CHECK-NEXT: ptrue p1.h, vl4			; CHECK-NEXT: ptrue p1.h, vl4
	; CHECK-NEXT: ldp q3, q2, [x1]			; CHECK-NEXT: ldp q3, q1, [x1]
	; CHECK-NEXT: mov z5.d, z0.d			; CHECK-NEXT: mov z5.d, z0.d
	; CHECK-NEXT: sunpklo z7.h, z0.b			; CHECK-NEXT: sunpklo z7.h, z0.b
	; CHECK-NEXT: ext z5.b, z5.b, z0.b, #8			; CHECK-NEXT: ext z5.b, z5.b, z0.b, #8
	; CHECK-NEXT: sunpklo z5.h, z5.b			; CHECK-NEXT: sunpklo z5.h, z5.b
	; CHECK-NEXT: sunpklo z18.s, z5.h			; CHECK-NEXT: sunpklo z18.s, z5.h
	; CHECK-NEXT: ext z5.b, z5.b, z5.b, #8			; CHECK-NEXT: ext z5.b, z5.b, z5.b, #8
	; CHECK-NEXT: sunpklo z5.s, z5.h			; CHECK-NEXT: sunpklo z5.s, z5.h
	; CHECK-NEXT: mov z4.d, z2.d			; CHECK-NEXT: mov z4.d, z1.d
	; CHECK-NEXT: sunpklo z6.h, z2.b			; CHECK-NEXT: sunpklo z6.h, z1.b
	; CHECK-NEXT: ext z4.b, z4.b, z2.b, #8			; CHECK-NEXT: ext z4.b, z4.b, z1.b, #8
	; CHECK-NEXT: sunpklo z16.s, z6.h			; CHECK-NEXT: sunpklo z16.s, z6.h
	; CHECK-NEXT: sunpklo z4.h, z4.b			; CHECK-NEXT: sunpklo z4.h, z4.b
	; CHECK-NEXT: ext z6.b, z6.b, z6.b, #8			; CHECK-NEXT: ext z6.b, z6.b, z6.b, #8
	; CHECK-NEXT: sunpklo z17.s, z4.h			; CHECK-NEXT: sunpklo z17.s, z4.h
	; CHECK-NEXT: ext z4.b, z4.b, z4.b, #8			; CHECK-NEXT: ext z4.b, z4.b, z4.b, #8
	; CHECK-NEXT: sunpklo z4.s, z4.h			; CHECK-NEXT: sunpklo z4.s, z4.h
	; CHECK-NEXT: sdivr z17.s, p0/m, z17.s, z18.s			; CHECK-NEXT: sdivr z17.s, p0/m, z17.s, z18.s
	; CHECK-NEXT: sdivr z4.s, p0/m, z4.s, z5.s			; CHECK-NEXT: sdivr z4.s, p0/m, z4.s, z5.s
	; CHECK-NEXT: sunpklo z18.s, z7.h			; CHECK-NEXT: sunpklo z18.s, z7.h
	; CHECK-NEXT: uzp1 z17.h, z17.h, z17.h			; CHECK-NEXT: uzp1 z17.h, z17.h, z17.h
	; CHECK-NEXT: uzp1 z4.h, z4.h, z4.h			; CHECK-NEXT: uzp1 z4.h, z4.h, z4.h
	; CHECK-NEXT: ext z7.b, z7.b, z7.b, #8			; CHECK-NEXT: ext z7.b, z7.b, z7.b, #8
	; CHECK-NEXT: sunpklo z5.s, z6.h			; CHECK-NEXT: sunpklo z5.s, z6.h
	; CHECK-NEXT: splice z17.h, p1, z17.h, z4.h			; CHECK-NEXT: splice z17.h, p1, z17.h, z4.h
	; CHECK-NEXT: sunpklo z4.s, z7.h			; CHECK-NEXT: sunpklo z4.s, z7.h
	; CHECK-NEXT: mov z6.d, z3.d			; CHECK-NEXT: mov z6.d, z3.d
	; CHECK-NEXT: mov z7.d, z1.d			; CHECK-NEXT: mov z7.d, z2.d
	; CHECK-NEXT: ext z6.b, z6.b, z3.b, #8			; CHECK-NEXT: ext z6.b, z6.b, z3.b, #8
	; CHECK-NEXT: ext z7.b, z7.b, z1.b, #8			; CHECK-NEXT: ext z7.b, z7.b, z2.b, #8
	; CHECK-NEXT: sdivr z16.s, p0/m, z16.s, z18.s			; CHECK-NEXT: sdivr z16.s, p0/m, z16.s, z18.s
	; CHECK-NEXT: sunpklo z6.h, z6.b			; CHECK-NEXT: sunpklo z6.h, z6.b
	; CHECK-NEXT: sunpklo z7.h, z7.b			; CHECK-NEXT: sunpklo z7.h, z7.b
	; CHECK-NEXT: sdiv z4.s, p0/m, z4.s, z5.s			; CHECK-NEXT: sdiv z4.s, p0/m, z4.s, z5.s
	; CHECK-NEXT: uzp1 z5.h, z16.h, z16.h			; CHECK-NEXT: uzp1 z5.h, z16.h, z16.h
	; CHECK-NEXT: sunpklo z16.s, z6.h			; CHECK-NEXT: sunpklo z16.s, z6.h
	; CHECK-NEXT: sunpklo z18.s, z7.h			; CHECK-NEXT: sunpklo z18.s, z7.h
	; CHECK-NEXT: ext z6.b, z6.b, z6.b, #8			; CHECK-NEXT: ext z6.b, z6.b, z6.b, #8
	; CHECK-NEXT: ext z7.b, z7.b, z7.b, #8			; CHECK-NEXT: ext z7.b, z7.b, z7.b, #8
	; CHECK-NEXT: sunpklo z6.s, z6.h			; CHECK-NEXT: sunpklo z6.s, z6.h
	; CHECK-NEXT: sunpklo z7.s, z7.h			; CHECK-NEXT: sunpklo z7.s, z7.h
	; CHECK-NEXT: sdivr z16.s, p0/m, z16.s, z18.s			; CHECK-NEXT: sdivr z16.s, p0/m, z16.s, z18.s
	; CHECK-NEXT: sdivr z6.s, p0/m, z6.s, z7.s			; CHECK-NEXT: sdivr z6.s, p0/m, z6.s, z7.s
	; CHECK-NEXT: uzp1 z4.h, z4.h, z4.h			; CHECK-NEXT: uzp1 z4.h, z4.h, z4.h
	; CHECK-NEXT: uzp1 z7.h, z16.h, z16.h			; CHECK-NEXT: uzp1 z7.h, z16.h, z16.h
	; CHECK-NEXT: uzp1 z6.h, z6.h, z6.h			; CHECK-NEXT: uzp1 z6.h, z6.h, z6.h
	; CHECK-NEXT: splice z5.h, p1, z5.h, z4.h			; CHECK-NEXT: splice z5.h, p1, z5.h, z4.h
	; CHECK-NEXT: splice z7.h, p1, z7.h, z6.h			; CHECK-NEXT: splice z7.h, p1, z7.h, z6.h
	; CHECK-NEXT: sunpklo z4.h, z3.b			; CHECK-NEXT: sunpklo z4.h, z3.b
	; CHECK-NEXT: sunpklo z6.h, z1.b			; CHECK-NEXT: sunpklo z6.h, z2.b
	; CHECK-NEXT: sunpklo z16.s, z4.h			; CHECK-NEXT: sunpklo z16.s, z4.h
	; CHECK-NEXT: sunpklo z18.s, z6.h			; CHECK-NEXT: sunpklo z18.s, z6.h
	; CHECK-NEXT: ext z4.b, z4.b, z4.b, #8			; CHECK-NEXT: ext z4.b, z4.b, z4.b, #8
	; CHECK-NEXT: ext z6.b, z6.b, z6.b, #8			; CHECK-NEXT: ext z6.b, z6.b, z6.b, #8
	; CHECK-NEXT: sunpklo z4.s, z4.h			; CHECK-NEXT: sunpklo z4.s, z4.h
	; CHECK-NEXT: sunpklo z6.s, z6.h			; CHECK-NEXT: sunpklo z6.s, z6.h
	; CHECK-NEXT: sdivr z16.s, p0/m, z16.s, z18.s			; CHECK-NEXT: sdivr z16.s, p0/m, z16.s, z18.s
	; CHECK-NEXT: sdivr z4.s, p0/m, z4.s, z6.s			; CHECK-NEXT: sdivr z4.s, p0/m, z4.s, z6.s
	; CHECK-NEXT: uzp1 z16.h, z16.h, z16.h			; CHECK-NEXT: uzp1 z16.h, z16.h, z16.h
	; CHECK-NEXT: uzp1 z4.h, z4.h, z4.h			; CHECK-NEXT: uzp1 z4.h, z4.h, z4.h
	; CHECK-NEXT: splice z16.h, p1, z16.h, z4.h			; CHECK-NEXT: splice z16.h, p1, z16.h, z4.h
	; CHECK-NEXT: uzp1 z6.b, z17.b, z17.b			; CHECK-NEXT: uzp1 z6.b, z17.b, z17.b
	; CHECK-NEXT: uzp1 z5.b, z5.b, z5.b			; CHECK-NEXT: uzp1 z5.b, z5.b, z5.b
	; CHECK-NEXT: ptrue p0.b, vl8			; CHECK-NEXT: ptrue p0.b, vl8
	; CHECK-NEXT: uzp1 z4.b, z7.b, z7.b			; CHECK-NEXT: uzp1 z4.b, z7.b, z7.b
	; CHECK-NEXT: uzp1 z7.b, z16.b, z16.b			; CHECK-NEXT: uzp1 z7.b, z16.b, z16.b
	; CHECK-NEXT: ptrue p1.b, vl16			; CHECK-NEXT: ptrue p1.b, vl16
	; CHECK-NEXT: splice z7.b, p0, z7.b, z4.b			; CHECK-NEXT: splice z7.b, p0, z7.b, z4.b
	; CHECK-NEXT: splice z5.b, p0, z5.b, z6.b			; CHECK-NEXT: splice z5.b, p0, z5.b, z6.b
	; CHECK-NEXT: mls z1.b, p1/m, z7.b, z3.b			; CHECK-NEXT: mls z2.b, p1/m, z7.b, z3.b
	; CHECK-NEXT: mls z0.b, p1/m, z5.b, z2.b			; CHECK-NEXT: mls z0.b, p1/m, z5.b, z1.b
	; CHECK-NEXT: stp q1, q0, [x0]			; CHECK-NEXT: stp q2, q0, [x0]
	; CHECK-NEXT: ret			; CHECK-NEXT: ret
	%op1 = load <32 x i8>, ptr %a			%op1 = load <32 x i8>, ptr %a
	%op2 = load <32 x i8>, ptr %b			%op2 = load <32 x i8>, ptr %b
	%res = srem <32 x i8> %op1, %op2			%res = srem <32 x i8> %op1, %op2
	store <32 x i8> %res, ptr %a			store <32 x i8> %res, ptr %a
	ret void			ret void
	}			}

	[... 292 lines not shown ...]
	; CHECK-NEXT: ret			; CHECK-NEXT: ret
	%res = urem <16 x i8> %op1, %op2			%res = urem <16 x i8> %op1, %op2
	ret <16 x i8> %res			ret <16 x i8> %res
	}			}

	define void @urem_v32i8(ptr %a, ptr %b) #0 {			define void @urem_v32i8(ptr %a, ptr %b) #0 {
	; CHECK-LABEL: urem_v32i8:			; CHECK-LABEL: urem_v32i8:
	; CHECK: // %bb.0:			; CHECK: // %bb.0:
	; CHECK-NEXT: ldp q1, q0, [x0]			; CHECK-NEXT: ldp q2, q0, [x0]
	; CHECK-NEXT: ptrue p0.s, vl4			; CHECK-NEXT: ptrue p0.s, vl4
	; CHECK-NEXT: ptrue p1.h, vl4			; CHECK-NEXT: ptrue p1.h, vl4
	; CHECK-NEXT: ldp q3, q2, [x1]			; CHECK-NEXT: ldp q3, q1, [x1]
	; CHECK-NEXT: mov z5.d, z0.d			; CHECK-NEXT: mov z5.d, z0.d
	; CHECK-NEXT: uunpklo z7.h, z0.b			; CHECK-NEXT: uunpklo z7.h, z0.b
	; CHECK-NEXT: ext z5.b, z5.b, z0.b, #8			; CHECK-NEXT: ext z5.b, z5.b, z0.b, #8
	; CHECK-NEXT: uunpklo z5.h, z5.b			; CHECK-NEXT: uunpklo z5.h, z5.b
	; CHECK-NEXT: uunpklo z18.s, z5.h			; CHECK-NEXT: uunpklo z18.s, z5.h
	; CHECK-NEXT: ext z5.b, z5.b, z5.b, #8			; CHECK-NEXT: ext z5.b, z5.b, z5.b, #8
	; CHECK-NEXT: uunpklo z5.s, z5.h			; CHECK-NEXT: uunpklo z5.s, z5.h
	; CHECK-NEXT: mov z4.d, z2.d			; CHECK-NEXT: mov z4.d, z1.d
	; CHECK-NEXT: uunpklo z6.h, z2.b			; CHECK-NEXT: uunpklo z6.h, z1.b
	; CHECK-NEXT: ext z4.b, z4.b, z2.b, #8			; CHECK-NEXT: ext z4.b, z4.b, z1.b, #8
	; CHECK-NEXT: uunpklo z16.s, z6.h			; CHECK-NEXT: uunpklo z16.s, z6.h
	; CHECK-NEXT: uunpklo z4.h, z4.b			; CHECK-NEXT: uunpklo z4.h, z4.b
	; CHECK-NEXT: ext z6.b, z6.b, z6.b, #8			; CHECK-NEXT: ext z6.b, z6.b, z6.b, #8
	; CHECK-NEXT: uunpklo z17.s, z4.h			; CHECK-NEXT: uunpklo z17.s, z4.h
	; CHECK-NEXT: ext z4.b, z4.b, z4.b, #8			; CHECK-NEXT: ext z4.b, z4.b, z4.b, #8
	; CHECK-NEXT: uunpklo z4.s, z4.h			; CHECK-NEXT: uunpklo z4.s, z4.h
	; CHECK-NEXT: udivr z17.s, p0/m, z17.s, z18.s			; CHECK-NEXT: udivr z17.s, p0/m, z17.s, z18.s
	; CHECK-NEXT: udivr z4.s, p0/m, z4.s, z5.s			; CHECK-NEXT: udivr z4.s, p0/m, z4.s, z5.s
	; CHECK-NEXT: uunpklo z18.s, z7.h			; CHECK-NEXT: uunpklo z18.s, z7.h
	; CHECK-NEXT: uzp1 z17.h, z17.h, z17.h			; CHECK-NEXT: uzp1 z17.h, z17.h, z17.h
	; CHECK-NEXT: uzp1 z4.h, z4.h, z4.h			; CHECK-NEXT: uzp1 z4.h, z4.h, z4.h
	; CHECK-NEXT: ext z7.b, z7.b, z7.b, #8			; CHECK-NEXT: ext z7.b, z7.b, z7.b, #8
	; CHECK-NEXT: uunpklo z5.s, z6.h			; CHECK-NEXT: uunpklo z5.s, z6.h
	; CHECK-NEXT: splice z17.h, p1, z17.h, z4.h			; CHECK-NEXT: splice z17.h, p1, z17.h, z4.h
	; CHECK-NEXT: uunpklo z4.s, z7.h			; CHECK-NEXT: uunpklo z4.s, z7.h
	; CHECK-NEXT: mov z6.d, z3.d			; CHECK-NEXT: mov z6.d, z3.d
	; CHECK-NEXT: mov z7.d, z1.d			; CHECK-NEXT: mov z7.d, z2.d
	; CHECK-NEXT: ext z6.b, z6.b, z3.b, #8			; CHECK-NEXT: ext z6.b, z6.b, z3.b, #8
	; CHECK-NEXT: ext z7.b, z7.b, z1.b, #8			; CHECK-NEXT: ext z7.b, z7.b, z2.b, #8
	; CHECK-NEXT: udivr z16.s, p0/m, z16.s, z18.s			; CHECK-NEXT: udivr z16.s, p0/m, z16.s, z18.s
	; CHECK-NEXT: uunpklo z6.h, z6.b			; CHECK-NEXT: uunpklo z6.h, z6.b
	; CHECK-NEXT: uunpklo z7.h, z7.b			; CHECK-NEXT: uunpklo z7.h, z7.b
	; CHECK-NEXT: udiv z4.s, p0/m, z4.s, z5.s			; CHECK-NEXT: udiv z4.s, p0/m, z4.s, z5.s
	; CHECK-NEXT: uzp1 z5.h, z16.h, z16.h			; CHECK-NEXT: uzp1 z5.h, z16.h, z16.h
	; CHECK-NEXT: uunpklo z16.s, z6.h			; CHECK-NEXT: uunpklo z16.s, z6.h
	; CHECK-NEXT: uunpklo z18.s, z7.h			; CHECK-NEXT: uunpklo z18.s, z7.h
	; CHECK-NEXT: ext z6.b, z6.b, z6.b, #8			; CHECK-NEXT: ext z6.b, z6.b, z6.b, #8
	; CHECK-NEXT: ext z7.b, z7.b, z7.b, #8			; CHECK-NEXT: ext z7.b, z7.b, z7.b, #8
	; CHECK-NEXT: uunpklo z6.s, z6.h			; CHECK-NEXT: uunpklo z6.s, z6.h
	; CHECK-NEXT: uunpklo z7.s, z7.h			; CHECK-NEXT: uunpklo z7.s, z7.h
	; CHECK-NEXT: udivr z16.s, p0/m, z16.s, z18.s			; CHECK-NEXT: udivr z16.s, p0/m, z16.s, z18.s
	; CHECK-NEXT: udivr z6.s, p0/m, z6.s, z7.s			; CHECK-NEXT: udivr z6.s, p0/m, z6.s, z7.s
	; CHECK-NEXT: uzp1 z4.h, z4.h, z4.h			; CHECK-NEXT: uzp1 z4.h, z4.h, z4.h
	; CHECK-NEXT: uzp1 z7.h, z16.h, z16.h			; CHECK-NEXT: uzp1 z7.h, z16.h, z16.h
	; CHECK-NEXT: uzp1 z6.h, z6.h, z6.h			; CHECK-NEXT: uzp1 z6.h, z6.h, z6.h
	; CHECK-NEXT: splice z5.h, p1, z5.h, z4.h			; CHECK-NEXT: splice z5.h, p1, z5.h, z4.h
	; CHECK-NEXT: splice z7.h, p1, z7.h, z6.h			; CHECK-NEXT: splice z7.h, p1, z7.h, z6.h
	; CHECK-NEXT: uunpklo z4.h, z3.b			; CHECK-NEXT: uunpklo z4.h, z3.b
	; CHECK-NEXT: uunpklo z6.h, z1.b			; CHECK-NEXT: uunpklo z6.h, z2.b
	; CHECK-NEXT: uunpklo z16.s, z4.h			; CHECK-NEXT: uunpklo z16.s, z4.h
	; CHECK-NEXT: uunpklo z18.s, z6.h			; CHECK-NEXT: uunpklo z18.s, z6.h
	; CHECK-NEXT: ext z4.b, z4.b, z4.b, #8			; CHECK-NEXT: ext z4.b, z4.b, z4.b, #8
	; CHECK-NEXT: ext z6.b, z6.b, z6.b, #8			; CHECK-NEXT: ext z6.b, z6.b, z6.b, #8
	; CHECK-NEXT: uunpklo z4.s, z4.h			; CHECK-NEXT: uunpklo z4.s, z4.h
	; CHECK-NEXT: uunpklo z6.s, z6.h			; CHECK-NEXT: uunpklo z6.s, z6.h
	; CHECK-NEXT: udivr z16.s, p0/m, z16.s, z18.s			; CHECK-NEXT: udivr z16.s, p0/m, z16.s, z18.s
	; CHECK-NEXT: udivr z4.s, p0/m, z4.s, z6.s			; CHECK-NEXT: udivr z4.s, p0/m, z4.s, z6.s
	; CHECK-NEXT: uzp1 z16.h, z16.h, z16.h			; CHECK-NEXT: uzp1 z16.h, z16.h, z16.h
	; CHECK-NEXT: uzp1 z4.h, z4.h, z4.h			; CHECK-NEXT: uzp1 z4.h, z4.h, z4.h
	; CHECK-NEXT: splice z16.h, p1, z16.h, z4.h			; CHECK-NEXT: splice z16.h, p1, z16.h, z4.h
	; CHECK-NEXT: uzp1 z6.b, z17.b, z17.b			; CHECK-NEXT: uzp1 z6.b, z17.b, z17.b
	; CHECK-NEXT: uzp1 z5.b, z5.b, z5.b			; CHECK-NEXT: uzp1 z5.b, z5.b, z5.b
	; CHECK-NEXT: ptrue p0.b, vl8			; CHECK-NEXT: ptrue p0.b, vl8
	; CHECK-NEXT: uzp1 z4.b, z7.b, z7.b			; CHECK-NEXT: uzp1 z4.b, z7.b, z7.b
	; CHECK-NEXT: uzp1 z7.b, z16.b, z16.b			; CHECK-NEXT: uzp1 z7.b, z16.b, z16.b
	; CHECK-NEXT: ptrue p1.b, vl16			; CHECK-NEXT: ptrue p1.b, vl16
	; CHECK-NEXT: splice z7.b, p0, z7.b, z4.b			; CHECK-NEXT: splice z7.b, p0, z7.b, z4.b
	; CHECK-NEXT: splice z5.b, p0, z5.b, z6.b			; CHECK-NEXT: splice z5.b, p0, z5.b, z6.b
	; CHECK-NEXT: mls z1.b, p1/m, z7.b, z3.b			; CHECK-NEXT: mls z2.b, p1/m, z7.b, z3.b
	; CHECK-NEXT: mls z0.b, p1/m, z5.b, z2.b			; CHECK-NEXT: mls z0.b, p1/m, z5.b, z1.b
	; CHECK-NEXT: stp q1, q0, [x0]			; CHECK-NEXT: stp q2, q0, [x0]
	; CHECK-NEXT: ret			; CHECK-NEXT: ret
	%op1 = load <32 x i8>, ptr %a			%op1 = load <32 x i8>, ptr %a
	%op2 = load <32 x i8>, ptr %b			%op2 = load <32 x i8>, ptr %b
	%res = urem <32 x i8> %op1, %op2			%res = urem <32 x i8> %op1, %op2
	store <32 x i8> %res, ptr %a			store <32 x i8> %res, ptr %a
	ret void			ret void
	}			}

	[... 195 lines not shown ...]