This is an archive of the discontinued LLVM Phabricator instance.


Differential D147619

[SVE] Add patterns to delete redundant sel instructions
Needs Review · Public

Authored by lizhijin on Apr 5 2023, 8:03 AM.

Details

Reviewers

paulwalker-arm
sdesmalen
david-arm
0-wiz-0
efriedma
kmclaughlin

Summary

For instructions with FalseLanesUndef (the results of inactive lanes are undefined), we can transform:

(sel $pg (inst $pg, $op1, $op2), $op2)

into:

(inst $pg, $op1, $op2)
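Whether such a fold is valid depends on which value ends up in the inactive lanes. The merging SVE form `FADD Zdn, Pg/M, Zdn, Zm` leaves its first source operand in inactive lanes, so the select must fall back to that same operand. The following is a hypothetical C++ lane model, not LLVM code: four fixed lanes stand in for a scalable vector, `fadd_u` stands in for an operation with undefined inactive lanes (modelled as NaN), `fadd_m` for the merging form, and `sel` for a lane-wise select.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <limits>

// Hypothetical model: 4 fixed lanes stand in for a scalable SVE vector.
constexpr std::size_t kLanes = 4;
using Vec = std::array<double, kLanes>;
using Pred = std::array<bool, kLanes>;

// Models an op whose inactive lanes are undefined (e.g. fadd.u);
// the arbitrary inactive-lane value is modelled as NaN.
Vec fadd_u(const Pred &pg, const Vec &a, const Vec &b) {
  Vec r{};
  for (std::size_t i = 0; i < kLanes; ++i)
    r[i] = pg[i] ? a[i] + b[i] : std::numeric_limits<double>::quiet_NaN();
  return r;
}

// Models the merging form FADD Zdn, Pg/M, Zdn, Zm: inactive lanes keep a.
Vec fadd_m(const Pred &pg, const Vec &a, const Vec &b) {
  Vec r = a;
  for (std::size_t i = 0; i < kLanes; ++i)
    if (pg[i]) r[i] = a[i] + b[i];
  return r;
}

// Lane-wise select: t where pg is set, otherwise f.
Vec sel(const Pred &pg, const Vec &t, const Vec &f) {
  Vec r{};
  for (std::size_t i = 0; i < kLanes; ++i) r[i] = pg[i] ? t[i] : f[i];
  return r;
}
```

With `pg = {1, 0, 1, 0}`, `a = {1, 2, 3, 4}` and `b = {10, 20, 30, 40}`, selecting back to `a` reproduces `fadd_m(pg, a, b) = {11, 2, 33, 4}`, while selecting back to `b` gives `{11, 20, 33, 40}`, which in this model only matches a merge into `b` with the operands swapped (the point raised in the review of `fadd_rev_f64`).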

Diff Detail

Event Timeline

lizhijin created this revision. Apr 5 2023, 8:03 AM

Herald added a project: Restricted Project. Apr 5 2023, 8:03 AM

Herald added subscribers: psnobl, hiraditya, tschuett.

lizhijin requested review of this revision. Apr 5 2023, 8:03 AM

Herald added a project: Restricted Project. Apr 5 2023, 8:03 AM

Herald added a subscriber: llvm-commits.

Harbormaster completed remote builds in B223806: Diff 511106. Apr 5 2023, 9:08 AM

Matt added a subscriber: Matt. Apr 5 2023, 10:37 PM

Hi @lizhijin, I've added comments relating to the correctness of the patch, but I'm a little concerned about the relevance of the patch's intent.

What real-world use case are you targeting? I ask because at first glance it seems like you care about instances where users simulate svadd_m-like builtins via a combination of svadd_x and svsel builtins. If this is the case, then I think a cleaner implementation would be to handle this as an instcombine. However, if you're using the _u intrinsics purely as a proxy, then I'm wondering what the real IR looks like.

If you look at AArch64fadd_m1, you'll see we already have some support for the style of merging operations we expect to get during auto-vectorisation. There's sure to be benefit in extending this to cover more of the binops, as your patch does, but the patterns themselves are slightly different (i.e. the binop typically takes an all-active predicate).

None of this necessarily blocks this patch, but I'm worried about unnecessarily polluting isel.
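The auto-vectorisation shape mentioned above, where the binop runs under an all-active predicate and a select with `$pg` merges the result back into the first operand, is also just a merging operation. A hypothetical C++ lane model (not LLVM code, and not the actual PatFrags behind `AArch64fadd_m1`; four fixed lanes stand in for a scalable vector) shows the equivalence:

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical model: 4 fixed lanes stand in for a scalable SVE vector.
constexpr std::size_t kLanes = 4;
using Vec = std::array<double, kLanes>;
using Pred = std::array<bool, kLanes>;

// fadd under an all-active predicate: every lane is computed.
Vec fadd_all_active(const Vec &a, const Vec &b) {
  Vec r{};
  for (std::size_t i = 0; i < kLanes; ++i) r[i] = a[i] + b[i];
  return r;
}

// vselect: lanes from t where pg is set, otherwise from f.
Vec vselect(const Pred &pg, const Vec &t, const Vec &f) {
  Vec r{};
  for (std::size_t i = 0; i < kLanes; ++i) r[i] = pg[i] ? t[i] : f[i];
  return r;
}

// Merging FADD Zdn, Pg/M, Zdn, Zm: inactive lanes keep zdn.
Vec fadd_m(const Pred &pg, const Vec &zdn, const Vec &zm) {
  Vec r = zdn;
  for (std::size_t i = 0; i < kLanes; ++i)
    if (pg[i]) r[i] = zdn[i] + zm[i];
  return r;
}
```

The difference from the patch's `_u`-based tests is only in where the inactive lanes come from before the select; in both cases the select has to fall back to the operand the merging form preserves.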

llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td
442	This looks wrong and is the likely reason for the incorrect test output. It doesn't make sense for the same operation to match both of these `vselect` patterns, because their expected results differ. If you follow the advice attached to `AArch64fadd_p1` below, you won't need this class. For reference, though: for non-commutative operations like (sub, fdiv) you'll need only one of these `vselect` patterns, with the other pattern being useful for `AArch64fsubr_m1` if you want to cover those cases. For commutative operations, the second `vselect` pattern probably wants to be something like (vselect node:$pg, (sdnode node:$pg, node:$op2, node:$op1), node:$op1)
449	We already have a naming scheme for such operations which in this case is `AArch64fadd_m1`, which already exists. You should be able to extend the existing PatFrags and then the existing patterns will just work.
458–459	Given we already have AArch64ISD nodes for the unary passthru operations I think it'll be better to remove the select via a DAG combine? If you agree then I'd prefer this to be done as a separate patch.
651–660	These pseudo instructions intentionally have no requirement for the results of the inactive lanes and thus do not represent the behaviour you expect from your `_p1` nodes. If you follow the advice above, you won't need to change this block of code.
llvm/test/CodeGen/AArch64/sve-sel-instruction-undef-predicate.ll
2	Please remove the `-mtriple` parameter because the triple is already set within the IR.
11	Please can you remove all the `entry:`'s because the tests shouldn't need them.
40–43	This output is identical to `fadd_f64` which is incorrect because this test wants the results for inactive lanes to come from %b (i.e. z1). I'd expect the output to be `fadd z1.d, p0/m, z1.d, z0.d`?
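The `fadd_rev_f64` comment and the note about commutative operations can be checked with a hypothetical C++ lane model (not LLVM code; four fixed lanes stand in for a scalable vector, undefined inactive lanes are modelled as NaN). When the select falls back to `%b`, a commutative operation can still be folded by merging into `b` with the operands swapped, which corresponds to the expected `fadd z1.d, p0/m, z1.d, z0.d`. For `fsub` that swap is wrong, but the reversed form `FSUBR` (which computes `Zdn = Zm - Zdn`) fits, which is the `AArch64fsubr_m1` case mentioned in the first comment.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <limits>

// Hypothetical model: 4 fixed lanes stand in for a scalable SVE vector.
constexpr std::size_t kLanes = 4;
using Vec = std::array<double, kLanes>;
using Pred = std::array<bool, kLanes>;

double add_op(double x, double y) { return x + y; }    // FADD:  Zdn + Zm
double sub_op(double x, double y) { return x - y; }    // FSUB:  Zdn - Zm
double subr_op(double dn, double m) { return m - dn; } // FSUBR: Zm - Zdn

// Undefined-inactive-lanes form (e.g. fadd.u); inactive lanes modelled as NaN.
template <typename Op>
Vec op_u(const Pred &pg, const Vec &a, const Vec &b, Op op) {
  Vec r{};
  for (std::size_t i = 0; i < kLanes; ++i)
    r[i] = pg[i] ? op(a[i], b[i]) : std::numeric_limits<double>::quiet_NaN();
  return r;
}

// Destructive merging form: active lanes get op(dn, m), inactive lanes keep dn.
template <typename Op>
Vec op_m(const Pred &pg, const Vec &dn, const Vec &m, Op op) {
  Vec r = dn;
  for (std::size_t i = 0; i < kLanes; ++i)
    if (pg[i]) r[i] = op(dn[i], m[i]);
  return r;
}

// Lane-wise select: t where pg is set, otherwise f.
Vec sel(const Pred &pg, const Vec &t, const Vec &f) {
  Vec r{};
  for (std::size_t i = 0; i < kLanes; ++i) r[i] = pg[i] ? t[i] : f[i];
  return r;
}
```

In this model `sel(pg, op_u(pg, a, b, add_op), b)` equals `op_m(pg, b, a, add_op)` but not `op_m(pg, a, b, add_op)`, and `sel(pg, op_u(pg, a, b, sub_op), b)` equals `op_m(pg, b, a, subr_op)`.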

The aim is to combine svadd_x and svsel intrinsics.
As these examples show, GCC has patterns that combine sel with some other instructions, while Clang does not.
gcc: https://godbolt.org/z/G4aYefG51
clang: https://godbolt.org/z/G4aYefG51

At first I wanted to implement this in the DAG combiner for most instructions (sel + svmul, sel + fmla, etc.), but I found that some instructions, such as fmulx and fsubr, don't have a DAG opcode. Do you mean implementing it as a mid-end InstCombine, a DAG combine, or in the MachineCombiner?

In D147619#4248541, @paulwalker-arm wrote:

Hi @lizhijin, I've added comments relating to the correctness of the patch but I'm a little concerned about the relevance of the patch's intent.

What real-world use case are you targeting? I ask because at first glance it seems like you care about instances where users simulate svadd_m-like builtins via a combination of svadd_x and svsel builtins. If this is the case, then I think a cleaner implementation would be to handle this as an instcombine. However, if you're using the _u intrinsics purely as a proxy, then I'm wondering what the real IR looks like.

If you look at AArch64fadd_m1, you'll see we already have some support for the style of merging operations we expect to get during auto-vectorisation. There's sure to be benefit in extending this to cover more of the binops, as your patch does, but the patterns themselves are slightly different (i.e. the binop typically takes an all-active predicate).

None of this necessarily blocks this patch, but I'm worried about unnecessarily polluting isel.

Thanks for the clarification. In this instance, based on the current design, these look like inst combines to me (see AArch64TTIImpl::instCombineIntrinsic).
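The suggestion is to perform the fold on the intrinsics in `AArch64TTIImpl::instCombineIntrinsic`. That hook works on LLVM IR and its code is not reproduced here; the sketch below only illustrates the matching rule on a hypothetical toy expression tree (`Node`, `make`, `val`, `foldSelect` and `isCommutative` are invented for illustration and are not LLVM APIs). It rewrites `select(pg, op_u(pg, x, y), x)` into the merging `op_m(pg, x, y)` and, for commutative operations only, `select(pg, op_u(pg, x, y), y)` into `op_m(pg, y, x)`; anything else, including a select whose predicate differs from the operation's, is left unchanged.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Hypothetical toy IR for illustration only; this is not LLVM's API.
enum class Kind { Value, OpUndef, OpMerge, Select };

struct Node;
using NodeP = std::shared_ptr<Node>;

struct Node {
  Kind kind;
  std::string name; // value name, or operation name such as "fadd"
  NodeP pg;         // governing predicate (Select: the condition)
  NodeP lhs;        // first operand (Select: the true value)
  NodeP rhs;        // second operand (Select: the false value)
};

NodeP make(Kind k, std::string name, NodeP pg, NodeP lhs, NodeP rhs) {
  return std::make_shared<Node>(
      Node{k, std::move(name), std::move(pg), std::move(lhs), std::move(rhs)});
}

NodeP val(const std::string &name) {
  return make(Kind::Value, name, nullptr, nullptr, nullptr);
}

// Toy commutativity table; a real combine would cover every op it handles.
bool isCommutative(const std::string &op) { return op == "fadd" || op == "fmul"; }

// select(pg, op_u(pg, x, y), x) -> op_m(pg, x, y)
// select(pg, op_u(pg, x, y), y) -> op_m(pg, y, x)   (commutative ops only)
NodeP foldSelect(const NodeP &n) {
  if (n->kind != Kind::Select) return n;
  const NodeP &t = n->lhs; // value for active lanes
  const NodeP &f = n->rhs; // value for inactive lanes
  if (!t || t->kind != Kind::OpUndef || t->pg != n->pg) return n;
  if (t->lhs == f) return make(Kind::OpMerge, t->name, t->pg, t->lhs, t->rhs);
  if (t->rhs == f && isCommutative(t->name))
    return make(Kind::OpMerge, t->name, t->pg, t->rhs, t->lhs);
  return n;
}
```

Matching on value identity (pointer equality here) mirrors the requirement that the select's predicate and fallback operand be exactly the operation's own predicate and operand; a non-commutative operation such as `fsub` would instead need a reversed form like `fsubr`, which this toy does not model.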

Revision Contents

Path	Size
llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td	45 lines
llvm/test/CodeGen/AArch64/sve-sel-instruction-undef-predicate.ll	500 lines

Diff 511106

llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td


(429 unchanged lines not shown)
def AArch64fmla_m1 : fma_patfrags<int_aarch64_sve_fmla, AArch64fadd_p_nsz>;		def AArch64fmla_m1 : fma_patfrags<int_aarch64_sve_fmla, AArch64fadd_p_nsz>;
def AArch64fmls_m1 : fma_patfrags<int_aarch64_sve_fmls, AArch64fsub_p>;		def AArch64fmls_m1 : fma_patfrags<int_aarch64_sve_fmls, AArch64fsub_p>;

def AArch64smax_m1 : EitherVSelectOrPassthruPatFrags<int_aarch64_sve_smax, AArch64smax_p>;		def AArch64smax_m1 : EitherVSelectOrPassthruPatFrags<int_aarch64_sve_smax, AArch64smax_p>;
def AArch64umax_m1 : EitherVSelectOrPassthruPatFrags<int_aarch64_sve_umax, AArch64umax_p>;		def AArch64umax_m1 : EitherVSelectOrPassthruPatFrags<int_aarch64_sve_umax, AArch64umax_p>;
def AArch64smin_m1 : EitherVSelectOrPassthruPatFrags<int_aarch64_sve_smin, AArch64smin_p>;		def AArch64smin_m1 : EitherVSelectOrPassthruPatFrags<int_aarch64_sve_smin, AArch64smin_p>;
def AArch64umin_m1 : EitherVSelectOrPassthruPatFrags<int_aarch64_sve_umin, AArch64umin_p>;		def AArch64umin_m1 : EitherVSelectOrPassthruPatFrags<int_aarch64_sve_umin, AArch64umin_p>;

		class fp_bin_patfrags<SDPatternOperator sdnode>
		: PatFrags<(ops node:$pg, node:$op1, node:$op2),
		[(sdnode node:$pg, node:$op1, node:$op2),
		(vselect node:$pg, (sdnode node:$pg, node:$op1, node:$op2), node:$op1),
		(vselect node:$pg, (sdnode node:$pg, node:$op1, node:$op2), node:$op2)]>;
		paulwalker-arm (inline comment): This looks wrong and the likely reason for the incorrect test output. It doesn't make sense for the same operation to match to both these `vselect` patterns because their expected results differ. If you follow the advice attached to `AArch64fadd_p1` below, you'll not need this class but for reference you'll either need one of these `vselect` patterns for non-commutative operations like (sub, fdiv) with the other pattern useful for the `AArch64fsubr_m1` if you want to cover those cases. For the commutative operations the second `vselect` pattern probably wants to be something like (vselect node:$pg, (sdnode node:$pg, node:$op2, node:$op1), node:$op1)

		class fp_unary_patfrags<SDPatternOperator sdnode>
		: PatFrags<(ops node:$pg, node:$op, node:$pt),
		[(sdnode node:$pg, node:$op, node:$pt),
		(vselect node:$pg, (sdnode node:$pg, node:$op, node:$pt), node:$op)]>;

		def AArch64fadd_p1 : fp_bin_patfrags<AArch64fadd_p>;
		paulwalker-arm (inline comment): We already have a naming scheme for such operations which in this case is `AArch64fadd_m1`, which already exists. You should be able to extend the existing PatFrags and then the existing patterns will just work.
		def AArch64fsub_p1 : fp_bin_patfrags<AArch64fsub_p>;
		def AArch64fmul_p1 : fp_bin_patfrags<AArch64fmul_p>;
		def AArch64fmaxnm_p1 : fp_bin_patfrags<AArch64fmaxnm_p>;
		def AArch64fminnm_p1 : fp_bin_patfrags<AArch64fminnm_p>;
		def AArch64fmax_p1 : fp_bin_patfrags<AArch64fmax_p>;
		def AArch64fmin_p1 : fp_bin_patfrags<AArch64fmin_p>;
		def AArch64fabd_p1 : fp_bin_patfrags<AArch64fabd_p>;
		def AArch64fdiv_p1 : fp_bin_patfrags<AArch64fdiv_p>;
		def AArch64fneg_mt1 : fp_unary_patfrags<AArch64fneg_mt>;
		def AArch64fabs_mt1 : fp_unary_patfrags<AArch64fabs_mt>;
		paulwalker-arm (inline comment): Given we already have AArch64ISD nodes for the unary passthru operations I think it'll be better to remove the select via a DAG combine? If you agree then I'd prefer this to be done as a separate patch.

let Predicates = [HasSVE] in {		let Predicates = [HasSVE] in {
defm RDFFR_PPz : sve_int_rdffr_pred<0b0, "rdffr", int_aarch64_sve_rdffr_z>;		defm RDFFR_PPz : sve_int_rdffr_pred<0b0, "rdffr", int_aarch64_sve_rdffr_z>;
def RDFFRS_PPz : sve_int_rdffr_pred<0b1, "rdffrs">;		def RDFFRS_PPz : sve_int_rdffr_pred<0b1, "rdffrs">;
defm RDFFR_P : sve_int_rdffr_unpred<"rdffr", int_aarch64_sve_rdffr>;		defm RDFFR_P : sve_int_rdffr_unpred<"rdffr", int_aarch64_sve_rdffr>;
def SETFFR : sve_int_setffr<"setffr", int_aarch64_sve_setffr>;		def SETFFR : sve_int_setffr<"setffr", int_aarch64_sve_setffr>;
def WRFFR : sve_int_wrffr<"wrffr", int_aarch64_sve_wrffr>;		def WRFFR : sve_int_wrffr<"wrffr", int_aarch64_sve_wrffr>;
} // End HasSVE		} // End HasSVE

(101 unchanged lines not shown)	let Predicates = [HasSVEorSME] in {
defm ABS_ZPmZ : sve_int_un_pred_arit_0< 0b110, "abs", AArch64abs_mt>;		defm ABS_ZPmZ : sve_int_un_pred_arit_0< 0b110, "abs", AArch64abs_mt>;
defm NEG_ZPmZ : sve_int_un_pred_arit_0< 0b111, "neg", AArch64neg_mt>;		defm NEG_ZPmZ : sve_int_un_pred_arit_0< 0b111, "neg", AArch64neg_mt>;

defm CLS_ZPmZ : sve_int_un_pred_arit_1< 0b000, "cls", AArch64cls_mt>;		defm CLS_ZPmZ : sve_int_un_pred_arit_1< 0b000, "cls", AArch64cls_mt>;
defm CLZ_ZPmZ : sve_int_un_pred_arit_1< 0b001, "clz", AArch64clz_mt>;		defm CLZ_ZPmZ : sve_int_un_pred_arit_1< 0b001, "clz", AArch64clz_mt>;
defm CNT_ZPmZ : sve_int_un_pred_arit_1< 0b010, "cnt", AArch64cnt_mt>;		defm CNT_ZPmZ : sve_int_un_pred_arit_1< 0b010, "cnt", AArch64cnt_mt>;
defm CNOT_ZPmZ : sve_int_un_pred_arit_1< 0b011, "cnot", AArch64cnot_mt>;		defm CNOT_ZPmZ : sve_int_un_pred_arit_1< 0b011, "cnot", AArch64cnot_mt>;
defm NOT_ZPmZ : sve_int_un_pred_arit_1< 0b110, "not", AArch64not_mt>;		defm NOT_ZPmZ : sve_int_un_pred_arit_1< 0b110, "not", AArch64not_mt>;
defm FABS_ZPmZ : sve_int_un_pred_arit_1_fp<0b100, "fabs", AArch64fabs_mt>;		defm FABS_ZPmZ : sve_int_un_pred_arit_1_fp<0b100, "fabs", AArch64fabs_mt1>;
defm FNEG_ZPmZ : sve_int_un_pred_arit_1_fp<0b101, "fneg", AArch64fneg_mt>;		defm FNEG_ZPmZ : sve_int_un_pred_arit_1_fp<0b101, "fneg", AArch64fneg_mt1>;

// zext(cmpeq(x, splat(0))) -> cnot(x)		// zext(cmpeq(x, splat(0))) -> cnot(x)
def : Pat<(nxv16i8 (zext (nxv16i1 (AArch64setcc_z (nxv16i1 (SVEAllActive):$Pg), nxv16i8:$Op2, (SVEDup0), SETEQ)))),		def : Pat<(nxv16i8 (zext (nxv16i1 (AArch64setcc_z (nxv16i1 (SVEAllActive):$Pg), nxv16i8:$Op2, (SVEDup0), SETEQ)))),
(CNOT_ZPmZ_B $Op2, $Pg, $Op2)>;		(CNOT_ZPmZ_B $Op2, $Pg, $Op2)>;
def : Pat<(nxv8i16 (zext (nxv8i1 (AArch64setcc_z (nxv8i1 (SVEAllActive):$Pg), nxv8i16:$Op2, (SVEDup0), SETEQ)))),		def : Pat<(nxv8i16 (zext (nxv8i1 (AArch64setcc_z (nxv8i1 (SVEAllActive):$Pg), nxv8i16:$Op2, (SVEDup0), SETEQ)))),
(CNOT_ZPmZ_H $Op2, $Pg, $Op2)>;		(CNOT_ZPmZ_H $Op2, $Pg, $Op2)>;
def : Pat<(nxv4i32 (zext (nxv4i1 (AArch64setcc_z (nxv4i1 (SVEAllActive):$Pg), nxv4i32:$Op2, (SVEDup0), SETEQ)))),		def : Pat<(nxv4i32 (zext (nxv4i1 (AArch64setcc_z (nxv4i1 (SVEAllActive):$Pg), nxv4i32:$Op2, (SVEDup0), SETEQ)))),
(CNOT_ZPmZ_S $Op2, $Pg, $Op2)>;		(CNOT_ZPmZ_S $Op2, $Pg, $Op2)>;
(55 unchanged lines not shown)	let Predicates = [HasSVEorSME] in {
defm FMAX_ZPmZ : sve_fp_2op_p_zds<0b0110, "fmax", "FMAX_ZPZZ", int_aarch64_sve_fmax, DestructiveBinaryComm>;		defm FMAX_ZPmZ : sve_fp_2op_p_zds<0b0110, "fmax", "FMAX_ZPZZ", int_aarch64_sve_fmax, DestructiveBinaryComm>;
defm FMIN_ZPmZ : sve_fp_2op_p_zds<0b0111, "fmin", "FMIN_ZPZZ", int_aarch64_sve_fmin, DestructiveBinaryComm>;		defm FMIN_ZPmZ : sve_fp_2op_p_zds<0b0111, "fmin", "FMIN_ZPZZ", int_aarch64_sve_fmin, DestructiveBinaryComm>;
defm FABD_ZPmZ : sve_fp_2op_p_zds<0b1000, "fabd", "FABD_ZPZZ", int_aarch64_sve_fabd, DestructiveBinaryComm>;		defm FABD_ZPmZ : sve_fp_2op_p_zds<0b1000, "fabd", "FABD_ZPZZ", int_aarch64_sve_fabd, DestructiveBinaryComm>;
defm FSCALE_ZPmZ : sve_fp_2op_p_zds_fscale<0b1001, "fscale", int_aarch64_sve_fscale>;		defm FSCALE_ZPmZ : sve_fp_2op_p_zds_fscale<0b1001, "fscale", int_aarch64_sve_fscale>;
defm FMULX_ZPmZ : sve_fp_2op_p_zds<0b1010, "fmulx", "FMULX_ZPZZ", int_aarch64_sve_fmulx, DestructiveBinaryComm>;		defm FMULX_ZPmZ : sve_fp_2op_p_zds<0b1010, "fmulx", "FMULX_ZPZZ", int_aarch64_sve_fmulx, DestructiveBinaryComm>;
defm FDIVR_ZPmZ : sve_fp_2op_p_zds<0b1100, "fdivr", "FDIVR_ZPZZ", int_aarch64_sve_fdivr, DestructiveBinaryCommWithRev, "FDIV_ZPmZ", /isReverseInstr/ 1>;		defm FDIVR_ZPmZ : sve_fp_2op_p_zds<0b1100, "fdivr", "FDIVR_ZPZZ", int_aarch64_sve_fdivr, DestructiveBinaryCommWithRev, "FDIV_ZPmZ", /isReverseInstr/ 1>;
defm FDIV_ZPmZ : sve_fp_2op_p_zds<0b1101, "fdiv", "FDIV_ZPZZ", int_aarch64_sve_fdiv, DestructiveBinaryCommWithRev, "FDIVR_ZPmZ">;		defm FDIV_ZPmZ : sve_fp_2op_p_zds<0b1101, "fdiv", "FDIV_ZPZZ", int_aarch64_sve_fdiv, DestructiveBinaryCommWithRev, "FDIVR_ZPmZ">;

defm FADD_ZPZZ : sve_fp_bin_pred_hfd<AArch64fadd_p>;		defm FADD_ZPZZ : sve_fp_bin_pred_hfd<AArch64fadd_p1>;
defm FSUB_ZPZZ : sve_fp_bin_pred_hfd<AArch64fsub_p>;		defm FSUB_ZPZZ : sve_fp_bin_pred_hfd<AArch64fsub_p1>;
defm FMUL_ZPZZ : sve_fp_bin_pred_hfd<AArch64fmul_p>;		defm FMUL_ZPZZ : sve_fp_bin_pred_hfd<AArch64fmul_p1>;
defm FMAXNM_ZPZZ : sve_fp_bin_pred_hfd<AArch64fmaxnm_p>;		defm FMAXNM_ZPZZ : sve_fp_bin_pred_hfd<AArch64fmaxnm_p1>;
defm FMINNM_ZPZZ : sve_fp_bin_pred_hfd<AArch64fminnm_p>;		defm FMINNM_ZPZZ : sve_fp_bin_pred_hfd<AArch64fminnm_p1>;
defm FMAX_ZPZZ : sve_fp_bin_pred_hfd<AArch64fmax_p>;		defm FMAX_ZPZZ : sve_fp_bin_pred_hfd<AArch64fmax_p1>;
defm FMIN_ZPZZ : sve_fp_bin_pred_hfd<AArch64fmin_p>;		defm FMIN_ZPZZ : sve_fp_bin_pred_hfd<AArch64fmin_p1>;
defm FABD_ZPZZ : sve_fp_bin_pred_hfd<AArch64fabd_p>;		defm FABD_ZPZZ : sve_fp_bin_pred_hfd<AArch64fabd_p1>;
defm FMULX_ZPZZ : sve_fp_bin_pred_hfd<int_aarch64_sve_fmulx_u>;		defm FMULX_ZPZZ : sve_fp_bin_pred_hfd<int_aarch64_sve_fmulx_u>;
defm FDIV_ZPZZ : sve_fp_bin_pred_hfd<AArch64fdiv_p>;		defm FDIV_ZPZZ : sve_fp_bin_pred_hfd<AArch64fdiv_p1>;
		paulwalker-arm (inline comment): These pseudo instructions intentionally have no requirement for the results of the inactive lanes and thus do not represent the behaviour you expect from your `_p1` nodes. If you follow the advice above, you won't need to change this block of code.
} // End HasSVEorSME		} // End HasSVEorSME

let Predicates = [HasSVEorSME, UseExperimentalZeroingPseudos] in {		let Predicates = [HasSVEorSME, UseExperimentalZeroingPseudos] in {
defm FADD_ZPZZ : sve_fp_2op_p_zds_zeroing_hsd<int_aarch64_sve_fadd>;		defm FADD_ZPZZ : sve_fp_2op_p_zds_zeroing_hsd<int_aarch64_sve_fadd>;
defm FSUB_ZPZZ : sve_fp_2op_p_zds_zeroing_hsd<int_aarch64_sve_fsub>;		defm FSUB_ZPZZ : sve_fp_2op_p_zds_zeroing_hsd<int_aarch64_sve_fsub>;
defm FMUL_ZPZZ : sve_fp_2op_p_zds_zeroing_hsd<int_aarch64_sve_fmul>;		defm FMUL_ZPZZ : sve_fp_2op_p_zds_zeroing_hsd<int_aarch64_sve_fmul>;
defm FSUBR_ZPZZ : sve_fp_2op_p_zds_zeroing_hsd<int_aarch64_sve_fsubr>;		defm FSUBR_ZPZZ : sve_fp_2op_p_zds_zeroing_hsd<int_aarch64_sve_fsubr>;
defm FMAXNM_ZPZZ : sve_fp_2op_p_zds_zeroing_hsd<int_aarch64_sve_fmaxnm>;		defm FMAXNM_ZPZZ : sve_fp_2op_p_zds_zeroing_hsd<int_aarch64_sve_fmaxnm>;
(3,283 unchanged lines not shown)

llvm/test/CodeGen/AArch64/sve-sel-instruction-undef-predicate.ll

This file was added.

				; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 2
				; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sve < %s | FileCheck %s
				paulwalker-arm (inline comment): Please remove the `-mtriple` parameter because the triple is already set within the IR.

				target triple = "aarch64-unknown-linux-gnu"

				define <vscale x 2 x double> @fadd_f64(<vscale x 2 x i1> %pg, <vscale x 2 x double> %a, <vscale x 2 x double> %b) {
				; CHECK-LABEL: fadd_f64:
				; CHECK: // %bb.0: // %entry
				; CHECK-NEXT: fadd z0.d, p0/m, z0.d, z1.d
				; CHECK-NEXT: ret
				entry:
				paulwalker-arm (inline comment): Please can you remove all the `entry:`'s because the tests shouldn't need them.
				%r = tail call <vscale x 2 x double> @llvm.aarch64.sve.fadd.u.nxv2f64(<vscale x 2 x i1> %pg, <vscale x 2 x double> %a, <vscale x 2 x double> %b)
				%sel = select <vscale x 2 x i1> %pg, <vscale x 2 x double> %r, <vscale x 2 x double> %a
				ret <vscale x 2 x double> %sel
				}

				define <vscale x 4 x float> @fadd_f32(<vscale x 4 x i1> %pg, <vscale x 4 x float> %a, <vscale x 4 x float> %b) {
				; CHECK-LABEL: fadd_f32:
				; CHECK: // %bb.0: // %entry
				; CHECK-NEXT: fadd z0.s, p0/m, z0.s, z1.s
				; CHECK-NEXT: ret
				entry:
				%r = tail call <vscale x 4 x float> @llvm.aarch64.sve.fadd.u.nxv4f32(<vscale x 4 x i1> %pg, <vscale x 4 x float> %a, <vscale x 4 x float> %b)
				%sel = select <vscale x 4 x i1> %pg, <vscale x 4 x float> %r, <vscale x 4 x float> %a
				ret <vscale x 4 x float> %sel
				}

				define <vscale x 8 x half> @fadd_f16(<vscale x 8 x i1> %pg, <vscale x 8 x half> %a, <vscale x 8 x half> %b) {
				; CHECK-LABEL: fadd_f16:
				; CHECK: // %bb.0: // %entry
				; CHECK-NEXT: fadd z0.h, p0/m, z0.h, z1.h
				; CHECK-NEXT: ret
				entry:
				%r = tail call <vscale x 8 x half> @llvm.aarch64.sve.fadd.u.nxv8f16(<vscale x 8 x i1> %pg, <vscale x 8 x half> %a, <vscale x 8 x half> %b)
				%sel = select <vscale x 8 x i1> %pg, <vscale x 8 x half> %r, <vscale x 8 x half> %a
				ret <vscale x 8 x half> %sel
				}

				define <vscale x 2 x double> @fadd_rev_f64(<vscale x 2 x i1> %pg, <vscale x 2 x double> %a, <vscale x 2 x double> %b) {
				; CHECK-LABEL: fadd_rev_f64:
				; CHECK: // %bb.0: // %entry
				; CHECK-NEXT: fadd z0.d, p0/m, z0.d, z1.d
				; CHECK-NEXT: ret
				paulwalker-arm (inline comment): This output is identical to `fadd_f64` which is incorrect because this test wants the results for inactive lanes to come from %b (i.e. z1). I'd expect the output to be `fadd z1.d, p0/m, z1.d, z0.d`?
				entry:
				%r = tail call <vscale x 2 x double> @llvm.aarch64.sve.fadd.u.nxv2f64(<vscale x 2 x i1> %pg, <vscale x 2 x double> %a, <vscale x 2 x double> %b)
				%sel = select <vscale x 2 x i1> %pg, <vscale x 2 x double> %r, <vscale x 2 x double> %b
				ret <vscale x 2 x double> %sel
				}

				define <vscale x 4 x float> @fadd_rev_f32(<vscale x 4 x i1> %pg, <vscale x 4 x float> %a, <vscale x 4 x float> %b) {
				; CHECK-LABEL: fadd_rev_f32:
				; CHECK: // %bb.0: // %entry
				; CHECK-NEXT: fadd z0.s, p0/m, z0.s, z1.s
				; CHECK-NEXT: ret
				entry:
				%r = tail call <vscale x 4 x float> @llvm.aarch64.sve.fadd.u.nxv4f32(<vscale x 4 x i1> %pg, <vscale x 4 x float> %a, <vscale x 4 x float> %b)
				%sel = select <vscale x 4 x i1> %pg, <vscale x 4 x float> %r, <vscale x 4 x float> %b
				ret <vscale x 4 x float> %sel
				}

				define <vscale x 8 x half> @fadd_rev_f16(<vscale x 8 x i1> %pg, <vscale x 8 x half> %a, <vscale x 8 x half> %b) {
				; CHECK-LABEL: fadd_rev_f16:
				; CHECK: // %bb.0: // %entry
				; CHECK-NEXT: fadd z0.h, p0/m, z0.h, z1.h
				; CHECK-NEXT: ret
				entry:
				%r = tail call <vscale x 8 x half> @llvm.aarch64.sve.fadd.u.nxv8f16(<vscale x 8 x i1> %pg, <vscale x 8 x half> %a, <vscale x 8 x half> %b)
				%sel = select <vscale x 8 x i1> %pg, <vscale x 8 x half> %r, <vscale x 8 x half> %b
				ret <vscale x 8 x half> %sel
				}

				define <vscale x 2 x double> @fsub_f64(<vscale x 2 x i1> %pg, <vscale x 2 x double> %a, <vscale x 2 x double> %b) {
				; CHECK-LABEL: fsub_f64:
				; CHECK: // %bb.0: // %entry
				; CHECK-NEXT: fsub z0.d, p0/m, z0.d, z1.d
				; CHECK-NEXT: ret
				entry:
				%r = tail call <vscale x 2 x double> @llvm.aarch64.sve.fsub.u.nxv2f64(<vscale x 2 x i1> %pg, <vscale x 2 x double> %a, <vscale x 2 x double> %b)
				%sel = select <vscale x 2 x i1> %pg, <vscale x 2 x double> %r, <vscale x 2 x double> %a
				ret <vscale x 2 x double> %sel
				}

				define <vscale x 4 x float> @fsub_f32(<vscale x 4 x i1> %pg, <vscale x 4 x float> %a, <vscale x 4 x float> %b) {
				; CHECK-LABEL: fsub_f32:
				; CHECK: // %bb.0: // %entry
				; CHECK-NEXT: fsub z0.s, p0/m, z0.s, z1.s
				; CHECK-NEXT: ret
				entry:
				%r = tail call <vscale x 4 x float> @llvm.aarch64.sve.fsub.u.nxv4f32(<vscale x 4 x i1> %pg, <vscale x 4 x float> %a, <vscale x 4 x float> %b)
				%sel = select <vscale x 4 x i1> %pg, <vscale x 4 x float> %r, <vscale x 4 x float> %a
				ret <vscale x 4 x float> %sel
				}

				define <vscale x 8 x half> @fsub_f16(<vscale x 8 x i1> %pg, <vscale x 8 x half> %a, <vscale x 8 x half> %b) {
				; CHECK-LABEL: fsub_f16:
				; CHECK: // %bb.0: // %entry
				; CHECK-NEXT: fsub z0.h, p0/m, z0.h, z1.h
				; CHECK-NEXT: ret
				entry:
				%r = tail call <vscale x 8 x half> @llvm.aarch64.sve.fsub.u.nxv8f16(<vscale x 8 x i1> %pg, <vscale x 8 x half> %a, <vscale x 8 x half> %b)
				%sel = select <vscale x 8 x i1> %pg, <vscale x 8 x half> %r, <vscale x 8 x half> %a
				ret <vscale x 8 x half> %sel
				}

				define <vscale x 2 x double> @fsub_rev_f64(<vscale x 2 x i1> %pg, <vscale x 2 x double> %a, <vscale x 2 x double> %b) {
				; CHECK-LABEL: fsub_rev_f64:
				; CHECK: // %bb.0: // %entry
				; CHECK-NEXT: fsub z0.d, p0/m, z0.d, z1.d
				; CHECK-NEXT: ret
				entry:
				%r = tail call <vscale x 2 x double> @llvm.aarch64.sve.fsub.u.nxv2f64(<vscale x 2 x i1> %pg, <vscale x 2 x double> %a, <vscale x 2 x double> %b)
				%sel = select <vscale x 2 x i1> %pg, <vscale x 2 x double> %r, <vscale x 2 x double> %b
				ret <vscale x 2 x double> %sel
				}

				define <vscale x 4 x float> @fsub_rev_f32(<vscale x 4 x i1> %pg, <vscale x 4 x float> %a, <vscale x 4 x float> %b) {
				; CHECK-LABEL: fsub_rev_f32:
				; CHECK: // %bb.0: // %entry
				; CHECK-NEXT: fsub z0.s, p0/m, z0.s, z1.s
				; CHECK-NEXT: ret
				entry:
				%r = tail call <vscale x 4 x float> @llvm.aarch64.sve.fsub.u.nxv4f32(<vscale x 4 x i1> %pg, <vscale x 4 x float> %a, <vscale x 4 x float> %b)
				%sel = select <vscale x 4 x i1> %pg, <vscale x 4 x float> %r, <vscale x 4 x float> %b
				ret <vscale x 4 x float> %sel
				}

				define <vscale x 8 x half> @fsub_rev_f16(<vscale x 8 x i1> %pg, <vscale x 8 x half> %a, <vscale x 8 x half> %b) {
				; CHECK-LABEL: fsub_rev_f16:
				; CHECK: // %bb.0: // %entry
				; CHECK-NEXT: fsub z0.h, p0/m, z0.h, z1.h
				; CHECK-NEXT: ret
				entry:
				%r = tail call <vscale x 8 x half> @llvm.aarch64.sve.fsub.u.nxv8f16(<vscale x 8 x i1> %pg, <vscale x 8 x half> %a, <vscale x 8 x half> %b)
				%sel = select <vscale x 8 x i1> %pg, <vscale x 8 x half> %r, <vscale x 8 x half> %b
				ret <vscale x 8 x half> %sel
				}

				define <vscale x 2 x double> @fmul_f64(<vscale x 2 x i1> %pg, <vscale x 2 x double> %a, <vscale x 2 x double> %b) {
				; CHECK-LABEL: fmul_f64:
				; CHECK: // %bb.0: // %entry
				; CHECK-NEXT: fmul z0.d, p0/m, z0.d, z1.d
				; CHECK-NEXT: ret
				entry:
				%r = tail call <vscale x 2 x double> @llvm.aarch64.sve.fmul.u.nxv2f64(<vscale x 2 x i1> %pg, <vscale x 2 x double> %a, <vscale x 2 x double> %b)
				%sel = select <vscale x 2 x i1> %pg, <vscale x 2 x double> %r, <vscale x 2 x double> %a
				ret <vscale x 2 x double> %sel
				}

				define <vscale x 4 x float> @fmul_f32(<vscale x 4 x i1> %pg, <vscale x 4 x float> %a, <vscale x 4 x float> %b) {
				; CHECK-LABEL: fmul_f32:
				; CHECK: // %bb.0: // %entry
				; CHECK-NEXT: fmul z0.s, p0/m, z0.s, z1.s
				; CHECK-NEXT: ret
				entry:
				%r = tail call <vscale x 4 x float> @llvm.aarch64.sve.fmul.u.nxv4f32(<vscale x 4 x i1> %pg, <vscale x 4 x float> %a, <vscale x 4 x float> %b)
				%sel = select <vscale x 4 x i1> %pg, <vscale x 4 x float> %r, <vscale x 4 x float> %a
				ret <vscale x 4 x float> %sel
				}

				define <vscale x 8 x half> @fmul_f16(<vscale x 8 x i1> %pg, <vscale x 8 x half> %a, <vscale x 8 x half> %b) {
				; CHECK-LABEL: fmul_f16:
				; CHECK: // %bb.0: // %entry
				; CHECK-NEXT: fmul z0.h, p0/m, z0.h, z1.h
				; CHECK-NEXT: ret
				entry:
				%r = tail call <vscale x 8 x half> @llvm.aarch64.sve.fmul.u.nxv8f16(<vscale x 8 x i1> %pg, <vscale x 8 x half> %a, <vscale x 8 x half> %b)
				%sel = select <vscale x 8 x i1> %pg, <vscale x 8 x half> %r, <vscale x 8 x half> %a
				ret <vscale x 8 x half> %sel
				}

				define <vscale x 2 x double> @fmaxnm_f64(<vscale x 2 x i1> %pg, <vscale x 2 x double> %a, <vscale x 2 x double> %b) {
				; CHECK-LABEL: fmaxnm_f64:
				; CHECK: // %bb.0: // %entry
				; CHECK-NEXT: fmaxnm z0.d, p0/m, z0.d, z1.d
				; CHECK-NEXT: ret
				entry:
				%r = tail call <vscale x 2 x double> @llvm.aarch64.sve.fmaxnm.u.nxv2f64(<vscale x 2 x i1> %pg, <vscale x 2 x double> %a, <vscale x 2 x double> %b)
				%sel = select <vscale x 2 x i1> %pg, <vscale x 2 x double> %r, <vscale x 2 x double> %a
				ret <vscale x 2 x double> %sel
				}

				define <vscale x 4 x float> @fmaxnm_f32(<vscale x 4 x i1> %pg, <vscale x 4 x float> %a, <vscale x 4 x float> %b) {
				; CHECK-LABEL: fmaxnm_f32:
				; CHECK: // %bb.0: // %entry
				; CHECK-NEXT: fmaxnm z0.s, p0/m, z0.s, z1.s
				; CHECK-NEXT: ret
				entry:
				%r = tail call <vscale x 4 x float> @llvm.aarch64.sve.fmaxnm.u.nxv4f32(<vscale x 4 x i1> %pg, <vscale x 4 x float> %a, <vscale x 4 x float> %b)
				%sel = select <vscale x 4 x i1> %pg, <vscale x 4 x float> %r, <vscale x 4 x float> %a
				ret <vscale x 4 x float> %sel
				}

				define <vscale x 8 x half> @fmaxnm_f16(<vscale x 8 x i1> %pg, <vscale x 8 x half> %a, <vscale x 8 x half> %b) {
				; CHECK-LABEL: fmaxnm_f16:
				; CHECK: // %bb.0: // %entry
				; CHECK-NEXT: fmaxnm z0.h, p0/m, z0.h, z1.h
				; CHECK-NEXT: ret
				entry:
				%r = tail call <vscale x 8 x half> @llvm.aarch64.sve.fmaxnm.u.nxv8f16(<vscale x 8 x i1> %pg, <vscale x 8 x half> %a, <vscale x 8 x half> %b)
				%sel = select <vscale x 8 x i1> %pg, <vscale x 8 x half> %r, <vscale x 8 x half> %a
				ret <vscale x 8 x half> %sel
				}

				define <vscale x 2 x double> @fminnm_f64(<vscale x 2 x i1> %pg, <vscale x 2 x double> %a, <vscale x 2 x double> %b) {
				; CHECK-LABEL: fminnm_f64:
				; CHECK: // %bb.0: // %entry
				; CHECK-NEXT: fminnm z0.d, p0/m, z0.d, z1.d
				; CHECK-NEXT: ret
				entry:
				%r = tail call <vscale x 2 x double> @llvm.aarch64.sve.fminnm.u.nxv2f64(<vscale x 2 x i1> %pg, <vscale x 2 x double> %a, <vscale x 2 x double> %b)
				%sel = select <vscale x 2 x i1> %pg, <vscale x 2 x double> %r, <vscale x 2 x double> %a
				ret <vscale x 2 x double> %sel
				}

				define <vscale x 4 x float> @fminnm_f32(<vscale x 4 x i1> %pg, <vscale x 4 x float> %a, <vscale x 4 x float> %b) {
				; CHECK-LABEL: fminnm_f32:
				; CHECK: // %bb.0: // %entry
				; CHECK-NEXT: fminnm z0.s, p0/m, z0.s, z1.s
				; CHECK-NEXT: ret
				entry:
				%r = tail call <vscale x 4 x float> @llvm.aarch64.sve.fminnm.u.nxv4f32(<vscale x 4 x i1> %pg, <vscale x 4 x float> %a, <vscale x 4 x float> %b)
				%sel = select <vscale x 4 x i1> %pg, <vscale x 4 x float> %r, <vscale x 4 x float> %a
				ret <vscale x 4 x float> %sel
				}

				define <vscale x 8 x half> @fminnm_f16(<vscale x 8 x i1> %pg, <vscale x 8 x half> %a, <vscale x 8 x half> %b) {
				; CHECK-LABEL: fminnm_f16:
				; CHECK: // %bb.0: // %entry
				; CHECK-NEXT: fminnm z0.h, p0/m, z0.h, z1.h
				; CHECK-NEXT: ret
				entry:
				%r = tail call <vscale x 8 x half> @llvm.aarch64.sve.fminnm.u.nxv8f16(<vscale x 8 x i1> %pg, <vscale x 8 x half> %a, <vscale x 8 x half> %b)
				%sel = select <vscale x 8 x i1> %pg, <vscale x 8 x half> %r, <vscale x 8 x half> %a
				ret <vscale x 8 x half> %sel
				}

				define <vscale x 2 x double> @fmax_f64(<vscale x 2 x i1> %pg, <vscale x 2 x double> %a, <vscale x 2 x double> %b) {
				; CHECK-LABEL: fmax_f64:
				; CHECK: // %bb.0: // %entry
				; CHECK-NEXT: fmax z0.d, p0/m, z0.d, z1.d
				; CHECK-NEXT: ret
				entry:
				%r = tail call <vscale x 2 x double> @llvm.aarch64.sve.fmax.u.nxv2f64(<vscale x 2 x i1> %pg, <vscale x 2 x double> %a, <vscale x 2 x double> %b)
				%sel = select <vscale x 2 x i1> %pg, <vscale x 2 x double> %r, <vscale x 2 x double> %a
				ret <vscale x 2 x double> %sel
				}

				define <vscale x 4 x float> @fmax_f32(<vscale x 4 x i1> %pg, <vscale x 4 x float> %a, <vscale x 4 x float> %b) {
				; CHECK-LABEL: fmax_f32:
				; CHECK: // %bb.0: // %entry
				; CHECK-NEXT: fmax z0.s, p0/m, z0.s, z1.s
				; CHECK-NEXT: ret
				entry:
				%r = tail call <vscale x 4 x float> @llvm.aarch64.sve.fmax.u.nxv4f32(<vscale x 4 x i1> %pg, <vscale x 4 x float> %a, <vscale x 4 x float> %b)
				%sel = select <vscale x 4 x i1> %pg, <vscale x 4 x float> %r, <vscale x 4 x float> %a
				ret <vscale x 4 x float> %sel
				}

				define <vscale x 8 x half> @fmax_f16(<vscale x 8 x i1> %pg, <vscale x 8 x half> %a, <vscale x 8 x half> %b) {
				; CHECK-LABEL: fmax_f16:
				; CHECK: // %bb.0: // %entry
				; CHECK-NEXT: fmax z0.h, p0/m, z0.h, z1.h
				; CHECK-NEXT: ret
				entry:
				%r = tail call <vscale x 8 x half> @llvm.aarch64.sve.fmax.u.nxv8f16(<vscale x 8 x i1> %pg, <vscale x 8 x half> %a, <vscale x 8 x half> %b)
				%sel = select <vscale x 8 x i1> %pg, <vscale x 8 x half> %r, <vscale x 8 x half> %a
				ret <vscale x 8 x half> %sel
				}

				define <vscale x 2 x double> @fmin_f64(<vscale x 2 x i1> %pg, <vscale x 2 x double> %a, <vscale x 2 x double> %b) {
				; CHECK-LABEL: fmin_f64:
				; CHECK: // %bb.0: // %entry
				; CHECK-NEXT: fmin z0.d, p0/m, z0.d, z1.d
				; CHECK-NEXT: ret
				entry:
				%r = tail call <vscale x 2 x double> @llvm.aarch64.sve.fmin.u.nxv2f64(<vscale x 2 x i1> %pg, <vscale x 2 x double> %a, <vscale x 2 x double> %b)
				%sel = select <vscale x 2 x i1> %pg, <vscale x 2 x double> %r, <vscale x 2 x double> %a
				ret <vscale x 2 x double> %sel
				}

				define <vscale x 4 x float> @fmin_f32(<vscale x 4 x i1> %pg, <vscale x 4 x float> %a, <vscale x 4 x float> %b) {
				; CHECK-LABEL: fmin_f32:
				; CHECK: // %bb.0: // %entry
				; CHECK-NEXT: fmin z0.s, p0/m, z0.s, z1.s
				; CHECK-NEXT: ret
				entry:
				%r = tail call <vscale x 4 x float> @llvm.aarch64.sve.fmin.u.nxv4f32(<vscale x 4 x i1> %pg, <vscale x 4 x float> %a, <vscale x 4 x float> %b)
				%sel = select <vscale x 4 x i1> %pg, <vscale x 4 x float> %r, <vscale x 4 x float> %a
				ret <vscale x 4 x float> %sel
				}

				define <vscale x 8 x half> @fmin_f16(<vscale x 8 x i1> %pg, <vscale x 8 x half> %a, <vscale x 8 x half> %b) {
				; CHECK-LABEL: fmin_f16:
				; CHECK: // %bb.0: // %entry
				; CHECK-NEXT: fmin z0.h, p0/m, z0.h, z1.h
				; CHECK-NEXT: ret
				entry:
				%r = tail call <vscale x 8 x half> @llvm.aarch64.sve.fmin.u.nxv8f16(<vscale x 8 x i1> %pg, <vscale x 8 x half> %a, <vscale x 8 x half> %b)
				%sel = select <vscale x 8 x i1> %pg, <vscale x 8 x half> %r, <vscale x 8 x half> %a
				ret <vscale x 8 x half> %sel
				}

				define <vscale x 2 x double> @fabd_f64(<vscale x 2 x i1> %pg, <vscale x 2 x double> %a, <vscale x 2 x double> %b) {
				; CHECK-LABEL: fabd_f64:
				; CHECK: // %bb.0: // %entry
				; CHECK-NEXT: fabd z0.d, p0/m, z0.d, z1.d
				; CHECK-NEXT: ret
				entry:
				%r = tail call <vscale x 2 x double> @llvm.aarch64.sve.fabd.u.nxv2f64(<vscale x 2 x i1> %pg, <vscale x 2 x double> %a, <vscale x 2 x double> %b)
				%sel = select <vscale x 2 x i1> %pg, <vscale x 2 x double> %r, <vscale x 2 x double> %a
				ret <vscale x 2 x double> %sel
				}

				define <vscale x 4 x float> @fabd_f32(<vscale x 4 x i1> %pg, <vscale x 4 x float> %a, <vscale x 4 x float> %b) {
				; CHECK-LABEL: fabd_f32:
				; CHECK: // %bb.0: // %entry
				; CHECK-NEXT: fabd z0.s, p0/m, z0.s, z1.s
				; CHECK-NEXT: ret
				entry:
				%r = tail call <vscale x 4 x float> @llvm.aarch64.sve.fabd.u.nxv4f32(<vscale x 4 x i1> %pg, <vscale x 4 x float> %a, <vscale x 4 x float> %b)
				%sel = select <vscale x 4 x i1> %pg, <vscale x 4 x float> %r, <vscale x 4 x float> %a
				ret <vscale x 4 x float> %sel
				}

				define <vscale x 8 x half> @fabd_f16(<vscale x 8 x i1> %pg, <vscale x 8 x half> %a, <vscale x 8 x half> %b) {
				; CHECK-LABEL: fabd_f16:
				; CHECK: // %bb.0: // %entry
				; CHECK-NEXT: fabd z0.h, p0/m, z0.h, z1.h
				; CHECK-NEXT: ret
				entry:
				%r = tail call <vscale x 8 x half> @llvm.aarch64.sve.fabd.u.nxv8f16(<vscale x 8 x i1> %pg, <vscale x 8 x half> %a, <vscale x 8 x half> %b)
				%sel = select <vscale x 8 x i1> %pg, <vscale x 8 x half> %r, <vscale x 8 x half> %a
				ret <vscale x 8 x half> %sel
				}

				define <vscale x 2 x double> @fdiv_f64(<vscale x 2 x i1> %pg, <vscale x 2 x double> %a, <vscale x 2 x double> %b) {
				; CHECK-LABEL: fdiv_f64:
				; CHECK: // %bb.0: // %entry
				; CHECK-NEXT: fdiv z0.d, p0/m, z0.d, z1.d
				; CHECK-NEXT: ret
				entry:
				%r = tail call <vscale x 2 x double> @llvm.aarch64.sve.fdiv.u.nxv2f64(<vscale x 2 x i1> %pg, <vscale x 2 x double> %a, <vscale x 2 x double> %b)
				%sel = select <vscale x 2 x i1> %pg, <vscale x 2 x double> %r, <vscale x 2 x double> %a
				ret <vscale x 2 x double> %sel
				}

				define <vscale x 4 x float> @fdiv_f32(<vscale x 4 x i1> %pg, <vscale x 4 x float> %a, <vscale x 4 x float> %b) {
				; CHECK-LABEL: fdiv_f32:
				; CHECK: // %bb.0: // %entry
				; CHECK-NEXT: fdiv z0.s, p0/m, z0.s, z1.s
				; CHECK-NEXT: ret
				entry:
				%r = tail call <vscale x 4 x float> @llvm.aarch64.sve.fdiv.u.nxv4f32(<vscale x 4 x i1> %pg, <vscale x 4 x float> %a, <vscale x 4 x float> %b)
				%sel = select <vscale x 4 x i1> %pg, <vscale x 4 x float> %r, <vscale x 4 x float> %a
				ret <vscale x 4 x float> %sel
				}

				define <vscale x 8 x half> @fdiv_f16(<vscale x 8 x i1> %pg, <vscale x 8 x half> %a, <vscale x 8 x half> %b) {
				; CHECK-LABEL: fdiv_f16:
				; CHECK: // %bb.0: // %entry
				; CHECK-NEXT: fdiv z0.h, p0/m, z0.h, z1.h
				; CHECK-NEXT: ret
				entry:
				%r = tail call <vscale x 8 x half> @llvm.aarch64.sve.fdiv.u.nxv8f16(<vscale x 8 x i1> %pg, <vscale x 8 x half> %a, <vscale x 8 x half> %b)
				%sel = select <vscale x 8 x i1> %pg, <vscale x 8 x half> %r, <vscale x 8 x half> %a
				ret <vscale x 8 x half> %sel
				}

				define <vscale x 2 x double> @fdiv_rev_f64(<vscale x 2 x i1> %pg, <vscale x 2 x double> %a, <vscale x 2 x double> %b) {
				; CHECK-LABEL: fdiv_rev_f64:
				; CHECK: // %bb.0: // %entry
				; CHECK-NEXT: fdiv z0.d, p0/m, z0.d, z1.d
				; CHECK-NEXT: ret
				entry:
				%r = tail call <vscale x 2 x double> @llvm.aarch64.sve.fdiv.u.nxv2f64(<vscale x 2 x i1> %pg, <vscale x 2 x double> %a, <vscale x 2 x double> %b)
				%sel = select <vscale x 2 x i1> %pg, <vscale x 2 x double> %r, <vscale x 2 x double> %b
				ret <vscale x 2 x double> %sel
				}

				define <vscale x 4 x float> @fdiv_rev_f32(<vscale x 4 x i1> %pg, <vscale x 4 x float> %a, <vscale x 4 x float> %b) {
				; CHECK-LABEL: fdiv_rev_f32:
				; CHECK: // %bb.0: // %entry
				; CHECK-NEXT: fdiv z0.s, p0/m, z0.s, z1.s
				; CHECK-NEXT: ret
				entry:
				%r = tail call <vscale x 4 x float> @llvm.aarch64.sve.fdiv.u.nxv4f32(<vscale x 4 x i1> %pg, <vscale x 4 x float> %a, <vscale x 4 x float> %b)
				%sel = select <vscale x 4 x i1> %pg, <vscale x 4 x float> %r, <vscale x 4 x float> %b
				ret <vscale x 4 x float> %sel
				}

				define <vscale x 8 x half> @fdiv_rev_f16(<vscale x 8 x i1> %pg, <vscale x 8 x half> %a, <vscale x 8 x half> %b) {
				; CHECK-LABEL: fdiv_rev_f16:
				; CHECK: // %bb.0: // %entry
				; CHECK-NEXT: fdiv z0.h, p0/m, z0.h, z1.h
				; CHECK-NEXT: ret
				entry:
				%r = tail call <vscale x 8 x half> @llvm.aarch64.sve.fdiv.u.nxv8f16(<vscale x 8 x i1> %pg, <vscale x 8 x half> %a, <vscale x 8 x half> %b)
				%sel = select <vscale x 8 x i1> %pg, <vscale x 8 x half> %r, <vscale x 8 x half> %b
				ret <vscale x 8 x half> %sel
				}

				define <vscale x 2 x double> @fneg_f64(<vscale x 2 x i1> %pg, <vscale x 2 x double> %a, <vscale x 2 x double> %b) {
				; CHECK-LABEL: fneg_f64:
				; CHECK: // %bb.0: // %entry
				; CHECK-NEXT: fneg z0.d, p0/m, z1.d
				; CHECK-NEXT: ret
				entry:
				%r = tail call <vscale x 2 x double> @llvm.aarch64.sve.fneg.u.nxv2f64(<vscale x 2 x double> %a, <vscale x 2 x i1> %pg, <vscale x 2 x double> %b)
				%sel = select <vscale x 2 x i1> %pg, <vscale x 2 x double> %r, <vscale x 2 x double> %b
				ret <vscale x 2 x double> %sel
				}

				define <vscale x 4 x float> @fneg_f32(<vscale x 4 x i1> %pg, <vscale x 4 x float> %a, <vscale x 4 x float> %b) {
				; CHECK-LABEL: fneg_f32:
				; CHECK: // %bb.0: // %entry
				; CHECK-NEXT: fneg z0.s, p0/m, z1.s
				; CHECK-NEXT: ret
				entry:
				%r = tail call <vscale x 4 x float> @llvm.aarch64.sve.fneg.u.nxv4f32(<vscale x 4 x float> %a, <vscale x 4 x i1> %pg, <vscale x 4 x float> %b)
				%sel = select <vscale x 4 x i1> %pg, <vscale x 4 x float> %r, <vscale x 4 x float> %b
				ret <vscale x 4 x float> %sel
				}

				define <vscale x 8 x half> @fneg_f16(<vscale x 8 x i1> %pg, <vscale x 8 x half> %a, <vscale x 8 x half> %b) {
				; CHECK-LABEL: fneg_f16:
				; CHECK: // %bb.0: // %entry
				; CHECK-NEXT: fneg z0.h, p0/m, z1.h
				; CHECK-NEXT: ret
				entry:
				%r = tail call <vscale x 8 x half> @llvm.aarch64.sve.fneg.u.nxv8f16(<vscale x 8 x half> %a, <vscale x 8 x i1> %pg, <vscale x 8 x half> %b)
				%sel = select <vscale x 8 x i1> %pg, <vscale x 8 x half> %r, <vscale x 8 x half> %b
				ret <vscale x 8 x half> %sel
				}

				define <vscale x 2 x double> @fabs_f64(<vscale x 2 x i1> %pg, <vscale x 2 x double> %a, <vscale x 2 x double> %b) {
				; CHECK-LABEL: fabs_f64:
				; CHECK: // %bb.0: // %entry
				; CHECK-NEXT: fabs z0.d, p0/m, z1.d
				; CHECK-NEXT: ret
				entry:
				%r = tail call <vscale x 2 x double> @llvm.aarch64.sve.fabs.u.nxv2f64(<vscale x 2 x double> %a, <vscale x 2 x i1> %pg, <vscale x 2 x double> %b)
				%sel = select <vscale x 2 x i1> %pg, <vscale x 2 x double> %r, <vscale x 2 x double> %b
				ret <vscale x 2 x double> %sel
				}

				define <vscale x 4 x float> @fabs_f32(<vscale x 4 x i1> %pg, <vscale x 4 x float> %a, <vscale x 4 x float> %b) {
				; CHECK-LABEL: fabs_f32:
				; CHECK: // %bb.0: // %entry
				; CHECK-NEXT: fabs z0.s, p0/m, z1.s
				; CHECK-NEXT: ret
				entry:
				%r = tail call <vscale x 4 x float> @llvm.aarch64.sve.fabs.u.nxv4f32(<vscale x 4 x float> %a, <vscale x 4 x i1> %pg, <vscale x 4 x float> %b)
				%sel = select <vscale x 4 x i1> %pg, <vscale x 4 x float> %r, <vscale x 4 x float> %b
				ret <vscale x 4 x float> %sel
				}

				define <vscale x 8 x half> @fabs_f16(<vscale x 8 x i1> %pg, <vscale x 8 x half> %a, <vscale x 8 x half> %b) {
				; CHECK-LABEL: fabs_f16:
				; CHECK: // %bb.0: // %entry
				; CHECK-NEXT: fabs z0.h, p0/m, z1.h
				; CHECK-NEXT: ret
				entry:
				%r = tail call <vscale x 8 x half> @llvm.aarch64.sve.fabs.u.nxv8f16(<vscale x 8 x half> %a, <vscale x 8 x i1> %pg, <vscale x 8 x half> %b)
				%sel = select <vscale x 8 x i1> %pg, <vscale x 8 x half> %r, <vscale x 8 x half> %b
				ret <vscale x 8 x half> %sel
				}

				declare <vscale x 2 x double> @llvm.aarch64.sve.fadd.u.nxv2f64(<vscale x 2 x i1>, <vscale x 2 x double>, <vscale x 2 x double>)
				declare <vscale x 4 x float> @llvm.aarch64.sve.fadd.u.nxv4f32(<vscale x 4 x i1>, <vscale x 4 x float>, <vscale x 4 x float>)
				declare <vscale x 8 x half> @llvm.aarch64.sve.fadd.u.nxv8f16(<vscale x 8 x i1>, <vscale x 8 x half>, <vscale x 8 x half>)
				declare <vscale x 2 x double> @llvm.aarch64.sve.fsub.u.nxv2f64(<vscale x 2 x i1>, <vscale x 2 x double>, <vscale x 2 x double>)
				declare <vscale x 4 x float> @llvm.aarch64.sve.fsub.u.nxv4f32(<vscale x 4 x i1>, <vscale x 4 x float>, <vscale x 4 x float>)
				declare <vscale x 8 x half> @llvm.aarch64.sve.fsub.u.nxv8f16(<vscale x 8 x i1>, <vscale x 8 x half>, <vscale x 8 x half>)
				declare <vscale x 2 x double> @llvm.aarch64.sve.fmul.u.nxv2f64(<vscale x 2 x i1>, <vscale x 2 x double>, <vscale x 2 x double>)
				declare <vscale x 4 x float> @llvm.aarch64.sve.fmul.u.nxv4f32(<vscale x 4 x i1>, <vscale x 4 x float>, <vscale x 4 x float>)
				declare <vscale x 8 x half> @llvm.aarch64.sve.fmul.u.nxv8f16(<vscale x 8 x i1>, <vscale x 8 x half>, <vscale x 8 x half>)
				declare <vscale x 2 x double> @llvm.aarch64.sve.fmaxnm.u.nxv2f64(<vscale x 2 x i1>, <vscale x 2 x double>, <vscale x 2 x double>)
				declare <vscale x 4 x float> @llvm.aarch64.sve.fmaxnm.u.nxv4f32(<vscale x 4 x i1>, <vscale x 4 x float>, <vscale x 4 x float>)
				declare <vscale x 8 x half> @llvm.aarch64.sve.fmaxnm.u.nxv8f16(<vscale x 8 x i1>, <vscale x 8 x half>, <vscale x 8 x half>)
				declare <vscale x 2 x double> @llvm.aarch64.sve.fminnm.u.nxv2f64(<vscale x 2 x i1>, <vscale x 2 x double>, <vscale x 2 x double>)
				declare <vscale x 4 x float> @llvm.aarch64.sve.fminnm.u.nxv4f32(<vscale x 4 x i1>, <vscale x 4 x float>, <vscale x 4 x float>)
				declare <vscale x 8 x half> @llvm.aarch64.sve.fminnm.u.nxv8f16(<vscale x 8 x i1>, <vscale x 8 x half>, <vscale x 8 x half>)
				declare <vscale x 2 x double> @llvm.aarch64.sve.fmax.u.nxv2f64(<vscale x 2 x i1>, <vscale x 2 x double>, <vscale x 2 x double>)
				declare <vscale x 4 x float> @llvm.aarch64.sve.fmax.u.nxv4f32(<vscale x 4 x i1>, <vscale x 4 x float>, <vscale x 4 x float>)
				declare <vscale x 8 x half> @llvm.aarch64.sve.fmax.u.nxv8f16(<vscale x 8 x i1>, <vscale x 8 x half>, <vscale x 8 x half>)
				declare <vscale x 2 x double> @llvm.aarch64.sve.fmin.u.nxv2f64(<vscale x 2 x i1>, <vscale x 2 x double>, <vscale x 2 x double>)
				declare <vscale x 4 x float> @llvm.aarch64.sve.fmin.u.nxv4f32(<vscale x 4 x i1>, <vscale x 4 x float>, <vscale x 4 x float>)
				declare <vscale x 8 x half> @llvm.aarch64.sve.fmin.u.nxv8f16(<vscale x 8 x i1>, <vscale x 8 x half>, <vscale x 8 x half>)
				declare <vscale x 2 x double> @llvm.aarch64.sve.fabd.u.nxv2f64(<vscale x 2 x i1>, <vscale x 2 x double>, <vscale x 2 x double>)
				declare <vscale x 4 x float> @llvm.aarch64.sve.fabd.u.nxv4f32(<vscale x 4 x i1>, <vscale x 4 x float>, <vscale x 4 x float>)
				declare <vscale x 8 x half> @llvm.aarch64.sve.fabd.u.nxv8f16(<vscale x 8 x i1>, <vscale x 8 x half>, <vscale x 8 x half>)
				declare <vscale x 2 x double> @llvm.aarch64.sve.fdiv.u.nxv2f64(<vscale x 2 x i1>, <vscale x 2 x double>, <vscale x 2 x double>)
				declare <vscale x 4 x float> @llvm.aarch64.sve.fdiv.u.nxv4f32(<vscale x 4 x i1>, <vscale x 4 x float>, <vscale x 4 x float>)
				declare <vscale x 8 x half> @llvm.aarch64.sve.fdiv.u.nxv8f16(<vscale x 8 x i1>, <vscale x 8 x half>, <vscale x 8 x half>)
				declare <vscale x 2 x double> @llvm.aarch64.sve.fneg.u.nxv2f64(<vscale x 2 x double>, <vscale x 2 x i1>, <vscale x 2 x double>)
				declare <vscale x 4 x float> @llvm.aarch64.sve.fneg.u.nxv4f32(<vscale x 4 x float>, <vscale x 4 x i1>, <vscale x 4 x float>)
				declare <vscale x 8 x half> @llvm.aarch64.sve.fneg.u.nxv8f16(<vscale x 8 x half>, <vscale x 8 x i1>, <vscale x 8 x half>)
				declare <vscale x 2 x double> @llvm.aarch64.sve.fabs.u.nxv2f64(<vscale x 2 x double>, <vscale x 2 x i1>, <vscale x 2 x double>)
				declare <vscale x 4 x float> @llvm.aarch64.sve.fabs.u.nxv4f32(<vscale x 4 x float>, <vscale x 4 x i1>, <vscale x 4 x float>)
				declare <vscale x 8 x half> @llvm.aarch64.sve.fabs.u.nxv8f16(<vscale x 8 x half>, <vscale x 8 x i1>, <vscale x 8 x half>)