This is an archive of the discontinued LLVM Phabricator instance.


Differential D130564

[AArch64][SVE] Add patterns to select masked FP arith
Closed, Public

Authored by c-rhodes on Jul 26 2022, 5:44 AM.


Details

Reviewers

paulwalker-arm
bsmith
peterwaller-arm
efriedma

Commits

rGa6dec9f5b284: [AArch64][SVE] Add patterns to select masked FP arith

Summary

Add patterns to select predicated instructions when lowering:

fadd(a, select(mask, b, splat(0)))
fsub(a, select(mask, b, splat(0)))

The 'fadd' pattern is unsafe unless the no-signed-zeros (nsz) fast-math flag is set, since

-0.0 + 0.0 = 0.0

changes the sign. Alive2: https://alive2.llvm.org/ce/z/wbhJh_

Also adds FMA patterns for:

fadd(a, select(mask, mul(b, c), splat(0))) -> fmla(a, mask, b, c)
fsub(a, select(mask, mul(b, c), splat(0))) -> fmls(a, mask, b, c)

These patterns require the 'contract' fast-math flag to be set on the
fadd/fsub, and the fadd pattern additionally requires 'nsz', as above.
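
To see why only the fadd form needs nsz, it may help to model a single lane of each form. The following is a minimal sketch in Python, not LLVM code: the helper names `unpredicated_with_select`, `merging_predicated` and `lanes_agree` are hypothetical, and Python floats (IEEE-754 doubles) stand in for the SVE element types.

```python
import math
import operator


def same_value(x, y):
    """True if x and y are the same float, distinguishing +0.0 from -0.0."""
    return x == y and math.copysign(1.0, x) == math.copysign(1.0, y)


def unpredicated_with_select(op, a, b, active):
    """One lane of op(a, select(mask, b, splat(0)))."""
    return op(a, b if active else 0.0)


def merging_predicated(op, a, b, active):
    """One lane of a merging predicated op: inactive lanes keep a unchanged."""
    return op(a, b) if active else a


def lanes_agree(op, a, b, active):
    """Do the two forms produce the same value in this lane?"""
    return same_value(unpredicated_with_select(op, a, b, active),
                      merging_predicated(op, a, b, active))


# Inactive lane holding -0.0: fadd turns it into +0.0, fsub leaves it alone.
fadd_ok = lanes_agree(operator.add, -0.0, 3.0, active=False)
fsub_ok = lanes_agree(operator.sub, -0.0, 3.0, active=False)
print(fadd_ok, fsub_ok)  # False True
```

Active lanes always agree in this model; the only disagreement is an inactive fadd lane whose first operand is -0.0, which is the case nsz permits the compiler to ignore.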

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

c-rhodes created this revision. Jul 26 2022, 5:44 AM

Herald added a reviewer: efriedma. Jul 26 2022, 5:44 AM

Herald added a project: Restricted Project.

Herald added subscribers: psnobl, hiraditya, kristof.beyls, tschuett.

c-rhodes requested review of this revision. Jul 26 2022, 5:44 AM

Herald added a project: Restricted Project. Jul 26 2022, 5:44 AM

Harbormaster completed remote builds in B177584: Diff 447655. Jul 26 2022, 5:45 AM

efriedma added inline comments. Jul 26 2022, 9:29 AM

llvm/test/CodeGen/AArch64/sve-masked-fp-arith.ll
135 ↗	(On Diff #447655)	This transform isn't legal without the "contract" fast-math flag.
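
As a numeric illustration of why contraction has to be opted into: a fused multiply-add rounds once, while a separate fmul followed by fadd rounds twice, so the results can differ. The sketch below is an assumption-laden model in Python, not the SVE instruction itself. `fused_mul_add` is a hypothetical helper that emulates single rounding with exact rationals, and the inputs are doubles chosen by hand to expose the difference.

```python
from fractions import Fraction


def mul_then_add(a, b, c):
    """a + b*c with two roundings, as a separate fmul and fadd would compute."""
    return a + (b * c)


def fused_mul_add(a, b, c):
    """a + b*c computed exactly, then rounded once to the nearest double."""
    return float(Fraction(a) + Fraction(b) * Fraction(c))


# b*c is exactly 1 + 2**-29 + 2**-60; the last term is lost when the
# product is rounded to a double on its own.
b = c = 1.0 + 2.0 ** -30
a = -(1.0 + 2.0 ** -29)

unfused = mul_then_add(a, b, c)
fused = fused_mul_add(a, b, c)
print(unfused, fused)
```

Here the unfused form yields 0.0 while the fused form keeps the 2**-60 residue, which is the kind of observable change the "contract" flag licenses.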

Matt added a subscriber: Matt. Jul 26 2022, 12:37 PM

Check ‘contract’ fast-math flag is set for FMLA/FMLS transform.
Tests moved to existing sve-fp-combine.ll

Harbormaster completed remote builds in B177825: Diff 447989. Jul 27 2022, 3:35 AM

c-rhodes marked an inline comment as done. Jul 27 2022, 3:37 AM

c-rhodes added inline comments.

llvm/test/CodeGen/AArch64/sve-masked-fp-arith.ll
135 ↗	(On Diff #447655)
> This transform isn't legal without the "contract" fast-math flag.

Thanks for pointing that out Eli, fixed now.

paulwalker-arm added inline comments. Jul 28 2022, 10:27 AM

llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td
386–387	Do flags on the fmul matter? It's the result of the fadd/fsub that's affected by the contraction and so I think only those nodes require the contract flag.

I'm not totally sure but I do wonder if we need to also check for no-signed-zeros because for the equivalent reduction code -0.0 is the nop value.

c-rhodes marked an inline comment as done. Jul 29 2022, 1:36 AM

c-rhodes added inline comments.

llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td
386–387
> Do flags on the fmul matter? It's the result of the fadd/fsub that's affected by the contraction and so I think only those nodes require the contract flag.

Not entirely sure to be honest, the existing SVE patterns we have to combine fmul+fadd into fma don't kick in unless contract is also on the fmul: https://godbolt.org/z/xWsn7vs5f

I checked some other targets (X86 and Power9) and they also don't combine unless contract is on the fmul, but there is a combine in AArch64 for `fmadd` that kicks in without contract on fmul: https://godbolt.org/z/rzzTb8s9W

> I'm not totally sure but I do wonder if we need to also check for no-signed-zeros because for the equivalent reduction code -0.0 is the nop value.

Not sure either, I'll look into it.

c-rhodes added inline comments. Aug 2 2022, 3:27 AM

llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td

386–387

> > I'm not totally sure but I do wonder if we need to also check for no-signed-zeros because for the equivalent reduction code -0.0 is the nop value.

> Not sure either, I'll look into it.

I think I understand the issue now.

printf("%g %g\n",  0.0f + 0.0f, -0.0f + 0.0f);
printf("%g %g\n",  0.0f - 0.0f, -0.0f - 0.0f);
printf("%g %g\n",  0.0f * 0.0f, -0.0f * 0.0f);

gives:

0 0
0 -0
0 -0

so fadd produces a different result and is unsafe without the no-signed-zeros flag. Alive2 agrees:

op	signed zeroes	no-signed zeroes
fadd	https://alive2.llvm.org/ce/z/qfBana	https://alive2.llvm.org/ce/z/wbhJh_
fsub	https://alive2.llvm.org/ce/z/wqkSwC	N/A
fmul	https://alive2.llvm.org/ce/z/88Z_AG	https://alive2.llvm.org/ce/z/qig4sU

nsz is required for the fadd/sel and fadd/sel/fmul (FMLA) patterns. The fmul/sel patterns, however, aren't valid according to Alive2.
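
The same conclusion can be checked mechanically on a handful of values by asking, for each operation, whether 0.0 acts as a right identity. The sketch below is a rough Python model and is no substitute for the Alive2 proofs: `SAMPLES`, `counterexamples` and the two comparison helpers are hypothetical names, and nsz is approximated by simply treating +0.0 and -0.0 as equal.

```python
import math
import operator

# A few representative inputs, including both zeros.
SAMPLES = [0.0, -0.0, 1.5, -2.0]


def identical(x, y):
    """Strict comparison: the sign of zero matters."""
    return x == y and math.copysign(1.0, x) == math.copysign(1.0, y)


def equal_ignoring_zero_sign(x, y):
    """Rough stand-in for nsz: +0.0 and -0.0 compare equal."""
    return x == y


def counterexamples(op, same):
    """Values x for which op(x, 0.0) does not give back x."""
    return [x for x in SAMPLES if not same(op(x, 0.0), x)]


OPS = {"fadd": operator.add, "fsub": operator.sub, "fmul": operator.mul}
strict = {name: counterexamples(op, identical) for name, op in OPS.items()}
nsz_relaxed = {name: counterexamples(op, equal_ignoring_zero_sign)
               for name, op in OPS.items()}

print(strict)
print(nsz_relaxed)
```

In this model fadd fails only for -0.0 and passes once zero signs are ignored, fsub never fails, and fmul fails for every non-zero input with or without nsz, because multiplying by zero does not preserve the other operand.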

fmul(a, select(mask, b, splat(0))) transform isn’t correct, remove it.
contract fast-math flag only required on the fadd/fsub op.
fadd(a, select(mask, b, splat(0))) transform is unsafe unless nsz fast-math flag is specified.

Harbormaster completed remote builds in B178975: Diff 449584. Aug 3 2022, 1:39 AM

c-rhodes marked an inline comment as done. Aug 3 2022, 1:39 AM

As we increase the number of idioms that need to match FMA-like instructions, we may want to move some of this logic into C++ code to canonicalise the DAG and emit FMA_PRED/vselect combos that are easier to match, but that can wait for another day.

This revision is now accepted and ready to land. Aug 5 2022, 4:01 AM

This revision was landed with ongoing or failed builds. Aug 8 2022, 1:45 AM

Closed by commit rGa6dec9f5b284: [AArch64][SVE] Add patterns to select masked FP arith (authored by c-rhodes).

This revision was automatically updated to reflect the committed changes.

c-rhodes mentioned this in rG17ac26a78eaa: [AArch64][SVE] NFC: Add tests for masked FP arith patterns (D130564).

c-rhodes added a commit: rGa6dec9f5b284: [AArch64][SVE] Add patterns to select masked FP arith.

dmgreen mentioned this in D147723: [AArch64][SVE] Extend predicated fma patterns to negative zero. Apr 6 2023, 9:46 AM

dmgreen mentioned this in rGcfee494fea09: [AArch64][SVE] Extend predicated fma patterns to negative zero. Apr 12 2023, 7:53 AM

Revision Contents

Path	Size
llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td	41 lines
llvm/test/CodeGen/AArch64/sve-fp-combine.ll	82 lines

Diff 450732

llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td

[193 lines not shown]
  def AArch64smin_p : SDNode<"AArch64ISD::SMIN_PRED", SDT_AArch64Arith>;
  def AArch64smulh_p : SDNode<"AArch64ISD::MULHS_PRED", SDT_AArch64Arith>;
  def AArch64uabd_p : SDNode<"AArch64ISD::ABDU_PRED", SDT_AArch64Arith>;
  def AArch64udiv_p : SDNode<"AArch64ISD::UDIV_PRED", SDT_AArch64Arith>;
  def AArch64umax_p : SDNode<"AArch64ISD::UMAX_PRED", SDT_AArch64Arith>;
  def AArch64umin_p : SDNode<"AArch64ISD::UMIN_PRED", SDT_AArch64Arith>;
  def AArch64umulh_p : SDNode<"AArch64ISD::MULHU_PRED", SDT_AArch64Arith>;
+ def AArch64fadd_p_nsz : PatFrag<(ops node:$op1, node:$op2, node:$op3),
+     (AArch64fadd_p node:$op1, node:$op2, node:$op3), [{
+   return N->getFlags().hasNoSignedZeros();
+ }]>;
  def SDT_AArch64Arith_Imm : SDTypeProfile<1, 3, [
    SDTCisVec<0>, SDTCisVec<1>, SDTCisVec<2>, SDTCisVT<3,i32>,
    SDTCVecEltisVT<1,i1>, SDTCisSameAs<0,2>
  ]>;
  def AArch64asrd_m1 : SDNode<"AArch64ISD::SRAD_MERGE_OP1", SDT_AArch64Arith_Imm>;
  def SDT_AArch64IntExtend : SDTypeProfile<1, 4, [
[27 lines not shown]

  // These are like the above but we don't yet have need for ISD nodes. They allow
  // a single pattern to match intrinsic and ISD operand layouts.
  def AArch64cls_mt : PatFrags<(ops node:$pg, node:$op, node:$pt), [(int_aarch64_sve_cls node:$pt, node:$pg, node:$op)]>;
  def AArch64cnot_mt : PatFrags<(ops node:$pg, node:$op, node:$pt), [(int_aarch64_sve_cnot node:$pt, node:$pg, node:$op)]>;
  def AArch64not_mt : PatFrags<(ops node:$pg, node:$op, node:$pt), [(int_aarch64_sve_not node:$pt, node:$pg, node:$op)]>;
  def AArch64fmul_m1 : EitherVSelectOrPassthruPatFrags<int_aarch64_sve_fmul, AArch64fmul_p>;
- def AArch64fadd_m1 : EitherVSelectOrPassthruPatFrags<int_aarch64_sve_fadd, AArch64fadd_p>;
- def AArch64fsub_m1 : EitherVSelectOrPassthruPatFrags<int_aarch64_sve_fsub, AArch64fsub_p>;
+ def AArch64fadd_m1 : PatFrags<(ops node:$pg, node:$op1, node:$op2), [
+     (int_aarch64_sve_fadd node:$pg, node:$op1, node:$op2),
+     (vselect node:$pg, (AArch64fadd_p (SVEAllActive), node:$op1, node:$op2), node:$op1),
+     (AArch64fadd_p_nsz (SVEAllActive), node:$op1, (vselect node:$pg, node:$op2, (SVEDup0)))
+ ]>;
+ def AArch64fsub_m1 : PatFrags<(ops node:$pg, node:$op1, node:$op2), [
+     (int_aarch64_sve_fsub node:$pg, node:$op1, node:$op2),
+     (vselect node:$pg, (AArch64fsub_p (SVEAllActive), node:$op1, node:$op2), node:$op1),
+     (AArch64fsub_p (SVEAllActive), node:$op1, (vselect node:$pg, node:$op2, (SVEDup0)))
+ ]>;
  def AArch64saba : PatFrags<(ops node:$op1, node:$op2, node:$op3),
      [(int_aarch64_sve_saba node:$op1, node:$op2, node:$op3),
       (add node:$op1, (AArch64sabd_p (SVEAllActive), node:$op2, node:$op3))]>;
  def AArch64uaba : PatFrags<(ops node:$op1, node:$op2, node:$op3),
      [(int_aarch64_sve_uaba node:$op1, node:$op2, node:$op3),
       (add node:$op1, (AArch64uabd_p (SVEAllActive), node:$op2, node:$op3))]>;
[48 lines not shown]

  def reinterpret_cast : SDNode<"AArch64ISD::REINTERPRET_CAST", SDTUnaryOp>;
  def AArch64mul_p_oneuse : PatFrag<(ops node:$pred, node:$src1, node:$src2),
      (AArch64mul_p node:$pred, node:$src1, node:$src2), [{
    return N->hasOneUse();
  }]>;
+ def AArch64fmul_p_oneuse : PatFrag<(ops node:$pred, node:$src1, node:$src2),
+     (AArch64fmul_p node:$pred, node:$src1, node:$src2), [{
+   return N->hasOneUse();
+ }]>;
  def AArch64fabd_p : PatFrag<(ops node:$pg, node:$op1, node:$op2),
      (AArch64fabs_mt node:$pg, (AArch64fsub_p node:$pg, node:$op1, node:$op2), undef)>;
  // FMAs with a negated multiplication operand can be commuted.
  def AArch64fmls_p : PatFrags<(ops node:$pred, node:$op1, node:$op2, node:$op3),
      [(AArch64fma_p node:$pred, (AArch64fneg_mt node:$pred, node:$op1, (undef)), node:$op2, node:$op3),
       (AArch64fma_p node:$pred, node:$op2, (AArch64fneg_mt node:$pred, node:$op1, (undef)), node:$op3)]>;
[32 lines not shown; enclosing definition: def AArch64mla_m1 : PatFrags<(ops node:$pred, node:$op1, node:$op2, node:$op3),]
       // add(a, select(mask, mul(b, c), splat(0))) -> mla(a, mask, b, c)
       (add node:$op1, (vselect node:$pred, (AArch64mul_p_oneuse (SVEAllActive), node:$op2, node:$op3), (SVEDup0)))]>;
  def AArch64mls_m1 : PatFrags<(ops node:$pred, node:$op1, node:$op2, node:$op3),
      [(int_aarch64_sve_mls node:$pred, node:$op1, node:$op2, node:$op3),
       (sub node:$op1, (AArch64mul_p_oneuse node:$pred, node:$op2, node:$op3)),
       // sub(a, select(mask, mul(b, c), splat(0))) -> mls(a, mask, b, c)
       (sub node:$op1, (vselect node:$pred, (AArch64mul_p_oneuse (SVEAllActive), node:$op2, node:$op3), (SVEDup0)))]>;
+ class fma_patfrags<SDPatternOperator intrinsic, SDPatternOperator sdnode>
+     : PatFrags<(ops node:$pred, node:$op1, node:$op2, node:$op3),
+         [(intrinsic node:$pred, node:$op1, node:$op2, node:$op3),
+          (sdnode (SVEAllActive), node:$op1, (vselect node:$pred, (AArch64fmul_p_oneuse (SVEAllActive), node:$op2, node:$op3), (SVEDup0)))],
+ [{
+   if ((N->getOpcode() != AArch64ISD::FADD_PRED) &&
+       (N->getOpcode() != AArch64ISD::FSUB_PRED))
+     return true; // it's the intrinsic
+   return N->getFlags().hasAllowContract();
+ }]>;

+ def AArch64fmla_m1 : fma_patfrags<int_aarch64_sve_fmla, AArch64fadd_p_nsz>;
+ def AArch64fmls_m1 : fma_patfrags<int_aarch64_sve_fmls, AArch64fsub_p>;
  let Predicates = [HasSVE] in {
    defm RDFFR_PPz : sve_int_rdffr_pred<0b0, "rdffr", int_aarch64_sve_rdffr_z>;
    def RDFFRS_PPz : sve_int_rdffr_pred<0b1, "rdffrs">;
    defm RDFFR_P : sve_int_rdffr_unpred<"rdffr", int_aarch64_sve_rdffr>;
    def SETFFR : sve_int_setffr<"setffr", int_aarch64_sve_setffr>;
    def WRFFR : sve_int_wrffr<"wrffr", int_aarch64_sve_wrffr>;
  } // End HasSVE
[220 lines not shown]

  let Predicates = [HasSVE] in {
    defm FTSSEL_ZZZ : sve_int_bin_cons_misc_0_b<"ftssel", int_aarch64_sve_ftssel_x>;
  } // End HasSVE
  let Predicates = [HasSVEorSME] in {
    defm FCADD_ZPmZ : sve_fp_fcadd<"fcadd", int_aarch64_sve_fcadd>;
    defm FCMLA_ZPmZZ : sve_fp_fcmla<"fcmla", int_aarch64_sve_fcmla>;
-   defm FMLA_ZPmZZ : sve_fp_3op_p_zds_a<0b00, "fmla", "FMLA_ZPZZZ", int_aarch64_sve_fmla, "FMAD_ZPmZZ">;
-   defm FMLS_ZPmZZ : sve_fp_3op_p_zds_a<0b01, "fmls", "FMLS_ZPZZZ", int_aarch64_sve_fmls, "FMSB_ZPmZZ">;
+   defm FMLA_ZPmZZ : sve_fp_3op_p_zds_a<0b00, "fmla", "FMLA_ZPZZZ", AArch64fmla_m1, "FMAD_ZPmZZ">;
+   defm FMLS_ZPmZZ : sve_fp_3op_p_zds_a<0b01, "fmls", "FMLS_ZPZZZ", AArch64fmls_m1, "FMSB_ZPmZZ">;
    defm FNMLA_ZPmZZ : sve_fp_3op_p_zds_a<0b10, "fnmla", "FNMLA_ZPZZZ", int_aarch64_sve_fnmla, "FNMAD_ZPmZZ">;
    defm FNMLS_ZPmZZ : sve_fp_3op_p_zds_a<0b11, "fnmls", "FNMLS_ZPZZZ", int_aarch64_sve_fnmls, "FNMSB_ZPmZZ">;
    defm FMAD_ZPmZZ : sve_fp_3op_p_zds_b<0b00, "fmad", int_aarch64_sve_fmad, "FMLA_ZPmZZ", /*isReverseInstr*/ 1>;
    defm FMSB_ZPmZZ : sve_fp_3op_p_zds_b<0b01, "fmsb", int_aarch64_sve_fmsb, "FMLS_ZPmZZ", /*isReverseInstr*/ 1>;
    defm FNMAD_ZPmZZ : sve_fp_3op_p_zds_b<0b10, "fnmad", int_aarch64_sve_fnmad, "FNMLA_ZPmZZ", /*isReverseInstr*/ 1>;
    defm FNMSB_ZPmZZ : sve_fp_3op_p_zds_b<0b11, "fnmsb", int_aarch64_sve_fnmsb, "FNMLS_ZPmZZ", /*isReverseInstr*/ 1>;
[2,912 lines not shown]

llvm/test/CodeGen/AArch64/sve-fp-combine.ll

[820 lines not shown; context: ; CHECK-NEXT: ret]
  %mul = fmul contract <vscale x 2 x double> %m1, %m2
  %res = fsub contract <vscale x 2 x double> %mul, %acc
  ret <vscale x 2 x double> %res
  }

  define <vscale x 8 x half> @fadd_h_sel(<vscale x 8 x half> %a, <vscale x 8 x half> %b, <vscale x 8 x i1> %mask) {
  ; CHECK-LABEL: fadd_h_sel:
  ; CHECK: // %bb.0:
- ; CHECK-NEXT: mov z2.h, #0 // =0x0
- ; CHECK-NEXT: sel z1.h, p0, z1.h, z2.h
- ; CHECK-NEXT: fadd z0.h, z0.h, z1.h
+ ; CHECK-NEXT: fadd z0.h, p0/m, z0.h, z1.h
  ; CHECK-NEXT: ret
  %sel = select <vscale x 8 x i1> %mask, <vscale x 8 x half> %b, <vscale x 8 x half> zeroinitializer
  %fadd = fadd nsz <vscale x 8 x half> %a, %sel
  ret <vscale x 8 x half> %fadd
  }

  define <vscale x 4 x float> @fadd_s_sel(<vscale x 4 x float> %a, <vscale x 4 x float> %b, <vscale x 4 x i1> %mask) {
  ; CHECK-LABEL: fadd_s_sel:
  ; CHECK: // %bb.0:
- ; CHECK-NEXT: mov z2.s, #0 // =0x0
- ; CHECK-NEXT: sel z1.s, p0, z1.s, z2.s
- ; CHECK-NEXT: fadd z0.s, z0.s, z1.s
+ ; CHECK-NEXT: fadd z0.s, p0/m, z0.s, z1.s
  ; CHECK-NEXT: ret
  %sel = select <vscale x 4 x i1> %mask, <vscale x 4 x float> %b, <vscale x 4 x float> zeroinitializer
  %fadd = fadd nsz <vscale x 4 x float> %a, %sel
  ret <vscale x 4 x float> %fadd
  }

  define <vscale x 2 x double> @fadd_d_sel(<vscale x 2 x double> %a, <vscale x 2 x double> %b, <vscale x 2 x i1> %mask) {
  ; CHECK-LABEL: fadd_d_sel:
  ; CHECK: // %bb.0:
- ; CHECK-NEXT: mov z2.d, #0 // =0x0
- ; CHECK-NEXT: sel z1.d, p0, z1.d, z2.d
- ; CHECK-NEXT: fadd z0.d, z0.d, z1.d
+ ; CHECK-NEXT: fadd z0.d, p0/m, z0.d, z1.d
  ; CHECK-NEXT: ret
  %sel = select <vscale x 2 x i1> %mask, <vscale x 2 x double> %b, <vscale x 2 x double> zeroinitializer
  %fadd = fadd nsz <vscale x 2 x double> %a, %sel
  ret <vscale x 2 x double> %fadd
  }

  define <vscale x 8 x half> @fsub_h_sel(<vscale x 8 x half> %a, <vscale x 8 x half> %b, <vscale x 8 x i1> %mask) {
  ; CHECK-LABEL: fsub_h_sel:
  ; CHECK: // %bb.0:
- ; CHECK-NEXT: mov z2.h, #0 // =0x0
- ; CHECK-NEXT: sel z1.h, p0, z1.h, z2.h
- ; CHECK-NEXT: fsub z0.h, z0.h, z1.h
+ ; CHECK-NEXT: fsub z0.h, p0/m, z0.h, z1.h
  ; CHECK-NEXT: ret
  %sel = select <vscale x 8 x i1> %mask, <vscale x 8 x half> %b, <vscale x 8 x half> zeroinitializer
  %fsub = fsub <vscale x 8 x half> %a, %sel
  ret <vscale x 8 x half> %fsub
  }

  define <vscale x 4 x float> @fsub_s_sel(<vscale x 4 x float> %a, <vscale x 4 x float> %b, <vscale x 4 x i1> %mask) {
  ; CHECK-LABEL: fsub_s_sel:
  ; CHECK: // %bb.0:
- ; CHECK-NEXT: mov z2.s, #0 // =0x0
- ; CHECK-NEXT: sel z1.s, p0, z1.s, z2.s
- ; CHECK-NEXT: fsub z0.s, z0.s, z1.s
+ ; CHECK-NEXT: fsub z0.s, p0/m, z0.s, z1.s
  ; CHECK-NEXT: ret
  %sel = select <vscale x 4 x i1> %mask, <vscale x 4 x float> %b, <vscale x 4 x float> zeroinitializer
  %fsub = fsub <vscale x 4 x float> %a, %sel
  ret <vscale x 4 x float> %fsub
  }

  define <vscale x 2 x double> @fsub_d_sel(<vscale x 2 x double> %a, <vscale x 2 x double> %b, <vscale x 2 x i1> %mask) {
  ; CHECK-LABEL: fsub_d_sel:
  ; CHECK: // %bb.0:
- ; CHECK-NEXT: mov z2.d, #0 // =0x0
- ; CHECK-NEXT: sel z1.d, p0, z1.d, z2.d
- ; CHECK-NEXT: fsub z0.d, z0.d, z1.d
+ ; CHECK-NEXT: fsub z0.d, p0/m, z0.d, z1.d
  ; CHECK-NEXT: ret
  %sel = select <vscale x 2 x i1> %mask, <vscale x 2 x double> %b, <vscale x 2 x double> zeroinitializer
  %fsub = fsub <vscale x 2 x double> %a, %sel
  ret <vscale x 2 x double> %fsub
  }

  define <vscale x 8 x half> @fadd_sel_fmul_h(<vscale x 8 x half> %a, <vscale x 8 x half> %b, <vscale x 8 x half> %c, <vscale x 8 x i1> %mask) {
  ; CHECK-LABEL: fadd_sel_fmul_h:
  ; CHECK: // %bb.0:
- ; CHECK-NEXT: mov z3.h, #0 // =0x0
- ; CHECK-NEXT: fmul z1.h, z1.h, z2.h
- ; CHECK-NEXT: sel z1.h, p0, z1.h, z3.h
- ; CHECK-NEXT: fadd z0.h, z0.h, z1.h
+ ; CHECK-NEXT: fmla z0.h, p0/m, z1.h, z2.h
  ; CHECK-NEXT: ret
  %fmul = fmul <vscale x 8 x half> %b, %c
  %sel = select <vscale x 8 x i1> %mask, <vscale x 8 x half> %fmul, <vscale x 8 x half> zeroinitializer
  %fadd = fadd nsz contract <vscale x 8 x half> %a, %sel
  ret <vscale x 8 x half> %fadd
  }

  define <vscale x 4 x float> @fadd_sel_fmul_s(<vscale x 4 x float> %a, <vscale x 4 x float> %b, <vscale x 4 x float> %c, <vscale x 4 x i1> %mask) {
  ; CHECK-LABEL: fadd_sel_fmul_s:
  ; CHECK: // %bb.0:
- ; CHECK-NEXT: mov z3.s, #0 // =0x0
- ; CHECK-NEXT: fmul z1.s, z1.s, z2.s
- ; CHECK-NEXT: sel z1.s, p0, z1.s, z3.s
- ; CHECK-NEXT: fadd z0.s, z0.s, z1.s
+ ; CHECK-NEXT: fmla z0.s, p0/m, z1.s, z2.s
  ; CHECK-NEXT: ret
  %fmul = fmul <vscale x 4 x float> %b, %c
  %sel = select <vscale x 4 x i1> %mask, <vscale x 4 x float> %fmul, <vscale x 4 x float> zeroinitializer
  %fadd = fadd nsz contract <vscale x 4 x float> %a, %sel
  ret <vscale x 4 x float> %fadd
  }

  define <vscale x 2 x double> @fadd_sel_fmul_d(<vscale x 2 x double> %a, <vscale x 2 x double> %b, <vscale x 2 x double> %c, <vscale x 2 x i1> %mask) {
  ; CHECK-LABEL: fadd_sel_fmul_d:
  ; CHECK: // %bb.0:
- ; CHECK-NEXT: mov z3.d, #0 // =0x0
- ; CHECK-NEXT: fmul z1.d, z1.d, z2.d
- ; CHECK-NEXT: sel z1.d, p0, z1.d, z3.d
- ; CHECK-NEXT: fadd z0.d, z0.d, z1.d
+ ; CHECK-NEXT: fmla z0.d, p0/m, z1.d, z2.d
  ; CHECK-NEXT: ret
  %fmul = fmul <vscale x 2 x double> %b, %c
  %sel = select <vscale x 2 x i1> %mask, <vscale x 2 x double> %fmul, <vscale x 2 x double> zeroinitializer
  %fadd = fadd nsz contract <vscale x 2 x double> %a, %sel
  ret <vscale x 2 x double> %fadd
  }

  define <vscale x 8 x half> @fsub_sel_fmul_h(<vscale x 8 x half> %a, <vscale x 8 x half> %b, <vscale x 8 x half> %c, <vscale x 8 x i1> %mask) {
  ; CHECK-LABEL: fsub_sel_fmul_h:
  ; CHECK: // %bb.0:
- ; CHECK-NEXT: mov z3.h, #0 // =0x0
- ; CHECK-NEXT: fmul z1.h, z1.h, z2.h
- ; CHECK-NEXT: sel z1.h, p0, z1.h, z3.h
- ; CHECK-NEXT: fsub z0.h, z0.h, z1.h
+ ; CHECK-NEXT: fmls z0.h, p0/m, z1.h, z2.h
  ; CHECK-NEXT: ret
  %fmul = fmul <vscale x 8 x half> %b, %c
  %sel = select <vscale x 8 x i1> %mask, <vscale x 8 x half> %fmul, <vscale x 8 x half> zeroinitializer
  %fsub = fsub contract <vscale x 8 x half> %a, %sel
  ret <vscale x 8 x half> %fsub
  }

  define <vscale x 4 x float> @fsub_sel_fmul_s(<vscale x 4 x float> %a, <vscale x 4 x float> %b, <vscale x 4 x float> %c, <vscale x 4 x i1> %mask) {
  ; CHECK-LABEL: fsub_sel_fmul_s:
  ; CHECK: // %bb.0:
- ; CHECK-NEXT: mov z3.s, #0 // =0x0
- ; CHECK-NEXT: fmul z1.s, z1.s, z2.s
- ; CHECK-NEXT: sel z1.s, p0, z1.s, z3.s
- ; CHECK-NEXT: fsub z0.s, z0.s, z1.s
+ ; CHECK-NEXT: fmls z0.s, p0/m, z1.s, z2.s
  ; CHECK-NEXT: ret
  %fmul = fmul <vscale x 4 x float> %b, %c
  %sel = select <vscale x 4 x i1> %mask, <vscale x 4 x float> %fmul, <vscale x 4 x float> zeroinitializer
  %fsub = fsub contract <vscale x 4 x float> %a, %sel
  ret <vscale x 4 x float> %fsub
  }

  define <vscale x 2 x double> @fsub_sel_fmul_d(<vscale x 2 x double> %a, <vscale x 2 x double> %b, <vscale x 2 x double> %c, <vscale x 2 x i1> %mask) {
  ; CHECK-LABEL: fsub_sel_fmul_d:
  ; CHECK: // %bb.0:
- ; CHECK-NEXT: mov z3.d, #0 // =0x0
- ; CHECK-NEXT: fmul z1.d, z1.d, z2.d
- ; CHECK-NEXT: sel z1.d, p0, z1.d, z3.d
- ; CHECK-NEXT: fsub z0.d, z0.d, z1.d
+ ; CHECK-NEXT: fmls z0.d, p0/m, z1.d, z2.d
  ; CHECK-NEXT: ret
  %fmul = fmul <vscale x 2 x double> %b, %c
  %sel = select <vscale x 2 x i1> %mask, <vscale x 2 x double> %fmul, <vscale x 2 x double> zeroinitializer
  %fsub = fsub contract <vscale x 2 x double> %a, %sel
  ret <vscale x 2 x double> %fsub
  }

+ ; Verify combine requires contract fast-math flag.
+ define <vscale x 4 x float> @fadd_sel_fmul_no_contract_s(<vscale x 4 x float> %a, <vscale x 4 x float> %b, <vscale x 4 x float> %c, <vscale x 4 x i1> %mask) {
+ ; CHECK-LABEL: fadd_sel_fmul_no_contract_s:
+ ; CHECK: // %bb.0:
+ ; CHECK-NEXT: fmul z1.s, z1.s, z2.s
+ ; CHECK-NEXT: fadd z0.s, p0/m, z0.s, z1.s
+ ; CHECK-NEXT: ret
+ %fmul = fmul <vscale x 4 x float> %b, %c
+ %sel = select <vscale x 4 x i1> %mask, <vscale x 4 x float> %fmul, <vscale x 4 x float> zeroinitializer
+ %fadd = fadd nsz <vscale x 4 x float> %a, %sel
+ ret <vscale x 4 x float> %fadd
+ }
+
+ ; Verify combine requires no-signed zeros fast-math flag.
+ define <vscale x 4 x float> @fadd_sel_fmul_no_nsz_s(<vscale x 4 x float> %a, <vscale x 4 x float> %b, <vscale x 4 x float> %c, <vscale x 4 x i1> %mask) {
+ ; CHECK-LABEL: fadd_sel_fmul_no_nsz_s:
+ ; CHECK: // %bb.0:
+ ; CHECK-NEXT: mov z3.s, #0 // =0x0
+ ; CHECK-NEXT: fmul z1.s, z1.s, z2.s
+ ; CHECK-NEXT: sel z1.s, p0, z1.s, z3.s
+ ; CHECK-NEXT: fadd z0.s, z0.s, z1.s
+ ; CHECK-NEXT: ret
+ %fmul = fmul <vscale x 4 x float> %b, %c
+ %sel = select <vscale x 4 x i1> %mask, <vscale x 4 x float> %fmul, <vscale x 4 x float> zeroinitializer
+ %fadd = fadd contract <vscale x 4 x float> %a, %sel
+ ret <vscale x 4 x float> %fadd
+ }
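
Taken together, the CHECK lines in these tests suggest a simple decision table for which instructions are selected. The sketch below restates that table in Python purely as a reading aid. `expected_selection` is a hypothetical helper, not part of LLVM, and it returns mnemonics only. The fsub-with-fmul-but-no-contract case and the fadd-without-nsz-and-without-fmul case are not covered by a test shown here, so those rows are inferred from the patterns and should be treated as assumptions.

```python
def expected_selection(op, flags, rhs_is_single_use_fmul=False):
    """Mnemonics expected for op(a, select(mask, rhs, splat(0))).

    op is "fadd" or "fsub"; flags holds the fast-math flags on that op.
    rhs_is_single_use_fmul says whether rhs is an fmul with one use.
    """
    if op not in ("fadd", "fsub"):
        raise ValueError("only fadd and fsub are modelled")
    flags = set(flags)

    # Folding the select into a merging predicated op needs nsz for fadd;
    # fsub needs no flag because x - 0.0 is always x.
    can_fold_select = op == "fsub" or "nsz" in flags
    if not can_fold_select:
        # Nothing folds: zero splat, optional multiply, select, plain op.
        middle = ["fmul"] if rhs_is_single_use_fmul else []
        return ["mov"] + middle + ["sel", op]

    if rhs_is_single_use_fmul and "contract" in flags:
        # Fusing the multiply additionally needs contract on the fadd/fsub.
        return ["fmla" if op == "fadd" else "fmls"]

    prefix = ["fmul"] if rhs_is_single_use_fmul else []
    return prefix + [op + " p0/m"]


print(expected_selection("fadd", {"nsz", "contract"}, True))
print(expected_selection("fadd", {"contract"}, True))
```

For example, `fadd nsz` without `contract` on a single-use fmul gives an `fmul` followed by a predicated `fadd`, matching `fadd_sel_fmul_no_contract_s`, while dropping `nsz` falls back to the original four-instruction sequence, matching `fadd_sel_fmul_no_nsz_s`.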
