This is an archive of the discontinued LLVM Phabricator instance.

LGTM! I think the patch looks good to go as is, but if you do manage to work out why we're adding the compares and selects for @fsqrt_4f32 and remove them that would be awesome. :)

llvm/test/CodeGen/AArch64/sve-fp-reciprocal.ll
25	It's interesting that the fmul here is the predicated form, whereas for fdiv_recip_4f32 it's the unpredicated form. This has nothing to do with your patch though, but perhaps worth investigating in the future?
56	nit: Just for clarity is it worth renaming these to `@fdiv_2f64` and `@fdiv_recip_2f64` to be consistent with the f32 versions?
137	Again, I don't think this is caused by your patch, but it's probably worth investigating why we're selecting between the original input and the estimate based on a zero input. It feels inconsistent with `@fsqrt_4f32` where we don't seem to worry about the input.
144	nit: Again, maybe for consistency it's better to use the name `@fsqrt_2f64` here and below?

This revision is now accepted and ready to land. Oct 13 2021, 1:51 AM

Matt added a subscriber: Matt. Oct 13 2021, 2:50 PM

paulwalker-arm added a subscriber: paulwalker-arm. Oct 14 2021, 9:20 AM

paulwalker-arm added inline comments.

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
8255	Out of interest is there a reason we ignore f16 vectors here?
llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td
1987–2011	Are these required? The patterns should already exist within the instruction definition classes. All that's needed is to add C++ code to lower the intrinsics to these `AArch64ISD` nodes, which is something we've done for other operations so as not to have duplicate patterns.

Added lowering of the aarch64_sve_frecp[e|s]_x and aarch64_sve_frsqrt[e|s]_x intrinsics to the existing AArch64ISD nodes in AArch64ISelLowering.cpp, and removed the duplicate TableGen patterns.
Added tests for the nxv2f16, nxv4f16 and nxv8f16 types.

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
8255	When I created this patch I thought that `DAGCombiner::BuildDivEstimate/BuildSqrtEstimate` didn't support f16 types, which was incorrect. I've added the f16 vector types here & added tests for these to sve-fp-reciprocal.ll.

Harbormaster completed remote builds in B129716: Diff 380921. Oct 20 2021, 6:40 AM

LGTM! It looks like you've addressed @paulwalker-arm's comments. I'm happy for us to look at investigating removing the fcmeq and sel instructions at a later time.

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
8253	nit: Can you fix the formatting issue before merging please?
llvm/test/CodeGen/AArch64/sve-fp-reciprocal.ll
158	nit: For the fsqrt functions can you remove the second `%b` arguments as they seem to be unused?

paulwalker-arm added inline comments. Oct 21 2021, 3:25 AM

llvm/test/CodeGen/AArch64/sve-fp-reciprocal.ll
25	@david-arm The predicate is generated for the unpacked types so that inactive lanes can never trigger floating point exceptions. So this is expected behaviour. @kmclaughlin This does raise an interesting point though. Is it safe to use the reciprocal instructions for unpacked types? The answer depends on whether these instructions can generate exceptions.

Just a couple of extra points that depend on the answer to my previous question.

llvm/lib/Target/AArch64/SVEInstrFormats.td
1935–1936 ↗	(On Diff #380921)	If the answer to my previous question is that it is unsafe for unpacked vectors then please remove these patterns. If there comes a time that we definitely want to support unpacked vectors I think we'll probably need slightly different ISel patterns for those types.
2637 ↗	(On Diff #380921)	As above.

Removed additional patterns and tests for unpacked vector types.

llvm/lib/Target/AArch64/SVEInstrFormats.td
1935–1936 ↗	(On Diff #380921)	Hi @paulwalker-arm, I've removed these patterns as the reciprocal instructions can generate exceptions.

Harbormaster completed remote builds in B130130: Diff 381526. Oct 22 2021, 6:44 AM

Thanks @kmclaughlin.

This revision was landed with ongoing or failed builds. Oct 25 2021, 3:31 AM

Closed by commit rG1f49b71fe5fa: [SVE][CodeGen] Enable reciprocal estimates for scalable fdiv/fsqrt (authored by kmclaughlin).

This revision was automatically updated to reflect the committed changes.

kmclaughlin added a commit: rG1f49b71fe5fa: [SVE][CodeGen] Enable reciprocal estimates for scalable fdiv/fsqrt.

pengfei mentioned this in D114765: [X86][FP16] Only generate approximate rsqrt when Reciprocal is true for half type. Nov 29 2021, 6:45 PM

Revision Contents

Path	Size
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp	22 lines
llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td	8 lines
llvm/test/CodeGen/AArch64/sve-fp-reciprocal.ll	179 lines

Diff 381912

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp


[4,124 lines not shown]
return DAG.getNode(AArch64ISD::FCVTZS_MERGE_PASSTHRU, dl,		return DAG.getNode(AArch64ISD::FCVTZS_MERGE_PASSTHRU, dl,
Op.getValueType(), Op.getOperand(2), Op.getOperand(3),		Op.getValueType(), Op.getOperand(2), Op.getOperand(3),
Op.getOperand(1));		Op.getOperand(1));
case Intrinsic::aarch64_sve_fsqrt:		case Intrinsic::aarch64_sve_fsqrt:
return DAG.getNode(AArch64ISD::FSQRT_MERGE_PASSTHRU, dl, Op.getValueType(),		return DAG.getNode(AArch64ISD::FSQRT_MERGE_PASSTHRU, dl, Op.getValueType(),
Op.getOperand(2), Op.getOperand(3), Op.getOperand(1));		Op.getOperand(2), Op.getOperand(3), Op.getOperand(1));
case Intrinsic::aarch64_sve_frecpx:		case Intrinsic::aarch64_sve_frecpx:
return DAG.getNode(AArch64ISD::FRECPX_MERGE_PASSTHRU, dl, Op.getValueType(),		return DAG.getNode(AArch64ISD::FRECPX_MERGE_PASSTHRU, dl, Op.getValueType(),
Op.getOperand(2), Op.getOperand(3), Op.getOperand(1));		Op.getOperand(2), Op.getOperand(3), Op.getOperand(1));
		case Intrinsic::aarch64_sve_frecpe_x:
		return DAG.getNode(AArch64ISD::FRECPE, dl, Op.getValueType(),
		Op.getOperand(1));
		case Intrinsic::aarch64_sve_frecps_x:
		return DAG.getNode(AArch64ISD::FRECPS, dl, Op.getValueType(),
		Op.getOperand(1), Op.getOperand(2));
		case Intrinsic::aarch64_sve_frsqrte_x:
		return DAG.getNode(AArch64ISD::FRSQRTE, dl, Op.getValueType(),
		Op.getOperand(1));
		case Intrinsic::aarch64_sve_frsqrts_x:
		return DAG.getNode(AArch64ISD::FRSQRTS, dl, Op.getValueType(),
		Op.getOperand(1), Op.getOperand(2));
case Intrinsic::aarch64_sve_fabs:		case Intrinsic::aarch64_sve_fabs:
return DAG.getNode(AArch64ISD::FABS_MERGE_PASSTHRU, dl, Op.getValueType(),		return DAG.getNode(AArch64ISD::FABS_MERGE_PASSTHRU, dl, Op.getValueType(),
Op.getOperand(2), Op.getOperand(3), Op.getOperand(1));		Op.getOperand(2), Op.getOperand(3), Op.getOperand(1));
case Intrinsic::aarch64_sve_abs:		case Intrinsic::aarch64_sve_abs:
return DAG.getNode(AArch64ISD::ABS_MERGE_PASSTHRU, dl, Op.getValueType(),		return DAG.getNode(AArch64ISD::ABS_MERGE_PASSTHRU, dl, Op.getValueType(),
Op.getOperand(2), Op.getOperand(3), Op.getOperand(1));		Op.getOperand(2), Op.getOperand(3), Op.getOperand(1));
case Intrinsic::aarch64_sve_neg:		case Intrinsic::aarch64_sve_neg:
return DAG.getNode(AArch64ISD::NEG_MERGE_PASSTHRU, dl, Op.getValueType(),		return DAG.getNode(AArch64ISD::NEG_MERGE_PASSTHRU, dl, Op.getValueType(),
[4,089 lines not shown]
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
// AArch64 Optimization Hooks		// AArch64 Optimization Hooks
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

static SDValue getEstimate(const AArch64Subtarget *ST, unsigned Opcode,		static SDValue getEstimate(const AArch64Subtarget *ST, unsigned Opcode,
SDValue Operand, SelectionDAG &DAG,		SDValue Operand, SelectionDAG &DAG,
int &ExtraSteps) {		int &ExtraSteps) {
EVT VT = Operand.getValueType();		EVT VT = Operand.getValueType();
if (ST->hasNEON() &&		if ((ST->hasNEON() &&
(VT == MVT::f64 || VT == MVT::v1f64 || VT == MVT::v2f64 ||		(VT == MVT::f64 || VT == MVT::v1f64 || VT == MVT::v2f64 ||
VT == MVT::f32 || VT == MVT::v1f32 ||		VT == MVT::f32 || VT == MVT::v1f32 || VT == MVT::v2f32 ||
VT == MVT::v2f32 || VT == MVT::v4f32)) {		VT == MVT::v4f32)) ||
		(ST->hasSVE() &&
		(VT == MVT::nxv8f16 || VT == MVT::nxv4f32 || VT == MVT::nxv2f64))) {
if (ExtraSteps == TargetLoweringBase::ReciprocalEstimate::Unspecified)		if (ExtraSteps == TargetLoweringBase::ReciprocalEstimate::Unspecified)
// For the reciprocal estimates, convergence is quadratic, so the number		// For the reciprocal estimates, convergence is quadratic, so the number
// of digits is doubled after each iteration. In ARMv8, the accuracy of		// of digits is doubled after each iteration. In ARMv8, the accuracy of
// the initial estimate is 2^-8. Thus the number of extra steps to refine		// the initial estimate is 2^-8. Thus the number of extra steps to refine
// the result for float (23 mantissa bits) is 2 and for double (52		// the result for float (23 mantissa bits) is 2 and for double (52
// mantissa bits) is 3.		// mantissa bits) is 3.
ExtraSteps = VT.getScalarType() == MVT::f64 ? 3 : 2;		ExtraSteps = VT.getScalarType() == MVT::f64 ? 3 : 2;

[10,952 lines not shown]

llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td

[396 lines not shown]
let Predicates = [HasSVEorStreamingSVE] in {		let Predicates = [HasSVEorStreamingSVE] in {
defm SABD_ZPmZ : sve_int_bin_pred_arit_1<0b100, "sabd", "SABD_ZPZZ", int_aarch64_sve_sabd, DestructiveBinaryComm>;		defm SABD_ZPmZ : sve_int_bin_pred_arit_1<0b100, "sabd", "SABD_ZPZZ", int_aarch64_sve_sabd, DestructiveBinaryComm>;
defm UABD_ZPmZ : sve_int_bin_pred_arit_1<0b101, "uabd", "UABD_ZPZZ", int_aarch64_sve_uabd, DestructiveBinaryComm>;		defm UABD_ZPmZ : sve_int_bin_pred_arit_1<0b101, "uabd", "UABD_ZPZZ", int_aarch64_sve_uabd, DestructiveBinaryComm>;

defm SMAX_ZPZZ : sve_int_bin_pred_bhsd<AArch64smax_p>;		defm SMAX_ZPZZ : sve_int_bin_pred_bhsd<AArch64smax_p>;
defm UMAX_ZPZZ : sve_int_bin_pred_bhsd<AArch64umax_p>;		defm UMAX_ZPZZ : sve_int_bin_pred_bhsd<AArch64umax_p>;
defm SMIN_ZPZZ : sve_int_bin_pred_bhsd<AArch64smin_p>;		defm SMIN_ZPZZ : sve_int_bin_pred_bhsd<AArch64smin_p>;
defm UMIN_ZPZZ : sve_int_bin_pred_bhsd<AArch64umin_p>;		defm UMIN_ZPZZ : sve_int_bin_pred_bhsd<AArch64umin_p>;

defm FRECPE_ZZ : sve_fp_2op_u_zd<0b110, "frecpe", int_aarch64_sve_frecpe_x>;		defm FRECPE_ZZ : sve_fp_2op_u_zd<0b110, "frecpe", AArch64frecpe>;
defm FRSQRTE_ZZ : sve_fp_2op_u_zd<0b111, "frsqrte", int_aarch64_sve_frsqrte_x>;		defm FRSQRTE_ZZ : sve_fp_2op_u_zd<0b111, "frsqrte", AArch64frsqrte>;

defm FADD_ZPmI : sve_fp_2op_i_p_zds<0b000, "fadd", "FADD_ZPZI", sve_fpimm_half_one, fpimm_half, fpimm_one, int_aarch64_sve_fadd>;		defm FADD_ZPmI : sve_fp_2op_i_p_zds<0b000, "fadd", "FADD_ZPZI", sve_fpimm_half_one, fpimm_half, fpimm_one, int_aarch64_sve_fadd>;
defm FSUB_ZPmI : sve_fp_2op_i_p_zds<0b001, "fsub", "FSUB_ZPZI", sve_fpimm_half_one, fpimm_half, fpimm_one, int_aarch64_sve_fsub>;		defm FSUB_ZPmI : sve_fp_2op_i_p_zds<0b001, "fsub", "FSUB_ZPZI", sve_fpimm_half_one, fpimm_half, fpimm_one, int_aarch64_sve_fsub>;
defm FMUL_ZPmI : sve_fp_2op_i_p_zds<0b010, "fmul", "FMUL_ZPZI", sve_fpimm_half_two, fpimm_half, fpimm_two, int_aarch64_sve_fmul>;		defm FMUL_ZPmI : sve_fp_2op_i_p_zds<0b010, "fmul", "FMUL_ZPZI", sve_fpimm_half_two, fpimm_half, fpimm_two, int_aarch64_sve_fmul>;
defm FSUBR_ZPmI : sve_fp_2op_i_p_zds<0b011, "fsubr", "FSUBR_ZPZI", sve_fpimm_half_one, fpimm_half, fpimm_one, int_aarch64_sve_fsubr>;		defm FSUBR_ZPmI : sve_fp_2op_i_p_zds<0b011, "fsubr", "FSUBR_ZPZI", sve_fpimm_half_one, fpimm_half, fpimm_one, int_aarch64_sve_fsubr>;
defm FMAXNM_ZPmI : sve_fp_2op_i_p_zds<0b100, "fmaxnm", "FMAXNM_ZPZI", sve_fpimm_zero_one, fpimm0, fpimm_one, int_aarch64_sve_fmaxnm>;		defm FMAXNM_ZPmI : sve_fp_2op_i_p_zds<0b100, "fmaxnm", "FMAXNM_ZPZI", sve_fpimm_zero_one, fpimm0, fpimm_one, int_aarch64_sve_fmaxnm>;
defm FMINNM_ZPmI : sve_fp_2op_i_p_zds<0b101, "fminnm", "FMINNM_ZPZI", sve_fpimm_zero_one, fpimm0, fpimm_one, int_aarch64_sve_fminnm>;		defm FMINNM_ZPmI : sve_fp_2op_i_p_zds<0b101, "fminnm", "FMINNM_ZPZI", sve_fpimm_zero_one, fpimm0, fpimm_one, int_aarch64_sve_fminnm>;
defm FMAX_ZPmI : sve_fp_2op_i_p_zds<0b110, "fmax", "FMAX_ZPZI", sve_fpimm_zero_one, fpimm0, fpimm_one, int_aarch64_sve_fmax>;		defm FMAX_ZPmI : sve_fp_2op_i_p_zds<0b110, "fmax", "FMAX_ZPZI", sve_fpimm_zero_one, fpimm0, fpimm_one, int_aarch64_sve_fmax>;
[64 lines not shown]
let Predicates = [HasSVEorStreamingSVE] in {		let Predicates = [HasSVEorStreamingSVE] in {
defm FMUL_ZZZ : sve_fp_3op_u_zd<0b010, "fmul", fmul, AArch64fmul_p>;		defm FMUL_ZZZ : sve_fp_3op_u_zd<0b010, "fmul", fmul, AArch64fmul_p>;
} // End HasSVEorStreamingSVE		} // End HasSVEorStreamingSVE

let Predicates = [HasSVE] in {		let Predicates = [HasSVE] in {
defm FTSMUL_ZZZ : sve_fp_3op_u_zd_ftsmul<0b011, "ftsmul", int_aarch64_sve_ftsmul_x>;		defm FTSMUL_ZZZ : sve_fp_3op_u_zd_ftsmul<0b011, "ftsmul", int_aarch64_sve_ftsmul_x>;
} // End HasSVE		} // End HasSVE

let Predicates = [HasSVEorStreamingSVE] in {		let Predicates = [HasSVEorStreamingSVE] in {
defm FRECPS_ZZZ : sve_fp_3op_u_zd<0b110, "frecps", int_aarch64_sve_frecps_x>;		defm FRECPS_ZZZ : sve_fp_3op_u_zd<0b110, "frecps", AArch64frecps>;
defm FRSQRTS_ZZZ : sve_fp_3op_u_zd<0b111, "frsqrts", int_aarch64_sve_frsqrts_x>;		defm FRSQRTS_ZZZ : sve_fp_3op_u_zd<0b111, "frsqrts", AArch64frsqrts>;
} // End HasSVEorStreamingSVE		} // End HasSVEorStreamingSVE

let Predicates = [HasSVE] in {		let Predicates = [HasSVE] in {
defm FTSSEL_ZZZ : sve_int_bin_cons_misc_0_b<"ftssel", int_aarch64_sve_ftssel_x>;		defm FTSSEL_ZZZ : sve_int_bin_cons_misc_0_b<"ftssel", int_aarch64_sve_ftssel_x>;
} // End HasSVE		} // End HasSVE

let Predicates = [HasSVEorStreamingSVE] in {		let Predicates = [HasSVEorStreamingSVE] in {
defm FCADD_ZPmZ : sve_fp_fcadd<"fcadd", int_aarch64_sve_fcadd>;		defm FCADD_ZPmZ : sve_fp_fcadd<"fcadd", int_aarch64_sve_fcadd>;
[1,482 lines not shown]
def : Pat<(add GPR32:$op, (i32 (trunc (vscale (sve_cntd_imm_neg i32:$imm))))),		def : Pat<(add GPR32:$op, (i32 (trunc (vscale (sve_cntd_imm_neg i32:$imm))))),
(i32 (EXTRACT_SUBREG (DECD_XPiI (INSERT_SUBREG (i64 (IMPLICIT_DEF)),		(i32 (EXTRACT_SUBREG (DECD_XPiI (INSERT_SUBREG (i64 (IMPLICIT_DEF)),
GPR32:$op, sub_32), 31, $imm),		GPR32:$op, sub_32), 31, $imm),
sub_32))>;		sub_32))>;
}		}

def : Pat<(add GPR64:$op, (vscale (sve_rdvl_imm i32:$imm))),		def : Pat<(add GPR64:$op, (vscale (sve_rdvl_imm i32:$imm))),
(ADDVL_XXI GPR64:$op, $imm)>;		(ADDVL_XXI GPR64:$op, $imm)>;

// FIXME: BigEndian requires an additional REV instruction to satisfy the		// FIXME: BigEndian requires an additional REV instruction to satisfy the
// constraint that none of the bits change when stored to memory as one		// constraint that none of the bits change when stored to memory as one
// type, and and reloaded as another type.		// type, and and reloaded as another type.
let Predicates = [IsLE] in {		let Predicates = [IsLE] in {
def : Pat<(nxv16i8 (bitconvert (nxv8i16 ZPR:$src))), (nxv16i8 ZPR:$src)>;		def : Pat<(nxv16i8 (bitconvert (nxv8i16 ZPR:$src))), (nxv16i8 ZPR:$src)>;
def : Pat<(nxv16i8 (bitconvert (nxv4i32 ZPR:$src))), (nxv16i8 ZPR:$src)>;		def : Pat<(nxv16i8 (bitconvert (nxv4i32 ZPR:$src))), (nxv16i8 ZPR:$src)>;
def : Pat<(nxv16i8 (bitconvert (nxv2i64 ZPR:$src))), (nxv16i8 ZPR:$src)>;		def : Pat<(nxv16i8 (bitconvert (nxv2i64 ZPR:$src))), (nxv16i8 ZPR:$src)>;
def : Pat<(nxv16i8 (bitconvert (nxv8f16 ZPR:$src))), (nxv16i8 ZPR:$src)>;		def : Pat<(nxv16i8 (bitconvert (nxv8f16 ZPR:$src))), (nxv16i8 ZPR:$src)>;
def : Pat<(nxv16i8 (bitconvert (nxv4f32 ZPR:$src))), (nxv16i8 ZPR:$src)>;		def : Pat<(nxv16i8 (bitconvert (nxv4f32 ZPR:$src))), (nxv16i8 ZPR:$src)>;
def : Pat<(nxv16i8 (bitconvert (nxv2f64 ZPR:$src))), (nxv16i8 ZPR:$src)>;		def : Pat<(nxv16i8 (bitconvert (nxv2f64 ZPR:$src))), (nxv16i8 ZPR:$src)>;

def : Pat<(nxv8i16 (bitconvert (nxv16i8 ZPR:$src))), (nxv8i16 ZPR:$src)>;		def : Pat<(nxv8i16 (bitconvert (nxv16i8 ZPR:$src))), (nxv8i16 ZPR:$src)>;
def : Pat<(nxv8i16 (bitconvert (nxv4i32 ZPR:$src))), (nxv8i16 ZPR:$src)>;		def : Pat<(nxv8i16 (bitconvert (nxv4i32 ZPR:$src))), (nxv8i16 ZPR:$src)>;
def : Pat<(nxv8i16 (bitconvert (nxv2i64 ZPR:$src))), (nxv8i16 ZPR:$src)>;		def : Pat<(nxv8i16 (bitconvert (nxv2i64 ZPR:$src))), (nxv8i16 ZPR:$src)>;
def : Pat<(nxv8i16 (bitconvert (nxv8f16 ZPR:$src))), (nxv8i16 ZPR:$src)>;		def : Pat<(nxv8i16 (bitconvert (nxv8f16 ZPR:$src))), (nxv8i16 ZPR:$src)>;
def : Pat<(nxv8i16 (bitconvert (nxv4f32 ZPR:$src))), (nxv8i16 ZPR:$src)>;		def : Pat<(nxv8i16 (bitconvert (nxv4f32 ZPR:$src))), (nxv8i16 ZPR:$src)>;
def : Pat<(nxv8i16 (bitconvert (nxv2f64 ZPR:$src))), (nxv8i16 ZPR:$src)>;		def : Pat<(nxv8i16 (bitconvert (nxv2f64 ZPR:$src))), (nxv8i16 ZPR:$src)>;

def : Pat<(nxv4i32 (bitconvert (nxv16i8 ZPR:$src))), (nxv4i32 ZPR:$src)>;		def : Pat<(nxv4i32 (bitconvert (nxv16i8 ZPR:$src))), (nxv4i32 ZPR:$src)>;
def : Pat<(nxv4i32 (bitconvert (nxv8i16 ZPR:$src))), (nxv4i32 ZPR:$src)>;		def : Pat<(nxv4i32 (bitconvert (nxv8i16 ZPR:$src))), (nxv4i32 ZPR:$src)>;
def : Pat<(nxv4i32 (bitconvert (nxv2i64 ZPR:$src))), (nxv4i32 ZPR:$src)>;		def : Pat<(nxv4i32 (bitconvert (nxv2i64 ZPR:$src))), (nxv4i32 ZPR:$src)>;
def : Pat<(nxv4i32 (bitconvert (nxv8f16 ZPR:$src))), (nxv4i32 ZPR:$src)>;		def : Pat<(nxv4i32 (bitconvert (nxv8f16 ZPR:$src))), (nxv4i32 ZPR:$src)>;
def : Pat<(nxv4i32 (bitconvert (nxv4f32 ZPR:$src))), (nxv4i32 ZPR:$src)>;		def : Pat<(nxv4i32 (bitconvert (nxv4f32 ZPR:$src))), (nxv4i32 ZPR:$src)>;
def : Pat<(nxv4i32 (bitconvert (nxv2f64 ZPR:$src))), (nxv4i32 ZPR:$src)>;		def : Pat<(nxv4i32 (bitconvert (nxv2f64 ZPR:$src))), (nxv4i32 ZPR:$src)>;

def : Pat<(nxv2i64 (bitconvert (nxv16i8 ZPR:$src))), (nxv2i64 ZPR:$src)>;		def : Pat<(nxv2i64 (bitconvert (nxv16i8 ZPR:$src))), (nxv2i64 ZPR:$src)>;
def : Pat<(nxv2i64 (bitconvert (nxv8i16 ZPR:$src))), (nxv2i64 ZPR:$src)>;		def : Pat<(nxv2i64 (bitconvert (nxv8i16 ZPR:$src))), (nxv2i64 ZPR:$src)>;
def : Pat<(nxv2i64 (bitconvert (nxv4i32 ZPR:$src))), (nxv2i64 ZPR:$src)>;		def : Pat<(nxv2i64 (bitconvert (nxv4i32 ZPR:$src))), (nxv2i64 ZPR:$src)>;
def : Pat<(nxv2i64 (bitconvert (nxv8f16 ZPR:$src))), (nxv2i64 ZPR:$src)>;		def : Pat<(nxv2i64 (bitconvert (nxv8f16 ZPR:$src))), (nxv2i64 ZPR:$src)>;
def : Pat<(nxv2i64 (bitconvert (nxv4f32 ZPR:$src))), (nxv2i64 ZPR:$src)>;		def : Pat<(nxv2i64 (bitconvert (nxv4f32 ZPR:$src))), (nxv2i64 ZPR:$src)>;
def : Pat<(nxv2i64 (bitconvert (nxv2f64 ZPR:$src))), (nxv2i64 ZPR:$src)>;		def : Pat<(nxv2i64 (bitconvert (nxv2f64 ZPR:$src))), (nxv2i64 ZPR:$src)>;

def : Pat<(nxv8f16 (bitconvert (nxv16i8 ZPR:$src))), (nxv8f16 ZPR:$src)>;		def : Pat<(nxv8f16 (bitconvert (nxv16i8 ZPR:$src))), (nxv8f16 ZPR:$src)>;
[1,154 lines not shown]

llvm/test/CodeGen/AArch64/sve-fp-reciprocal.ll

This file was added.

				; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
				; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sve < %s | FileCheck %s

				; FDIV

				define <vscale x 8 x half> @fdiv_8f16(<vscale x 8 x half> %a, <vscale x 8 x half> %b) {
				; CHECK-LABEL: fdiv_8f16:
				; CHECK: // %bb.0:
				; CHECK-NEXT: ptrue p0.h
				; CHECK-NEXT: fdiv z0.h, p0/m, z0.h, z1.h
				; CHECK-NEXT: ret
				%fdiv = fdiv fast <vscale x 8 x half> %a, %b
				ret <vscale x 8 x half> %fdiv
				}

				define <vscale x 8 x half> @fdiv_recip_8f16(<vscale x 8 x half> %a, <vscale x 8 x half> %b) #0 {
				; CHECK-LABEL: fdiv_recip_8f16:
				; CHECK: // %bb.0:
				; CHECK-NEXT: frecpe z2.h, z1.h
				; CHECK-NEXT: frecps z3.h, z1.h, z2.h
				; CHECK-NEXT: fmul z2.h, z2.h, z3.h
				; CHECK-NEXT: frecps z1.h, z1.h, z2.h
				; CHECK-NEXT: fmul z1.h, z2.h, z1.h
				; CHECK-NEXT: fmul z0.h, z1.h, z0.h
				; CHECK-NEXT: ret
				%fdiv = fdiv fast <vscale x 8 x half> %a, %b
				ret <vscale x 8 x half> %fdiv
				}

				define <vscale x 4 x float> @fdiv_4f32(<vscale x 4 x float> %a, <vscale x 4 x float> %b) {
				; CHECK-LABEL: fdiv_4f32:
				; CHECK: // %bb.0:
				; CHECK-NEXT: ptrue p0.s
				; CHECK-NEXT: fdiv z0.s, p0/m, z0.s, z1.s
				; CHECK-NEXT: ret
				%fdiv = fdiv fast <vscale x 4 x float> %a, %b
				ret <vscale x 4 x float> %fdiv
				}

				define <vscale x 4 x float> @fdiv_recip_4f32(<vscale x 4 x float> %a, <vscale x 4 x float> %b) #0 {
				; CHECK-LABEL: fdiv_recip_4f32:
				; CHECK: // %bb.0:
				; CHECK-NEXT: frecpe z2.s, z1.s
				; CHECK-NEXT: frecps z3.s, z1.s, z2.s
				; CHECK-NEXT: fmul z2.s, z2.s, z3.s
				; CHECK-NEXT: frecps z1.s, z1.s, z2.s
				; CHECK-NEXT: fmul z1.s, z2.s, z1.s
				; CHECK-NEXT: fmul z0.s, z1.s, z0.s
				; CHECK-NEXT: ret
				%fdiv = fdiv fast <vscale x 4 x float> %a, %b
				ret <vscale x 4 x float> %fdiv
				}

				define <vscale x 2 x double> @fdiv_2f64(<vscale x 2 x double> %a, <vscale x 2 x double> %b) {
				; CHECK-LABEL: fdiv_2f64:
				; CHECK: // %bb.0:
				; CHECK-NEXT: ptrue p0.d
				; CHECK-NEXT: fdiv z0.d, p0/m, z0.d, z1.d
				; CHECK-NEXT: ret
				%fdiv = fdiv fast <vscale x 2 x double> %a, %b
				ret <vscale x 2 x double> %fdiv
				}

				define <vscale x 2 x double> @fdiv_recip_2f64(<vscale x 2 x double> %a, <vscale x 2 x double> %b) #0 {
				; CHECK-LABEL: fdiv_recip_2f64:
				; CHECK: // %bb.0:
				; CHECK-NEXT: frecpe z2.d, z1.d
				; CHECK-NEXT: frecps z3.d, z1.d, z2.d
				; CHECK-NEXT: fmul z2.d, z2.d, z3.d
				; CHECK-NEXT: frecps z3.d, z1.d, z2.d
				; CHECK-NEXT: fmul z2.d, z2.d, z3.d
				; CHECK-NEXT: frecps z1.d, z1.d, z2.d
				; CHECK-NEXT: fmul z1.d, z2.d, z1.d
				; CHECK-NEXT: fmul z0.d, z1.d, z0.d
				; CHECK-NEXT: ret
				%fdiv = fdiv fast <vscale x 2 x double> %a, %b
				ret <vscale x 2 x double> %fdiv
				}

				; FSQRT

				define <vscale x 8 x half> @fsqrt_8f16(<vscale x 8 x half> %a) {
				; CHECK-LABEL: fsqrt_8f16:
				; CHECK: // %bb.0:
				; CHECK-NEXT: ptrue p0.h
				; CHECK-NEXT: fsqrt z0.h, p0/m, z0.h
				; CHECK-NEXT: ret
				%fsqrt = call fast <vscale x 8 x half> @llvm.sqrt.nxv8f16(<vscale x 8 x half> %a)
				ret <vscale x 8 x half> %fsqrt
				}

				define <vscale x 8 x half> @fsqrt_recip_8f16(<vscale x 8 x half> %a) #0 {
				; CHECK-LABEL: fsqrt_recip_8f16:
				; CHECK: // %bb.0:
				; CHECK-NEXT: frsqrte z1.h, z0.h
				; CHECK-NEXT: ptrue p0.h
				; CHECK-NEXT: fmul z2.h, z1.h, z1.h
				; CHECK-NEXT: fcmeq p0.h, p0/z, z0.h, #0.0
				; CHECK-NEXT: frsqrts z2.h, z0.h, z2.h
				; CHECK-NEXT: fmul z1.h, z1.h, z2.h
				; CHECK-NEXT: fmul z2.h, z1.h, z1.h
				; CHECK-NEXT: frsqrts z2.h, z0.h, z2.h
				; CHECK-NEXT: fmul z1.h, z1.h, z2.h
				; CHECK-NEXT: fmul z1.h, z0.h, z1.h
				; CHECK-NEXT: sel z0.h, p0, z0.h, z1.h
				; CHECK-NEXT: ret
				%fsqrt = call fast <vscale x 8 x half> @llvm.sqrt.nxv8f16(<vscale x 8 x half> %a)
				ret <vscale x 8 x half> %fsqrt
				}

				define <vscale x 4 x float> @fsqrt_4f32(<vscale x 4 x float> %a) {
				; CHECK-LABEL: fsqrt_4f32:
				; CHECK: // %bb.0:
				; CHECK-NEXT: ptrue p0.s
				; CHECK-NEXT: fsqrt z0.s, p0/m, z0.s
				; CHECK-NEXT: ret
				%fsqrt = call fast <vscale x 4 x float> @llvm.sqrt.nxv4f32(<vscale x 4 x float> %a)
				ret <vscale x 4 x float> %fsqrt
				}

				define <vscale x 4 x float> @fsqrt_recip_4f32(<vscale x 4 x float> %a) #0 {
				; CHECK-LABEL: fsqrt_recip_4f32:
				; CHECK: // %bb.0:
				; CHECK-NEXT: frsqrte z1.s, z0.s
				; CHECK-NEXT: ptrue p0.s
				; CHECK-NEXT: fmul z2.s, z1.s, z1.s
				; CHECK-NEXT: fcmeq p0.s, p0/z, z0.s, #0.0
				; CHECK-NEXT: frsqrts z2.s, z0.s, z2.s
				; CHECK-NEXT: fmul z1.s, z1.s, z2.s
				; CHECK-NEXT: fmul z2.s, z1.s, z1.s
				; CHECK-NEXT: frsqrts z2.s, z0.s, z2.s
				; CHECK-NEXT: fmul z1.s, z1.s, z2.s
				; CHECK-NEXT: fmul z1.s, z0.s, z1.s
				; CHECK-NEXT: sel z0.s, p0, z0.s, z1.s
				; CHECK-NEXT: ret
				%fsqrt = call fast <vscale x 4 x float> @llvm.sqrt.nxv4f32(<vscale x 4 x float> %a)
				ret <vscale x 4 x float> %fsqrt
				}

				define <vscale x 2 x double> @fsqrt_2f64(<vscale x 2 x double> %a) {
				; CHECK-LABEL: fsqrt_2f64:
				; CHECK: // %bb.0:
				; CHECK-NEXT: ptrue p0.d
				; CHECK-NEXT: fsqrt z0.d, p0/m, z0.d
				; CHECK-NEXT: ret
				%fsqrt = call fast <vscale x 2 x double> @llvm.sqrt.nxv2f64(<vscale x 2 x double> %a)
				ret <vscale x 2 x double> %fsqrt
				}

				define <vscale x 2 x double> @fsqrt_recip_2f64(<vscale x 2 x double> %a) #0 {
				; CHECK-LABEL: fsqrt_recip_2f64:
				; CHECK: // %bb.0:
				; CHECK-NEXT: frsqrte z1.d, z0.d
				; CHECK-NEXT: ptrue p0.d
				; CHECK-NEXT: fmul z2.d, z1.d, z1.d
				; CHECK-NEXT: fcmeq p0.d, p0/z, z0.d, #0.0
				; CHECK-NEXT: frsqrts z2.d, z0.d, z2.d
				; CHECK-NEXT: fmul z1.d, z1.d, z2.d
				; CHECK-NEXT: fmul z2.d, z1.d, z1.d
				; CHECK-NEXT: frsqrts z2.d, z0.d, z2.d
				; CHECK-NEXT: fmul z1.d, z1.d, z2.d
				; CHECK-NEXT: fmul z2.d, z1.d, z1.d
				; CHECK-NEXT: frsqrts z2.d, z0.d, z2.d
				; CHECK-NEXT: fmul z1.d, z1.d, z2.d
				; CHECK-NEXT: fmul z1.d, z0.d, z1.d
				; CHECK-NEXT: sel z0.d, p0, z0.d, z1.d
				; CHECK-NEXT: ret
				%fsqrt = call fast <vscale x 2 x double> @llvm.sqrt.nxv2f64(<vscale x 2 x double> %a)
				ret <vscale x 2 x double> %fsqrt
				}

				declare <vscale x 2 x half> @llvm.sqrt.nxv2f16(<vscale x 2 x half>)
				declare <vscale x 4 x half> @llvm.sqrt.nxv4f16(<vscale x 4 x half>)
				declare <vscale x 8 x half> @llvm.sqrt.nxv8f16(<vscale x 8 x half>)
				declare <vscale x 2 x float> @llvm.sqrt.nxv2f32(<vscale x 2 x float>)
				declare <vscale x 4 x float> @llvm.sqrt.nxv4f32(<vscale x 4 x float>)
				declare <vscale x 2 x double> @llvm.sqrt.nxv2f64(<vscale x 2 x double>)

				attributes #0 = { "reciprocal-estimates"="all" }
