According to langref, llvm.maximum/minimum has -0.0 < +0.0 semantics and propagates NaN.
Expand the nodes on targets not supporting the operation, by adding extra check for NaN and using is_fpclass to check zero signs.
Paths
| Differential D158053
[Legalizer] Expand fmaximum and fminimum AbandonedPublic Authored by qiucf on Aug 15 2023, 11:30 PM.
Details
Summary According to langref, llvm.maximum/minimum has -0.0 < +0.0 semantics and propagates NaN. Expand the nodes on targets not supporting the operation, by adding extra check for NaN and using is_fpclass to check zero signs.
Diff Detail
Event TimelineComment Actions NaN handling is not the only difference between these intrinsics, they also have different signed zero semantics. Also, I really don't think another target should implement custom lowering for this without adding generic legalization support first. Please add generic legalization first, and if that's not optimal for ppc, add custom lowering. It makes no sense that every target is re-implementing this using custom lowering right now. Comment Actions This should be done as a generic legalization and is missing the -0.0 < +0.0 semantic detailed here: https://llvm.org/docs/LangRef.html#llvm-minimum-intrinsic This revision now requires changes to proceed.Aug 16 2023, 6:12 AM qiucf retitled this revision from [PowerPC] Implement llvm.maximum/minimum intrinsic to [Legalizer] Expand fmaximum and fminimum. Comment ActionsMove to generic expansion. Some targets do not set legal status of fmaximum and fminimum properly, so after expansion added, legalizer goes into the expansion path. RISCV looks suspicious so I marked them as legal, but uncertain for ARM. Also this call needs to be added into VectorLegalizer::Expand, which will make large case change, I think that's also because of the wrong action setting of fminimum/fmaximum. PowerPC codegen is not optimal on VSX targets, further patch will be provided to optimize it. Herald added subscribers: wangpc, luke, sunshaoce and 23 others. · View Herald TranscriptAug 18 2023, 3:15 AM Comment Actions I think we should either define the existing FMINNUM_IEEE/FMAXNUM_IEEE to have the correct IEEE 2019 signed zero ordering (I can't name a target that doesn't have this behavior), or we have to add a pair of DAG nodes that do
Comment Actions I don't believe any of the existing Arm/AArch64 tests should change with this patch. The SVE test looks like it has just regenerated check lines without any changes? Can you include this change to make the operations always legal when we have NEON, It will be better than expanding them: diff --git a/llvm/lib/Target/ARM/ARMISelLowering.cpp b/llvm/lib/Target/ARM/ARMISelLowering.cpp index 80a147666094..27f12f3a1cc8 100644 --- a/llvm/lib/Target/ARM/ARMISelLowering.cpp +++ b/llvm/lib/Target/ARM/ARMISelLowering.cpp @@ -1543,15 +1543,11 @@ ARMTargetLowering::ARMTargetLowering(const TargetMachine &TM, if (Subtarget->hasNEON()) { // vmin and vmax aren't available in a scalar form, so we can use - // a NEON instruction with an undef lane instead. This has a performance - // penalty on some cores, so we don't do this unless we have been - // asked to by the core tuning model. - if (Subtarget->useNEONForSinglePrecisionFP()) { - setOperationAction(ISD::FMINIMUM, MVT::f32, Legal); - setOperationAction(ISD::FMAXIMUM, MVT::f32, Legal); - setOperationAction(ISD::FMINIMUM, MVT::f16, Legal); - setOperationAction(ISD::FMAXIMUM, MVT::f16, Legal); - } + // a NEON instruction with an undef lane instead. + setOperationAction(ISD::FMINIMUM, MVT::f32, Legal); + setOperationAction(ISD::FMAXIMUM, MVT::f32, Legal); + setOperationAction(ISD::FMINIMUM, MVT::f16, Legal); + setOperationAction(ISD::FMAXIMUM, MVT::f16, Legal); setOperationAction(ISD::FMINIMUM, MVT::v2f32, Legal); setOperationAction(ISD::FMAXIMUM, MVT::v2f32, Legal); setOperationAction(ISD::FMINIMUM, MVT::v4f32, Legal);
qiucf marked 2 inline comments as done. Comment Actions
Some targets have instructions legal for fminnum_ieee. It's better to add new one.
Comment Actions
I thought AMDGPU was the only one using it, but I see PPC and Loongarch have started. AMDGPU definitely has the correct signed zero behavior, so do the other 2? Comment Actions
PowerPC VSX max/min instruction (1) respects zero sign ordering; (2) max(QNaN, non-NaN) = non-NaN; (3) max(SNaN, *) = QNaN, so it matches fmaximum_ieee semantics. (Page 735 Power ISA 3.1). LoongArch ISA says 'the operation of these two instructions follows the specification of maxNum(x,y) operation in the IEEE 754-2008 standard', so I think it is also right. (CC @SixWeining ) It seems no way to get fmaximum_ieee/fminimum_ieee through IR?
Revision Contents
Diff 552239 llvm/include/llvm/CodeGen/TargetLowering.h
llvm/lib/CodeGen/SelectionDAG/LegalizeDAG.cpp
llvm/lib/CodeGen/SelectionDAG/LegalizeVectorOps.cpp
llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
llvm/lib/Target/ARM/ARMISelLowering.cpp
llvm/lib/Target/RISCV/RISCVISelLowering.cpp
llvm/test/CodeGen/ARM/minnum-maxnum-intrinsics.ll
llvm/test/CodeGen/PowerPC/fminimum-fmaximum-f128.ll
llvm/test/CodeGen/PowerPC/fminimum-fmaximum.ll
|
Are we expected to return *a* NaN or the specific NaN value of one of the arguments?
APfloat's implementation returns one of the argument values.
X86ISelLowering.cpp does the same.