This is an archive of the discontinued LLVM Phabricator instance.

[Legalizer] Expand fmaximum and fminimum
AbandonedPublic

Authored by qiucf on Aug 15 2023, 11:30 PM.

Details

Reviewers
nemanjai
shchenz
stefanp
tlively
RKSimon
e-kud
craig.topper
jcranmer-intel
Group Reviewers
Restricted Project
Summary

According to langref, llvm.maximum/minimum has -0.0 < +0.0 semantics and propagates NaN.

Expand the nodes on targets not supporting the operation, by adding extra check for NaN and using is_fpclass to check zero signs.

Diff Detail

Event Timeline

qiucf created this revision.Aug 15 2023, 11:30 PM
Herald added a project: Restricted Project. · View Herald TranscriptAug 15 2023, 11:30 PM
qiucf requested review of this revision.Aug 15 2023, 11:30 PM
Herald added a project: Restricted Project. · View Herald TranscriptAug 15 2023, 11:30 PM
nikic added a subscriber: nikic.Aug 15 2023, 11:38 PM

NaN handling is not the only difference between these intrinsics, they also have different signed zero semantics.

Also, I really don't think another target should implement custom lowering for this without adding generic legalization support first. Please add generic legalization first, and if that's not optimal for ppc, add custom lowering. It makes no sense that every target is re-implementing this using custom lowering right now.

RKSimon requested changes to this revision.Aug 16 2023, 6:12 AM

This should be done as a generic legalization and is missing the -0.0 < +0.0 semantic detailed here: https://llvm.org/docs/LangRef.html#llvm-minimum-intrinsic

This revision now requires changes to proceed.Aug 16 2023, 6:12 AM
qiucf updated this revision to Diff 551451.Aug 18 2023, 3:15 AM
qiucf retitled this revision from [PowerPC] Implement llvm.maximum/minimum intrinsic to [Legalizer] Expand fmaximum and fminimum.
qiucf edited the summary of this revision. (Show Details)

Move to generic expansion.

Some targets do not set legal status of fmaximum and fminimum properly, so after expansion added, legalizer goes into the expansion path. RISCV looks suspicious so I marked them as legal, but uncertain for ARM.

Also this call needs to be added into VectorLegalizer::Expand, which will make large case change, I think that's also because of the wrong action setting of fminimum/fmaximum.

PowerPC codegen is not optimal on VSX targets, further patch will be provided to optimize it.

arsenm added a subscriber: arsenm.Aug 18 2023, 4:20 AM

I think we should either define the existing FMINNUM_IEEE/FMAXNUM_IEEE to have the correct IEEE 2019 signed zero ordering (I can't name a target that doesn't have this behavior), or we have to add a pair of DAG nodes that do

craig.topper added inline comments.Aug 18 2023, 7:49 AM
llvm/lib/Target/RISCV/RISCVISelLowering.cpp
441

Can you reverse the if and else here to remove the ! from the condition.

e-kud added inline comments.Aug 18 2023, 9:07 AM
llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
8183

Could it be simplified for maximum as follows?

Tmp = select(is_fpclass(PosZero, LHS), LHS, MinMax)
Res = select(is_fpclass(PosZero, RHS), RHS, Tmp)
MinMax = select(IsZero, Res, MinMax)

I don't believe any of the existing Arm/AArch64 tests should change with this patch. The SVE test looks like it has just regenerated check lines without any changes? Can you include this change to make the operations always legal when we have NEON, It will be better than expanding them:

diff --git a/llvm/lib/Target/ARM/ARMISelLowering.cpp b/llvm/lib/Target/ARM/ARMISelLowering.cpp
index 80a147666094..27f12f3a1cc8 100644
--- a/llvm/lib/Target/ARM/ARMISelLowering.cpp
+++ b/llvm/lib/Target/ARM/ARMISelLowering.cpp
@@ -1543,15 +1543,11 @@ ARMTargetLowering::ARMTargetLowering(const TargetMachine &TM,
 
   if (Subtarget->hasNEON()) {
     // vmin and vmax aren't available in a scalar form, so we can use
-    // a NEON instruction with an undef lane instead.  This has a performance
-    // penalty on some cores, so we don't do this unless we have been
-    // asked to by the core tuning model.
-    if (Subtarget->useNEONForSinglePrecisionFP()) {
-      setOperationAction(ISD::FMINIMUM, MVT::f32, Legal);
-      setOperationAction(ISD::FMAXIMUM, MVT::f32, Legal);
-      setOperationAction(ISD::FMINIMUM, MVT::f16, Legal);
-      setOperationAction(ISD::FMAXIMUM, MVT::f16, Legal);
-    }
+    // a NEON instruction with an undef lane instead.
+    setOperationAction(ISD::FMINIMUM, MVT::f32, Legal);
+    setOperationAction(ISD::FMAXIMUM, MVT::f32, Legal);
+    setOperationAction(ISD::FMINIMUM, MVT::f16, Legal);
+    setOperationAction(ISD::FMAXIMUM, MVT::f16, Legal);
     setOperationAction(ISD::FMINIMUM, MVT::v2f32, Legal);
     setOperationAction(ISD::FMAXIMUM, MVT::v2f32, Legal);
     setOperationAction(ISD::FMINIMUM, MVT::v4f32, Legal);
tra added a subscriber: tra.Aug 21 2023, 1:47 PM
tra added inline comments.Aug 21 2023, 2:13 PM
llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
8161

Are we expected to return *a* NaN or the specific NaN value of one of the arguments?
APfloat's implementation returns one of the argument values.
X86ISelLowering.cpp does the same.

qiucf updated this revision to Diff 552239.Aug 22 2023, 12:33 AM
qiucf marked 2 inline comments as done.

I think we should either define the existing FMINNUM_IEEE/FMAXNUM_IEEE to have the correct IEEE 2019 signed zero ordering (I can't name a target that doesn't have this behavior), or we have to add a pair of DAG nodes that do

Some targets have instructions legal for fminnum_ieee. It's better to add new one.

llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
8161

Does the standard specify if the exact NaN should be returned? Choosing one NaN will produce more complex code.

tra added inline comments.Aug 22 2023, 10:48 AM
llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
8161

I don't have access to the standard text. All I could find online are these notes https://grouper.ieee.org/groups/msc/ANSI_IEEE-Std-754-2019/background/minNum_maxNum_Removal_Demotion_v3.pdf oulining the state of the min/max and the recommeendations for the standard changes. They do seem to care about NaN associativity/commutativity, but I have no idea what exactly ended up specified by the standard.

I assume that the authors of the implementations in APFloat and X86 did implement it according to the standard.

@tlively -- would you happen to know the details?

tlively added inline comments.Aug 22 2023, 12:00 PM
llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
8161

Looked into it, and yes, the standard says (in section 6.2) that the output NaN should have the same payload as one of the input NaNs, explicitly not specifying which input NaN the payload is from if there are multiple. Furthermore, the output NaN must be canonical even if the input is a non-canonical NaN.

I think we should either define the existing FMINNUM_IEEE/FMAXNUM_IEEE to have the correct IEEE 2019 signed zero ordering (I can't name a target that doesn't have this behavior), or we have to add a pair of DAG nodes that do

Some targets have instructions legal for fminnum_ieee. It's better to add new one.

I thought AMDGPU was the only one using it, but I see PPC and Loongarch have started. AMDGPU definitely has the correct signed zero behavior, so do the other 2?

I think we should either define the existing FMINNUM_IEEE/FMAXNUM_IEEE to have the correct IEEE 2019 signed zero ordering (I can't name a target that doesn't have this behavior), or we have to add a pair of DAG nodes that do

Some targets have instructions legal for fminnum_ieee. It's better to add new one.

I thought AMDGPU was the only one using it, but I see PPC and Loongarch have started. AMDGPU definitely has the correct signed zero behavior, so do the other 2?

PowerPC VSX max/min instruction (1) respects zero sign ordering; (2) max(QNaN, non-NaN) = non-NaN; (3) max(SNaN, *) = QNaN, so it matches fmaximum_ieee semantics. (Page 735 Power ISA 3.1).

LoongArch ISA says 'the operation of these two instructions follows the specification of maxNum(x,y) operation in the IEEE 754-2008 standard', so I think it is also right. (CC @SixWeining )

It seems no way to get fmaximum_ieee/fminimum_ieee through IR?