This is an archive of the discontinued LLVM Phabricator instance.

AMDGPU/GlobalISel: Calculate isKnownNeverNaN for fminnum and fmaxnum
ClosedPublic

Authored by Petar.Avramovic on Nov 18 2020, 8:29 AM.

Details

Summary

Implements same logis as in SelectionDAG.
G_FMINNUM_IEEE and G_FMAXNUM_IEEE are never SNaN by definition and
never NaN when one operand is known non-NaN and other known non-SNaN.
G_FMINNUM_IEEE and G_FMAXNUM_IEEE are never NaN/SNaN when one of the
operands is known non-NaN/SNaN.

Diff Detail

Event Timeline

Herald added a project: Restricted Project. · View Herald TranscriptNov 18 2020, 8:29 AM
Petar.Avramovic requested review of this revision.Nov 18 2020, 8:29 AM
arsenm added inline comments.Nov 18 2020, 8:32 AM
llvm/lib/CodeGen/GlobalISel/Utils.cpp
533–539

Generic opcodes should not have target dependent semantics like this, especially since these already have semantics inherited from the IR/C fmin/fmax definition. They return the non-nan argument regardless of snan or not

llvm/lib/CodeGen/GlobalISel/Utils.cpp
537–538

Can this then be return true?

arsenm added inline comments.Nov 18 2020, 8:37 AM
llvm/lib/CodeGen/GlobalISel/Utils.cpp
537–538

No, it depends on the inputs. You can refer to the existing SelectionDAG::isKnownNeverNaN:

// Only one needs to be known not-nan, since it will be returned if the
// other ends up being one.
return isKnownNeverNaN(Op.getOperand(0), SNaN, Depth + 1) ||
       isKnownNeverNaN(Op.getOperand(1), SNaN, Depth + 1);
llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-fminnum.mir
823
return isKnownNeverNaN(Op.getOperand(0), SNaN, Depth + 1) ||
       isKnownNeverNaN(Op.getOperand(1), SNaN, Depth + 1);
}

This is legalized first. %2 is isKnownNeverSNaN with IEEE = true, but since inputs for %2:_(s32) = G_FMAXNUM %0, %1 are not yet canonicalized isKnownNeverSNaN will fail and canonicalize is inserted. However since we know what will happen with '%2:_(s32) = G_FMAXNUM %0, %1 we declare it isKnownNeverSNaN to not depend on order of legalization.

arsenm added inline comments.Nov 18 2020, 8:50 AM
llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-fminnum.mir
823

The legalization order doesn't matter. These operations have their own independent semantics. isKnownNeverNaN needs to understand both pairs

llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-fminnum.mir
818–820

Using only that formula this will not change and we are left with unnecessary G_FCANONICALIZE(G_FMAXNUM_IEEE) which stops us from performing further combines both from td file and combiner. Do you have other suggestion how to avoid making G_FCANONICALIZE?

823

How can this be done?

The legalization order doesn't matter.

Formula above gives different result depending if input was legalized already or not. It would work if we would legalize uses first.

arsenm added inline comments.Nov 18 2020, 9:08 AM
llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-fminnum.mir
823

The same way SelectionDAG handles this:

case ISD::FMINNUM_IEEE:
case ISD::FMAXNUM_IEEE: {
  if (SNaN)
    return true;
  // This can return a NaN if either operand is an sNaN, or if both operands
  // are NaN.
  return (isKnownNeverNaN(Op.getOperand(0), false, Depth + 1) &&
          isKnownNeverSNaN(Op.getOperand(1), Depth + 1)) ||
         (isKnownNeverNaN(Op.getOperand(1), false, Depth + 1) &&
          isKnownNeverSNaN(Op.getOperand(0), Depth + 1));
}
llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-fminnum.mir
823

I meant for FMINNUM, writing same code as in SDAG doesn't deal with unnecessary canonicalize. FMINNUM_IEEE is the same (there is only if (SNaN) part atm).

arsenm added inline comments.Nov 18 2020, 9:31 AM
llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-fminnum.mir
823

The lowering for fminnum to fminnum_ieee tries to avoid the canonicalize depending on the instruction/inputs. This is how SelectionDAG also does it. For this patch, it just needs to fully handle the correct semanics for the two pairs

llvm/lib/CodeGen/GlobalISel/Utils.cpp
533–539

Generic opcodes should not have target dependent semantics like this, especially since these already have semantics inherited from the IR/C fmin/fmax definition.

I would say that only "is Known Never any NaN" is given. This patch does not change that (isKnownNeverNaN(SNaN = false)).

They return the non-nan argument regardless of snan or not

still stands. We are just aware of the way we lower instruction and use it in isKnownNeverSNaN only. This is only used for efficiency to avoid having to quiet "known non SNaN" into instruction that is about to be lowered.

537–538

I don't think

return isKnownNeverNaN(DefMI->getOperand(1).getReg(), MRI, SNaN) ||
       isKnownNeverNaN(DefMI->getOperand(2).getReg(), MRI, SNaN);

is enough for globalisel since it does not look at the whole picture. The formula above works when target naturally has FMINNUM instruction and knowing the inputs gives conclusion about the output (one input is not nan result is not nan). This would be generic formula (FIXME, use this formula in isKnownNeverNaNForMI instead of return false).
But subtarget with IEEE=true does not have FMINNUM and has to lower it to FMINNUM_IEEE. This gives additional critical piece of information FMINNUM is never SNaN with IEEE=true.
Are test changes correct?

llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-fminnum.mir
823

About examining the input: We are likely examining inputs in the middle of the legalization. With IEEE=true fminnum will never be the input but fminnum_ieee. Recursively asking for fminnum's inputs is not correct in this context since (maybe quieted)inputs will go to fminnum_ieee (its result is never SNaN) not fminnum and this is target dependent.
I don't see how this breaks correct semantics.

arsenm added inline comments.Nov 18 2020, 11:45 AM
llvm/lib/CodeGen/GlobalISel/Utils.cpp
537–538

The single instruction is the whole picture and does not depend on the target. These have defined, universal NAN handling semantics regardless of however they are lowered. This patch should only aim to faithfully implement these semantics. Ensuring we don't have redundant quieting canonicalizes is an optimization beyond this patch.

As an optimization, the expansion to the IEEE version tries to avoid inserting a quiet, which mostly works well enough. We are missing some optimization hints. I have long wished that we had separate no-qnan and no-snan fast math flags (as well as renaming the IR intrinsics to match the real fmin/fmax semantics, and introduce a new pair for the IEEE-758 2008 semantics)

reverse ping

Petar.Avramovic edited the summary of this revision. (Show Details)

Removing target dependent semantics.

llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-fmaxnum.mir
721

Thoughts on adding

%0:_(s32) = G_FMINNUM_IEEE/G_FMAXNUM_IEEE ...
%1:_(s32) = G_FCANONICALIZE %0
->
%0:_(s32) = G_FMINNUM_IEEE/G_FMAXNUM_IEEE ...
%1:_(s32) = COPY %0

combine post legalizer?

arsenm added inline comments.Dec 18 2020, 9:36 AM
llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-fmaxnum.mir
721

Yes, redundant canonicalization elimination belongs post legalizer

arsenm accepted this revision.Feb 12 2021, 7:40 AM
This revision is now accepted and ready to land.Feb 12 2021, 7:40 AM