This is an archive of the discontinued LLVM Phabricator instance.

AMDGPU/SelectionDAG Check for NaN, DX10Clamp and IEEE in fmed3 combine
Needs ReviewPublic

Authored by Petar.Avramovic on Sep 30 2020, 7:39 AM.

Details

Reviewers
foad
arsenm
Summary

Add more checks for potential SNaN or QNaN input together with DX10Clamp
and IEEE flags in combines that involve floating point min, max, med3 and
clamp in order to get equivalent instruction for each input.

Diff Detail

Event Timeline

Herald added a project: Restricted Project. · View Herald TranscriptSep 30 2020, 7:39 AM
Petar.Avramovic requested review of this revision.Sep 30 2020, 7:39 AM

I think it's a futile to expect correct behavior with IEEE=0. No other operation behaves correctly with respect to snan quieting, and we're not going to implement all of the necessary manual quieting (i.e. isKnownNeverSNaN is always wrong since the operations that quiet don't really)

llvm/lib/Target/AMDGPU/SIISelLowering.cpp
9816

Missing the

9822–9823

Should not need to do both of these checks

9825

Should short circuit expensive check based on IEEE mode

Simplify logic in 'is safe to clamp' checks.

foad added inline comments.Oct 13 2020, 5:15 AM
llvm/lib/Target/AMDGPU/SIISelLowering.cpp
9819

Is there a good reason to keep flags from the inner node, here, but not on line 9775?

9824–9825

You're assuming that Info->getMode().DX10Clamp affects the behaviour of AMDGPUISD::CLAMP which I think is OK, though it would be nice if that was clearly documented (I can't understand the comment about dx10_enable in AMDGPUISelLowering.h).

You're also assuming that Info->getMode().IEEE affects the behaviour of the Op0 node which could be ISD::FMAXNUM or ISD::FMAXNUM_IEEE or FMAX_LEGACY. This seems wrong to me.

9917

Typo "is NaN".

9933

Typo "remaining".

9959

You're assuming that the behaviour of AMDGPUISD::FMED3 depends on Info->getMode().IEEE, which seems reasonable.

9961

A constant could be a NaN, but isKnownNeverNaN should already have handled the constant case. So I don't think you need the check for isa<ConstantFPSDNode>(X) at all.

9961

Why do you keep flags here but not on line 9925?

9961

Typo "depending".

Fix typos.

llvm/lib/Target/AMDGPU/SIISelLowering.cpp
9824–9825

Check for Info->getMode().IEEE only for ISD::FMAXNUM. This node is known never SNaN when IEEE is off.

llvm/test/CodeGen/AMDGPU/clamp.ll
420

isa<ConstantFPSDNode>(X) check in min max combine is there to cover existing test where X is NaN constant, min max will be folded to clamp and clamp will become constant in performClampCombine.

foad added inline comments.Oct 14 2020, 7:36 AM
llvm/lib/Target/AMDGPU/SIISelLowering.cpp
9824–9825

As I understand it, the behaviour of ISD::FMAXNUM should not depend on the IEEE mode. (Yes we currently codegen it to an instruction that behaves differently depending on the IEEE mode bit, but that's a bug that could be fixed, if anyone cared enough to fix it.)

llvm/test/CodeGen/AMDGPU/clamp.ll
420

This test is wrong then, isn't it? It's checking that fmed3(0,1,snan) -> 0, with DX10Clamp=1 and IEEE=1. But that's the wrong answer. fmed3(0,1,snan) is qnan in IEEE mode. This test is relying on an incorrect folding from fmed3 to clamp.

Maybe the test should be changed to put the 0 last, i.e. fmed3(snan,1,0) ?

Respect description of ISD::FMAXNUM and ISD::FMAXNUM_IEEE (their behavior is not affected by IEEE mode).
Legalizer will deal with use of FMAXNUM when IEEE=true (makes sure that potential SNaN input gets quieted) and we can try to fold FMAXNUM_IEEE later.
Most of the test changes come from clamp/fmed on top of fcanonicalize with IEEE=true.

We should probably check if ISD::FMAX... matches IEEE mode when folding min3 and max3 since these also depend on IEEE mode.
However this brings many regressions. For some reason in combine round after legalization nodes are not visited starting from end of the function but kind of random.For example in min(max(max(.., ..), 0.0) 1.0) middle max gets visited first and it folds into max3 and we miss clamp fold since min was not visited first.

All of this could be fixed with some test changes.
Currently, there no clean way to represent ISD::FMAXNUM_IEEE in IR. It can only be legalized from ISD::FMAXNUM by quieting its input.
I assume that producer of function with IEEE=true would desire llvm.maxnum_ieee.* (TODO introduce this in IR) instead of llvm.maxnum.* for IR floating point max.
This would make intention of tests that make use of isKnownNeverSNaN more clear since currently ISD::FMAXNUM_IEEE can't have SNaN input because it can only be produced from ISD::FMAXNUM.

llvm/lib/Target/AMDGPU/SIISelLowering.cpp
9847

This patch fixes clamp and fmed3 folds that could be incorrect with IEEE=true(potential not-silenced SNaN inputs).
@arsenm Folding ISD::FMAXNUM with IEEE=true into max3 here could be incorrect when one of the inputs is SNaN. There are no correctness issues with IEEE=false.
Thoughts on moving this combine after legalizer? We could also potentially move min3/max3 and min/max patterns with constants to td file so that global-isel can use them as well.