Add more checks for potential SNaN or QNaN input together with DX10Clamp
and IEEE flags in combines that involve floating point min, max, med3 and
clamp in order to get equivalent instruction for each input.
Details
Diff Detail
Event Timeline
I think it's a futile to expect correct behavior with IEEE=0. No other operation behaves correctly with respect to snan quieting, and we're not going to implement all of the necessary manual quieting (i.e. isKnownNeverSNaN is always wrong since the operations that quiet don't really)
llvm/lib/Target/AMDGPU/SIISelLowering.cpp | ||
---|---|---|
9816 | Missing the | |
9822–9823 | Should not need to do both of these checks | |
9825 | Should short circuit expensive check based on IEEE mode |
llvm/lib/Target/AMDGPU/SIISelLowering.cpp | ||
---|---|---|
9819 | Is there a good reason to keep flags from the inner node, here, but not on line 9775? | |
9824–9825 | You're assuming that Info->getMode().DX10Clamp affects the behaviour of AMDGPUISD::CLAMP which I think is OK, though it would be nice if that was clearly documented (I can't understand the comment about dx10_enable in AMDGPUISelLowering.h). You're also assuming that Info->getMode().IEEE affects the behaviour of the Op0 node which could be ISD::FMAXNUM or ISD::FMAXNUM_IEEE or FMAX_LEGACY. This seems wrong to me. | |
9917 | Typo "is NaN". | |
9933 | Typo "remaining". | |
9959 | You're assuming that the behaviour of AMDGPUISD::FMED3 depends on Info->getMode().IEEE, which seems reasonable. | |
9961 | A constant could be a NaN, but isKnownNeverNaN should already have handled the constant case. So I don't think you need the check for isa<ConstantFPSDNode>(X) at all. | |
9961 | Why do you keep flags here but not on line 9925? | |
9961 | Typo "depending". |
Fix typos.
llvm/lib/Target/AMDGPU/SIISelLowering.cpp | ||
---|---|---|
9824–9825 | Check for Info->getMode().IEEE only for ISD::FMAXNUM. This node is known never SNaN when IEEE is off. | |
llvm/test/CodeGen/AMDGPU/clamp.ll | ||
420 | isa<ConstantFPSDNode>(X) check in min max combine is there to cover existing test where X is NaN constant, min max will be folded to clamp and clamp will become constant in performClampCombine. |
llvm/lib/Target/AMDGPU/SIISelLowering.cpp | ||
---|---|---|
9824–9825 | As I understand it, the behaviour of ISD::FMAXNUM should not depend on the IEEE mode. (Yes we currently codegen it to an instruction that behaves differently depending on the IEEE mode bit, but that's a bug that could be fixed, if anyone cared enough to fix it.) | |
llvm/test/CodeGen/AMDGPU/clamp.ll | ||
420 | This test is wrong then, isn't it? It's checking that fmed3(0,1,snan) -> 0, with DX10Clamp=1 and IEEE=1. But that's the wrong answer. fmed3(0,1,snan) is qnan in IEEE mode. This test is relying on an incorrect folding from fmed3 to clamp. Maybe the test should be changed to put the 0 last, i.e. fmed3(snan,1,0) ? |
Respect description of ISD::FMAXNUM and ISD::FMAXNUM_IEEE (their behavior is not affected by IEEE mode).
Legalizer will deal with use of FMAXNUM when IEEE=true (makes sure that potential SNaN input gets quieted) and we can try to fold FMAXNUM_IEEE later.
Most of the test changes come from clamp/fmed on top of fcanonicalize with IEEE=true.
We should probably check if ISD::FMAX... matches IEEE mode when folding min3 and max3 since these also depend on IEEE mode.
However this brings many regressions. For some reason in combine round after legalization nodes are not visited starting from end of the function but kind of random.For example in min(max(max(.., ..), 0.0) 1.0) middle max gets visited first and it folds into max3 and we miss clamp fold since min was not visited first.
All of this could be fixed with some test changes.
Currently, there no clean way to represent ISD::FMAXNUM_IEEE in IR. It can only be legalized from ISD::FMAXNUM by quieting its input.
I assume that producer of function with IEEE=true would desire llvm.maxnum_ieee.* (TODO introduce this in IR) instead of llvm.maxnum.* for IR floating point max.
This would make intention of tests that make use of isKnownNeverSNaN more clear since currently ISD::FMAXNUM_IEEE can't have SNaN input because it can only be produced from ISD::FMAXNUM.
llvm/lib/Target/AMDGPU/SIISelLowering.cpp | ||
---|---|---|
9847 | This patch fixes clamp and fmed3 folds that could be incorrect with IEEE=true(potential not-silenced SNaN inputs). |
Missing the