We are using multiplication by 1.0 to flush denormals and quiet sNaNs.
That is possible to omit this multiplication if source of the
fcanonicalize instruction is known to be flushed/quieted, i.e.
if it comes from another instruction known to do the normalization
and we are using IEEE mode to quiet sNaNs.
Details
Diff Detail
- Repository
- rL LLVM
Event Timeline
lib/Target/AMDGPU/SIISelLowering.cpp | ||
---|---|---|
4638 ↗ | (On Diff #105896) | fcanonicalize should have the idempotent optimization applied so there shouldn't be a need to handle it |
4662 ↗ | (On Diff #105896) | We don't handle this |
4665–4668 ↗ | (On Diff #105896) | The output is never flushed anywhere? |
4669–4670 ↗ | (On Diff #105896) | We don't handle these |
4671–4672 ↗ | (On Diff #105896) | I don't think this is true, but should have a named check in the subtarget |
4678–4680 ↗ | (On Diff #105896) | We constant fold this already so this shouldn't be necessary |
lib/Target/AMDGPU/SIISelLowering.cpp | ||
---|---|---|
4638 ↗ | (On Diff #105896) | It is not: v_mul_f32_e32 v2, 1.0, v2 v_mul_f32_e32 v2, 1.0, v2 |
4662 ↗ | (On Diff #105896) | We do not handle this now, but may handle later. The idea is still the same as with FSIN and FCOS. |
4665–4668 ↗ | (On Diff #105896) | GFX9 flushes. |
4669–4670 ↗ | (On Diff #105896) | This again only now. I had a use of it in HSAIL and may actually port it. |
4671–4672 ↗ | (On Diff #105896) | I would rather think about denorm support flag in TD for every single instruction wrt subtarget. Why add just a single one? |
4678–4680 ↗ | (On Diff #105896) | This is necessary. We may constant fold it to remove fcanonicalize on a constant, but consider: canonicalize(fmax(x, 0.0f)); Without above code it would not be handled because we have to look though arguments of min/max/fneg/fabs. I have a test for it actually. In particular it is needed to fold an extremely common case like max(max, ...). |
lib/Target/AMDGPU/SIISelLowering.cpp | ||
---|---|---|
4638 ↗ | (On Diff #105896) | And I do not think there are idempotent SDNodes. |
lib/Target/AMDGPU/SIISelLowering.cpp | ||
---|---|---|
4638 ↗ | (On Diff #105896) | I don't think it's done now, but fcanonicalize (fcanonicalize x) should fold to just one canonicalize |
4665–4668 ↗ | (On Diff #105896) | That is broken. If that is the case we probably shouldn't be using the regular minnum/maxnum intrinsics without denormals, and then it's an optimization to fold canonicalize (min/max) to the weird target behavior. |
4671–4672 ↗ | (On Diff #105896) | We don't need a full fledged subtarget feature, just put the generation check in a function with name/description rather than adding more random looking generation checks |
lib/Target/AMDGPU/SIISelLowering.cpp | ||
---|---|---|
4638 ↗ | (On Diff #105896) | ...and that is what I am doing here. |
4665–4668 ↗ | (On Diff #105896) | The library is common for different targets. |
4671–4672 ↗ | (On Diff #105896) | I'm not sure I follow. Could you please describe a name of such check? |
lib/Target/AMDGPU/SIISelLowering.cpp | ||
---|---|---|
4665–4668 ↗ | (On Diff #105896) | We need to fix that then. If it's not returning exactly one of the inputs it's not implementing the IEEE min/max |
4671–4672 ↗ | (On Diff #105896) | hasAddr64() or hasMed3_16() |
lib/Target/AMDGPU/SIISelLowering.cpp | ||
---|---|---|
4638 ↗ | (On Diff #105896) | In fact even if a general folding is implemented in DAGCombiner, I believe it shall stay here as well. It is the same idea as with constants, something else which also normalizes can be combined with it. |
lib/Target/AMDGPU/SIISelLowering.cpp | ||
---|---|---|
4671–4672 ↗ | (On Diff #105896) | hasNormalizingMinMax()? It returns us to the initial point. |
lib/Target/AMDGPU/SIISelLowering.cpp | ||
---|---|---|
4665–4668 ↗ | (On Diff #105896) | In fact IEEE754-2008 (as described for maxnum/minnum llvm ir) tells to return canonicalized numbers, so it is the opposite: GFX9 is finally IEEE compliant. |
GFX9 handles denorms in min/max instructions, although its behavior in respect to signaling NaN handling is not identical if they are quieted or not before the instruction. That is still possible to perform GFX9 specific part of min/max handling given non-nans flag.
However, given the controversy and that GFX9 part of the change is non-essential I have removed specialization here, replacing it with todo. It can be done separately when needed.
lib/Target/AMDGPU/SIISelLowering.cpp | ||
---|---|---|
4665–4668 ↗ | (On Diff #105896) | This part is split-off the patch. |
lib/Target/AMDGPU/SIISelLowering.cpp | ||
---|---|---|
4636 ↗ | (On Diff #106062) | Since FMAD always flushes I don't think it's OK to handle it |
4696–4699 ↗ | (On Diff #106062) | This is really a check for whether SNaNs are enabled. We have the isKnownNeverSNan check for this already. I think we need to fix having separate FPExceptions and enableIEEEBit subtarget checks |
4671–4672 ↗ | (On Diff #105896) | The SC name was SupportsMinMaxDenormModes. Either way works |
test/CodeGen/AMDGPU/fcanonicalize-elimination.ll | ||
2 ↗ | (On Diff #106062) | This needs a run line for gfx9 with denormals disabled too |
lib/Target/AMDGPU/SIISelLowering.cpp | ||
---|---|---|
4636 ↗ | (On Diff #106062) | FMAD - Perform a * b + c, while getting the same result as the separately rounded operations. |
4671–4672 ↗ | (On Diff #105896) | This code part is dropped for now. |
Switched to use of isKnownNeverSNan() and added test for it.
Added test run line for GFX9 without denorms.
LGTM. Can you file a bug for the broken minnum/maxnum denormal handling
lib/Target/AMDGPU/SIISelLowering.cpp | ||
---|---|---|
4706 ↗ | (On Diff #106086) | This would read more accurately as IsIEEEMode || isKnownNeverSNan. |