This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU] fcanonicalize elimination optimization
ClosedPublic

Authored by rampitec on Jul 10 2017, 11:49 AM.

Details

Summary

We are using multiplication by 1.0 to flush denormals and quiet sNaNs.
That is possible to omit this multiplication if source of the
fcanonicalize instruction is known to be flushed/quieted, i.e.
if it comes from another instruction known to do the normalization
and we are using IEEE mode to quiet sNaNs.

Diff Detail

Event Timeline

rampitec created this revision.Jul 10 2017, 11:49 AM
arsenm added inline comments.Jul 10 2017, 12:10 PM
lib/Target/AMDGPU/SIISelLowering.cpp
4639

fcanonicalize should have the idempotent optimization applied so there shouldn't be a need to handle it

4663

We don't handle this

4666–4669

The output is never flushed anywhere?

4670–4671

We don't handle these

4672–4673

I don't think this is true, but should have a named check in the subtarget

4679–4681

We constant fold this already so this shouldn't be necessary

rampitec added inline comments.Jul 10 2017, 12:21 PM
lib/Target/AMDGPU/SIISelLowering.cpp
4639

It is not:

v_mul_f32_e32 v2, 1.0, v2
v_mul_f32_e32 v2, 1.0, v2
4663

We do not handle this now, but may handle later. The idea is still the same as with FSIN and FCOS.

4666–4669

GFX9 flushes.

4670–4671

This again only now. I had a use of it in HSAIL and may actually port it.

4672–4673

I would rather think about denorm support flag in TD for every single instruction wrt subtarget. Why add just a single one?

4679–4681

This is necessary. We may constant fold it to remove fcanonicalize on a constant, but consider:

canonicalize(fmax(x, 0.0f));

Without above code it would not be handled because we have to look though arguments of min/max/fneg/fabs. I have a test for it actually.

In particular it is needed to fold an extremely common case like max(max, ...).

rampitec added inline comments.Jul 10 2017, 12:40 PM
lib/Target/AMDGPU/SIISelLowering.cpp
4639

And I do not think there are idempotent SDNodes.

arsenm added inline comments.Jul 10 2017, 12:53 PM
lib/Target/AMDGPU/SIISelLowering.cpp
4639

I don't think it's done now, but fcanonicalize (fcanonicalize x) should fold to just one canonicalize

4666–4669

That is broken. If that is the case we probably shouldn't be using the regular minnum/maxnum intrinsics without denormals, and then it's an optimization to fold canonicalize (min/max) to the weird target behavior.

4672–4673

We don't need a full fledged subtarget feature, just put the generation check in a function with name/description rather than adding more random looking generation checks

rampitec added inline comments.Jul 10 2017, 1:10 PM
lib/Target/AMDGPU/SIISelLowering.cpp
4639

...and that is what I am doing here.

4666–4669

The library is common for different targets.

4672–4673

I'm not sure I follow. Could you please describe a name of such check?

rampitec updated this revision to Diff 105910.Jul 10 2017, 1:23 PM

Rebase to master.
Updated test for commuted instruction after latest changes.

arsenm added inline comments.Jul 10 2017, 1:32 PM
lib/Target/AMDGPU/SIISelLowering.cpp
4666–4669

We need to fix that then. If it's not returning exactly one of the inputs it's not implementing the IEEE min/max

4672–4673

hasAddr64() or hasMed3_16()

rampitec added inline comments.Jul 10 2017, 1:34 PM
lib/Target/AMDGPU/SIISelLowering.cpp
4639

In fact even if a general folding is implemented in DAGCombiner, I believe it shall stay here as well. It is the same idea as with constants, something else which also normalizes can be combined with it.

rampitec added inline comments.Jul 10 2017, 1:36 PM
lib/Target/AMDGPU/SIISelLowering.cpp
4672–4673

hasNormalizingMinMax()? It returns us to the initial point.

rampitec added inline comments.Jul 10 2017, 6:32 PM
lib/Target/AMDGPU/SIISelLowering.cpp
4666–4669

In fact IEEE754-2008 (as described for maxnum/minnum llvm ir) tells to return canonicalized numbers, so it is the opposite: GFX9 is finally IEEE compliant.

rampitec updated this revision to Diff 106062.Jul 11 2017, 10:34 AM

GFX9 handles denorms in min/max instructions, although its behavior in respect to signaling NaN handling is not identical if they are quieted or not before the instruction. That is still possible to perform GFX9 specific part of min/max handling given non-nans flag.
However, given the controversy and that GFX9 part of the change is non-essential I have removed specialization here, replacing it with todo. It can be done separately when needed.

rampitec marked 12 inline comments as done.Jul 11 2017, 10:58 AM
rampitec added inline comments.
lib/Target/AMDGPU/SIISelLowering.cpp
4666–4669

This part is split-off the patch.

arsenm added inline comments.Jul 11 2017, 11:23 AM
lib/Target/AMDGPU/SIISelLowering.cpp
4637

Since FMAD always flushes I don't think it's OK to handle it

4672–4673

The SC name was SupportsMinMaxDenormModes. Either way works

4704–4707

This is really a check for whether SNaNs are enabled. We have the isKnownNeverSNan check for this already. I think we need to fix having separate FPExceptions and enableIEEEBit subtarget checks

test/CodeGen/AMDGPU/fcanonicalize-elimination.ll
3

This needs a run line for gfx9 with denormals disabled too

rampitec marked 4 inline comments as done.Jul 11 2017, 1:22 PM
rampitec added inline comments.
lib/Target/AMDGPU/SIISelLowering.cpp
4637

FMAD - Perform a * b + c, while getting the same result as the separately rounded operations.
It is essentially v_mac_f32 which always flushes, but if denorms are disabled it is lowered as fma.
So handling it here is correct, and there is a test test_fold_canonicalize_fmuladd_value_f32 for it.

4672–4673

This code part is dropped for now.

rampitec updated this revision to Diff 106086.Jul 11 2017, 1:24 PM
rampitec marked an inline comment as done.

Switched to use of isKnownNeverSNan() and added test for it.
Added test run line for GFX9 without denorms.

arsenm accepted this revision.Jul 12 2017, 1:41 PM

LGTM. Can you file a bug for the broken minnum/maxnum denormal handling

lib/Target/AMDGPU/SIISelLowering.cpp
4707

This would read more accurately as IsIEEEMode || isKnownNeverSNan.

This revision is now accepted and ready to land.Jul 12 2017, 1:41 PM
rampitec updated this revision to Diff 106298.Jul 12 2017, 1:47 PM
rampitec marked an inline comment as done.

Reorganized IEEE/knowNeverSNan condition per review.

This revision was automatically updated to reflect the committed changes.

LGTM. Can you file a bug for the broken minnum/maxnum denormal handling

Bug filed.