This is an archive of the discontinued LLVM Phabricator instance.

AMDGPU: Try to make isKnownNeverSNan more accurate
AbandonedPublic

Authored by arsenm on Jul 19 2018, 11:14 AM.

Details

Summary

If I'm interpreting the standard correctly, everything
here was essentially incorrect.

Saying no if fp exceptions are not enabled, then no value is an snan is incorrect.
A loaded value could still be an snan. If FP exceptions
are enabled, certain operations will produce an snan.
Common operations won't produce an snan, and will quiet
an snan input.

Try to get the set of operations that would raise an exception if
they were to produce an snan more correct.

Diff Detail

Event Timeline

arsenm created this revision.Jul 19 2018, 11:14 AM

I would tend to say isKnownNeverSNan here basically tells not that it cannot be sNaN, but rather that we do not care even if it is. At least we should not care if FP exceptions are off.

rampitec added inline comments.Jul 19 2018, 12:48 PM
lib/Target/AMDGPU/SIISelLowering.cpp
6733

Do these quiet incoming sNaNs?

I would tend to say isKnownNeverSNan here basically tells not that it cannot be sNaN, but rather that we do not care even if it is. At least we should not care if FP exceptions are off.

Except that's not true since there can be sNaNs can be loaded, so even if exceptions aren't handled they need to be dealt with

lib/Target/AMDGPU/SIISelLowering.cpp
6733

That's my understanding of how the basic operations work

scanon added inline comments.Jul 19 2018, 2:41 PM
lib/Target/AMDGPU/SIISelLowering.cpp
6733

Yes, all computational operations quiet sNaNs. The only things that produce sNaN are fcopysign, fabs, (fneg would if we had it), and things like loads and bitcasts.

scanon added inline comments.Jul 19 2018, 2:42 PM
lib/Target/AMDGPU/SIISelLowering.cpp
6733

(and to be clear, fcopysign and fabs can only produce sNaN if their input is sNaN.)

I would tend to say isKnownNeverSNan here basically tells not that it cannot be sNaN, but rather that we do not care even if it is. At least we should not care if FP exceptions are off.

Except that's not true since there can be sNaNs can be loaded, so even if exceptions aren't handled they need to be dealt with

Do you mean we shall quiet snans with an fmin/fmax even if exception handling is off? My understanding was different. In fact I thought we can completely ignore them with say fast math.

At the very least it is knownNeverNan, then it cannot be a signaling nan as well.

I would tend to say isKnownNeverSNan here basically tells not that it cannot be sNaN, but rather that we do not care even if it is. At least we should not care if FP exceptions are off.

Except that's not true since there can be sNaNs can be loaded, so even if exceptions aren't handled they need to be dealt with

Do you mean we shall quiet snans with an fmin/fmax even if exception handling is off? My understanding was different. In fact I thought we can completely ignore them with say fast math.

Yes. My understanding is the exception handling aspect is different from the existence of sNaNs, and also orthogonal to enabling fast math. If exception handling is enabled, operations that can produce sNaNs will, but there can still be sNaNs loaded from memory, constants etc. I think if the function is marked no-nans, we can maybe assume this won't happen? The fast math flags aren't sufficient there because arbitrary operations like a load or function argument don't have an associated flag.

With ieee_mode enabled (which is currently the default) v_min_f32 et. al. a qNaN is returned if either input is an sNaN. If ieee_mode is off, it returns the non-NaN operand as-if it were a qNaN. I think for what we actually want, ieee_mode is harmful since it requires the library implementation of OpenCL's fmin to insert canonicalizes to quiet the inputs. Since LLVM doesn't properly handle sNaNs anywhere, I think enabling this is a bit pointless. However, whether it's on or not, I think we need to to get sNaN behavior correct at least for this one operation in order to be able to optimize out redundant canonicalizes.

arsenm updated this revision to Diff 156523.Jul 20 2018, 9:52 AM

Check ieee_mode in minnum/maxnum handling, although I think it is totally broken that this is necessary

First, I think this is wrong diff attached, that is not what is in the trunk on the left side of the diff now.

I would tend to say isKnownNeverSNan here basically tells not that it cannot be sNaN, but rather that we do not care even if it is. At least we should not care if FP exceptions are off.

Except that's not true since there can be sNaNs can be loaded, so even if exceptions aren't handled they need to be dealt with

Do you mean we shall quiet snans with an fmin/fmax even if exception handling is off? My understanding was different. In fact I thought we can completely ignore them with say fast math.

Yes. My understanding is the exception handling aspect is different from the existence of sNaNs, and also orthogonal to enabling fast math. If exception handling is enabled, operations that can produce sNaNs will, but there can still be sNaNs loaded from memory, constants etc. I think if the function is marked no-nans, we can maybe assume this won't happen? The fast math flags aren't sufficient there because arbitrary operations like a load or function argument don't have an associated flag.

With ieee_mode enabled (which is currently the default) v_min_f32 et. al. a qNaN is returned if either input is an sNaN. If ieee_mode is off, it returns the non-NaN operand as-if it were a qNaN. I think for what we actually want, ieee_mode is harmful since it requires the library implementation of OpenCL's fmin to insert canonicalizes to quiet the inputs. Since LLVM doesn't properly handle sNaNs anywhere, I think enabling this is a bit pointless. However, whether it's on or not, I think we need to to get sNaN behavior correct at least for this one operation in order to be able to optimize out redundant canonicalizes.

More general, we have several questions:

  1. Should we care about sNaNs with FP exceptions disabled?
  2. Should we care about sNaNs if a node is known never NaN?
  3. Should we care about sNaNs if no-nans is enabled?
  4. Should we care about sNaNs if fast-math is enabled?
  5. Should we turn ieee_mode off?

Let’s see:

  1. Description of setHasFloatingPointExceptions():
/// Tells the code generator that this target supports floating point
/// exceptions and cares about preserving floating point exception behavior.

I read it this way: if we say no here, we do not care about preserving floating point exception behavior. I.e. we may assume there are no sNaNs even if they are present.

  1. sNaN is a signaling NaN, so it is a NaN. If a node is known to be not a NaN, it cannot be its signaling form as well.
  2. -enable-no-nans-fp-math and derivatives:
/// NoNaNsFPMath - This flag is enabled when the
/// -enable-no-nans-fp-math flag is specified on the command line. When
/// this flag is off (the default), the code generator is not allowed to
/// assume the FP arithmetic arguments and results are never NaNs.

So then if that flag is on we are allowed to assume not only FP arithmetic results are never NaN, but also arguments. Even if they are and even if they are loaded or constant sNaN/NaN.

  1. UnsafeAlgebra includes NoNaNs and thus all of the above applies.
  2. I do not believe we have to disable ieee_mode.
    • Even min/max handling was changed few times with and without ieee_mode bit in different generations of our GPU, so that is not a universal solution.
    • Other fp operations will not quiet sNaNs as required if you disable it.

I believe we were exploring the idea to disable ieee in the past and found we will have more problems without it. With ieee we only have min/max problem.

I would tend to say isKnownNeverSNan here basically tells not that it cannot be sNaN, but rather that we do not care even if it is. At least we should not care if FP exceptions are off.

Except that's not true since there can be sNaNs can be loaded, so even if exceptions aren't handled they need to be dealt with

Do you mean we shall quiet snans with an fmin/fmax even if exception handling is off? My understanding was different. In fact I thought we can completely ignore them with say fast math.

Yes. My understanding is the exception handling aspect is different from the existence of sNaNs, and also orthogonal to enabling fast math. If exception handling is enabled, operations that can produce sNaNs will, but there can still be sNaNs loaded from memory, constants etc. I think if the function is marked no-nans, we can maybe assume this won't happen? The fast math flags aren't sufficient there because arbitrary operations like a load or function argument don't have an associated flag.

With ieee_mode enabled (which is currently the default) v_min_f32 et. al. a qNaN is returned if either input is an sNaN. If ieee_mode is off, it returns the non-NaN operand as-if it were a qNaN. I think for what we actually want, ieee_mode is harmful since it requires the library implementation of OpenCL's fmin to insert canonicalizes to quiet the inputs. Since LLVM doesn't properly handle sNaNs anywhere, I think enabling this is a bit pointless. However, whether it's on or not, I think we need to to get sNaN behavior correct at least for this one operation in order to be able to optimize out redundant canonicalizes.

More general, we have several questions:

  1. Should we care about sNaNs with FP exceptions disabled?

Yes. This has to work. The OpenCL conformance tests check for this

  1. Should we care about sNaNs if a node is known never NaN?

Knowing it's never nan is how you know you don't need to care about it

  1. Should we care about sNaNs if no-nans is enabled?

I think this is a grey area, but probably not.

  1. Should we care about sNaNs if fast-math is enabled?
  2. Should we turn ieee_mode off?

Let’s see:

  1. Description of setHasFloatingPointExceptions():
/// Tells the code generator that this target supports floating point
/// exceptions and cares about preserving floating point exception behavior.

I read it this way: if we say no here, we do not care about preserving floating point exception behavior. I.e. we may assume there are no sNaNs even if they are present.

This is not true. Assuming this will break the fmin/fmax conformance tests. This is now just something we always turn on. We don't enable trap on FP exception, but that doesn't mean that there aren't sNaNs somewhere that need to be handled correctly. This property also should probably be removed, since things are moving to relying on the STRICT_* versions of operations to get this behavior

  1. sNaN is a signaling NaN, so it is a NaN. If a node is known to be not a NaN, it cannot be its signaling form as well.

Yes

  1. -enable-no-nans-fp-math and derivatives:
/// NoNaNsFPMath - This flag is enabled when the
/// -enable-no-nans-fp-math flag is specified on the command line. When
/// this flag is off (the default), the code generator is not allowed to
/// assume the FP arithmetic arguments and results are never NaNs.

So then if that flag is on we are allowed to assume not only FP arithmetic results are never NaN, but also arguments. Even if they are and even if they are loaded or constant sNaN/NaN.

Yes

  1. UnsafeAlgebra includes NoNaNs and thus all of the above applies.

I do not agree with this. The per-instruction flags have decoupled the unsafe algebra properties from no-nans, and the per-function/global flags should follow suit.

arsenm updated this revision to Diff 156576.Jul 20 2018, 1:07 PM

Attach right diff

  1. Should we care about sNaNs with FP exceptions disabled?

Yes. This has to work. The OpenCL conformance tests check for this

OK. Makes sense. Just maybe we need to tell llvm that we actually support exceptions, which should not necessarily mean we trap on them.

  1. UnsafeAlgebra includes NoNaNs and thus all of the above applies.

I do not agree with this. The per-instruction flags have decoupled the unsafe algebra properties from no-nans, and the per-function/global flags should follow suit.

OK, UnsafeAlgebra is probably not a right condition. FastMathFlags::isFast() definitely is.

  1. UnsafeAlgebra includes NoNaNs and thus all of the above applies.

I do not agree with this. The per-instruction flags have decoupled the unsafe algebra properties from no-nans, and the per-function/global flags should follow suit.

OK, UnsafeAlgebra is probably not a right condition. FastMathFlags::isFast() definitely is.

Is fast is just all the flags enabled. No nans is just one bit of it

So given the discussion you seem to be missing DAG.isKnownNeverNaN(Op) condition.

rampitec added inline comments.Jul 20 2018, 1:32 PM
lib/Target/AMDGPU/SIISelLowering.h
317

It does not belong to this patch.

arsenm abandoned this revision.Jul 26 2018, 4:23 AM

Superseded by D49662 and D49841