This is an archive of the discontinued LLVM Phabricator instance.

Introduce intrinsic llvm.isnan
ClosedPublic

Authored by sepavloff on Jun 24 2021, 6:16 AM.

Details

Summary

Clang has builtin function '__builtin_isnan', which implements C
library function 'isnan'. This function now is implemented entirely in
clang codegen, which expands the function into set of IR operations.
There are three mechanisms by which the expansion can be made.

  • The most common mechanism is using an unordered comparison made by instruction 'fcmp uno'. This simple solution is target-independent and works well in most cases. It however is not suitable if floating point exceptions are tracked. Corresponding IEEE 754 operation and C function must never raise FP exception, even if the argument is a signaling NaN. Compare instructions usually does not have such property, they raise 'invalid' exception in such case. So this mechanism is unsuitable when exception behavior is strict. In particular it could result in unexpected trapping if argument is SNaN.
  • Another solution was implemented in https://reviews.llvm.org/D95948. It is used in the cases when raising FP exceptions by 'isnan' is not allowed. This solution implements 'isnan' using integer operations. It solves the problem of exceptions, but offers one solution for all targets, however some can do the check in more efficient way.
  • Solution implemented by https://reviews.llvm.org/D96568 introduced a hook 'clang::TargetCodeGenInfo::testFPKind', which injects target specific code into IR. Now only SystemZ implements this hook and it generates a call to target specific intrinsic function.

Although these mechanisms allow to implement 'isnan' with enough
efficiency, expanding 'isnan' in clang has drawbacks:

  • The operation 'isnan' is hidden behind generic integer operations or target-specific intrinsics. It complicates analysis and can prevent some optimizations.
  • IR can be created by tools other than clang, in this case treatment of 'isnan' has to be duplicated in that tool.

Another issue with the current implementation of 'isnan' comes from the
use of options '-ffast-math' or '-fno-honor-nans'. If such option is
specified, 'fcmp uno' may be optimized to 'false'. It is valid
optimization in general, but it results in 'isnan' always returning
'false'. For example, in some libc++ implementations the following code
returns 'false':

std::isnan(std::numeric_limits<float>::quiet_NaN())

The options '-ffast-math' and '-fno-honor-nans' imply that FP operation
operands are never NaNs. This assumption however should not be applied
to the functions that check FP number properties, including 'isnan'. If
such function returns expected result instead of actually making
checks, it becomes useless in many cases. The option '-ffast-math' is
often used for performance critical code, as it can speed up execution
by the expense of manual treatment of corner cases. If 'isnan' returns
assumed result, a user cannot use it in the manual treatment of NaNs
and has to invent replacements, like making the check using integer
operations. There is a discussion in https://reviews.llvm.org/D18513#387418,
which also expresses the opinion, that limitations imposed by
'-ffast-math' should be applied only to 'math' functions but not to
'tests'.

To overcome these drawbacks, this change introduces a new IR intrinsic
function 'llvm.isnan', which realises the check as specified by IEEE-754
and C standards in target-agnostic way. During IR transformations it
does not undergo undesirable optimizations. It reaches instruction
selection, where is lowered in target-dependent way. The lowering can
vary depending on options like '-ffast-math' or '-ffp-model' so the
resulting code satisfies requested semantics.

Diff Detail

Event Timeline

sepavloff created this revision.Jun 24 2021, 6:16 AM
sepavloff requested review of this revision.Jun 24 2021, 6:16 AM
Herald added projects: Restricted Project, Restricted Project. · View Herald TranscriptJun 24 2021, 6:16 AM
Herald added a subscriber: cfe-commits. · View Herald Transcript

Are you planning to do this for the other FP test builtin (isinf, isfinite, isinf_sign, isnormal)?

Are you planning to do this for the other FP test builtin (isinf, isfinite, isinf_sign, isnormal)?

Yes, they have similar problems.
If someone would like to implement them it would be nice.

Are you planning to do this for the other FP test builtin (isinf, isfinite, isinf_sign, isnormal)?

Yes, they have similar problems.
If someone would like to implement them it would be nice.

From a user perspective, it seems sub-optimal to postpone the others "untile someone wants to implement them".
Once isnan behaves differently than the rest, I can see users being confused and rightfully so.
I don't have a strong opinion but I'd prefer we switch them over together.

Are you planning to do this for the other FP test builtin (isinf, isfinite, isinf_sign, isnormal)?

Yes, they have similar problems.
If someone would like to implement them it would be nice.

From a user perspective, it seems sub-optimal to postpone the others "untile someone wants to implement them".
Once isnan behaves differently than the rest, I can see users being confused and rightfully so.
I don't have a strong opinion but I'd prefer we switch them over together.

Sure. I Just want to say that if someone wants or has plans to implement these functions, I appreciate these efforts. This functionality is in my plans. Considering only one function in this patch must facilitatу the review process.

Doesn't gcc also fold isnan to false under fast math? If we diverge here that means your code would only work correctly with clang.

Doesn't gcc also fold isnan to false under fast math? If we diverge here that means your code would only work correctly with clang.

GCC does the same transformation, ICC and MSVC do not: https://godbolt.org/z/ovboWqPeb. Both clang and GCC do this transformation only with optimization level > 0. With -O0 both produce code that does real check, so semantic of the produced code is different depending on optimization level, it looks more like a bug rather than feature.

There is a GCC ticket: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84949, which refers to the similar thing. It is created against to libstdc++ but the reason is in the compiler. In one on the comments there (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84949#c8) an opinion is expressed:

... My conclusion: std::numeric_limits means "has NaN bitpattern" and "has IEC559 bit layout" not "has NaNs with NaN behavior" and "has IEC559 behavior".

So this behavior is considered as incorrect.

LGTM (besides comment fix) but I'm not too familiar with the vector side of things

llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
6987

I seem to have made a mistake when I wrote this.

craig.topper added inline comments.Jun 25 2021, 10:28 AM
llvm/lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp
665

Don't you net to preserve the NoFPExcept flag? Same with all the other type legalization functions

llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
601

If this is ResultVT then the Extend created next is always a NOP. Should this be MVT::i1?

4730

I wonder if we should be using getSetCCResultType here like WidenVecOp_SETCC?

llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
6422

Why not pass flags to getNode?

6426

This breaks if we add constant folding for ISD::ISNAN to getNode.

llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
7001

Why can't we just check < 0 here? Why do we need to shift?

craig.topper added inline comments.Jun 25 2021, 11:54 AM
llvm/lib/Target/X86/X86ISelLowering.cpp
22183

The code you copied this form was overly complicated. You can output Glue instead of MVT::i16 from XAM node and then pass that directly to FNSTSW16r in place of FPSW, FPSW.getValue(1). I have made this change to X86ISelDAGToDAG.cpp

A few of the AArch64 sequences don't look ideal, but that's not the fault of your patch.

I'd like to see some test coverage for all the floating-point types (half, bfloat16, ppc_fp128).

sepavloff updated this revision to Diff 355539.Jun 30 2021, 7:14 AM

Addressed reviewer's notes

  • Math flags are set when ISNAN node is transformed,
  • They are set in calls to getNode,
  • WidenVecOp_ISNAN is made similar to WidenVecOp_SETCC,
  • Building ISNAN node was changed,
  • Fixed error in comment,
  • Removed extra shift.

I'd like to see some test coverage for all the floating-point types (half, bfloat16, ppc_fp128).

Tests for half are added to aarch64-fpclass.ll, new file was created to test ppc_fp128. As for bfloat16, tests are added to aarch64-fpclass.ll but only in strictfp mode. Without this attribute codegen creates unordered compare operation, which is not supported for bfloat16.

sepavloff added inline comments.Jun 30 2021, 7:26 AM
llvm/lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp
665

Yes, it is more correct way. Updated functions.

llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
601

Indeed. Thank you!

4730

Rewritten the function in this way.

llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
6422

Yes, it should be set via getNode.

6426

Now expansion occurs before getNode.

llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
6987

Thank you! I updated the comment.

7001

It seems the shift is not needed.

xgupta added a subscriber: xgupta.Jun 30 2021, 7:46 AM

Missed optimization in X86 codegen proposed by Craig

The options '-ffast-math' and '-fno-honor-nans' imply that FP operation
operands are never NaNs. This assumption however should not be applied
to the functions that check FP number properties, including 'isnan'. If
such function returns expected result instead of actually making
checks, it becomes useless in many cases.

This doesn't work the way you want it to, at least given the way nnan/ninf are currently defined in LangRef. It's possible to end up in a situation where isnan(x) == isnan(x) evaluates to false at runtime. It doesn't matter how you compute isnan; the problem is that the input is poisoned.

I think the right solution to this sort of issue is to insert a "freeze" in the LLVM IR, or something like that. Not sure how we'd expect users to write this in C. Suggestions welcome.

llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
6981

Maybe we want to consider falling back to the integer path if SETCC isn't legal for the given operand type? We could do that as a followup, though.

7001

Instead of emitting ExpMaskV - AbsV != 0, can we just emit ExpMaskV != AbsV?

Matt added a subscriber: Matt.Jul 20 2021, 7:07 AM
sepavloff updated this revision to Diff 361707.Jul 26 2021, 9:58 AM

Updated patch

  • Rebased,
  • Applied small enhancement to integer implementation.
tschuett added inline comments.
llvm/docs/LangRef.rst
20991

The options '-ffast-math' and '-fno-honor-nans' imply that FP operation
operands are never NaNs. This assumption however should not be applied
to the functions that check FP number properties, including 'isnan'. If
such function returns expected result instead of actually making
checks, it becomes useless in many cases.

This doesn't work the way you want it to, at least given the way nnan/ninf are currently defined in LangRef. It's possible to end up in a situation where isnan(x) == isnan(x) evaluates to false at runtime. It doesn't matter how you compute isnan; the problem is that the input is poisoned.

I think the right solution to this sort of issue is to insert a "freeze" in the LLVM IR, or something like that. Not sure how we'd expect users to write this in C. Suggestions welcome.

According to the documentation, nnan/ninf may be applied to fneg, fadd, fsub, fmul, fdiv, frem, fcmp, phi, select and call. We can ignore this flag for calls of isnan and similar functions. Of course, if conditions of using -ffast-math are broken, we have undefined behavior and isnan(x) != isnan(x) becomes possible, like in this code:

%c = fadd %a, nan
%r = call llvm.isnan.f32(%c)

Similarly, it is legitimate to optimize isnan in the code:

%c = fadd %a, %b
%r = call llvm.isnan.f32(%c)

In this case the result of fadd cannot be NaN, otherwise contract of -ffast-math is broken. So isnan in this case may be optimized to false.

llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
6981

It makes sense, it could be beneficial for targets that have limited set of floating point comparisons. However straightforward check like:

if (Flags.hasNoFPExcept() && isOperationLegalOrCustom(ISD::SETCC, OperandVT))

results in worse code, mainly for vector types. It should be more complex check.

7001

Instead of emitting ExpMaskV - AbsV != 0, can we just emit ExpMaskV != AbsV?

Implemented.

Fixed documentation

sepavloff added inline comments.Jul 26 2021, 10:29 AM
llvm/docs/LangRef.rst
20991

Fixed, thank you.

The options '-ffast-math' and '-fno-honor-nans' imply that FP operation
operands are never NaNs. This assumption however should not be applied
to the functions that check FP number properties, including 'isnan'. If
such function returns expected result instead of actually making
checks, it becomes useless in many cases.

This doesn't work the way you want it to, at least given the way nnan/ninf are currently defined in LangRef. It's possible to end up in a situation where isnan(x) == isnan(x) evaluates to false at runtime. It doesn't matter how you compute isnan; the problem is that the input is poisoned.

I think the right solution to this sort of issue is to insert a "freeze" in the LLVM IR, or something like that. Not sure how we'd expect users to write this in C. Suggestions welcome.

According to the documentation, nnan/ninf may be applied to fneg, fadd, fsub, fmul, fdiv, frem, fcmp, phi, select and call. We can ignore this flag for calls of isnan and similar functions. Of course, if conditions of using -ffast-math are broken, we have undefined behavior and isnan(x) != isnan(x) becomes possible, like in this code:

Right... so how can you produce a NaN in these circumstances? You could load one from memory, I guess?

It would probably be a good idea to have an instcombine that combines away isnan on a value produced by an operation marked nnan, so we don't confuse people reading assembly into assuming isnan is actually reliable in that context.

Add InstCombine optimization for nnan operations

The options '-ffast-math' and '-fno-honor-nans' imply that FP operation
operands are never NaNs. This assumption however should not be applied
to the functions that check FP number properties, including 'isnan'. If
such function returns expected result instead of actually making
checks, it becomes useless in many cases.

This doesn't work the way you want it to, at least given the way nnan/ninf are currently defined in LangRef. It's possible to end up in a situation where isnan(x) == isnan(x) evaluates to false at runtime. It doesn't matter how you compute isnan; the problem is that the input is poisoned.

I think the right solution to this sort of issue is to insert a "freeze" in the LLVM IR, or something like that. Not sure how we'd expect users to write this in C. Suggestions welcome.

According to the documentation, nnan/ninf may be applied to fneg, fadd, fsub, fmul, fdiv, frem, fcmp, phi, select and call. We can ignore this flag for calls of isnan and similar functions. Of course, if conditions of using -ffast-math are broken, we have undefined behavior and isnan(x) != isnan(x) becomes possible, like in this code:

Right... so how can you produce a NaN in these circumstances? You could load one from memory, I guess?

Yes, they come from structures in memory. I think they can also come from function arguments, if some source files are compiled with option -ffast-math and some without.

It would probably be a good idea to have an instcombine that combines away isnan on a value produced by an operation marked nnan, so we don't confuse people reading assembly into assuming isnan is actually reliable in that context.

Added such transformation (file llvm/lib/Transforms/InstCombine/InstCombineCalls.cpp, test in llvm/test/Transforms/InstSimplify/ConstProp/fpclassify.ll).

RKSimon added inline comments.Jul 28 2021, 12:24 PM
llvm/test/CodeGen/X86/x86-fpclass.ll
174

add nounwind to reduce cfi noise (other tests would benefit as well)?

llvm/test/Transforms/InstSimplify/ConstProp/fpclassify.ll
2

Use update_test_checks.py?

sepavloff updated this revision to Diff 362770.Jul 29 2021, 7:30 AM

Updated tests

  • Use update_test_checks to generate assertions in one file,
  • Use nounwrap to get rid of .cfi directives.
sepavloff added inline comments.Jul 29 2021, 7:32 AM
llvm/test/CodeGen/X86/x86-fpclass.ll
174

Good hint, thank you!

llvm/test/Transforms/InstSimplify/ConstProp/fpclassify.ll
2

Done.

RKSimon added inline comments.Jul 29 2021, 7:42 AM
llvm/test/Transforms/InstCombine/fpclass.ll
30

You probably need some negative tests (no flags, ninf instead of nnan etc.)?

sepavloff updated this revision to Diff 363297.Jul 31 2021, 7:44 AM

Added tests and changed check for legality of SETCC

  • Added tests for InstCombine,
  • Modified condition that checks for lowering of SETCC. Now the check only analyze SETCC for scalar type. It results in better code in most cases.
sepavloff added inline comments.Jul 31 2021, 7:47 AM
llvm/test/Transforms/InstCombine/fpclass.ll
30

Added few such tests.

efriedma accepted this revision.Jul 31 2021, 12:48 PM

LGTM. (Since there have been a bunch of reviewers involved, please give a few days before you merge.)

This revision is now accepted and ready to land.Jul 31 2021, 12:48 PM
This revision was landed with ongoing or failed builds.Aug 4 2021, 1:28 AM
This revision was automatically updated to reflect the committed changes.

This appears to have caused some failures on PPC buildbots. For example: https://lab.llvm.org/buildbot/#/builders/105/builds/13446
We are investigating this. Can you please pull this to bring the bots back to green until we track down the reason for the problem and can provide a fix?

This comment was removed by nemanjai.

Rather than reverting this commit again, I pushed 62fe3dcf98d1 to use the same expansion as before (using unordered comparison) for ppc_fp128. I am not sure if there are other types that suffer from the same problem (perhaps the Intel 80-bit long double).

llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
6981

Out of curiosity, why was this added when you recognized that it results in worse code? This is certainly part of the reason for the regression for ppc_fp128.

It would appear that before this patch, we would emit a comparison for all types that are not IEEE FP types (such as ppc_fp128). Those semantics do not seem to have carried over.

Rather than reverting this commit again, I pushed 62fe3dcf98d1 to use the same expansion as before (using unordered comparison) for ppc_fp128.

Thank you very much for fixing that!

I am not sure if there are other types that suffer from the same problem (perhaps the Intel 80-bit long double).

Intel fp80 is also non-IEEE type but it got custom lowering in this patch. There is little chance for such type to work properly without custom lowering.

I am working on patch that would add custom lowering of llvm.isnan to PowerPC.

llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
6981

Out of curiosity, why was this added when you recognized that it results in worse code? This is certainly part of the reason for the regression for ppc_fp128.

It is my mistake. After experiments I forgot to remove this change. I am sorry.

For x86 and AArch64 I used modified test-suite, with changes from D106804. Without proper tests it is hard to reveal why one intrinsic starts to fail.

It would appear that before this patch, we would emit a comparison for all types that are not IEEE FP types (such as ppc_fp128). Those semantics do not seem to have carried over.

The previous behavior is not correct in non-default FP environment. Unordered comparison raises Invalid exception if an operand is signaling NaN. On the other hand, isnan must never raise exceptions.

nemanjai added inline comments.Aug 9 2021, 9:23 AM
llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
6981

Well, if the must never raise exceptions is an IEEE-754 requirement (i.e. as noted in 5.7.2), I think it is reasonable that operations on types that do not conform to IEEE-754 are not bound by it.

sepavloff added inline comments.Aug 10 2021, 9:18 AM
llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
6981

C standard defines macro isnan, of which the recent draft (http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2596.pdf, F.3p6) states:

The C classification macros fpclassify, iscanonical, isfinite, isinf, isnan, isnormal,
issignaling, issubnormal, and iszero provide the IEC 60559 operations indicated in the table above 
provided their arguments are in the format of their semantic type. Then these macros
raise no floating-point exceptions, even if an argument is a signaling NaN.

This statement is not restricted to IEEE-compatible types, so any floating point type must behave according to this statement.

dxf added a subscriber: dxf.Aug 11 2021, 7:50 PM
sivachandra added inline comments.
llvm/lib/Target/X86/X86ISelLowering.cpp
22193

While I do not understand the code mechanics of this patch, I am mostly in agreement with the general direction of this patch. However, it has lead to a change in behavior wrt 80-bit x86 floating point numbers. Unlike the 32-bit and 64-bit floating point numbers, 80-bit numbers have an additional class of "Unsupported Numbers". Those numbers were previously treated as NaNs. Since this change uses the fxam instruction to classify the input number, that is not the case any more as the fxam instruction distinguishes between unsupported numbers and NaNs. So, to restore the previous behavior, can we extend this patch to treat unsupported numbers as NaNs? At a high level, what I am effectively saying is that we should implement isnan this way:

bool isnan(long double x) {
  uint16_t status;
  __asm__ __volatile__("fldt %0" : : "m"(x));
  __asm__ __volatile__("fxam");
  __asm__ __volatile__("fnstsw %0": "=m"(status):);
  uint16_t c0c2c3 = (status >> 8) & 0x45;
  return c0c2c3 <= 1; // This patch seems to be only doing c0c2c3 == 1 check.
}
xgupta removed a subscriber: xgupta.Aug 12 2021, 12:28 AM

Patch D108037 must make lowering of llvm.isnan more close to the code produced by __builtin_isnan earlier.

spatel added a subscriber: spatel.Aug 20 2021, 7:50 AM

Is it intentional that we are not canonicalizing the intrinsic call back to fcmp uno in the default FP environment?

Is it intentional that we are not canonicalizing the intrinsic call back to fcmp uno in the default FP environment?

It is lowered to unordered comparison by default. Changing llvm.isnan to fcmp uno somewhere in IR would make it possible to optimize out the latter if fast-math mode is on. Preserving semantics of isnan when fast-math is in effect was one of the goals of this change.

Is it intentional that we are not canonicalizing the intrinsic call back to fcmp uno in the default FP environment?

It is lowered to unordered comparison by default. Changing llvm.isnan to fcmp uno somewhere in IR would make it possible to optimize out the latter if fast-math mode is on. Preserving semantics of isnan when fast-math is in effect was one of the goals of this change.

Eeek. Was there an RFC about this?
This does not sound good to me at all,
much like "let's not apply fast-math flags to x86 vector intrinsics".

Is it intentional that we are not canonicalizing the intrinsic call back to fcmp uno in the default FP environment?

It is lowered to unordered comparison by default. Changing llvm.isnan to fcmp uno somewhere in IR would make it possible to optimize out the latter if fast-math mode is on. Preserving semantics of isnan when fast-math is in effect was one of the goals of this change.

I understand that the codegen was supposed to be no worse, but the difference in IR causes optimizer regressions like:
https://llvm.org/PR51556

If we want this intrinsic (and its siblings that haven't been created yet) to survive through IR, then we have to enhance IR passes to recognize the new patterns.
It would be easier to do this in steps: (1) create the intrinsic only if not in the default FP env, (2) update IR analysis/passes to recognize the intrinsic, (3) create the intrinsic in the default FP env with no FMF, (4) create the intrinsic always.

xbolva00 added a subscriber: xbolva00.EditedAug 20 2021, 9:13 AM

Is it intentional that we are not canonicalizing the intrinsic call back to fcmp uno in the default FP environment?

It is lowered to unordered comparison by default. Changing llvm.isnan to fcmp uno somewhere in IR would make it possible to optimize out the latter if fast-math mode is on. Preserving semantics of isnan when fast-math is in effect was one of the goals of this change.

I understand that the codegen was supposed to be no worse, but the difference in IR causes optimizer regressions like:
https://llvm.org/PR51556

If we want this intrinsic (and its siblings that haven't been created yet) to survive through IR, then we have to enhance IR passes to recognize the new patterns.
It would be easier to do this in steps: (1) create the intrinsic only if not in the default FP env, (2) update IR analysis/passes to recognize the intrinsic, (3) create the intrinsic in the default FP env with no FMF, (4) create the intrinsic always.

Meanwhile this should be reverted.. also possibly in llvm 13? (Regressions vs 12)

Is it intentional that we are not canonicalizing the intrinsic call back to fcmp uno in the default FP environment?

It is lowered to unordered comparison by default. Changing llvm.isnan to fcmp uno somewhere in IR would make it possible to optimize out the latter if fast-math mode is on. Preserving semantics of isnan when fast-math is in effect was one of the goals of this change.

I understand that the codegen was supposed to be no worse, but the difference in IR causes optimizer regressions like:
https://llvm.org/PR51556

If we want this intrinsic (and its siblings that haven't been created yet) to survive through IR, then we have to enhance IR passes to recognize the new patterns.
It would be easier to do this in steps: (1) create the intrinsic only if not in the default FP env, (2) update IR analysis/passes to recognize the intrinsic, (3) create the intrinsic in the default FP env with no FMF, (4) create the intrinsic always.

+1, but right now i'm not sold on the behavior of not optimizing away NaN checks in no-NaN's mode.
At least that part should be reconciled now. It *might* be an improvement, but it caters to expectations
of one group while catering away from the documentation and existing expectations of other groups.
This shouldn't be decided in a review, it should be driven by an RFC.

Is it intentional that we are not canonicalizing the intrinsic call back to fcmp uno in the default FP environment?

It is lowered to unordered comparison by default. Changing llvm.isnan to fcmp uno somewhere in IR would make it possible to optimize out the latter if fast-math mode is on. Preserving semantics of isnan when fast-math is in effect was one of the goals of this change.

I understand that the codegen was supposed to be no worse, but the difference in IR causes optimizer regressions like:
https://llvm.org/PR51556

If we want this intrinsic (and its siblings that haven't been created yet) to survive through IR, then we have to enhance IR passes to recognize the new patterns.
It would be easier to do this in steps: (1) create the intrinsic only if not in the default FP env, (2) update IR analysis/passes to recognize the intrinsic, (3) create the intrinsic in the default FP env with no FMF, (4) create the intrinsic always.

+1, but right now i'm not sold on the behavior of not optimizing away NaN checks in no-NaN's mode.
At least that part should be reconciled now. It *might* be an improvement, but it caters to expectations
of one group while catering away from the documentation and existing expectations of other groups.
This shouldn't be decided in a review, it should be driven by an RFC.

Agree. I think a revert followed by RFC to make sure there is consensus on semantics is needed.

kpn added a comment.Aug 20 2021, 10:30 AM

Is it intentional that we are not canonicalizing the intrinsic call back to fcmp uno in the default FP environment?

It is lowered to unordered comparison by default. Changing llvm.isnan to fcmp uno somewhere in IR would make it possible to optimize out the latter if fast-math mode is on. Preserving semantics of isnan when fast-math is in effect was one of the goals of this change.

Eeek. Was there an RFC about this?
This does not sound good to me at all,
much like "let's not apply fast-math flags to x86 vector intrinsics".

We can switch into and out of the default FP environment inside a single function. If we want different behavior based on the FP environment then this should be a constrained intrinsic. Then the intrinsic would know the FP environment, or at least enough about it to know if traps and FP status bits are relevant.

I think the distinction is constrained vs non-constrained because FMF can optionally be used in both cases.

Is it intentional that we are not canonicalizing the intrinsic call back to fcmp uno in the default FP environment?

It is lowered to unordered comparison by default. Changing llvm.isnan to fcmp uno somewhere in IR would make it possible to optimize out the latter if fast-math mode is on. Preserving semantics of isnan when fast-math is in effect was one of the goals of this change.

Eeek. Was there an RFC about this?
This does not sound good to me at all,
much like "let's not apply fast-math flags to x86 vector intrinsics".

We can switch into and out of the default FP environment inside a single function.

Really? The constrained intrinsic documentation claims the reverse (https://llvm.org/docs/LangRef.html#constrainedfp):

If any FP operation in a function is constrained then they all must be constrained. This is required for correct LLVM IR. Optimizations that move code around can create miscompiles if mixing of constrained and normal operations is done. The correct way to mix constrained and less constrained operations is to use the rounding mode and exception handling metadata to mark constrained intrinsics as having LLVM’s default behavior.

Is it intentional that we are not canonicalizing the intrinsic call back to fcmp uno in the default FP environment?

It is lowered to unordered comparison by default. Changing llvm.isnan to fcmp uno somewhere in IR would make it possible to optimize out the latter if fast-math mode is on. Preserving semantics of isnan when fast-math is in effect was one of the goals of this change.

Eeek. Was there an RFC about this?
This does not sound good to me at all,
much like "let's not apply fast-math flags to x86 vector intrinsics".

We can switch into and out of the default FP environment inside a single function.

Really? The constrained intrinsic documentation claims the reverse (https://llvm.org/docs/LangRef.html#constrainedfp):

If any FP operation in a function is constrained then they all must be constrained. This is required for correct LLVM IR. Optimizations that move code around can create miscompiles if mixing of constrained and normal operations is done. The correct way to mix constrained and less constrained operations is to use the rounding mode and exception handling metadata to mark constrained intrinsics as having LLVM’s default behavior.

@sepavloff please can you undo the clang part of this change (+@hans) and post an RFC to further hash out the design here?

I posted the RFC: https://lists.llvm.org/pipermail/llvm-dev/2021-August/152257.html

Depending on the feedback I'll revert the check or modify the implementation.

kpn added a comment.Aug 23 2021, 7:25 AM

Is it intentional that we are not canonicalizing the intrinsic call back to fcmp uno in the default FP environment?

It is lowered to unordered comparison by default. Changing llvm.isnan to fcmp uno somewhere in IR would make it possible to optimize out the latter if fast-math mode is on. Preserving semantics of isnan when fast-math is in effect was one of the goals of this change.

Eeek. Was there an RFC about this?
This does not sound good to me at all,
much like "let's not apply fast-math flags to x86 vector intrinsics".

We can switch into and out of the default FP environment inside a single function.

Really? The constrained intrinsic documentation claims the reverse (https://llvm.org/docs/LangRef.html#constrainedfp):

If any FP operation in a function is constrained then they all must be constrained. This is required for correct LLVM IR. Optimizations that move code around can create miscompiles if mixing of constrained and normal operations is done. The correct way to mix constrained and less constrained operations is to use the rounding mode and exception handling metadata to mark constrained intrinsics as having LLVM’s default behavior.

Use of constrained intrinsics does not mean that we are automatically in an alternate FP environment.

When constrained intrinsics are used and the metadata says the rounding mode is "tonearest" with exceptions set to "ignore" then that's the default FP environment. If, for example, #pragma STDC FENV_ACCESS is used in only a part of a function then the constrained intrinsics will be used in the entire function but the metadata will specify different exception or rounding behavior in the part covered by the FENV_ACCESS.

That the constrained intrinsics can state that they are in the default FP environment is what makes it safe for EarlyCSE to treat them the same as a normal FP instruction (which is assumed to be in the default FP environment). For example.

Ah fair enough. Thanks.