This is an archive of the discontinued LLVM Phabricator instance.

InstCombine: Perform basic isnan combines on llvm.is.fpclass
ClosedPublic

Authored by arsenm on Nov 10 2022, 7:13 PM.

Details

Summary

is.fpclass(x, qnan|snan) -> fcmp uno x, 0.0
is.fpclass(nnan x, qnan|snan|other) -> is.fpclass(x, other)

Start porting the existing combines from llvm.amdgcn.class to the
generic intrinsic. Start with the ones which aren't dependent on the
FP mode.

Diff Detail

Event Timeline

arsenm created this revision.Nov 10 2022, 7:13 PM
Herald added a project: Restricted Project. · View Herald TranscriptNov 10 2022, 7:13 PM
arsenm requested review of this revision.Nov 10 2022, 7:13 PM
Herald added a project: Restricted Project. · View Herald TranscriptNov 10 2022, 7:13 PM
Herald added a subscriber: wdng. · View Herald Transcript
sepavloff added inline comments.Nov 10 2022, 10:06 PM
llvm/lib/Transforms/InstCombine/InstCombineCalls.cpp
818–829

The second argument is described with ImmArg. It must always be a constant.

842

This replacement is not valid in general case, only if FP exceptions are ignored. If the argument is a signaling NaN, compare instruction raises Invalid exception.

849

It also is not always valid, only if the argument is not a signaling NaN or FP exceptions are ignored..

881

Constant folding should be done in ConstantFolding.cpp.

arsenm added inline comments.Nov 10 2022, 10:30 PM
llvm/lib/Transforms/InstCombine/InstCombineCalls.cpp
842

Isn't that implied by not being a constrained intrinsic?

849

Isn't that implied by not being a constrained intrinsic?

sepavloff added inline comments.Nov 10 2022, 11:30 PM
llvm/lib/Transforms/InstCombine/InstCombineCalls.cpp
842

is_fpclass does not depend on FP environment and does not change it. So it does not have constrained variant.

arsenm updated this revision to Diff 474788.Nov 11 2022, 9:05 AM

Rebase on added strictfp checks, and split code between InstSimplify and ConstantFolding

foad added inline comments.Nov 13 2022, 11:24 PM
llvm/lib/Transforms/InstCombine/InstCombineCalls.cpp
822

Seems like you should handle Mask == (fcAllFlags & ~fcNan) here too. Same for the other Mask == cases below. And allow InstCombine to "freely invert" this intrinsic by flipping all bits in the mask.

845

What's this !CVal check for?

sepavloff added inline comments.Nov 13 2022, 11:43 PM
llvm/lib/Transforms/InstCombine/InstCombineCalls.cpp
824

Is it profitable to make such replacement early? Is there any advantage, at least hypothetical?

846–850

Deleted code?

jcranmer-intel added inline comments.
llvm/lib/Transforms/InstCombine/InstCombineCalls.cpp
819

It feels like you should be able to use ninf/nnan/nsz flags to modify the mask here to make it more likely to fall into one of these categories, although this may provoke some backlash from those who argue that nnan isnan(x) => unconditional false shouldn't be an allowable optimization.

837

You're also missing cases for Mask == fcPosInf and Mask == fcNegInf, which can similarly be lowered to quiet comparisons.

842

Builder.CreateFCmp* functions look like they create quiet comparison instructions in FP-constrained mode, and you need to use CreateFCmpS to generate one of the signalling comparisons. So replacing them even in strict mode is legal.

arsenm added inline comments.Nov 14 2022, 12:26 PM
llvm/lib/Transforms/InstCombine/InstCombineCalls.cpp
818–829

I'm debating removing this restriction

819

I think this is debatable, and beyond the scope of this patch where I'd like to simply move what we already do for the target specific intrinsic

824

I think an fcmp should be more canonical and better handled by existing optimizations, than class which is handled ~nowhere

837

Ditto, for now I'd like to just move the code

845

I moved this part to InstSimplify, will drop

sepavloff added inline comments.Nov 14 2022, 8:26 PM
llvm/lib/Transforms/InstCombine/InstCombineCalls.cpp
842

Quiet comparison instruction do not raise FP exception if an argument is quiet NaN. It still generates exceptions for signaling NaNs.

arsenm updated this revision to Diff 475996.Nov 16 2022, 8:32 PM
arsenm marked an inline comment as done.

Remove dead code, stop turning poison into undef

arsenm added inline comments.Nov 16 2022, 9:23 PM
llvm/lib/Transforms/InstCombine/InstCombineCalls.cpp
819

we can't actually put fast math flags here, because it's only allowed for calls with FP return types

foad added a comment.Nov 29 2022, 2:57 AM

LGTM modulo inline comments.

llvm/lib/Analysis/InstructionSimplify.cpp
6058–6066

Poison/undef checks should probably go first, before other simplifications?

llvm/lib/Transforms/InstCombine/InstCombineCalls.cpp
822

Please add a TODO for this if you're not going to do it now.

846

Might be simpler to modify the instruction in place with setOperand, and then Worklist.pushUsersToWorkList. But perhaps that is less clear.

arsenm added inline comments.Nov 29 2022, 10:04 AM
llvm/lib/Analysis/InstructionSimplify.cpp
6058–6066

I specifically had these later. The first operand shouldn't matter based on the test. I'm thinking about the interaction between fast math optimizations, and class uses used to guard regions where nnan/ninf is expected.

foad added inline comments.Nov 30 2022, 1:37 AM
llvm/lib/Analysis/InstructionSimplify.cpp
6058–6066

It's maybe not worth arguing about, but...

LangRef says "Most instructions return ‘poison’ when one of their arguments is ‘poison’". If you're saying is.fpclass is an exception to that rule then it at least ought to be documented. But I'm not sure why simplifying to poison would be a problem for the kind of code you're talking about - do you have an example?

arsenm added inline comments.Nov 30 2022, 6:26 AM
llvm/lib/Analysis/InstructionSimplify.cpp
6058–6066

Cases like this, if you violate nnan/ninf but you have a test of the special cases:

div = fdiv nan x / nan
if (!is_fpclass(div, nan)) {
    // do fast stuff
}

I was viewing this as similar to select of poison. select with a poison condition propagates poison, but the value operands can be unobserved. Similarly here, a test of 0 shouldn't need to observe the actual compared value.

I was planning on revisiting this as I get further in optimizing all the special case checks in the math libraries. For now, this is the more conservative direction. The main thing I'm worried about is the asymmetry between equivalent fcmps and the class if we make this the expected behavior.

arsenm updated this revision to Diff 478935.Nov 30 2022, 6:54 AM

Add todo to handle inverted masks

arsenm updated this revision to Diff 478940.Nov 30 2022, 7:08 AM

Fix stale test

sepavloff added inline comments.Nov 30 2022, 7:24 AM
llvm/lib/Analysis/InstructionSimplify.cpp
6058–6066

In the example you presented the operation producing NaN can be evaluated at compile time, so is_fpclass can be evaluated during compilation. There is no reason to create poison value here.

In the presence of nnan/ninf it is also saver to evaluate is_fpclass instead of poisoning it, because fdiv nan, nan and is_fpclass(div, nan) can come from different functions compiled with different fast-math settings merged, for example, by LTO. Poisoning can make a correct program non-working.

nlopes added inline comments.Nov 30 2022, 7:36 AM
llvm/lib/Analysis/InstructionSimplify.cpp
6058–6066

The issue of special casing handling of poison is that then you cannot easily convert to/from other instructions. Blocking propagation of poison is similar to freeze, which is sticky.
Select is an exception, as it blocks poison propagation, but the caveat is that we had to remove this transformation: select X, true, Y -> or X, Y (as the or propagates more poison than the select did).

So I would suggest to not special case poison anywhere unless it's really important for some reason.

arsenm updated this revision to Diff 478950.Nov 30 2022, 7:39 AM

Propagate poison. Leave undef alone, I still think test all/test none should win for undef

jyknight added inline comments.Nov 30 2022, 8:28 AM
llvm/lib/Transforms/InstCombine/InstCombineCalls.cpp
824

This seems OK.

I'm not sure it's the best choice -- if a CPU actually has an fpclassify instruction, is it really a good idea to canonicalize in generic code to fcmp? But I think that's fine to revisit later if it becomes a problem.

839

These removal of Mask bits should come before the Mask == $X tests, shouldn't they?

848

Why are unknown bits even accepted? ISTM it should be an error in Verifier::visitIntrinsicCall to pass invalid bits.

Also: commit description should mention the instcombine/etc changes; it makes it sound like just an amdgpu change.

foad added inline comments.Nov 30 2022, 8:56 AM
llvm/lib/Transforms/InstCombine/InstCombineCalls.cpp
839

I assume that this function will be called again to revisit the modified instrinsic.

arsenm added inline comments.Nov 30 2022, 9:01 AM
llvm/lib/Transforms/InstCombine/InstCombineCalls.cpp
824

I've been thinking if we can do a classify in <= 2 IR instructions, fcmp+fabs is probably a better canonical form. If a class pattern is 3-4+ instructions, the class is probably better. FCmp + fneg + fabs are always going to be more broadly understood. Fcmp also supports fast math flags, unlike class (I guess we could fix that though)

839

Right, it's revisited. I don't expect the bit removal part to ever actually happen

848

I don't know. Really this should be an i10 argument. I've been debating whether to add a verifier check, or make it an i10. I'd prefer to do that separately and clean up the bits here for now.

llvm/lib/Transforms/InstCombine/InstCombineCalls.cpp
848

The LangRef documentation of llvm.is.fpclass doesn't pin down the handling for noncanonical values well. It's plausible they could be handled by extension of extra bits, but existing code seems to ignore them for ppc_fp128 and treat them as NaNs that are neither qNaN nor sNaN for x86_fp80.

Not that it makes any difference to this patch, but it suggests that making it a verifier check instead of an i10 is the better path, as it is slightly better future-proofed.

arsenm added inline comments.Nov 30 2022, 3:17 PM
llvm/lib/Transforms/InstCombine/InstCombineCalls.cpp
848

Another reason is the operand doesn't really need to be immarg. The AMDGPU class instruction handles a register just fine (in fact nearly every test mask needs to be materialized). In that case we would want folds to a canonicalizable constant value

arsenm updated this revision to Diff 481859.Dec 10 2022, 7:49 AM
arsenm retitled this revision from InstCombine: Port amdgcn.class intrinsic combines to is.fpclass to InstCombine: Perform basic isnan combines on llvm.is.fpclass.
arsenm edited the summary of this revision. (Show Details)

Break up patches into 4 pieces. Assume excess bits in test mask is a verifier error, and don't handle the FP mode dependent transforms here

arsenm updated this revision to Diff 481862.Dec 10 2022, 8:41 AM

Resplit patches to handle negated isnan case here

arsenm updated this revision to Diff 482842.Dec 14 2022, 6:30 AM

Rebase

llvm/test/Transforms/InstCombine/is_fpclass.ll
358

This will be recovered by D139130, depending on which lands first

foad added a comment.Dec 16 2022, 9:38 AM

is.fpclass(x, qnan|snan) -> fcmp uno x, 0.0
is.fpclass(nnan x, qnan|snan|other) -> is.fpclass(x, other)

The first one sounds good. Do you have a specific motivation for the second one?

is.fpclass(x, qnan|snan) -> fcmp uno x, 0.0
is.fpclass(nnan x, qnan|snan|other) -> is.fpclass(x, other)

The first one sounds good. Do you have a specific motivation for the second one?

Eventually all the class-like tests should be merged into one class call. We still want to reduce the number of tests to perform in the end and also don't want to be ordering dependent

Early conversion is_fpclass->fcmp may result in incorrect semantics. It make the code:

if (!isnan(x)) {
…
}

indistinguishable from:

if (x == x) {
…
}

But they have different behavior if x is signaling NaN. Although the replacement is proposed for strictfp functions only, inlining may require conversion to a form that uses constrained intrinsics. The resulting code would have different behavior than original.

inlining may require conversion to a form that uses constrained intrinsics

Functions which are not strictfp are allowed to introduce "spurious" fp flag writes; for example, we can flatten control flow that contains floating-point ops. Inlining the function doesn't change that general rule. The inliner converts fp operations just to ensure that later optimizations don't move those operations around.

kpn added a comment.Jan 18 2023, 11:49 AM

inlining may require conversion to a form that uses constrained intrinsics

Functions which are not strictfp are allowed to introduce "spurious" fp flag writes; for example, we can flatten control flow that contains floating-point ops. Inlining the function doesn't change that general rule. The inliner converts fp operations just to ensure that later optimizations don't move those operations around.

I didn't think we had code in the tree to convert normal FP instructions into constrained intrinsics. Andy Kaylor had a ticket with code to do this, but I didn't think it ever went in. Where is this code used by the inliner?

I didn't think we had code in the tree to convert normal FP instructions into constrained intrinsics.

I'm not sure if that actually got merged; I was just assuming based on my memory of the discussions. In any case, it's not really relevant to the point I was trying to make.

inlining may require conversion to a form that uses constrained intrinsics

Functions which are not strictfp are allowed to introduce "spurious" fp flag writes; for example, we can flatten control flow that contains floating-point ops. Inlining the function doesn't change that general rule. The inliner converts fp operations just to ensure that later optimizations don't move those operations around.

"spurious" fp flag writes can appear only as a result of transformations, the source code provided to compiler does not have them. Compiler should avoid such transformation at early stages of IR pipeline, otherwise it produces program with incorrect semantics.

inlining may require conversion to a form that uses constrained intrinsics

Functions which are not strictfp are allowed to introduce "spurious" fp flag writes; for example, we can flatten control flow that contains floating-point ops. Inlining the function doesn't change that general rule. The inliner converts fp operations just to ensure that later optimizations don't move those operations around.

I didn't think we had code in the tree to convert normal FP instructions into constrained intrinsics. Andy Kaylor had a ticket with code to do this, but I didn't think it ever went in. Where is this code used by the inliner?

Here it is: https://github.com/llvm/llvm-project/blob/main/llvm/lib/Transforms/Utils/CloneFunction.cpp#L390

kpn added a comment.Jan 19 2023, 6:47 AM

inlining may require conversion to a form that uses constrained intrinsics

Functions which are not strictfp are allowed to introduce "spurious" fp flag writes; for example, we can flatten control flow that contains floating-point ops. Inlining the function doesn't change that general rule. The inliner converts fp operations just to ensure that later optimizations don't move those operations around.

I didn't think we had code in the tree to convert normal FP instructions into constrained intrinsics. Andy Kaylor had a ticket with code to do this, but I didn't think it ever went in. Where is this code used by the inliner?

Here it is: https://github.com/llvm/llvm-project/blob/main/llvm/lib/Transforms/Utils/CloneFunction.cpp#L390

Ah, excellent. I misremembered. Thanks for the correction!

But they have different behavior if x is signaling NaN. Although the replacement is proposed for strictfp functions only, inlining may require conversion to a form that uses constrained intrinsics. The resulting code would have different behavior than original.

A transformation that’s valid for !strictfp functions cannot be invalid because of the potential for being inlined into a strictfp function. If this isn’t possible, there is a representational issue the inliner would need to account for.

But they have different behavior if x is signaling NaN. Although the replacement is proposed for strictfp functions only, inlining may require conversion to a form that uses constrained intrinsics. The resulting code would have different behavior than original.

A transformation that’s valid for !strictfp functions cannot be invalid because of the potential for being inlined into a strictfp function. If this isn’t possible, there is a representational issue the inliner would need to account for.

Thinking about this more, I think any situation where this could be an issue would be malformed to begin with. You had a piece of code that was expecting exceptions to be enabled calling into code where exceptions were disabled without turning exceptions off

kpn added a comment.Jan 20 2023, 7:14 AM

But they have different behavior if x is signaling NaN. Although the replacement is proposed for strictfp functions only, inlining may require conversion to a form that uses constrained intrinsics. The resulting code would have different behavior than original.

A transformation that’s valid for !strictfp functions cannot be invalid because of the potential for being inlined into a strictfp function. If this isn’t possible, there is a representational issue the inliner would need to account for.

Thinking about this more, I think any situation where this could be an issue would be malformed to begin with. You had a piece of code that was expecting exceptions to be enabled calling into code where exceptions were disabled without turning exceptions off

If exceptions were turned off, the call was made, and exceptions were turned back on then there would be no correctness issue. Inlining the called function wouldn't change that. The toggling of the exception enablement would be invisible to LLVM. Thus we can't categorically say that a strictfp function calling a !strictfp function is malformed.

The constrained intrinsics that specify the default rounding and exceptions disabled are supposed to behave the same as the normal instructions. They're supposed to have the exact same meaning and behavior. So a conversion of isnan(x) to (x == x) for the constrained intrinsics in the default environment is just as correct as the instructions that do the same thing but aren't constrained intrinsics. Thus inlining of a !strictfp function into a strictfp function with the conversion of FP instructions into constrained intrinsics can't introduce any new correctness issues.

Matt added a subscriber: Matt.Jan 25 2023, 9:08 AM

ping, I have a lot of stuff blocked on this

But they have different behavior if x is signaling NaN. Although the replacement is proposed for strictfp functions only, inlining may require conversion to a form that uses constrained intrinsics. The resulting code would have different behavior than original.

A transformation that’s valid for !strictfp functions cannot be invalid because of the potential for being inlined into a strictfp function. If this isn’t possible, there is a representational issue the inliner would need to account for.

Thinking about this more, I think any situation where this could be an issue would be malformed to begin with. You had a piece of code that was expecting exceptions to be enabled calling into code where exceptions were disabled without turning exceptions off

If exceptions were turned off, the call was made, and exceptions were turned back on then there would be no correctness issue. Inlining the called function wouldn't change that. The toggling of the exception enablement would be invisible to LLVM. Thus we can't categorically say that a strictfp function calling a !strictfp function is malformed.

On most targets turning off FP exceptions means reading content of FP control register, changing value of mask bits and putting the modified register value back. It is expensive operation and cannot be made invisible to LLVM. Some targets (like RISCV) do not have possibility to mask FP exceptions at all, so at IR level there is no way to turn exception off. Default FP environment supposes that the exceptions are ignored, not disabled. In general case they are raised always.

The constrained intrinsics that specify the default rounding and exceptions disabled are supposed to behave the same as the normal instructions. They're supposed to have the exact same meaning and behavior. So a conversion of isnan(x) to (x == x) for the constrained intrinsics in the default environment is just as correct as the instructions that do the same thing but aren't constrained intrinsics. Thus inlining of a !strictfp function into a strictfp function with the conversion of FP instructions into constrained intrinsics can't introduce any new correctness issues.

No matter what exception behavior is specified in constrained intrinsics call, processor instruction will raise FP exceptions. If bits in FP environment specifies trap on Invalid exception, the check x != x may result in crash, if x is signaling NaN. The check isnan(x) in this case is safe. So replacement is_fpclass->fcmp changes semantics and cannot be used here.

Particular cores may implement is_fpclass as CMP instruction, it is made by default lowering, but making such replacement in IR is incorrect.

On most targets turning off FP exceptions means reading content of FP control register, changing value of mask bits and putting the modified register value back. It is expensive operation and cannot be made invisible to LLVM. Some targets (like RISCV) do not have possibility to mask FP exceptions at all, so at IR level there is no way to turn exception off. Default FP environment supposes that the exceptions are ignored, not disabled. In general case they are raised always.

Managing the FP mode is a user responsibility. If the operation in unconstrained we can assume the default floating point environment without preserved exceptions.

The constrained intrinsics that specify the default rounding and exceptions disabled are supposed to behave the same as the normal instructions. They're supposed to have the exact same meaning and behavior. So a conversion of isnan(x) to (x == x) for the constrained intrinsics in the default environment is just as correct as the instructions that do the same thing but aren't constrained intrinsics. Thus inlining of a !strictfp function into a strictfp function with the conversion of FP instructions into constrained intrinsics can't introduce any new correctness issues.

No matter what exception behavior is specified in constrained intrinsics call, processor instruction will raise FP exceptions. If bits in FP environment specifies trap on Invalid exception, the check x != x may result in crash, if x is signaling NaN. The check isnan(x) in this case is safe

Particular cores may implement is_fpclass as CMP instruction, it is made by default lowering, but making such replacement in IR is incorrect.

Floating point exception doesn’t mean trap, but that’s a possible mode. The IR semantics are not dependent on the possible set of lowerings. Ultimately between the user and codegen, this code must never have trapped and may never be transformed to a form that could trap. The replacement fcmp must not trap, but it’s not the middle end’s responsibility to ensure that. This is correct regardless of what any particular processor may do or whatever the source did.

kpn added a comment.Jan 26 2023, 9:54 AM

But they have different behavior if x is signaling NaN. Although the replacement is proposed for strictfp functions only, inlining may require conversion to a form that uses constrained intrinsics. The resulting code would have different behavior than original.

A transformation that’s valid for !strictfp functions cannot be invalid because of the potential for being inlined into a strictfp function. If this isn’t possible, there is a representational issue the inliner would need to account for.

Thinking about this more, I think any situation where this could be an issue would be malformed to begin with. You had a piece of code that was expecting exceptions to be enabled calling into code where exceptions were disabled without turning exceptions off

If exceptions were turned off, the call was made, and exceptions were turned back on then there would be no correctness issue. Inlining the called function wouldn't change that. The toggling of the exception enablement would be invisible to LLVM. Thus we can't categorically say that a strictfp function calling a !strictfp function is malformed.

On most targets turning off FP exceptions means reading content of FP control register, changing value of mask bits and putting the modified register value back. It is expensive operation and cannot be made invisible to LLVM. Some targets (like RISCV) do not have possibility to mask FP exceptions at all, so at IR level there is no way to turn exception off. Default FP environment supposes that the exceptions are ignored, not disabled. In general case they are raised always.

Ok, true, it takes executing code to change the floating point environment. Yes, there would be inline assembly or a function call to change the FP environment. LLVM would see that code because there would be IR for it. All true.

But LLVM wouldn't know what the inline assembly was doing. It wouldn't recognize the function call and thus wouldn't know what it was doing. LLVM would not know the floating point environment had changed. Rephrased, the change in the floating point environment would, to LLVM, be invisible. That's the "invisible" I was referring to earlier.

The constrained intrinsics that specify the default rounding and exceptions disabled are supposed to behave the same as the normal instructions. They're supposed to have the exact same meaning and behavior. So a conversion of isnan(x) to (x == x) for the constrained intrinsics in the default environment is just as correct as the instructions that do the same thing but aren't constrained intrinsics. Thus inlining of a !strictfp function into a strictfp function with the conversion of FP instructions into constrained intrinsics can't introduce any new correctness issues.

No matter what exception behavior is specified in constrained intrinsics call, processor instruction will raise FP exceptions. If bits in FP environment specifies trap on Invalid exception, the check x != x may result in crash, if x is signaling NaN. The check isnan(x) in this case is safe. So replacement is_fpclass->fcmp changes semantics and cannot be used here.

Particular cores may implement is_fpclass as CMP instruction, it is made by default lowering, but making such replacement in IR is incorrect.

Be careful of how IEEE 754 uses the same terminology that a Unix person uses, but the words have different meanings. I'm going to use the term "754 trap" to mean a trap in the IEEE-754 document's use of the term. I'm going to say "Unix trap" when a trap involves transferring of control to the OS.

An FP instruction can "754 trap" but the result may just be changing the FP status bits in the environment to record that something happened. And if we are not using the constrained intrinsics, or we are using them with exceptions "ignore" and rounding "roundtoeven", then we are assumed to not be accessing the FP status bits.

A CPU is allowed to always "Unix trap" and transfer control to the OS. I think you are saying that RISCV does this. But the OS is allowed to fix things up so that the application doesn't observe the CPU's trap. Indeed, in the default FP environment the OS is _required_ to hide the CPU's "Unix trap" from the application. From the application's point of view this is the same as a CPU not doing a "Unix trap" at all. In this case we can treat it the same as a CPU that doesn't trap in the default FP environment.

It's true that a blind replacement of is_fpclass with fcmp would be incorrect _if_ the floating point environment is not the default environment. But if we are in the default FP environment we can assume that any CPU trap will be hidden from us by the OS and therefore is not a part of our discussion of IR correctness.

If exceptions were turned off, the call was made, and exceptions were turned back on then there would be no correctness issue. Inlining the called function wouldn't change that. The toggling of the exception enablement would be invisible to LLVM. Thus we can't categorically say that a strictfp function calling a !strictfp function is malformed.

On most targets turning off FP exceptions means reading content of FP control register, changing value of mask bits and putting the modified register value back. It is expensive operation and cannot be made invisible to LLVM. Some targets (like RISCV) do not have possibility to mask FP exceptions at all, so at IR level there is no way to turn exception off. Default FP environment supposes that the exceptions are ignored, not disabled. In general case they are raised always.

Non-strictfp functions are assumed to be in the default FP environment, which implies there's some form of undefined behavior if you call a non-strictfp function with a non-default FP environment. The precise, formal semantics that effect this rule is of course underdefined,

This is how we define default FP environment in the LangRef today:

The default LLVM floating-point environment assumes that floating-point instructions do not have side effects. Results assume the round-to-nearest rounding mode. No floating-point exception state is maintained in this environment. Therefore, there is no attempt to create or preserve invalid operation (SNaN) or division-by-zero exceptions.

From this definition, it seems clear to me that we are legally allowed to insert instructions that would cause FP exceptions in non-strictfp functions.

To rephrase the rules (as I understand them) in somewhat more precise terms, if you call a non-strictfp function with a non-default FP environment, the values of any FP operation are unspecified, floating-point sticky bits have unspecified values, and if the dynamic FP environment is set to generate hardware traps, the act of *calling* the function is UB (that is, we are permitted to introduce FP operations that may trap in code-paths where none existed). In this understanding, converting a non-strictfp function into a strictfp function with bare FP instructions replaced with constrained intrinsics and round.tonearest and fpexcept.ignore metadata is a valid optimization, but one that narrows the possible semantics (since non-strictfp may introduce FP operations that strictfp may not).

Be careful of how IEEE 754 uses the same terminology that a Unix person uses, but the words have different meanings. I'm going to use the term "754 trap" to mean a trap in the IEEE-754 document's use of the term. I'm going to say "Unix trap" when a trap involves transferring of control to the OS.

An FP instruction can "754 trap" but the result may just be changing the FP status bits in the environment to record that something happened. And if we are not using the constrained intrinsics, or we are using them with exceptions "ignore" and rounding "roundtoeven", then we are assumed to not be accessing the FP status bits.

A CPU is allowed to always "Unix trap" and transfer control to the OS. I think you are saying that RISCV does this. But the OS is allowed to fix things up so that the application doesn't observe the CPU's trap. Indeed, in the default FP environment the OS is _required_ to hide the CPU's "Unix trap" from the application. From the application's point of view this is the same as a CPU not doing a "Unix trap" at all. In this case we can treat it the same as a CPU that doesn't trap in the default FP environment.

I went ahead and looked at the RISC-V specification to see what it does on FP exceptions. The RISC-V instructions only model 754 traps as sticky bits in the fcsr, and there's an explicit note that it provides no way to convert a 754 trap to a Unix trap (to use your terminology).

FWIW, C itself requires that implementations provide FP exception handling as sticky-bits (this is the IEEE 754 default exception handling), with the existence of a Unix trapping mode being an allowable extension (which it declines to specify).

On most targets turning off FP exceptions means reading content of FP control register, changing value of mask bits and putting the modified register value back. It is expensive operation and cannot be made invisible to LLVM. Some targets (like RISCV) do not have possibility to mask FP exceptions at all, so at IR level there is no way to turn exception off. Default FP environment supposes that the exceptions are ignored, not disabled. In general case they are raised always.

Managing the FP mode is a user responsibility. If the operation in unconstrained we can assume the default floating point environment without preserved exceptions.

This is true for rounding direction, denormal behavior or any other FP control mode. They are represented by bits in some registers, may be set/read by appropriate API functions. Exception behavior is a very different thing. There is no register that keeps "current exception behavior". It is only a compiler hint that facilitates code generation. So user has limited means to control exception behavior.

Floating point exception doesn’t mean trap, but that’s a possible mode. The IR semantics are not dependent on the possible set of lowerings. Ultimately between the user and codegen, this code must never have trapped and may never be transformed to a form that could trap.

Absolutely.

The replacement fcmp must not trap, but it’s not the middle end’s responsibility to ensure that. This is correct regardless of what any particular processor may do or whatever the source did.

It is the middle end's responsibility to make only transformations that keep FP semantics. Replacement is_fpclass -> fcmp changes the semantics and can produce incorrect programs.

Ok, true, it takes executing code to change the floating point environment. Yes, there would be inline assembly or a function call to change the FP environment. LLVM would see that code because there would be IR for it. All true.

But LLVM wouldn't know what the inline assembly was doing. It wouldn't recognize the function call and thus wouldn't know what it was doing. LLVM would not know the floating point environment had changed. Rephrased, the change in the floating point environment would, to LLVM, be invisible. That's the "invisible" I was referring to earlier.

Ok, I see what is "invisible". Do you think inline assembly should be decorated as def/use of FP environment similar to function calls as is done in D111433 and D139549?

It's true that a blind replacement of is_fpclass with fcmp would be incorrect _if_ the floating point environment is not the default environment. But if we are in the default FP environment we can assume that any CPU trap will be hidden from us by the OS and therefore is not a part of our discussion of IR correctness.

What about such code?

int get_code(float x) {
  return isnan(x) ? 1 : 2;
}
void func1(float x) {
  int code1 = get_code(x);
  ...
}
#pragma STDC FENV_ACCESS ON
void func2(float x) {
  int code1 = get_code(x);
  ...
}

Acoording to C standard it must work as intended. If compiler replaces isnan with comparison, the code would be broken.

Non-strictfp functions are assumed to be in the default FP environment, which implies there's some form of undefined behavior if you call a non-strictfp function with a non-default FP environment. The precise, formal semantics that effect this rule is of course underdefined,

This is how we define default FP environment in the LangRef today:

The default LLVM floating-point environment assumes that floating-point instructions do not have side effects. Results assume the round-to-nearest rounding mode. No floating-point exception state is maintained in this environment. Therefore, there is no attempt to create or preserve invalid operation (SNaN) or division-by-zero exceptions.

From this definition, it seems clear to me that we are legally allowed to insert instructions that would cause FP exceptions in non-strictfp functions.

LLVM must be able to represent semantics of the supported languages. The code snippet above represents a program that is valid from viewpoint of C standard but would be broken if the middle end makes replacement isnan(x)->x!=x. So such replacement must not be made.

It is the middle end's responsibility to make only transformations that keep FP semantics. Replacement is_fpclass -> fcmp changes the semantics and can produce incorrect programs.

It doesn't change the semantics because the FP exception that the machine fcmp may end up raising can be assumed to be invisible based on the IR semantics. An unconstrained fcmp does not have observable FP exceptions. For any regular non-constrained operation, you simply never need to be concerned with floating point exceptions. They can't be observed or relied on.

Ok, true, it takes executing code to change the floating point environment. Yes, there would be inline assembly or a function call to change the FP environment. LLVM would see that code because there would be IR for it. All true.

But LLVM wouldn't know what the inline assembly was doing. It wouldn't recognize the function call and thus wouldn't know what it was doing. LLVM would not know the floating point environment had changed. Rephrased, the change in the floating point environment would, to LLVM, be invisible. That's the "invisible" I was referring to earlier.

Ok, I see what is "invisible". Do you think inline assembly should be decorated as def/use of FP environment similar to function calls as is done in D111433 and D139549?

It's true that a blind replacement of is_fpclass with fcmp would be incorrect _if_ the floating point environment is not the default environment. But if we are in the default FP environment we can assume that any CPU trap will be hidden from us by the OS and therefore is not a part of our discussion of IR correctness.

What about such code?

int get_code(float x) {
  return isnan(x) ? 1 : 2;
}
void func1(float x) {
  int code1 = get_code(x);
  ...
}
#pragma STDC FENV_ACCESS ON
void func2(float x) {
  int code1 = get_code(x);
  ...
}

Acoording to C standard it must work as intended. If compiler replaces isnan with comparison, the code would be broken.

This code didn't actually manipulate the floating point mode, just made it legal to do so for the scope of the pragma. If you were to insert code to modify the mode to enable FP exceptions inside func2, it would be illegal because you did not restore the mode before calling into code outside of the pragma scope. FENV_ACCESS doesn't require the compiler to fixup the floating point mode for you before calling into other code.

What about such code?

int get_code(float x) {
  return isnan(x) ? 1 : 2;
}
void func1(float x) {
  int code1 = get_code(x);
  ...
}
#pragma STDC FENV_ACCESS ON
void func2(float x) {
  int code1 = get_code(x);
  ...
}

Acoording to C standard it must work as intended. If compiler replaces isnan with comparison, the code would be broken.

From the C standard (7.6.1p2 from draft N3054):

(When execution passes from a part of the program translated with FENV_ACCESS "off" to a part translated with FENV_ACCESS "on", the state of the floating-point status flags is unspecified and the floating-point control modes have their default settings.)

  1. The purpose of the FENV_ACCESS pragma is to allow certain optimizations that could subvert flag tests and mode changes (e.g., global common subexpression elimination, code motion, and constant folding). In general, if the state of FENV_ACCESS is "off", the translator can assume that the flags are not tested, and that default modes are in effect, except where specified otherwise by an FENV_ROUND pragma.

On a strict reading of the standard, on transition from get_code (which has FENV_ACCESS OFF) back to func2, the FP flags are unspecified, which means it is legal to transform the non-flag-setting isnan into flag-setting fcmp. There is a slight wording mismatch, as FENV_ACCESS is about testing, not setting flags, but the footnote seems to indicate that the correct reading is that FENV_ACCESS is necessary if you want guarantees about flags being set or not set.

If it would help, I could contact the CFP study group to see what they think is the correct reading of the standard.

On a strict reading of the standard, on transition from get_code (which has FENV_ACCESS OFF) back to func2, the FP flags are unspecified, which means it is legal to transform the non-flag-setting isnan into flag-setting fcmp. There is a slight wording mismatch, as FENV_ACCESS is about testing, not setting flags, but the footnote seems to indicate that the correct reading is that FENV_ACCESS is necessary if you want guarantees about flags being set or not set.

If it would help, I could contact the CFP study group to see what they think is the correct reading of the standard.

For the purpose of this change, the exact definition of FENV_ACCESS doesn't matter. This is not something that the context of a single instruction can or should consider. This would have to be managed by implicit mode switches inserted by the frontend to maintain the invariant that non-strict functions don't have to be concerned about it. Our strictfp representation would be totally broken if we had to think about this in a non-strict function

For the purpose of this change, the exact definition of FENV_ACCESS doesn't matter. This is not something that the context of a single instruction can or should consider. This would have to be managed by implicit mode switches inserted by the frontend to maintain the invariant that non-strict functions don't have to be concerned about it. Our strictfp representation would be totally broken if we had to think about this in a non-strict function

The underlying question is if it's legal for a compiler to insert a call to a flags-setting instruction in FENV_ACCESS OFF along code paths that never had them (with the answer almost certainly "yes"). We have these semantics for non-strictfp in LLVM.

What about such code?

int get_code(float x) {
  return isnan(x) ? 1 : 2;
}
void func1(float x) {
  int code1 = get_code(x);
  ...
}
#pragma STDC FENV_ACCESS ON
void func2(float x) {
  int code1 = get_code(x);
  ...
}

Acoording to C standard it must work as intended. If compiler replaces isnan with comparison, the code would be broken.

This code didn't actually manipulate the floating point mode, just made it legal to do so for the scope of the pragma. If you were to insert code to modify the mode to enable FP exceptions inside func2, it would be illegal because you did not restore the mode before calling into code outside of the pragma scope. FENV_ACCESS doesn't require the compiler to fixup the floating point mode for you before calling into other code.

Pragma STDC FENV_ACCESS ON may be used not only for code that manipulates control modes, but also when exception status is examined (7.6.1p2):

The FENV_ACCESS pragma provides a means to inform the implementation when a program might
 access the floating-point environment to test floating-point status flags or run under non-default
 floating-point control modes.

A call to fetestexcep could be put somewhere inside func2 to have an access to FP environment, but it does not change the behavior of isnan inside get_code.

From the C standard (7.6.1p2 from draft N3054):

(When execution passes from a part of the program translated with FENV_ACCESS "off" to a part translated with FENV_ACCESS "on", the state of the floating-point status flags is unspecified and the floating-point control modes have their default settings.)

  1. The purpose of the FENV_ACCESS pragma is to allow certain optimizations that could subvert flag tests and mode changes (e.g., global common subexpression elimination, code motion, and constant folding). In general, if the state of FENV_ACCESS is "off", the translator can assume that the flags are not tested, and that default modes are in effect, except where specified otherwise by an FENV_ROUND pragma.

On a strict reading of the standard, on transition from get_code (which has FENV_ACCESS OFF) back to func2, the FP flags are unspecified, which means it is legal to transform the non-flag-setting isnan into flag-setting fcmp. There is a slight wording mismatch, as FENV_ACCESS is about testing, not setting flags, but the footnote seems to indicate that the correct reading is that FENV_ACCESS is necessary if you want guarantees about flags being set or not set.

This statement is not pertinent to this case because the only operation in get_code is a call to isnan, but it must not raise any exception according to the same standard (F3p6):

The C classification macros fpclassify, iscanonical, isfinite, isinf, isnan, isnormal,
issignaling, issubnormal, iszero, and signbit provide the IEC 60559 operations indicated
in the table above provided their arguments are in the format of their semantic type. Then these
macros raise no floating-point exceptions, even if an argument is a signaling NaN.

and this property must be kept no matter if exceptions are ignored or not.

In a broader context, it looks natural that a non-strictfp function may lose some exceptions that strictfp function would raise. But it would be counterintuitive if it raised new exceptions, which cannot be observed in strictfp function.

I do not believe there's any problem with an isnan to fcmp transform -- as long as we restrict it to non-strictfp functions. And I'm not sure why there's so much debate here. I thought all those sorts of semantic questions were already pretty well settled. Does the documentation about strictfp vs not-strictfp semantics need to be made clearer?

The C classification macros fpclassify, iscanonical, isfinite, isinf, isnan, isnormal,
issignaling, issubnormal, iszero, and signbit provide the IEC 60559 operations indicated
in the table above provided their arguments are in the format of their semantic type. Then these
macros raise no floating-point exceptions, even if an argument is a signaling NaN.

and this property must be kept no matter if exceptions are ignored or not.

No, the property does not need to be maintained if the exception flag state cannot be observed by correct code. And that's exactly the case at hand in FENV_ACCESS OFF code (and, correspondingly, non-strictfp in LLVM IR). In such code, you are not allowed to have trap-on-exception enabled, nor are you allowed to access the floating-point exception status flags, and, the state of the status flags is unspecified upon transitioning back to FENV_ACCESS ON code. Thus, within such code, we may set the status flags to any arbitrary value we wish -- nobody can (correctly) tell the difference.

In a broader context, it looks natural that a non-strictfp function may lose some exceptions that strictfp function would raise. But it would be counterintuitive if it raised new exceptions, which cannot be observed in strictfp function.

That's not a problem. It is definitely permissible for a non-strictfp function to raise an FP-exceptions that "should not" be raised. That this is permissible is one of the most important properties of the non-strict operations -- it allows code motion which is impermissible if the status-flag side effects must be taken into account. (e.g. speculating an "fadd" above a conditional).

jyknight added inline comments.Jan 31 2023, 4:06 PM
llvm/include/llvm/IR/IRBuilder.h
2437 ↗(On Diff #483955)

I don't think this function should be added -- it emits the wrong thing for strictfp mode, but also just doesn't seem useful.

That we sometimes canonicalize to fcmp is just an implementation detail -- the CreateFCmpUNO call can just be done inline in foldIntrinsicIsFPClass.

llvm/lib/Transforms/InstCombine/InstCombineCalls.cpp
824

That may be true. However, on architectures that have an fpclass instruction, it could be profitable to canonicalize other operations into llvm.is.fpclass operations, especially if that then allows merging multiple llvm.is.fpclass calls into one.

E.g. bool a = isinf(x) || isnan(x) can turn into a single instruction llvm.is.fpclass(x, snan/qnan/pinf/ninf).

(However, this isn't an objection to taking this patch for now -- revisiting the canonical form later is always possible.)

jyknight added inline comments.Jan 31 2023, 5:25 PM
llvm/lib/Transforms/InstCombine/InstCombineCalls.cpp
824

(Sorry, I just noticed you already have another patch which touches upon that already. OK!)

The C classification macros fpclassify, iscanonical, isfinite, isinf, isnan, isnormal,
issignaling, issubnormal, iszero, and signbit provide the IEC 60559 operations indicated
in the table above provided their arguments are in the format of their semantic type. Then these
macros raise no floating-point exceptions, even if an argument is a signaling NaN.

and this property must be kept no matter if exceptions are ignored or not.

No, the property does not need to be maintained if the exception flag state cannot be observed by correct code. And that's exactly the case at hand in FENV_ACCESS OFF code (and, correspondingly, non-strictfp in LLVM IR). In such code, you are not allowed to have trap-on-exception enabled, nor are you allowed to access the floating-point exception status flags, and, the state of the status flags is unspecified upon transitioning back to FENV_ACCESS ON code. Thus, within such code, we may set the status flags to any arbitrary value we wish -- nobody can (correctly) tell the difference.

What about the original case, when isnan is used in inline function?

inline int get_code(float x) {
  return isnan(x) ? 1 : 2;
}
#pragma STDC FENV_ACCESS ON
void func2(float x) {
  int code1 = get_code(x);
  ...
}

Can this function, defined as non-strictfp, raise Invalid exception when inlined into strictfp function?

In a broader context, it looks natural that a non-strictfp function may lose some exceptions that strictfp function would raise. But it would be counterintuitive if it raised new exceptions, which cannot be observed in strictfp function.

That's not a problem. It is definitely permissible for a non-strictfp function to raise an FP-exceptions that "should not" be raised. That this is permissible is one of the most important properties of the non-strict operations -- it allows code motion which is impermissible if the status-flag side effects must be taken into account. (e.g. speculating an "fadd" above a conditional).

There is a use case, when a computation is made with traps allowed for errors (Invalid, Overflow, DivideByZero). If an error occurs, computation is stopped. It saves from checks in multiple places during computation. The computation runs in default mode, because rounding mode is standard and precise exception flag behavior is unimportant. If non-strictfp functions are allowed to throw spurious exceptions, such solution won't work. Running computations in strictfp mode is not an option because of poor performance.

arsenm added a comment.Feb 1 2023, 3:39 AM

Can this function, defined as non-strictfp, raise Invalid exception when inlined into strictfp function?

Inlining cannot change semantics. The function can, but not from the part derived from the inlined callee

arsenm updated this revision to Diff 493895.Feb 1 2023, 3:43 AM
arsenm marked an inline comment as done.

Drop IRBuilder helper

What about the original case, when isnan is used in inline function?

inline int get_code(float x) {
  return isnan(x) ? 1 : 2;
}
#pragma STDC FENV_ACCESS ON
void func2(float x) {
  int code1 = get_code(x);
  ...
}

Can this function, defined as non-strictfp, raise Invalid exception when inlined into strictfp function?

Since the function is non-strictfp, it's legal to optimize it in a way to raise spurious exceptions. If the isnan is retained at the time the function is inlined into a strictfp function, then it is no longer possible to optimize the copy of isnan in the strictfp function to raise spurious exceptions. But an optimization pass that runs before the inliner may still introduce spurious exceptions.

There is a use case, when a computation is made with traps allowed for errors (Invalid, Overflow, DivideByZero). If an error occurs, computation is stopped. It saves from checks in multiple places during computation. The computation runs in default mode, because rounding mode is standard and precise exception flag behavior is unimportant. If non-strictfp functions are allowed to throw spurious exceptions, such solution won't work. Running computations in strictfp mode is not an option because of poor performance.

Enabling hardware traps for FP exceptions makes the FP environment no longer in the default mode, which requires the computation to be done in a strictfp function to have any defined behavior.

As far as I can tell, this conversation has run its course, and I don't think there were any more comments on the actual code, so I'm going to accept this revision.

Use-cases that require traps or otherwise observing floating-point status flags must generate strictfp functions and constrained intrinsics. (E.g. in Clang, you might wish to compile with -ffp-exception-behavior=maytrap). That was true before this change, too.

If the performance of strictfp code is not as good as desired, then I'm sure there's more work that can be done optimizing it -- but there is a reason it's a separate mode.

jyknight accepted this revision.Feb 4 2023, 6:56 PM
This revision is now accepted and ready to land.Feb 4 2023, 6:56 PM