Page MenuHomePhabricator

Add support for options -frounding-math, -ftrapping-math, -ffp-model=, and -ffp-exception-behavior=, : Specify floating point behavior
ClosedPublic

Authored by mibintc on May 31 2019, 7:01 AM.

Details

Summary

Intel would like to contribute a patch to implement support for these Intel- and Microsoft -fp options. This message is to describe the options and request feedback from the community.
-frounding-math (supported by gcc and ICC)
-fp-model=[precise|strict|fast] and -fp-exception-behavior=[ignore|maytrap|strict]

This contribution dovetails with the llvm patch "Teach the IRBuilder about constrained fadd and friends". The motivation for providing these is that having umbrella options such as -fp-model= to control most basic FP options is better and easier to understand for users.

The option settings -fp-model=[precise|strict|fast] are supported by both ICC and CL. The CL and ICC -fp-model option is documented on these pages:

https://docs.microsoft.com/en-us/cpp/build/reference/fp-specify-floating-point-behavior?view=vs-2019
https://software.intel.com/en-us/cpp-compiler-developer-guide-and-reference-fp-model-fp

Currently, clang's default behavior corresponds to -fp-model=precise. Clang/llvm support for -fp-model=strict and
-fp-exception-behavior= was developed in the D53157 patch, and there is current llvm support for the fast settings by using the fast math flags llvm::FastMathFlags. Note: the clang-cl wrapper to support Microsoft options has simplified support for these options by mapping /fp-model=except to ftrapping-math, fp-mdel=fast to ffast-math, fp-model=precise and fp-model=strict to fno-fast-math (see clang/Driver/CLCompatOptions.td).

These are the settings for -fp-model=

precise - Disables optimizations that are not value-safe on floating-point data, although FP contraction is enabled.
strict - Enables precise and except, disables contractions (FMA), and enables pragma stdc fenv_access.   [Note: fenv_access not currently supported in clang]
fast - Equivalent to -ffast-math

What follows here is Microsoft /fp documentation from the msdn site. It's copied here solely for the purposes of saving
information for the future in case the information is removed from the msdn site:

/fp (Specify floating-point behavior)

Specifies how the compiler treats floating-point expressions, optimizations, and exceptions. The /fp options specify whether the generated code allows floating-point environment changes to the rounding mode, exception masks, and subnormal behavior, and whether floating-point status checks return current, accurate results. It controls whether the compiler generates code that maintains source operation and expression ordering and conforms to the standard for NaN propagation, or if it instead generates more efficient code that may reorder or combine operations and use simplifying algebraic transformations that are not allowed by the standard.

Syntax

/fp:[precise | strict | fast | except[-]]

Arguments

precise

By default, the compiler uses /fp:precise behavior.

Under /fp:precise the compiler preserves the source expression ordering and rounding properties of floating-point code when it generates and optimizes object code for the target machine. The compiler rounds to source code precision at four specific points during expression evaluation: at assignments, at typecasts, when a floating-point argument is passed to a function call, and when a floating-point value is returned from a function call. Intermediate computations may be performed at machine precision. Typecasts can be used to explicitly round intermediate computations.

The compiler does not perform algebraic transformations on floating-point expressions, such as reassociation or distribution, unless the transformation is guaranteed to produce a bitwise identical result.
Expressions that involve special values (NaN, +infinity, -infinity, -0.0) are processed according to IEEE-754 specifications. For example, x != x evaluates to true if x is NaN. Floating-point *contractions*, that is, machine instructions that combine floating-point operations, may be generated under /fp:precise.

The compiler generates code intended to run in the [default floating-point environment](#the-default-floating-point-environment) and assumes that the floating-point environment is not accessed or modified at runtime. That is, it assumes that the code does not unmask floating-point exceptions, read or write floating-point status registers, or change rounding modes.

If your floating-point code does not depend on the order of operations and expressions in your floating-point statements (for example, if you don't care whether a * b + a * c is computed as (b + c) * a or 2 * a as a + a), consider the [/fp:fast](#fast) option, which can produce faster, more efficient code. If your code both depends on the order of operations and expressions, and accesses or alters the floating-point environment (for example, to change rounding modes or to trap floating-point exceptions), use [/fp:strict](#strict).

strict

/fp:strict has behavior similar to /fp:precise, that is, the compiler preserves the source ordering and rounding properties of floating-point code when it generates and optimizes object code for the target machine, and observes the standard when handling special values. In addition, the program may safely access or modify the floating-point environment at runtime.

Under /fp:strict, the compiler generates code that allows the program to safely unmask floating-point exceptions, read or write floating-point status registers, or change rounding modes. It rounds to source code precision at four specific points during expression evaluation: at assignments, at typecasts, when a floating-point argument is passed to a function call, and when a floating-point value is returned from a function call. Intermediate computations may be performed at machine precision. Typecasts can be used to explicitly round intermediate computations. The compiler does not perform algebraic transformations on floating-point expressions, such as reassociation or distribution, unless the transformation is guaranteed to produce a bitwise identical result. Expressions that involve special values (NaN, +infinity, -infinity, -0.0) are processed according to IEEE-754 specifications. For example, x != x evaluates to true if x is NaN. Floating-point contractions are not generated under /fp:strict.

/fp:strict is computationally more expensive than /fp:precise because the compiler must insert additional instructions to trap exceptions and allow programs to access or modify the floating-point environment at runtime. If your code doesn’t use this capability, but requires source code ordering and rounding, or relies on special values, use /fp:precise. Otherwise, consider using /fp:fast, which can produce faster and smaller code.

fast

The /fp:fast option allows the compiler to reorder, combine, or simplify floating-point operations to optimize floating-point code for speed and space. The compiler may omit rounding at assignment statements, typecasts, or function calls. It may reorder operations or perform algebraic transforms, for example, by use of associative and distributive laws, even if such transformations result in observably different rounding behavior. Because of this enhanced optimization, the result of some floating-point computations may differ from those produced by other /fp options. Special values (NaN, +infinity, -infinity, -0.0) may not be propagated or behave strictly according to the IEEE-754 standard. Floating-point contractions may be generated under /fp:fast. The compiler is still bound by the underlying architecture under /fp:fast, and additional optimizations may be available through use of the [/arch](arch-minimum-cpu-architecture.md) option.

Under /fp:fast, the compiler generates code intended to run in the default floating-point environment and assumes that the floating-point environment isn’t accessed or modified at runtime. That is, it assumes that the code does not unmask floating-point exceptions, read or write floating-point status registers, or change rounding modes.

/fp:fast is intended for programs that do not require strict source code ordering and rounding of floating-point expressions, and do not rely on the standard rules for handling special values such as NaN. If your floating-point code requires preservation of source code ordering and rounding, or relies on standard behavior of special values, use [/fp:precise](#precise). If your code accesses or modifies the floating-point environment to change rounding modes, unmask floating-point exceptions, or check floating-point status, use [/fp:strict](#strict).

except

The /fp:except option generates code to ensures that any unmasked floating-point exceptions are raised at the exact point at which they occur, and that no additional floating-point exceptions are raised. By default, the /fp:strict option enables /fp:except, and /fp:precise does not. The /fp:except option is not compatible with /fp:fast. The option can be explicitly disabled by us of /fp:except-.

Note that /fp:except does not enable any floating-point exceptions by itself, but it is required for programs to enable floating-point exceptions. See [_controlfp](../../c-runtime-library/reference/control87-controlfp-control87-2.md) for information on how to enable floating-point exceptions.

Remarks

Multiple /fp options can be specified in the same compiler command line. Only one of /fp:strict, /fp:fast, and /fp:precise options can be in effect at a time. If more than one of these options is specified on the command line, the later option takes precedence and the compiler generates a warning.

Diff Detail

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
mibintc updated this revision to Diff 229893.Nov 18 2019, 12:02 PM
mibintc added a reviewer: kpn.
Herald added a project: Restricted Project. · View Herald TranscriptNov 18 2019, 12:02 PM

I added a nullptr check in IRBuilder.h and a test case to cover the segfault reported in https://bugs.llvm.org/show_bug.cgi?id=44048

kpn added a comment.Nov 18 2019, 12:18 PM

Does anyone think a warning is appropriate because the new flags are exercising experimental, incomplete code in both clang and llvm? The warning would be removed when we believe the feature is complete and ready to use.

llvm/include/llvm/IR/IRBuilder.h
260

This looks reasonable to me.

It smells like there's a larger strictfp IRBuilder issue, but that's not an issue for this patch here. The larger issue won't be hit since the new options affect the entire compilation. It therefore shouldn't block this patch.

rjmccall added inline comments.Nov 18 2019, 12:23 PM
llvm/include/llvm/IR/IRBuilder.h
260

Does IRBuilder actually support inserting into an unparented basic block? I feel like this is exposing a much more serious mis-use of IRBuilder.

In D62731#1750412, @kpn wrote:

Does anyone think a warning is appropriate because the new flags are exercising experimental, incomplete code in both clang and llvm? The warning would be removed when we believe the feature is complete and ready to use.

@kpn Can you say more about "incomplete code in ... clang". I don't know what's missing from clang.

kpn added a comment.Nov 18 2019, 1:11 PM
In D62731#1750412, @kpn wrote:

Does anyone think a warning is appropriate because the new flags are exercising experimental, incomplete code in both clang and llvm? The warning would be removed when we believe the feature is complete and ready to use.

@kpn Can you say more about "incomplete code in ... clang". I don't know what's missing from clang.

See D70256. Calls to clang's math builtins that are llvm intrinsics need to be changed to be calls to the constrained intrinsics. I've been waiting to submit the patches because they weren't testable without these command line options.

There may be other issues as well. I'm not sure.

michele.scandale added inline comments.
clang/lib/Driver/ToolChains/Clang.cpp
2418

Isn't the default -fno-rounding-math?

2426

Shouldn't this be set to true similarly to what you do for TrappingMathPresent to track whether there is an explicit option related to rounding math?

2595

Running clang -### -ftrapping-math -ffp-exception-behavior=ignore lead to this assertion to fail. As far as I can see TrappingMath is not changed in the case FPExceptionBehavior is "ignore" or "maytrap".
Clearly in the "ignore" case it should be safe to just set TrappingMath to false, but I'm not sure about the "maytrap" case.
It seems that -ffp-exception-behavior is more general than -f{,no-}trapping-math, so it seems natural to me to see ftrapping-math and foo-trapping-math as aliases for ffp-exception-behavior=strict and ffp-exception-behavior=ignore respectively. If we agree on this, then I would expect the reasoning inside the compiler only in terms of FPExceptionBehavior.

2597

With this change if I run clang -### -ffast-math test.c I don't see -fno-trapping-math passed to the CC1.
This is changing the changes the value of the function level attribute "no-trapping-math" (see lib/CodeGen/CGCall.cpp : 1747).
Is this an intended change?

Moreover since with this patch the default value for trapping math changed, the "no-trapping-math" function level attribute is incorrect also for default case.

kpn added inline comments.Nov 19 2019, 6:06 AM
llvm/include/llvm/IR/IRBuilder.h
260

I suspect you are correct. If we let this "F && " change go in we'll have a situation where whether or not a block is currently in a function when a function call is emitted will affect whether or not the eventual function definition gets the strictfp attribute. That seems like an unfortunate inconsistency.

I'm still looking into it. I hope to have an IRBuilder review up today or tomorrow.

mibintc marked 4 inline comments as done.Nov 19 2019, 12:02 PM

inline replies to Michele, will upload a new patch shortly

clang/lib/Driver/ToolChains/Clang.cpp
2418

Yes the default is no rounding math, I'll remove the comment. Thank you.

2426

There's a switch statement above this that interprets the command line option -fp-model=strict as though frounding had appeared on the command line by assigning a new value to optID so that's why there is a discrepancy. Also I'm using the *Present boolean variables to preserve the output from Driver so that pre-existing driver test cases don't need to be changed.

2595

Thanks for pointing out this assertion failure, I will upload a patch with fix. Yes we could entirely express ftrapping-math and fno-trapping-math via the ffp-exception-behavior= option. That would probably be better--currently the trapping option becomes effective via the exception behavior parameter to the llvm floating point constrained intrinsics, and it can take 3 values. I thought it would be too radical at the moment, so I didn't propose that in this patch.

In the patch I'm about to add, I added a test case for the assertion that you saw.

2597

Before this patch, ftrapping-math was added to the Driver and also a bitfield, `NoTrappingFPMath was created in the LLVM to describe the state of trapping-math, but otherwise that bit wasn't consulted and the option had no effect. Gcc describes ftrapping-math as the default, but in llvm by default floating point exceptions are masked and this corresponds to the floating point Constrained Intrinsics having exception behavior set to ignored. This patch changed the llvm constructor to set the trapping bit to "no trap". In fact I'd like to get rid of the NoTrappingFPMath` bitfield since it's not being used, but I didn't make that change at this point.

If I remember correctly, there are a bunch of driver tests that failed if fno-trapping-math is output to cc1. I'd have to reconstruct the details. Since fno-trapping-math is the default, it isn't passed through on the cc1 command line: the Clang.cpp driver doesn't pass through the positive and negative for each existing option.

Thanks for pointing out the line in CGCall.cpp, it seems the CodeGenOpts aren't getting set up perfectly I'll fix that in CompilerInvocation.cpp; I don't see anything setting trapping-math as part of function level attribute, @michele.scandale did I overlook that/can you point out where that is?

llvm/include/llvm/IR/IRBuilder.h
260

As I just commented on the related patch @kpn posted, it appears that IRBuilder doesn't entirely support inserting into an unparented block. I was surprised by this, but there are places that need to be able to get to the Module from the BasicBlock. So, I think something problematic may be happening in the failing case.

mibintc updated this revision to Diff 230128.Nov 19 2019, 12:41 PM

Here's an update in response to comments from @michele.scandale
I fixed the assertion error and added a test case
I fixed the setting of ftrapping-math in CodeGenOpts
I deleted an incorrect comment
I added a diagnostic when -fp-exception-behavior= is overridden on the command line by f[no-]trapping-math
I updated one of the test cases to work with some small modifications that will be made to IRBuilder.h

clang/lib/Driver/ToolChains/Clang.cpp
2597

I guess you are referring to the code in TargetMachine.cpp where the function level attributes are used to reset the TargetOptions state whenever we initiate the backend codegen for a given function. Considering that the trapping math option as stated in the documentation did not have any effect, I'm not surprised to see not many uses. The only one I can see is in llvm/lib/Target/ARM/ARMAsmPrinter.cpp : 687 where the function level attribute affects the emission of some ARM specific attributes.

My only concern was that the change of the default value for trapping math was not propagated entirely causing this function level attribute to be initialized incorrectly.
Fixing the logic in CompilerInvocation.cpp considering the change of default it is fine for me.

Given that ffp-exception-behavior={ignore,maytrap,strict} supersedes -f{,no-}trapping-math I would expect long term to see the internal state of the compiler frontend to only care about the new state FPExceptionBehavior for both language and code generation options. And I guess the same would apply to the backend stage as well.

clang/lib/Driver/ToolChains/Clang.cpp
2356

Here it seems you are changing optID to OPT_ffast_math to reuse the logic specified below for that case to reset the state of the floating point options.

2364

Here the state of the floating point options seems unchanged except for FPContract. If I run clang -ffp-model=fast -ffp-model=precise, I would expect the state of the floating point options to match the one of -fno-fast-math except for FPContract which you want to be set to "fast".

I think you might need to replicate the reset for all the option here as well, so at this point I don't know how much worth is to use the optID reset trick for the "fast" case only.

mibintc updated this revision to Diff 231758.Dec 2 2019, 12:20 PM

This commit was reverted in 30e7ee3c4bac because a null deref error occurred in IRBuilder.h when setting strictfp attribute, see https://bugs.llvm.org/show_bug.cgi?id=44048 for information about that bug.
This patch moves setting strictfp from IRBuilder into clang/CodeGen. Also addresses some code review comments that were received after the revert. I'll add some inline comments next.

mibintc marked 9 inline comments as done.Dec 2 2019, 12:38 PM

I added inline comments describing what I did in this version of the patch to address the bug https://bugs.llvm.org/show_bug.cgi?id=44048

clang/include/clang/AST/DeclBase.h
1539

This corresponds to "strictfp" LLVM attribute. I add this here because I want to collect the information during Sema and set the attribute during CodeGen. The next thing I want to do is to add support for modifying float_control via pragma within function bodies (enable floating point control at the block level). If I wasn't preparing to support floating_control via statement-level pragma then setting the bit could be accomplished entirely within CodeGen.

1560

Need to adjust the number of bits here, because it's at the threshold of overrunning 64 bits.

clang/include/clang/Basic/DiagnosticDriverKinds.td
444

@kpn thought it would be a good idea to add a Warning that the implementation of float control is experimental and partially implemented. That's what this is for.

clang/lib/Driver/ToolChains/Clang.cpp
2364

@michele.scandale Thanks for your helpful review, I think I fixed the things that you remarked on. I also added a test case for the assertion fail that you saw.

clang/test/CodeGen/fpconstrained.cpp
10

This is the test case from the bug report (null deref/segfault/in IRBuilder)

llvm/include/llvm/IR/IRBuilder.h
268

@kpn I got rid of this line because the function attribute is being set in CodeGen

llvm/unittests/IR/IRBuilderTest.cpp
186

@kpn I changed the test to create the function attribute a priori since it will be set in CodeGen before passing to IRBuilder

kpn added inline comments.Dec 2 2019, 12:52 PM
llvm/include/llvm/IR/IRBuilder.h
268

Makes sense.

llvm/unittests/IR/IRBuilderTest.cpp
186

Right, of course.

I'm not going to quibble over the use of auto. It's fine I think.

clang/lib/Driver/ToolChains/Clang.cpp
2364

Thanks!

mibintc marked an inline comment as done.Dec 4 2019, 11:34 AM

I've pushed the updated patch,
commit cdbed2dd856c14687efd741c2d8321686102acb8

The tests fail on Windows, http://45.33.8.238/win/3405/step_6.txt

Please take a look, and if it takes a while to fix please revert while you investigte.

I reverted it again because build break on windows

The tests fail on Windows, http://45.33.8.238/win/3405/step_6.txt

Please take a look, and if it takes a while to fix please revert while you investigte.

It appears that only the 1st failure there is the fault of this patch. The 2nd seems to have come from some openmp patch (that didn't consider dso_local on windows).

The first (fpconstrained.cpp) likely just needs the check-lines to NOT explicitly say the %4/%5 and capture the loads of those with a wildcard instead.

It appears that only the 1st failure there is the fault of this patch. The 2nd seems to have come from some openmp patch (that didn't consider dso_local on windows).

The first (fpconstrained.cpp) likely just needs the check-lines to NOT explicitly say the %4/%5 and capture the loads of those with a wildcard instead.

Yes, sorry, I pasted the wrong link. http://45.33.8.238/win/3402/step_6.txt is the one for this commit, and it has just one of the two failures.

I fixed the lit test problem and pushed it again

commit 7f9b5138470db1dc58f3bc05631284c653c9ed7a
Author: Melanie Blower <melanie.blower@intel.com>

I've noticed you removed the change for CompilerInvocation.cpp about the initialization of the codegen option NoTrappingMath. Was that an accident?

I've noticed you removed the change for CompilerInvocation.cpp about the initialization of the codegen option NoTrappingMath. Was that an accident?

I checked the old and new version of the patch and it seems like initialization of NoTrappingMath is unchanged, the definition of the option has it default to 0, and CompilerInvocation.cpp sets it like this in ParseLangArgs, was it something else you were looking at?
Opts.NoTrappingMath = Args.hasArg(OPT_fno_trapping_math);

michele.scandale added a comment.EditedDec 10 2019, 4:28 PM

I've noticed you removed the change for CompilerInvocation.cpp about the initialization of the codegen option NoTrappingMath. Was that an accident?

I checked the old and new version of the patch and it seems like initialization of NoTrappingMath is unchanged, the definition of the option has it default to 0, and CompilerInvocation.cpp sets it like this in ParseLangArgs, was it something else you were looking at?
Opts.NoTrappingMath = Args.hasArg(OPT_fno_trapping_math);

In the driver code you don't always pass -fno-trapping-math, therefore when when the compiler setup the CodeGen options in ParseCodeGenArgs you will end up executing Opts.NoTrappingMath = Args.hasArg(OPT_fno_trapping_math); hence you will have that Opt.NoTrappingMath = false. This is inconsistent with the state of the compiler driver where no-trapping-math is enabled default.

If you want to keep the default of the CC1 different than the default of the compiler driver, that's fine to me, but in that case the compiler driver needs to pass -fno-trapping-math to the CC1.
If we want the same new default, then the logic in ParseCodeGenArgs must be updated.

I've noticed you removed the change for CompilerInvocation.cpp about the initialization of the codegen option NoTrappingMath. Was that an accident?

Thanks again Michele. I'd like to get rid of Opts.NoTrappingMath, but I haven't been bold enough yet. NoTrappingMath is not expressive enough because it can hold only 2 values, whereas the Exception behavior can be ignore, strict or maytrap. So I'd get rid of that Opts field, and the only place where I see it actually being used is in llvm/lib/Target/ARM/ARMAsmPrinter.cpp and the change in this patch doesn't seem to affect the ARM logic so I think if I got rid of it, it would be OK. All the other instances of the string are in llvm test cases.

I've noticed you removed the change for CompilerInvocation.cpp about the initialization of the codegen option NoTrappingMath. Was that an accident?

Thanks again Michele. I'd like to get rid of Opts.NoTrappingMath, but I haven't been bold enough yet.

I don't see a problem with this, but it would be nice to make the -f[no-]trapping-math command line option work. GNU compatibility is good.

rupprecht added inline comments.
clang/lib/Sema/SemaExpr.cpp
13047

Looks like this is leftover debugging? I'm seeing log spam compiling some files -- this message repeated hundreds of times. I'll go ahead and create a patch that nukes this.

rupprecht marked an inline comment as done.Dec 11 2019, 4:55 PM
rupprecht added inline comments.
clang/lib/Sema/SemaExpr.cpp
13047

Sorry for the noise, looks like f4a7d5659df7cb56c1baa34a39e9fe2639472741 already did this.

I don't see a problem with this, but it would be nice to make the -f[no-]trapping-math command line option work. GNU compatibility is good.

Thanks Cameron, I'll go that way

rupprecht added inline comments.Dec 12 2019, 12:51 PM
clang/include/clang/Basic/DiagnosticDriverKinds.td
444

Instead of adding a warning, I'd like to propose -frounding-math not be enabled unless an additional flag (e.g. -fexperimental-float-control) is passed. Or maybe this feature should be called -f[no-]experimental-rounding-math instead.

There are plenty of builds that are already specifying -frounding-math (e.g. they also support building w/ a compiler such as gcc that implements this), and adding this experimental/incomplete implementation is a surprise to those builds.

If I'm wrong and it's completely safe to ignore the warning (maybe it's incomplete but not in any unsafe way), we should just not have it at all.

clang/include/clang/Basic/DiagnosticDriverKinds.td
444

You raise an interesting point about people who might be using -frounding-math already. However, if they are using this flag because they also sometimes build with a compiler such as gcc that supports the flag, they are currently getting incorrect behavior from clang. Without this patch, clang silently ignores the option and the optimizer silently ignores the fact that the program may be changing the rounding mode dynamically. The user may or may not be aware of this.

With this patch such a user is likely to observe two effects: (1) their code will suddenly get slower, and (2) it will probably start behaving correctly with regard to rounding mode changes. The rounding behavior will definitely not get worse. I think the warning is useful as an indication that something has changed. I don't think requiring an additional option should be necessary.

rupprecht added inline comments.Dec 12 2019, 4:19 PM
clang/include/clang/Basic/DiagnosticDriverKinds.td
444

However, if they are using this flag because they also sometimes build with a compiler such as gcc that supports the flag, they are currently getting incorrect behavior from clang

Incorrect, yes, but also likely valid behavior. "experimental" seems to imply a miscompile when using this option should not be unexpected.

As I suggested before: if I'm wrong, and this behavior is only going to make the code more correct (as you suggest), can we remove the warning that this must be ack'd explicitly by adding -Wno-experimental-float-control to builds? I don't understand the motivation for the warning.

Currently we emit a warning if you use -frounding-math, saying that the option is ignored. That should have alerted users that they're not getting the correct behavior now. This patch removes the warning and (IIUC) implements the correct behavior but is over-conservative. If that's correct, I think that's acceptable and we don't need an "experimental" flag or a warning.

Currently we emit a warning if you use -frounding-math, saying that the option is ignored. That should have alerted users that they're not getting the correct behavior now. This patch removes the warning and (IIUC) implements the correct behavior but is over-conservative. If that's correct, I think that's acceptable and we don't need an "experimental" flag or a warning.

Oh, I didn't realize we were already warning about that. In theory, we should handle rounding math correctly with this change. It's possible we've missed some things, but I suppose that's always true. I think there are a few general intrinsics left that need constrained versions but don't have them, and we don't have any solution yet for target-specific intrinsics. If any of those have special handling that assumes the default rounding mode we will get it wrong. I don't think most users would be likely to encounter a problem.

Currently we emit a warning if you use -frounding-math, saying that the option is ignored. That should have alerted users that they're not getting the correct behavior now. This patch removes the warning and (IIUC) implements the correct behavior but is over-conservative. If that's correct, I think that's acceptable and we don't need an "experimental" flag or a warning.

Oh, I didn't realize we were already warning about that. In theory, we should handle rounding math correctly with this change. It's possible we've missed some things, but I suppose that's always true. I think there are a few general intrinsics left that need constrained versions but don't have them, and we don't have any solution yet for target-specific intrinsics. If any of those have special handling that assumes the default rounding mode we will get it wrong. I don't think most users would be likely to encounter a problem.

Hmm. The target-specific intrinsics thing does concern me, since I assume many targets have a bunch of vector intrinsics that may be sensitive to rounding. Do we have an idea of how we'd fix that? If it's a short-term incorrectness that we have a plan to fix, I don't mind the risk, but if we don't know how we'd even start to address it...

Hmm. The target-specific intrinsics thing does concern me, since I assume many targets have a bunch of vector intrinsics that may be sensitive to rounding. Do we have an idea of how we'd fix that? If it's a short-term incorrectness that we have a plan to fix, I don't mind the risk, but if we don't know how we'd even start to address it...

I see two potential problem cases with the target-specific intrinsics:

  1. Some optimization pass recognizes the intrinisic and uses its semantics to perform some optimization such as constant folding
  2. Some optimization performs code motion that moves the intrinsic (or, in the backend, the instruction it represents) across an operation that changes the rounding mode

I don't know if there are any instances of the first case in the public repository. Downstream users could be doing it. Those will need special handling if they exist (checking for the the strictfp attribute).

The second case should be handled in IR by fesetround() or other such intrinsics being marked as having side effects. It's possible that there are target-specific intrinsics to change the rounding mode that aren't marked as having side effects, but if so that's simply a bug. The other part of this is that the intrinsic might be lowered to MC and the MC instructions in a way that neglects rounding mode. Many targets have instructions with forms that take an explicit rounding mode argument and the backends may be using that with the default rounding mode. I am not aware of any such case, but it's definitely possible.

Finally, our design for handling strict fp mode in the backends is that rounding mode control will be handled by explicitly modeling the dependency between the relevant control registers and instructions that implicitly use the rounding mode controled by those registers. X86 only recently started doing this. There may be other backends that have not implemented it. Some may never do so.

I don't have a strong preference about what to do with the warning. I have a slight preference for replacing the existing warning with a more specific warning saying that floating math support is a work in progress. Eventually we need a way for backends to indicate that they believe their support is complete.

I believe your analysis on the second point is unfortunately missing half the problem. Functions like fesetround will be treated as having arbitrary side-effects by default, which mean arbitrary code can't be reordered with calls to them. It'd be good to model that more precisely — to flag that the *only* side-effect they have is changing the FP state — but that's not really the big problem; that's just enabling better optimization by e.g. allowing them to be reordered with non-FP code. The big problem is that intrinsic calls are not arbitrary code: the vast majority of intrinsics (e.g. all the ARM vector intrinsics, many of which can be floating-point) are marked IntrNoMem (which I believe corresponds to readnone), which means calls to those intrinsics can be reordered with other code whether or not that code has arbitrary side-effects. It's good that people are looking at achieving better modeling for the x86 backend, but we need to have a plan that doesn't require heroic effort just to get basic correctness.

I would suggest that we need a function/call attribute roughly on the level of readonly / readnone, maybe readfponly, that says that a function has no side-effects and no dependencies on anything *except* the FP state. Basic queries like Instruction::mayReadMemory() that are supposed to be used generically in code-motion transforms would then return true for calls marked that way only if they're FP-constrained functions. So outside of an FP-constrained function we'd preserve optimizability, and inside one we'd be appropriately conservative. The generic backend could similarly just default to treating those intrinsic calls as having side-effects in FP-constrained functions, and there'd just be some way for a backend to opt out of that when it provides precise modeling of FP state. It'd then be a fairly straightforward change to go through the target intrinsic tables and mark which ones have dependencies on FP state.

... The big problem is that intrinsic calls are not arbitrary code: the vast majority of intrinsics (e.g. all the ARM vector intrinsics, many of which can be floating-point) are marked IntrNoMem (which I believe corresponds to readnone), which means calls to those intrinsics can be reordered with other code whether or not that code has arbitrary side-effects.

Oh, you're right. With the constrained intrinsics we are currently handling that by using IntrInaccessibleMemOnly as a proxy for access to the FP environment, but that's stronger than we'd want for architecture-specific intrinsics in the default case. We have talked about an fpenv-specific attribute, but nothing has been done. So, I guess that does leave us in the situation where rounding controls might not be correctly respected if target-specific intrinsics are used.

It's good that people are looking at achieving better modeling for the x86 backend, but we need to have a plan that doesn't require heroic effort just to get basic correctness.

Do you mean in the backend? If so, I don't think that's possible. The backends just don't have any sort of feature that could be used to get conservatively correct behavior for cheap the way intrinsics give it to us in the middle end. Once you go into instruction selection things get very low level in a hurry.

clang/include/clang/Basic/DiagnosticDriverKinds.td
444

The "experimental" code won't be incorrect in any way that the code generated when we ignore the option is. The things that have been implemented will work correctly. The things that are not implemented will have the potential to disregard runtime changes to the rounding mode. Currently, dynamic changes to the rounding mode always have the potential of being ignored.

It's good that people are looking at achieving better modeling for the x86 backend, but we need to have a plan that doesn't require heroic effort just to get basic correctness.

Do you mean in the backend? If so, I don't think that's possible. The backends just don't have any sort of feature that could be used to get conservatively correct behavior for cheap the way intrinsics give it to us in the middle end. Once you go into instruction selection things get very low level in a hurry.

I'm looking for simple ways to modeling X86 intrinsics, but haven't find better one than modeling it one by one.

I would suggest that we need a function/call attribute roughly on the level of readonly / readnone, maybe readfponly, that says that a function has no side-effects and no dependencies on anything *except* the FP state.

Do you mean mark it at the declaration of intrinsics? Is it reasonable to mark except on dependent intrinsics?

Basic queries like Instruction::mayReadMemory() that are supposed to be used generically in code-motion transforms would then return true for calls marked that way only if they're FP-constrained functions.

Middle end or back end? I think in middle end you may need to change all releated passes to get such information to prevent optimization. And in back end, I think we can simply chain intrinsics marked except with other FP nodes like what common code doing.

It's good that people are looking at achieving better modeling for the x86 backend, but we need to have a plan that doesn't require heroic effort just to get basic correctness.

Do you mean in the backend? If so, I don't think that's possible. The backends just don't have any sort of feature that could be used to get conservatively correct behavior for cheap the way intrinsics give it to us in the middle end. Once you go into instruction selection things get very low level in a hurry.

I'm looking for simple ways to modeling X86 intrinsics, but haven't find better one than modeling it one by one.

I would suggest that we need a function/call attribute roughly on the level of readonly / readnone, maybe readfponly, that says that a function has no side-effects and no dependencies on anything *except* the FP state.

Do you mean mark it at the declaration of intrinsics? Is it reasonable to mark except on dependent intrinsics?

Basic queries like Instruction::mayReadMemory() that are supposed to be used generically in code-motion transforms would then return true for calls marked that way only if they're FP-constrained functions.

Middle end or back end? I think in middle end you may need to change all releated passes to get such information to prevent optimization. And in back end, I think we can simply chain intrinsics marked except with other FP nodes like what common code doing.

We don't want to do this all the time. So we need a new property for the intrinsics that means we should prevent code motion in the middle end when the calling function has the strictfp attribute. Similarly SelectionDAGBuilder should use INTRINSIC_W_CHAIN instead of INTRINSIC_WO_CHAIN for any of these intrinsics when strictp is enabled.

It seems the discussion of whether or not this is incomplete died out -- I'd prefer to assume it is incomplete if there is no consensus. Mailed D71635 to rename -frounding-math to -fexperimental-rounding-math.

Alternatively we could remove the warning. I still don't see a good argument for the middle ground of having it called -frounding-math but also generate a warning.

It seems the discussion of whether or not this is incomplete died out -- I'd prefer to assume it is incomplete if there is no consensus. Mailed D71635 to rename -frounding-math to -fexperimental-rounding-math.

Alternatively we could remove the warning. I still don't see a good argument for the middle ground of having it called -frounding-math but also generate a warning.

It's definitely incomplete but the results will not be any worse than you get when -frounding-math is ignored.

My preference would be to change the text of the warning that is issued but allow -frounding-math to be enabled by this commit without requiring an additional option.

I would also very much like to see this patch re-committed. It's currently in the "approved" state. If anyone objects to this being committed, please use the "request changes" action to indicate this.

rupprecht closed this revision.Dec 17 2019, 7:38 PM

It seems the discussion of whether or not this is incomplete died out -- I'd prefer to assume it is incomplete if there is no consensus. Mailed D71635 to rename -frounding-math to -fexperimental-rounding-math.

Alternatively we could remove the warning. I still don't see a good argument for the middle ground of having it called -frounding-math but also generate a warning.

It's definitely incomplete but the results will not be any worse than you get when -frounding-math is ignored.

My preference would be to change the text of the warning that is issued but allow -frounding-math to be enabled by this commit without requiring an additional option.

If other reviewers agree, then let's just remove the warning. I can send a patch tomorrow unless someone else wants to do that.

I would also very much like to see this patch re-committed. It's currently in the "approved" state. If anyone objects to this being committed, please use the "request changes" action to indicate this.

It is already re-committed. 7f9b5138470db1dc58f3bc05631284c653c9ed7a reapplied it, but IIUC it was not closed in phabricator due to leading whitespace in the commit message:

Reapply af57dbf12e54 "Add support for options -frounding-math, ftrapping-math, -ffp-model=, and -ffp-exception-behavior="
...
        Differential Revision: https://reviews.llvm.org/D62731

The "Differential" needs to be the first thing, whitespace cannot come before it.

It seems the discussion of whether or not this is incomplete died out -- I'd prefer to assume it is incomplete if there is no consensus. Mailed D71635 to rename -frounding-math to -fexperimental-rounding-math.

Alternatively we could remove the warning. I still don't see a good argument for the middle ground of having it called -frounding-math but also generate a warning.

It's definitely incomplete but the results will not be any worse than you get when -frounding-math is ignored.

My preference would be to change the text of the warning that is issued but allow -frounding-math to be enabled by this commit without requiring an additional option.

If other reviewers agree, then let's just remove the warning. I can send a patch tomorrow unless someone else wants to do that.

Mailed D71671 to do this.

If neither patch is acceptable, then I would like to revert this commit instead, as we are having issues with this patch.

It seems the discussion of whether or not this is incomplete died out -- I'd prefer to assume it is incomplete if there is no consensus. Mailed D71635 to rename -frounding-math to -fexperimental-rounding-math.

Alternatively we could remove the warning. I still don't see a good argument for the middle ground of having it called -frounding-math but also generate a warning.

It's definitely incomplete but the results will not be any worse than you get when -frounding-math is ignored.

My preference would be to change the text of the warning that is issued but allow -frounding-math to be enabled by this commit without requiring an additional option.

If other reviewers agree, then let's just remove the warning. I can send a patch tomorrow unless someone else wants to do that.

Mailed D71671 to do this.

If neither patch is acceptable, then I would like to revert this commit instead, as we are having issues with this patch.

I think we should stick with this patch and remove the warning like you proposed in D71671

I have found bug in clang-cl (win32 clang), related to recent inroduction of ffp-exception-behavior.
Unfortunately, I don't have a working patch yet, and since LLVM bugtracker registration is closed, I can not even submit a bug.

So, if it is not a trouble for you, I will email the bug description here.

Please let me know if it isn't appropriate. Bug description:

Windows: clang-cl is generating call to non-existing lib function for win32 with /fp:except option.
With recent ffp-exception-behavior=maytrap/strict, fp:except in clang-cl became generate FPE aware code.

But in case of floorf and ceilf it generates call to non-existing library function.

clang-cl.exe -m32 /Ox /fp:except testFloor.cpp /FA
testFloor.cpp:

#include <math.h>
float ret(float v) {  return floorf(v); }

resulting assember:

push    eax
movss    xmm0, dword ptr [esp + 8]
movss    dword ptr [esp], xmm0
call    _floorf #no such function!!!
pop    eax
ret

Expected behaviour:

there is no floorf lib function. Like with cosf and other math functions, floorf in MSVC is implemented as inline function.

So, it should be call to _floor (with apropriate conversion first).

I have found bug in clang-cl (win32 clang), related to recent inroduction of ffp-exception-behavior.
Unfortunately, I don't have a working patch yet, and since LLVM bugtracker registration is closed, I can not even submit a bug.

So, if it is not a trouble for you, I will email the bug description here.

Please let me know if it isn't appropriate. Bug description:

Windows: clang-cl is generating call to non-existing lib function for win32 with /fp:except option.
With recent ffp-exception-behavior=maytrap/strict, fp:except in clang-cl became generate FPE aware code.

But in case of floorf and ceilf it generates call to non-existing library function.

clang-cl.exe -m32 /Ox /fp:except testFloor.cpp /FA
testFloor.cpp:

#include <math.h>
float ret(float v) {  return floorf(v); }

resulting assember:

push    eax
movss    xmm0, dword ptr [esp + 8]
movss    dword ptr [esp], xmm0
call    _floorf #no such function!!!
pop    eax
ret

Expected behaviour:

there is no floorf lib function. Like with cosf and other math functions, floorf in MSVC is implemented as inline function.

So, it should be call to _floor (with apropriate conversion first).

Hopefully fixed by 53ee806d93e8d2371726ec5ce59b0e68b309c258

Found some issue when looking at this code: -ftrapping_math and -fno_trapping_math will never have effect

clang/lib/Frontend/CompilerInvocation.cpp
3155

Calling 'Opts.setFPExceptionMode(xx)' here has no effect, as it will be overruled later on, on line 3174 !
Same is true on line 3159