This is an archive of the discontinued LLVM Phabricator instance.

Implement inlining of strictfp functions
ClosedPublic

Authored by sepavloff on Nov 4 2019, 2:58 AM.

Details

Summary

According to the current design, if a floating point operation is
represented by constrained intrinsic somewhere in a function, all
floating point operations in the function must be represented by
constrained intrinsics. It imposes additional requirements to inlining
mechanism. If non-strictfp function is inlined into strictfp function,
all ordinary FP operations must be replaced with their constrained
counterparts. Such behavior is implemented by this change.

Inlining strictfp function into non-strictfp is not implemented as it
would require replacement of all FP operations in the host function,
which now is undesirable due to expected performance loss.

Event Timeline

sepavloff created this revision.Nov 4 2019, 2:58 AM
Herald added a project: Restricted Project. · View Herald TranscriptNov 4 2019, 2:58 AM
Herald added a subscriber: hiraditya. · View Herald Transcript
kpn added inline comments.Nov 4 2019, 6:43 AM
llvm/lib/Transforms/Utils/CloneFunction.cpp
331

Not all constrained intrinsics take both metadata arguments. See the LangRef or the IR verifier for the details.

333

I don't see the strictfp attribute being added to the call.

335

Is this the path that a call instruction goes through? Because it'll need the strictfp attribute added.

llvm/test/Transforms/Inline/inline_strictfp.ll
16

All function calls in a strictfp-marked function require the strictfp attribute. See D68233 for the current version of the verifier for this.

simoll added a subscriber: simoll.Nov 4 2019, 9:37 AM
andrew.w.kaylor added inline comments.Nov 4 2019, 3:08 PM
llvm/include/llvm/IR/Instruction.h
50 ↗(On Diff #227675)

It seems wasteful to take one of these bits for something that can be deduced from other information we already have.

llvm/lib/Transforms/Utils/CloneFunction.cpp
310

There is some ongoing discussion about how predicated vector instructions will handle constrained FP mode. I think Simon's intention is to have just a single intrinsic that is used regardless of whether strictfp semantics are needed.

Also, in addition to converting fp intrinsics to strict equivalents, I think we need to add the strictfp attribute to callsites. We're currently preventing libcall simplification for callsites marked strictfp. The plan has been for front ends to mark all calls as strictfp so that they don't need to identify math library calls. I could probably be convinced that this is not necessary. Arguably, simplifyLibCalls could look at the callee's function attributes instead. @kpn has been considering whether this behavior should be relaxed.

pengfei added a subscriber: pengfei.Nov 4 2019, 9:28 PM
simoll added inline comments.Nov 6 2019, 1:25 AM
llvm/lib/Transforms/Utils/CloneFunction.cpp
310

Yes, the idea for LLVM-VP is to have only one set of fp intrinsics for both constrained and default-env fp ops (eg llvm.vp.fadd).
The intrinsic declarations are qualified as unconstrained by default (the function has the readnone attribute and does not have strictfp).
If the VP call does not have the strictfp attribute, the metadata params have to specify the default-fp env.

The VP fp intrinsics are constrained, only if the callsite has the strictfp attribute. In that case, the same rules as for llvm.experimental.constrained.* intrinsics apply.

sepavloff updated this revision to Diff 228721.Nov 11 2019, 9:45 AM

Updated patch

  • add attribute strictfp to calls of constrained intrinsics,
  • do not use flag DependsOnFPEnvironment.
llvm/lib/Transforms/Utils/CloneFunction.cpp
312

After further investigation, I think it is going to be necessary to attach the strictfp attribute to all callsites. In the call to cloneBlock() below, we're calling SimplifyInstruction after a call has been cloned but before it is inserted into a function. Consequently, if we don't attach the strictfp attribute to library calls, SimplifyInstruction may constant fold them away. That's not so bad for the case where we're inlining a no-strictfp function into a strictfp function, but for the case where both functions are strictfp it's a problem.

sepavloff updated this revision to Diff 229114.Nov 13 2019, 8:39 AM

Updated patch

  • Do not add rounding mode for some intrinsics,
  • Add attribute strictfp to all calls in inlined function,
  • Some cleanup of tests.
sepavloff marked 4 inline comments as done.Nov 13 2019, 9:21 AM
sepavloff added inline comments.
llvm/lib/Transforms/Utils/CloneFunction.cpp
310

Yes, the idea for LLVM-VP is to have only one set of fp intrinsics for both constrained and default-env fp ops (eg llvm.vp.fadd).

It means that these calls do not need to be transformed and they may be processes as any other intrinsic.

The VP fp intrinsics are constrained, only if the callsite has the strictfp attribute.

As @andrew.w.kaylor pointed out, all function calls need to have strictfp attribute, it is now attached to all calls in the inlined function.

simoll added inline comments.Nov 13 2019, 9:28 AM
llvm/lib/Transforms/Utils/CloneFunction.cpp
310

As @andrew.w.kaylor pointed out, all function calls need to have strictfp attribute, it is now attached to all calls in the inlined function.

Great! That should work seamlessly for VP intrinsics once we enable their fp-constrained usage.

sepavloff planned changes to this revision.Dec 19 2019, 10:56 PM

When function that uses default FP environment is inlined into strictfp function, code of the former will be executed in the FP environment set in the strictfp function. To fix this behavior the FP environment should be saved upon entry to the inlined function, FP environment reset to default state, and the saved state must be restored upon leaving the inlined function.

craig.topper added a subscriber: craig.topper.EditedDec 19 2019, 11:26 PM

When function that uses default FP environment is inlined into strictfp function, code of the former will be executed in the FP environment set in the strictfp function. To fix this behavior the FP environment should be saved upon entry to the inlined function, FP environment reset to default state, and the saved state must be restored upon leaving the inlined function.

So if the function is inlined it would use the reset state, but if it doesn't get inlined it would use the caller's state. Doesn't that mean whether or not the compiler inlines the function changes the behavior of the program?

When function that uses default FP environment is inlined into strictfp function, code of the former will be executed in the FP environment set in the strictfp function. To fix this behavior the FP environment should be saved upon entry to the inlined function, FP environment reset to default state, and the saved state must be restored upon leaving the inlined function.

So if the function is inlined it would use the reset state, but if it doesn't get inlined it would use the caller's state. Doesn't that mean whether or not the compiler inlines the function changes the behavior of the program?

It is so now. If a code in which #pargma STDC FENV_ACCESS ON acts calls an external function, it would be executed in caller's FP environment. This is wrong if the callee expects default one. We could put get_fenv, reset_fenv and set_fenv, introduced in D71742, around all calls, unless the called function has strictfp attribute, or we know that it does not use FP operation. It however could create unneeded code, which is bad for performance. We need to elaborate proper solution.

kpn added a comment.Dec 20 2019, 12:06 PM

When function that uses default FP environment is inlined into strictfp function, code of the former will be executed in the FP environment set in the strictfp function. To fix this behavior the FP environment should be saved upon entry to the inlined function, FP environment reset to default state, and the saved state must be restored upon leaving the inlined function.

So if the function is inlined it would use the reset state, but if it doesn't get inlined it would use the caller's state. Doesn't that mean whether or not the compiler inlines the function changes the behavior of the program?

It is so now. If a code in which #pargma STDC FENV_ACCESS ON acts calls an external function, it would be executed in caller's FP environment. This is wrong if the callee expects default one. We could put get_fenv, reset_fenv and set_fenv, introduced in D71742, around all calls, unless the called function has strictfp attribute, or we know that it does not use FP operation. It however could create unneeded code, which is bad for performance. We need to elaborate proper solution.

If FENV_ACCESS is ON and a function is called that expects it to be OFF then isn't that just plain undefined behavior? Unless it was changed since C99 I don't see how this is the compiler's problem to solve. And the compiler really shouldn't be changing the FP environment implicitly just because an arbitrary function was called, or we fell out of an FENV_ACCESS=ON scope, or whatever.

In D69798#1793237, @kpn wrote:

If FENV_ACCESS is ON and a function is called that expects it to be OFF then isn't that just plain undefined behavior? Unless it was changed since C99 I don't see how this is the compiler's problem to solve. And the compiler really shouldn't be changing the FP environment implicitly just because an arbitrary function was called, or we fell out of an FENV_ACCESS=ON scope, or whatever.

Yes, exactly. The purpose of FENV_ACCESS is to inform the compiler about FP environment changes that the program (explicitly) performs; under no circumstances is FENV_ACCESS intended to instruct the compiler to change the FP env on its own.

In D69798#1793237, @kpn wrote:

When function that uses default FP environment is inlined into strictfp function, code of the former will be executed in the FP environment set in the strictfp function. To fix this behavior the FP environment should be saved upon entry to the inlined function, FP environment reset to default state, and the saved state must be restored upon leaving the inlined function.

So if the function is inlined it would use the reset state, but if it doesn't get inlined it would use the caller's state. Doesn't that mean whether or not the compiler inlines the function changes the behavior of the program?

It is so now. If a code in which #pargma STDC FENV_ACCESS ON acts calls an external function, it would be executed in caller's FP environment. This is wrong if the callee expects default one. We could put get_fenv, reset_fenv and set_fenv, introduced in D71742, around all calls, unless the called function has strictfp attribute, or we know that it does not use FP operation. It however could create unneeded code, which is bad for performance. We need to elaborate proper solution.

If FENV_ACCESS is ON and a function is called that expects it to be OFF then isn't that just plain undefined behavior? Unless it was changed since C99 I don't see how this is the compiler's problem to solve. And the compiler really shouldn't be changing the FP environment implicitly just because an arbitrary function was called, or we fell out of an FENV_ACCESS=ON scope, or whatever.

This is a matter of convention. If it is a user responsibility to call only 'proper' functions, FP state switch is not required. It however could be a fragile solution, because in complex programs it is hard to guarantee that non of the call recursively does not imply default FP environment. The safer solution is to save/restore the environment in unclear cases, and let backend to optimize out unnecessary calls.

kpn added a comment.Dec 23 2019, 6:20 AM
In D69798#1793237, @kpn wrote:

If FENV_ACCESS is ON and a function is called that expects it to be OFF then isn't that just plain undefined behavior? Unless it was changed since C99 I don't see how this is the compiler's problem to solve. And the compiler really shouldn't be changing the FP environment implicitly just because an arbitrary function was called, or we fell out of an FENV_ACCESS=ON scope, or whatever.

This is a matter of convention. If it is a user responsibility to call only 'proper' functions, FP state switch is not required. It however could be a fragile solution, because in complex programs it is hard to guarantee that non of the call recursively does not imply default FP environment. The safer solution is to save/restore the environment in unclear cases, and let backend to optimize out unnecessary calls.

It _is_ the responsibility of the user to not call problematic functions. That _is_ the current convention. Yes, it may be fragile, but that's still the user's problem to solve. And the compiler can't know when the calls aren't needed because that information about functions in other TU simply isn't available. Plus, there are cases where functions are compiled with the #pragma and are expecting to be called with a non-default FP environment, but they don't change the environment themselves. So having the compiler insert calls to change the environment before calling would be an error. It's a mistake to trade off a set of potential errors caused by the programmer in exchange for a set of problems caused by the compiler.

In D69798#1794921, @kpn wrote:
In D69798#1793237, @kpn wrote:

If FENV_ACCESS is ON and a function is called that expects it to be OFF then isn't that just plain undefined behavior? Unless it was changed since C99 I don't see how this is the compiler's problem to solve. And the compiler really shouldn't be changing the FP environment implicitly just because an arbitrary function was called, or we fell out of an FENV_ACCESS=ON scope, or whatever.

This is a matter of convention. If it is a user responsibility to call only 'proper' functions, FP state switch is not required. It however could be a fragile solution, because in complex programs it is hard to guarantee that non of the call recursively does not imply default FP environment. The safer solution is to save/restore the environment in unclear cases, and let backend to optimize out unnecessary calls.

It _is_ the responsibility of the user to not call problematic functions. That _is_ the current convention. Yes, it may be fragile, but that's still the user's problem to solve. And the compiler can't know when the calls aren't needed because that information about functions in other TU simply isn't available. Plus, there are cases where functions are compiled with the #pragma and are expecting to be called with a non-default FP environment, but they don't change the environment themselves. So having the compiler insert calls to change the environment before calling would be an error. It's a mistake to trade off a set of potential errors caused by the programmer in exchange for a set of problems caused by the compiler.

You are right. The case of strictfp function which does not set FP environment is especially nice. The idea to surround external function calls with FP save/restore calls is not good. However when compiler makes inlining, it knows if the inlined function is strictfp. If it is, FP state is not modified. If it isn't, the function expects default FP environment and putting save/restore call brings FP state to that expected by the inlined function.

kpn added a comment.Dec 23 2019, 9:37 AM

Imagine a case where some function "X()" under the #pragma and a non-default FP environment calls a function "Y()" in a different TU and not under the #pragma. Then the programmer moves "Y()" into a header file. Under your proposal Y() would get different FP environments at run time simply because it was moved from a different TU into a header file. Surprise!

You'd have the same issues if the programmer enabled LTO. Imagine having functions executing with differing FP environments depending on whether or not LTO was enabled.

No, the compiler should never change the FP environment implicitly. That way the compiler avoids introducing ugly surprises. The programmer would be left with just the mess they made themselves.

sepavloff updated this revision to Diff 242043.Feb 3 2020, 5:34 AM

Updated patch

qiucf added a subscriber: qiucf.Mar 15 2021, 1:54 AM

I had forgotten that this patch never landed, but I was investigating a bug yesterday that I think this will help with (https://github.com/llvm/llvm-project/issues/48669).

@kpn are you happy with the current form. It's gotten stale in a few places, but I think it's basically correct.

Herald added a project: Restricted Project. · View Herald TranscriptMar 25 2022, 11:00 AM
kpn added a comment.Mar 25 2022, 11:15 AM

I had forgotten that this patch never landed, but I was investigating a bug yesterday that I think this will help with (https://github.com/llvm/llvm-project/issues/48669).

@kpn are you happy with the current form. It's gotten stale in a few places, but I think it's basically correct.

Yes, it looks fine. From reading my comments, I was worried about 'strictfp' not being attached to cloned calls. But it looks like it attaches it after the clone. So it's fine.

I'd love to see this go in the tree.

@sepavloff I apologize for having lost track of this for so long. Do you have time to rebase this and the dependent patch?

sepavloff updated this revision to Diff 418563.Mar 28 2022, 6:14 AM

Rebased and made some update

@andrew.w.kaylor Thank you for reviving this work!
I rebased the patches, the dependency patch has only small changes, this patch changed a bit more, because a new mechanism for intrinsic type parameters was used.

This revision is now accepted and ready to land.Mar 29 2022, 9:39 AM
This revision was landed with ongoing or failed builds.Mar 31 2022, 5:16 AM
This revision was automatically updated to reflect the committed changes.