Page MenuHomePhabricator

Implement inlining of strictfp functions
Needs ReviewPublic

Authored by sepavloff on Nov 4 2019, 2:58 AM.

Details

Summary

According to the current design, if a floating point operation is
represented by constrained intrinsic somewhere in a function, all
floating point operations in the function must be represented by
constrained intrinsics. It imposes additional requirements to inlining
mechanism. If non-strictfp function is inlined into strictfp function,
all ordinary FP operations must be replaced with their constrained
counterparts. Such behavior is implemented by this change.

Inlining strictfp function into non-strictfp is not implemented as it
would require replacement of all FP operations in the host function,
which now is undesirable due to expected performance loss.

Event Timeline

sepavloff created this revision.Nov 4 2019, 2:58 AM
Herald added a project: Restricted Project. · View Herald TranscriptNov 4 2019, 2:58 AM
Herald added a subscriber: hiraditya. · View Herald Transcript
kpn added inline comments.Nov 4 2019, 6:43 AM
llvm/lib/Transforms/Utils/CloneFunction.cpp
331

Not all constrained intrinsics take both metadata arguments. See the LangRef or the IR verifier for the details.

333

I don't see the strictfp attribute being added to the call.

335

Is this the path that a call instruction goes through? Because it'll need the strictfp attribute added.

llvm/test/Transforms/Inline/inline_strictfp.ll
16

All function calls in a strictfp-marked function require the strictfp attribute. See D68233 for the current version of the verifier for this.

simoll added a subscriber: simoll.Nov 4 2019, 9:37 AM
andrew.w.kaylor added inline comments.Nov 4 2019, 3:08 PM
llvm/include/llvm/IR/Instruction.h
50 ↗(On Diff #227675)

It seems wasteful to take one of these bits for something that can be deduced from other information we already have.

llvm/lib/Transforms/Utils/CloneFunction.cpp
310

There is some ongoing discussion about how predicated vector instructions will handle constrained FP mode. I think Simon's intention is to have just a single intrinsic that is used regardless of whether strictfp semantics are needed.

Also, in addition to converting fp intrinsics to strict equivalents, I think we need to add the strictfp attribute to callsites. We're currently preventing libcall simplification for callsites marked strictfp. The plan has been for front ends to mark all calls as strictfp so that they don't need to identify math library calls. I could probably be convinced that this is not necessary. Arguably, simplifyLibCalls could look at the callee's function attributes instead. @kpn has been considering whether this behavior should be relaxed.

pengfei added a subscriber: pengfei.Nov 4 2019, 9:28 PM
simoll added inline comments.Nov 6 2019, 1:25 AM
llvm/lib/Transforms/Utils/CloneFunction.cpp
310

Yes, the idea for LLVM-VP is to have only one set of fp intrinsics for both constrained and default-env fp ops (eg llvm.vp.fadd).
The intrinsic declarations are qualified as unconstrained by default (the function has the readnone attribute and does not have strictfp).
If the VP call does not have the strictfp attribute, the metadata params have to specify the default-fp env.

The VP fp intrinsics are constrained, only if the callsite has the strictfp attribute. In that case, the same rules as for llvm.experimental.constrained.* intrinsics apply.

sepavloff updated this revision to Diff 228721.Mon, Nov 11, 9:45 AM

Updated patch

  • add attribute strictfp to calls of constrained intrinsics,
  • do not use flag DependsOnFPEnvironment.
llvm/lib/Transforms/Utils/CloneFunction.cpp
312

After further investigation, I think it is going to be necessary to attach the strictfp attribute to all callsites. In the call to cloneBlock() below, we're calling SimplifyInstruction after a call has been cloned but before it is inserted into a function. Consequently, if we don't attach the strictfp attribute to library calls, SimplifyInstruction may constant fold them away. That's not so bad for the case where we're inlining a no-strictfp function into a strictfp function, but for the case where both functions are strictfp it's a problem.

sepavloff updated this revision to Diff 229114.Wed, Nov 13, 8:39 AM

Updated patch

  • Do not add rounding mode for some intrinsics,
  • Add attribute strictfp to all calls in inlined function,
  • Some cleanup of tests.
sepavloff marked 4 inline comments as done.Wed, Nov 13, 9:21 AM
sepavloff added inline comments.
llvm/lib/Transforms/Utils/CloneFunction.cpp
310

Yes, the idea for LLVM-VP is to have only one set of fp intrinsics for both constrained and default-env fp ops (eg llvm.vp.fadd).

It means that these calls do not need to be transformed and they may be processes as any other intrinsic.

The VP fp intrinsics are constrained, only if the callsite has the strictfp attribute.

As @andrew.w.kaylor pointed out, all function calls need to have strictfp attribute, it is now attached to all calls in the inlined function.

simoll added inline comments.Wed, Nov 13, 9:28 AM
llvm/lib/Transforms/Utils/CloneFunction.cpp
310

As @andrew.w.kaylor pointed out, all function calls need to have strictfp attribute, it is now attached to all calls in the inlined function.

Great! That should work seamlessly for VP intrinsics once we enable their fp-constrained usage.