This is an archive of the discontinued LLVM Phabricator instance.

Differential D69798

Implement inlining of strictfp functions
Closed · Public

Authored by sepavloff on Nov 4 2019, 2:58 AM.

Details

Reviewers

echristo
spatel
andrew.w.kaylor
kpn
cameron.mcinally

Commits

rG47b3b76825dc: Implement inlining of strictfp functions

Summary

According to the current design, if a floating-point operation is
represented by a constrained intrinsic anywhere in a function, all
floating-point operations in that function must be represented by
constrained intrinsics. This imposes additional requirements on the
inlining mechanism: if a non-strictfp function is inlined into a strictfp
function, all ordinary FP operations must be replaced with their
constrained counterparts. This change implements that behavior.

Inlining a strictfp function into a non-strictfp function is not
implemented, as it would require replacing all FP operations in the host
function, which is currently undesirable due to the expected performance loss.
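The rule in the summary boils down to a three-way decision. The sketch below is a hypothetical model (the enum and function names are invented for illustration and are not LLVM APIs) that mirrors the check this revision adds to InlineFunction() and the HostFuncIsStrictFP gate it adds to CloneFunction.cpp:

```cpp
// Hypothetical model of the strictfp inlining rule; these names do not
// exist in LLVM.
enum class DemoInlineAction {
  PlainClone,        // caller is not strictfp: clone the callee unchanged
  CloneAndConstrain, // caller is strictfp: convert ordinary FP operations to
                     // constrained intrinsics and mark cloned calls strictfp
  Refuse             // strictfp callee, non-strictfp caller: do not inline
};

DemoInlineAction decideStrictFPInlining(bool CallerStrictFP,
                                        bool CalleeStrictFP) {
  // Inlining a strictfp callee into a non-strictfp caller would require
  // rewriting every FP operation of the caller, so it is rejected.
  if (CalleeStrictFP && !CallerStrictFP)
    return DemoInlineAction::Refuse;
  // In a strictfp host every cloned instruction goes through the
  // constraining path, whether or not the callee itself was strictfp.
  if (CallerStrictFP)
    return DemoInlineAction::CloneAndConstrain;
  return DemoInlineAction::PlainClone;
}
```

Keying the decision on the host's attribute (rather than on the callee's) matches the patch, where the conversion is driven by whether the function being cloned into is strictfp.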

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

sepavloff created this revision. Nov 4 2019, 2:58 AM

Herald added a project: Restricted Project. Nov 4 2019, 2:58 AM

Herald added a subscriber: hiraditya.

Harbormaster completed remote builds in B40469: Diff 227675. Nov 4 2019, 2:59 AM

sepavloff added a parent revision: D69562: Mapping of FP operations to constrained intrinsics. Nov 4 2019, 3:10 AM

sepavloff added a child revision: D69272: Enable '#pragma STDC FENV_ACCESS' in frontend. Nov 4 2019, 4:09 AM

spatel added reviewers: kpn, cameron.mcinally. Nov 4 2019, 6:21 AM

kpn added inline comments. Nov 4 2019, 6:43 AM

llvm/lib/Transforms/Utils/CloneFunction.cpp
372	Not all constrained intrinsics take both metadata arguments. See the LangRef or the IR verifier for the details.
374	I don't see the strictfp attribute being added to the call.
376	Is this the path that a call instruction goes through? Because it'll need the strictfp attribute added.
llvm/test/Transforms/Inline/inline_strictfp.ll
15 ↗	(On Diff #227675)	All function calls in a strictfp-marked function require the strictfp attribute. See D68233 for the current version of the verifier for this.
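kpn's first point is why the patch consults the ROUND_MODE column of llvm/IR/ConstrainedOps.def before appending a rounding-mode operand. A minimal sketch of that X-macro pattern, assuming a small illustrative table with the same (NAME, NARG, ROUND_MODE, INTRINSIC) layout; the table, enum, and functions here are hypothetical, so check the real .def file for the authoritative entries:

```cpp
#include <stdexcept>

// Illustrative subset modeled on the layout of llvm/IR/ConstrainedOps.def:
// X(NAME, NARG, ROUND_MODE, INTRINSIC). Not the real file.
#define DEMO_CONSTRAINED_OPS(X)                                                \
  X(FAdd, 2, 1, experimental_constrained_fadd)                                 \
  X(FPExt, 1, 0, experimental_constrained_fpext)                               \
  X(FPToUI, 1, 0, experimental_constrained_fptoui)                             \
  X(Sqrt, 1, 1, experimental_constrained_sqrt)

// One enumerator per table row.
enum class DemoIntrinsicID {
#define DEMO_AS_ENUM(NAME, NARG, ROUND_MODE, INTRINSIC) INTRINSIC,
  DEMO_CONSTRAINED_OPS(DEMO_AS_ENUM)
#undef DEMO_AS_ENUM
};

// Same shape as hasRoundingModeOperand(): one case per row, returning
// whether the ROUND_MODE column is 1.
bool demoHasRoundingModeOperand(DemoIntrinsicID ID) {
  switch (ID) {
#define DEMO_AS_CASE(NAME, NARG, ROUND_MODE, INTRINSIC)                        \
  case DemoIntrinsicID::INTRINSIC:                                             \
    return ROUND_MODE == 1;
    DEMO_CONSTRAINED_OPS(DEMO_AS_CASE)
#undef DEMO_AS_CASE
  }
  throw std::logic_error("unexpected constrained intrinsic id");
}

// Trailing metadata operands: exception behavior is always present, the
// rounding mode only when the table says so.
unsigned demoNumMetadataOperands(DemoIntrinsicID ID) {
  return demoHasRoundingModeOperand(ID) ? 2u : 1u;
}
```

Generating the switch from the same table that defines the intrinsics keeps the two from drifting apart when new constrained operations are added.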

kpn mentioned this in D69616: [FPEnv] The inliner shouldn't mix strictfp and non-strictfp functions. Nov 4 2019, 7:40 AM

simoll added a subscriber: simoll. Nov 4 2019, 9:37 AM

andrew.w.kaylor added inline comments. Nov 4 2019, 3:08 PM

llvm/include/llvm/IR/Instruction.h
50 ↗	(On Diff #227675)	It seems wasteful to take one of these bits for something that can be deduced from other information we already have.
llvm/lib/Transforms/Utils/CloneFunction.cpp
351	There is some ongoing discussion about how predicated vector instructions will handle constrained FP mode. I think Simon's intention is to have just a single intrinsic that is used regardless of whether strictfp semantics are needed. Also, in addition to converting fp intrinsics to strict equivalents, I think we need to add the strictfp attribute to callsites. We're currently preventing libcall simplification for callsites marked strictfp. The plan has been for front ends to mark all calls as strictfp so that they don't need to identify math library calls. I could probably be convinced that this is not necessary. Arguably, simplifyLibCalls could look at the callee's function attributes instead. @kpn has been considering whether this behavior should be relaxed.

pengfei added a subscriber: pengfei. Nov 4 2019, 9:28 PM

simoll added inline comments. Nov 6 2019, 1:25 AM

llvm/lib/Transforms/Utils/CloneFunction.cpp
351	Yes, the idea for LLVM-VP is to have only one set of fp intrinsics for both constrained and default-env fp ops (eg `llvm.vp.fadd`). The intrinsic declarations are qualified as unconstrained by default (the function has the `readnone` attribute and does not have `strictfp`). If the VP call does not have the `strictfp` attribute, the metadata params have to specify the default-fp env. The VP fp intrinsics are constrained, only if the callsite has the `strictfp` attribute. In that case, the same rules as for `llvm.experimental.constrained.*` intrinsics apply.
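The rule simoll describes can be modeled as a small predicate. This is a hypothetical sketch of the proposal as stated in the comment, assuming that the default FP environment corresponds to "round.tonearest" and "fpexcept.ignore"; the struct and functions are invented for illustration and are not part of the VP intrinsics API:

```cpp
#include <string>

// Hypothetical model of a VP fp call such as llvm.vp.fadd under the
// proposal: the metadata operands only carry meaning on strictfp callsites.
struct DemoVPCall {
  bool CallsiteStrictFP;
  std::string Rounding;   // e.g. "round.tonearest" (assumed spelling)
  std::string Exceptions; // e.g. "fpexcept.ignore" (assumed spelling)
};

// Assumption: the default FP environment is round-to-nearest with FP
// exceptions ignored.
bool demoDescribesDefaultEnv(const DemoVPCall &C) {
  return C.Rounding == "round.tonearest" && C.Exceptions == "fpexcept.ignore";
}

// Without strictfp on the callsite, the metadata must describe the default
// environment; with it, any constrained setting is allowed.
bool demoIsValidVPCall(const DemoVPCall &C) {
  return C.CallsiteStrictFP || demoDescribesDefaultEnv(C);
}

// The call is treated as constrained only when the callsite is strictfp.
bool demoIsConstrained(const DemoVPCall &C) { return C.CallsiteStrictFP; }
```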

sepavloff mentioned this in D69562: Mapping of FP operations to constrained intrinsics. Nov 6 2019, 6:23 AM

Updated patch:

- add the strictfp attribute to calls of constrained intrinsics;
- do not use the DependsOnFPEnvironment flag.

Harbormaster completed remote builds in B40754: Diff 228721. Nov 11 2019, 9:45 AM

andrew.w.kaylor added inline comments. Nov 11 2019, 3:48 PM

llvm/lib/Transforms/Utils/CloneFunction.cpp
353	After further investigation, I think it is going to be necessary to attach the strictfp attribute to all callsites. In the call to cloneBlock() below, we're calling SimplifyInstruction after a call has been cloned but before it is inserted into a function. Consequently, if we don't attach the strictfp attribute to library calls, SimplifyInstruction may constant fold them away. That's not so bad for the case where we're inlining a no-strictfp function into a strictfp function, but for the case where both functions are strictfp it's a problem.
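To make the hazard concrete, here is a toy model, assuming a simplifier that folds calls to a known math function only when the callsite is not strictfp. All names are hypothetical and the logic is far simpler than SimplifyInstruction; it only shows why marking every cloned call strictfp before simplification sees it keeps the call alive:

```cpp
#include <cmath>
#include <optional>
#include <string>

// Hypothetical stand-in for a cloned call with a constant argument.
struct DemoCall {
  std::string Callee;
  double ConstArg;
  bool StrictFP; // models the strictfp callsite attribute
};

// Toy simplifier: folds sqrt of a constant unless the callsite is strictfp,
// in which case the call must stay (it may raise FP exceptions or depend on
// the dynamic rounding mode).
std::optional<double> demoSimplifyCall(const DemoCall &C) {
  if (C.StrictFP)
    return std::nullopt;
  if (C.Callee == "sqrt")
    return std::sqrt(C.ConstArg);
  return std::nullopt;
}

// Models what the patch does for a strictfp host: every cloned call gets
// the strictfp attribute before any simplification runs on it.
DemoCall demoCloneIntoStrictHost(DemoCall C) {
  C.StrictFP = true;
  return C;
}
```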

Updated patch:

- do not add the rounding mode operand for intrinsics that do not take one;
- add the strictfp attribute to all calls in the inlined function;
- some cleanup of tests.
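The order of the trailing operands matters here. A hypothetical, string-based sketch of how cloneInstruction() in this revision assembles the operand list: the original operands (minus the callee for calls), the predicate for fcmp, the rounding mode only when the intrinsic takes one, and always the exception behavior, using the default-environment values the patch hard-codes. The types and function are invented for illustration:

```cpp
#include <optional>
#include <string>
#include <vector>

// Hypothetical description of an instruction being converted.
struct DemoInst {
  std::vector<std::string> Operands;        // for calls, the callee is last
  bool IsCall;
  std::optional<std::string> FCmpPredicate; // set only for fcmp
  bool HasRoundingModeOperand;              // from the ROUND_MODE column
};

std::vector<std::string> demoConstrainedArgs(const DemoInst &I) {
  // Copy the original operands; for a call, drop the callee operand.
  std::vector<std::string> Args(I.Operands.begin(), I.Operands.end());
  if (I.IsCall && !Args.empty())
    Args.pop_back();
  // fcmp passes its predicate as an extra metadata operand.
  if (I.FCmpPredicate)
    Args.push_back(*I.FCmpPredicate);
  // The inlined code assumes the default environment.
  if (I.HasRoundingModeOperand)
    Args.push_back("round.tonearest");
  Args.push_back("fpexcept.ignore");
  return Args;
}
```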

Harbormaster completed remote builds in B40892: Diff 229114. Nov 13 2019, 8:46 AM

sepavloff marked 4 inline comments as done. Nov 13 2019, 9:21 AM

sepavloff added inline comments.

llvm/lib/Transforms/Utils/CloneFunction.cpp
351	"Yes, the idea for LLVM-VP is to have only one set of fp intrinsics for both constrained and default-env fp ops (eg llvm.vp.fadd)." This means that these calls do not need to be transformed and may be processed like any other intrinsic. "The VP fp intrinsics are constrained, only if the callsite has the strictfp attribute." As @andrew.w.kaylor pointed out, all function calls need to have the `strictfp` attribute; it is now attached to all calls in the inlined function.

simoll added inline comments. Nov 13 2019, 9:28 AM

llvm/lib/Transforms/Utils/CloneFunction.cpp
351	"As @andrew.w.kaylor pointed out, all function calls need to have strictfp attribute, it is now attached to all calls in the inlined function." Great! That should work seamlessly for VP intrinsics once we enable their fp-constrained usage.

When a function that uses the default FP environment is inlined into a strictfp function, its code will be executed in the FP environment set by the strictfp function. To fix this, the FP environment should be saved upon entry to the inlined function and reset to the default state, and the saved state must be restored upon leaving the inlined function.
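For illustration only, the save / reset / restore sequence could look like the following RAII guard in portable C++. It uses the standard <cfenv> functions rather than the LLVM intrinsics proposed in D71742, and the class name is hypothetical; as the discussion that follows shows, having the compiler do this implicitly was argued against:

```cpp
#include <cfenv>

// Hypothetical guard: saves the caller's FP environment, switches to the
// default one for the guarded scope, and restores the saved one on exit.
// Code that changes the FP environment should be compiled with
// FENV_ACCESS ON (or the compiler's equivalent); omitted in this sketch.
class DefaultFPEnvScope {
public:
  DefaultFPEnvScope() {
    std::fegetenv(&Saved);     // save the caller's environment
    std::fesetenv(FE_DFL_ENV); // switch to the default environment
  }
  ~DefaultFPEnvScope() { std::fesetenv(&Saved); } // restore on scope exit

  DefaultFPEnvScope(const DefaultFPEnvScope &) = delete;
  DefaultFPEnvScope &operator=(const DefaultFPEnvScope &) = delete;

private:
  std::fenv_t Saved;
};
```

Inside the guarded scope the rounding mode is the default one (round to nearest on common platforms), and the caller's mode is back in effect once the guard is destroyed.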

sepavloff added a parent revision: D71742: Added intrinsics for access to FP environment. Dec 19 2019, 10:57 PM

Replying to D69798#1792231 (@sepavloff):

So if the function is inlined it would use the reset state, but if it doesn't get inlined it would use the caller's state. Doesn't that mean whether or not the compiler inlines the function changes the behavior of the program?

Replying to D69798#1792237 (@craig.topper):

It already does. If code in which #pragma STDC FENV_ACCESS ON is in effect calls an external function, that function is executed in the caller's FP environment. This is wrong if the callee expects the default one. We could put get_fenv, reset_fenv and set_fenv, introduced in D71742, around all calls, unless the called function has the strictfp attribute or we know that it does not use FP operations. However, this could create unneeded code, which is bad for performance. We need to work out a proper solution.

Replying to D69798#1792906 (@sepavloff):

If FENV_ACCESS is ON and a function is called that expects it to be OFF then isn't that just plain undefined behavior? Unless it was changed since C99 I don't see how this is the compiler's problem to solve. And the compiler really shouldn't be changing the FP environment implicitly just because an arbitrary function was called, or we fell out of an FENV_ACCESS=ON scope, or whatever.

Replying to D69798#1793237 (@kpn):

Yes, exactly. The purpose of FENV_ACCESS is to inform the compiler about FP environment changes that the program (explicitly) performs; under no circumstances is FENV_ACCESS intended to instruct the compiler to change the FP env on its own.

Replying to D69798#1793237 (@kpn):

This is a matter of convention. If it is the user's responsibility to call only 'proper' functions, an FP state switch is not required. However, this could be a fragile solution, because in complex programs it is hard to guarantee that none of the calls, directly or recursively, relies on the default FP environment. The safer solution is to save/restore the environment in unclear cases and let the backend optimize out unnecessary calls.

Replying to D69798#1794639 (@sepavloff):

It _is_ the responsibility of the user to not call problematic functions. That _is_ the current convention. Yes, it may be fragile, but that's still the user's problem to solve. And the compiler can't know when the calls aren't needed because that information about functions in other TUs simply isn't available. Plus, there are cases where functions are compiled with the #pragma and are expecting to be called with a non-default FP environment, but they don't change the environment themselves. So having the compiler insert calls to change the environment before calling would be an error. It's a mistake to trade off a set of potential errors caused by the programmer in exchange for a set of problems caused by the compiler.

Replying to D69798#1794921 (@kpn):

You are right. The case of a strictfp function that does not set the FP environment itself is a particularly good example. The idea of surrounding external function calls with FP save/restore calls is not good. However, when the compiler performs inlining, it knows whether the inlined function is strictfp. If it is, the FP state is not modified. If it isn't, the function expects the default FP environment, and inserting save/restore calls brings the FP state to the one expected by the inlined function.

Imagine a case where some function "X()" under the #pragma and a non-default FP environment calls a function "Y()" in a different TU and not under the #pragma. Then the programmer moves "Y()" into a header file. Under your proposal Y() would get different FP environments at run time simply because it was moved from a different TU into a header file. Surprise!

You'd have the same issues if the programmer enabled LTO. Imagine having functions executing with differing FP environments depending on whether or not LTO was enabled.

No, the compiler should never change the FP environment implicitly. That way the compiler avoids introducing ugly surprises. The programmer would be left with just the mess they made themselves.

Updated patch

sepavloff removed a parent revision: D71742: Added intrinsics for access to FP environment. Feb 3 2020, 5:37 AM

Harbormaster completed remote builds in B45578: Diff 242043. Feb 3 2020, 5:42 AM

qiucf added a subscriber: qiucf. Mar 15 2021, 1:54 AM

Herald added a subscriber: jdoerfert. Mar 15 2021, 1:54 AM

I had forgotten that this patch never landed, but I was investigating a bug yesterday that I think this will help with (https://github.com/llvm/llvm-project/issues/48669).

@kpn, are you happy with the current form? It's gotten stale in a few places, but I think it's basically correct.

Herald added a project: Restricted Project. Mar 25 2022, 11:00 AM

Replying to D69798#3408468 (@andrew.w.kaylor):

Yes, it looks fine. From reading my comments, I was worried about 'strictfp' not being attached to cloned calls. But it looks like it attaches it after the clone. So it's fine.

I'd love to see this go in the tree.

@sepavloff I apologize for having lost track of this for so long. Do you have time to rebase this and the dependent patch?

Rebased and made some updates.

Harbormaster completed remote builds in B156545: Diff 418563. Mar 28 2022, 6:15 AM

@andrew.w.kaylor Thank you for reviving this work!
I rebased the patches. The dependency patch has only small changes; this patch changed a bit more because it now uses a new mechanism for intrinsic type parameters.
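The descriptor loop in cloneInstruction() (in the CloneFunction.cpp diff of this revision) collects the overloaded types passed to Intrinsic::getDeclaration(). A simplified, hypothetical model of that loop, assuming one descriptor entry per overloaded result or operand type, as is the case for the constrained intrinsics; the enum and function are invented for illustration and only mirror the shape of the real code:

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical, simplified descriptor kinds; the real code inspects
// Intrinsic::IITDescriptor entries from getIntrinsicInfoTableEntries().
enum class DemoDescKind {
  Argument,             // overloaded type that is not AK_MatchType
  MatchTypeArgument,    // repeats an already-collected overloaded type
  SameVecWidthArgument, // followed by one extra entry for the element type
  Other                 // fixed types, metadata, ...
};

// Entry 0 describes the result; entry I > 0 is treated as operand I - 1.
std::vector<std::string>
demoOverloadedTypes(const std::vector<DemoDescKind> &Desc,
                    const std::string &ResultTy,
                    const std::vector<std::string> &OperandTys) {
  std::vector<std::string> TParams;
  // '<' rather than '!=' so a trailing SameVecWidthArgument cannot run past
  // the end in this toy version.
  for (std::size_t I = 0, E = Desc.size(); I < E; ++I) {
    switch (Desc[I]) {
    case DemoDescKind::Argument:
      TParams.push_back(I == 0 ? ResultTy : OperandTys.at(I - 1));
      break;
    case DemoDescKind::SameVecWidthArgument:
      ++I; // skip the element-type entry that follows
      break;
    default:
      break;
    }
  }
  return TParams;
}
```

For an fadd-like intrinsic only the result type is overloaded (the operands match it), while for an fptoui-like intrinsic the result and the source are overloaded independently.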

lgtm

This revision is now accepted and ready to land. Mar 29 2022, 9:39 AM

This revision was landed with ongoing or failed builds. Mar 31 2022, 5:16 AM

Closed by commit rG47b3b76825dc: Implement inlining of strictfp functions (authored by sepavloff).

This revision was automatically updated to reflect the committed changes.

sepavloff added a commit: rG47b3b76825dc: Implement inlining of strictfp functions.

Thanks!

kpn mentioned this in rG76c22b18eafd: [FPEnv][AMDGPU] Correct strictfp tests. Jul 27 2023, 5:37 AM

Revision Contents

Path                                                            Size
llvm/lib/Transforms/Utils/CloneFunction.cpp                     100 lines
llvm/lib/Transforms/Utils/InlineFunction.cpp                    7 lines
llvm/test/Transforms/CodeExtractor/PartialInlineAttributes.ll   4 lines
llvm/test/Transforms/Inline/inline-strictfp.ll                  145 lines

Diff 419404

llvm/lib/Transforms/Utils/CloneFunction.cpp

(316 lines not shown)
 /// This is a private class used to implement CloneAndPruneFunctionInto.
 struct PruningFunctionCloner {
   Function *NewFunc;
   const Function *OldFunc;
   ValueToValueMapTy &VMap;
   bool ModuleLevelChanges;
   const char *NameSuffix;
   ClonedCodeInfo *CodeInfo;
+  bool HostFuncIsStrictFP;
+
+  Instruction *cloneInstruction(BasicBlock::const_iterator II);

 public:
   PruningFunctionCloner(Function *newFunc, const Function *oldFunc,
                         ValueToValueMapTy &valueMap, bool moduleLevelChanges,
                         const char *nameSuffix, ClonedCodeInfo *codeInfo)
       : NewFunc(newFunc), OldFunc(oldFunc), VMap(valueMap),
         ModuleLevelChanges(moduleLevelChanges), NameSuffix(nameSuffix),
-        CodeInfo(codeInfo) {}
+        CodeInfo(codeInfo) {
+    HostFuncIsStrictFP =
+        newFunc->getAttributes().hasFnAttr(Attribute::StrictFP);
+  }

   /// The specified block is found to be reachable, clone it and
   /// anything that it can reach.
   void CloneBlock(const BasicBlock *BB, BasicBlock::const_iterator StartingInst,
                   std::vector<const BasicBlock *> &ToClone);
 };
 } // namespace

+static bool hasRoundingModeOperand(Intrinsic::ID CIID) {
+  switch (CIID) {
+#define INSTRUCTION(NAME, NARG, ROUND_MODE, INTRINSIC) \
+  case Intrinsic::INTRINSIC: \
+    return ROUND_MODE == 1;
+#define FUNCTION INSTRUCTION
+#include "llvm/IR/ConstrainedOps.def"
+  default:
+    llvm_unreachable("Unexpected constrained intrinsic id");
+  }
+}
+
+Instruction *
+PruningFunctionCloner::cloneInstruction(BasicBlock::const_iterator II) {
+  const Instruction &OldInst = *II;
+  Instruction *NewInst = nullptr;
+  if (HostFuncIsStrictFP) {
+    Intrinsic::ID CIID = getConstrainedIntrinsicID(OldInst);
+    if (CIID != Intrinsic::not_intrinsic) {
+      // Instead of cloning the instruction, a call to constrained intrinsic
+      // should be created.
+      // Assume the first arguments of constrained intrinsics are the same as
+      // the operands of original instruction.
+
+      // Determine overloaded types of the intrinsic.
+      SmallVector<Type *, 2> TParams;
+      SmallVector<Intrinsic::IITDescriptor, 8> Descriptor;
+      getIntrinsicInfoTableEntries(CIID, Descriptor);
+      for (unsigned I = 0, E = Descriptor.size(); I != E; ++I) {
+        Intrinsic::IITDescriptor Operand = Descriptor[I];
+        switch (Operand.Kind) {
+        case Intrinsic::IITDescriptor::Argument:
+          if (Operand.getArgumentKind() !=
+              Intrinsic::IITDescriptor::AK_MatchType) {
+            if (I == 0)
+              TParams.push_back(OldInst.getType());
+            else
+              TParams.push_back(OldInst.getOperand(I - 1)->getType());
+          }
+          break;
+        case Intrinsic::IITDescriptor::SameVecWidthArgument:
+          ++I;
+          break;
+        default:
+          break;
+        }
+      }
+
+      // Create intrinsic call.
+      LLVMContext &Ctx = NewFunc->getContext();
+      Function *IFn =
+          Intrinsic::getDeclaration(NewFunc->getParent(), CIID, TParams);
+      SmallVector<Value *, 4> Args;
+      unsigned NumOperands = OldInst.getNumOperands();
+      if (isa<CallInst>(OldInst))
+        --NumOperands;
+      for (unsigned I = 0; I < NumOperands; ++I) {
+        Value *Op = OldInst.getOperand(I);
+        Args.push_back(Op);
+      }
+      if (const auto *CmpI = dyn_cast<FCmpInst>(&OldInst)) {
+        FCmpInst::Predicate Pred = CmpI->getPredicate();
+        StringRef PredName = FCmpInst::getPredicateName(Pred);
+        Args.push_back(MetadataAsValue::get(Ctx, MDString::get(Ctx, PredName)));
+      }
+
+      // The last arguments of a constrained intrinsic are metadata that
+      // represent rounding mode (absents in some intrinsics) and exception
+      // behavior. The inlined function uses default settings.
+      if (hasRoundingModeOperand(CIID))
+        Args.push_back(
+            MetadataAsValue::get(Ctx, MDString::get(Ctx, "round.tonearest")));
+      Args.push_back(
+          MetadataAsValue::get(Ctx, MDString::get(Ctx, "fpexcept.ignore")));
+
+      NewInst = CallInst::Create(IFn, Args, OldInst.getName() + ".strict");
+    }
+  }
+  if (!NewInst)
+    NewInst = II->clone();
+  return NewInst;
+}
+
 /// The specified block is found to be reachable, clone it and
 /// anything that it can reach.
 void PruningFunctionCloner::CloneBlock(
     const BasicBlock *BB, BasicBlock::const_iterator StartingInst,
     std::vector<const BasicBlock *> &ToClone) {
   WeakTrackingVH &BBEntry = VMap[BB];

   // Have we already cloned this block?
(23 lines not shown)

   bool hasCalls = false, hasDynamicAllocas = false, hasStaticAllocas = false;

   // Loop over all instructions, and copy them over, DCE'ing as we go. This
   // loop doesn't include the terminator.
   for (BasicBlock::const_iterator II = StartingInst, IE = --BB->end(); II != IE;
        ++II) {

-    Instruction *NewInst = II->clone();
+    Instruction *NewInst = cloneInstruction(II);
+
+    if (HostFuncIsStrictFP) {
+      // All function calls in the inlined function must get 'strictfp'
+      // attribute to prevent undesirable optimizations.
+      if (auto *Call = dyn_cast<CallInst>(NewInst))
+        Call->addFnAttr(Attribute::StrictFP);
+    }

     // Eagerly remap operands to the newly cloned instruction, except for PHI
     // nodes for which we defer processing until we update the CFG.
     if (!isa<PHINode>(NewInst)) {
       RemapInstruction(NewInst, VMap,
                        ModuleLevelChanges ? RF_None : RF_NoModuleLevelChanges);

       // If we can simplify this instruction to some other value, simply add
(653 lines not shown)

llvm/lib/Transforms/Utils/InlineFunction.cpp

Show First 20 Lines • Show All 1,782 Lines • ▼ Show 20 Lines	llvm::InlineResult llvm::InlineFunction(CallBase &CB, InlineFunctionInfo &IFI,

   // If the call to the callee cannot throw, set the 'nounwind' flag on any
   // calls that we inline.
   bool MarkNoUnwind = CB.doesNotThrow();

   BasicBlock *OrigBB = CB.getParent();
   Function *Caller = OrigBB->getParent();

+  // Do not inline strictfp function into non-strictfp one. It would require
+  // conversion of all FP operations in host function to constrained intrinsics.
+  if (CalledFunc->getAttributes().hasFnAttr(Attribute::StrictFP) &&
+      !Caller->getAttributes().hasFnAttr(Attribute::StrictFP)) {
+    return InlineResult::failure("incompatible strictfp attributes");
+  }
+
   // GC poses two hazards to inlining, which only occur when the callee has GC:
   // 1. If the caller has no GC, then the callee's GC must be propagated to the
   // caller.
   // 2. If the caller has a differing GC, it is invalid to inline.
   if (CalledFunc->hasGC()) {
     if (!Caller->hasGC())
       Caller->setGC(CalledFunc->getGC());
     else if (CalledFunc->getGC() != Caller->getGC())
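The new early exit reduces to a rule over two flags: reject only a strictfp callee paired with a non-strictfp caller. A minimal sketch of that decision, assuming plain booleans in place of the `hasFnAttr(Attribute::StrictFP)` queries; `InlineVerdict`, `checkStrictFPCompat` and `markClonedCallsStrictFP` are hypothetical stand-ins, not LLVM APIs.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for llvm::InlineResult: success, or failure plus a reason.
struct InlineVerdict {
  bool Success;
  std::string Reason;
};

// Mirrors the early exit added to InlineFunction(): only a strictfp callee
// paired with a non-strictfp caller is rejected.
InlineVerdict checkStrictFPCompat(bool CalleeIsStrictFP, bool CallerIsStrictFP) {
  if (CalleeIsStrictFP && !CallerIsStrictFP)
    return {false, "incompatible strictfp attributes"};
  return {true, ""};
}

// Mirrors the cloning hunk: calls in the inlined body are marked strictfp
// whenever the host (caller) is strictfp, regardless of the callee.
bool markClonedCallsStrictFP(bool CallerIsStrictFP) { return CallerIsStrictFP; }
```

In `inline-strictfp.ll` terms, `checkStrictFPCompat(false, true)` corresponds to `@host_02` and `@host_08`, `(true, true)` to `@host_04`, and `(true, false)` to `@host_06`, the one call left un-inlined. Keeping the rule one-directional follows the summary's reasoning: allowing the rejected case would mean converting every FP operation in the caller to constrained intrinsics.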

llvm/test/Transforms/CodeExtractor/PartialInlineAttributes.ll


 ; CHECK: define internal void @callee_writeonly.1.if.then(i32 %v, i32* %sub.out) [[FN_ATTRS0:#[0-9]+]]
 ; CHECK: define internal void @callee_most.2.if.then(i32 %v, i32* %sub.out) [[FN_ATTRS:#[0-9]+]]

 ; attributes to preserve
 attributes #0 = {
   inlinehint minsize noduplicate noimplicitfloat norecurse noredzone nounwind
   nonlazybind optsize safestack sanitize_address sanitize_hwaddress sanitize_memory
-  sanitize_thread ssp sspreq sspstrong strictfp uwtable "foo"="bar"
+  sanitize_thread ssp sspreq sspstrong uwtable "foo"="bar"
   "patchable-function"="prologue-short-redirect" "probe-stack"="_foo_guard" "stack-probe-size"="4096" }

 ; CHECK: attributes [[FN_ATTRS0]] = { ssp
-; CHECK: attributes [[FN_ATTRS]] = { inlinehint minsize noduplicate noimplicitfloat norecurse noredzone nounwind nonlazybind optsize safestack sanitize_address sanitize_hwaddress sanitize_memory sanitize_thread ssp sspreq sspstrong strictfp uwtable "foo"="bar" "patchable-function"="prologue-short-redirect" "probe-stack"="_foo_guard" "stack-probe-size"="4096" }
+; CHECK: attributes [[FN_ATTRS]] = { inlinehint minsize noduplicate noimplicitfloat norecurse noredzone nounwind nonlazybind optsize safestack sanitize_address sanitize_hwaddress sanitize_memory sanitize_thread ssp sspreq sspstrong uwtable "foo"="bar" "patchable-function"="prologue-short-redirect" "probe-stack"="_foo_guard" "stack-probe-size"="4096" }

 ; attributes to drop
 attributes #1 = {
   alignstack=16 convergent inaccessiblememonly inaccessiblemem_or_argmemonly naked
   noreturn readonly argmemonly returns_twice speculatable "thunk"
 }

llvm/test/Transforms/Inline/inline-strictfp.ll

This file was added.

; RUN: opt -inline %s -S | FileCheck %s


; An ordinary function is inlined into a strictfp function.

define float @inlined_01(float %a) {
entry:
  %add = fadd float %a, %a
  ret float %add
}

define float @host_02(float %a) #0 {
entry:
  %0 = call float @inlined_01(float %a) #0
  %add = call float @llvm.experimental.constrained.fadd.f32(float %0, float 2.000000e+00, metadata !"round.dynamic", metadata !"fpexcept.strict") #0
  ret float %add
; CHECK-LABEL: @host_02
; CHECK: call float @llvm.experimental.constrained.fadd.f32(float {{.*}}, float {{.*}}, metadata !"round.tonearest", metadata !"fpexcept.ignore") #0
; CHECK: call float @llvm.experimental.constrained.fadd.f32(float {{.*}}, float 2.000000e+00, metadata !"round.dynamic", metadata !"fpexcept.strict") #0
}


; A strictfp function is inlined into another strictfp function.

define float @inlined_03(float %a) #0 {
entry:
  %add = call float @llvm.experimental.constrained.fadd.f32(float %a, float %a, metadata !"round.downward", metadata !"fpexcept.maytrap") #0
  ret float %add
}

define float @host_04(float %a) #0 {
entry:
  %0 = call float @inlined_03(float %a) #0
  %add = call float @llvm.experimental.constrained.fadd.f32(float %0, float 2.000000e+00, metadata !"round.dynamic", metadata !"fpexcept.strict") #0
  ret float %add
; CHECK-LABEL: @host_04
; CHECK: call float @llvm.experimental.constrained.fadd.f32(float {{.*}}, float {{.*}}, metadata !"round.downward", metadata !"fpexcept.maytrap") #0
; CHECK: call float @llvm.experimental.constrained.fadd.f32(float {{.*}}, float 2.000000e+00, metadata !"round.dynamic", metadata !"fpexcept.strict") #0
}


; A strictfp function is NOT inlined into an ordinary function.

define float @inlined_05(float %a) strictfp {
entry:
  %add = call float @llvm.experimental.constrained.fadd.f32(float %a, float %a, metadata !"round.downward", metadata !"fpexcept.maytrap") #0
  ret float %add
}

define float @host_06(float %a) {
entry:
  %0 = call float @inlined_05(float %a)
  %add = fadd float %0, 2.000000e+00
  ret float %add
; CHECK-LABEL: @host_06
; CHECK: call float @inlined_05(float %a)
; CHECK: fadd float %0, 2.000000e+00
}


; Calls in the inlined function must get the strictfp attribute.

declare float @func_ext(float);

define float @inlined_07(float %a) {
entry:
  %0 = call float @func_ext(float %a)
  %add = fadd float %0, %a

  ret float %add
}

define float @host_08(float %a) #0 {
entry:
  %0 = call float @inlined_07(float %a) #0
  %add = call float @llvm.experimental.constrained.fadd.f32(float %0, float 2.000000e+00, metadata !"round.dynamic", metadata !"fpexcept.strict") #0
  ret float %add
; CHECK-LABEL: @host_08
; CHECK: call float @func_ext(float {{.*}}) #0
; CHECK: call float @llvm.experimental.constrained.fadd.f32(float {{.*}}, float {{.*}}, metadata !"round.tonearest", metadata !"fpexcept.ignore") #0
; CHECK: call float @llvm.experimental.constrained.fadd.f32(float {{.*}}, float 2.000000e+00, metadata !"round.dynamic", metadata !"fpexcept.strict") #0
}


; Cloning particular instructions.

; fpext has two overloaded types.
define double @inlined_09(float %a) {
entry:
  %t = fpext float %a to double
  ret double %t
}

define double @host_10(float %a) #0 {
entry:
  %0 = call double @inlined_09(float %a) #0
  %add = call double @llvm.experimental.constrained.fadd.f64(double %0, double 2.000000e+00, metadata !"round.dynamic", metadata !"fpexcept.strict") #0
  ret double %add
; CHECK-LABEL: @host_10
; CHECK: call double @llvm.experimental.constrained.fpext.f64.f32(float {{.*}}, metadata !"fpexcept.ignore") #0
; CHECK: call double @llvm.experimental.constrained.fadd.f64(double {{.*}}, double 2.000000e+00, metadata !"round.dynamic", metadata !"fpexcept.strict") #0
}

; fcmp does not depend on rounding mode, and its constrained form takes the predicate as a metadata argument.
define i1 @inlined_11(float %a, float %b) {
entry:
  %t = fcmp oeq float %a, %b
  ret i1 %t
}

define i1 @host_12(float %a, float %b) #0 {
entry:
  %add = call float @llvm.experimental.constrained.fadd.f32(float %a, float %b, metadata !"round.dynamic", metadata !"fpexcept.strict") #0
  %cmp = call i1 @inlined_11(float %a, float %b) #0
  ret i1 %cmp
; CHECK-LABEL: @host_12
; CHECK: call float @llvm.experimental.constrained.fadd.f32(float %a, float %b, metadata !"round.dynamic", metadata !"fpexcept.strict") #0
; CHECK: call i1 @llvm.experimental.constrained.fcmp.f32(float {{.*}}, metadata !"oeq", metadata !"fpexcept.ignore") #0
}

; Intrinsic 'ceil' has a constrained variant.
define float @inlined_13(float %a) {
entry:
  %t = call float @llvm.ceil.f32(float %a)
  ret float %t
}

define float @host_14(float %a) #0 {
entry:
  %0 = call float @inlined_13(float %a) #0
  %add = call float @llvm.experimental.constrained.fadd.f32(float %0, float 2.000000e+00, metadata !"round.dynamic", metadata !"fpexcept.strict") #0
  ret float %add
; CHECK-LABEL: @host_14
; CHECK: call float @llvm.experimental.constrained.ceil.f32(float %a, metadata !"fpexcept.ignore") #0
; CHECK: call float @llvm.experimental.constrained.fadd.f32(float {{.*}}, float 2.000000e+00, metadata !"round.dynamic", metadata !"fpexcept.strict") #0
}

attributes #0 = { strictfp }

declare float @llvm.experimental.constrained.fadd.f32(float, float, metadata, metadata)
declare double @llvm.experimental.constrained.fadd.f64(double, double, metadata, metadata)
declare double @llvm.experimental.constrained.fpext.f64.f32(float, metadata)
declare i1 @llvm.experimental.constrained.fcmp.f32(float, float, metadata, metadata)
declare float @llvm.experimental.constrained.ceil.f32(float, metadata)
declare float @llvm.ceil.f32(float)
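The `fpext` case in the test exists because a constrained intrinsic's name carries one suffix per overloaded type: `fpext` from `float` to `double` is spelled `fpext.f64.f32` (result type first), while `fadd`, `fcmp` and `ceil` carry a single `.f32` or `.f64`. A sketch of that spelling for the scalar types used here, assuming only the `f32`/`f64` suffixes; `constrainedName` is a hypothetical helper, not LLVM's own name-mangling code.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper (not LLVM's name-mangling code): spell a constrained
// intrinsic name as it appears in inline-strictfp.ll, appending one suffix
// per overloaded type. Only scalar suffixes such as "f32" (float) and
// "f64" (double) are considered here.
std::string constrainedName(const std::string &Op,
                            const std::vector<std::string> &OverloadedTypes) {
  std::string Name = "llvm.experimental.constrained." + Op;
  for (const std::string &Ty : OverloadedTypes)
    Name += "." + Ty;  // e.g. ".f64" then ".f32" for fpext
  return Name;
}
```

Each result matches one of the `declare` lines at the end of the test.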


Diff 419404
