This is an archive of the discontinued LLVM Phabricator instance.

Paths

Table of Contents

llvm/include/llvm/Analysis/InlineCost.h
llvm/include/llvm/Analysis/TargetLibraryInfo.h
llvm/include/llvm/Transforms/IPO/Inliner.h
llvm/lib/Analysis/InlineCost.cpp
llvm/lib/Target/AMDGPU/AMDGPUInline.cpp
llvm/lib/Transforms/IPO/InlineSimple.cpp
llvm/lib/Transforms/IPO/Inliner.cpp
llvm/lib/Transforms/IPO/PartialInlining.cpp
llvm/lib/Transforms/IPO/SampleProfile.cpp
llvm/test/Transforms/Inline/inline-no-builtin-compatible.ll

Differential D74162

[Inliner] Inlining should honor nobuiltin attributes
ClosedPublic

Authored by tejohnson on Feb 6 2020, 1:36 PM.

Details

Reviewers

hfinkel
gchatelet
chandlerc
davidxl

Commits

rGf9ca75f19bab: [Inliner] Inlining should honor nobuiltin attributes

Summary

Final patch in series to fix inlining between functions with different
nobuiltin attributes/options, which was specifically an issue in LTO.
See discussion on D61634 for background.

The prior patch in this series (D67923) enabled per-Function TLI
construction that identified the nobuiltin attributes.

Here I have allowed inlining to proceed if the callee's nobuiltins are a
subset of the caller's nobuiltins, but not in the reverse case, which
should be conservatively correct.
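
The subset rule in the summary can be sketched outside of LLVM as follows. This is only an illustration under assumptions: `NoBuiltinSet` and `canInline` are hypothetical names, and modelling the attributes as a set of function names is a simplification, not how TargetLibraryInfo stores them.

```cpp
#include <algorithm>
#include <set>
#include <string>

// Hypothetical model: the library functions a given function must NOT treat
// as builtins. This is not LLVM's actual representation.
using NoBuiltinSet = std::set<std::string>;

// Inlining is allowed only when every restriction on the callee also applies
// to the caller, i.e. the callee's set is a subset of the caller's set.
bool canInline(const NoBuiltinSet &Caller, const NoBuiltinSet &Callee) {
  // std::includes needs sorted ranges; std::set iterates in sorted order.
  return std::includes(Caller.begin(), Caller.end(),
                       Callee.begin(), Callee.end());
}
```

With this model, a callee restricted on memcpy can be inlined into a caller restricted on both memcpy and memset, while the same callee cannot be inlined into an unrestricted caller, because its restriction would be lost.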

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

tejohnson created this revision. Feb 6 2020, 1:36 PM

Herald added a project: Restricted Project. Feb 6 2020, 1:36 PM

Herald added subscribers: kerbowa, dexonsmith, haicheng and 6 others.

Harbormaster completed remote builds in B45890: Diff 242994. Feb 6 2020, 1:38 PM

Apart from my other comment LGTM

llvm/lib/Transforms/IPO/PartialInlining.cpp
393	I'm having a hard time convincing myself that the lifetime requirements are correct here. Passing a local variable `GetTLI` by address in a `return` statement looks fishy. It's similar to `GetTTI`, so it seems correct; it's just hard to tell by looking at the code. Same above and below.

nhaehnle removed a subscriber: nhaehnle. Feb 7 2020, 3:18 AM

tejohnson marked an inline comment as done. Feb 7 2020, 6:49 AM

tejohnson added inline comments.

llvm/lib/Transforms/IPO/PartialInlining.cpp
393	What's being returned is the bool result of the run() call, not the PartialInlinerImpl object, which doesn't survive past this function and therefore the GetTLI scope.
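
The lifetime argument in this reply can be illustrated with a small standalone sketch. `ImplSketch` and `runOnModuleSketch` are hypothetical stand-ins, not the PartialInlining code: the point is only that a temporary holding a reference to a local callback is destroyed before that local goes out of scope, and only the bool escapes.

```cpp
#include <functional>

// Hypothetical stand-in for an implementation object that keeps a non-owning
// reference to a callback owned by whoever created it.
struct ImplSketch {
  const std::function<int(int)> &GetInfo;

  bool run() const { return GetInfo(41) == 42; }
};

bool runOnModuleSketch() {
  // Local callback, playing the role GetTLI plays in the review.
  std::function<int(int)> GetInfo = [](int X) { return X + 1; };

  // The temporary ImplSketch dies at the end of this full expression, while
  // GetInfo is still alive. Only the bool result leaves the function.
  return ImplSketch{GetInfo}.run();
}
```

Returning the `ImplSketch` object itself would be the dangerous variant, since its reference would then outlive `GetInfo`.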

Let's wait a bit for other reviewers to comment.

llvm/lib/Transforms/IPO/PartialInlining.cpp
393	Ha right, I got confused by the formatting and missed the `run()` on next line.

This revision is now accepted and ready to land. Feb 7 2020, 7:13 AM

@davidxl David can you take a look at it from the inliner side and let me know if it looks ok?

davidxl added inline comments. Feb 27 2020, 10:03 AM

llvm/include/llvm/Analysis/TargetLibraryInfo.h
267	This may be bad for performance -- as the inline instance will be optimized differently.
llvm/test/Transforms/Inline/X86/inline-no-builtin-compatible.ll
6 ↗	(On Diff #242994)	Is this test x86 specific?
23 ↗	(On Diff #242994)	why not directly check-not call?

tejohnson marked 3 inline comments as done. Feb 27 2020, 10:13 AM

tejohnson added inline comments.

llvm/include/llvm/Analysis/TargetLibraryInfo.h
267	Note that it won't be worse than head, which doesn't restrict the inlines based on nobuiltin attributes at all. We could also just disallow inlining completely between callers/callees with different nobuiltin attributes. But I was concerned that this would degrade performance too much by disallowing inlining in too many cases.
llvm/test/Transforms/Inline/X86/inline-no-builtin-compatible.ll
6 ↗	(On Diff #242994)	Actually not, I can move to parent directory.
23 ↗	(On Diff #242994)	Yeah that would be better, will change.

Address comments

davidxl added inline comments. Feb 27 2020, 10:20 AM

llvm/include/llvm/Analysis/TargetLibraryInfo.h
267	Perhaps add an additional parameter to the interface to allow superset behavior. Then in the inlineCost.cpp, add an internal option to specify whether strict attribute matching is required -- the default can be the current behavior -- allow inlining into superset.
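
A minimal sketch of the suggested interface, with a flag selecting between strict matching and the superset rule, might look like the following. It uses `std::bitset` in place of LLVM's BitVector, and the `FuncIndex` enum and `areInlineCompatibleSketch` are hypothetical names chosen for this illustration.

```cpp
#include <bitset>
#include <cstddef>

// Hypothetical indices standing in for library functions.
enum FuncIndex : std::size_t { Memcpy = 0, Memset = 1, NumFuncs = 2 };

// One bit per library function; a set bit means "not available as a builtin".
using Unavailable = std::bitset<NumFuncs>;

bool areInlineCompatibleSketch(const Unavailable &Caller,
                               const Unavailable &Callee,
                               bool AllowCallerSuperset) {
  // Strict mode: the two sets of restrictions must match exactly.
  if (!AllowCallerSuperset)
    return Caller == Callee;
  // Superset mode: the union adds nothing to the caller's restrictions,
  // which means the callee's restrictions are a subset of the caller's.
  return (Caller | Callee) == Caller;
}
```

In superset mode a callee restricted only on memcpy is compatible with a caller restricted on memcpy and memset; in strict mode the same pair is rejected, which is the trade-off between correctness margin and lost inlining discussed in this thread.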

Harbormaster completed remote builds in B47464: Diff 247023. Feb 27 2020, 11:35 AM

tejohnson marked an inline comment as done. Feb 27 2020, 9:51 PM

Add internal option to control superset check and test it

lgtm

Harbormaster completed remote builds in B47550: Diff 247165. Feb 27 2020, 10:34 PM

gchatelet added inline comments. Feb 28 2020, 1:11 AM

llvm/include/llvm/Analysis/TargetLibraryInfo.h
267	> Note that it won't be worse than head, which doesn't restrict the inlines based on nobuiltin attributes at all. We could also just disallow inlining completely between callers/callees with different nobuiltin attributes. But I was concerned that this would degrade performance too much by disallowing inlining in too many cases.

I agree: disallowing inlining completely when the `nobuiltin` attributes differ would prevent inlining of basic memory functions entirely (memset, memcpy, etc.).

Closed by commit rGf9ca75f19bab: [Inliner] Inlining should honor nobuiltin attributes (authored by tejohnson). Feb 28 2020, 7:44 AM

This revision was automatically updated to reflect the committed changes.

tejohnson mentioned this in D61634: [clang/llvm] Allow efficient implementation of libc's memory functions in C/C++. Feb 28 2020, 7:47 AM

I was discussing this patch with @arsenm after investigating why the calls to libc functions weren't being inlined on the GPU. What was the motivation for the original change? I figured it would make no difference if we inlined a function that doesn't allow built-ins into one that does. This is problematic because it prevents us from inlining any function from the LLVM C library under LTO. I'd like a way to at least work around this in the short term, but I'd like some context for why this was introduced.

Herald added a project: Restricted Project. Jul 17 2023, 3:29 PM

Herald added subscribers: hoy, wlei, ormris and 3 others.

In D74162#4508033, @jhuber6 wrote:

I was discussing this patch with @arsenm after investigating why the calls to libc functions weren't being inlined on the GPU. What was the motivation for the original change? I figured it would make no difference if we inlined a function that doesn't allow built-ins into one that does. This is problematic because it prevents us from inlining any function from the LLVM C library under LTO. I'd like a way to at least work around this in the short term, but I'd like some context for why this was introduced.

@gchatelet to add more context as this was related to work he was doing for custom implementations of libc memcpy functions.

Context is in D61634. See specifically the discussion from https://reviews.llvm.org/D61634#1502201 to https://reviews.llvm.org/D61634#1512020 which discusses the inliner changes for the new attribute. IIRC the issue is that allowing such inlining would lose the no-builtin attributes, which would defeat their purpose.

Losing the ability to inline any function implemented in an LTO library compiled with -ffreestanding seems like a very bad tradeoff. I was talking with @arsenm about this and the attribute seems undocumented and sparsely tested. What was the specific failure case that this was introduced to solve? We lose that attribute, but is that a bad thing? If we inline a function that cannot call builtins into a function that can, what is the issue with calling builtins at that point?

Adding @sivachandra for the LLVM libc part.

The gist of the issue is described in the RFC.

A few considerations first:

LLVM libc is not written in assembly because we want compiler support for optimizations, sanitizers and fuzzers.
The compiler is able to recognize C/C++ constructs and turn them into library calls unless we use the dedicated -fno-builtin-* compiler flags.
-ffreestanding implies -fno-builtin (i.e., all builtins).

Now let's consider the case of a foo function calling memcpy where we allow inlining of functions with different attributes.
foo is compiled without any of -ffreestanding/-fno-builtin-*.
During LTO, the compiler decides to inline all of the code of memcpy within foo, discarding memcpy's -fno-builtin-memcpy flag (stored as a function attribute).
Deep inside memcpy's implementation, the compiler recognizes a copy loop and turns it into the @llvm.memcpy IR intrinsic.
The backend decides that this intrinsic is best implemented by calling libc's memcpy and inserts a call.
Now memcpy is implemented by a call to itself and will eventually overflow the stack or loop forever.
This happened in production.
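
To make the scenario concrete, here is the kind of copy loop a libc implementation might contain. The function name `naiveMemcpy` is hypothetical, and whether a given compiler rewrites the loop into a memcpy call depends on the compiler, the optimization level, and flags such as -fno-builtin-memcpy, so treat this as a sketch of the pattern rather than a reproducer.

```cpp
#include <cstddef>

// Hypothetical byte-wise copy, similar in shape to what a libc might use to
// implement memcpy itself.
void *naiveMemcpy(void *Dst, const void *Src, std::size_t N) {
  auto *D = static_cast<unsigned char *>(Dst);
  const auto *S = static_cast<const unsigned char *>(Src);
  // An optimizer that recognizes copy idioms may turn this loop into a
  // memcpy-style intrinsic unless builtin recognition is disabled for this
  // function. If this were the real memcpy, and the backend lowered that
  // intrinsic to a call to memcpy, the function would end up calling itself.
  for (std::size_t I = 0; I < N; ++I)
    D[I] = S[I];
  return Dst;
}
```

The no-builtin function attribute exists to keep this loop a loop; the concern in this thread is that inlining the body into an unrestricted caller drops that guarantee.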

Now for memcpy in particular we have introduced __builtin_memcpy_inline (doc) which allows us to convey the semantics of "fixed size operations that always generate code". They are guaranteed to never "call to the libc" (godbolt). Unfortunately those intrinsics are not supported in GCC so this would be clang only. Also they don't yet cover memcmp and bcmp.
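
A fixed-size copy built on this builtin could be sketched as below. The wrapper `copyBlock` and the macro `SKETCH_HAS_MEMCPY_INLINE` are hypothetical names for this illustration; the plain loop in the fallback branch is an assumption added so the sketch also compiles with compilers that lack the builtin, and it does not carry the same "never calls libc" guarantee.

```cpp
#include <cstddef>

// Detect the builtin where the compiler supports __has_builtin.
#if defined(__has_builtin)
#  if __has_builtin(__builtin_memcpy_inline)
#    define SKETCH_HAS_MEMCPY_INLINE 1
#  endif
#endif

// Copies exactly N bytes, with N known at compile time.
template <std::size_t N>
void copyBlock(char *Dst, const char *Src) {
#ifdef SKETCH_HAS_MEMCPY_INLINE
  // Expanded inline by the compiler; intended to never become a libc call.
  __builtin_memcpy_inline(Dst, Src, N);
#else
  // Fallback for compilers without the builtin (no such guarantee here).
  for (std::size_t I = 0; I < N; ++I)
    Dst[I] = Src[I];
#endif
}
```

A libc memcpy could then dispatch on the runtime size to a handful of such fixed-size blocks, which is roughly the "fixed size operations that always generate code" idea described above.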

Today only memcpy relies completely on __builtin_memcpy_inline (when compiled with clang), so technically this function could be compiled without either -ffreestanding or -fno-builtin and thus inlined during LTO.

Now it is very desirable that the libc functions can be inlined through LTO. We may be able to reach this state when we have strong guarantees that the compiler will not generate libc calls from within the libc functions themselves. We're there for memcpy, which really is the most fundamental libc function. But we're not there for most of the libc.

Note: for the reasons described above we had to implement the no-builtin flags as function attributes so they can stick to code even during LTO.

Thanks for the background. Was there a reason that we could not simply merge the nobuiltin attribute into the caller? By the time we get to this point it's likely that we've already run optimizations on the TUs individually so I'd guess that we wouldn't lose any potential optimizations more severe than not being able to inline the function in the first place.

In D74162#4518301, @gchatelet wrote:

The backend decides that this intrinsic is best implemented by calling libc's memcpy and inserts a call.

I looked into this and it's just a straight SelectionDAG bug. DAG.getMemcpy tries expanding inline depending on target size thresholds and custom instructions, but ultimately unconditionally emits an illegal libcall. It needs to consider the function availability. I've already "fixed" this particular issue in 3c848194f28decca41b7362f9dd35d4939797724. This is an IR pass to work around SelectionDAG limitations, the DAG cannot directly emit a loop. Really the DAG builder code should emit a hard error if the memcpy runtime call isn't available. There is a small gap where unhandled variable sized memcpys could be introduced between PreISelIntrinsicLowering and isel, but this is just a SelectionDAG implementation issue. That would only be an issue for a hypothetical pass which were to introduce a variable sized memcpy in the late codegen pipeline, which I can't see a need for. GlobalISel doesn't have the same limitation, and we could always come up with a post-isel expanded pseudo hack to deal with it.

I think part of the fundamental problem is we've jammed two different concepts into a confusing set of overlapping control mechanisms. We currently have this set of attributes:

`builtin`
`nobuiltin`
"no-builtins"
"no-builtin-<func>"

I'm unclear on exactly what the semantics of any of these are supposed to be. "no-builtins" isn't in the LangRef, and the wording on nobuiltin is oddly call site specific. With this confusion, I think the implementation is just buggy. What is the difference between nobuiltin and "no-builtins" supposed to be? The nobuiltin LangRef merely states it may be placed on declarations and definitions, but doesn't state what that means in that case.

There are 2 different concepts here: what calls the compiler is allowed to introduce, and which the compiler recognizes as having known semantics. There are traces of evidence that they're supposed to be separate but this wasn't fully implemented. We separately have "RuntimeLibcalls" and "TargetLibraryInfo", which have some overlap in the set of functions. Both are unfortunately hardcoded sets the target has to opt-out of manually. TargetLibraryInfo tries to respect "no-builtins", but no equivalent mechanism is used for "RuntimeLibcalls". As such, I do think the PreISelIntrinsicLowering case should be using some kind of runtime libcall information, though not necessarily TargetLibraryInfo.

Now for memcpy in particular we have introduced __builtin_memcpy_inline (doc) which allows us to convey the semantics of "fixed size operations that always generate code". They are guaranteed to never "call to the libc" (godbolt). Unfortunately those intrinsics are not supported in GCC so this would be clang only. Also they don't yet cover memcmp and bcmp.

I do not believe memcpy inline should be necessary. It should always be correct to emit an llvm.memcpy call anywhere, and the backend can always expand it. It needs to consider runtime call target availability, which is just not considered currently as a bug.

arsenm mentioned this in D155790: PreISelIntrinsicLowering: don't expand memcpys in minsize functions, even with no-builtins. Jul 20 2023, 4:22 PM

I'm unclear on exactly what the semantics of any of these are supposed to be. "no-builtins" isn't in the LangRef, and the wording on nobuiltin is oddly call site specific. With this confusion, I think the implementation is just buggy. What is the difference between nobuiltin and "no-builtins" supposed to be? The nobuiltin LangRef merely states it may be placed on declarations and definitions, but doesn't state what that means in that case.

Roughly speaking, nobuiltin exists to suppress libcall recognition on a specific call, and "no-builtins" suppresses libcall recognition in function definitions. Sort of similar, but not exactly the same. (I think the primary case where we need "nobuiltin" is C++ operator new, where a new expression and an explicit call to operator new have different optimization rules.)

I looked into this and it's just a straight SelectionDAG bug. DAG.getMemcpy tries expanding inline depending on target size thresholds and custom instructions, but ultimately unconditionally emits an illegal libcall

memcpy in particular has its own set of considerations. Historically, we required the user to provide a symbol named "memcpy", no matter what flags they passed. This has turned out to be unsatisfactory in certain scenarios, so we've worked around it. The primary form of workaround is that we don't form memcpys; if the input IR doesn't contain any reference to llvm.memcpy, we don't introduce such a reference. This makes optimizations slightly worse, but basically avoids the weird issues without significantly pessimizing code in most cases. (memcpy formation is pretty niche in most cases.)

Technically speaking, a codepath was recently added to expand memcpy. But it generates extremely bad code, to the point where it's not actually usable unless you don't care at all about performance. So practically speaking, nothing has changed here.

but no equivalent mechanism is used for "RuntimeLibcalls"

Historically, we assumed that all the functions in RuntimeLibcalls were available even in no-builtins mode, because we don't have alternative implementations. This has always had various holes, but it was a good enough approximation to allow building system libraries like libc.

Was there a reason that we could not simply merge the nobuiltin attribute into the caller?

The primary issue with LTO'ing parts of libc into a program is that we're basically crossing a line: once we start inlining calls into the program, "libc" is no longer an abstract interface, so we have to turn off libcall recognition for the whole module. I don't think merging the nobuiltin markings when you inline a function is sufficient.

This discussion is going in so many different directions it's hard for me to address everything...

In D74162#4526458, @efriedma wrote:

Was there a reason that we could not simply merge the nobuiltin attribute into the caller?

The primary issue with LTO'ing parts of libc into a program is that we're basically crossing a line: once we start inlining calls into the program, "libc" is no longer an abstract interface, so we have to turn off libcall recognition for the whole module. I don't think merging the nobuiltin markings when you inline a function is sufficient.

That's completely satisfactory at least for my case, as the GPU has no hosted environment we can make assumptions about not emitting such calls in the backend. There's a similar problem with this that I attempted to solve in https://reviews.llvm.org/D154364 but I'm currently rethinking. The problem there is that we prevent explicit internalization of recognized libcalls during LTO because the backend might emit a call to it later. I think at the end of the day we need a list of libcall functions that each backend can emit, rather than just assume each backend roughly emits everything x64 does. But I'm not familiar enough with the machinery to know where such a function could live. I was thinking that we could maybe have a pass that decorates libcalls with special attributes if we know from the target library info that it is not emitted by the backend, but I don't know how feasible that is in a generic IR pass before touching the backend; I'm not overly familiar with that machinery.

As it stands, not being able to inline even something simple like isalnum on the GPU is not great, so I'd appreciate some help in figuring out some kind of workaround even if it's just for the GPU. I feel like this is a bit of an edge case since up until now no one has really considered being able to optimize out libcalls.

Given the breadth of the issue it probably deserves an RFC on discourse.

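Revision Contents

Path                                                          Size
llvm/include/llvm/Analysis/InlineCost.h                       3 lines
llvm/include/llvm/Analysis/TargetLibraryInfo.h                15 lines
llvm/include/llvm/Transforms/IPO/Inliner.h                    1 line
llvm/lib/Analysis/InlineCost.cpp                              26 lines
llvm/lib/Target/AMDGPU/AMDGPUInline.cpp                       4 lines
llvm/lib/Transforms/IPO/InlineSimple.cpp                      2 lines
llvm/lib/Transforms/IPO/Inliner.cpp                           9 lines
llvm/lib/Transforms/IPO/PartialInlining.cpp                   24 lines
llvm/lib/Transforms/IPO/SampleProfile.cpp                     42 lines
llvm/test/Transforms/Inline/inline-no-builtin-compatible.ll   94 lines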

Diff 247165

llvm/include/llvm/Analysis/InlineCost.h

(21 lines not shown)
 namespace llvm {
 class AssumptionCacheTracker;
 class BlockFrequencyInfo;
 class CallBase;
 class DataLayout;
 class Function;
 class ProfileSummaryInfo;
 class TargetTransformInfo;
+class TargetLibraryInfo;

 namespace InlineConstants {
 // Various thresholds used by inline cost analysis.
 /// Use when optsize (-Os) is specified.
 const int OptSizeThreshold = 50;

 /// Use when minsize (-Oz) is specified.
 const int OptMinSizeThreshold = 5;
(176 lines not shown)
 /// sufficiently low to warrant inlining.
 ///
 /// Also note that calling this function *dynamically* computes the cost of
 /// inlining the callsite. It is an expensive, heavyweight call.
 InlineCost getInlineCost(
     CallBase &Call, const InlineParams &Params, TargetTransformInfo &CalleeTTI,
     std::function<AssumptionCache &(Function &)> &GetAssumptionCache,
     Optional<function_ref<BlockFrequencyInfo &(Function &)>> GetBFI,
+    function_ref<const TargetLibraryInfo &(Function &)> GetTLI,
     ProfileSummaryInfo *PSI, OptimizationRemarkEmitter *ORE = nullptr);

 /// Get an InlineCost with the callee explicitly specified.
 /// This allows you to calculate the cost of inlining a function via a
 /// pointer. This behaves exactly as the version with no explicit callee
 /// parameter in all other respects.
 //
 InlineCost
 getInlineCost(CallBase &Call, Function *Callee, const InlineParams &Params,
               TargetTransformInfo &CalleeTTI,
               std::function<AssumptionCache &(Function &)> &GetAssumptionCache,
               Optional<function_ref<BlockFrequencyInfo &(Function &)>> GetBFI,
+              function_ref<const TargetLibraryInfo &(Function &)> GetTLI,
               ProfileSummaryInfo *PSI, OptimizationRemarkEmitter *ORE);

 /// Minimal filter to detect invalid constructs for inlining.
 InlineResult isInlineViable(Function &Callee);
 } // namespace llvm

 #endif

llvm/include/llvm/Analysis/TargetLibraryInfo.h

Show First 20 Lines • Show All 254 Lines • ▼ Show 20 Lines	TargetLibraryInfo &operator=(const TargetLibraryInfo &TLI) {
return *this;		return *this;
}		}
TargetLibraryInfo &operator=(TargetLibraryInfo &&TLI) {		TargetLibraryInfo &operator=(TargetLibraryInfo &&TLI) {
Impl = TLI.Impl;		Impl = TLI.Impl;
OverrideAsUnavailable = TLI.OverrideAsUnavailable;		OverrideAsUnavailable = TLI.OverrideAsUnavailable;
return *this;		return *this;
}		}

		/// Determine whether a callee with the given TLI can be inlined into
		/// caller with this TLI, based on 'nobuiltin' attributes. When requested,
		/// allow inlining into a caller with a superset of the callee's nobuiltin
		/// attributes, which is conservatively correct.
		bool areInlineCompatible(const TargetLibraryInfo &CalleeTLI,
		davidxlUnsubmitted Not Done Reply Inline Actions This may be bad for performance -- as the inline instance will be optimized differently. davidxl: This may be bad for performance -- as the inline instance will be optimized differently.
		tejohnsonAuthorUnsubmitted Done Reply Inline Actions Note that it won't be worse than head, which doesn't restrict the inlines based on nobuiltin attributes at all. We could also just disallow inlining completely between callers/callees with different nobuiltin attributes. But I was concerned that this would degrade performance too much by disallowing inlining in too many cases. tejohnson: Note that it won't be worse than head, which doesn't restrict the inlines based on nobuiltin…
		davidxlUnsubmitted Done Reply Inline Actions Perhaps add an additional parameter to the interface to allow superset behavior. Then in the inlineCost.cpp, add an internal option to specify whether strict attribute matching is required -- the default can be the current behavior -- allow inlining into superset. davidxl: Perhaps add an additional parameter to the interface to allow superset behavior. Then in the…
		gchateletUnsubmitted Not Done Reply Inline Actions Note that it won't be worse than head, which doesn't restrict the inlines based on nobuiltin attributes at all. We could also just disallow inlining completely between callers/callees with different nobuiltin attributes. But I was concerned that this would degrade performance too much by disallowing inlining in too many cases. I agree disallowing inlining completely when `nobuiltin` differ would prevent inlining of basic memory functions entirely (memset, memcpy, etc..) gchatelet: > Note that it won't be worse than head, which doesn't restrict the inlines based on nobuiltin…
		bool AllowCallerSuperset) const {
		if (!AllowCallerSuperset)
		return OverrideAsUnavailable == CalleeTLI.OverrideAsUnavailable;
		BitVector B = OverrideAsUnavailable;
		B \|= CalleeTLI.OverrideAsUnavailable;
		// We can inline if the union of the caller and callee's nobuiltin
		// attributes is no stricter than the caller's nobuiltin attributes.
		return B == OverrideAsUnavailable;
		}

/// Searches for a particular function name.		/// Searches for a particular function name.
///		///
/// If it is one of the known library functions, return true and set F to the		/// If it is one of the known library functions, return true and set F to the
/// corresponding value.		/// corresponding value.
bool getLibFunc(StringRef funcName, LibFunc &F) const {		bool getLibFunc(StringRef funcName, LibFunc &F) const {
return Impl->getLibFunc(funcName, F);		return Impl->getLibFunc(funcName, F);
}		}

[185 lines not shown]

llvm/include/llvm/Transforms/IPO/Inliner.h

	[68 lines not shown]

	private:			private:
	// Insert @llvm.lifetime intrinsics.			// Insert @llvm.lifetime intrinsics.
	bool InsertLifetime = true;			bool InsertLifetime = true;

	protected:			protected:
	AssumptionCacheTracker *ACT;			AssumptionCacheTracker *ACT;
	ProfileSummaryInfo *PSI;			ProfileSummaryInfo *PSI;
				std::function<const TargetLibraryInfo &(Function &)> GetTLI;
	ImportedFunctionsInliningStatistics ImportedFunctionsStats;			ImportedFunctionsInliningStatistics ImportedFunctionsStats;
	};			};

	/// The inliner pass for the new pass manager.			/// The inliner pass for the new pass manager.
	///			///
	/// This pass wires together the inlining utilities and the inline cost			/// This pass wires together the inlining utilities and the inline cost
	/// analysis into a CGSCC pass. It considers every call in every function in			/// analysis into a CGSCC pass. It considers every call in every function in
	/// the SCC and tries to inline if profitable. It can be tuned with a number of			/// the SCC and tries to inline if profitable. It can be tuned with a number of
	[29 lines not shown]

llvm/lib/Analysis/InlineCost.cpp

[18 lines not shown]
#include "llvm/Analysis/AssumptionCache.h"		#include "llvm/Analysis/AssumptionCache.h"
#include "llvm/Analysis/BlockFrequencyInfo.h"		#include "llvm/Analysis/BlockFrequencyInfo.h"
#include "llvm/Analysis/CFG.h"		#include "llvm/Analysis/CFG.h"
#include "llvm/Analysis/CodeMetrics.h"		#include "llvm/Analysis/CodeMetrics.h"
#include "llvm/Analysis/ConstantFolding.h"		#include "llvm/Analysis/ConstantFolding.h"
#include "llvm/Analysis/InstructionSimplify.h"		#include "llvm/Analysis/InstructionSimplify.h"
#include "llvm/Analysis/LoopInfo.h"		#include "llvm/Analysis/LoopInfo.h"
#include "llvm/Analysis/ProfileSummaryInfo.h"		#include "llvm/Analysis/ProfileSummaryInfo.h"
		#include "llvm/Analysis/TargetLibraryInfo.h"
#include "llvm/Analysis/TargetTransformInfo.h"		#include "llvm/Analysis/TargetTransformInfo.h"
#include "llvm/Analysis/ValueTracking.h"		#include "llvm/Analysis/ValueTracking.h"
#include "llvm/Config/llvm-config.h"		#include "llvm/Config/llvm-config.h"
#include "llvm/IR/CallingConv.h"		#include "llvm/IR/CallingConv.h"
#include "llvm/IR/DataLayout.h"		#include "llvm/IR/DataLayout.h"
#include "llvm/IR/Dominators.h"		#include "llvm/IR/Dominators.h"
#include "llvm/IR/GetElementPtrTypeIterator.h"		#include "llvm/IR/GetElementPtrTypeIterator.h"
#include "llvm/IR/GlobalAlias.h"		#include "llvm/IR/GlobalAlias.h"
[57 lines not shown]	cl::desc("Minimum block frequency, expressed as a multiple of caller's "
"entry frequency, for a callsite to be hot in the absence of "		"entry frequency, for a callsite to be hot in the absence of "
"profile information."));		"profile information."));

static cl::opt<bool> OptComputeFullInlineCost(		static cl::opt<bool> OptComputeFullInlineCost(
"inline-cost-full", cl::Hidden, cl::init(false), cl::ZeroOrMore,		"inline-cost-full", cl::Hidden, cl::init(false), cl::ZeroOrMore,
cl::desc("Compute the full inline cost of a call site even when the cost "		cl::desc("Compute the full inline cost of a call site even when the cost "
"exceeds the threshold."));		"exceeds the threshold."));

		static cl::opt<bool> InlineCallerSupersetNoBuiltin(
		"inline-caller-superset-nobuiltin", cl::Hidden, cl::init(true),
		cl::ZeroOrMore,
		cl::desc("Allow inlining when caller has a superset of callee's nobuiltin "
		"attributes."));

namespace {		namespace {
class InlineCostCallAnalyzer;		class InlineCostCallAnalyzer;
class CallAnalyzer : public InstVisitor<CallAnalyzer, bool> {		class CallAnalyzer : public InstVisitor<CallAnalyzer, bool> {
typedef InstVisitor<CallAnalyzer, bool> Base;		typedef InstVisitor<CallAnalyzer, bool> Base;
friend class InstVisitor<CallAnalyzer, bool>;		friend class InstVisitor<CallAnalyzer, bool>;

protected:		protected:
virtual ~CallAnalyzer() {}		virtual ~CallAnalyzer() {}
[1,955 lines not shown]	#define DEBUG_PRINT_STAT(x) dbgs() << " " #x ": " << x << "\n"
DEBUG_PRINT_STAT(Cost);		DEBUG_PRINT_STAT(Cost);
DEBUG_PRINT_STAT(Threshold);		DEBUG_PRINT_STAT(Threshold);
#undef DEBUG_PRINT_STAT		#undef DEBUG_PRINT_STAT
}		}
#endif		#endif

/// Test that there are no attribute conflicts between Caller and Callee		/// Test that there are no attribute conflicts between Caller and Callee
/// that prevent inlining.		/// that prevent inlining.
static bool functionsHaveCompatibleAttributes(Function *Caller,		static bool functionsHaveCompatibleAttributes(
Function *Callee,		Function *Caller, Function *Callee, TargetTransformInfo &TTI,
TargetTransformInfo &TTI) {		function_ref<const TargetLibraryInfo &(Function &)> &GetTLI) {
		// Note that CalleeTLI must be a copy not a reference. The legacy pass manager
		// caches the most recently created TLI in the TargetLibraryInfoWrapperPass
		// object, and always returns the same object (which is overwritten on each
		// GetTLI call). Therefore we copy the first result.
		auto CalleeTLI = GetTLI(*Callee);
return TTI.areInlineCompatible(Caller, Callee) &&		return TTI.areInlineCompatible(Caller, Callee) &&
		GetTLI(*Caller).areInlineCompatible(CalleeTLI,
		InlineCallerSupersetNoBuiltin) &&
AttributeFuncs::areInlineCompatible(Caller, Callee);		AttributeFuncs::areInlineCompatible(Caller, Callee);
}		}
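
The comment about copying `CalleeTLI` describes an aliasing hazard that can be reproduced in isolation. The sketch below is a hedged illustration under assumed names: `InfoSketch` and `CachingWrapperSketch` are hypothetical stand-ins for a per-function result and for a wrapper that keeps only the most recent result, which is how the comment describes the legacy pass manager. It is not LLVM's implementation.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for a per-function analysis result.
struct InfoSketch {
  std::string Owner;
};

// Hypothetical wrapper that keeps only the most recent result and hands out
// a reference to that single cached object.
class CachingWrapperSketch {
  InfoSketch Cached;

public:
  const InfoSketch &get(const std::string &FunctionName) {
    Cached = InfoSketch{FunctionName}; // overwrites the previous result
    return Cached;
  }
};

// Returns true if a copy taken before the second call still names the callee.
bool copySurvivesSecondCall(CachingWrapperSketch &W) {
  InfoSketch CalleeCopy = W.get("callee"); // copy, as the diff does
  const InfoSketch &CallerRef = W.get("caller");
  return CalleeCopy.Owner == "callee" && CallerRef.Owner == "caller";
}

// Returns true if a reference taken before the second call was overwritten.
bool referenceIsOverwritten(CachingWrapperSketch &W) {
  const InfoSketch &CalleeRef = W.get("callee"); // reference, not a copy
  W.get("caller");
  return CalleeRef.Owner == "caller"; // now aliases the caller's result
}
```

Holding the first result by reference would make the caller/callee comparison compare an object with itself, which is why the first result is copied.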

int llvm::getCallsiteCost(CallBase &Call, const DataLayout &DL) {		int llvm::getCallsiteCost(CallBase &Call, const DataLayout &DL) {
int Cost = 0;		int Cost = 0;
for (unsigned I = 0, E = Call.arg_size(); I != E; ++I) {		for (unsigned I = 0, E = Call.arg_size(); I != E; ++I) {
if (Call.isByValArgument(I)) {		if (Call.isByValArgument(I)) {
// We approximate the number of loads and stores needed by dividing the		// We approximate the number of loads and stores needed by dividing the
[24 lines not shown]	int llvm::getCallsiteCost(CallBase &Call, const DataLayout &DL) {
Cost += InlineConstants::InstrCost + InlineConstants::CallPenalty;		Cost += InlineConstants::InstrCost + InlineConstants::CallPenalty;
return Cost;		return Cost;
}		}

InlineCost llvm::getInlineCost(		InlineCost llvm::getInlineCost(
CallBase &Call, const InlineParams &Params, TargetTransformInfo &CalleeTTI,		CallBase &Call, const InlineParams &Params, TargetTransformInfo &CalleeTTI,
std::function<AssumptionCache &(Function &)> &GetAssumptionCache,		std::function<AssumptionCache &(Function &)> &GetAssumptionCache,
Optional<function_ref<BlockFrequencyInfo &(Function &)>> GetBFI,		Optional<function_ref<BlockFrequencyInfo &(Function &)>> GetBFI,
		function_ref<const TargetLibraryInfo &(Function &)> GetTLI,
ProfileSummaryInfo *PSI, OptimizationRemarkEmitter *ORE) {		ProfileSummaryInfo *PSI, OptimizationRemarkEmitter *ORE) {
return getInlineCost(Call, Call.getCalledFunction(), Params, CalleeTTI,		return getInlineCost(Call, Call.getCalledFunction(), Params, CalleeTTI,
GetAssumptionCache, GetBFI, PSI, ORE);		GetAssumptionCache, GetBFI, GetTLI, PSI, ORE);
}		}

InlineCost llvm::getInlineCost(		InlineCost llvm::getInlineCost(
CallBase &Call, Function *Callee, const InlineParams &Params,		CallBase &Call, Function *Callee, const InlineParams &Params,
TargetTransformInfo &CalleeTTI,		TargetTransformInfo &CalleeTTI,
std::function<AssumptionCache &(Function &)> &GetAssumptionCache,		std::function<AssumptionCache &(Function &)> &GetAssumptionCache,
Optional<function_ref<BlockFrequencyInfo &(Function &)>> GetBFI,		Optional<function_ref<BlockFrequencyInfo &(Function &)>> GetBFI,
		function_ref<const TargetLibraryInfo &(Function &)> GetTLI,
ProfileSummaryInfo *PSI, OptimizationRemarkEmitter *ORE) {		ProfileSummaryInfo *PSI, OptimizationRemarkEmitter *ORE) {

// Cannot inline indirect calls.		// Cannot inline indirect calls.
if (!Callee)		if (!Callee)
return llvm::InlineCost::getNever("indirect call");		return llvm::InlineCost::getNever("indirect call");

// Never inline calls with byval arguments that does not have the alloca		// Never inline calls with byval arguments that does not have the alloca
// address space. Since byval arguments can be replaced with a copy to an		// address space. Since byval arguments can be replaced with a copy to an
[16 lines not shown]	if (Call.hasFnAttr(Attribute::AlwaysInline)) {
if (IsViable.isSuccess())		if (IsViable.isSuccess())
return llvm::InlineCost::getAlways("always inline attribute");		return llvm::InlineCost::getAlways("always inline attribute");
return llvm::InlineCost::getNever(IsViable.getFailureReason());		return llvm::InlineCost::getNever(IsViable.getFailureReason());
}		}

// Never inline functions with conflicting attributes (unless callee has		// Never inline functions with conflicting attributes (unless callee has
// always-inline attribute).		// always-inline attribute).
Function *Caller = Call.getCaller();		Function *Caller = Call.getCaller();
if (!functionsHaveCompatibleAttributes(Caller, Callee, CalleeTTI))		if (!functionsHaveCompatibleAttributes(Caller, Callee, CalleeTTI, GetTLI))
return llvm::InlineCost::getNever("conflicting attributes");		return llvm::InlineCost::getNever("conflicting attributes");

// Don't inline this call if the caller has the optnone attribute.		// Don't inline this call if the caller has the optnone attribute.
if (Caller->hasOptNone())		if (Caller->hasOptNone())
return llvm::InlineCost::getNever("optnone attribute");		return llvm::InlineCost::getNever("optnone attribute");

// Don't inline a function that treats null pointer as valid into a caller		// Don't inline a function that treats null pointer as valid into a caller
// that does not have this attribute.		// that does not have this attribute.
[171 lines not shown]

llvm/lib/Target/AMDGPU/AMDGPUInline.cpp

[209 lines not shown]	InlineCost AMDGPUInliner::getInlineCost(CallSite CS) {

OptimizationRemarkEmitter ORE(Caller);		OptimizationRemarkEmitter ORE(Caller);
std::function<AssumptionCache &(Function &)> GetAssumptionCache =		std::function<AssumptionCache &(Function &)> GetAssumptionCache =
[this](Function &F) -> AssumptionCache & {		[this](Function &F) -> AssumptionCache & {
return ACT->getAssumptionCache(F);		return ACT->getAssumptionCache(F);
};		};

auto IC = llvm::getInlineCost(cast<CallBase>(*CS.getInstruction()), Callee,		auto IC = llvm::getInlineCost(cast<CallBase>(*CS.getInstruction()), Callee,
LocalParams, TTI, GetAssumptionCache, None, PSI,		LocalParams, TTI, GetAssumptionCache, None,
RemarksEnabled ? &ORE : nullptr);		GetTLI, PSI, RemarksEnabled ? &ORE : nullptr);

if (IC && !IC.isAlways() && !Callee->hasFnAttribute(Attribute::InlineHint)) {		if (IC && !IC.isAlways() && !Callee->hasFnAttribute(Attribute::InlineHint)) {
// Single BB does not increase total BB amount, thus subtract 1		// Single BB does not increase total BB amount, thus subtract 1
size_t Size = Caller->size() + Callee->size() - 1;		size_t Size = Caller->size() + Callee->size() - 1;
if (MaxBB && Size > MaxBB)		if (MaxBB && Size > MaxBB)
return llvm::InlineCost::getNever("max number of bb exceeded");		return llvm::InlineCost::getNever("max number of bb exceeded");
}		}
return IC;		return IC;
}		}

llvm/lib/Transforms/IPO/InlineSimple.cpp

[65 lines not shown]	InlineCost getInlineCost(CallSite CS) override {
OptimizationRemarkEmitter ORE(CS.getCaller());		OptimizationRemarkEmitter ORE(CS.getCaller());

std::function<AssumptionCache &(Function &)> GetAssumptionCache =		std::function<AssumptionCache &(Function &)> GetAssumptionCache =
[&](Function &F) -> AssumptionCache & {		[&](Function &F) -> AssumptionCache & {
return ACT->getAssumptionCache(F);		return ACT->getAssumptionCache(F);
};		};
return llvm::getInlineCost(		return llvm::getInlineCost(
cast<CallBase>(*CS.getInstruction()), Params, TTI, GetAssumptionCache,		cast<CallBase>(*CS.getInstruction()), Params, TTI, GetAssumptionCache,
/*GetBFI=*/None, PSI, RemarksEnabled ? &ORE : nullptr);		/*GetBFI=*/None, GetTLI, PSI, RemarksEnabled ? &ORE : nullptr);
}		}

bool runOnSCC(CallGraphSCC &SCC) override;		bool runOnSCC(CallGraphSCC &SCC) override;
void getAnalysisUsage(AnalysisUsage &AU) const override;		void getAnalysisUsage(AnalysisUsage &AU) const override;

private:		private:
TargetTransformInfoWrapperPass *TTIWP;		TargetTransformInfoWrapperPass *TTIWP;

[43 lines not shown]

llvm/lib/Transforms/IPO/Inliner.cpp

[522 lines not shown]	static void setInlineRemark(CallSite &CS, StringRef message) {
Attribute attr = Attribute::get(CS->getContext(), "inline-remark", message);		Attribute attr = Attribute::get(CS->getContext(), "inline-remark", message);
CS.addAttribute(AttributeList::FunctionIndex, attr);		CS.addAttribute(AttributeList::FunctionIndex, attr);
}		}

static bool		static bool
inlineCallsImpl(CallGraphSCC &SCC, CallGraph &CG,		inlineCallsImpl(CallGraphSCC &SCC, CallGraph &CG,
std::function<AssumptionCache &(Function &)> GetAssumptionCache,		std::function<AssumptionCache &(Function &)> GetAssumptionCache,
ProfileSummaryInfo *PSI,		ProfileSummaryInfo *PSI,
std::function<TargetLibraryInfo &(Function &)> GetTLI,		std::function<const TargetLibraryInfo &(Function &)> GetTLI,
bool InsertLifetime,		bool InsertLifetime,
function_ref<InlineCost(CallSite CS)> GetInlineCost,		function_ref<InlineCost(CallSite CS)> GetInlineCost,
function_ref<AAResults &(Function &)> AARGetter,		function_ref<AAResults &(Function &)> AARGetter,
ImportedFunctionsInliningStatistics &ImportedFunctionsStats) {		ImportedFunctionsInliningStatistics &ImportedFunctionsStats) {
SmallPtrSet<Function *, 8> SCCFunctions;		SmallPtrSet<Function *, 8> SCCFunctions;
LLVM_DEBUG(dbgs() << "Inliner visiting SCC:");		LLVM_DEBUG(dbgs() << "Inliner visiting SCC:");
for (CallGraphNode *Node : SCC) {		for (CallGraphNode *Node : SCC) {
Function *F = Node->getFunction();		Function *F = Node->getFunction();
[216 lines not shown]	inlineCallsImpl(CallGraphSCC &SCC, CallGraph &CG,

return Changed;		return Changed;
}		}

bool LegacyInlinerBase::inlineCalls(CallGraphSCC &SCC) {		bool LegacyInlinerBase::inlineCalls(CallGraphSCC &SCC) {
CallGraph &CG = getAnalysis<CallGraphWrapperPass>().getCallGraph();		CallGraph &CG = getAnalysis<CallGraphWrapperPass>().getCallGraph();
ACT = &getAnalysis<AssumptionCacheTracker>();		ACT = &getAnalysis<AssumptionCacheTracker>();
PSI = &getAnalysis<ProfileSummaryInfoWrapperPass>().getPSI();		PSI = &getAnalysis<ProfileSummaryInfoWrapperPass>().getPSI();
auto GetTLI = [&](Function &F) -> TargetLibraryInfo & {		GetTLI = [&](Function &F) -> const TargetLibraryInfo & {
return getAnalysis<TargetLibraryInfoWrapperPass>().getTLI(F);		return getAnalysis<TargetLibraryInfoWrapperPass>().getTLI(F);
};		};
auto GetAssumptionCache = [&](Function &F) -> AssumptionCache & {		auto GetAssumptionCache = [&](Function &F) -> AssumptionCache & {
return ACT->getAssumptionCache(F);		return ACT->getAssumptionCache(F);
};		};
return inlineCallsImpl(		return inlineCallsImpl(
SCC, CG, GetAssumptionCache, PSI, GetTLI, InsertLifetime,		SCC, CG, GetAssumptionCache, PSI, GetTLI, InsertLifetime,
[this](CallSite CS) { return getInlineCost(CS); }, LegacyAARGetter(*this),		[this](CallSite CS) { return getInlineCost(CS); }, LegacyAARGetter(*this),
[230 lines not shown]	for (int i = 0; i < (int)Calls.size(); ++i) {

std::function<AssumptionCache &(Function &)> GetAssumptionCache =		std::function<AssumptionCache &(Function &)> GetAssumptionCache =
[&](Function &F) -> AssumptionCache & {		[&](Function &F) -> AssumptionCache & {
return FAM.getResult<AssumptionAnalysis>(F);		return FAM.getResult<AssumptionAnalysis>(F);
};		};
auto GetBFI = [&](Function &F) -> BlockFrequencyInfo & {		auto GetBFI = [&](Function &F) -> BlockFrequencyInfo & {
return FAM.getResult<BlockFrequencyAnalysis>(F);		return FAM.getResult<BlockFrequencyAnalysis>(F);
};		};
		auto GetTLI = [&](Function &F) -> const TargetLibraryInfo & {
		return FAM.getResult<TargetLibraryAnalysis>(F);
		};

auto GetInlineCost = [&](CallSite CS) {		auto GetInlineCost = [&](CallSite CS) {
Function &Callee = *CS.getCalledFunction();		Function &Callee = *CS.getCalledFunction();
auto &CalleeTTI = FAM.getResult<TargetIRAnalysis>(Callee);		auto &CalleeTTI = FAM.getResult<TargetIRAnalysis>(Callee);
bool RemarksEnabled =		bool RemarksEnabled =
Callee.getContext().getDiagHandlerPtr()->isMissedOptRemarkEnabled(		Callee.getContext().getDiagHandlerPtr()->isMissedOptRemarkEnabled(
DEBUG_TYPE);		DEBUG_TYPE);
return getInlineCost(cast<CallBase>(*CS.getInstruction()), Params,		return getInlineCost(cast<CallBase>(*CS.getInstruction()), Params,
CalleeTTI, GetAssumptionCache, {GetBFI}, PSI,		CalleeTTI, GetAssumptionCache, {GetBFI}, GetTLI, PSI,
RemarksEnabled ? &ORE : nullptr);		RemarksEnabled ? &ORE : nullptr);
};		};

// Now process as many calls as we have within this caller in the sequnece.		// Now process as many calls as we have within this caller in the sequnece.
// We bail out as soon as the caller has to change so we can update the		// We bail out as soon as the caller has to change so we can update the
// call graph and prepare the context of that new caller.		// call graph and prepare the context of that new caller.
bool DidInline = false;		bool DidInline = false;
for (; i < (int)Calls.size() && Calls[i].first.getCaller() == &F; ++i) {		for (; i < (int)Calls.size() && Calls[i].first.getCaller() == &F; ++i) {
[216 lines not shown]

llvm/lib/Transforms/IPO/PartialInlining.cpp

[197 lines not shown]

struct PartialInlinerImpl {		struct PartialInlinerImpl {

PartialInlinerImpl(		PartialInlinerImpl(
std::function<AssumptionCache &(Function &)> *GetAC,		std::function<AssumptionCache &(Function &)> *GetAC,
function_ref<AssumptionCache *(Function &)> LookupAC,		function_ref<AssumptionCache *(Function &)> LookupAC,
std::function<TargetTransformInfo &(Function &)> *GTTI,		std::function<TargetTransformInfo &(Function &)> *GTTI,
Optional<function_ref<BlockFrequencyInfo &(Function &)>> GBFI,		Optional<function_ref<BlockFrequencyInfo &(Function &)>> GBFI,
		std::function<const TargetLibraryInfo &(Function &)> *GTLI,
ProfileSummaryInfo *ProfSI)		ProfileSummaryInfo *ProfSI)
: GetAssumptionCache(GetAC), LookupAssumptionCache(LookupAC),		: GetAssumptionCache(GetAC), LookupAssumptionCache(LookupAC),
GetTTI(GTTI), GetBFI(GBFI), PSI(ProfSI) {}		GetTTI(GTTI), GetBFI(GBFI), GetTLI(GTLI), PSI(ProfSI) {}

bool run(Module &M);		bool run(Module &M);
// Main part of the transformation that calls helper functions to find		// Main part of the transformation that calls helper functions to find
// outlining candidates, clone & outline the function, and attempt to		// outlining candidates, clone & outline the function, and attempt to
// partially inline the resulting function. Returns true if		// partially inline the resulting function. Returns true if
// inlining was successful, false otherwise. Also returns the outline		// inlining was successful, false otherwise. Also returns the outline
// function (only if we partially inlined early returns) as there is a		// function (only if we partially inlined early returns) as there is a
// possibility to further "peel" early return statements that were left in the		// possibility to further "peel" early return statements that were left in the
[52 lines not shown]	struct PartialInlinerImpl {
};		};

private:		private:
int NumPartialInlining = 0;		int NumPartialInlining = 0;
std::function<AssumptionCache &(Function &)> *GetAssumptionCache;		std::function<AssumptionCache &(Function &)> *GetAssumptionCache;
function_ref<AssumptionCache *(Function &)> LookupAssumptionCache;		function_ref<AssumptionCache *(Function &)> LookupAssumptionCache;
std::function<TargetTransformInfo &(Function &)> *GetTTI;		std::function<TargetTransformInfo &(Function &)> *GetTTI;
Optional<function_ref<BlockFrequencyInfo &(Function &)>> GetBFI;		Optional<function_ref<BlockFrequencyInfo &(Function &)>> GetBFI;
		std::function<const TargetLibraryInfo &(Function &)> *GetTLI;
ProfileSummaryInfo *PSI;		ProfileSummaryInfo *PSI;

// Return the frequency of the OutlininingBB relative to F's entry point.		// Return the frequency of the OutlininingBB relative to F's entry point.
// The result is no larger than 1 and is represented using BP.		// The result is no larger than 1 and is represented using BP.
// (Note that the outlined region's 'head' block can only have incoming		// (Note that the outlined region's 'head' block can only have incoming
// edges from the guarding entry blocks).		// edges from the guarding entry blocks).
BranchProbability getOutliningCallBBRelativeFreq(FunctionCloner &Cloner);		BranchProbability getOutliningCallBBRelativeFreq(FunctionCloner &Cloner);

[65 lines not shown]	struct PartialInlinerLegacyPass : public ModulePass {
PartialInlinerLegacyPass() : ModulePass(ID) {		PartialInlinerLegacyPass() : ModulePass(ID) {
initializePartialInlinerLegacyPassPass(*PassRegistry::getPassRegistry());		initializePartialInlinerLegacyPassPass(*PassRegistry::getPassRegistry());
}		}

void getAnalysisUsage(AnalysisUsage &AU) const override {		void getAnalysisUsage(AnalysisUsage &AU) const override {
AU.addRequired<AssumptionCacheTracker>();		AU.addRequired<AssumptionCacheTracker>();
AU.addRequired<ProfileSummaryInfoWrapperPass>();		AU.addRequired<ProfileSummaryInfoWrapperPass>();
AU.addRequired<TargetTransformInfoWrapperPass>();		AU.addRequired<TargetTransformInfoWrapperPass>();
		AU.addRequired<TargetLibraryInfoWrapperPass>();
}		}

bool runOnModule(Module &M) override {		bool runOnModule(Module &M) override {
if (skipModule(M))		if (skipModule(M))
return false;		return false;

AssumptionCacheTracker *ACT = &getAnalysis<AssumptionCacheTracker>();		AssumptionCacheTracker *ACT = &getAnalysis<AssumptionCacheTracker>();
TargetTransformInfoWrapperPass *TTIWP =		TargetTransformInfoWrapperPass *TTIWP =
[10 lines not shown]	auto LookupAssumptionCache = [ACT](Function &F) -> AssumptionCache * {
return ACT->lookupAssumptionCache(F);		return ACT->lookupAssumptionCache(F);
};		};

std::function<TargetTransformInfo &(Function &)> GetTTI =		std::function<TargetTransformInfo &(Function &)> GetTTI =
[&TTIWP](Function &F) -> TargetTransformInfo & {		[&TTIWP](Function &F) -> TargetTransformInfo & {
return TTIWP->getTTI(F);		return TTIWP->getTTI(F);
};		};

		std::function<const TargetLibraryInfo &(Function &)> GetTLI =
		[this](Function &F) -> TargetLibraryInfo & {
		return this->getAnalysis<TargetLibraryInfoWrapperPass>().getTLI(F);
		};

return PartialInlinerImpl(&GetAssumptionCache, LookupAssumptionCache,		return PartialInlinerImpl(&GetAssumptionCache, LookupAssumptionCache,
&GetTTI, NoneType::None, PSI)		&GetTTI, NoneType::None, &GetTLI, PSI)
		gchatelet: I'm having a hard time convincing myself that the lifetime requirements are correct here. Passing a local variable `GetTLI` by address in a `return` statement looks fishy. It's similar to `GetTTI`, so it seems correct; it's just hard to tell by looking at the code. Same above and below.
		tejohnson (author): What's being returned is the bool result of the run() call, not the PartialInlinerImpl object, which doesn't survive past this function and therefore the GetTLI scope.
		gchatelet: Ha right, I got confused by the formatting and missed the `run()` on the next line.
.run(M);		.run(M);
}		}
};		};
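
The lifetime question raised in the inline comments above comes down to when the temporary object is destroyed. The sketch below reproduces the pattern with hypothetical names (`ImplSketch`, `runOnModuleSketch`, `GetValue`); it only assumes the standard C++ rule that a temporary lives until the end of the full expression that creates it.

```cpp
#include <cassert>
#include <functional>

// Hypothetical stand-in for PartialInlinerImpl: it stores a pointer to a
// callback that is owned by the calling function.
struct ImplSketch {
  std::function<int(int)> *Callback;
  explicit ImplSketch(std::function<int(int)> *CB) : Callback(CB) {}
  bool run(int X) { return (*Callback)(X) > 0; }
};

bool runOnModuleSketch(int X) {
  std::function<int(int)> GetValue = [](int V) { return V * 2; };
  // The temporary ImplSketch is destroyed at the end of this full expression,
  // so run() finishes while GetValue is still alive. Only the bool result
  // leaves the function; the stored pointer never outlives GetValue.
  return ImplSketch(&GetValue)
      .run(X);
}
```

The pattern would become unsafe only if the `ImplSketch` object itself were returned or stored somewhere that outlives `GetValue`.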

} // end anonymous namespace		} // end anonymous namespace

std::unique_ptr<FunctionOutliningMultiRegionInfo>		std::unique_ptr<FunctionOutliningMultiRegionInfo>
PartialInlinerImpl::computeOutliningColdRegionsInfo(Function *F,		PartialInlinerImpl::computeOutliningColdRegionsInfo(Function *F,
[379 lines not shown]	bool PartialInlinerImpl::shouldPartialInline(

Function *Caller = CS.getCaller();		Function *Caller = CS.getCaller();
auto &CalleeTTI = (*GetTTI)(*Callee);		auto &CalleeTTI = (*GetTTI)(*Callee);
bool RemarksEnabled =		bool RemarksEnabled =
Callee->getContext().getDiagHandlerPtr()->isMissedOptRemarkEnabled(		Callee->getContext().getDiagHandlerPtr()->isMissedOptRemarkEnabled(
DEBUG_TYPE);		DEBUG_TYPE);
assert(Call && "invalid callsite for partial inline");		assert(Call && "invalid callsite for partial inline");
InlineCost IC = getInlineCost(cast<CallBase>(*Call), getInlineParams(),		InlineCost IC = getInlineCost(cast<CallBase>(*Call), getInlineParams(),
CalleeTTI, *GetAssumptionCache, GetBFI, PSI,		CalleeTTI, *GetAssumptionCache, GetBFI, *GetTLI,
RemarksEnabled ? &ORE : nullptr);		PSI, RemarksEnabled ? &ORE : nullptr);

if (IC.isAlways()) {		if (IC.isAlways()) {
ORE.emit([&]() {		ORE.emit([&]() {
return OptimizationRemarkAnalysis(DEBUG_TYPE, "AlwaysInline", Call)		return OptimizationRemarkAnalysis(DEBUG_TYPE, "AlwaysInline", Call)
<< NV("Callee", Cloner.OrigFunc)		<< NV("Callee", Cloner.OrigFunc)
<< " should always be fully inlined, not partially";		<< " should always be fully inlined, not partially";
});		});
return false;		return false;
[697 lines not shown]

char PartialInlinerLegacyPass::ID = 0;		char PartialInlinerLegacyPass::ID = 0;

INITIALIZE_PASS_BEGIN(PartialInlinerLegacyPass, "partial-inliner",		INITIALIZE_PASS_BEGIN(PartialInlinerLegacyPass, "partial-inliner",
"Partial Inliner", false, false)		"Partial Inliner", false, false)
INITIALIZE_PASS_DEPENDENCY(AssumptionCacheTracker)		INITIALIZE_PASS_DEPENDENCY(AssumptionCacheTracker)
INITIALIZE_PASS_DEPENDENCY(ProfileSummaryInfoWrapperPass)		INITIALIZE_PASS_DEPENDENCY(ProfileSummaryInfoWrapperPass)
INITIALIZE_PASS_DEPENDENCY(TargetTransformInfoWrapperPass)		INITIALIZE_PASS_DEPENDENCY(TargetTransformInfoWrapperPass)
		INITIALIZE_PASS_DEPENDENCY(TargetLibraryInfoWrapperPass)
INITIALIZE_PASS_END(PartialInlinerLegacyPass, "partial-inliner",		INITIALIZE_PASS_END(PartialInlinerLegacyPass, "partial-inliner",
"Partial Inliner", false, false)		"Partial Inliner", false, false)

ModulePass *llvm::createPartialInliningPass() {		ModulePass *llvm::createPartialInliningPass() {
return new PartialInlinerLegacyPass();		return new PartialInlinerLegacyPass();
}		}

PreservedAnalyses PartialInlinerPass::run(Module &M,		PreservedAnalyses PartialInlinerPass::run(Module &M,
[14 lines not shown]	std::function<BlockFrequencyInfo &(Function &)> GetBFI =
return FAM.getResult<BlockFrequencyAnalysis>(F);		return FAM.getResult<BlockFrequencyAnalysis>(F);
};		};

std::function<TargetTransformInfo &(Function &)> GetTTI =		std::function<TargetTransformInfo &(Function &)> GetTTI =
[&FAM](Function &F) -> TargetTransformInfo & {		[&FAM](Function &F) -> TargetTransformInfo & {
return FAM.getResult<TargetIRAnalysis>(F);		return FAM.getResult<TargetIRAnalysis>(F);
};		};

		std::function<const TargetLibraryInfo &(Function &)> GetTLI =
		[&FAM](Function &F) -> TargetLibraryInfo & {
		return FAM.getResult<TargetLibraryAnalysis>(F);
		};

ProfileSummaryInfo *PSI = &AM.getResult<ProfileSummaryAnalysis>(M);		ProfileSummaryInfo *PSI = &AM.getResult<ProfileSummaryAnalysis>(M);

if (PartialInlinerImpl(&GetAssumptionCache, LookupAssumptionCache, &GetTTI,		if (PartialInlinerImpl(&GetAssumptionCache, LookupAssumptionCache, &GetTTI,
{GetBFI}, PSI)		{GetBFI}, &GetTLI, PSI)
.run(M))		.run(M))
return PreservedAnalyses::none();		return PreservedAnalyses::none();
return PreservedAnalyses::all();		return PreservedAnalyses::all();
}		}

llvm/lib/Transforms/IPO/SampleProfile.cpp

[36 lines not shown]
#include "llvm/Analysis/AssumptionCache.h"		#include "llvm/Analysis/AssumptionCache.h"
#include "llvm/Analysis/CallGraph.h"		#include "llvm/Analysis/CallGraph.h"
#include "llvm/Analysis/CallGraphSCCPass.h"		#include "llvm/Analysis/CallGraphSCCPass.h"
#include "llvm/Analysis/InlineCost.h"		#include "llvm/Analysis/InlineCost.h"
#include "llvm/Analysis/LoopInfo.h"		#include "llvm/Analysis/LoopInfo.h"
#include "llvm/Analysis/OptimizationRemarkEmitter.h"		#include "llvm/Analysis/OptimizationRemarkEmitter.h"
#include "llvm/Analysis/PostDominators.h"		#include "llvm/Analysis/PostDominators.h"
#include "llvm/Analysis/ProfileSummaryInfo.h"		#include "llvm/Analysis/ProfileSummaryInfo.h"
		#include "llvm/Analysis/TargetLibraryInfo.h"
#include "llvm/Analysis/TargetTransformInfo.h"		#include "llvm/Analysis/TargetTransformInfo.h"
#include "llvm/IR/BasicBlock.h"		#include "llvm/IR/BasicBlock.h"
#include "llvm/IR/CFG.h"		#include "llvm/IR/CFG.h"
#include "llvm/IR/CallSite.h"		#include "llvm/IR/CallSite.h"
#include "llvm/IR/DebugInfoMetadata.h"		#include "llvm/IR/DebugInfoMetadata.h"
#include "llvm/IR/DebugLoc.h"		#include "llvm/IR/DebugLoc.h"
#include "llvm/IR/DiagnosticInfo.h"		#include "llvm/IR/DiagnosticInfo.h"
#include "llvm/IR/Dominators.h"		#include "llvm/IR/Dominators.h"
[249 lines not shown]
/// This pass reads profile data from the file specified by		/// This pass reads profile data from the file specified by
/// -sample-profile-file and annotates every affected function with the		/// -sample-profile-file and annotates every affected function with the
/// profile information found in that file.		/// profile information found in that file.
class SampleProfileLoader {		class SampleProfileLoader {
public:		public:
SampleProfileLoader(		SampleProfileLoader(
StringRef Name, StringRef RemapName, bool IsThinLTOPreLink,		StringRef Name, StringRef RemapName, bool IsThinLTOPreLink,
std::function<AssumptionCache &(Function &)> GetAssumptionCache,		std::function<AssumptionCache &(Function &)> GetAssumptionCache,
std::function<TargetTransformInfo &(Function &)> GetTargetTransformInfo)		std::function<TargetTransformInfo &(Function &)> GetTargetTransformInfo,
		std::function<const TargetLibraryInfo &(Function &)> GetTLI)
: GetAC(std::move(GetAssumptionCache)),		: GetAC(std::move(GetAssumptionCache)),
GetTTI(std::move(GetTargetTransformInfo)), CoverageTracker(*this),		GetTTI(std::move(GetTargetTransformInfo)), GetTLI(std::move(GetTLI)),
Filename(std::string(Name)), RemappingFilename(std::string(RemapName)),		CoverageTracker(*this), Filename(std::string(Name)),
		RemappingFilename(std::string(RemapName)),
IsThinLTOPreLink(IsThinLTOPreLink) {}		IsThinLTOPreLink(IsThinLTOPreLink) {}

bool doInitialization(Module &M);		bool doInitialization(Module &M);
bool runOnModule(Module &M, ModuleAnalysisManager *AM,		bool runOnModule(Module &M, ModuleAnalysisManager *AM,
ProfileSummaryInfo *_PSI, CallGraph *CG);		ProfileSummaryInfo *_PSI, CallGraph *CG);

void dump() { Reader->dump(); }		void dump() { Reader->dump(); }

[70 lines not shown]	protected:

/// Dominance, post-dominance and loop information.		/// Dominance, post-dominance and loop information.
std::unique_ptr<DominatorTree> DT;		std::unique_ptr<DominatorTree> DT;
std::unique_ptr<PostDominatorTree> PDT;		std::unique_ptr<PostDominatorTree> PDT;
std::unique_ptr<LoopInfo> LI;		std::unique_ptr<LoopInfo> LI;

std::function<AssumptionCache &(Function &)> GetAC;		std::function<AssumptionCache &(Function &)> GetAC;
std::function<TargetTransformInfo &(Function &)> GetTTI;		std::function<TargetTransformInfo &(Function &)> GetTTI;
		std::function<const TargetLibraryInfo &(Function &)> GetTLI;

/// Predecessors for each basic block in the CFG.		/// Predecessors for each basic block in the CFG.
BlockEdgeMap Predecessors;		BlockEdgeMap Predecessors;

/// Successors for each basic block in the CFG.		/// Successors for each basic block in the CFG.
BlockEdgeMap Successors;		BlockEdgeMap Successors;

SampleCoverageTracker CoverageTracker;		SampleCoverageTracker CoverageTracker;
(61 lines not shown)

class SampleProfileLoaderLegacyPass : public ModulePass {		class SampleProfileLoaderLegacyPass : public ModulePass {
public:		public:
// Class identification, replacement for typeinfo		// Class identification, replacement for typeinfo
static char ID;		static char ID;

SampleProfileLoaderLegacyPass(StringRef Name = SampleProfileFile,		SampleProfileLoaderLegacyPass(StringRef Name = SampleProfileFile,
bool IsThinLTOPreLink = false)		bool IsThinLTOPreLink = false)
: ModulePass(ID),		: ModulePass(ID), SampleLoader(
SampleLoader(Name, SampleProfileRemappingFile, IsThinLTOPreLink,		Name, SampleProfileRemappingFile, IsThinLTOPreLink,
[&](Function &F) -> AssumptionCache & {		[&](Function &F) -> AssumptionCache & {
return ACT->getAssumptionCache(F);		return ACT->getAssumptionCache(F);
},		},
[&](Function &F) -> TargetTransformInfo & {		[&](Function &F) -> TargetTransformInfo & {
return TTIWP->getTTI(F);		return TTIWP->getTTI(F);
		},
		[&](Function &F) -> TargetLibraryInfo & {
		return TLIWP->getTLI(F);
}) {		}) {
initializeSampleProfileLoaderLegacyPassPass(		initializeSampleProfileLoaderLegacyPassPass(
*PassRegistry::getPassRegistry());		*PassRegistry::getPassRegistry());
}		}

void dump() { SampleLoader.dump(); }		void dump() { SampleLoader.dump(); }

bool doInitialization(Module &M) override {		bool doInitialization(Module &M) override {
return SampleLoader.doInitialization(M);		return SampleLoader.doInitialization(M);
}		}

StringRef getPassName() const override { return "Sample profile pass"; }		StringRef getPassName() const override { return "Sample profile pass"; }
bool runOnModule(Module &M) override;		bool runOnModule(Module &M) override;

void getAnalysisUsage(AnalysisUsage &AU) const override {		void getAnalysisUsage(AnalysisUsage &AU) const override {
AU.addRequired<AssumptionCacheTracker>();		AU.addRequired<AssumptionCacheTracker>();
AU.addRequired<TargetTransformInfoWrapperPass>();		AU.addRequired<TargetTransformInfoWrapperPass>();
		AU.addRequired<TargetLibraryInfoWrapperPass>();
AU.addRequired<ProfileSummaryInfoWrapperPass>();		AU.addRequired<ProfileSummaryInfoWrapperPass>();
}		}

private:		private:
SampleProfileLoader SampleLoader;		SampleProfileLoader SampleLoader;
AssumptionCacheTracker *ACT = nullptr;		AssumptionCacheTracker *ACT = nullptr;
TargetTransformInfoWrapperPass *TTIWP = nullptr;		TargetTransformInfoWrapperPass *TTIWP = nullptr;
		TargetLibraryInfoWrapperPass *TLIWP = nullptr;
};		};

} // end anonymous namespace		} // end anonymous namespace

/// Return true if the given callsite is hot wrt to hot cutoff threshold.		/// Return true if the given callsite is hot wrt to hot cutoff threshold.
///		///
/// Functions that were inlined in the original binary will be represented		/// Functions that were inlined in the original binary will be represented
/// in the inline stack in the sample profile. If the profile shows that		/// in the inline stack in the sample profile. If the profile shows that
(381 lines not shown)	bool SampleProfileLoader::inlineCallInstruction(Instruction *I) {
// Checks if there is anything in the reachable portion of the callee at		// Checks if there is anything in the reachable portion of the callee at
// this callsite that makes this inlining potentially illegal. Need to		// this callsite that makes this inlining potentially illegal. Need to
// set ComputeFullInlineCost, otherwise getInlineCost may return early		// set ComputeFullInlineCost, otherwise getInlineCost may return early
// when cost exceeds threshold without checking all IRs in the callee.		// when cost exceeds threshold without checking all IRs in the callee.
// The actual cost does not matter because we only check isNever() to		// The actual cost does not matter because we only check isNever() to
// see if it is legal to inline the callsite.		// see if it is legal to inline the callsite.
InlineCost Cost =		InlineCost Cost =
getInlineCost(cast<CallBase>(I), Params, GetTTI(CalledFunction), GetAC,		getInlineCost(cast<CallBase>(I), Params, GetTTI(CalledFunction), GetAC,
None, nullptr, nullptr);		None, GetTLI, nullptr, nullptr);
if (Cost.isNever()) {		if (Cost.isNever()) {
ORE->emit(OptimizationRemarkAnalysis(CSINLINE_DEBUG, "InlineFail", DLoc, BB)		ORE->emit(OptimizationRemarkAnalysis(CSINLINE_DEBUG, "InlineFail", DLoc, BB)
<< "incompatible inlining");		<< "incompatible inlining");
return false;		return false;
}		}
InlineFunctionInfo IFI(nullptr, &GetAC);		InlineFunctionInfo IFI(nullptr, &GetAC);
if (InlineFunction(CS, IFI).isSuccess()) {		if (InlineFunction(CS, IFI).isSuccess()) {
// The call to InlineFunction erases I, so we can't pass it here.		// The call to InlineFunction erases I, so we can't pass it here.
(10 lines not shown)	if (!ProfileSizeInline)
return false;		return false;

Function *Callee = CallSite(&CallInst).getCalledFunction();		Function *Callee = CallSite(&CallInst).getCalledFunction();
if (Callee == nullptr)		if (Callee == nullptr)
return false;		return false;

InlineCost Cost =		InlineCost Cost =
getInlineCost(cast<CallBase>(CallInst), getInlineParams(),		getInlineCost(cast<CallBase>(CallInst), getInlineParams(),
GetTTI(*Callee), GetAC, None, nullptr, nullptr);		GetTTI(*Callee), GetAC, None, GetTLI, nullptr, nullptr);

return Cost.getCost() <= SampleColdCallSiteThreshold;		return Cost.getCost() <= SampleColdCallSiteThreshold;
}		}

void SampleProfileLoader::emitOptimizationRemarksForInlineCandidates(		void SampleProfileLoader::emitOptimizationRemarksForInlineCandidates(
const SmallVector<Instruction *, 10> &Candidates, const Function &F,		const SmallVector<Instruction *, 10> &Candidates, const Function &F,
bool Hot) {		bool Hot) {
for (auto I : Candidates) {		for (auto I : Candidates) {
(824 lines not shown)
}		}

char SampleProfileLoaderLegacyPass::ID = 0;		char SampleProfileLoaderLegacyPass::ID = 0;

INITIALIZE_PASS_BEGIN(SampleProfileLoaderLegacyPass, "sample-profile",		INITIALIZE_PASS_BEGIN(SampleProfileLoaderLegacyPass, "sample-profile",
"Sample Profile loader", false, false)		"Sample Profile loader", false, false)
INITIALIZE_PASS_DEPENDENCY(AssumptionCacheTracker)		INITIALIZE_PASS_DEPENDENCY(AssumptionCacheTracker)
INITIALIZE_PASS_DEPENDENCY(TargetTransformInfoWrapperPass)		INITIALIZE_PASS_DEPENDENCY(TargetTransformInfoWrapperPass)
		INITIALIZE_PASS_DEPENDENCY(TargetLibraryInfoWrapperPass)
INITIALIZE_PASS_DEPENDENCY(ProfileSummaryInfoWrapperPass)		INITIALIZE_PASS_DEPENDENCY(ProfileSummaryInfoWrapperPass)
INITIALIZE_PASS_END(SampleProfileLoaderLegacyPass, "sample-profile",		INITIALIZE_PASS_END(SampleProfileLoaderLegacyPass, "sample-profile",
"Sample Profile loader", false, false)		"Sample Profile loader", false, false)

std::vector<Function *>		std::vector<Function *>
SampleProfileLoader::buildFunctionOrder(Module &M, CallGraph *CG) {		SampleProfileLoader::buildFunctionOrder(Module &M, CallGraph *CG) {
std::vector<Function *> FunctionOrderList;		std::vector<Function *> FunctionOrderList;
FunctionOrderList.reserve(M.size());		FunctionOrderList.reserve(M.size());
(104 lines not shown)	for (const std::pair<Function *, NotInlinedProfileInfo> &pair :
updateProfileCallee(pair.first, pair.second.entryCount);		updateProfileCallee(pair.first, pair.second.entryCount);

return retval;		return retval;
}		}

bool SampleProfileLoaderLegacyPass::runOnModule(Module &M) {		bool SampleProfileLoaderLegacyPass::runOnModule(Module &M) {
ACT = &getAnalysis<AssumptionCacheTracker>();		ACT = &getAnalysis<AssumptionCacheTracker>();
TTIWP = &getAnalysis<TargetTransformInfoWrapperPass>();		TTIWP = &getAnalysis<TargetTransformInfoWrapperPass>();
		TLIWP = &getAnalysis<TargetLibraryInfoWrapperPass>();
ProfileSummaryInfo *PSI =		ProfileSummaryInfo *PSI =
&getAnalysis<ProfileSummaryInfoWrapperPass>().getPSI();		&getAnalysis<ProfileSummaryInfoWrapperPass>().getPSI();
return SampleLoader.runOnModule(M, nullptr, PSI, nullptr);		return SampleLoader.runOnModule(M, nullptr, PSI, nullptr);
}		}

bool SampleProfileLoader::runOnFunction(Function &F, ModuleAnalysisManager *AM) {		bool SampleProfileLoader::runOnFunction(Function &F, ModuleAnalysisManager *AM) {

DILocation2SampleMap.clear();		DILocation2SampleMap.clear();
(60 lines not shown)	FunctionAnalysisManager &FAM =
AM.getResult<FunctionAnalysisManagerModuleProxy>(M).getManager();		AM.getResult<FunctionAnalysisManagerModuleProxy>(M).getManager();

auto GetAssumptionCache = [&](Function &F) -> AssumptionCache & {		auto GetAssumptionCache = [&](Function &F) -> AssumptionCache & {
return FAM.getResult<AssumptionAnalysis>(F);		return FAM.getResult<AssumptionAnalysis>(F);
};		};
auto GetTTI = [&](Function &F) -> TargetTransformInfo & {		auto GetTTI = [&](Function &F) -> TargetTransformInfo & {
return FAM.getResult<TargetIRAnalysis>(F);		return FAM.getResult<TargetIRAnalysis>(F);
};		};
		auto GetTLI = [&](Function &F) -> const TargetLibraryInfo & {
		return FAM.getResult<TargetLibraryAnalysis>(F);
		};

SampleProfileLoader SampleLoader(		SampleProfileLoader SampleLoader(
ProfileFileName.empty() ? SampleProfileFile : ProfileFileName,		ProfileFileName.empty() ? SampleProfileFile : ProfileFileName,
ProfileRemappingFileName.empty() ? SampleProfileRemappingFile		ProfileRemappingFileName.empty() ? SampleProfileRemappingFile
: ProfileRemappingFileName,		: ProfileRemappingFileName,
IsThinLTOPreLink, GetAssumptionCache, GetTTI);		IsThinLTOPreLink, GetAssumptionCache, GetTTI, GetTLI);

if (!SampleLoader.doInitialization(M))		if (!SampleLoader.doInitialization(M))
return PreservedAnalyses::all();		return PreservedAnalyses::all();

ProfileSummaryInfo *PSI = &AM.getResult<ProfileSummaryAnalysis>(M);		ProfileSummaryInfo *PSI = &AM.getResult<ProfileSummaryAnalysis>(M);
CallGraph &CG = AM.getResult<CallGraphAnalysis>(M);		CallGraph &CG = AM.getResult<CallGraphAnalysis>(M);
if (!SampleLoader.runOnModule(M, &AM, PSI, &CG))		if (!SampleLoader.runOnModule(M, &AM, PSI, &CG))
return PreservedAnalyses::all();		return PreservedAnalyses::all();

return PreservedAnalyses::none();		return PreservedAnalyses::none();
}		}
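
Note on the getter pattern used in SampleProfile.cpp above: the loader takes each per-function getter as a `std::function` by value and moves it into a member, so the loader owns its own copy of the callable. The sketch below illustrates that pattern in isolation. It is not code from this patch: `Fn`, `LibInfo`, `Loader`, and `runExample` are hypothetical stand-ins, and the point is only that anything the lambda captures by reference must outlive the loader that calls it.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>

// Hypothetical stand-ins for llvm::Function and llvm::TargetLibraryInfo.
struct Fn { std::string Name; };
struct LibInfo { bool MemcpyAvailable; };

class Loader {
public:
  // The getter is passed by value and moved into the member, mirroring the
  // GetTLI(std::move(GetTLI)) initializer in the diff.
  explicit Loader(std::function<const LibInfo &(Fn &)> GetTLI)
      : GetTLI(std::move(GetTLI)) {}

  // Look up the per-function library info through the stored getter.
  bool memcpyAvailableIn(Fn &F) const { return GetTLI(F).MemcpyAvailable; }

private:
  std::function<const LibInfo &(Fn &)> GetTLI;
};

void runExample() {
  // These objects are captured by reference below, so they are declared
  // before the Loader and therefore outlive it.
  LibInfo WithMemcpy{true};
  LibInfo WithoutMemcpy{false};

  Loader L([&](Fn &F) -> const LibInfo & {
    return F.Name == "nobuiltin" ? WithoutMemcpy : WithMemcpy;
  });

  Fn Plain{"plain"};
  Fn Restricted{"nobuiltin"};
  assert(L.memcpyAvailableIn(Plain));
  assert(!L.memcpyAvailableIn(Restricted));
}
```

The lambda object itself may be a temporary, because the `std::function` member holds its own copy; only the captured state needs the longer lifetime.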

llvm/test/Transforms/Inline/inline-no-builtin-compatible.ll

This file was added.

				; Test to ensure no inlining is allowed into a caller with fewer nobuiltin attributes.
				; RUN: opt < %s -mtriple=x86_64-unknown-linux-gnu -S -inline | FileCheck %s
				; RUN: opt < %s -mtriple=x86_64-unknown-linux-gnu -S -passes='cgscc(inline)' | FileCheck %s

				; Make sure we don't inline callees into a caller with a superset of the
				; no builtin attributes when -inline-caller-superset-nobuiltin=false.
				; RUN: opt < %s -inline-caller-superset-nobuiltin=false -mtriple=x86_64-unknown-linux-gnu -S -passes='cgscc(inline)' | FileCheck %s --check-prefix=NOSUPERSET

				target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
				target triple = "x86_64-unknown-linux-gnu"

				define i32 @allbuiltins() {
				entry:
				%call = call i32 (...) @externalfunc()
				ret i32 %call
				; CHECK-LABEL: allbuiltins
				; CHECK: call i32 (...) @externalfunc()
				}
				declare i32 @externalfunc(...)

				; We can inline a function that allows all builtins into one with a single
				; nobuiltin.
				define i32 @nobuiltinmemcpy() #0 {
				entry:
				%call = call i32 @allbuiltins()
				ret i32 %call
				; CHECK-LABEL: nobuiltinmemcpy
				; CHECK-NOT: call i32 @allbuiltins()
				; NOSUPERSET-LABEL: nobuiltinmemcpy
				; NOSUPERSET: call i32 @allbuiltins()
				}

				; We can inline a function that allows all builtins into one with all
				; nobuiltins.
				define i32 @nobuiltins() #1 {
				entry:
				%call = call i32 @allbuiltins()
				ret i32 %call
				; CHECK-LABEL: nobuiltins
				; CHECK-NOT: call i32 @allbuiltins()
				; NOSUPERSET-LABEL: nobuiltins
				; NOSUPERSET: call i32 @allbuiltins()
				}

				; We can inline a function with a single nobuiltin into one with all nobuiltins.
				define i32 @nobuiltins2() #1 {
				entry:
				%call = call i32 @nobuiltinmemcpy()
				ret i32 %call
				; CHECK-LABEL: nobuiltins2
				; CHECK-NOT: call i32 @nobuiltinmemcpy()
				; NOSUPERSET-LABEL: nobuiltins2
				; NOSUPERSET: call i32 @nobuiltinmemcpy()
				}

				; We can't inline a function with any given nobuiltin into one that allows all
				; builtins.
				define i32 @allbuiltins2() {
				entry:
				%call = call i32 @nobuiltinmemcpy()
				ret i32 %call
				; CHECK-LABEL: allbuiltins2
				; CHECK: call i32 @nobuiltinmemcpy()
				; NOSUPERSET-LABEL: allbuiltins2
				; NOSUPERSET: call i32 @nobuiltinmemcpy()
				}

				; We can't inline a function with all nobuiltins into one that allows all
				; builtins.
				define i32 @allbuiltins3() {
				entry:
				%call = call i32 @nobuiltins()
				ret i32 %call
				; CHECK-LABEL: allbuiltins3
				; CHECK: call i32 @nobuiltins()
				; NOSUPERSET-LABEL: allbuiltins3
				; NOSUPERSET: call i32 @nobuiltins()
				}

				; We can't inline a function with a specific nobuiltin into one with a
				; different specific nobuiltin.
				define i32 @nobuiltinmemset() #2 {
				entry:
				%call = call i32 @nobuiltinmemcpy()
				ret i32 %call
				; CHECK-LABEL: nobuiltinmemset
				; CHECK: call i32 @nobuiltinmemcpy()
				; NOSUPERSET-LABEL: nobuiltinmemset
				; NOSUPERSET: call i32 @nobuiltinmemcpy()
				}

				attributes #0 = { "no-builtin-memcpy" }
				attributes #1 = { "no-builtins" }
				attributes #2 = { "no-builtin-memset" }
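
The cases in this test follow one rule: inlining is allowed when the callee's nobuiltin set is a subset of the caller's, which is the same as saying that the union of the two sets equals the caller's set. The following is a minimal standalone sketch of that rule, not the code from this patch. `LibFn`, `NoBuiltinSet`, `canInline`, and `checkTestFileCases` are hypothetical names, and the two-entry enum stands in for the full list of library functions. The `AllowCallerSuperset` parameter models the `-inline-caller-superset-nobuiltin` flag from the RUN lines, under the assumption (consistent with the NOSUPERSET checks) that turning it off requires the two sets to match exactly.

```cpp
#include <bitset>
#include <cassert>

// Hypothetical, reduced list of library functions that can be marked nobuiltin.
enum LibFn { Memcpy = 0, Memset = 1, NumLibFns = 2 };
using NoBuiltinSet = std::bitset<NumLibFns>;

// Returns true if a callee with nobuiltin set `Callee` may be inlined into a
// caller with nobuiltin set `Caller`.
bool canInline(const NoBuiltinSet &Caller, const NoBuiltinSet &Callee,
               bool AllowCallerSuperset = true) {
  if (!AllowCallerSuperset)
    return Caller == Callee; // assumed behavior: sets must match exactly
  // Callee is a subset of Caller iff adding Callee's bits changes nothing.
  return (Caller | Callee) == Caller;
}

// Mirrors the caller/callee pairs in inline-no-builtin-compatible.ll.
void checkTestFileCases() {
  const NoBuiltinSet AllBuiltins;                            // no restrictions
  const NoBuiltinSet NoMemcpy = NoBuiltinSet().set(Memcpy);  // "no-builtin-memcpy"
  const NoBuiltinSet NoMemset = NoBuiltinSet().set(Memset);  // "no-builtin-memset"
  const NoBuiltinSet NoBuiltins = NoBuiltinSet().set();      // "no-builtins"

  // Default mode: caller may have a superset of the callee's nobuiltins.
  assert(canInline(NoMemcpy, AllBuiltins));    // @nobuiltinmemcpy
  assert(canInline(NoBuiltins, AllBuiltins));  // @nobuiltins
  assert(canInline(NoBuiltins, NoMemcpy));     // @nobuiltins2
  assert(!canInline(AllBuiltins, NoMemcpy));   // @allbuiltins2
  assert(!canInline(AllBuiltins, NoBuiltins)); // @allbuiltins3
  assert(!canInline(NoMemset, NoMemcpy));      // @nobuiltinmemset

  // NOSUPERSET mode: a strict superset in the caller is no longer enough.
  assert(!canInline(NoMemcpy, AllBuiltins, false));
  assert(!canInline(NoBuiltins, NoMemcpy, false));
}
```

The check is deliberately one-directional: folding a callee that permits builtin treatment into a more restrictive caller only removes optimization opportunities, whereas the reverse direction could let the optimizer treat calls in the inlined body as builtins that the callee had opted out of.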


Diff 247165
