This is an archive of the discontinued LLVM Phabricator instance.

Differential D16381

Infrastructure to allow use of PGO in inliner
ClosedPublic

Authored by eraman on Jan 20 2016, 4:55 PM.

Details

Reviewers

chandlerc
davidxl
zzheng
hfinkel

Commits

rG3035719c8681: Infrastructure for PGO enhancements in inliner
rL262636: Infrastructure for PGO enhancements in inliner

Summary

This patch provides the following infrastructure to enable the use of PGO in inliner:

Enable the use of block level profile information in inliner
Incremental update of block frequency information during inlining
Update the function entry counts of callees when they get inlined into callers.

Diff Detail

Event Timeline

eraman updated this revision to Diff 45464.Jan 20 2016, 4:55 PM

eraman retitled this revision from to Adjust inlining thresholds based on callsite hotness.

eraman updated this object.

eraman updated this object.Jan 20 2016, 4:57 PM

eraman added reviewers: hfinkel, chandlerc.

eraman added subscribers: llvm-commits, davidxl.

junbuml added a subscriber: junbuml.Jan 21 2016, 8:47 AM

The subject of the patch is misleading -- it should really be 'Make block/callsite level profile data available for inliner'

zzheng added a reviewer: zzheng.Jan 21 2016, 12:57 PM

A few suggestions to make the code easier to read. This patch of course needs a PGO expert to review it.

include/llvm/Transforms/IPO/InlinerPass.h
112	Where are these two being used?
lib/Analysis/InlineCost.cpp
622	Unused variable: CallerEntryCount
627	Caller->getEntryCount() is called twice; this looks simpler: Optional<uint64_t> CallerEntryCount = Caller->getEntryCount(); if (CallerEntryCount.hasValue()) { uint64_t CallSiteCount = CallerEntryCount.getValue() * CallSiteFreq / CallerEntryFreq; }
1577	Empty comment line, I think it can be removed.
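
The simplification suggested for line 627 amounts to scaling the caller's entry count by the ratio of the callsite's block frequency to the caller's entry frequency. A minimal standalone sketch, assuming plain uint64_t values stand in for LLVM's block frequencies and std::optional stands in for llvm::Optional; estimateCallSiteCount is a hypothetical name, not a function from the patch:

```cpp
#include <cstdint>
#include <optional>

// Hypothetical helper (not from the patch): estimates how often a callsite
// executes, given the caller's profiled entry count and two block frequencies.
// Returns std::nullopt when the caller has no entry count (no profile data)
// or when the entry frequency is zero, since the ratio is undefined then.
std::optional<uint64_t> estimateCallSiteCount(
    std::optional<uint64_t> CallerEntryCount, uint64_t CallSiteFreq,
    uint64_t CallerEntryFreq) {
  if (!CallerEntryCount || CallerEntryFreq == 0)
    return std::nullopt;
  // Multiply first so that integer division does not discard precision.
  // This sketch does not guard against overflow of the product.
  return *CallerEntryCount * CallSiteFreq / CallerEntryFreq;
}
```

For instance, a caller entered 1000 times whose callsite block runs at half the entry frequency yields an estimate of 500.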

Address comments by zzheng

Thanks for the comments.

include/llvm/Transforms/IPO/InlinerPass.h
106	Removed them. I had initially used them and later replaced them by getXXXFunctor methods but forgot to remove those fields.

junbuml added inline comments.Jan 22 2016, 7:43 AM

lib/Analysis/InlineCost.cpp
1465	I cannot see any use of this function. Are you planning to hook this function into the heuristic in this patch?
1583	You can call LoopInfo LI(DT) instead of LoopInfo LI; LI.analyze(DT);
1586	BranchProbabilityInfo BPI(*F, LI); will call calculate(F, LI);
1589	You can also call BlockFrequencyInfo(F, BPI, LI), which calls calculate(F, BPI, LI); inside.

mcrosier added a subscriber: mcrosier.Jan 22 2016, 10:26 AM

davidxl added a reviewer: davidxl.Jan 22 2016, 10:27 AM

davidxl added inline comments.Jan 22 2016, 4:28 PM

include/llvm/Transforms/IPO/InlinerPass.h
105	Can they be made pure virtual -- probably not if the derived class is not required to override it.
include/llvm/Transforms/Utils/Cloning.h
51	Is there a common header to put this decl in?
lib/Analysis/InlineCost.cpp
75	Remove this -- it can be folded in the threshold adjusting method tuned in the future.
81	I suggest removing these two parameters in this patch, as they are irrelevant for now and confusing (and subject to change in the very near future when summary data is available to be used here). The objective of this patch is to provide infrastructure hooks to feed profile data to the inliner, not to tune or change the current inline behavior/decision -- as that needs more design and experiment. To accomplish that goal, I suggest simply adding a dummy wrapper function to do this: int getAdjustedThreshold(int current_threshold, uint64_t callsite_count __attribute__((unused))) { // just return the input threshold for now return current_threshold; } In the future, I expect the threshold to differ based on callsite_count and summary cutoffs.
623	Define a helper method to get callsite profile count.
629	Call the proposed getAdjustedThreshold(Threshold, CallSiteCount);
1465	Remove unused function.
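
The placeholder hook proposed in the comment on line 81 could look roughly like the following. This is only a sketch of the suggestion, not the code that was committed: it uses the standard C++17 [[maybe_unused]] attribute in place of the compiler-specific attribute from the comment, and LLVM-style parameter names.

```cpp
#include <cstdint>

// Placeholder hook, per the review suggestion: the callsite count is accepted
// but deliberately ignored, so inlining decisions do not change yet. Future
// tuning would replace the body with logic based on CallSiteCount.
int getAdjustedThreshold(int CurrentThreshold,
                         [[maybe_unused]] uint64_t CallSiteCount) {
  // Just return the input threshold for now.
  return CurrentThreshold;
}
```

Keeping the hook as an identity function lets the profile plumbing land and be tested without altering existing inliner behavior.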

When the invalidation happens, the updated entry count information gets discarded it seems. This is wrong. The data should be propagated back to the IR. The information can be useful later for decisions such as function layout and optimize for size etc.

Some test case is also needed to test that the entry counts can be properly updated during inlining.

lib/Transforms/IPO/InlineSimple.cpp
107	\p OrigBB \p NewBB
122	Freq --> OrigBBFreq with comment it is OrigBB's frequency in callee
127	\p Callee ... after it got inlined at callsite in block \p CallBB
142	computing callsite count belongs to a common helper function -- which can be used elsewhere too.
145	It is an estimate because of the limitations of the frequency propagation algorithm (unable to handle zero BP etc). It is helpful to add a debug message here to indicate this condition.

Address review comments and add a test case to verify that function entry counts are correctly updated.

In D16381#334500, @davidxl wrote:

When the invalidation happens, the updated entry count information gets discarded it seems. This is wrong. The data should be propagated back to the IR. The information can be useful later for decisions such as function layout and optimize for size etc.

UpdateEntryCount in InlineSimple.cpp calls setEntryCount and hence the counts are not lost.

Some test case is also needed to test that the entry counts can be properly updated during inlining.

Added a test case.

lib/Transforms/IPO/InlineSimple.cpp
142	I have placed this helper in InlineCost.cpp.

eraman added inline comments.Jan 26 2016, 11:53 AM

include/llvm/Transforms/IPO/InlinerPass.h
105	AlwaysInliner then has to implement this method to return nullptr. I don't see any advantage in making it pure virtual.
include/llvm/Transforms/Utils/Cloning.h
51	I could leave it in Cloning.h and include it in InlineSimple.cpp and get rid of it in InlinerPass.h, but this adds an unnecessary dependence to InlineSimple.cpp. There is no other header file common to the inliner proper and CloneFunction.cpp where it is appropriate to add this.
lib/Analysis/InlineCost.cpp
1465	Yes, I was planning to use this as part of tuning, but I've now removed it from this patch.

eraman retitled this revision from Adjust inlining thresholds based on callsite hotness to Infrastructure to allow use of PGO in inliner.Jan 29 2016, 11:44 AM

eraman updated this object.

We need to add more test cases. For instance

two callsites to one callee all get inlined -- the callee should have zero count left
scaling testing -- especially interesting in the context of top-down inlining (in the future we may have more heuristics to enable more top-down inlining; there are existing heuristics that can be used to trigger the case):

a ---> (200) ---> c
b ---> (300) ---> c
where c has entry count of 500

c-->(200) --> e
d -->(300) --> e
where e has entry count of 500.

Suppose there are two inlines in the following order: a->c, and a->e

After inlining, test that e's entry count is properly updated.
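
To make the scaling scenario concrete, here is a small model of the bookkeeping. It assumes a simple proportional rule that the review does not spell out: inlining a callsite with count N subtracts N from the callee's entry count, and a callsite inside the callee body is cloned into the caller with its count scaled by N / (callee entry count). The type and function names are hypothetical.

```cpp
#include <cstdint>

// Result of inlining one callsite under the assumed proportional rule.
struct InlineResult {
  uint64_t CalleeEntryCountAfter; // callee's entry count once the callsite is gone
  uint64_t ClonedCallSiteCount;   // count of an inner callsite after cloning
};

// Hypothetical model: CallSiteCount is the count of the callsite being
// inlined, CalleeEntryCount the callee's entry count before inlining, and
// InnerCallSiteCount the count of one callsite inside the callee body.
InlineResult inlineCallSite(uint64_t CallSiteCount, uint64_t CalleeEntryCount,
                            uint64_t InnerCallSiteCount) {
  uint64_t Remaining =
      CalleeEntryCount > CallSiteCount ? CalleeEntryCount - CallSiteCount : 0;
  uint64_t Cloned =
      CalleeEntryCount == 0
          ? 0
          : InnerCallSiteCount * CallSiteCount / CalleeEntryCount;
  return {Remaining, Cloned};
}
```

Under this rule, inlining a->c (count 200, c entered 500 times, c->e count 200) leaves c with an entry count of 300 and gives the cloned a->e callsite a count of 200 * 200 / 500 = 80. Inlining that cloned callsite then leaves e with 500 - 80 = 420. The first scenario follows the same way: inlining callsites of count 200 and then 300 into a callee with entry count 500 leaves it at zero.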

include/llvm/Analysis/InlineCost.h
44	suggested comment change: ... on demand, cached (for callees), and incrementally updated (for callers). The caller BFI is invalidated after inlining is done for (the caller).
53	Typo -- frequency
lib/Analysis/InlineCost.cpp
620	This guard is wrong :1) it should check caller entry count, not callee entry count. 2) hasPGOCounts also depends on other conditions. In fact, this condition can be removed.
1564	Add a comment for the function.
1577	Irrelevant comment line?

Address David's review comments.

I've added two more test cases.

include/llvm/Analysis/InlineCost.h
44	This class itself is agnostic about callers and callees. All it does is cache the BFI and invalidate them when required - and that's what the comment says.
lib/Analysis/InlineCost.cpp
620	Yes, this check is not needed and I've removed it.
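
The behavior described in the reply on line 44 (compute on demand, cache, invalidate when asked) can be sketched without any LLVM types. In this illustration FreqInfo stands in for BlockFrequencyInfo, functions are keyed by name, and the class and member names are hypothetical. Unlike the patch, which stores raw pointers in a DenseMap, the sketch uses std::unique_ptr so no explicit destructor is needed.

```cpp
#include <functional>
#include <memory>
#include <string>
#include <unordered_map>
#include <utility>

// Stand-in for BlockFrequencyInfo; records which computation produced it.
struct FreqInfo {
  int ComputeId;
};

// Caches one FreqInfo per function name. It knows nothing about callers or
// callees: it only computes on first request and forgets on invalidation.
class FreqInfoCache {
  std::unordered_map<std::string, std::unique_ptr<FreqInfo>> Cache;
  std::function<FreqInfo(const std::string &)> Compute;

public:
  explicit FreqInfoCache(std::function<FreqInfo(const std::string &)> C)
      : Compute(std::move(C)) {}

  // Returns the cached info, computing it first if it is missing.
  FreqInfo &get(const std::string &Fn) {
    std::unique_ptr<FreqInfo> &Slot = Cache[Fn];
    if (!Slot)
      Slot = std::make_unique<FreqInfo>(Compute(Fn));
    return *Slot;
  }

  // Drops the cached info so the next get() recomputes it.
  void invalidate(const std::string &Fn) { Cache.erase(Fn); }
};
```

Two consecutive get() calls for the same function run the computation once; calling invalidate() in between forces a second computation.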

Chandler and Hal, could you please give your feedback on this patch? Even if you can not give detailed comments right now, any high level comments will be very useful and I can iterate on this patch based on them.

Nice test case.

Dehao mentioned there is a problem with BB splitting when a basic block has two callsites -- after the first call gets inlined, the newly split block with the second callsite does not have updated profile data. Please handle that (with a proper test case).

junbuml added inline comments.Feb 2 2016, 9:47 AM

lib/Analysis/InlineCost.cpp
618–619	Call and CallBB seem to be used only in llvm::getBlockCount(CallBB, BFA);
1574	Can we guarantee FunctionEntryFreq is always non-zero if EntryCount is non-zero?
lib/Transforms/IPO/InlineSimple.cpp
124	Should we have (double) here?
125	Why not check that CalleeEntryFreq * CallSiteFreq is non-zero?
161	No need to have braces.

In D16381#341243, @davidxl wrote:

Nice test case.

Dehao mentioned there is a problem with BB splitting when a basic block has two callsites -- after the first call gets inlined, the newly split block with the second callsite does not have updated profile data. Please handle that (with a proper test case).

I am working on the fix. I am also going to refactor the code a bit to do all updates in the base Inliner class rather than SimpleInliner. My original reasoning was that we don't have to do the updates for AlwaysInliner, but that is not true. With PGO, if the always-inline pass inlines a hot callsite, the callee will not be updated with the right entry count. Besides, using callbacks between the base and derived class looks ugly. I am going to move all the updates to the base class (there will still be a callback passed to the cloning methods) and guard the updates with a check for PGO. Later, we can refine this to support updates for non-PGO cases as well.

This fixes a bug (BB containing call's successor didn't have the right BFI), addresses reviewer comments and has been rebased to a recent head.

lib/Analysis/InlineCost.cpp
1574	FunctionEntryFreq is always non-zero and that doesn't depend on entry count.
lib/Transforms/IPO/InlineSimple.cpp
124	Not sure why that is needed.
125	It is dividing by CalleeEntryFreq and multiplying by CallSiteFreq, so this is a problem only when CalleeEntryFreq is 0. As I wrote in a previous comment, that is not possible.
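
The update discussed at lines 124 and 125 rescales a callee block's frequency into the caller's scale: divide by the callee's entry frequency and multiply by the callsite's frequency. A minimal integer-only sketch, assuming raw uint64_t frequencies; scaleClonedBlockFreq is a hypothetical name, and the zero check is purely defensive since the author states that CalleeEntryFreq cannot be zero:

```cpp
#include <cstdint>
#include <optional>

// Hypothetical sketch: frequency assigned to a block cloned into the caller.
// OrigBBFreq is the block's frequency in the callee, CallSiteFreq the
// frequency of the block containing the call, CalleeEntryFreq the callee's
// entry block frequency.
std::optional<uint64_t> scaleClonedBlockFreq(uint64_t OrigBBFreq,
                                             uint64_t CallSiteFreq,
                                             uint64_t CalleeEntryFreq) {
  if (CalleeEntryFreq == 0)
    return std::nullopt; // defensive only; stated to be impossible in review
  // Multiplying before dividing keeps precision that (OrigBBFreq /
  // CalleeEntryFreq) * CallSiteFreq would lose. Overflow is not handled here.
  return OrigBBFreq * CallSiteFreq / CalleeEntryFreq;
}
```

With OrigBBFreq = 3, CallSiteFreq = 10 and CalleeEntryFreq = 4, multiplying first gives 7, whereas dividing first would truncate to 0.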

davidxl added inline comments.Feb 2 2016, 4:35 PM

include/llvm/Transforms/IPO/InlinerPass.h
80	--> hasProfileData
lib/Transforms/IPO/Inliner.cpp
432	hasProfileData
578	add a comment here.
test/Transforms/Inline/function-count-update.ll
4	into two callsites in the same block.
20	Can you make these two calls in a block that is not the entry block?

Please also add a test case that covers proper profile update with always inline as intended for the refactoring.

davidxl added inline comments.Feb 2 2016, 4:43 PM

include/llvm/Analysis/InlineCost.h
152	I suggest making this a member method of BFA. Also provide a wrapper method -- this will be the primary API used by other clients for callsite hotness: Optional<uint64_t> getCallsiteCount(CallSite CS);

In D16381#342401, @davidxl wrote:

Please also add a test case that covers proper profile update with always inline as intended for the refactoring.

Done. I have added the alwaysinline attribute to the callee in the first test case and have run it with -always-inline alone.

include/llvm/Analysis/InlineCost.h
152	I agree that this needs to be moved somewhere else, but I am not convinced BFA is that place. In the other profile refactoring patch, I have created a ProfileCommon.h. Perhaps that is a good place for profile related utility methods like this?
test/Transforms/Inline/function-count-update.ll
20	That does not matter for the purposes of this test case.

Address David's comments.

davidxl added inline comments.Feb 3 2016, 4:42 PM

include/llvm/Analysis/InlineCost.h
152	ok -- ProfileCommon.h will be a better place in the future. For now let's keep it here -- but I think we should have a top level wrapper to get count for callsite here.
test/Transforms/Inline/function-count-update.ll
6	In a single callee or in a single BB? The test intends to verify that the profile update of the second inlined callsite is correct after block split -- so having the callee identical at the two sites is not essential to the test.
22	The purpose is to verify that copyBlockFrequency after bb split works in general not just the entry block.

eraman marked an inline comment as done.Feb 4 2016, 1:25 PM

eraman added inline comments.

include/llvm/Analysis/InlineCost.h
152	Do you want the wrapper in InlineCost.cpp or in ProfileCommon.h (later)? As of now, there are two calls to this and one of them cannot pass CS (this is in Inliner.cpp during BFI updates) since the call instruction in CS has been removed. I agree that getProfileCount(CS) is a useful API, but I think that should be somewhere like ProfileCommon.
test/Transforms/Inline/function-count-update.ll
6	This is a typo - I meant to write "in the same caller". I have fixed the comments. Yes, for the purpose of this test, we don't need the callee to be the same. I have made the caller call 2 different functions. I also don't think making the calls in a block other than the entry makes any difference, but since it doesn't hurt, I've made that change as well.

Tweaks to a test case based on David's feedback.

Ping.

This is a very important enhancement that many people and tuning work are waiting for. It is IMO very clean, solid and non-intrusive. Before we move on, I'd like to see more testing on large apps for sanity.

Prashanth added a subscriber: Prashanth.Feb 9 2016, 9:39 PM

Ping. I would like to move forward on this to enable PGO improvements in inliner and function layout (separating cold functions becomes effective if we have their entry counts after inlining). Specifically, I think the code that computes BFA is isolated enough that ripping it out when inliner is ported to the new pass manager is not very intrusive.

I did more testing as asked by David. Specifically,

Turned on the updates by default (instead of only in PGO) and ran the LLVM test suite.
Turned on the updates by default and built clang.
Built the Google internal compiler benchmark suite in PGO mode with this patch enabled.

These tests above didn't reveal any issues.

Thanks,
Easwaran

twoh added a subscriber: twoh.Feb 25 2016, 9:00 PM

Thanks for collecting the data!

I have a couple of more comments below:

lib/Analysis/InlineCost.cpp
618	This call has the side effect of force computing BFI even when it is disabled (with option). I think CallAnalyzer needs to have a wrapper to this function and check the settings before this is called.
lib/Transforms/IPO/Inliner.cpp
427	Needs to be guarded with EnableProfile?
441	I think we should introduce a more general flag here, EnableProfile. This flag's value is determined by an internal option: --enable-profile-in-inliner=default|yes|no. Yes and no can be used to force it on or off, while default leaves the decision to the compiler: for now, if hasProfile is true, turn it on; otherwise turn it off.
582	Should this be guarded with EnableProfile too?
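
The tri-state behavior proposed for line 441 can be written down as a small resolution function. This is only an illustration of the proposal, which the author declined to implement in this patch; the enum and function names are hypothetical, and the sketch does not use LLVM's command-line library.

```cpp
// Hypothetical model of --enable-profile-in-inliner=default|yes|no.
enum class ProfileMode { Default, Yes, No };

// Resolves the option against whether profile data is present: yes and no
// force the feature on or off, default follows the presence of profile data.
bool shouldUseProfile(ProfileMode Mode, bool HasProfile) {
  switch (Mode) {
  case ProfileMode::Yes:
    return true;
  case ProfileMode::No:
    return false;
  case ProfileMode::Default:
    return HasProfile;
  }
  return HasProfile; // unreachable for valid enum values
}
```

The EnableProfile guards requested for lines 427 and 582 would then test the resolved value rather than the raw profile check.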

eraman marked an inline comment as done.Mar 2 2016, 11:44 AM

eraman added inline comments.

lib/Analysis/InlineCost.cpp
618	I have guarded the computation in getBlockCount by a check to see if BFA is nullptr. Also made sure getInlineCost gets a non-null BFA only when HasProfileData is true.
lib/Transforms/IPO/Inliner.cpp
441	I'd prefer not to add a new flag for that. I think the current solution of checking if profile is turned on is sufficient for now.
582	The callee copyBlockFrequency has the guard, so it is not necessary at the callsite.

eraman updated this revision to Diff 49656.Mar 2 2016, 11:45 AM

LGTM.

This looks really great, so let's move on with this long-awaited missing feature.

watch for sanitizer build results
there are more tuning opportunities that can be done in the future -- e.g, if all incoming callgraph edges of a node have been analyzed and marked as non-inlinable, the node's prof data can be evicted from the cache. Maybe useful for LTO.

This revision is now accepted and ready to land.Mar 3 2016, 9:32 AM

Closed by commit rL262636: Infrastructure for PGO enhancements in inliner (authored by eraman). Mar 3 2016, 10:31 AM

This revision was automatically updated to reflect the committed changes.

In D16381#367348, @davidxl wrote:

LGTM.

This looks really great, so let's move on with this long-awaited missing feature.

David, this is not an area of LLVM you have done substantial work on, and this is a very significant feature.

I'm sorry that I have not had time to review this yet, but the correct response is not for you to mark a patch as LGTM. Please revert this and let's actually get it reviewed before it goes into the project.

First and foremost, sorry about snapping earlier. I shouldn't have done that, I was frustrated and not communicating very effectively. Thanks to Sean and Hal and others who wrote constructive and helpful emails to get this back on the rails. Secondly, sorry that I've neglected this patch for so long. I kept prioritizing working on the actual pass manager stuff over it, and I should have at least written this up.

So I do have some serious concerns with the approach. I think it is probably a nice way to prototype things and understand their impact (which is very important!) but there are several aspects of the design that seem to be the wrong approach. I've detailed them below.

I've said a few times before, but I continue to think that while some of the issues I've raised below could be addressed, that may not be the best use of time. I think it would probably be more effective to work on helping with the pass manager effort in order to make that effort finish more rapidly. There is a large amount of work that is essentially unblocked right now to help port function passes over. All of this is very easily parallelized.

Once that is done, I would work on designing a good interprocedural profile analysis interface with a good update API, and then teach the inliner to use that. I think that the process of porting the inliner to the new pass manager will end up being a fairly substantial change (essentially a re-write) which will make integrating this much cleaner and easier.

I'm somewhat worried that in its current form, this change would actually make it *harder* to port the inliner and more work to introduce the correct long-term design.

Below are detailed comments about the design issues, but also quite a few comments about serious API, code, and implementation issues with the patch as it currently stands. I think it still needs substantial work to go into the tree, even if the high level design concern above is addressed.

llvm/trunk/include/llvm/Analysis/InlineCost.h
42–55 ↗	(On Diff #49755)	This is pretty much re-implementing a tiny part of the new pass manager inside the inline cost analysis. That really feels like the wrong approach to me. Notably, this is even named exactly the same as we would want to name a port of BlockFrequencyInfo to the new pass manager. I don't agree with this approach. Fundamentally, this is not the right long-term design for how the inliner should access profile information, and it actually obstructs getting the right design in place.
151–152 ↗	(On Diff #49755)	This seems like a surprising API to expose at the top level from the inline cost analysis.
llvm/trunk/include/llvm/Transforms/IPO/InlinerPass.h
30–37 ↗	(On Diff #49755)	These functor typedefs are a bit confusing. First off, std::function is very slow. Is this overhead going to cause a compile time issue? The second two of these typedefs aren't used, please don't leave dead code in the patch. Lastly, having them at the global scope seems like a bad design. It seems like they should be part of some API instead of just being global?
82 ↗	(On Diff #49755)	I would keep vertical space between declarations. It makes the comments much easier to read.
102 ↗	(On Diff #49755)	I would say "Indicates whether we are using profile guided optimization." in the comment. However, I'm surprised we need both a bool and a unique_ptr -- wouldn't the nullness of the pointer indicate the same thing as the bool?
llvm/trunk/include/llvm/Transforms/Utils/Cloning.h
51–52 ↗	(On Diff #49755)	This is in essence an ODR violation -- two headers are defining the same typedef.
164 ↗	(On Diff #49755)	Initializing a functor from nullptr is pretty surprising. I would just use `BlockCloningFunctor Ftor = BlockCloningFunctor()` instead. However, I think this is an indication of a larger design issue. I think the fact that you need to thread this cloning functor between very disparate layers of the code isn't good. I feel like the cloning layer should really be taught to directly support updating profile information when cloning. This of course may not be terribly reasonable to implement currently, as you have the inliner essentially maintaining a mini form of the pass manager internally and need to thread some way to update it through these layers. If that's the problem, I think it really is indicating that this needs to wait until we have the pass manager fixed so that we can layer this logic more cleanly.
llvm/trunk/lib/Analysis/InlineCost.cpp
585–588 ↗	(On Diff #49755)	It's a bit odd that this isn't a doxygen comment. Also that this is marked as "unused". What's the plan here?
618–620 ↗	(On Diff #49755)	Separating this out into two functions doesn't make a lot of sense here. You now have two parts of the code handling profile based inlining adjustments. I think you should keep all of the profile based adjustments inline here, and integrate it with the function count logic above.
1570–1576 ↗	(On Diff #49755)	So, inside this routine, you're actually implementing a proper interprocedural profile analysis. I really think this needs to be extracted to provide a high level analysis API around interprocedurally weighted profile information.
1591–1595 ↗	(On Diff #49755)	While it happens that BPI doesn't keep any references or pointers back into the dominator tree or loop info, the APIs of these analyses actually would make that OK to do... Which makes this code a rather subtle mini implementation of the core logic of the pass manager but inside the inline cost analysis. That worries me some. At the very least it would need a lot of documentation to make it clear exactly what was going on and why this was a safe thing to do.
llvm/trunk/lib/Transforms/IPO/Inliner.cpp
376–380 ↗	(On Diff #49755)	You're dividing by CalleeEntryFreq because the original value was scaled up by that. This essentially strips some precision off. It would seem better to instead directly access the unscaled block frequency for the old BB, and then scale it correctly for the post-inlined state.
383–384 ↗	(On Diff #49755)	We have worked very hard throughout the PGO analysis infrastructure to not rely on actual hardware floating point. We have a broad collection of carefully written math utilities for this. One part of factoring this logic into a real IPO profile update routine would be to use the same facilities and infrastructure that the existing profile (and profile update) analyses use for this purpose.
570 ↗	(On Diff #49755)	Please use lambdas instead of std::bind -- they make the code much more readable as it is easier to understand how they behave. It also avoids the pain of using placeholder values.
759 ↗	(On Diff #49755)	I'm really concerned by the need to manually invalidate BFI in so many places. I think this is going to prove to be a very error prone pattern to maintain long term.
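
Two of the code-level suggestions above (a default-constructed functor instead of nullptr at Cloning.h line 164, and a lambda instead of std::bind at Inliner.cpp line 570) can be illustrated in isolation. The patch's actual BlockCloningFunctor signature is not shown on this page, so the (int, int) signature, the integer block ids, and the names CloneRecorder, cloneBlocks and makeRecorder are all assumptions made for the sketch.

```cpp
#include <functional>
#include <utility>
#include <vector>

// Assumed stand-in for the patch's typedef; blocks are modeled as int ids.
using BlockCloningFunctor = std::function<void(int OrigBB, int NewBB)>;

// Records every (original, clone) pair it is told about.
struct CloneRecorder {
  std::vector<std::pair<int, int>> Pairs;
  void record(int OrigBB, int NewBB) { Pairs.emplace_back(OrigBB, NewBB); }
};

// Simulated cloning routine. The default argument is an empty functor rather
// than nullptr; an empty functor means "no update requested".
void cloneBlocks(const std::vector<int> &Blocks,
                 BlockCloningFunctor Ftor = BlockCloningFunctor()) {
  for (int BB : Blocks)
    if (Ftor)
      Ftor(BB, BB + 100); // pretend the clone's id is BB + 100
}

// Lambda form of the callback. The std::bind equivalent would be
// std::bind(&CloneRecorder::record, &R, std::placeholders::_1,
//           std::placeholders::_2), which hides the parameter flow.
BlockCloningFunctor makeRecorder(CloneRecorder &R) {
  return [&R](int OrigBB, int NewBB) { R.record(OrigBB, NewBB); };
}
```

Calling cloneBlocks with makeRecorder(R) records one pair per block, while calling it with no functor performs the cloning loop without any callback.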

Thanks for the detailed comments Chandler. The comments related to problems around API, code and implementation are very helpful and I'll address them.

Your main concern is that this patch tries to provide the functionality of pass manager inside inline cost analysis. The intention is indeed to provide a restricted subset of pass manager's functionality for the inliner to use, but with the understanding that this is a temporary fix for inliner that should be easily replaceable when this functionality is available in PM. If this doesn't meet those goals (restricted to inliner and takes minimal effort to replace it by the new PM), I'll happily iterate on this patch to address that concern. I have responded inline to the comments related to this aspect.

Thanks,
Easwaran

llvm/trunk/include/llvm/Analysis/InlineCost.h
42–55 ↗	(On Diff #49755)	This is not intended to be a long term solution. The idea is to provide an interface similar to what the new pass manager provides so that when it lands, this code can be ripped out. I can rename it and/or move this code to a separate file if that will make the migration smoother.
151–152 ↗	(On Diff #49755)	I'll move it to ProfileCommon.h. I left it here because I don't want anything outside of the inliner to use BlockFrequencyAnalysis.
llvm/trunk/include/llvm/Transforms/Utils/Cloning.h
164 ↗	(On Diff #49755)	I think clients that invoke the cloning code might want to update different analyses or choose to not update at all. IMO, sinking all that into the cloning code doesn't seem a good idea.
llvm/trunk/lib/Analysis/InlineCost.cpp
1570–1576 ↗	(On Diff #49755)	Ok.
1591–1595 ↗	(On Diff #49755)	To reiterate, the intention is to temporarily provide the functionality that will be provided by the PM in a way that will make the transition easier. Agreed that this is not documented at all. Will fix that.
llvm/trunk/lib/Transforms/IPO/Inliner.cpp
376–380 ↗	(On Diff #49755)	The unscaled block frequency is meant to be used only for the initial BFI computation, right? Moreover, as we incrementally update the block frequency using the setBlockFrequency interface of BFI, there may be blocks that do not have an unscaled value associated with them.
759 ↗	(On Diff #49755)	The alternative is to conservatively invalidate BFI at the beginning of the inliner loop. That is fine with me if the increase in compilation time is ok.

Revision Contents

Path	Size
include/llvm/Analysis/InlineCost.h	25 lines
include/llvm/Transforms/IPO/InlinerPass.h	29 lines
include/llvm/Transforms/Utils/Cloning.h	18 lines
lib/Analysis/InlineCost.cpp	97 lines
lib/Transforms/IPO/InlineSimple.cpp	2 lines
lib/Transforms/IPO/Inliner.cpp	110 lines
lib/Transforms/Utils/CloneFunction.cpp	42 lines
lib/Transforms/Utils/InlineFunction.cpp	4 lines
test/Transforms/Inline/function-count-update-2.ll	27 lines
test/Transforms/Inline/function-count-update-3.ll	69 lines
test/Transforms/Inline/function-count-update.ll	30 lines

Diff 46720

include/llvm/Analysis/InlineCost.h

	Show All 14 Lines
	#define LLVM_ANALYSIS_INLINECOST_H			#define LLVM_ANALYSIS_INLINECOST_H

	#include "llvm/Analysis/CallGraphSCCPass.h"			#include "llvm/Analysis/CallGraphSCCPass.h"
	#include <cassert>			#include <cassert>
	#include <climits>			#include <climits>

	namespace llvm {			namespace llvm {
	class AssumptionCacheTracker;			class AssumptionCacheTracker;
				class BlockFrequencyInfo;
	class CallSite;			class CallSite;
	class DataLayout;			class DataLayout;
	class Function;			class Function;
	class TargetTransformInfo;			class TargetTransformInfo;

	namespace InlineConstants {			namespace InlineConstants {
	// Various magic constants used to adjust heuristics.			// Various magic constants used to adjust heuristics.
	const int InstrCost = 5;			const int InstrCost = 5;
	const int IndirectCallThreshold = 100;			const int IndirectCallThreshold = 100;
	const int CallPenalty = 25;			const int CallPenalty = 25;
	const int LastCallToStaticBonus = -15000;			const int LastCallToStaticBonus = -15000;
	const int ColdccPenalty = 2000;			const int ColdccPenalty = 2000;
	const int NoreturnPenalty = 10000;			const int NoreturnPenalty = 10000;
	/// Do not inline functions which allocate this many bytes on the stack			/// Do not inline functions which allocate this many bytes on the stack
	/// when the caller is recursive.			/// when the caller is recursive.
	const unsigned TotalAllocaSizeRecursiveCaller = 1024;			const unsigned TotalAllocaSizeRecursiveCaller = 1024;
	}			}

				/// \brief Block frequency analysis for multiple functions.
				/// This class mimics block frequency analysis on CGSCC level. Block frequency
				/// info is computed on demand and cached unless they are invalidated.
				davidxlUnsubmitted Not Done Reply Inline Actions suggested comment change: ... on demand, cached (for callees), and incrementally updated (for callers). The caller BFI is invalidated after inlining is done for (the caller). davidxl: suggested comment change: ... on demand, cached (for callees), and incrementally updated (for…
				eramanAuthorUnsubmitted Not Done Reply Inline Actions This class itself is agnostic about callers and callees. All it does is cache the BFI and invalidate them when required - and that's what the comment says. eraman: This class itself is agnostic about callers and callees. All it does is cache the BFI and…
				class BlockFrequencyAnalysis {
				private:
				DenseMap<Function , BlockFrequencyInfo > BFM;

				public:
				~BlockFrequencyAnalysis();
				/// \brief Returns BlockFrequencyInfo for a function.
				BlockFrequencyInfo getBlockFrequencyInfo(Function );
				/// \brief Invalidates block frequency info for a function.
				davidxlUnsubmitted Done Reply Inline Actions Typo -- frequency davidxl: Typo -- frequency
				void invalidateBlockFrequencyInfo(Function *);
				};

	/// \brief Represents the cost of inlining a function.			/// \brief Represents the cost of inlining a function.
	///			///
	/// This supports special values for functions which should "always" or			/// This supports special values for functions which should "always" or
	/// "never" be inlined. Otherwise, the cost represents a unitless amount;			/// "never" be inlined. Otherwise, the cost represents a unitless amount;
	/// smaller values increase the likelihood of the function being inlined.			/// smaller values increase the likelihood of the function being inlined.
	///			///
	/// Objects of this type also provide the adjusted threshold for inlining			/// Objects of this type also provide the adjusted threshold for inlining
	/// based on the information available for a particular callsite. They can be			/// based on the information available for a particular callsite. They can be
	▲ Show 20 Lines • Show All 57 Lines • ▼ Show 20 Lines
	/// new threshold are computed with any accuracy. The new threshold can be			/// new threshold are computed with any accuracy. The new threshold can be
	/// used to bound the computation necessary to determine whether the cost is			/// used to bound the computation necessary to determine whether the cost is
	/// sufficiently low to warrant inlining.			/// sufficiently low to warrant inlining.
	///			///
	/// Also note that calling this function dynamically computes the cost of			/// Also note that calling this function dynamically computes the cost of
	/// inlining the callsite. It is an expensive, heavyweight call.			/// inlining the callsite. It is an expensive, heavyweight call.
	InlineCost getInlineCost(CallSite CS, int DefaultThreshold,			InlineCost getInlineCost(CallSite CS, int DefaultThreshold,
	TargetTransformInfo &CalleeTTI,			TargetTransformInfo &CalleeTTI,
	AssumptionCacheTracker *ACT);			AssumptionCacheTracker *ACT,
				BlockFrequencyAnalysis *BFA);

	/// \brief Get an InlineCost with the callee explicitly specified.			/// \brief Get an InlineCost with the callee explicitly specified.
	/// This allows you to calculate the cost of inlining a function via a			/// This allows you to calculate the cost of inlining a function via a
	/// pointer. This behaves exactly as the version with no explicit callee			/// pointer. This behaves exactly as the version with no explicit callee
	/// parameter in all other respects.			/// parameter in all other respects.
	//			//
	InlineCost getInlineCost(CallSite CS, Function *Callee, int DefaultThreshold,			InlineCost getInlineCost(CallSite CS, Function *Callee, int DefaultThreshold,
	TargetTransformInfo &CalleeTTI,			TargetTransformInfo &CalleeTTI,
	AssumptionCacheTracker *ACT);			AssumptionCacheTracker *ACT,
				BlockFrequencyAnalysis *BFA);

	int computeThresholdFromOptLevels(unsigned OptLevel, unsigned SizeOptLevel);			int computeThresholdFromOptLevels(unsigned OptLevel, unsigned SizeOptLevel);

	/// \brief Return the default value of -inline-threshold.			/// \brief Return the default value of -inline-threshold.
	int getDefaultInlineThreshold();			int getDefaultInlineThreshold();

	/// \brief Minimal filter to detect invalid constructs for inlining.			/// \brief Minimal filter to detect invalid constructs for inlining.
	bool isInlineViable(Function &Callee);			bool isInlineViable(Function &Callee);

				/// \brief Return estimated count of the block \p BB.
				Optional<uint64_t> getBlockCount(BasicBlock *BB, BlockFrequencyAnalysis *BFA);
				davidxl: I suggest making this a member method of BFA. Also provide a wrapper method -- this will be the primary API used by other clients for callsite hotness: Optional<uint64_t> getCallsiteCount(CallSite CS);
				eraman: I agree that this needs to be moved somewhere else, but I am not convinced BFA is that place. In the other profile refactoring patch, I have created a ProfileCommon.h. Perhaps that is a good place for profile-related utility methods like this?
				davidxl: OK -- ProfileCommon.h will be a better place in the future. For now let's keep it here, but I think we should have a top-level wrapper here to get the count for a callsite.
				eraman: Do you want the wrapper in InlineCost.cpp or in ProfileCommon.h (later)? As of now, there are two calls to this, and one of them cannot pass CS (this is in Inliner.cpp during BFI updates) since the call instruction in CS has been removed. I agree that getProfileCount(CS) is a useful API, but I think that should be somewhere like ProfileCommon.
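The callsite count discussed in this thread can be illustrated without any LLVM dependencies. The following is a minimal sketch, assuming the estimate is the caller's entry count scaled by the ratio of the call block's frequency to the entry block's frequency (the formula zzheng quotes in the InlineCost.cpp comments). `FunctionProfile` and `estimateBlockCount` are hypothetical names, not part of the patch, and the sketch ignores overflow in the multiplication.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical stand-in for the data the patch reads through
// Function::getEntryCount() and the block frequency analysis.
struct FunctionProfile {
  std::optional<uint64_t> EntryCount; // empty when no profile is attached
  uint64_t EntryFreq;                 // frequency of the entry block
};

// Estimate a block's execution count by scaling the function entry count
// by BlockFreq / EntryFreq. Returns an empty optional without profile data.
std::optional<uint64_t> estimateBlockCount(const FunctionProfile &FP,
                                           uint64_t BlockFreq) {
  if (!FP.EntryCount)
    return std::nullopt;
  assert(FP.EntryFreq != 0 && "entry frequency must be nonzero");
  // Multiply before dividing to keep precision; overflow is not handled.
  return *FP.EntryCount * BlockFreq / FP.EntryFreq;
}
```

With an entry count of 1000 and an entry frequency of 8, a call block with frequency 4 is estimated at 500 executions. Without an entry count the result is empty, which mirrors the Optional<uint64_t> return type used in the patch.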
	}			}

	#endif			#endif

include/llvm/Transforms/IPO/InlinerPass.h

Show All 18 Lines

#include "llvm/Analysis/CallGraphSCCPass.h"		#include "llvm/Analysis/CallGraphSCCPass.h"

namespace llvm {		namespace llvm {
class AssumptionCacheTracker;		class AssumptionCacheTracker;
class CallSite;		class CallSite;
class DataLayout;		class DataLayout;
class InlineCost;		class InlineCost;
		class BlockFrequencyAnalysis;
template <class PtrType, unsigned SmallSize> class SmallPtrSet;		template <class PtrType, unsigned SmallSize> class SmallPtrSet;

		// Functor invoked when a block is cloned during inlining.
		typedef std::function<void(const BasicBlock *, const BasicBlock *)>
		BlockCloningFunctor;
		// Functor invoked when a function is inlined inside the basic block
		// containing the call.
		typedef std::function<void(BasicBlock *, Function *)> FunctionCloningFunctor;
		// Functor invoked when a function gets deleted during inlining.
		typedef std::function<void(Function *)> FunctionDeletedFunctor;

/// Inliner - This class contains all of the helper code which is used to		/// Inliner - This class contains all of the helper code which is used to
/// perform the inlining operations that do not depend on the policy.		/// perform the inlining operations that do not depend on the policy.
///		///
struct Inliner : public CallGraphSCCPass {		struct Inliner : public CallGraphSCCPass {
explicit Inliner(char &ID);		explicit Inliner(char &ID);
explicit Inliner(char &ID, bool InsertLifetime);		explicit Inliner(char &ID, bool InsertLifetime);

/// getAnalysisUsage - For this class, we declare that we require and preserve		/// getAnalysisUsage - For this class, we declare that we require and preserve
Show All 24 Lines	struct Inliner : public CallGraphSCCPass {
/// attribute. This is useful for the InlineAlways pass that only wants to		/// attribute. This is useful for the InlineAlways pass that only wants to
/// deal with that subset of the functions.		/// deal with that subset of the functions.
bool removeDeadFunctions(CallGraph &CG, bool AlwaysInlineOnly = false);		bool removeDeadFunctions(CallGraph &CG, bool AlwaysInlineOnly = false);

private:		private:
// InsertLifetime - Insert @llvm.lifetime intrinsics.		// InsertLifetime - Insert @llvm.lifetime intrinsics.
bool InsertLifetime;		bool InsertLifetime;

		/// Are we using profile guided optimization?
		bool PGOMode;
		davidxl: --> hasProfileData
/// shouldInline - Return true if the inliner should attempt to		/// shouldInline - Return true if the inliner should attempt to
/// inline at the given CallSite.		/// inline at the given CallSite.
bool shouldInline(CallSite CS);		bool shouldInline(CallSite CS);
		/// Set the BFI of \p Dst to be the same as \p Src.
		void copyBlockFrequency(BasicBlock *Src, BasicBlock *Dst);
		/// Invalidates BFI for function \p F.
		void invalidateBFI(Function *F);
		/// Invalidates BFI for all functions in \p SCC.
		void invalidateBFI(CallGraphSCC &SCC);
		/// Update function entry count for \p Callee which has been inlined into
		/// \p CallBB.
		void updateEntryCount(BasicBlock *CallBB, Function *Callee);
		/// \brief Update block frequency of an inlined block.
		/// This method updates the block frequency of \p NewBB which is a clone of
		/// \p OrigBB when the callsite \p CS gets inlined. The frequency of \p NewBB
		/// is computed as follows:
		/// Freq(NewBB) = Freq(OrigBB) * CallSiteFreq / CalleeEntryFreq.
		void updateBlockFreq(CallSite &CS, const BasicBlock *OrigBB,
		const BasicBlock *NewBB);

protected:		protected:
AssumptionCacheTracker *ACT;		AssumptionCacheTracker *ACT;
		std::unique_ptr<BlockFrequencyAnalysis> BFA;
};		};

		davidxl: Can they be made pure virtual? Probably not, if the derived class is not required to override them.
		eraman: AlwaysInliner then has to implement this method to return nullptr. I don't see any advantage in making it pure virtual.
} // End llvm namespace		} // End llvm namespace
		eraman: Removed them. I had initially used them and later replaced them by getXXXFunctor methods but forgot to remove those fields.

#endif		#endif
		zzheng: Where are these two being used?
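The scaling rule documented on updateBlockFreq in InlinerPass.h, Freq(NewBB) = Freq(OrigBB) * CallSiteFreq / CalleeEntryFreq, can be checked with plain integers. This is only a sketch of the arithmetic: `scaleInlinedBlockFreq` is a hypothetical helper, raw `uint64_t` values stand in for LLVM's block frequency type, and overflow is not handled.

```cpp
#include <cassert>
#include <cstdint>

// Frequency assigned to a block cloned into the caller during inlining:
//   Freq(NewBB) = Freq(OrigBB) * CallSiteFreq / CalleeEntryFreq
uint64_t scaleInlinedBlockFreq(uint64_t OrigFreq, uint64_t CallSiteFreq,
                               uint64_t CalleeEntryFreq) {
  assert(CalleeEntryFreq != 0 && "callee entry frequency must be nonzero");
  return OrigFreq * CallSiteFreq / CalleeEntryFreq;
}
```

Under this rule the clone of the callee's entry block receives exactly the callsite frequency, and every other cloned block keeps its frequency relative to the callee entry. For example, a callee block with frequency 32 in a callee whose entry frequency is 16, inlined at a callsite with frequency 4, gets frequency 8.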

include/llvm/Transforms/Utils/Cloning.h

Show First 20 Lines • Show All 42 Lines • ▼ Show 20 Lines
class CallGraph;		class CallGraph;
class DataLayout;		class DataLayout;
class Loop;		class Loop;
class LoopInfo;		class LoopInfo;
class AllocaInst;		class AllocaInst;
class AssumptionCacheTracker;		class AssumptionCacheTracker;
class DominatorTree;		class DominatorTree;

		typedef std::function<void(const BasicBlock *, const BasicBlock *)>
		davidxl: Is there a common header to put this decl in?
		eraman: I could leave it in Cloning.h and include it in InlineSimple.cpp and get rid of it in InlinerPass.h, but this adds an unnecessary dependence to InlineSimple.cpp. There is no other header file common to the inliner proper and CloneFunction.cpp where it is appropriate to add this.
		BlockCloningFunctor;

/// Return an exact copy of the specified module		/// Return an exact copy of the specified module
///		///
std::unique_ptr<Module> CloneModule(const Module *M);		std::unique_ptr<Module> CloneModule(const Module *M);
std::unique_ptr<Module> CloneModule(const Module *M, ValueToValueMapTy &VMap);		std::unique_ptr<Module> CloneModule(const Module *M, ValueToValueMapTy &VMap);

/// Return a copy of the specified module. The ShouldCloneDefinition function		/// Return a copy of the specified module. The ShouldCloneDefinition function
/// controls whether a specific GlobalValue's definition is cloned. If the		/// controls whether a specific GlobalValue's definition is cloned. If the
/// function returns false, the module copy will contain an external reference		/// function returns false, the module copy will contain an external reference
▲ Show 20 Lines • Show All 88 Lines • ▼ Show 20 Lines	void CloneFunctionInto(Function NewFunc, const Function OldFunc,
ValueMapTypeRemapper *TypeMapper = nullptr,		ValueMapTypeRemapper *TypeMapper = nullptr,
ValueMaterializer *Materializer = nullptr);		ValueMaterializer *Materializer = nullptr);

void CloneAndPruneIntoFromInst(Function *NewFunc, const Function *OldFunc,		void CloneAndPruneIntoFromInst(Function *NewFunc, const Function *OldFunc,
const Instruction *StartingInst,		const Instruction *StartingInst,
ValueToValueMapTy &VMap, bool ModuleLevelChanges,		ValueToValueMapTy &VMap, bool ModuleLevelChanges,
SmallVectorImpl<ReturnInst *> &Returns,		SmallVectorImpl<ReturnInst *> &Returns,
const char *NameSuffix = "",		const char *NameSuffix = "",
ClonedCodeInfo *CodeInfo = nullptr);		ClonedCodeInfo *CodeInfo = nullptr,
		BlockCloningFunctor Ftor = nullptr);

/// CloneAndPruneFunctionInto - This works exactly like CloneFunctionInto,		/// CloneAndPruneFunctionInto - This works exactly like CloneFunctionInto,
/// except that it does some simple constant prop and DCE on the fly. The		/// except that it does some simple constant prop and DCE on the fly. The
/// effect of this is to copy significantly less code in cases where (for		/// effect of this is to copy significantly less code in cases where (for
/// example) a function call with constant arguments is inlined, and those		/// example) a function call with constant arguments is inlined, and those
/// constant arguments cause a significant amount of code in the callee to be		/// constant arguments cause a significant amount of code in the callee to be
/// dead. Since this doesn't produce an exactly copy of the input, it can't be		/// dead. Since this doesn't produce an exactly copy of the input, it can't be
/// used for things like CloneFunction or CloneModule.		/// used for things like CloneFunction or CloneModule.
///		///
/// If ModuleLevelChanges is false, VMap contains no non-identity GlobalValue		/// If ModuleLevelChanges is false, VMap contains no non-identity GlobalValue
/// mappings.		/// mappings.
///		///
void CloneAndPruneFunctionInto(Function *NewFunc, const Function *OldFunc,		void CloneAndPruneFunctionInto(Function *NewFunc, const Function *OldFunc,
ValueToValueMapTy &VMap, bool ModuleLevelChanges,		ValueToValueMapTy &VMap, bool ModuleLevelChanges,
SmallVectorImpl<ReturnInst*> &Returns,		SmallVectorImpl<ReturnInst *> &Returns,
const char *NameSuffix = "",		const char *NameSuffix = "",
ClonedCodeInfo *CodeInfo = nullptr,		ClonedCodeInfo *CodeInfo = nullptr,
Instruction *TheCall = nullptr);		Instruction *TheCall = nullptr,
		BlockCloningFunctor Ftor = nullptr);

/// InlineFunctionInfo - This class captures the data input to the		/// InlineFunctionInfo - This class captures the data input to the
/// InlineFunction call, and records the auxiliary results produced by it.		/// InlineFunction call, and records the auxiliary results produced by it.
class InlineFunctionInfo {		class InlineFunctionInfo {
public:		public:
explicit InlineFunctionInfo(CallGraph *cg = nullptr,		explicit InlineFunctionInfo(CallGraph *cg = nullptr,
AssumptionCacheTracker *ACT = nullptr)		AssumptionCacheTracker *ACT = nullptr,
: CG(cg), ACT(ACT) {}		BlockCloningFunctor Ftor = nullptr)
		: CG(cg), ACT(ACT), Ftor(Ftor) {}

/// CG - If non-null, InlineFunction will update the callgraph to reflect the		/// CG - If non-null, InlineFunction will update the callgraph to reflect the
/// changes it makes.		/// changes it makes.
CallGraph *CG;		CallGraph *CG;
AssumptionCacheTracker *ACT;		AssumptionCacheTracker *ACT;
		// Functor that is invoked when a block is cloned into the new function.
		BlockCloningFunctor Ftor;

/// StaticAllocas - InlineFunction fills this in with all static allocas that		/// StaticAllocas - InlineFunction fills this in with all static allocas that
/// get copied into the caller.		/// get copied into the caller.
SmallVector<AllocaInst *, 4> StaticAllocas;		SmallVector<AllocaInst *, 4> StaticAllocas;

/// InlinedCalls - InlineFunction fills this in with callsites that were		/// InlinedCalls - InlineFunction fills this in with callsites that were
/// inlined from the callee. This is only filled in if CG is non-null.		/// inlined from the callee. This is only filled in if CG is non-null.
SmallVector<WeakVH, 8> InlinedCalls;		SmallVector<WeakVH, 8> InlinedCalls;
▲ Show 20 Lines • Show All 42 Lines • Show Last 20 Lines
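The Cloning.h changes thread an optional BlockCloningFunctor through the cloning entry points and InlineFunctionInfo so that a client is notified for each cloned block. A minimal sketch of that callback pattern follows, assuming the functor receives the original block and its clone. `Block` and `cloneBlocks` are hypothetical stand-ins for llvm::BasicBlock and the real cloning routines.

```cpp
#include <cassert>
#include <functional>
#include <vector>

// Hypothetical stand-in for llvm::BasicBlock.
struct Block {
  int Id;
};

// Same shape as the typedef in the patch, using the stand-in type.
using BlockCloningFunctor =
    std::function<void(const Block *, const Block *)>;

// Clone every block and, when a functor is supplied, report each
// (original, clone) pair to it. A null functor disables the notification.
std::vector<Block> cloneBlocks(const std::vector<Block> &Orig,
                               BlockCloningFunctor Ftor = nullptr) {
  std::vector<Block> Clones;
  Clones.reserve(Orig.size()); // avoids reallocation while cloning
  for (const Block &B : Orig) {
    Clones.push_back(Block{B.Id});
    if (Ftor)
      Ftor(&B, &Clones.back());
  }
  assert(Clones.size() == Orig.size() && "every block must be cloned");
  return Clones;
}
```

Defaulting the functor to nullptr, as the patch does for the new Ftor parameters, keeps existing callers unchanged. Only a client that wants to update block frequencies needs to pass a callback.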

lib/Analysis/InlineCost.cpp

Show All 12 Lines

#include "llvm/Analysis/InlineCost.h"		#include "llvm/Analysis/InlineCost.h"
#include "llvm/ADT/STLExtras.h"		#include "llvm/ADT/STLExtras.h"
#include "llvm/ADT/SetVector.h"		#include "llvm/ADT/SetVector.h"
#include "llvm/ADT/SmallPtrSet.h"		#include "llvm/ADT/SmallPtrSet.h"
#include "llvm/ADT/SmallVector.h"		#include "llvm/ADT/SmallVector.h"
#include "llvm/ADT/Statistic.h"		#include "llvm/ADT/Statistic.h"
#include "llvm/Analysis/AssumptionCache.h"		#include "llvm/Analysis/AssumptionCache.h"
		#include "llvm/Analysis/BlockFrequencyInfo.h"
		#include "llvm/Analysis/BlockFrequencyInfoImpl.h"
		#include "llvm/Analysis/BranchProbabilityInfo.h"
#include "llvm/Analysis/CodeMetrics.h"		#include "llvm/Analysis/CodeMetrics.h"
#include "llvm/Analysis/ConstantFolding.h"		#include "llvm/Analysis/ConstantFolding.h"
#include "llvm/Analysis/InstructionSimplify.h"		#include "llvm/Analysis/InstructionSimplify.h"
		#include "llvm/Analysis/LoopInfo.h"
#include "llvm/Analysis/TargetTransformInfo.h"		#include "llvm/Analysis/TargetTransformInfo.h"
#include "llvm/IR/CallSite.h"		#include "llvm/IR/CallSite.h"
#include "llvm/IR/CallingConv.h"		#include "llvm/IR/CallingConv.h"
#include "llvm/IR/DataLayout.h"		#include "llvm/IR/DataLayout.h"
		#include "llvm/IR/Dominators.h"
#include "llvm/IR/GetElementPtrTypeIterator.h"		#include "llvm/IR/GetElementPtrTypeIterator.h"
#include "llvm/IR/GlobalAlias.h"		#include "llvm/IR/GlobalAlias.h"
#include "llvm/IR/InstVisitor.h"		#include "llvm/IR/InstVisitor.h"
#include "llvm/IR/IntrinsicInst.h"		#include "llvm/IR/IntrinsicInst.h"
#include "llvm/IR/Operator.h"		#include "llvm/IR/Operator.h"
#include "llvm/Support/Debug.h"		#include "llvm/Support/Debug.h"
#include "llvm/Support/raw_ostream.h"		#include "llvm/Support/raw_ostream.h"

Show All 26 Lines
// PGO before we actually hook up inliner with analysis passes such as BPI and		// PGO before we actually hook up inliner with analysis passes such as BPI and
// BFI.		// BFI.
static cl::opt<int> ColdThreshold(		static cl::opt<int> ColdThreshold(
"inlinecold-threshold", cl::Hidden, cl::init(225),		"inlinecold-threshold", cl::Hidden, cl::init(225),
cl::desc("Threshold for inlining functions with cold attribute"));		cl::desc("Threshold for inlining functions with cold attribute"));

namespace {		namespace {

class CallAnalyzer : public InstVisitor<CallAnalyzer, bool> {		class CallAnalyzer : public InstVisitor<CallAnalyzer, bool> {
		davidxl: Remove this -- it can be folded into the threshold-adjusting method and tuned in the future.
typedef InstVisitor<CallAnalyzer, bool> Base;		typedef InstVisitor<CallAnalyzer, bool> Base;
friend class InstVisitor<CallAnalyzer, bool>;		friend class InstVisitor<CallAnalyzer, bool>;

/// The TargetTransformInfo available for this compilation.		/// The TargetTransformInfo available for this compilation.
const TargetTransformInfo &TTI;		const TargetTransformInfo &TTI;

		davidxl: I suggest removing these two parameters in this patch, as they are for now irrelevant and confusing (and subject to change in the very near future when summary data is available to be used here). The objective of this patch is to provide infrastructure hooks to feed profile data to the inliner, not to tune or change the current inline behavior/decision, as that needs more design and experimentation. To accomplish that goal, I suggest simply adding a dummy wrapper function to do this:
		int getAdjustedThreshold(int current_threshold, uint64_t callsite_count __attribute__((unused))) {
		  // just return the input threshold for now
		  return current_threshold;
		}
		In the future, I expect the threshold to differ based on callsite_count and summary cutoffs.
/// The cache of @llvm.assume intrinsics.		/// The cache of @llvm.assume intrinsics.
AssumptionCacheTracker *ACT;		AssumptionCacheTracker *ACT;

// The called function.		// The called function.
Function &F;		Function &F;

// The candidate callsite being analyzed. Please do not use this to do		// The candidate callsite being analyzed. Please do not use this to do
// analysis in the caller function; we want the inline cost query to be		// analysis in the caller function; we want the inline cost query to be
// easily cacheable. Instead, use the cover function paramHasAttr.		// easily cacheable. Instead, use the cover function paramHasAttr.
CallSite CandidateCS;		CallSite CandidateCS;

		BlockFrequencyAnalysis *BFA;
int Threshold;		int Threshold;
int Cost;		int Cost;

bool IsCallerRecursive;		bool IsCallerRecursive;
bool IsRecursiveCall;		bool IsRecursiveCall;
bool ExposesReturnsTwice;		bool ExposesReturnsTwice;
bool HasDynamicAlloca;		bool HasDynamicAlloca;
bool ContainsNoDuplicateCall;		bool ContainsNoDuplicateCall;
▲ Show 20 Lines • Show All 52 Lines • ▼ Show 20 Lines	class CallAnalyzer : public InstVisitor<CallAnalyzer, bool> {
/// inlined through this particular callsite.		/// inlined through this particular callsite.
bool isKnownNonNullInCallee(Value *V);		bool isKnownNonNullInCallee(Value *V);

/// Update Threshold based on callsite properties such as callee		/// Update Threshold based on callsite properties such as callee
/// attributes and callee hotness for PGO builds. The Callee is explicitly		/// attributes and callee hotness for PGO builds. The Callee is explicitly
/// passed to support analyzing indirect calls whose target is inferred by		/// passed to support analyzing indirect calls whose target is inferred by
/// analysis.		/// analysis.
void updateThreshold(CallSite CS, Function &Callee);		void updateThreshold(CallSite CS, Function &Callee);
		/// Adjust Threshold based on CallSiteCount and return the adjusted threshold.
		int getAdjustedThreshold(int Threshold, Optional<uint64_t> CallSiteCount);

// Custom analysis routines.		// Custom analysis routines.
bool analyzeBlock(BasicBlock *BB, SmallPtrSetImpl<const Value *> &EphValues);		bool analyzeBlock(BasicBlock *BB, SmallPtrSetImpl<const Value *> &EphValues);

// Disable several entry points to the visitor so we don't accidentally use		// Disable several entry points to the visitor so we don't accidentally use
// them by declaring but not defining them here.		// them by declaring but not defining them here.
void visit(Module *); void visit(Module &);		void visit(Module *); void visit(Module &);
void visit(Function *); void visit(Function &);		void visit(Function *); void visit(Function &);
Show All 25 Lines	class CallAnalyzer : public InstVisitor<CallAnalyzer, bool> {
bool visitIndirectBrInst(IndirectBrInst &IBI);		bool visitIndirectBrInst(IndirectBrInst &IBI);
bool visitResumeInst(ResumeInst &RI);		bool visitResumeInst(ResumeInst &RI);
bool visitCleanupReturnInst(CleanupReturnInst &RI);		bool visitCleanupReturnInst(CleanupReturnInst &RI);
bool visitCatchReturnInst(CatchReturnInst &RI);		bool visitCatchReturnInst(CatchReturnInst &RI);
bool visitUnreachableInst(UnreachableInst &I);		bool visitUnreachableInst(UnreachableInst &I);

public:		public:
CallAnalyzer(const TargetTransformInfo &TTI, AssumptionCacheTracker *ACT,		CallAnalyzer(const TargetTransformInfo &TTI, AssumptionCacheTracker *ACT,
Function &Callee, int Threshold, CallSite CSArg)		Function &Callee, int Threshold, CallSite CSArg,
: TTI(TTI), ACT(ACT), F(Callee), CandidateCS(CSArg), Threshold(Threshold),		BlockFrequencyAnalysis *BFA)
Cost(0), IsCallerRecursive(false), IsRecursiveCall(false),		: TTI(TTI), ACT(ACT), F(Callee), CandidateCS(CSArg), BFA(BFA),
ExposesReturnsTwice(false), HasDynamicAlloca(false),		Threshold(Threshold), Cost(0), IsCallerRecursive(false),
ContainsNoDuplicateCall(false), HasReturn(false), HasIndirectBr(false),		IsRecursiveCall(false), ExposesReturnsTwice(false),
HasFrameEscape(false), AllocatedSize(0), NumInstructions(0),		HasDynamicAlloca(false), ContainsNoDuplicateCall(false),
NumVectorInstructions(0), FiftyPercentVectorBonus(0),		HasReturn(false), HasIndirectBr(false), HasFrameEscape(false),
TenPercentVectorBonus(0), VectorBonus(0), NumConstantArgs(0),		AllocatedSize(0), NumInstructions(0), NumVectorInstructions(0),
NumConstantOffsetPtrArgs(0), NumAllocaArgs(0), NumConstantPtrCmps(0),		FiftyPercentVectorBonus(0), TenPercentVectorBonus(0), VectorBonus(0),
NumConstantPtrDiffs(0), NumInstructionsSimplified(0),		NumConstantArgs(0), NumConstantOffsetPtrArgs(0), NumAllocaArgs(0),
SROACostSavings(0), SROACostSavingsLost(0) {}		NumConstantPtrCmps(0), NumConstantPtrDiffs(0),
		NumInstructionsSimplified(0), SROACostSavings(0),
		SROACostSavingsLost(0) {}

bool analyzeCall(CallSite CS);		bool analyzeCall(CallSite CS);

int getThreshold() { return Threshold; }		int getThreshold() { return Threshold; }
int getCost() { return Cost; }		int getCost() { return Cost; }

// Keep a bunch of stats about the cost savings found so we can print them		// Keep a bunch of stats about the cost savings found so we can print them
// out when debugging.		// out when debugging.
▲ Show 20 Lines • Show All 351 Lines • ▼ Show 20 Lines	if (isAllocaDerivedArg(V))
// We can actually predict the result of comparisons between an		// We can actually predict the result of comparisons between an
// alloca-derived value and null. Note that this fires regardless of		// alloca-derived value and null. Note that this fires regardless of
// SROA firing.		// SROA firing.
return true;		return true;

return false;		return false;
}		}

		// Adjust the threshold based on callsite hotness. Currently this is a nop.
		int CallAnalyzer::getAdjustedThreshold(int Threshold,
		Optional<uint64_t> CallSiteCount
		__attribute__((unused))) {
		// FIXME: The new threshold should be computed from the given Threshold and
		// the callsite hotness.
		return Threshold;
		}

void CallAnalyzer::updateThreshold(CallSite CS, Function &Callee) {		void CallAnalyzer::updateThreshold(CallSite CS, Function &Callee) {
// If -inline-threshold is not given, listen to the optsize and minsize		// If -inline-threshold is not given, listen to the optsize and minsize
// attributes when they would decrease the threshold.		// attributes when they would decrease the threshold.
Function *Caller = CS.getCaller();		Function *Caller = CS.getCaller();

if (!(DefaultInlineThreshold.getNumOccurrences() > 0)) {		if (!(DefaultInlineThreshold.getNumOccurrences() > 0)) {
if (Caller->optForMinSize() && OptMinSizeThreshold < Threshold)		if (Caller->optForMinSize() && OptMinSizeThreshold < Threshold)
Threshold = OptMinSizeThreshold;		Threshold = OptMinSizeThreshold;
else if (Caller->optForSize() && OptSizeThreshold < Threshold)		else if (Caller->optForSize() && OptSizeThreshold < Threshold)
Threshold = OptSizeThreshold;		Threshold = OptSizeThreshold;
}		}

// If profile information is available, use that to adjust threshold of hot		// If profile information is available, use that to adjust threshold of hot
// and cold functions.		// and cold functions.
// FIXME: The heuristic used below for determining hotness and coldness are		// FIXME: The heuristic used below for determining hotness and coldness are
// based on preliminary SPEC tuning and may not be optimal. Replace this with		// based on preliminary SPEC tuning and may not be optimal. Replace this with
// a well-tuned heuristic based on callsite hotness and not callee hotness.		// a well-tuned heuristic based on callsite hotness and not callee hotness.
uint64_t FunctionCount = 0, MaxFunctionCount = 0;		uint64_t FunctionCount = 0, MaxFunctionCount = 0;
bool HasPGOCounts = false;		bool HasPGOCounts = false;
if (Callee.getEntryCount() && Callee.getParent()->getMaximumFunctionCount()) {		if (Callee.getEntryCount() && Callee.getParent()->getMaximumFunctionCount()) {
HasPGOCounts = true;		HasPGOCounts = true;
FunctionCount = Callee.getEntryCount().getValue();		FunctionCount = Callee.getEntryCount().getValue();
MaxFunctionCount = Callee.getParent()->getMaximumFunctionCount().getValue();		MaxFunctionCount = Callee.getParent()->getMaximumFunctionCount().getValue();
}		}
		Optional<uint64_t> CallSiteCount =
		davidxl: This call has the side effect of forcing BFI to be computed even when it is disabled (with an option). I think CallAnalyzer needs to have a wrapper for this function and check the settings before it is called.
		eraman: I have guarded the computation in getBlockCount by a check to see if BFA is nullptr. I also made sure getInlineCost gets a non-null BFA only when HasProfileData is true.
		llvm::getBlockCount(CS.getInstruction()->getParent(), BFA);
		junbuml: Call and CallBB seem to be used only in llvm::getBlockCount(CallBB, BFA);
		Threshold = getAdjustedThreshold(Threshold, CallSiteCount);
		davidxl: This guard is wrong: 1) it should check the caller entry count, not the callee entry count; 2) hasPGOCounts also depends on other conditions. In fact, this condition can be removed.
		eraman: Yes, this check is not needed and I've removed it.

// Listen to the inlinehint attribute or profile based hotness information		// Listen to the inlinehint attribute or profile based hotness information
		zzheng: Unused variable: CallerEntryCount
// when it would increase the threshold and the caller does not need to		// when it would increase the threshold and the caller does not need to
		davidxl: Define a helper method to get the callsite profile count.
// minimize its size.		// minimize its size.
bool InlineHint =		bool InlineHint =
Callee.hasFnAttribute(Attribute::InlineHint) ||		Callee.hasFnAttribute(Attribute::InlineHint) ||
(HasPGOCounts &&		(HasPGOCounts &&
		zzheng: Caller->getEntryCount() is called twice; this looks simpler:
		Optional<uint64_t> CallerEntryCount = Caller->getEntryCount();
		if (CallerEntryCount.hasValue()) {
		  uint64_t CallSiteCount = CallerEntryCount.getValue() * CallSiteFreq / CallerEntryFreq;
		}
FunctionCount >= (uint64_t)(0.3 * (double)MaxFunctionCount));		FunctionCount >= (uint64_t)(0.3 * (double)MaxFunctionCount));
if (InlineHint && HintThreshold > Threshold && !Caller->optForMinSize())		if (InlineHint && HintThreshold > Threshold && !Caller->optForMinSize())
		davidxl: Call the proposed getAdjustedThreshold(Threshold, CallSiteCount);
Threshold = HintThreshold;		Threshold = HintThreshold;

// Listen to the cold attribute or profile based coldness information		// Listen to the cold attribute or profile based coldness information
// when it would decrease the threshold.		// when it would decrease the threshold.
bool ColdCallee =		bool ColdCallee =
Callee.hasFnAttribute(Attribute::Cold) ||		Callee.hasFnAttribute(Attribute::Cold) ||
(HasPGOCounts &&		(HasPGOCounts &&
FunctionCount <= (uint64_t)(0.01 * (double)MaxFunctionCount));		FunctionCount <= (uint64_t)(0.01 * (double)MaxFunctionCount));
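The callee-hotness heuristic that remains in updateThreshold compares the callee's entry count against fixed fractions of the module's maximum function count: at least 30% counts as hot, at most 1% as cold. The sketch below restates those two comparisons in isolation. `Hotness` and `classifyCallee` are hypothetical names, and collapsing the two independent checks into one enum with the hot check first is an assumption of this sketch, not the patch's structure.

```cpp
#include <cassert>
#include <cstdint>

enum class Hotness { Hot, Cold, Neutral };

// Classify a callee using the same cutoffs as the code above:
//   hot  when FunctionCount >= 0.3  * MaxFunctionCount
//   cold when FunctionCount <= 0.01 * MaxFunctionCount
// The hot check takes priority in this sketch.
Hotness classifyCallee(uint64_t FunctionCount, uint64_t MaxFunctionCount) {
  assert(FunctionCount <= MaxFunctionCount &&
         "entry count cannot exceed the module maximum");
  if (FunctionCount >= (uint64_t)(0.3 * (double)MaxFunctionCount))
    return Hotness::Hot;
  if (FunctionCount <= (uint64_t)(0.01 * (double)MaxFunctionCount))
    return Hotness::Cold;
  return Hotness::Neutral;
}
```

As the FIXME in the source notes, this measures callee hotness rather than callsite hotness: a rarely executed call to a globally hot function is still treated as hot. That gap is what the CallSiteCount plumbing in this patch is meant to let later patches close.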
▲ Show 20 Lines • Show All 291 Lines • ▼ Show 20 Lines	bool CallAnalyzer::visitCallSite(CallSite CS) {
if (!F)		if (!F)
return Base::visitCallSite(CS);		return Base::visitCallSite(CS);

// If we have a constant that we are calling as a function, we can peer		// If we have a constant that we are calling as a function, we can peer
// through it and see the function target. This happens not infrequently		// through it and see the function target. This happens not infrequently
// during devirtualization and so we want to give it a hefty bonus for		// during devirtualization and so we want to give it a hefty bonus for
// inlining, but cap that bonus in the event that inlining wouldn't pan		// inlining, but cap that bonus in the event that inlining wouldn't pan
// out. Pretend to inline the function, with a custom threshold.		// out. Pretend to inline the function, with a custom threshold.
CallAnalyzer CA(TTI, ACT, *F, InlineConstants::IndirectCallThreshold, CS);		CallAnalyzer CA(TTI, ACT, *F, InlineConstants::IndirectCallThreshold, CS,
		BFA);
if (CA.analyzeCall(CS)) {		if (CA.analyzeCall(CS)) {
// We were able to inline the indirect call! Subtract the cost from the		// We were able to inline the indirect call! Subtract the cost from the
// threshold to get the bonus we want to apply, but don't go below zero.		// threshold to get the bonus we want to apply, but don't go below zero.
Cost -= std::max(0, CA.getThreshold() - CA.getCost());		Cost -= std::max(0, CA.getThreshold() - CA.getCost());
}		}

return Base::visitCallSite(CS);		return Base::visitCallSite(CS);
}		}
▲ Show 20 Lines • Show All 504 Lines • ▼ Show 20 Lines	static bool functionsHaveCompatibleAttributes(Function *Caller,
Function *Callee,		Function *Callee,
TargetTransformInfo &TTI) {		TargetTransformInfo &TTI) {
return TTI.areInlineCompatible(Caller, Callee) &&		return TTI.areInlineCompatible(Caller, Callee) &&
AttributeFuncs::areInlineCompatible(Caller, Callee);		AttributeFuncs::areInlineCompatible(Caller, Callee);
}		}

InlineCost llvm::getInlineCost(CallSite CS, int DefaultThreshold,		InlineCost llvm::getInlineCost(CallSite CS, int DefaultThreshold,
TargetTransformInfo &CalleeTTI,		TargetTransformInfo &CalleeTTI,
AssumptionCacheTracker *ACT) {		AssumptionCacheTracker *ACT,
		BlockFrequencyAnalysis *BFA) {
return getInlineCost(CS, CS.getCalledFunction(), DefaultThreshold, CalleeTTI,		return getInlineCost(CS, CS.getCalledFunction(), DefaultThreshold, CalleeTTI,
ACT);		ACT, BFA);
}		}

int llvm::computeThresholdFromOptLevels(unsigned OptLevel,		int llvm::computeThresholdFromOptLevels(unsigned OptLevel,
		junbuml (Done): I cannot see any use of this function. Are you planning to hook this function into the heuristic in this patch?
		eraman (Author, Not Done): Yes, I was planning to use this as part of tuning, but I've now removed it from this patch.
		davidxl (Done): Remove unused function.
unsigned SizeOptLevel) {		unsigned SizeOptLevel) {
if (OptLevel > 2)		if (OptLevel > 2)
return OptAggressiveThreshold;		return OptAggressiveThreshold;
if (SizeOptLevel == 1) // -Os		if (SizeOptLevel == 1) // -Os
return OptSizeThreshold;		return OptSizeThreshold;
if (SizeOptLevel == 2) // -Oz		if (SizeOptLevel == 2) // -Oz
return OptMinSizeThreshold;		return OptMinSizeThreshold;
return DefaultInlineThreshold;		return DefaultInlineThreshold;
}		}

int llvm::getDefaultInlineThreshold() { return DefaultInlineThreshold; }		int llvm::getDefaultInlineThreshold() { return DefaultInlineThreshold; }

InlineCost llvm::getInlineCost(CallSite CS, Function *Callee,		InlineCost llvm::getInlineCost(CallSite CS, Function *Callee,
int DefaultThreshold,		int DefaultThreshold,
TargetTransformInfo &CalleeTTI,		TargetTransformInfo &CalleeTTI,
AssumptionCacheTracker *ACT) {		AssumptionCacheTracker *ACT,
		BlockFrequencyAnalysis *BFA) {

// Cannot inline indirect calls.		// Cannot inline indirect calls.
if (!Callee)		if (!Callee)
return llvm::InlineCost::getNever();		return llvm::InlineCost::getNever();

// Calls to functions with always-inline attributes should be inlined		// Calls to functions with always-inline attributes should be inlined
// whenever possible.		// whenever possible.
if (CS.hasFnAttr(Attribute::AlwaysInline)) {		if (CS.hasFnAttr(Attribute::AlwaysInline)) {
[... 16 lines not shown ...]	InlineCost llvm::getInlineCost(CallSite CS, Function *Callee,
// marked noinline.		// marked noinline.
if (Callee->mayBeOverridden() ||		if (Callee->mayBeOverridden() ||
Callee->hasFnAttribute(Attribute::NoInline) || CS.isNoInline())		Callee->hasFnAttribute(Attribute::NoInline) || CS.isNoInline())
return llvm::InlineCost::getNever();		return llvm::InlineCost::getNever();

DEBUG(llvm::dbgs() << " Analyzing call of " << Callee->getName()		DEBUG(llvm::dbgs() << " Analyzing call of " << Callee->getName()
<< "...\n");		<< "...\n");

CallAnalyzer CA(CalleeTTI, ACT, *Callee, DefaultThreshold, CS);		CallAnalyzer CA(CalleeTTI, ACT, *Callee, DefaultThreshold, CS, BFA);
bool ShouldInline = CA.analyzeCall(CS);		bool ShouldInline = CA.analyzeCall(CS);

DEBUG(CA.dump());		DEBUG(CA.dump());

// Check if there was a reason to force inlining or no inlining.		// Check if there was a reason to force inlining or no inlining.
if (!ShouldInline && CA.getCost() < CA.getThreshold())		if (!ShouldInline && CA.getCost() < CA.getThreshold())
return InlineCost::getNever();		return InlineCost::getNever();
if (ShouldInline && CA.getCost() >= CA.getThreshold())		if (ShouldInline && CA.getCost() >= CA.getThreshold())
[... 31 lines not shown ...]	for (auto &II : *BI) {
CS.getCalledFunction()->getIntrinsicID() ==		CS.getCalledFunction()->getIntrinsicID() ==
llvm::Intrinsic::localescape)		llvm::Intrinsic::localescape)
return false;		return false;
}		}
}		}

return true;		return true;
}		}

		/// \brief Get estimated execution count for \p BB.
		davidxl (Done): Add a comment for the function.
		Optional<uint64_t> llvm::getBlockCount(BasicBlock *BB,
		BlockFrequencyAnalysis *BFA) {
		Function *F = BB->getParent();
		Optional<uint64_t> EntryCount = F->getEntryCount();
		if (!EntryCount)
		return None;
		BlockFrequencyInfo *BFI = BFA->getBlockFrequencyInfo(F);
		uint64_t BBFreq = BFI->getBlockFreq(BB).getFrequency();
		uint64_t FunctionEntryFreq = BFI->getEntryFreq();
		uint64_t BBCount = EntryCount.getValue() * BBFreq / FunctionEntryFreq;
		junbuml (Not Done): Can we guarantee FunctionEntryFreq is always non-zero if EntryCount is non-zero?
		eraman (Author, Not Done): FunctionEntryFreq is always non-zero and that doesn't depend on entry count.
		return BBCount;
		}
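
The scaling done by getBlockCount can be looked at in isolation. The following is a minimal sketch that does not depend on LLVM; `estimateBlockCount` and its parameter names are hypothetical, and `std::optional` stands in for `llvm::Optional`. It asserts the assumption stated in the review thread (the entry frequency is non-zero) rather than relying on it silently.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical, LLVM-independent stand-in for getBlockCount():
//   count(BB) = entryCount * freq(BB) / freq(entry)
std::optional<uint64_t> estimateBlockCount(std::optional<uint64_t> entryCount,
                                           uint64_t blockFreq,
                                           uint64_t entryFreq) {
  if (!entryCount)
    return std::nullopt; // No profile data for the function.
  // Assumption taken from the review thread: the entry frequency is never 0.
  assert(entryFreq != 0 && "entry frequency must be non-zero");
  // Like the patch, multiply before dividing to keep precision. The 64-bit
  // product can wrap for very large counts and frequencies.
  return *entryCount * blockFreq / entryFreq;
}
```

For example, a block with half the entry frequency in a function entered 7 times is estimated at 3 executions, since the division truncates.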

		zzheng (Done): Empty comment line, I think it can be removed.
		davidxl (Done): Irrelevant comment line?
		BlockFrequencyAnalysis::~BlockFrequencyAnalysis() {
		for (auto &Entry : BFM) {
		delete Entry.second;
		}
		}

		junbuml (Done): You can call LoopInfo LI(DT) instead of LoopInfo LI; LI.analyze(DT);
		/// \brief Get BlockFrequencyInfo for a function.
		BlockFrequencyInfo *BlockFrequencyAnalysis::getBlockFrequencyInfo(Function *F) {
		auto Iter = BFM.find(F);
		junbuml (Done): BranchProbabilityInfo BPI(*F, LI); will call calculate(*F, LI);
		if (Iter != BFM.end())
		return Iter->second;
		// We need to create a BlockFrequencyInfo object for F and store it.
		junbuml (Done): You can also call BlockFrequencyInfo(F, BPI, LI), which calls calculate(F, BPI, LI); inside.
		DominatorTree DT;
		DT.recalculate(*F);
		LoopInfo LI(DT);
		BranchProbabilityInfo BPI(*F, LI);
		BlockFrequencyInfo *BFI = new BlockFrequencyInfo(*F, BPI, LI);
		BFM[F] = BFI;
		return BFI;
		}

		/// \brief Invalidate BlockFrequencyInfo for a function.
		void BlockFrequencyAnalysis::invalidateBlockFrequencyInfo(Function *F) {
		BFM.erase(F);
		}
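
As shown, the destructor deletes each cached entry by hand, while invalidateBlockFrequencyInfo only erases the map slot. If the map holds raw owning pointers, that erase would appear to leak the entry unless it is freed elsewhere. A possible alternative, sketched below without LLVM types, is to let the map own its values through `std::unique_ptr`. `FreqInfo` and `FreqInfoCache` are hypothetical names used only for illustration.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <utility>

// Hypothetical stand-in for an expensive per-function analysis result.
struct FreqInfo {
  std::string functionName;
};

// Lazily computes and caches one FreqInfo per function name. The unique_ptr
// values release their entries on erase and on destruction of the cache.
class FreqInfoCache {
public:
  FreqInfo *get(const std::string &fn) {
    assert(!fn.empty() && "function name expected");
    auto it = cache.find(fn);
    if (it != cache.end())
      return it->second.get(); // Cache hit: reuse the stored result.
    ++computeCount;            // Cache miss: compute and remember.
    auto info = std::make_unique<FreqInfo>(FreqInfo{fn});
    FreqInfo *raw = info.get();
    cache.emplace(fn, std::move(info));
    return raw;
  }

  // Dropping the map entry also frees the FreqInfo it owns.
  void invalidate(const std::string &fn) { cache.erase(fn); }

  int computeCount = 0; // Number of times a result had to be computed.

private:
  std::map<std::string, std::unique_ptr<FreqInfo>> cache;
};
```

With this ownership model, neither a hand-written destructor nor an explicit delete in the invalidation path is needed.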

lib/Transforms/IPO/InlineSimple.cpp

[... 53 lines not shown ...]	SimpleInliner(int Threshold) : Inliner(ID), DefaultThreshold(Threshold) {
initializeSimpleInlinerPass(*PassRegistry::getPassRegistry());		initializeSimpleInlinerPass(*PassRegistry::getPassRegistry());
}		}

static char ID; // Pass identification, replacement for typeid		static char ID; // Pass identification, replacement for typeid

InlineCost getInlineCost(CallSite CS) override {		InlineCost getInlineCost(CallSite CS) override {
Function *Callee = CS.getCalledFunction();		Function *Callee = CS.getCalledFunction();
TargetTransformInfo &TTI = TTIWP->getTTI(*Callee);		TargetTransformInfo &TTI = TTIWP->getTTI(*Callee);
return llvm::getInlineCost(CS, DefaultThreshold, TTI, ACT);		return llvm::getInlineCost(CS, DefaultThreshold, TTI, ACT, BFA.get());
}		}

bool runOnSCC(CallGraphSCC &SCC) override;		bool runOnSCC(CallGraphSCC &SCC) override;
void getAnalysisUsage(AnalysisUsage &AU) const override;		void getAnalysisUsage(AnalysisUsage &AU) const override;

private:		private:
TargetTransformInfoWrapperPass *TTIWP;		TargetTransformInfoWrapperPass *TTIWP;
};		};
[... 25 lines not shown ...]
bool SimpleInliner::runOnSCC(CallGraphSCC &SCC) {		bool SimpleInliner::runOnSCC(CallGraphSCC &SCC) {
TTIWP = &getAnalysis<TargetTransformInfoWrapperPass>();		TTIWP = &getAnalysis<TargetTransformInfoWrapperPass>();
return Inliner::runOnSCC(SCC);		return Inliner::runOnSCC(SCC);
}		}

void SimpleInliner::getAnalysisUsage(AnalysisUsage &AU) const {		void SimpleInliner::getAnalysisUsage(AnalysisUsage &AU) const {
AU.addRequired<TargetTransformInfoWrapperPass>();		AU.addRequired<TargetTransformInfoWrapperPass>();
Inliner::getAnalysisUsage(AU);		Inliner::getAnalysisUsage(AU);
}		}
		davidxl (Done): \p OrigBB \p NewBB
		davidxl (Done): Freq --> OrigBBFreq, with a comment that it is OrigBB's frequency in the callee.
		davidxl (Done): \p Callee ... after it got inlined at callsite in block \p CallBB
		davidxl (Done): Computing the callsite count belongs in a common helper function, which can be used elsewhere too.
		eraman (Author, Not Done): I have placed this helper in InlineCost.cpp.
		davidxl (Done): It is an estimate because of the limitations of the frequency propagation algorithm (unable to handle zero BP, etc.). It would be helpful to add a debug message here to indicate this condition.
		junbuml (Done): No need to have braces.
		junbuml (Not Done): Should we have (double) here?
		eraman (Author, Not Done): Not sure why that is needed.
		junbuml (Not Done): Why not check that CalleeEntryFreq * CallSiteFreq is non-zero?
		eraman (Author, Not Done): It is dividing by CalleeEntryFreq and multiplying by CallSiteFreq, so this is a problem only when CalleeEntryFreq is 0. As I wrote in a previous comment, that is not possible.

lib/Transforms/IPO/Inliner.cpp

[... 13 lines not shown ...]
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

#include "llvm/Transforms/IPO/InlinerPass.h"		#include "llvm/Transforms/IPO/InlinerPass.h"
#include "llvm/ADT/SmallPtrSet.h"		#include "llvm/ADT/SmallPtrSet.h"
#include "llvm/ADT/Statistic.h"		#include "llvm/ADT/Statistic.h"
#include "llvm/Analysis/AliasAnalysis.h"		#include "llvm/Analysis/AliasAnalysis.h"
#include "llvm/Analysis/AssumptionCache.h"		#include "llvm/Analysis/AssumptionCache.h"
#include "llvm/Analysis/BasicAliasAnalysis.h"		#include "llvm/Analysis/BasicAliasAnalysis.h"
		#include "llvm/Analysis/BlockFrequencyInfo.h"
#include "llvm/Analysis/CallGraph.h"		#include "llvm/Analysis/CallGraph.h"
#include "llvm/Analysis/InlineCost.h"		#include "llvm/Analysis/InlineCost.h"
#include "llvm/Analysis/TargetLibraryInfo.h"		#include "llvm/Analysis/TargetLibraryInfo.h"
#include "llvm/IR/CallSite.h"		#include "llvm/IR/CallSite.h"
#include "llvm/IR/DataLayout.h"		#include "llvm/IR/DataLayout.h"
#include "llvm/IR/DiagnosticInfo.h"		#include "llvm/IR/DiagnosticInfo.h"
#include "llvm/IR/Instructions.h"		#include "llvm/IR/Instructions.h"
#include "llvm/IR/IntrinsicInst.h"		#include "llvm/IR/IntrinsicInst.h"
[... 12 lines not shown ...]
STATISTIC(NumDeleted, "Number of functions deleted because all callers found");		STATISTIC(NumDeleted, "Number of functions deleted because all callers found");
STATISTIC(NumMergedAllocas, "Number of allocas merged together");		STATISTIC(NumMergedAllocas, "Number of allocas merged together");

// This weirdly named statistic tracks the number of times that, when attempting		// This weirdly named statistic tracks the number of times that, when attempting
// to inline a function A into B, we analyze the callers of B in order to see		// to inline a function A into B, we analyze the callers of B in order to see
// if those would be more profitable and blocked inline steps.		// if those would be more profitable and blocked inline steps.
STATISTIC(NumCallerCallersAnalyzed, "Number of caller-callers analyzed");		STATISTIC(NumCallerCallersAnalyzed, "Number of caller-callers analyzed");

Inliner::Inliner(char &ID) : CallGraphSCCPass(ID), InsertLifetime(true) {}		Inliner::Inliner(char &ID)
		: CallGraphSCCPass(ID), InsertLifetime(true),
		BFA(new BlockFrequencyAnalysis()) {}

Inliner::Inliner(char &ID, bool InsertLifetime)		Inliner::Inliner(char &ID, bool InsertLifetime)
: CallGraphSCCPass(ID), InsertLifetime(InsertLifetime) {}		: CallGraphSCCPass(ID), InsertLifetime(InsertLifetime),
		BFA(new BlockFrequencyAnalysis()) {}

/// For this class, we declare that we require and preserve the call graph.		/// For this class, we declare that we require and preserve the call graph.
/// If the derived class implements this method, it should		/// If the derived class implements this method, it should
/// always explicitly call the implementation here.		/// always explicitly call the implementation here.
void Inliner::getAnalysisUsage(AnalysisUsage &AU) const {		void Inliner::getAnalysisUsage(AnalysisUsage &AU) const {
AU.addRequired<AssumptionCacheTracker>();		AU.addRequired<AssumptionCacheTracker>();
AU.addRequired<TargetLibraryInfoWrapperPass>();		AU.addRequired<TargetLibraryInfoWrapperPass>();
CallGraphSCCPass::getAnalysisUsage(AU);		CallGraphSCCPass::getAnalysisUsage(AU);
[... 191 lines not shown ...]	DEBUG(dbgs() << " NOT Inlining: cost=" << IC.getCost()
<< ", thres=" << (IC.getCostDelta() + IC.getCost())		<< ", thres=" << (IC.getCostDelta() + IC.getCost())
<< ", Call: " << *CS.getInstruction() << "\n");		<< ", Call: " << *CS.getInstruction() << "\n");
emitAnalysis(CS, Twine(CS.getCalledFunction()->getName() +		emitAnalysis(CS, Twine(CS.getCalledFunction()->getName() +
" too costly to inline (cost=") +		" too costly to inline (cost=") +
Twine(IC.getCost()) + ", threshold=" +		Twine(IC.getCost()) + ", threshold=" +
Twine(IC.getCostDelta() + IC.getCost()) + ")");		Twine(IC.getCostDelta() + IC.getCost()) + ")");
return false;		return false;
}		}

// Try to detect the case where the current inlining candidate caller (call		// Try to detect the case where the current inlining candidate caller (call
// it B) is a static or linkonce-ODR function and is an inlining candidate		// it B) is a static or linkonce-ODR function and is an inlining candidate
// elsewhere, and the current candidate callee (call it C) is large enough		// elsewhere, and the current candidate callee (call it C) is large enough
// that inlining it into B would make B too big to inline later. In these		// that inlining it into B would make B too big to inline later. In these
// circumstances it may be best not to inline C into B, but to inline B into		// circumstances it may be best not to inline C into B, but to inline B into
// its callers.		// its callers.
//		//
// This only applies to static and linkonce-ODR functions because those are		// This only applies to static and linkonce-ODR functions because those are
[... 80 lines not shown ...]	assert(unsigned(InlineHistoryID) < InlineHistory.size() &&
"Invalid inline history ID");		"Invalid inline history ID");
if (InlineHistory[InlineHistoryID].first == F)		if (InlineHistory[InlineHistoryID].first == F)
return true;		return true;
InlineHistoryID = InlineHistory[InlineHistoryID].second;		InlineHistoryID = InlineHistory[InlineHistoryID].second;
}		}
return false;		return false;
}		}

		/// \brief Update the frequency of a block that is cloned into the caller.
		/// This is invoked when \p OrigBB from the callee is cloned into \p NewBB in
		/// the caller.
		void Inliner::updateBlockFreq(CallSite &CS, const BasicBlock *OrigBB,
		const BasicBlock *NewBB) {
		Instruction *Call = CS.getInstruction();
		BasicBlock *CallBB = Call->getParent();
		BlockFrequencyInfo *CalleeBFI =
		BFA->getBlockFrequencyInfo(CS.getCalledFunction());
		BlockFrequencyInfo *CallerBFI =
		BFA->getBlockFrequencyInfo(CallBB->getParent());
		// Find the number of times OrigBB is executed per invocation of the callee
		// and multiply by the number of times callee is executed in the caller.
		// Freq(NewBB) = Freq(OrigBB) * CallSiteFreq / CalleeEntryFreq.
		uint64_t CallSiteFreq = CallerBFI->getBlockFreq(CallBB).getFrequency();
		uint64_t CalleeEntryFreq = CalleeBFI->getEntryFreq();
		// Frequency of OrigBB in the callee.
		BlockFrequency OrigBBFreq = CalleeBFI->getBlockFreq(OrigBB);
		CallerBFI->setBlockFreq(NewBB, (double)(OrigBBFreq.getFrequency()) /
		CalleeEntryFreq * CallSiteFreq);
		}
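
The review thread on the (double) cast comes down to evaluation order in Freq(NewBB) = Freq(OrigBB) * CallSiteFreq / CalleeEntryFreq. The sketch below, with hypothetical function names and no LLVM types, contrasts an integer-only evaluation that divides first with the double-based evaluation used in updateBlockFreq. It is meant only to illustrate the truncation issue. Note that a double carries 53 bits of mantissa, so very large frequencies lose low-order precision in the second variant.

```cpp
#include <cassert>
#include <cstdint>

// Integer-only variant: dividing first truncates the per-invocation ratio,
// so any block colder than the callee entry scales to 0.
uint64_t scaleFreqTruncating(uint64_t origFreq, uint64_t calleeEntryFreq,
                             uint64_t callSiteFreq) {
  assert(calleeEntryFreq != 0 && "callee entry frequency must be non-zero");
  return origFreq / calleeEntryFreq * callSiteFreq;
}

// Variant mirroring the patch: compute the ratio in floating point, then
// scale by the call site frequency and convert back to an integer.
uint64_t scaleFreqViaDouble(uint64_t origFreq, uint64_t calleeEntryFreq,
                            uint64_t callSiteFreq) {
  assert(calleeEntryFreq != 0 && "callee entry frequency must be non-zero");
  double ratio =
      static_cast<double>(origFreq) / static_cast<double>(calleeEntryFreq);
  return static_cast<uint64_t>(ratio * static_cast<double>(callSiteFreq));
}
```

With origFreq = 4, calleeEntryFreq = 8 and callSiteFreq = 16, the truncating variant yields 0 while the double-based variant yields 8.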

		/// \brief Update entry count of \p Callee after it got inlined at a callsite
		/// in block \p CallBB.
		void Inliner::updateEntryCount(BasicBlock *CallBB, Function *Callee) {
		if (!PGOMode)
		return;
		// If the callee has an original count of N, and the estimated count of
		// callsite is M, the new callee count is set to N - M. M is estimated from
		// the caller's entry count, its entry block frequency and the block frequency
		// of the callsite.
		Optional<uint64_t> CalleeCount = Callee->getEntryCount();
		if (!CalleeCount)
		return;
		Optional<uint64_t> CallSiteCount = llvm::getBlockCount(CallBB, BFA.get());
		if (!CallSiteCount)
		return;
		// Since CallSiteCount is an estimate, it could exceed the original callee
		// count; in that case the callee's entry count is clamped to 0.
		if (CallSiteCount.getValue() > CalleeCount.getValue()) {
		Callee->setEntryCount(0);
		DEBUG(llvm::dbgs() << "Estimated count of block " << CallBB->getName()
		<< " is " << CallSiteCount.getValue()
		<< " which exceeds the entry count "
		<< CalleeCount.getValue() << " of the callee "
		<< Callee->getName() << "\n");
		} else
		Callee->setEntryCount(CalleeCount.getValue() - CallSiteCount.getValue());
		}
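
The entry count update above is a saturating subtraction on unsigned values. A minimal sketch of just that arithmetic, using a hypothetical helper name and no LLVM types:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: the callee's remaining entry count after one call site
// with an estimated count of callSiteCount has been inlined. Clamps at 0
// instead of letting the unsigned subtraction wrap around.
uint64_t remainingEntryCount(uint64_t calleeCount, uint64_t callSiteCount) {
  uint64_t result =
      callSiteCount > calleeCount ? 0 : calleeCount - callSiteCount;
  assert(result <= calleeCount && "remaining count cannot grow");
  return result;
}
```

The clamp matters because the call site count is only an estimate; without it, an overestimate would wrap to a very large entry count.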

		void Inliner::invalidateBFI(Function *F) {
		if (!PGOMode)
		return;
		if (F)
		BFA->invalidateBlockFrequencyInfo(F);
		}
		void Inliner::invalidateBFI(CallGraphSCC &SCC) {
		if (!PGOMode)
		return;
		for (CallGraphNode *Node : SCC) {
		Function *F = Node->getFunction();
		invalidateBFI(F);
		}
		}
		void Inliner::copyBlockFrequency(BasicBlock *Src, BasicBlock *Dst) {
		Function *F = Src->getParent();
		davidxl (Done): Needs to be guarded with EnableProfile?
		BlockFrequencyInfo *BFI = BFA->getBlockFrequencyInfo(F);
		BFI->setBlockFreq(Dst, BFI->getBlockFreq(Src).getFrequency());
		}

		static bool isPGOMode(Module &M) {
		davidxl (Done): hasProfileData
		// We check for the presence of MaxFunctionCount in the module.
		// FIXME: This now only works for frontend based instrumentation.
		return M.getMaximumFunctionCount().hasValue();
		}
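
davidxl's suggestion in this review of a tri-state --enable-profile-in-inliner=default|yes|no option can be sketched as follows. This is an illustration of the proposed semantics only: the enum and function names are hypothetical, the option was not added in this patch, and no LLVM command-line machinery is used.

```cpp
#include <cassert>

// Hypothetical tri-state mirroring the proposed default|yes|no option.
enum class ProfileUse { Default, Yes, No };

// Yes and No force the decision; Default defers to whether the module
// carries profile data (the check isPGOMode performs in the patch).
bool useProfileInInliner(ProfileUse flag, bool moduleHasProfile) {
  switch (flag) {
  case ProfileUse::Yes:
    return true;
  case ProfileUse::No:
    return false;
  case ProfileUse::Default:
    return moduleHasProfile;
  }
  assert(false && "unhandled ProfileUse value");
  return moduleHasProfile;
}
```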

bool Inliner::runOnSCC(CallGraphSCC &SCC) {		bool Inliner::runOnSCC(CallGraphSCC &SCC) {
		using namespace std::placeholders;
CallGraph &CG = getAnalysis<CallGraphWrapperPass>().getCallGraph();		CallGraph &CG = getAnalysis<CallGraphWrapperPass>().getCallGraph();
		PGOMode = isPGOMode(CG.getModule());
		davidxl (Not Done): I think we should introduce a more general flag here, EnableProfile. This flag's value is determined by an internal option: --enable-profile-in-inliner=default|yes|no. Yes and no force it on or off, while default is left to the compiler: for now, if hasProfile is true, turn it on, otherwise off.
		eraman (Author, Not Done): I would prefer not to add a new flag for that. I think the current solution of checking if profile is turned on is sufficient for now.
ACT = &getAnalysis<AssumptionCacheTracker>();		ACT = &getAnalysis<AssumptionCacheTracker>();
auto &TLI = getAnalysis<TargetLibraryInfoWrapperPass>().getTLI();		auto &TLI = getAnalysis<TargetLibraryInfoWrapperPass>().getTLI();

SmallPtrSet<Function*, 8> SCCFunctions;		SmallPtrSet<Function*, 8> SCCFunctions;
DEBUG(dbgs() << "Inliner visiting SCC:");		DEBUG(dbgs() << "Inliner visiting SCC:");
for (CallGraphNode *Node : SCC) {		for (CallGraphNode *Node : SCC) {
Function *F = Node->getFunction();		Function *F = Node->getFunction();
if (F) SCCFunctions.insert(F);		if (F) SCCFunctions.insert(F);
[... 45 lines not shown ...]	bool Inliner::runOnSCC(CallGraphSCC &SCC) {
unsigned FirstCallInSCC = CallSites.size();		unsigned FirstCallInSCC = CallSites.size();
for (unsigned i = 0; i < FirstCallInSCC; ++i)		for (unsigned i = 0; i < FirstCallInSCC; ++i)
if (Function *F = CallSites[i].first.getCalledFunction())		if (Function *F = CallSites[i].first.getCalledFunction())
if (SCCFunctions.count(F))		if (SCCFunctions.count(F))
std::swap(CallSites[i--], CallSites[--FirstCallInSCC]);		std::swap(CallSites[i--], CallSites[--FirstCallInSCC]);


InlinedArrayAllocasTy InlinedArrayAllocas;		InlinedArrayAllocasTy InlinedArrayAllocas;
InlineFunctionInfo InlineInfo(&CG, ACT);

// Now that we have all of the call sites, loop over them and inline them if		// Now that we have all of the call sites, loop over them and inline them if
// it looks profitable to do so.		// it looks profitable to do so.
bool Changed = false;		bool Changed = false;
bool LocalChange;		bool LocalChange;
do {		do {
LocalChange = false;		LocalChange = false;
// Iterate over the outer loop because inlining functions can cause indirect		// Iterate over the outer loop because inlining functions can cause indirect
[... 12 lines not shown ...]	for (unsigned CSi = 0; CSi != CallSites.size(); ++CSi) {
if (isInstructionTriviallyDead(CS.getInstruction(), &TLI)) {		if (isInstructionTriviallyDead(CS.getInstruction(), &TLI)) {
DEBUG(dbgs() << " -> Deleting dead call: "		DEBUG(dbgs() << " -> Deleting dead call: "
<< *CS.getInstruction() << "\n");		<< *CS.getInstruction() << "\n");
// Update the call graph by deleting the edge from Callee to Caller.		// Update the call graph by deleting the edge from Callee to Caller.
CG[Caller]->removeCallEdgeFor(CS);		CG[Caller]->removeCallEdgeFor(CS);
CS.getInstruction()->eraseFromParent();		CS.getInstruction()->eraseFromParent();
++NumCallsDeleted;		++NumCallsDeleted;
} else {		} else {
		Instruction *TheCall = CS.getInstruction();
		BasicBlock *CallSiteBlock = TheCall->getParent();
		Instruction *CallSuccessor = &*(++BasicBlock::iterator(TheCall));

// We can only inline direct calls to non-declarations.		// We can only inline direct calls to non-declarations.
if (!Callee || Callee->isDeclaration()) continue;		if (!Callee || Callee->isDeclaration()) continue;

// If this call site was obtained by inlining another function, verify		// If this call site was obtained by inlining another function, verify
// that the include path for the function did not include the callee		// that the include path for the function did not include the callee
// itself. If so, we'd be recursively inlining the same function,		// itself. If so, we'd be recursively inlining the same function,
// which would provide the same callsites, which would cause us to		// which would provide the same callsites, which would cause us to
// infinitely inline.		// infinitely inline.
[... 12 lines not shown ...]	for (unsigned CSi = 0; CSi != CallSites.size(); ++CSi) {
if (!shouldInline(CS)) {		if (!shouldInline(CS)) {
emitOptimizationRemarkMissed(CallerCtx, DEBUG_TYPE, *Caller, DLoc,		emitOptimizationRemarkMissed(CallerCtx, DEBUG_TYPE, *Caller, DLoc,
Twine(Callee->getName() +		Twine(Callee->getName() +
" will not be inlined into " +		" will not be inlined into " +
Caller->getName()));		Caller->getName()));
continue;		continue;
}		}

		BlockCloningFunctor BCF = nullptr;
		if (PGOMode)
		BCF = std::bind(&Inliner::updateBlockFreq, this, CS, _1, _2);
		InlineFunctionInfo InlineInfo(&CG, ACT, BCF);

// Attempt to inline the function.		// Attempt to inline the function.
if (!InlineCallIfPossible(*this, CS, InlineInfo, InlinedArrayAllocas,		if (!InlineCallIfPossible(*this, CS, InlineInfo, InlinedArrayAllocas,
InlineHistoryID, InsertLifetime)) {		InlineHistoryID, InsertLifetime)) {
emitOptimizationRemarkMissed(CallerCtx, DEBUG_TYPE, *Caller, DLoc,		emitOptimizationRemarkMissed(CallerCtx, DEBUG_TYPE, *Caller, DLoc,
Twine(Callee->getName() +		Twine(Callee->getName() +
" will not be inlined into " +		" will not be inlined into " +
Caller->getName()));		Caller->getName()));
continue;		continue;
}		}
		updateEntryCount(CallSiteBlock, Callee);
		copyBlockFrequency(CallSiteBlock, CallSuccessor->getParent());
		davidxl (Done): Add a comment here.

++NumInlined;		++NumInlined;

// Report the inline decision.		// Report the inline decision.
		davidxl (Not Done): Should this be guarded with EnableProfile too?
		eraman (Author, Not Done): The callee copyBlockFrequency has the guard, so it is not necessary at the callsite.
emitOptimizationRemark(		emitOptimizationRemark(
CallerCtx, DEBUG_TYPE, *Caller, DLoc,		CallerCtx, DEBUG_TYPE, *Caller, DLoc,
Twine(Callee->getName() + " inlined into " + Caller->getName()));		Twine(Callee->getName() + " inlined into " + Caller->getName()));

// If inlining this function gave us any new call sites, throw them		// If inlining this function gave us any new call sites, throw them
// onto our worklist to process. They are useful inline candidates.		// onto our worklist to process. They are useful inline candidates.
if (!InlineInfo.InlinedCalls.empty()) {		if (!InlineInfo.InlinedCalls.empty()) {
// Create a new inline history entry for this, so that we remember		// Create a new inline history entry for this, so that we remember
[... 19 lines not shown ...]	for (unsigned CSi = 0; CSi != CallSites.size(); ++CSi) {
DEBUG(dbgs() << " -> Deleting dead function: "		DEBUG(dbgs() << " -> Deleting dead function: "
<< Callee->getName() << "\n");		<< Callee->getName() << "\n");
CallGraphNode *CalleeNode = CG[Callee];		CallGraphNode *CalleeNode = CG[Callee];

// Remove any call graph edges from the callee to its callees.		// Remove any call graph edges from the callee to its callees.
CalleeNode->removeAllCalledFunctions();		CalleeNode->removeAllCalledFunctions();

// Removing the node for callee from the call graph and delete it.		// Removing the node for callee from the call graph and delete it.
delete CG.removeFunctionFromModule(CalleeNode);		Function *F = CG.removeFunctionFromModule(CalleeNode);
		invalidateBFI(F);
		delete F;
++NumDeleted;		++NumDeleted;
}		}

// Remove this call site from the list. If possible, use		// Remove this call site from the list. If possible, use
// swap/pop_back for efficiency, but do not use it if doing so would		// swap/pop_back for efficiency, but do not use it if doing so would
// move a call site to a function in this SCC before the		// move a call site to a function in this SCC before the
// 'FirstCallInSCC' barrier.		// 'FirstCallInSCC' barrier.
if (SCC.isSingular()) {		if (SCC.isSingular()) {
CallSites[CSi] = CallSites.back();		CallSites[CSi] = CallSites.back();
CallSites.pop_back();		CallSites.pop_back();
} else {		} else {
CallSites.erase(CallSites.begin()+CSi);		CallSites.erase(CallSites.begin()+CSi);
}		}
--CSi;		--CSi;

Changed = true;		Changed = true;
LocalChange = true;		LocalChange = true;
}		}
} while (LocalChange);		} while (LocalChange);

		invalidateBFI(SCC);
return Changed;		return Changed;
}		}

/// Remove now-dead linkonce functions at the end of		/// Remove now-dead linkonce functions at the end of
/// processing to avoid breaking the SCC traversal.		/// processing to avoid breaking the SCC traversal.
bool Inliner::doFinalization(CallGraph &CG) {		bool Inliner::doFinalization(CallGraph &CG) {
return removeDeadFunctions(CG);		return removeDeadFunctions(CG);
}		}
[... 91 lines not shown ...]	bool Inliner::removeDeadFunctions(CallGraph &CG, bool AlwaysInlineOnly) {
// Note that it doesn't matter that we are iterating over a non-stable order		// Note that it doesn't matter that we are iterating over a non-stable order
// here to do this, it doesn't matter which order the functions are deleted		// here to do this, it doesn't matter which order the functions are deleted
// in.		// in.
array_pod_sort(FunctionsToRemove.begin(), FunctionsToRemove.end());		array_pod_sort(FunctionsToRemove.begin(), FunctionsToRemove.end());
FunctionsToRemove.erase(std::unique(FunctionsToRemove.begin(),		FunctionsToRemove.erase(std::unique(FunctionsToRemove.begin(),
FunctionsToRemove.end()),		FunctionsToRemove.end()),
FunctionsToRemove.end());		FunctionsToRemove.end());
for (CallGraphNode *CGN : FunctionsToRemove) {		for (CallGraphNode *CGN : FunctionsToRemove) {
delete CG.removeFunctionFromModule(CGN);		Function *F = CG.removeFunctionFromModule(CGN);
		invalidateBFI(F);
		delete F;
++NumDeleted;		++NumDeleted;
}		}
return true;		return true;
}		}

lib/Transforms/Utils/CloneFunction.cpp

[... 271 lines not shown ...]	PruningFunctionCloner(Function *newFunc, const Function *oldFunc,
ValueToValueMapTy &valueMap, bool moduleLevelChanges,		ValueToValueMapTy &valueMap, bool moduleLevelChanges,
const char *nameSuffix, ClonedCodeInfo *codeInfo)		const char *nameSuffix, ClonedCodeInfo *codeInfo)
: NewFunc(newFunc), OldFunc(oldFunc), VMap(valueMap),		: NewFunc(newFunc), OldFunc(oldFunc), VMap(valueMap),
ModuleLevelChanges(moduleLevelChanges), NameSuffix(nameSuffix),		ModuleLevelChanges(moduleLevelChanges), NameSuffix(nameSuffix),
CodeInfo(codeInfo) {}		CodeInfo(codeInfo) {}

/// The specified block is found to be reachable, clone it and		/// The specified block is found to be reachable, clone it and
/// anything that it can reach.		/// anything that it can reach.
void CloneBlock(const BasicBlock *BB,		void CloneBlock(const BasicBlock *BB,
BasicBlock::const_iterator StartingInst,		BasicBlock::const_iterator StartingInst,
std::vector<const BasicBlock*> &ToClone);		std::vector<const BasicBlock *> &ToClone,
		BlockCloningFunctor Ftor = nullptr);
};		};
}		}

/// The specified block is found to be reachable, clone it and		/// The specified block is found to be reachable, clone it and
/// anything that it can reach.		/// anything that it can reach.
void PruningFunctionCloner::CloneBlock(const BasicBlock *BB,		void PruningFunctionCloner::CloneBlock(const BasicBlock *BB,
BasicBlock::const_iterator StartingInst,		BasicBlock::const_iterator StartingInst,
std::vector<const BasicBlock*> &ToClone){		std::vector<const BasicBlock *> &ToClone,
		BlockCloningFunctor Ftor) {
WeakVH &BBEntry = VMap[BB];		WeakVH &BBEntry = VMap[BB];

// Have we already cloned this block?		// Have we already cloned this block?
if (BBEntry) return;		if (BBEntry) return;

// Nope, clone it now.		// Nope, clone it now.
BasicBlock *NewBB;		BasicBlock *NewBB;
BBEntry = NewBB = BasicBlock::Create(BB->getContext());		BBEntry = NewBB = BasicBlock::Create(BB->getContext());
[... 120 lines not shown ...]	void PruningFunctionCloner::CloneBlock(const BasicBlock *BB,
}		}

if (CodeInfo) {		if (CodeInfo) {
CodeInfo->ContainsCalls |= hasCalls;		CodeInfo->ContainsCalls |= hasCalls;
CodeInfo->ContainsDynamicAllocas |= hasDynamicAllocas;		CodeInfo->ContainsDynamicAllocas |= hasDynamicAllocas;
CodeInfo->ContainsDynamicAllocas |= hasStaticAllocas &&		CodeInfo->ContainsDynamicAllocas |= hasStaticAllocas &&
BB != &BB->getParent()->front();		BB != &BB->getParent()->front();
}		}
		// Call Ftor to tell BB has been cloned to NewBB
		if (Ftor)
		Ftor(BB, NewBB);
}		}
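The `Ftor` parameter threaded through `CloneBlock` above lets the caller observe each (original, clone) pair as it is created. Below is a minimal sketch of that hook pattern, assuming `BlockCloningFunctor` behaves like a `std::function` (its declaration in Cloning.h is not shown in this part of the diff). `Block`, `CloneHook`, and `cloneReachable` are hypothetical stand-ins, not LLVM APIs.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <map>
#include <memory>
#include <vector>

// Hypothetical stand-in for a basic block: a frequency plus successor edges.
struct Block {
  std::uint64_t Freq = 0;
  std::vector<const Block *> Succs;
};

// Stand-in for the patch's BlockCloningFunctor (assumed std::function-like).
using CloneHook = std::function<void(const Block *Old, Block *New)>;

// Clone every block reachable from Entry and report each (old, new) pair to
// Hook. Clones are owned by Storage; successor edges of the clones are not
// remapped here, to keep the sketch short.
std::map<const Block *, Block *>
cloneReachable(const Block *Entry,
               std::vector<std::unique_ptr<Block>> &Storage,
               CloneHook Hook = nullptr) {
  std::map<const Block *, Block *> VMap;
  std::vector<const Block *> Worklist{Entry};
  while (!Worklist.empty()) {
    const Block *BB = Worklist.back();
    Worklist.pop_back();
    if (VMap.count(BB))
      continue; // Already cloned.
    Storage.push_back(std::make_unique<Block>());
    Block *NewBB = Storage.back().get();
    VMap[BB] = NewBB;
    for (const Block *Succ : BB->Succs)
      Worklist.push_back(Succ);
    // Tell the caller that BB has been cloned to NewBB.
    if (Hook)
      Hook(BB, NewBB);
  }
  return VMap;
}
```

A caller could pass a lambda such as `[](const Block *Old, Block *New) { New->Freq = Old->Freq / 2; }` to copy a scaled frequency onto each clone. Blocks that are never reached never trigger the hook.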

/// This works like CloneAndPruneFunctionInto, except that it does not clone the		/// This works like CloneAndPruneFunctionInto, except that it does not clone the
/// entire function. Instead it starts at an instruction provided by the caller		/// entire function. Instead it starts at an instruction provided by the caller
/// and copies (and prunes) only the code reachable from that instruction.		/// and copies (and prunes) only the code reachable from that instruction.
void llvm::CloneAndPruneIntoFromInst(Function *NewFunc, const Function *OldFunc,		void llvm::CloneAndPruneIntoFromInst(
const Instruction *StartingInst,		Function *NewFunc, const Function *OldFunc, const Instruction *StartingInst,
ValueToValueMapTy &VMap,		ValueToValueMapTy &VMap, bool ModuleLevelChanges,
bool ModuleLevelChanges,		SmallVectorImpl<ReturnInst *> &Returns, const char *NameSuffix,
SmallVectorImpl<ReturnInst *> &Returns,		ClonedCodeInfo *CodeInfo, BlockCloningFunctor Ftor) {
const char *NameSuffix,
ClonedCodeInfo *CodeInfo) {
assert(NameSuffix && "NameSuffix cannot be null!");		assert(NameSuffix && "NameSuffix cannot be null!");

ValueMapTypeRemapper *TypeMapper = nullptr;		ValueMapTypeRemapper *TypeMapper = nullptr;
ValueMaterializer *Materializer = nullptr;		ValueMaterializer *Materializer = nullptr;

#ifndef NDEBUG		#ifndef NDEBUG
// If the cloning starts at the beginning of the function, verify that		// If the cloning starts at the beginning of the function, verify that
// the function arguments are mapped.		// the function arguments are mapped.
[9 unchanged lines hidden; context: if (StartingInst)]
StartingBB = StartingInst->getParent();		StartingBB = StartingInst->getParent();
else {		else {
StartingBB = &OldFunc->getEntryBlock();		StartingBB = &OldFunc->getEntryBlock();
StartingInst = &StartingBB->front();		StartingInst = &StartingBB->front();
}		}

// Clone the entry block, and anything recursively reachable from it.		// Clone the entry block, and anything recursively reachable from it.
std::vector<const BasicBlock*> CloneWorklist;		std::vector<const BasicBlock*> CloneWorklist;
PFC.CloneBlock(StartingBB, StartingInst->getIterator(), CloneWorklist);		PFC.CloneBlock(StartingBB, StartingInst->getIterator(), CloneWorklist, Ftor);
while (!CloneWorklist.empty()) {		while (!CloneWorklist.empty()) {
const BasicBlock *BB = CloneWorklist.back();		const BasicBlock *BB = CloneWorklist.back();
CloneWorklist.pop_back();		CloneWorklist.pop_back();
PFC.CloneBlock(BB, BB->begin(), CloneWorklist);		PFC.CloneBlock(BB, BB->begin(), CloneWorklist, Ftor);
}		}

// Loop over all of the basic blocks in the old function. If the block was		// Loop over all of the basic blocks in the old function. If the block was
// reachable, we have cloned it and the old block is now in the value map:		// reachable, we have cloned it and the old block is now in the value map:
// insert it into the new function in the right order. If not, ignore it.		// insert it into the new function in the right order. If not, ignore it.
//		//
// Defer PHI resolution until rest of function is resolved.		// Defer PHI resolution until rest of function is resolved.
SmallVector<const PHINode*, 16> PHIToResolve;		SmallVector<const PHINode*, 16> PHIToResolve;
[184 unchanged lines hidden]

/// This works exactly like CloneFunctionInto,		/// This works exactly like CloneFunctionInto,
/// except that it does some simple constant prop and DCE on the fly. The		/// except that it does some simple constant prop and DCE on the fly. The
/// effect of this is to copy significantly less code in cases where (for		/// effect of this is to copy significantly less code in cases where (for
/// example) a function call with constant arguments is inlined, and those		/// example) a function call with constant arguments is inlined, and those
/// constant arguments cause a significant amount of code in the callee to be		/// constant arguments cause a significant amount of code in the callee to be
/// dead. Since this doesn't produce an exact copy of the input, it can't be		/// dead. Since this doesn't produce an exact copy of the input, it can't be
/// used for things like CloneFunction or CloneModule.		/// used for things like CloneFunction or CloneModule.
void llvm::CloneAndPruneFunctionInto(Function *NewFunc, const Function *OldFunc,		void llvm::CloneAndPruneFunctionInto(
ValueToValueMapTy &VMap,		Function *NewFunc, const Function *OldFunc, ValueToValueMapTy &VMap,
bool ModuleLevelChanges,		bool ModuleLevelChanges, SmallVectorImpl<ReturnInst *> &Returns,
SmallVectorImpl<ReturnInst*> &Returns,		const char *NameSuffix, ClonedCodeInfo *CodeInfo, Instruction *TheCall,
const char *NameSuffix,		BlockCloningFunctor Ftor) {
ClonedCodeInfo *CodeInfo,
Instruction *TheCall) {
CloneAndPruneIntoFromInst(NewFunc, OldFunc, &OldFunc->front().front(), VMap,		CloneAndPruneIntoFromInst(NewFunc, OldFunc, &OldFunc->front().front(), VMap,
ModuleLevelChanges, Returns, NameSuffix, CodeInfo);		ModuleLevelChanges, Returns, NameSuffix, CodeInfo,
		Ftor);
}		}

/// \brief Remaps instructions in \p Blocks using the mapping in \p VMap.		/// \brief Remaps instructions in \p Blocks using the mapping in \p VMap.
void llvm::remapInstructionsInBlocks(		void llvm::remapInstructionsInBlocks(
const SmallVectorImpl<BasicBlock *> &Blocks, ValueToValueMapTy &VMap) {		const SmallVectorImpl<BasicBlock *> &Blocks, ValueToValueMapTy &VMap) {
// Rewrite the code to refer to itself.		// Rewrite the code to refer to itself.
for (auto *BB : Blocks)		for (auto *BB : Blocks)
for (auto &Inst : *BB)		for (auto &Inst : *BB)
[59 unchanged lines hidden]

lib/Transforms/Utils/InlineFunction.cpp

[1,313 unchanged lines hidden]
bool llvm::InlineFunction(CallSite CS, InlineFunctionInfo &IFI,		bool llvm::InlineFunction(CallSite CS, InlineFunctionInfo &IFI,
AAResults *CalleeAAR, bool InsertLifetime) {		AAResults *CalleeAAR, bool InsertLifetime) {
Instruction *TheCall = CS.getInstruction();		Instruction *TheCall = CS.getInstruction();
assert(TheCall->getParent() && TheCall->getParent()->getParent() &&		assert(TheCall->getParent() && TheCall->getParent()->getParent() &&
"Instruction not in function!");		"Instruction not in function!");

// If IFI has any state in it, zap it before we fill it in.		// If IFI has any state in it, zap it before we fill it in.
IFI.reset();		IFI.reset();

const Function *CalledFunc = CS.getCalledFunction();		const Function *CalledFunc = CS.getCalledFunction();
if (!CalledFunc || // Can't inline external function or indirect		if (!CalledFunc || // Can't inline external function or indirect
CalledFunc->isDeclaration() || // call, or call to a vararg function!		CalledFunc->isDeclaration() || // call, or call to a vararg function!
CalledFunc->getFunctionType()->isVarArg()) return false;		CalledFunc->getFunctionType()->isVarArg()) return false;

// The inliner does not know how to inline through calls with operand bundles		// The inliner does not know how to inline through calls with operand bundles
// in general ...		// in general ...
if (CS.hasOperandBundles()) {		if (CS.hasOperandBundles()) {
[137 unchanged lines hidden; context: Function::iterator FirstNewBlock;]
AddAlignmentAssumptions(CS, IFI);		AddAlignmentAssumptions(CS, IFI);

// We want the inliner to prune the code as it copies. We would LOVE to		// We want the inliner to prune the code as it copies. We would LOVE to
// have no dead or constant instructions leftover after inlining occurs		// have no dead or constant instructions leftover after inlining occurs
// (which can happen, e.g., because an argument was constant), but we'll be		// (which can happen, e.g., because an argument was constant), but we'll be
// happy with whatever the cloner can do.		// happy with whatever the cloner can do.
CloneAndPruneFunctionInto(Caller, CalledFunc, VMap,		CloneAndPruneFunctionInto(Caller, CalledFunc, VMap,
/*ModuleLevelChanges=*/false, Returns, ".i",		/*ModuleLevelChanges=*/false, Returns, ".i",
&InlinedFunctionInfo, TheCall);		&InlinedFunctionInfo, TheCall, IFI.Ftor);

// Remember the first block that is newly cloned over.		// Remember the first block that is newly cloned over.
FirstNewBlock = LastBlock; ++FirstNewBlock;		FirstNewBlock = LastBlock; ++FirstNewBlock;

// Inject byval arguments initialization.		// Inject byval arguments initialization.
for (std::pair<Value*, Value*> &Init : ByValInit)		for (std::pair<Value*, Value*> &Init : ByValInit)
HandleByValArgumentInit(Init.first, Init.second, Caller->getParent(),		HandleByValArgumentInit(Init.first, Init.second, Caller->getParent(),
&*FirstNewBlock, IFI);		&*FirstNewBlock, IFI);
[524 unchanged lines hidden]

test/Transforms/Inline/function-count-update-2.ll

This file was added.

				; RUN: opt < %s -inline -S | FileCheck %s

				; This tests that the function count of a callee gets correctly updated after it
				; has been inlined into two callsites.

				; CHECK: @callee() !prof [[COUNT:![0-9]+]]
				define i32 @callee() !prof !1 {
				ret i32 0
				}

				define i32 @caller1() !prof !2 {
				%i = call i32 @callee()
				ret i32 %i
				}

				define i32 @caller2() !prof !3 {
				%i = call i32 @callee()
				ret i32 %i
				}

				!llvm.module.flags = !{!0}
				; CHECK: [[COUNT]] = !{!"function_entry_count", i64 0}
				!0 = !{i32 1, !"MaxFunctionCount", i32 1000}
				!1 = !{!"function_entry_count", i64 1000}
				!2 = !{!"function_entry_count", i64 600}
				!3 = !{!"function_entry_count", i64 400}
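The expected count of 0 in the CHECK line follows from subtracting each inlined callsite's estimated count from the callee's entry count: 1000 - 600 - 400 = 0. Below is a sketch of that arithmetic in isolation. The callsite estimate uses the `CallerEntryCount * CallSiteFreq / CallerEntryFreq` form quoted in the review comments; the helper names and the clamp at zero are assumptions made for illustration, not code taken from the patch.

```cpp
#include <cassert>
#include <cstdint>

// Estimate how often a callsite executes from the caller's entry count and
// the ratio of the callsite's block frequency to the entry block frequency.
// Hypothetical helper; overflow of the product is not handled in this sketch.
std::uint64_t estimateCallSiteCount(std::uint64_t CallerEntryCount,
                                    std::uint64_t CallSiteFreq,
                                    std::uint64_t CallerEntryFreq) {
  if (CallerEntryFreq == 0)
    return 0; // No usable frequency information.
  return CallerEntryCount * CallSiteFreq / CallerEntryFreq;
}

// Remove an inlined callsite's share from the callee's entry count.
// Clamping at zero is an assumption, so that an over-estimate cannot wrap.
std::uint64_t subtractCallSiteCount(std::uint64_t CalleeEntryCount,
                                    std::uint64_t CallSiteCount) {
  return CalleeEntryCount > CallSiteCount ? CalleeEntryCount - CallSiteCount
                                          : 0;
}
```

With both calls sitting in their callers' entry blocks, the frequency ratio is 1, so the estimates are simply the callers' entry counts of 600 and 400.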

test/Transforms/Inline/function-count-update-3.ll

This file was added.

				; RUN: opt < %s -inline -S -inline-threshold=40 | FileCheck %s

				; This tests that the function count of a function gets properly scaled after
				; inlining a call chain leading to the function.
				; Function a calls c with count 200 (C1)
				; Function b calls c with count 300
				; Function c calls e with count 250 (C2)
				; Entry count of e is 500 (C3)
				; c->e inlining does not happen since the cost exceeds threshold.
				; c then inlined into a.
				; e now gets inlined into a (through c) since the branch condition in e is now
				; known and hence the cost gets reduced.
				; Estimated count of a->e callsite = C2 * (C1 / C3)
				; Estimated count of a->e callsite = 250 * (200 / 500) = 100
				; Remaining count of e = C3 - 100 = 500 - 100 = 400

				@data = external global i32

				define i32 @a(i32 %a1) !prof !1 {
				%a2 = call i32 @c(i32 %a1, i32 1)
				ret i32 %a2
				}

				define i32 @b(i32 %b1) !prof !2 {
				%b2 = call i32 @c(i32 %b1, i32 %b1)
				ret i32 %b2
				}

				define i32 @c(i32 %c1, i32 %c100) !prof !3 {
				%cond = icmp sle i32 %c1, 1
				br i1 %cond, label %cond_true, label %cond_false

				cond_false:
				ret i32 0

				cond_true:
				%c11 = call i32 @e(i32 %c100)
				ret i32 %c11
				}

				; CHECK: @e(i32 %c1) !prof [[COUNT:![0-9]+]]
				define i32 @e(i32 %c1) !prof !4 {
				%cond = icmp sle i32 %c1, 1
				br i1 %cond, label %cond_true, label %cond_false

				cond_false:
				%c2 = load i32, i32* @data, align 4
				%c3 = add i32 %c1, %c2
				%c4 = mul i32 %c3, %c2
				%c5 = add i32 %c4, %c2
				%c6 = mul i32 %c5, %c2
				%c7 = add i32 %c6, %c2
				%c8 = mul i32 %c7, %c2
				%c9 = add i32 %c8, %c2
				%c10 = mul i32 %c9, %c2
				ret i32 %c10

				cond_true:
				ret i32 0
				}

				!llvm.module.flags = !{!0}
				; CHECK: [[COUNT]] = !{!"function_entry_count", i64 400}
				!0 = !{i32 1, !"MaxFunctionCount", i32 5000}
				!1 = !{!"function_entry_count", i64 200}
				!2 = !{!"function_entry_count", i64 300}
				!3 = !{!"function_entry_count", i64 500}
				!4 = !{!"function_entry_count", i64 500}
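The scaling described in the test comments can be checked by hand: the a->e callsite is estimated at C2 * (C1 / C3) = 250 * (200 / 500) = 100, which leaves 500 - 100 = 400 for e. Below is a sketch of that calculation. The function names are hypothetical, and multiplying before dividing is a choice made here to avoid integer truncation, not something the diff confirms. The comments label C3 as the entry count of e; the entry count of c (`!3`) is also 500 in this test, so the result is the same under either reading.

```cpp
#include <cassert>
#include <cstdint>

// Scale a callsite count recorded inside an inlined function (C2) by the
// fraction of that function's executions that came through the inlined
// callsite (C1 / C3). Hypothetical helper; overflow is not handled.
std::uint64_t scaleCallSiteCount(std::uint64_t C2, std::uint64_t C1,
                                 std::uint64_t C3) {
  if (C3 == 0)
    return 0; // No profile to scale against.
  return C2 * C1 / C3; // Multiply first so 200 / 500 does not truncate to 0.
}

// Entry count left on the callee after the scaled callsite is inlined.
// Clamping at zero is an assumption of this sketch.
std::uint64_t remainingEntryCount(std::uint64_t EntryCount,
                                  std::uint64_t InlinedCount) {
  return EntryCount > InlinedCount ? EntryCount - InlinedCount : 0;
}
```

Plugging in the test's numbers, `scaleCallSiteCount(250, 200, 500)` gives 100 and `remainingEntryCount(500, 100)` gives 400, matching the CHECK line.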

test/Transforms/Inline/function-count-update.ll

This file was added.

				; RUN: opt < %s -inline -S | FileCheck %s

				; This tests that the function count of a callee gets correctly updated after it
				; has been inlined into a single callsite.
				davidxl (Done): into two callsites in the same block.

				; CHECK: @callee(i32 %n) !prof [[COUNT:![0-9]+]]
				davidxl (Not Done): in a single callee or in a single BB? The test intends to verify that the profile update of the second inlined callsite is correct after block split -- so having the callee identical in the two sites is not essential to the test.
				eraman (Author, Not Done): This is a typo - I meant to write "in the same caller". I have fixed the comments. Yes, for the purpose of this test, we don't need the callee to be the same. I have made the caller call 2 different functions. I also don't think making the calls in a block other than the entry makes any difference, but since it doesn't hurt, I've made that change as well.
				define i32 @callee(i32 %n) !prof !1 {
				%cond = icmp sle i32 %n, 10
				br i1 %cond, label %cond_true, label %cond_false

				cond_true:
				%r1 = add i32 %n, 1
				ret i32 %r1
				cond_false:
				%r2 = add i32 %n, 2
				ret i32 %r2
				}

				define i32 @caller(i32 %n) !prof !2 {
				%i = call i32 @callee(i32 %n)
				davidxl (Not Done): Can you make these two calls in a block that is not the entry block?
				eraman (Author, Not Done): That does not matter for the purposes of this test case.
				%j = call i32 @callee(i32 %i)
				ret i32 %j
				davidxl (Done): The purpose is to verify that copyBlockFrequency after bb split works in general, not just for the entry block.
				}

				!llvm.module.flags = !{!0}
				; CHECK: [[COUNT]] = !{!"function_entry_count", i64 200}
				!0 = !{i32 1, !"MaxFunctionCount", i32 1000}
				!1 = !{!"function_entry_count", i64 1000}
				!2 = !{!"function_entry_count", i64 400}

Revision Contents

Diff 46720