This is an archive of the discontinued LLVM Phabricator instance.

Replace the use of MaxFunctionCount module flag
ClosedPublic

Authored by eraman on Mar 30 2016, 2:19 PM.

Details

Summary

This provides a convenient way to get the ProfileSummary for a module, and uses the maximum function count from the profile summary instead of calling getMaximumFunctionCount. getMaximumFunctionCount itself is not removed in this patch; that will be done in a later patch.

Diff Detail

Repository
rL LLVM

Event Timeline

eraman updated this revision to Diff 52122.Mar 30 2016, 2:19 PM
eraman retitled this revision from to Replace the use of MaxFunctionCount module flag.
eraman updated this object.
eraman added a reviewer: vsk.
eraman added subscribers: davidxl, llvm-commits.
vsk edited edge metadata.Mar 30 2016, 2:40 PM

Thanks, comments inline --

include/llvm/ProfileData/ProfileCommon.h
75 ↗(On Diff #52122)

Hm, this won't work in multi-threaded environments.

You can either sink this into LLVMContext (preferred!), or make it an llvm::ManagedStatic defined outside of ProfileSummary for more visibility. If you go with the second option, you'll also need a mutex around all cache accesses.

Somewhere down the line you might consider making this cache some kind of (Small)?DenseMap.

lib/ProfileData/ProfileSummary.cpp
36 ↗(On Diff #52122)

See comment above.

371 ↗(On Diff #52122)

nit: I prefer if (Metadata *MD = M->getProfileSummary()) return ...;, but I'll leave it up to you.

unittests/ProfileData/ProfileSummaryTest.cpp
23 ↗(On Diff #52122)

I think this sort of initialization code is supposed to go into a void SetUp() method. Let gtest do its own thing with the test constructor.

52 ↗(On Diff #52122)

It looks like this is worth lifting into a friend bool operator==(...) method for ProfileSummary. I think that would make things a bit easier to read; maybe clients would like to be able to compare summaries too. Wdyt?

eraman added inline comments.Mar 30 2016, 2:51 PM
include/llvm/ProfileData/ProfileCommon.h
75 ↗(On Diff #52122)

Yes indeed. Perhaps it is better to not cache this and instead let the clients cache the summary?

As to the use of a DenseMap, is this ever needed? I believe (and I may be wrong) that the only place multiple modules are simultaneously present is in LTO mode, and there all modules are actually merged into one module, which is where the IPO optimizations take place.

Another crazy thought - we could just cache the ProfileSummary without any key, since all modules compiled in a single process are bound to have the same profile summary (the summary is not module specific).

vsk added inline comments.Mar 30 2016, 2:53 PM
include/llvm/ProfileData/ProfileCommon.h
75 ↗(On Diff #52122)

Sure, using a DenseMap is probably overkill. Having a one-entry cache seems useful though.

Regarding your last point: it's cheap to keep the Module* as a key, and keeping it makes the code less "magical" (no hidden assumptions about which Module you're referring to).

eraman added inline comments.Mar 30 2016, 3:55 PM
include/llvm/ProfileData/ProfileCommon.h
75 ↗(On Diff #52122)

Yeah, that's why I didn't go that route.

I don't see a way to make this part of LLVMContextImpl, since ProfileSummary is not part of Core (and we obviously don't want Core to depend on ProfileData). In fact, that is the reason Module's getProfileSummary returns Metadata instead of ProfileSummary. So the only way to make this work is to make it a ManagedStatic, right?

vsk added inline comments.Mar 30 2016, 4:02 PM
include/llvm/ProfileData/ProfileCommon.h
75 ↗(On Diff #52122)

There's one more possibility. What if we cache ProfileSummary objects in the clients of this API, instead of in getProfileSummary?

E.g we could cache the summary in CallAnalyzer. An LLVMContext should be readily available there, and that seems like the right place to put this.

eraman added inline comments.Mar 30 2016, 4:27 PM
include/llvm/ProfileData/ProfileCommon.h
75 ↗(On Diff #52122)

That's what I meant by "Perhaps it is better to not cache this and instead let the clients cache the summary". This won't require using LLVMContext, as the cached summary will be part of the client object (in this case it'll be in the Inliner and passed on to CallAnalyzer).

vsk added inline comments.Mar 30 2016, 4:29 PM
include/llvm/ProfileData/ProfileCommon.h
75 ↗(On Diff #52122)

Ah, sorry for misreading.

But, wouldn't the code be more generic if clients query the LLVMContext for the summary? That way, all clients of LLVMContext have access to the cached summary, not just the inliner.

eraman added inline comments.Mar 30 2016, 4:42 PM
include/llvm/ProfileData/ProfileCommon.h
75 ↗(On Diff #52122)

That would require the ProfileSummary definition to be visible to LLVMContext, wouldn't it? Or are you suggesting keeping a (Module *, ProfileSummary *) pair in LLVMContext, letting the client query it, and if there is no cached entry, computing the summary and updating the cache to make it available to other optimizations? IMO this is not worth the potential savings.

vsk added inline comments.Mar 30 2016, 4:55 PM
include/llvm/ProfileData/ProfileCommon.h
75 ↗(On Diff #52122)

I'm suggesting keeping a (Module *, ProfileSummary *) pair in LLVMContextImpl and exposing LLVMContext::getProfileSummary(Module *). The latter would take care of computing and caching the summary. We wouldn't have to expose ProfileCommon.h to all users of LLVMContext (we can just forward-declare ProfileSummary).

ISTM that this approach is identical to caching the summary in the inliner, but is easier to reuse. So, is your concern that it would be trickier to cache the summary in an LLVMContext vs. the inliner? Or would you prefer to not cache the summary anywhere?

eraman added inline comments.Mar 31 2016, 4:00 PM
include/llvm/ProfileData/ProfileCommon.h
75 ↗(On Diff #52122)

Sorry, I am still not getting it. To implement LLVMContext::getProfileSummary(Module *) in the .cpp file - specifically the part that computes the summary - we need the definition of ProfileSummary, not just the forward declaration. It would also have to call getFromMD, which is defined in ProfileSummary.cpp. In short, this would make LLVMCore depend on LLVMProfileData (or move the code in ProfileData into LLVMCore), which is something we don't want to do. Does this make sense?

One thing we could do is keep a ProfileSummary* in Module and use it as a cache populated by ProfileSummary::getProfileSummary(Module *). This would only require a forward declaration of ProfileSummary in Module.

vsk added inline comments.Apr 6 2016, 9:48 AM
include/llvm/ProfileData/ProfileCommon.h
75 ↗(On Diff #52122)

I'm sorry I missed your response! I understand the problem now. I think your solution (keep a ProfileSummary* in Module) is good.

eraman added inline comments.Apr 6 2016, 5:36 PM
include/llvm/ProfileData/ProfileCommon.h
75 ↗(On Diff #52122)

Unfortunately, this is not ideal either. Who would own the ProfileSummary object created from the Metadata? Module cannot, since it doesn't see ProfileSummary's destructor. I think the only options are to a) let the optimizations cache the summary or b) use a ManagedStatic.

vsk added inline comments.Apr 6 2016, 11:21 PM
include/llvm/ProfileData/ProfileCommon.h
75 ↗(On Diff #52122)

I see, we can't place a unique_ptr<ProfileSummary> into LLVMContext without exposing its class definition.

In that case, I prefer option (b) because it's a slightly more general solution.

davidxl added inline comments.Apr 7 2016, 11:41 AM
include/llvm/ProfileData/ProfileCommon.h
194 ↗(On Diff #52122)

I suggest doing some compile time measurements before adding the caching mechanism. Besides, this won't affect O2 compile time at all.

eraman added inline comments.Apr 12 2016, 1:14 PM
include/llvm/ProfileData/ProfileCommon.h
194 ↗(On Diff #52122)

Here are some numbers:

  • It takes ~1.05 µs (about 3000 cycles) per call to computeProfileSummary on a machine with an Intel Xeon E5-2690 running at 2.9 GHz. The time was obtained by using a NamedRegionTimer to measure 1M calls to computeProfileSummary and taking the user + sys time.
  • To put this in perspective: while compiling a large real application, the time taken by computeProfileSummary is 25-30% of the time taken by CallAnalyzer's analyzeCall. So not caching would result in a 25-30% increase in analyzeCall's time (analyzeCall is now the only client of computeProfileSummary).

Of course, as you point out, this only applies in PGO mode and doesn't affect O2 compiles, but it still seems worthwhile to cache this.

eraman added inline comments.Apr 12 2016, 2:20 PM
include/llvm/ProfileData/ProfileCommon.h
194 ↗(On Diff #52122)

I made the cache into a ManagedStatic and added a SmartScopedLock to control accesses to it. Now a call to getProfileSummary takes ~323 ns (averaged over 20M calls). Of this, 70% is system time.

What is the impact on overall compile time (using a large app)? On the other hand, if caching does not increase code complexity too much, it is also fine. Do you have an updated version of the patch?

eraman updated this revision to Diff 53590.Apr 13 2016, 10:52 AM
eraman edited edge metadata.

Use ManagedStatic and a mutex to access the cache

What is the impact on overall compile time (using a large app)? On the other hand, if caching does not increase code complexity too much, it is also fine. Do you have an updated version of the patch?

I was looking at the compilation time of a fairly large file with a higher threshold (1500) for hot callsites. In this case, the total time spent on inlining (not just inline cost analysis) was around 5.5% of the total compilation time (as reported by -time-passes). If this is representative, we are looking at an increase in compilation time of ~1.4% (25% of 5.5%). Not much, but this is low-hanging fruit, and the code with ManagedStatic is not much more complex.

davidxl added inline comments.Apr 13 2016, 11:38 PM
lib/ProfileData/ProfileSummary.cpp
116 ↗(On Diff #53590)

dyn_cast

unittests/ProfileData/ProfileSummaryTest.cpp
65 ↗(On Diff #53590)

redundant line.

eraman updated this revision to Diff 53809.Apr 14 2016, 4:35 PM

Address David's comments.

looks good to me. Check with Vedant in case there are more comments.

This revision was automatically updated to reflect the committed changes.