This is an archive of the discontinued LLVM Phabricator instance.

[SampleFDO] Don't let inliner treat warm callsite with inline instance in the profile as cold
ClosedPublic

Authored by wmi on Apr 6 2018, 9:16 AM.

Details

Reviewers

danielcdh
davidxl
javed.absar

Commits

rG0c2f6be662d8: [SampleFDO] Don't treat warm callsite with inline instance in the profile as…
rL332058: [SampleFDO] Don't treat warm callsite with inline instance in the profile as…

Summary

We found a performance issue in the current SampleFDO while triaging a regression. For a callsite with an inline instance in the profile, even if the hot callsite inliner cannot inline it, it may still execute often enough that the regular inliner should not later treat it as cold. Currently, however, if such a callsite is not inlined by the hot callsite inliner, and the BB containing the callsite gets no samples from its other instructions, the callsite ends up with no profile metadata annotated. In the regular inliner's cost analysis, a callsite that has no profile annotation while its caller has profile information is treated as cold.

The fix is to keep such warm callsites (which lack a profile because they were inlined in the profile) without profile metadata annotated. Other callsites whose parent BBs don't get any samples are explicitly annotated with a profile count of 0 (the profile metadata is not omitted). In the regular inliner's cost analysis, a callsite with no profile annotation is no longer treated as cold: we treat callsites as cold only when their profile count exists and is less than the cold cutoff value.
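
The rule described in the summary can be sketched as follows. This is only an illustration, not the patch's code: `ProfileCount`, `isColdCallSiteOld` and `isColdCallSiteNew` are hypothetical names, `std::optional` stands in for "profile metadata present or absent", and the comparison uses `<=` to mirror `isColdCount` in the diff.

```cpp
#include <cstdint>
#include <optional>

// Hypothetical stand-in for a callsite's annotated profile count.
// std::nullopt models "no profile metadata on the callsite".
using ProfileCount = std::optional<uint64_t>;

// Behaviour described as the existing one: a callsite without a count
// inside a caller that has profile information is assumed to be cold.
bool isColdCallSiteOld(ProfileCount Count, bool CallerHasProfile,
                       uint64_t ColdCutoff) {
  if (Count)
    return *Count <= ColdCutoff;
  return CallerHasProfile;
}

// Behaviour proposed in the summary: only an explicit count at or
// below the cold cutoff makes a callsite cold.
bool isColdCallSiteNew(ProfileCount Count, uint64_t ColdCutoff) {
  return Count && *Count <= ColdCutoff;
}
```

Under this sketch a warm callsite left without metadata is no longer cold, while a callsite explicitly annotated with a count of 0 still is.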

It fixes a 5% regression in the target application. I also evaluated it on two server benchmarks and found no performance difference there, but one server benchmark gets a 2% reduction in size.

I also evaluated other alternatives for fixing the issue, such as relaxing the hotness criteria in the hot callsite inliner, but the results were not as good as this strategy, probably because the regular inliner has more information about whether we should inline a warm callsite with a medium/small callee.

Diff Detail

Repository: rL LLVM

Event Timeline

wmi created this revision. Apr 6 2018, 9:16 AM

Herald added subscribers: kristof.beyls, eraman, javed.absar, sanjoy. Apr 6 2018, 9:16 AM

Update a comment in the code.

davidxl added inline comments. Apr 11 2018, 9:32 AM

lib/Analysis/ProfileSummaryInfo.cpp
252 (On Diff #141366): This can cause a problem if the caller function is newly added and there is no profile associated with it: all callsites there will be marked as cold.
lib/Transforms/IPO/SampleProfile.cpp
1300 (On Diff #141366): Instead of skipping it, would it be better to annotate it with a 'warm' profile count?

wmi added inline comments. Apr 11 2018, 10:00 AM

lib/Analysis/ProfileSummaryInfo.cpp
252 (On Diff #141366): You are right. That is a problem. I need to figure out how to avoid it.
lib/Transforms/IPO/SampleProfile.cpp
1300 (On Diff #141366): I considered that solution, but I was worried that by annotating the callsite with a warm profile count, it would be treated as warm while the callsites inside the callee would still be treated as cold after the current callsite is inlined. The issue I am worried about here is definitely more minor than the missing-profile issue for new functions introduced by a source change. I think this is still a good solution as long as the testing is fine, unless we can come up with a better one.

Thanks for the fix!

I think maybe a preferred way to fix this is to change SampleProfileLoader::inlineHotFunctions to inline these "warm" inlined callsites early. The current algorithm uses callsiteIsHot, which compares inline instance's total count to the caller's total count, which could be misleading if the caller is super large/hot. A better algorithm should compare inline instance's total count to PSI to get a global hotness. In this way, if the profile annotator thinks a callsite is not hot, the later inliner should *not* even try to inline it. This makes the design cleaner and more stable. WDYT?
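
The difference between the two checks discussed in this comment can be sketched in isolation. The first function follows the `callsiteIsHot` logic removed by this revision (a percentage of the caller's samples, default 0.1). The second is a simplified stand-in for a global check. The names and the plain `HotCountThreshold` parameter are hypothetical and only replace what PSI would provide.

```cpp
#include <cstdint>

// Relative check: hot if the inline instance accounts for at least
// ThresholdPercent of the samples collected in the caller.
bool isHotRelativeToCaller(uint64_t CallerTotal, uint64_t CallsiteTotal,
                           double ThresholdPercent = 0.1) {
  if (CallerTotal == 0 || CallsiteTotal == 0)
    return false; // Avoid division by zero; zero samples is trivially cold.
  double Percent =
      static_cast<double>(CallsiteTotal) / static_cast<double>(CallerTotal) *
      100.0;
  return Percent >= ThresholdPercent;
}

// Global check: hot if the inline instance's total count reaches a
// program-wide hot count threshold (assumed to be supplied by the caller).
bool isHotGlobally(uint64_t CallsiteTotal, uint64_t HotCountThreshold) {
  return CallsiteTotal >= HotCountThreshold;
}
```

With a very hot caller (say 10,000,000 samples) an inline instance with 5,000 samples fails the relative check even though it may clear a global threshold, which is the misleading case the comment describes.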

In D45377#1068438, @danielcdh wrote:

Thanks for the fix!

I think maybe a preferred way to fix this is to change SampleProfileLoader::inlineHotFunctions to inline these "warm" inlined callsites early. The current algorithm uses callsiteIsHot, which compares inline instance's total count to the caller's total count, which could be misleading if the caller is super large/hot. A better algorithm should compare inline instance's total count to PSI to get a global hotness. In this way, if the profile annotator thinks a callsite is not hot, the later inliner should *not* even try to inline it. This makes the design cleaner and more stable. WDYT?

I tried the idea of computing the inline instance's total count divided by its BB count and comparing the result to the PSI hot threshold. That improved the regressed benchmark but did not recover the whole regression. That is why I chose to keep the current callsiteIsHot check in the early inliner unchanged: I guessed the regular inliner may be in a better position to decide whether to inline such a warm, medium-size callsite.

Tried David's suggestion and found the tests were good. The original regression for the target benchmark was recovered and we even got a little improvement. Another two server benchmarks had no performance change.

Patch was updated accordingly.

In D45377#1068853, @wmi wrote:

I tried the idea of computing the inline instance's total count divided by its BB count and comparing the result to the PSI hot threshold. That improved the regressed benchmark but did not recover the whole regression. That is why I chose to keep the current callsiteIsHot check in the early inliner unchanged: I guessed the regular inliner may be in a better position to decide whether to inline such a warm, medium-size callsite.

I suppose the regression comes from iterative-AutoFDO?

The problem with letting the regular inliner handle warm callsites is that the callee may have its profile missing if it is fully inlined. Maybe instead of comparing total_count/num_callee_bb to the hot threshold, just compare total_count to the hot threshold? I agree this may increase code size a little, but it should not be worse than the previous AFDO binary?
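
The two candidate criteria under discussion can be written side by side. This is a minimal sketch: the function names and the plain integer parameters are hypothetical, and the real patch queries ProfileSummaryInfo rather than taking a threshold argument.

```cpp
#include <cstdint>

// Criterion tried first: the average count per callee basic block is
// compared against the hot threshold.
bool isHotByAveragePerBlock(uint64_t TotalCount, uint64_t NumCalleeBBs,
                            uint64_t HotThreshold) {
  if (NumCalleeBBs == 0)
    return false; // Nothing to average over.
  return TotalCount / NumCalleeBBs >= HotThreshold;
}

// Criterion suggested here: the total count alone is compared against
// the hot threshold.
bool isHotByTotal(uint64_t TotalCount, uint64_t HotThreshold) {
  return TotalCount >= HotThreshold;
}
```

Because the average can never exceed the total, every callsite accepted by the first criterion is also accepted by the second. This is consistent with the expectation that the second may inline more and so increase code size a little.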

In D45377#1068900, @danielcdh wrote:

In D45377#1068853, @wmi wrote:

I tried the idea of computing the inline instance's total count divided by its BB count and comparing the result to the PSI hot threshold. That improved the regressed benchmark but did not recover the whole regression. That is why I chose to keep the current callsiteIsHot check in the early inliner unchanged: I guessed the regular inliner may be in a better position to decide whether to inline such a warm, medium-size callsite.

I suppose the regression comes from iterative-AutoFDO?

It is possible. Because it is only about a 1% regression uncovered by the change, I don't have a good way to measure exactly where it comes from. And 1% is within the fluctuation range the target benchmark allows.

The problem with letting the regular inliner handle warm callsites is that the callee may have its profile missing if it is fully inlined. Maybe instead of comparing total_count/num_callee_bb to the hot threshold, just compare total_count to the hot threshold? I agree this may increase code size a little, but it should not be worse than the previous AFDO binary?

Yes, that is the same concern I raised in my reply to David's suggestion, but the result seems fine. I can measure your suggested approach and see how it looks.

I tested the solution of comparing total_count to the hot threshold. For the two server benchmarks the performance did not change. For the regressed benchmark, however, it is a little worse than comparing total_count/num_callee_bb to the hot threshold: in two of my three runs the regression was larger than the fluctuation range the target benchmark allows. It is possible that some other side effect is at play here, but for now I don't have a detailed perf profile to find out.

Herald added a subscriber: chrib. Apr 18 2018, 8:42 AM

OK, I found out why comparing total_count to the hot threshold didn't recover the regression. It is indeed caused by a side effect. The different inlining disabled a jump threading, which in turn prevented a block of code from being sunk into a cold region by machine sinking. This led to the regression. The patch in https://reviews.llvm.org/D46275 can fix the issue in jump threading. With D46275 applied, comparing total_count to the hot threshold recovers all of the regression and even brings a small improvement for the benchmark.

I will update the patch using the solution of comparing total_count to hot threshold.

Herald added a reviewer: javed.absar. Apr 30 2018, 10:51 AM

Update the patch to use the solution of comparing total count to hot cutoff threshold.

danielcdh added inline comments. Apr 30 2018, 6:41 PM

lib/Transforms/IPO/SampleProfile.cpp
368 (On Diff #144647): In what situations will PSI be nullptr? If there are none, please assert it instead. Also, I think this will overwrite the later (PercentSamples >= SampleProfileHotThreshold) heuristic, and we should remove that flag.

wmi added inline comments. Apr 30 2018, 8:48 PM

lib/Transforms/IPO/SampleProfile.cpp
368 (On Diff #144647): PSI will not be nullptr. I will add an assertion. It is possible for a callsite's CallsiteTotalSamples to be less than the hot cutoff threshold while its PercentSamples is still larger than SampleProfileHotThreshold. My original plan was that if a callsite is inlined currently, the new heuristic would still keep it. But I checked where SampleProfileHotThreshold is used and found it is also used to populate the InlinedGUIDs set. To keep things simple and consistent, as you suggest, I may remove SampleProfileHotThreshold and the related heuristic.

Removed SampleProfileHotThreshold. The benchmarks showed no regressions. I am now testing the iterative AFDO result.

Iterative AFDO result is comparable with AFDO result.

Ping.

danielcdh accepted this revision. May 10 2018, 1:09 PM

This revision is now accepted and ready to land. May 10 2018, 1:09 PM

Closed by commit rL332058: [SampleFDO] Don't treat warm callsite with inline instance in the profile as… (authored by wmi). May 10 2018, 4:06 PM

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path                                                                         Size
llvm/trunk/include/llvm/Analysis/ProfileSummaryInfo.h                        6 lines
llvm/trunk/lib/Analysis/ProfileSummaryInfo.cpp                               12 lines
llvm/trunk/lib/Transforms/IPO/SampleProfile.cpp                              101 lines
llvm/trunk/test/Transforms/SampleProfile/Inputs/warm-inline-instance.prof    11 lines
llvm/trunk/test/Transforms/SampleProfile/function_metadata.ll                2 lines
llvm/trunk/test/Transforms/SampleProfile/inline.ll                           4 lines
llvm/trunk/test/Transforms/SampleProfile/warm-inline-instance.ll             115 lines

Diff 146250

llvm/trunk/include/llvm/Analysis/ProfileSummaryInfo.h

[104 lines not shown; context: public:]
   /// Returns true if BasicBlock \p B is considered hot.
   bool isHotBB(const BasicBlock *B, BlockFrequencyInfo *BFI);
   /// Returns true if BasicBlock \p B is considered cold.
   bool isColdBB(const BasicBlock *B, BlockFrequencyInfo *BFI);
   /// Returns true if CallSite \p CS is considered hot.
   bool isHotCallSite(const CallSite &CS, BlockFrequencyInfo *BFI);
   /// Returns true if Callsite \p CS is considered cold.
   bool isColdCallSite(const CallSite &CS, BlockFrequencyInfo *BFI);
+  /// Returns HotCountThreshold if set. Recompute HotCountThreshold
+  /// if not set.
+  uint64_t getOrCompHotCountThreshold();
+  /// Returns ColdCountThreshold if set. Recompute HotCountThreshold
+  /// if not set.
+  uint64_t getOrCompColdCountThreshold();
   /// Returns HotCountThreshold if set.
   uint64_t getHotCountThreshold() {
     return HotCountThreshold ? HotCountThreshold.getValue() : 0;
   }
   /// Returns ColdCountThreshold if set.
   uint64_t getColdCountThreshold() {
     return ColdCountThreshold ? ColdCountThreshold.getValue() : 0;
   }
[47 lines not shown]

llvm/trunk/lib/Analysis/ProfileSummaryInfo.cpp

[217 lines not shown]
 }

 bool ProfileSummaryInfo::isColdCount(uint64_t C) {
   if (!ColdCountThreshold)
     computeThresholds();
   return ColdCountThreshold && C <= ColdCountThreshold.getValue();
 }

+uint64_t ProfileSummaryInfo::getOrCompHotCountThreshold() {
+  if (!HotCountThreshold)
+    computeThresholds();
+  return HotCountThreshold && HotCountThreshold.getValue();
+}
+
+uint64_t ProfileSummaryInfo::getOrCompColdCountThreshold() {
+  if (!ColdCountThreshold)
+    computeThresholds();
+  return ColdCountThreshold && ColdCountThreshold.getValue();
+}
+
 bool ProfileSummaryInfo::isHotBB(const BasicBlock *B, BlockFrequencyInfo *BFI) {
   auto Count = BFI->getBlockProfileCount(B);
   return Count && isHotCount(*Count);
 }

 bool ProfileSummaryInfo::isColdBB(const BasicBlock *B,
                                   BlockFrequencyInfo *BFI) {
   auto Count = BFI->getBlockProfileCount(B);
[65 lines not shown]
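
One detail worth checking when reading the two new accessors: in C++, `A && B` yields a `bool`, so an expression of the form `Threshold && Threshold.getValue()` returned as `uint64_t` evaluates to 0 or 1 rather than to the stored value. The sketch below shows this with `std::optional` as a stand-in for `llvm::Optional` (an assumption made to keep it self-contained) and hypothetical function names. The review does not say whether this was intended.

```cpp
#include <cstdint>
#include <optional>

// Same shape as the return statements in the new accessors: the logical
// AND produces a bool, which is then converted to 0 or 1.
uint64_t thresholdViaLogicalAnd(const std::optional<uint64_t> &Threshold) {
  return Threshold && Threshold.value();
}

// Same shape as the existing getHotCountThreshold(): the conditional
// operator returns the stored value, or 0 when nothing is set.
uint64_t thresholdViaConditional(const std::optional<uint64_t> &Threshold) {
  return Threshold ? Threshold.value() : 0;
}
```

For a stored threshold of 500, the first form yields 1 and the second yields 500.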

llvm/trunk/lib/Transforms/IPO/SampleProfile.cpp

[31 lines not shown]
 #include "llvm/ADT/SmallVector.h"
 #include "llvm/ADT/StringMap.h"
 #include "llvm/ADT/StringRef.h"
 #include "llvm/ADT/Twine.h"
 #include "llvm/Analysis/AssumptionCache.h"
 #include "llvm/Analysis/InlineCost.h"
 #include "llvm/Analysis/LoopInfo.h"
 #include "llvm/Analysis/OptimizationRemarkEmitter.h"
+#include "llvm/Analysis/ProfileSummaryInfo.h"
 #include "llvm/Analysis/TargetTransformInfo.h"
 #include "llvm/IR/BasicBlock.h"
 #include "llvm/IR/CFG.h"
 #include "llvm/IR/CallSite.h"
 #include "llvm/IR/DebugInfoMetadata.h"
 #include "llvm/IR/DebugLoc.h"
 #include "llvm/IR/DiagnosticInfo.h"
 #include "llvm/IR/Dominators.h"
[56 lines not shown; context: static cl::opt<unsigned> SampleProfileRecordCoverage(]
     cl::desc("Emit a warning if less than N% of records in the input profile "
              "are matched to the IR."));

 static cl::opt<unsigned> SampleProfileSampleCoverage(
     "sample-profile-check-sample-coverage", cl::init(0), cl::value_desc("N"),
     cl::desc("Emit a warning if less than N% of samples in the input profile "
              "are matched to the IR."));

-static cl::opt<double> SampleProfileHotThreshold(
-    "sample-profile-inline-hot-threshold", cl::init(0.1), cl::value_desc("N"),
-    cl::desc("Inlined functions that account for more than N% of all samples "
-             "collected in the parent function, will be inlined again."));
-
 namespace {

 using BlockWeightMap = DenseMap<const BasicBlock *, uint64_t>;
 using EquivalenceClassMap = DenseMap<const BasicBlock *, const BasicBlock *>;
 using Edge = std::pair<const BasicBlock *, const BasicBlock *>;
 using EdgeWeightMap = DenseMap<Edge, uint64_t>;
 using BlockEdgeMap =
     DenseMap<const BasicBlock *, SmallVector<const BasicBlock *, 8>>;

 class SampleCoverageTracker {
 public:
   SampleCoverageTracker() = default;

   bool markSamplesUsed(const FunctionSamples *FS, uint32_t LineOffset,
                        uint32_t Discriminator, uint64_t Samples);
   unsigned computeCoverage(unsigned Used, unsigned Total) const;
-  unsigned countUsedRecords(const FunctionSamples *FS) const;
-  unsigned countBodyRecords(const FunctionSamples *FS) const;
+  unsigned countUsedRecords(const FunctionSamples *FS,
+                            ProfileSummaryInfo *PSI) const;
+  unsigned countBodyRecords(const FunctionSamples *FS,
+                            ProfileSummaryInfo *PSI) const;
   uint64_t getTotalUsedSamples() const { return TotalUsedSamples; }
-  uint64_t countBodySamples(const FunctionSamples *FS) const;
+  uint64_t countBodySamples(const FunctionSamples *FS,
+                            ProfileSummaryInfo *PSI) const;

   void clear() {
     SampleCoverage.clear();
     TotalUsedSamples = 0;
   }

 private:
   using BodySampleCoverageMap = std::map<LineLocation, unsigned>;
[36 lines not shown; context: SampleProfileLoader(]
       StringRef Name, bool IsThinLTOPreLink,
       std::function<AssumptionCache &(Function &)> GetAssumptionCache,
       std::function<TargetTransformInfo &(Function &)> GetTargetTransformInfo)
       : GetAC(std::move(GetAssumptionCache)),
         GetTTI(std::move(GetTargetTransformInfo)), Filename(Name),
         IsThinLTOPreLink(IsThinLTOPreLink) {}

   bool doInitialization(Module &M);
-  bool runOnModule(Module &M, ModuleAnalysisManager *AM);
+  bool runOnModule(Module &M, ModuleAnalysisManager *AM,
+                   ProfileSummaryInfo *_PSI);

   void dump() { Reader->dump(); }

 protected:
   bool runOnFunction(Function &F, ModuleAnalysisManager *AM);
   unsigned getFunctionLoc(Function &F);
   bool emitAnnotations(Function &F);
   ErrorOr<uint64_t> getInstWeight(const Instruction &I);
[82 lines not shown; context: protected:]
   bool ProfileIsValid = false;

   /// Flag indicating if the pass is invoked in ThinLTO compile phase.
   ///
   /// In this phase, in annotation, we should not promote indirect calls.
   /// Instead, we will mark GUIDs that needs to be annotated to the function.
   bool IsThinLTOPreLink;

+  /// Profile Summary Info computed from sample profile.
+  ProfileSummaryInfo *PSI = nullptr;
+
   /// Total number of samples collected in this profile.
   ///
   /// This is the sum of all the samples collected in all the functions executed
   /// at runtime.
   uint64_t TotalCollectedSamples = 0;

   /// Optimization Remark Emitter used to emit diagnostic remarks.
   OptimizationRemarkEmitter *ORE = nullptr;
[24 lines not shown; context: public:]
   }

   StringRef getPassName() const override { return "Sample profile pass"; }
   bool runOnModule(Module &M) override;

   void getAnalysisUsage(AnalysisUsage &AU) const override {
     AU.addRequired<AssumptionCacheTracker>();
     AU.addRequired<TargetTransformInfoWrapperPass>();
+    AU.addRequired<ProfileSummaryInfoWrapperPass>();
   }

 private:
   SampleProfileLoader SampleLoader;
   AssumptionCacheTracker *ACT = nullptr;
   TargetTransformInfoWrapperPass *TTIWP = nullptr;
 };

 } // end anonymous namespace

-/// Return true if the given callsite is hot wrt to its caller.
+/// Return true if the given callsite is hot wrt to hot cutoff threshold.
 ///
 /// Functions that were inlined in the original binary will be represented
 /// in the inline stack in the sample profile. If the profile shows that
 /// the original inline decision was "good" (i.e., the callsite is executed
 /// frequently), then we will recreate the inline decision and apply the
 /// profile from the inlined callsite.
 ///
-/// To decide whether an inlined callsite is hot, we compute the fraction
-/// of samples used by the callsite with respect to the total number of samples
-/// collected in the caller.
-///
-/// If that fraction is larger than the default given by
-/// SampleProfileHotThreshold, the callsite will be inlined again.
-static bool callsiteIsHot(const FunctionSamples *CallerFS,
-                          const FunctionSamples *CallsiteFS) {
+/// To decide whether an inlined callsite is hot, we compare the callsite
+/// sample count with the hot cutoff computed by ProfileSummaryInfo, it is
+/// regarded as hot if the count is above the cutoff value.
+static bool callsiteIsHot(const FunctionSamples *CallsiteFS,
+                          ProfileSummaryInfo *PSI) {
   if (!CallsiteFS)
     return false; // The callsite was not inlined in the original binary.

-  uint64_t ParentTotalSamples = CallerFS->getTotalSamples();
-  if (ParentTotalSamples == 0)
-    return false; // Avoid division by zero.
-
+  assert(PSI && "PSI is expected to be non null");
   uint64_t CallsiteTotalSamples = CallsiteFS->getTotalSamples();
-  if (CallsiteTotalSamples == 0)
-    return false; // Callsite is trivially cold.
-
-  double PercentSamples =
-      (double)CallsiteTotalSamples / (double)ParentTotalSamples * 100.0;
-  return PercentSamples >= SampleProfileHotThreshold;
+  return PSI->isHotCount(CallsiteTotalSamples);
 }

 /// Mark as used the sample record for the given function samples at
 /// (LineOffset, Discriminator).
 ///
 /// \returns true if this is the first time we mark the given record.
 bool SampleCoverageTracker::markSamplesUsed(const FunctionSamples *FS,
                                             uint32_t LineOffset,
                                             uint32_t Discriminator,
                                             uint64_t Samples) {
   LineLocation Loc(LineOffset, Discriminator);
   unsigned &Count = SampleCoverage[FS][Loc];
   bool FirstTime = (++Count == 1);
   if (FirstTime)
     TotalUsedSamples += Samples;
   return FirstTime;
 }

 /// Return the number of sample records that were applied from this profile.
 ///
 /// This count does not include records from cold inlined callsites.
 unsigned
-SampleCoverageTracker::countUsedRecords(const FunctionSamples *FS) const {
+SampleCoverageTracker::countUsedRecords(const FunctionSamples *FS,
+                                        ProfileSummaryInfo *PSI) const {
   auto I = SampleCoverage.find(FS);

   // The size of the coverage map for FS represents the number of records
   // that were marked used at least once.
   unsigned Count = (I != SampleCoverage.end()) ? I->second.size() : 0;

   // If there are inlined callsites in this function, count the samples found
   // in the respective bodies. However, do not bother counting callees with 0
   // total samples, these are callees that were never invoked at runtime.
   for (const auto &I : FS->getCallsiteSamples())
     for (const auto &J : I.second) {
       const FunctionSamples *CalleeSamples = &J.second;
-      if (callsiteIsHot(FS, CalleeSamples))
-        Count += countUsedRecords(CalleeSamples);
+      if (callsiteIsHot(CalleeSamples, PSI))
+        Count += countUsedRecords(CalleeSamples, PSI);
     }

   return Count;
 }

 /// Return the number of sample records in the body of this profile.
 ///
 /// This count does not include records from cold inlined callsites.
 unsigned
-SampleCoverageTracker::countBodyRecords(const FunctionSamples *FS) const {
+SampleCoverageTracker::countBodyRecords(const FunctionSamples *FS,
+                                        ProfileSummaryInfo *PSI) const {
   unsigned Count = FS->getBodySamples().size();

   // Only count records in hot callsites.
   for (const auto &I : FS->getCallsiteSamples())
     for (const auto &J : I.second) {
       const FunctionSamples *CalleeSamples = &J.second;
-      if (callsiteIsHot(FS, CalleeSamples))
-        Count += countBodyRecords(CalleeSamples);
+      if (callsiteIsHot(CalleeSamples, PSI))
+        Count += countBodyRecords(CalleeSamples, PSI);
     }

   return Count;
 }

 /// Return the number of samples collected in the body of this profile.
 ///
 /// This count does not include samples from cold inlined callsites.
 uint64_t
-SampleCoverageTracker::countBodySamples(const FunctionSamples *FS) const {
+SampleCoverageTracker::countBodySamples(const FunctionSamples *FS,
+                                        ProfileSummaryInfo *PSI) const {
   uint64_t Total = 0;
   for (const auto &I : FS->getBodySamples())
     Total += I.second.getSamples();

   // Only count samples in hot callsites.
   for (const auto &I : FS->getCallsiteSamples())
     for (const auto &J : I.second) {
       const FunctionSamples *CalleeSamples = &J.second;
-      if (callsiteIsHot(FS, CalleeSamples))
-        Total += countBodySamples(CalleeSamples);
+      if (callsiteIsHot(CalleeSamples, PSI))
+        Total += countBodySamples(CalleeSamples, PSI);
     }

   return Total;
 }

 /// Return the fraction of sample records used in this profile.
 ///
 /// The returned value is an unsigned integer in the range 0-100 indicating
[311 unchanged lines hidden]		while (true) {
for (auto &BB : F) {		for (auto &BB : F) {
bool Hot = false;		bool Hot = false;
SmallVector<Instruction *, 10> Candidates;		SmallVector<Instruction *, 10> Candidates;
for (auto &I : BB.getInstList()) {		for (auto &I : BB.getInstList()) {
const FunctionSamples *FS = nullptr;		const FunctionSamples *FS = nullptr;
if ((isa<CallInst>(I) || isa<InvokeInst>(I)) &&		if ((isa<CallInst>(I) || isa<InvokeInst>(I)) &&
!isa<IntrinsicInst>(I) && (FS = findCalleeFunctionSamples(I))) {		!isa<IntrinsicInst>(I) && (FS = findCalleeFunctionSamples(I))) {
Candidates.push_back(&I);		Candidates.push_back(&I);
if (callsiteIsHot(Samples, FS))		if (callsiteIsHot(FS, PSI))
Hot = true;		Hot = true;
}		}
}		}
if (Hot) {		if (Hot) {
CIS.insert(CIS.begin(), Candidates.begin(), Candidates.end());		CIS.insert(CIS.begin(), Candidates.begin(), Candidates.end());
}		}
}		}
for (auto I : CIS) {		for (auto I : CIS) {
Function *CalledFunction = CallSite(I).getCalledFunction();		Function *CalledFunction = CallSite(I).getCalledFunction();
// Do not inline recursive calls.		// Do not inline recursive calls.
if (CalledFunction == &F)		if (CalledFunction == &F)
continue;		continue;
if (CallSite(I).isIndirectCall()) {		if (CallSite(I).isIndirectCall()) {
if (PromotedInsns.count(I))		if (PromotedInsns.count(I))
continue;		continue;
uint64_t Sum;		uint64_t Sum;
for (const auto FS : findIndirectCallFunctionSamples(I, Sum)) {		for (const auto FS : findIndirectCallFunctionSamples(I, Sum)) {
if (IsThinLTOPreLink) {		if (IsThinLTOPreLink) {
FS->findInlinedFunctions(InlinedGUIDs, F.getParent(),		FS->findInlinedFunctions(InlinedGUIDs, F.getParent(),
Samples->getTotalSamples() *		PSI->getOrCompHotCountThreshold());
SampleProfileHotThreshold / 100);
continue;		continue;
}		}
auto CalleeFunctionName = FS->getName();		auto CalleeFunctionName = FS->getName();
// If it is a recursive call, we do not inline it as it could bloat		// If it is a recursive call, we do not inline it as it could bloat
// the code exponentially. There is way to better handle this, e.g.		// the code exponentially. There is way to better handle this, e.g.
// clone the caller first, and inline the cloned caller if it is		// clone the caller first, and inline the cloned caller if it is
// recursive. As llvm does not inline recursive calls, we will		// recursive. As llvm does not inline recursive calls, we will
// simply ignore it instead of handling it explicitly.		// simply ignore it instead of handling it explicitly.
[22 unchanged lines hidden]		for (auto I : CIS) {
}		}
}		}
} else if (CalledFunction && CalledFunction->getSubprogram() &&		} else if (CalledFunction && CalledFunction->getSubprogram() &&
!CalledFunction->isDeclaration()) {		!CalledFunction->isDeclaration()) {
if (inlineCallInstruction(I))		if (inlineCallInstruction(I))
LocalChanged = true;		LocalChanged = true;
} else if (IsThinLTOPreLink) {		} else if (IsThinLTOPreLink) {
findCalleeFunctionSamples(*I)->findInlinedFunctions(		findCalleeFunctionSamples(*I)->findInlinedFunctions(
InlinedGUIDs, F.getParent(),		InlinedGUIDs, F.getParent(), PSI->getOrCompHotCountThreshold());
Samples->getTotalSamples() * SampleProfileHotThreshold / 100);
}		}
}		}
if (LocalChanged) {		if (LocalChanged) {
Changed = true;		Changed = true;
} else {		} else {
break;		break;
}		}
}		}
[618 unchanged lines hidden]		if (Changed) {
findEquivalenceClasses(F);		findEquivalenceClasses(F);

// Propagate weights to all edges.		// Propagate weights to all edges.
propagateWeights(F);		propagateWeights(F);
}		}

// If coverage checking was requested, compute it now.		// If coverage checking was requested, compute it now.
if (SampleProfileRecordCoverage) {		if (SampleProfileRecordCoverage) {
unsigned Used = CoverageTracker.countUsedRecords(Samples);		unsigned Used = CoverageTracker.countUsedRecords(Samples, PSI);
unsigned Total = CoverageTracker.countBodyRecords(Samples);		unsigned Total = CoverageTracker.countBodyRecords(Samples, PSI);
unsigned Coverage = CoverageTracker.computeCoverage(Used, Total);		unsigned Coverage = CoverageTracker.computeCoverage(Used, Total);
if (Coverage < SampleProfileRecordCoverage) {		if (Coverage < SampleProfileRecordCoverage) {
F.getContext().diagnose(DiagnosticInfoSampleProfile(		F.getContext().diagnose(DiagnosticInfoSampleProfile(
F.getSubprogram()->getFilename(), getFunctionLoc(F),		F.getSubprogram()->getFilename(), getFunctionLoc(F),
Twine(Used) + " of " + Twine(Total) + " available profile records (" +		Twine(Used) + " of " + Twine(Total) + " available profile records (" +
Twine(Coverage) + "%) were applied",		Twine(Coverage) + "%) were applied",
DS_Warning));		DS_Warning));
}		}
}		}

if (SampleProfileSampleCoverage) {		if (SampleProfileSampleCoverage) {
uint64_t Used = CoverageTracker.getTotalUsedSamples();		uint64_t Used = CoverageTracker.getTotalUsedSamples();
uint64_t Total = CoverageTracker.countBodySamples(Samples);		uint64_t Total = CoverageTracker.countBodySamples(Samples, PSI);
unsigned Coverage = CoverageTracker.computeCoverage(Used, Total);		unsigned Coverage = CoverageTracker.computeCoverage(Used, Total);
if (Coverage < SampleProfileSampleCoverage) {		if (Coverage < SampleProfileSampleCoverage) {
F.getContext().diagnose(DiagnosticInfoSampleProfile(		F.getContext().diagnose(DiagnosticInfoSampleProfile(
F.getSubprogram()->getFilename(), getFunctionLoc(F),		F.getSubprogram()->getFilename(), getFunctionLoc(F),
Twine(Used) + " of " + Twine(Total) + " available profile samples (" +		Twine(Used) + " of " + Twine(Total) + " available profile samples (" +
Twine(Coverage) + "%) were applied",		Twine(Coverage) + "%) were applied",
DS_Warning));		DS_Warning));
}		}
}		}
return Changed;		return Changed;
}		}

char SampleProfileLoaderLegacyPass::ID = 0;		char SampleProfileLoaderLegacyPass::ID = 0;

INITIALIZE_PASS_BEGIN(SampleProfileLoaderLegacyPass, "sample-profile",		INITIALIZE_PASS_BEGIN(SampleProfileLoaderLegacyPass, "sample-profile",
"Sample Profile loader", false, false)		"Sample Profile loader", false, false)
INITIALIZE_PASS_DEPENDENCY(AssumptionCacheTracker)		INITIALIZE_PASS_DEPENDENCY(AssumptionCacheTracker)
INITIALIZE_PASS_DEPENDENCY(TargetTransformInfoWrapperPass)		INITIALIZE_PASS_DEPENDENCY(TargetTransformInfoWrapperPass)
		INITIALIZE_PASS_DEPENDENCY(ProfileSummaryInfoWrapperPass)
INITIALIZE_PASS_END(SampleProfileLoaderLegacyPass, "sample-profile",		INITIALIZE_PASS_END(SampleProfileLoaderLegacyPass, "sample-profile",
"Sample Profile loader", false, false)		"Sample Profile loader", false, false)

bool SampleProfileLoader::doInitialization(Module &M) {		bool SampleProfileLoader::doInitialization(Module &M) {
auto &Ctx = M.getContext();		auto &Ctx = M.getContext();
auto ReaderOrErr = SampleProfileReader::create(Filename, Ctx);		auto ReaderOrErr = SampleProfileReader::create(Filename, Ctx);
if (std::error_code EC = ReaderOrErr.getError()) {		if (std::error_code EC = ReaderOrErr.getError()) {
std::string Msg = "Could not open profile: " + EC.message();		std::string Msg = "Could not open profile: " + EC.message();
Ctx.diagnose(DiagnosticInfoSampleProfile(Filename, Msg));		Ctx.diagnose(DiagnosticInfoSampleProfile(Filename, Msg));
return false;		return false;
}		}
Reader = std::move(ReaderOrErr.get());		Reader = std::move(ReaderOrErr.get());
ProfileIsValid = (Reader->read() == sampleprof_error::success);		ProfileIsValid = (Reader->read() == sampleprof_error::success);
return true;		return true;
}		}

ModulePass *llvm::createSampleProfileLoaderPass() {		ModulePass *llvm::createSampleProfileLoaderPass() {
return new SampleProfileLoaderLegacyPass(SampleProfileFile);		return new SampleProfileLoaderLegacyPass(SampleProfileFile);
}		}

ModulePass *llvm::createSampleProfileLoaderPass(StringRef Name) {		ModulePass *llvm::createSampleProfileLoaderPass(StringRef Name) {
return new SampleProfileLoaderLegacyPass(Name);		return new SampleProfileLoaderLegacyPass(Name);
}		}

bool SampleProfileLoader::runOnModule(Module &M, ModuleAnalysisManager *AM) {		bool SampleProfileLoader::runOnModule(Module &M, ModuleAnalysisManager *AM,
		ProfileSummaryInfo *_PSI) {
if (!ProfileIsValid)		if (!ProfileIsValid)
return false;		return false;

		PSI = _PSI;
		if (M.getProfileSummary() == nullptr)
		M.setProfileSummary(Reader->getSummary().getMD(M.getContext()));

// Compute the total number of samples collected in this profile.		// Compute the total number of samples collected in this profile.
for (const auto &I : Reader->getProfiles())		for (const auto &I : Reader->getProfiles())
TotalCollectedSamples += I.second.getTotalSamples();		TotalCollectedSamples += I.second.getTotalSamples();

// Populate the symbol map.		// Populate the symbol map.
for (const auto &N_F : M.getValueSymbolTable()) {		for (const auto &N_F : M.getValueSymbolTable()) {
StringRef OrigName = N_F.getKey();		StringRef OrigName = N_F.getKey();
Function *F = dyn_cast<Function>(N_F.getValue());		Function *F = dyn_cast<Function>(N_F.getValue());
[14 unchanged lines hidden]		bool SampleProfileLoader::runOnModule(Module &M, ModuleAnalysisManager *AM,
}		}

bool retval = false;		bool retval = false;
for (auto &F : M)		for (auto &F : M)
if (!F.isDeclaration()) {		if (!F.isDeclaration()) {
clearFunctionData();		clearFunctionData();
retval |= runOnFunction(F, AM);		retval |= runOnFunction(F, AM);
}		}
if (M.getProfileSummary() == nullptr)
M.setProfileSummary(Reader->getSummary().getMD(M.getContext()));
return retval;		return retval;
}		}

bool SampleProfileLoaderLegacyPass::runOnModule(Module &M) {		bool SampleProfileLoaderLegacyPass::runOnModule(Module &M) {
ACT = &getAnalysis<AssumptionCacheTracker>();		ACT = &getAnalysis<AssumptionCacheTracker>();
TTIWP = &getAnalysis<TargetTransformInfoWrapperPass>();		TTIWP = &getAnalysis<TargetTransformInfoWrapperPass>();
return SampleLoader.runOnModule(M, nullptr);		ProfileSummaryInfo *PSI =
		getAnalysis<ProfileSummaryInfoWrapperPass>().getPSI();
		return SampleLoader.runOnModule(M, nullptr, PSI);
}		}

bool SampleProfileLoader::runOnFunction(Function &F, ModuleAnalysisManager *AM) {		bool SampleProfileLoader::runOnFunction(Function &F, ModuleAnalysisManager *AM) {
// Initialize the entry count to -1, which will be treated conservatively		// Initialize the entry count to -1, which will be treated conservatively
// by getEntryCount as the same as unknown (None). If we have samples this		// by getEntryCount as the same as unknown (None). If we have samples this
// will be overwritten in emitAnnotations.		// will be overwritten in emitAnnotations.
F.setEntryCount(ProfileCount(-1, Function::PCT_Real));		F.setEntryCount(ProfileCount(-1, Function::PCT_Real));
std::unique_ptr<OptimizationRemarkEmitter> OwnedORE;		std::unique_ptr<OptimizationRemarkEmitter> OwnedORE;
[25 unchanged lines hidden]		PreservedAnalyses SampleProfileLoaderPass::run(Module &M,
};		};

SampleProfileLoader SampleLoader(		SampleProfileLoader SampleLoader(
ProfileFileName.empty() ? SampleProfileFile : ProfileFileName,		ProfileFileName.empty() ? SampleProfileFile : ProfileFileName,
IsThinLTOPreLink, GetAssumptionCache, GetTTI);		IsThinLTOPreLink, GetAssumptionCache, GetTTI);

SampleLoader.doInitialization(M);		SampleLoader.doInitialization(M);

if (!SampleLoader.runOnModule(M, &AM))		ProfileSummaryInfo *PSI = &AM.getResult<ProfileSummaryAnalysis>(M);
		if (!SampleLoader.runOnModule(M, &AM, PSI))
return PreservedAnalyses::all();		return PreservedAnalyses::all();

return PreservedAnalyses::none();		return PreservedAnalyses::none();
}		}
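
The hunks above replace `callsiteIsHot(FS, CalleeSamples)` with `callsiteIsHot(CalleeSamples, PSI)`, and the ThinLTO pre-link threshold changes from `Samples->getTotalSamples() * SampleProfileHotThreshold / 100` to `PSI->getOrCompHotCountThreshold()`. The body of `callsiteIsHot` is not visible in this excerpt, so what follows is only a minimal sketch of the two criteria as those call sites suggest them: a ratio relative to the caller's samples versus an absolute count cutoff. The function names `callsiteIsHotOld` and `callsiteIsHotNew`, the `hotPercent` and `hotCountThreshold` parameters, and the exact comparison operators are assumptions made for illustration and are not taken from the patch.

```cpp
#include <cassert>
#include <cstdint>

// Sketch of the old, caller-relative rule (hypothetical helper): the callsite
// is hot when the inlined callee's samples exceed `hotPercent` percent of the
// caller's total samples. `hotPercent` stands in for SampleProfileHotThreshold.
bool callsiteIsHotOld(uint64_t callerTotal, uint64_t calleeTotal,
                      double hotPercent) {
  assert(hotPercent >= 0.0 && "percentage threshold must be non-negative");
  if (callerTotal == 0 || calleeTotal == 0)
    return false;
  double percent = static_cast<double>(calleeTotal) * 100.0 /
                   static_cast<double>(callerTotal);
  return percent > hotPercent;
}

// Sketch of the new, absolute rule (hypothetical helper): the callsite is hot
// when the callee's samples reach a program-wide hot cutoff.
// `hotCountThreshold` stands in for the value ProfileSummaryInfo would supply.
bool callsiteIsHotNew(uint64_t calleeTotal, uint64_t hotCountThreshold) {
  return calleeTotal >= hotCountThreshold;
}
```

With the counts from warm-inline-instance.prof (main: 2257150, foo: 5860, goo: 60) and made-up thresholds of 1% and 1000 samples, the caller-relative rule rejects foo (5860 is roughly 0.26% of 2257150), while the absolute rule accepts foo and still rejects goo. This matches the intent stated in the test comment: a callsite can be a small fraction of a very hot caller and still be hot in absolute terms.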

llvm/trunk/test/Transforms/SampleProfile/Inputs/warm-inline-instance.prof

				main:2257150:0
				 2.1: 5553
				 3: 5391
				 3.1: foo:5860
				  0: 5279
				  1: 5279
				  2: 5279
				 4.1: goo:60
				  0: 20
				  1: 20
				  2: 20

llvm/trunk/test/Transforms/SampleProfile/function_metadata.ll

	[22 unchanged lines hidden]
	; CHECK: define void @test_liveness({{.*}} !prof ![[ENTRY_TEST_LIVENESS:[0-9]+]]			; CHECK: define void @test_liveness({{.*}} !prof ![[ENTRY_TEST_LIVENESS:[0-9]+]]
	define void @test_liveness() !dbg !12 {			define void @test_liveness() !dbg !12 {
	call void @foo(), !dbg !20			call void @foo(), !dbg !20
	ret void			ret void
	}			}

	; GUIDs of foo, bar, foo1, foo2 and foo3 should be included in the metadata to			; GUIDs of foo, bar, foo1, foo2 and foo3 should be included in the metadata to
	; make sure hot inline stacks are imported.			; make sure hot inline stacks are imported.
	; CHECK: ![[ENTRY_TEST]] = !{!"function_entry_count", i64 1, i64 2494702099028631698, i64 6699318081062747564, i64 7682762345278052905, i64 -7908226060800700466, i64 -2012135647395072713}			; CHECK: ![[ENTRY_TEST]] = !{!"function_entry_count", i64 1, i64 2494702099028631698, i64 6699318081062747564, i64 7546896869197086323, i64 7682762345278052905, i64 -7908226060800700466, i64 -2012135647395072713}

	; Check GUIDs for both foo and foo_available are included in the metadata to			; Check GUIDs for both foo and foo_available are included in the metadata to
	; make sure the liveness analysis can capture the dependency from test_liveness			; make sure the liveness analysis can capture the dependency from test_liveness
	; to foo_available.			; to foo_available.
	; CHECK: ![[ENTRY_TEST_LIVENESS]] = !{!"function_entry_count", i64 1, i64 4005816710939881937, i64 6699318081062747564}			; CHECK: ![[ENTRY_TEST_LIVENESS]] = !{!"function_entry_count", i64 1, i64 4005816710939881937, i64 6699318081062747564}

	!llvm.dbg.cu = !{!0}			!llvm.dbg.cu = !{!0}
	!llvm.module.flags = !{!8, !9}			!llvm.module.flags = !{!8, !9}
	[17 unchanged lines hidden]

llvm/trunk/test/Transforms/SampleProfile/inline.ll

	; RUN: opt < %s -sample-profile -sample-profile-file=%S/Inputs/inline.prof -sample-profile-inline-hot-threshold=1 -S | FileCheck %s			; RUN: opt < %s -sample-profile -sample-profile-file=%S/Inputs/inline.prof -S | FileCheck %s
	; RUN: opt < %s -passes=sample-profile -sample-profile-file=%S/Inputs/inline.prof -sample-profile-inline-hot-threshold=1 -S | FileCheck %s			; RUN: opt < %s -passes=sample-profile -sample-profile-file=%S/Inputs/inline.prof -S | FileCheck %s

	; Original C++ test case			; Original C++ test case
	;			;
	; #include <stdio.h>			; #include <stdio.h>
	;			;
	; int sum(int x, int y) {			; int sum(int x, int y) {
	; return x + y;			; return x + y;
	; }			; }
	[99 unchanged lines hidden]

llvm/trunk/test/Transforms/SampleProfile/warm-inline-instance.ll

				; RUN: opt < %s -sample-profile -sample-profile-file=%S/Inputs/warm-inline-instance.prof -S | FileCheck %s
				; RUN: opt < %s -passes=sample-profile -sample-profile-file=%S/Inputs/warm-inline-instance.prof -S | FileCheck %s

				@.str = private unnamed_addr constant [11 x i8] c"sum is %d\0A\00", align 1

				; Function Attrs: nounwind uwtable
				define i32 @foo(i32 %x, i32 %y) !dbg !4 {
				entry:
				%x.addr = alloca i32, align 4
				%y.addr = alloca i32, align 4
				store i32 %x, i32* %x.addr, align 4
				store i32 %y, i32* %y.addr, align 4
				%t0 = load i32, i32* %x.addr, align 4, !dbg !11
				%t1 = load i32, i32* %y.addr, align 4, !dbg !11
				%add = add nsw i32 %t0, %t1, !dbg !11
				ret i32 %add, !dbg !11
				}

				define i32 @goo(i32 %x, i32 %y) {
				entry:
				%x.addr = alloca i32, align 4
				%y.addr = alloca i32, align 4
				store i32 %x, i32* %x.addr, align 4
				store i32 %y, i32* %y.addr, align 4
				%t0 = load i32, i32* %x.addr, align 4, !dbg !11
				%t1 = load i32, i32* %y.addr, align 4, !dbg !11
				%add = add nsw i32 %t0, %t1, !dbg !11
				ret i32 %add, !dbg !11
				}

				; Function Attrs: uwtable
				define i32 @main() !dbg !7 {
				entry:
				%retval = alloca i32, align 4
				%s = alloca i32, align 4
				%i = alloca i32, align 4
				store i32 0, i32* %retval
				store i32 0, i32* %i, align 4, !dbg !12
				br label %while.cond, !dbg !13

				while.cond: ; preds = %if.end, %entry
				%t0 = load i32, i32* %i, align 4, !dbg !14
				%inc = add nsw i32 %t0, 1, !dbg !14
				store i32 %inc, i32* %i, align 4, !dbg !14
				%cmp = icmp slt i32 %t0, 400000000, !dbg !14
				br i1 %cmp, label %while.body, label %while.end, !dbg !14

				while.body: ; preds = %while.cond
				%t1 = load i32, i32* %i, align 4, !dbg !16
				%cmp1 = icmp ne i32 %t1, 100, !dbg !16
				br i1 %cmp1, label %if.then, label %if.else, !dbg !16

				if.then: ; preds = %while.body
				%t2 = load i32, i32* %i, align 4, !dbg !18
				%t3 = load i32, i32* %s, align 4, !dbg !18
				; Although the ratio of total samples of @foo vs total samples of @main is
				; small, since the total samples count is larger than hot cutoff computed by
				; ProfileSummaryInfo, we will still regard the callsite of foo as hot and
				; early inlining will inline it.
				; CHECK-LABEL: @main(
				; CHECK-NOT: call i32 @foo(i32 %t2, i32 %t3)
				%call1 = call i32 @foo(i32 %t2, i32 %t3), !dbg !18
				store i32 %call1, i32* %s, align 4, !dbg !18
				br label %if.end, !dbg !18

				if.else: ; preds = %while.body
				; call @goo 's basicblock doesn't get any sample, so no profile will be annotated.
				; CHECK: call i32 @goo(i32 2, i32 3), !dbg !{{[0-9]+}}
				; CHECK-NOT: !prof
				; CHECK-SAME: {{$}}
				%call2 = call i32 @goo(i32 2, i32 3), !dbg !26
				store i32 %call2, i32* %s, align 4, !dbg !20
				br label %if.end

				if.end: ; preds = %if.else, %if.then
				br label %while.cond, !dbg !22

				while.end: ; preds = %while.cond
				%t4 = load i32, i32* %s, align 4, !dbg !24
				%call3 = call i32 (i8*, ...) @printf(i8* getelementptr inbounds ([11 x i8], [11 x i8]* @.str, i32 0, i32 0), i32 %t4), !dbg !24
				ret i32 0, !dbg !25
				}

				declare i32 @printf(i8*, ...) #2

				!llvm.dbg.cu = !{!0}
				!llvm.module.flags = !{!8, !9}
				!llvm.ident = !{!10}

				!0 = distinct !DICompileUnit(language: DW_LANG_C_plus_plus, producer: "clang version 3.5 ", isOptimized: false, emissionKind: NoDebug, file: !1, enums: !2, retainedTypes: !2, globals: !2, imports: !2)
				!1 = !DIFile(filename: "calls.cc", directory: ".")
				!2 = !{}
				!4 = distinct !DISubprogram(name: "foo", line: 3, isLocal: false, isDefinition: true, virtualIndex: 6, flags: DIFlagPrototyped, isOptimized: false, unit: !0, scopeLine: 3, file: !1, scope: !5, type: !6, retainedNodes: !2)
				!5 = !DIFile(filename: "calls.cc", directory: ".")
				!6 = !DISubroutineType(types: !2)
				!7 = distinct !DISubprogram(name: "main", line: 7, isLocal: false, isDefinition: true, virtualIndex: 6, flags: DIFlagPrototyped, isOptimized: false, unit: !0, scopeLine: 7, file: !1, scope: !5, type: !6, retainedNodes: !2)
				!8 = !{i32 2, !"Dwarf Version", i32 4}
				!9 = !{i32 1, !"Debug Info Version", i32 3}
				!10 = !{!"clang version 3.5 "}
				!11 = !DILocation(line: 4, scope: !4)
				!12 = !DILocation(line: 8, scope: !7)
				!13 = !DILocation(line: 9, scope: !7)
				!14 = !DILocation(line: 9, scope: !15)
				!15 = !DILexicalBlockFile(discriminator: 2, file: !1, scope: !7)
				!16 = !DILocation(line: 10, scope: !17)
				!17 = distinct !DILexicalBlock(line: 10, column: 0, file: !1, scope: !7)
				!18 = !DILocation(line: 10, scope: !19)
				!19 = !DILexicalBlockFile(discriminator: 2, file: !1, scope: !17)
				!20 = !DILocation(line: 10, scope: !21)
				!21 = !DILexicalBlockFile(discriminator: 4, file: !1, scope: !17)
				!22 = !DILocation(line: 10, scope: !23)
				!23 = !DILexicalBlockFile(discriminator: 6, file: !1, scope: !17)
				!24 = !DILocation(line: 11, scope: !7)
				!25 = !DILocation(line: 12, scope: !7)
				!26 = !DILocation(line: 11, scope: !19)
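
The test above checks that the call to @goo carries no `!prof` annotation. According to the revision summary, the regular inliner should then treat that callsite as having unknown hotness rather than as cold, and a callsite is cold only when a profile count exists and falls below the cold cutoff. A minimal sketch of that three-way decision follows. The `ProfileAnnotation` struct, the `CallsiteTemperature` enum, the helper names, and the `coldCountThreshold` parameter are all hypothetical and do not appear in the patch. The sketch uses `<=` so that an explicit zero count is always cold, which is a choice of this sketch and not a statement about the inliner's exact comparison.

```cpp
#include <cstdint>

// Hypothetical stand-in for "does this callsite carry profile metadata,
// and if so, what count does it carry?"
struct ProfileAnnotation {
  bool present;
  uint64_t count;
};

ProfileAnnotation noAnnotation() { return ProfileAnnotation{false, 0}; }
ProfileAnnotation annotated(uint64_t count) {
  return ProfileAnnotation{true, count};
}

enum class CallsiteTemperature { Unknown, Cold, NotCold };

// A missing annotation means "unknown", not "cold". Only an explicit count at
// or below the cutoff is classified as cold.
CallsiteTemperature classifyCallsite(ProfileAnnotation annotation,
                                     uint64_t coldCountThreshold) {
  if (!annotation.present)
    return CallsiteTemperature::Unknown;
  return annotation.count <= coldCountThreshold ? CallsiteTemperature::Cold
                                                : CallsiteTemperature::NotCold;
}
```

Under this model, a warm callsite that was inlined in the profile but not by the hot callsite inliner stays unannotated and is classified as `Unknown`, while a callsite in a block that received no samples is annotated with an explicit 0 and is classified as `Cold`.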