This is an archive of the discontinued LLVM Phabricator instance.


Differential D125023

[CSSPGO][Preinliner] Use linear threshold to drive inline decision.
Closed, Public

Authored by hoy on May 5 2022, 10:10 AM.

Details

Reviewers

wenlei
wlei

Commits

rGa4190037fac0: [CSSPGO][Preinliner] Use linear threshold to drive inline decision.

Summary

The per-callsite size threshold used today to drive preinline decisions is based on hotness/coldness cutoffs. In the default setup, callsites with a sample count above the hotness cutoff (99%) use a size threshold of 1500, and any callsite below the 99.99% coldness cutoff uses a zero threshold. This has a few issues:

While both the cutoffs and the size thresholds are configurable, different applications may need different setups, making a universal setup impractical.

The callsites between the hotness cutoff and the coldness cutoff are not considered as inline candidates, which could be a missed opportunity.

Hot callsites always use the same threshold. In reality we may want a bigger threshold for hotter callsites.

In this change we are introducing a linear threshold regardless of hot/cold cutoffs. Given a sample space, a threshold is computed for a callsite based on the position of that callsite sample in the whole space. With that we no longer need to define what's hot or cold. Callsites with different hotness will get a different threshold. This should overcome the above three issues.
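As a rough illustration of the linear threshold described above, here is a minimal standalone sketch. It is not the llvm-profgen implementation: the function name `linearThreshold`, its parameters, and the `Multiplier` argument are hypothetical, chosen only to loosely mirror the shape of the computation in the diff for this revision.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical helper for illustration only; not part of llvm-profgen.
//   ColdCount  - counts at or below this are treated as cold (assumed input)
//   UpperCount - count at which normalized hotness saturates at 1.0
//   Multiplier - assumed scaling factor applied to the hot threshold
unsigned linearThreshold(uint64_t CallsiteCount, uint64_t ColdCount,
                         uint64_t UpperCount, unsigned HotThreshold,
                         unsigned ColdThreshold, unsigned Multiplier) {
  assert(UpperCount > ColdCount && "normalization range must be non-empty");

  // Cold callsites all share the flat cold threshold.
  if (CallsiteCount <= ColdCount)
    return ColdThreshold;

  // Map the count into [0, 1] relative to the normalization range.
  double Hotness = static_cast<double>(CallsiteCount - ColdCount) /
                   static_cast<double>(UpperCount - ColdCount);
  Hotness = std::min(Hotness, 1.0);

  // Scale linearly; the trailing +1 keeps non-cold callsites above zero
  // even when ColdThreshold is 0.
  return static_cast<unsigned>(HotThreshold * Hotness * Multiplier) +
         ColdThreshold + 1;
}
```

With a cold count of 10, an upper bound of 110, a hot threshold of 1500, a cold threshold of 0, and a multiplier of 1, a callsite halfway through the range (count 60) gets a threshold of 751, while anything at or above the upper bound gets 1501.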

I have seen good results with a universal default setup for two of our internal services.

For one service, 0.2% to 0.5% perf improvement over a baseline with a previous default setup, on-par code size.
For the second service, 0.5% to 0.8% perf improvement over a baseline with a previous default setup, 0.2% code size increase; on-par performance and code size with a baseline that is with a carefully tuned cutoff to cover enough hot functions.

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

hoy created this revision. May 5 2022, 10:10 AM

Herald added a project: Restricted Project. May 5 2022, 10:10 AM

Herald added subscribers: modimo, wenlei.

hoy requested review of this revision. May 5 2022, 10:10 AM

Herald added a subscriber: llvm-commits.

hoy edited the summary of this revision. May 5 2022, 10:13 AM

hoy added reviewers: wenlei, wlei.

In general, we should include code size and perf numbers for benchmarks for such changes.

hoy edited the summary of this revision. May 5 2022, 11:37 AM

Harbormaster completed remote builds in B162956: Diff 427380. May 5 2022, 12:06 PM

wenlei edited the summary of this revision. May 5 2022, 3:16 PM

wenlei added inline comments. May 5 2022, 10:12 PM

llvm/tools/llvm-profgen/CSPreInliner.cpp
163	One concern about using `getMaxCount` -- that will have more instability and variation from run to run. How about using `HotCountThreshold` (or N * `HotCountThreshold`, or a set percentile) to stabilize? The `NormalizedHotness` doesn't have to be between 0 and 1. Or if we want it to be between 0 and 1, we can also do `min(NormalizedHotness, 1)`.
164–165	Perhaps rename `Position` as `NormalizedHotness`?
166–169	How about we simplify the knobs, i.e. remove the flags `preinline-hot-callsite-threshold-multiplier` and `preinline-hot-callsite-threshold-constant`? Make `Multiplier` a constant, and we can still effectively tune the multiplier via setting `sample-profile-hot-inline-threshold`. Make `PreInlineHotCallsiteThresholdConstant` a constant and we can still effectively tune it via `sample-profile-cold-inline-threshold`. We could also consider setting `SampleColdCallSiteThreshold = 1` in PreInliner (currently we set it to 0). With the above, I think we can simplify switches (avoid adding new ones) without sacrificing flexibility for tuning.

hoy added inline comments. May 5 2022, 10:58 PM

llvm/tools/llvm-profgen/CSPreInliner.cpp
163	> (or N * HotCountThreshold, or a set percentile) to stabilize?
	You mean something like Position = N * HotCountThreshold? BTW, what is N? Using a percentile should have better stability. Position is similar to (1 - percentile).
164–165	Sounds good.
166–169	> Make Multiplier a constant, and we can still effectively tune the multiplier via setting sample-profile-hot-inline-threshold.
	This should work since SampleHotCallSiteThreshold is only used here. One thing is the hot threshold may look very different from what the compiler uses, because of the multiplier. We could use the same heuristics on the compiler as well.
	> Make PreInlineHotCallsiteThresholdConstant a constant and we can still effectively tune it via sample-profile-cold-inline-threshold
	This may also affect cold contexts. Enlarging `sample-profile-cold-inline-threshold`, say from 0 to 5, may result in a lot more cold contexts in the profile. This is because we currently treat cold functions as having the same hotness, and only use the cold threshold for them instead of computing a linear threshold.

wenlei added inline comments. May 5 2022, 11:14 PM

llvm/tools/llvm-profgen/CSPreInliner.cpp
163	> BTW, what is N?
	N can be an arbitrary constant, depending on the range we want to normalize. Some random examples: (3 * 99% count), or (2 * 95% count).
	> You mean something like Position = N * HotCountThreshold?
	No, just replace getMaxCount with HotCountThreshold or a percentile count.
	> Using a percentile should have better stability. Position is similar to (1 - percentile).
	If you use (3 * 99% count) as the upper bound of the range, then for count = (5 * 99% count), normalized hotness will be larger than 1, unless we manually cap it.
166–169	> One thing is the hot threshold may look very different from what the compiler uses, because of the multiplier. We could use the same heuristics on the compiler as well.
	I think it's ok for the compiler and the preinliner to have different thresholds, because one is based on IR, and the other is based on a different proxy.
	> This may also affect cold contexts. Enlarging sample-profile-cold-inline-threshold, say from 0 to 5, may result in a lot more cold contexts in the profile. This is because we currently treat cold functions as having the same hotness, and only use the cold threshold for them instead of computing a linear threshold.
	Maybe you didn't understand what I said. What I was suggesting is basically this: `+ SampleColdCallSiteThreshold + PreInlineHotCallsiteThresholdConstant` -> `+ SampleColdCallSiteThreshold`. Then we tune `SampleColdCallSiteThreshold` alone, which is equivalent. I didn't find any other use of `SampleColdCallSiteThreshold` in llvm-profgen.
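
The capping point raised for line 163 can be made concrete with a small sketch. The helper below is hypothetical (its name, signature, and the sample numbers are assumptions for illustration, not llvm-profgen code); it only shows that a count above the chosen upper bound normalizes to more than 1 unless it is clamped.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical helper for illustration; the names are not from llvm-profgen.
// Normalizes Count against [LowerBound, UpperBound], optionally capping at 1.
double normalizedHotness(uint64_t Count, uint64_t LowerBound,
                         uint64_t UpperBound, bool Cap) {
  assert(UpperBound > LowerBound && Count >= LowerBound);
  double Hotness = static_cast<double>(Count - LowerBound) /
                   static_cast<double>(UpperBound - LowerBound);
  // Without the cap, counts above UpperBound yield values greater than 1.
  return Cap ? std::min(Hotness, 1.0) : Hotness;
}
```

Assuming a 99% count of 1000 and a lower bound of 0, an upper bound of 3 * 1000 and a callsite count of 5 * 1000 give an uncapped hotness of 5/3, which the cap clamps to 1.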

hoy added inline comments. May 5 2022, 11:31 PM

llvm/tools/llvm-profgen/CSPreInliner.cpp
163	I see. Yeah, choosing a value smaller than getMaxCount may need capping to keep the threshold from getting ridiculously large. But capping means values above the cap won't be treated linearly. Also positioning based on HotCountThreshold makes it service-dependent again. The value at a given cutoff could be very different.
166–169	`SampleColdCallSiteThreshold` is also used around line 170: `if (Candidate.CallsiteCount <= ColdCountThreshold) SampleThreshold = SampleColdCallSiteThreshold;` I chose not to go linear for every sample. Cold samples share the same threshold for now. I did this because cold samples are many and size-sensitive; changing `SampleColdCallSiteThreshold` from 0 to 1 can cause the profile to grow a lot. A linear threshold for cold samples probably doesn't make a lot of sense.

wenlei added inline comments. May 5 2022, 11:58 PM

llvm/tools/llvm-profgen/CSPreInliner.cpp
163	> But capping means values above the cap won't be treated linearly.
	That's why we can use a factor N, or simply use a different percentile, say 20%, which should be very large, and still much more stable than max.
	> Also positioning based on HotCountThreshold makes it service-dependent again. The value at a given cutoff could be very different.
	How is getMaxCount not service-dependent then?
166–169	Ah, I see. I missed line 170. But the way you have it set up now creates a cliff for the heuristic:
	for Candidate.CallsiteCount == ColdCountThreshold, threshold is SampleColdCallSiteThreshold
	for Candidate.CallsiteCount == ColdCountThreshold + 1, threshold is SampleColdCallSiteThreshold + PreInlineHotCallsiteThresholdConstant (can be a large number from the flag)
	We don't need to have a linear threshold for cold samples, but it makes more sense for the heuristic to be continuous without cliffs. If all we need here is to avoid changing SampleColdCallSiteThreshold to 1, perhaps making PreInlineHotCallsiteThresholdConstant a constant 1 is good enough -- it doesn't need to be tunable. Using 1 also makes the heuristic continuous. I was hoping to avoid an explosion of tuning knobs and only expose those that are really necessary.
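
The cliff described in the comment on lines 166–169 can be sketched as follows. This is a simplified, hypothetical model (function names, parameters, and sample values are assumptions, and the multiplier is left out); it measures how far the threshold jumps when the count crosses from the cold boundary to one sample above it.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of the threshold with an additive constant for
// non-cold callsites; names and signature are illustrative only.
unsigned thresholdWithConstant(uint64_t Count, uint64_t ColdCount,
                               uint64_t UpperCount, unsigned HotThreshold,
                               unsigned ColdThreshold, unsigned Constant) {
  assert(UpperCount > ColdCount);
  if (Count <= ColdCount)
    return ColdThreshold;
  double Hotness = static_cast<double>(Count - ColdCount) /
                   static_cast<double>(UpperCount - ColdCount);
  if (Hotness > 1.0)
    Hotness = 1.0;
  return static_cast<unsigned>(HotThreshold * Hotness) + ColdThreshold +
         Constant;
}

// Size of the step taken when the count moves from ColdCount to ColdCount + 1.
unsigned boundaryJump(uint64_t ColdCount, uint64_t UpperCount,
                      unsigned HotThreshold, unsigned ColdThreshold,
                      unsigned Constant) {
  return thresholdWithConstant(ColdCount + 1, ColdCount, UpperCount,
                               HotThreshold, ColdThreshold, Constant) -
         thresholdWithConstant(ColdCount, ColdCount, UpperCount, HotThreshold,
                               ColdThreshold, Constant);
}
```

When the normalization range is wide, the linear term just above the boundary truncates to 0, so the jump equals the constant: 1 for a constant of 1, and 500 for a constant of 500.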

hoy added inline comments. May 6 2022, 10:13 AM

llvm/tools/llvm-profgen/CSPreInliner.cpp
163	I see, making it very large makes sense. 20% or 10% should be close to getMaxCount while being more stable. As long as the value is big enough to cover most samples, it's not service-dependent. I was concerned about the samples that are not covered or do not get a linear threshold.
166–169	Making PreInlineHotCallsiteThresholdConstant a constant sounds good. Actually I never tried other values during my tuning.

Addressing comments.

wenlei added inline comments. May 6 2022, 10:43 AM

llvm/tools/llvm-profgen/CSPreInliner.cpp
164	> Use the count at 10% cutoff to cap the threshold.
	nit: people should be able to figure out what this is all about, but this comment can be confusing. Maybe let's just make it more clear that "we normalize hotness to be [0,1], then linearly adjust threshold based on normalized hotness." That comment couples with more explicit code for readability:
	NormalizationUpperBound = ProfileSummaryBuilder::getEntryForPercentile(...)
	NormalizationLowerBound = ColdCountThreshold (or ProfileSummaryBuilder::getEntryForPercentile(...))
	NormalizedHotness = ...
	SampleThreshold = ...
172	Perhaps worth a comment for this `+1` (so later on others don't think it's non-critical and attempt to clean it up).

hoy added inline comments. May 6 2022, 10:58 AM

llvm/tools/llvm-profgen/CSPreInliner.cpp
164	Sounds good. Also added comment for the 10% cutoff.
172	Done.

Updating D125023: [CSSPGO][Preinliner] Use linear threshold to drive inline decision.

thanks, lgtm assuming performance is still good with the final version.

This revision is now accepted and ready to land. May 6 2022, 11:01 AM

In D125023#3497197, @wenlei wrote:

thanks, lgtm assuming performance is still good with the final version.

Perf results came back good. Landing.

This revision was landed with ongoing or failed builds. May 8 2022, 10:23 PM

Closed by commit rGa4190037fac0: [CSSPGO][Preinliner] Use linear threshold to drive inline decision. (authored by hoy).

This revision was automatically updated to reflect the committed changes.

hoy added a commit: rGa4190037fac0: [CSSPGO][Preinliner] Use linear threshold to drive inline decision..

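Revision Contents

Path	Size
llvm/tools/llvm-profgen/CSPreInliner.h	8 lines
llvm/tools/llvm-profgen/CSPreInliner.cpp	34 lines
llvm/tools/llvm-profgen/ProfileGenerator.h	2 lines
llvm/tools/llvm-profgen/ProfileGenerator.cpp	5 lines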

Diff 427977

llvm/tools/llvm-profgen/CSPreInliner.h

	[62 lines not shown]
	// The PreInliner estimates inline decision using hotness from profile			// The PreInliner estimates inline decision using hotness from profile
	// and cost estimation from machine code size. It helps merges context			// and cost estimation from machine code size. It helps merges context
	// profile globally and achieves better post-inine profile quality, which			// profile globally and achieves better post-inine profile quality, which
	// otherwise won't be possible for ThinLTO. It also reduce context profile			// otherwise won't be possible for ThinLTO. It also reduce context profile
	// size by only keep context that is estimated to be inlined.			// size by only keep context that is estimated to be inlined.
	class CSPreInliner {			class CSPreInliner {
	public:			public:
	CSPreInliner(SampleProfileMap &Profiles, ProfiledBinary &Binary,			CSPreInliner(SampleProfileMap &Profiles, ProfiledBinary &Binary,
	uint64_t HotThreshold, uint64_t ColdThreshold);			ProfileSummary *Summary);
	void run();			void run();

	private:			private:
	bool getInlineCandidates(ProfiledCandidateQueue &CQueue,			bool getInlineCandidates(ProfiledCandidateQueue &CQueue,
	const FunctionSamples *FCallerContextSamples);			const FunctionSamples *FCallerContextSamples);
	std::vector<StringRef> buildTopDownOrder();			std::vector<StringRef> buildTopDownOrder();
	void processFunction(StringRef Name);			void processFunction(StringRef Name);
	bool shouldInline(ProfiledInlineCandidate &Candidate);			bool shouldInline(ProfiledInlineCandidate &Candidate);
	uint32_t getFuncSize(const FunctionSamples &FSamples);			uint32_t getFuncSize(const FunctionSamples &FSamples);
	bool UseContextCost;			bool UseContextCost;
	SampleContextTracker ContextTracker;			SampleContextTracker ContextTracker;
	SampleProfileMap &ProfileMap;			SampleProfileMap &ProfileMap;
	ProfiledBinary &Binary;			ProfiledBinary &Binary;
				ProfileSummary *Summary;
	// Count thresholds to answer isHotCount and isColdCount queries.
	// Mirrors the threshold in ProfileSummaryInfo.
	uint64_t HotCountThreshold;
	uint64_t ColdCountThreshold;
	};			};

	} // end namespace sampleprof			} // end namespace sampleprof
	} // end namespace llvm			} // end namespace llvm

	#endif			#endif

llvm/tools/llvm-profgen/CSPreInliner.cpp

[50 lines not shown]	cl::opt<bool> UseContextCostForPreInliner(
cl::desc("Use context-sensitive byte size cost for preinliner decisions"));		cl::desc("Use context-sensitive byte size cost for preinliner decisions"));

static cl::opt<bool> SamplePreInlineReplay(		static cl::opt<bool> SamplePreInlineReplay(
"csspgo-replay-preinline", cl::Hidden, cl::init(false),		"csspgo-replay-preinline", cl::Hidden, cl::init(false),
cl::desc(		cl::desc(
"Replay previous inlining and adjust context profile accordingly"));		"Replay previous inlining and adjust context profile accordingly"));

CSPreInliner::CSPreInliner(SampleProfileMap &Profiles, ProfiledBinary &Binary,		CSPreInliner::CSPreInliner(SampleProfileMap &Profiles, ProfiledBinary &Binary,
uint64_t HotThreshold, uint64_t ColdThreshold)		ProfileSummary *Summary)
: UseContextCost(UseContextCostForPreInliner),		: UseContextCost(UseContextCostForPreInliner),
// TODO: Pass in a guid-to-name map in order for		// TODO: Pass in a guid-to-name map in order for
// ContextTracker.getFuncNameFor to work, if `Profiles` can have md5 codes		// ContextTracker.getFuncNameFor to work, if `Profiles` can have md5 codes
// as their profile context.		// as their profile context.
ContextTracker(Profiles, nullptr), ProfileMap(Profiles), Binary(Binary),		ContextTracker(Profiles, nullptr), ProfileMap(Profiles), Binary(Binary),
HotCountThreshold(HotThreshold), ColdCountThreshold(ColdThreshold) {		Summary(Summary) {
// Set default preinliner hot/cold call site threshold tuned with CSSPGO.		// Set default preinliner hot/cold call site threshold tuned with CSSPGO.
// for good performance with reasonable profile size.		// for good performance with reasonable profile size.
if (!SampleHotCallSiteThreshold.getNumOccurrences())		if (!SampleHotCallSiteThreshold.getNumOccurrences())
SampleHotCallSiteThreshold = 1500;		SampleHotCallSiteThreshold = 1500;
if (!SampleColdCallSiteThreshold.getNumOccurrences())		if (!SampleColdCallSiteThreshold.getNumOccurrences())
SampleColdCallSiteThreshold = 0;		SampleColdCallSiteThreshold = 0;
}		}

[73 lines not shown]

bool CSPreInliner::shouldInline(ProfiledInlineCandidate &Candidate) {		bool CSPreInliner::shouldInline(ProfiledInlineCandidate &Candidate) {
// If replay inline is requested, simply follow the inline decision of the		// If replay inline is requested, simply follow the inline decision of the
// profiled binary.		// profiled binary.
if (SamplePreInlineReplay)		if (SamplePreInlineReplay)
return Candidate.CalleeSamples->getContext().hasAttribute(		return Candidate.CalleeSamples->getContext().hasAttribute(
ContextWasInlined);		ContextWasInlined);

// Adjust threshold based on call site hotness, only do this for callsite
// prioritized inliner because otherwise cost-benefit check is done earlier.
unsigned int SampleThreshold = SampleColdCallSiteThreshold;		unsigned int SampleThreshold = SampleColdCallSiteThreshold;
if (Candidate.CallsiteCount > HotCountThreshold)		uint64_t ColdCountThreshold = ProfileSummaryBuilder::getColdCountThreshold(
SampleThreshold = SampleHotCallSiteThreshold;		(Summary->getDetailedSummary()));

// TODO: for small cold functions, we may inlined them and we need to keep		if (Candidate.CallsiteCount <= ColdCountThreshold)
// context profile accordingly.
if (Candidate.CallsiteCount < ColdCountThreshold)
SampleThreshold = SampleColdCallSiteThreshold;		SampleThreshold = SampleColdCallSiteThreshold;
		else {
		// Linearly adjust threshold based on normalized hotness, i.e, a value in
		// [0,1]. Use 10% cutoff instead of the max count as the normalization
		// upperbound for stability.
		double NormalizationUpperBound =
		ProfileSummaryBuilder::getEntryForPercentile(
		Summary->getDetailedSummary(), 100000 /* 10% */)
		.MinCount;
		double NormalizationLowerBound = ColdCountThreshold;
		double NormalizedHotness =
		(Candidate.CallsiteCount - NormalizationLowerBound) /
		(NormalizationUpperBound - NormalizationLowerBound);
		if (NormalizedHotness > 1.0)
		NormalizedHotness = 1.0;
		// Add 1 to to ensure hot callsites get a non-zero threshold, which could
		// happen when SampleColdCallSiteThreshold is 0. This is when we do not
		// want any inlining for cold callsites.
		SampleThreshold = SampleHotCallSiteThreshold * NormalizedHotness * 100 +
		SampleColdCallSiteThreshold + 1;
		}

return (Candidate.SizeCost < SampleThreshold);		return (Candidate.SizeCost < SampleThreshold);
}		}

void CSPreInliner::processFunction(const StringRef Name) {		void CSPreInliner::processFunction(const StringRef Name) {
FunctionSamples *FSamples = ContextTracker.getBaseSamplesFor(Name);		FunctionSamples *FSamples = ContextTracker.getBaseSamplesFor(Name);
if (!FSamples)		if (!FSamples)
return;		return;
[116 lines not shown]

llvm/tools/llvm-profgen/ProfileGenerator.h

[116 lines not shown]	protected:

// Thresholds from profile summary to answer isHotCount/isColdCount queries.		// Thresholds from profile summary to answer isHotCount/isColdCount queries.
uint64_t HotCountThreshold;		uint64_t HotCountThreshold;

uint64_t ColdCountThreshold;		uint64_t ColdCountThreshold;

ProfiledBinary *Binary = nullptr;		ProfiledBinary *Binary = nullptr;

		std::unique_ptr<ProfileSummary> Summary;

// Used by SampleProfileWriter		// Used by SampleProfileWriter
SampleProfileMap ProfileMap;		SampleProfileMap ProfileMap;

const ContextSampleCounterMap *SampleCounters = nullptr;		const ContextSampleCounterMap *SampleCounters = nullptr;
};		};

class ProfileGenerator : public ProfileGeneratorBase {		class ProfileGenerator : public ProfileGeneratorBase {

[202 lines not shown]

llvm/tools/llvm-profgen/ProfileGenerator.cpp

[919 lines not shown]
void CSProfileGenerator::postProcessProfiles() {		void CSProfileGenerator::postProcessProfiles() {
// Compute hot/cold threshold based on profile. This will be used for cold		// Compute hot/cold threshold based on profile. This will be used for cold
// context profile merging/trimming.		// context profile merging/trimming.
computeSummaryAndThreshold();		computeSummaryAndThreshold();

// Run global pre-inliner to adjust/merge context profile based on estimated		// Run global pre-inliner to adjust/merge context profile based on estimated
// inline decisions.		// inline decisions.
if (EnableCSPreInliner) {		if (EnableCSPreInliner) {
CSPreInliner(ProfileMap, *Binary, HotCountThreshold, ColdCountThreshold)		CSPreInliner(ProfileMap, *Binary, Summary.get()).run();
.run();
// Turn off the profile merger by default unless it is explicitly enabled.		// Turn off the profile merger by default unless it is explicitly enabled.
if (!CSProfMergeColdContext.getNumOccurrences())		if (!CSProfMergeColdContext.getNumOccurrences())
CSProfMergeColdContext = false;		CSProfMergeColdContext = false;
}		}

// Trim and merge cold context profile using cold threshold above.		// Trim and merge cold context profile using cold threshold above.
if (TrimColdProfile \|\| CSProfMergeColdContext) {		if (TrimColdProfile \|\| CSProfMergeColdContext) {
SampleContextTrimmer(ProfileMap)		SampleContextTrimmer(ProfileMap)
[13 lines not shown]	if (GenCSNestedProfile) {
CSProfileConverter CSConverter(ProfileMap);		CSProfileConverter CSConverter(ProfileMap);
CSConverter.convertProfiles();		CSConverter.convertProfiles();
FunctionSamples::ProfileIsCS = false;		FunctionSamples::ProfileIsCS = false;
}		}
}		}

void ProfileGeneratorBase::computeSummaryAndThreshold() {		void ProfileGeneratorBase::computeSummaryAndThreshold() {
SampleProfileSummaryBuilder Builder(ProfileSummaryBuilder::DefaultCutoffs);		SampleProfileSummaryBuilder Builder(ProfileSummaryBuilder::DefaultCutoffs);
auto Summary = Builder.computeSummaryForProfiles(ProfileMap);		Summary = Builder.computeSummaryForProfiles(ProfileMap);
HotCountThreshold = ProfileSummaryBuilder::getHotCountThreshold(		HotCountThreshold = ProfileSummaryBuilder::getHotCountThreshold(
(Summary->getDetailedSummary()));		(Summary->getDetailedSummary()));
ColdCountThreshold = ProfileSummaryBuilder::getColdCountThreshold(		ColdCountThreshold = ProfileSummaryBuilder::getColdCountThreshold(
(Summary->getDetailedSummary()));		(Summary->getDetailedSummary()));
}		}

void ProfileGeneratorBase::extractProbesFromRange(		void ProfileGeneratorBase::extractProbesFromRange(
const RangeSample &RangeCounter, ProbeCounterMap &ProbeCounter,		const RangeSample &RangeCounter, ProbeCounterMap &ProbeCounter,
[192 lines not shown]