This is an archive of the discontinued LLVM Phabricator instance.

Differential D81981

[PGO] Supplement PGO profile with Sample profile
ClosedPublic

Authored by wmi on Jun 16 2020, 4:42 PM.

Details

Reviewers

xur
davidxl
wenlei

Commits

rGa23f62343cb7: Supplement instr profile with sample profile.

Summary

PGO profile is usually more precise than sample profile. However, PGO profile needs to be collected from a loadtest, and the loadtest may not be representative enough of the production workload. Sample profile collected from production can be used as a supplement -- especially for functions that are cold in the loadtest but warm/hot in production, we can use the function entry count in the sample profile to scale up the related function in the PGO profile.

The implementation contains changes on the compiler side and on the llvm-profdata side. On the compiler side, during the PGO instrumentation and profile-use phases, the patch guarantees that there is a counter in the entry block of each function and that this counter is the first entry in the counter vector. We use llvm-profdata to merge the PGO profile and the sample profile, and the output is a new PGO profile with some counters scaled up. If a function is never executed in the PGO profile but is hot in the sample profile, llvm-profdata resets the entry count to the related entry count in the sample profile multiplied by a scale factor, while leaving the rest of the counters as zero. If a function has a non-zero but cold entry count and is hot in the sample profile, all the counters inside the function are scaled up equally.
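The two cases in the summary above can be sketched as follows. This is only an illustration under stated assumptions, not the code in llvm-profdata.cpp: the `FuncCounts` type, the `supplementFromSample` function, and the rational scale factor `ScaleNum / ScaleDen` are all hypothetical names, and overflow handling is omitted for brevity.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical, simplified stand-in for one function's instr counters.
// Counts[0] is assumed to be the entry-block counter, as the summary requires.
struct FuncCounts {
  std::vector<uint64_t> Counts;
};

// Adjust a function that is hot in the sample profile.
// SampleEntry is its entry count in the sample profile, and
// ScaleNum / ScaleDen is an assumed sample-to-instr scale factor.
// Overflow of the intermediate products is ignored in this sketch.
void supplementFromSample(FuncCounts &F, uint64_t SampleEntry,
                          uint64_t ScaleNum, uint64_t ScaleDen) {
  assert(!F.Counts.empty() && ScaleDen != 0);
  const uint64_t Target = SampleEntry * ScaleNum / ScaleDen;
  const uint64_t Entry = F.Counts[0];
  if (Entry == 0) {
    // Never executed in the loadtest: set only the entry counter and
    // leave every other counter at zero.
    F.Counts[0] = Target;
    return;
  }
  if (Entry >= Target)
    return; // Already at least as hot as the sample profile suggests.
  // Non-zero but cold: scale all counters by Target / Entry so that the
  // relative weights inside the function are preserved.
  for (uint64_t &C : F.Counts)
    C = C * Target / Entry;
}
```

With counters {10, 4, 6}, a sample entry count of 100 and a scale factor of 1/1, every counter is multiplied by 10; with counters {0, 0, 0} only the first one changes.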

In the long run, it may be useful to let the compiler support using PGO profile and sample profile at the same time, but that requires more careful design and more substantial changes to make the two profiles work seamlessly. The patch here serves as a simple intermediate solution.

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

wmi created this revision. Jun 16 2020, 4:42 PM

Herald added a project: Restricted Project. Jun 16 2020, 4:42 PM

Herald added subscribers: hiraditya, kristof.beyls, eraman.

I think it is good to have an entry counter always, so that the profile dump is more readable. Do you have data showing the instrumentation overhead and profile size impact (clang and some large app)?

In D81981#2099070, @davidxl wrote:

I think it is good to have an entry counter always, so that the profile dump is more readable. Do you have data showing the instrumentation overhead and profile size impact (clang and some large app)?

Yes, I tried clang. The instrumentation runtime overhead increases by about 0.8%. The raw profile size increases by 1.8%. The zipped profile size increases by 0.15%.
Right now in the patch, inserting entry counter is guarded by a flag with default value being false.

Why does the profile size increase? I expect the number of instrumented blocks to remain mostly unchanged.

The reason for the question is that if the overhead is low, I think we should make the default to be true.

In D81981#2099452, @davidxl wrote:

Why does the profile size increase? I expect the number of instrumented blocks to remain mostly unchanged.

The reason for the question is that if the overhead is low, I think we should make the default to be true.

For a function entry bb which has multiple successors, the existing algorithm in FuncPGOInstrumentation<Edge, BBInfo>::getInstrBB will insert the counter in all its successors. My current implementation simply adds a counter in the entry block, so in that case it introduces a redundant counter.

I can improve it by selecting one successor and not inserting a counter for it, since its count can be inferred from the counters surrounding it. With that implemented, I expect the profile size will be unchanged.
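The inference mentioned here relies on flow conservation: the entry block's count equals the sum of the counts on its outgoing edges, so one outgoing edge never needs its own counter. A minimal sketch, assuming a hypothetical helper `inferMissingSuccessor` that is not part of the patch:

```cpp
#include <cassert>
#include <cstdint>
#include <numeric>
#include <vector>

// Given the entry block count and the counts of all outgoing edges except
// one, recover the count of the remaining edge by flow conservation.
uint64_t inferMissingSuccessor(uint64_t EntryCount,
                               const std::vector<uint64_t> &KnownSuccCounts) {
  const uint64_t Known = std::accumulate(KnownSuccCounts.begin(),
                                         KnownSuccCounts.end(), uint64_t{0});
  assert(Known <= EntryCount && "inconsistent counts");
  return EntryCount - Known;
}
```

For example, with an entry count of 100 and two instrumented successor edges counting 30 and 50, the third edge must have executed 20 times.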

It's an interesting idea to improve the PGO profile quality with sample profiles. Thanks for working on this!

In D81981#2099580, @wmi wrote:

In D81981#2099452, @davidxl wrote:

Why does the profile size increase? I expect the number of instrumented blocks to remain mostly unchanged.

The reason for the question is that if the overhead is low, I think we should make the default to be true.

For a function entry bb which has multiple successors, the existing algorithm in FuncPGOInstrumentation<Edge, BBInfo>::getInstrBB will insert the counter in all its successors. My current implementation simply adds a counter in the entry block, so in that case it introduces a redundant counter.

I can improve it by selecting one successor and not inserting a counter for it, since its count can be inferred from the counters surrounding it. With that implemented, I expect the profile size will be unchanged.

The current PGO instrumentation is based on MST. Changing the instrumentation may require changes in how the counts of non-MST-edges are calculated (in PGOUseFunc::setInstrumentedCounts). So maybe adjust the MST to remove the sibling edge?

llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp
777	Remove the lambda and use the newly added static function instead?

yes -- see Hongtao's reply.

If how we select BB to instrument depends on a switch, we would need instrumentation build and optimizing build to have consistent switch, otherwise counters could mismatch even if CFG checksum matches? I guess that's one reason why it'd be good to avoid different ways of selecting BBs.

Would it be possible to tweak/cheat the edge weights just for MST so entry BB is pinned to be non-MST node hence guaranteed to be instrumented directly?

There should not be an option which makes things complicated as Wenlei described. Instead, once this change is done, there would be a version bump (both raw and index). The index format needs to be backward compatible, so there needs to be some version specific handling there (can be removed later).

In D81981#2099702, @wenlei wrote:

If how we select BB to instrument depends on a switch, we would need instrumentation build and optimizing build to have consistent switch, otherwise counters could mismatch even if CFG checksum matches? I guess that's one reason why it'd be good to avoid different ways of selecting BBs.

PGO already has a common practice of ensuring that profile-gen and profile-use have the same flags. So the flag to enable inserting a counter in the entry block won't cause too much trouble.

Would it be possible to tweak/cheat the edge weights just for MST so entry BB is pinned to be non-MST node hence guaranteed to be instrumented directly?

I considered that but I don't know how to do it. From what I currently understand, MST is to select some edges with the highest frequencies, and pruning those edges won't affect the inference of the profile of all the edges. We can prune edges when selecting the MST but we cannot guarantee a node is selected as an instrumented BB during that phase. Deciding which BB to instrument is done in getInstrBB -- by choosing whether to instrument the src node or the dst node for each edge.

I am new to this part, so if you know there is a way to do that, please let me know. That would be much appreciated.

In D81981#2099769, @davidxl wrote:

The index format needs to be backward compatible, so there needs to be some version specific handling there (can be removed later).

I don't understand this part. Could you elaborate -- why is the index format different from the raw format in backward compatibility, and what is the version specific handling?

For a function entry bb which has multiple successors, the existing algorithm in FuncPGOInstrumentation<Edge, BBInfo>::getInstrBB will insert the counter in all its successors. My current implementation simply adds a counter in the entry block, so in that case it introduces a redundant counter.

I can improve it by selecting one successor and not inserting a counter for it, since its count can be inferred from the counters surrounding it. With that implemented, I expect the profile size will be unchanged.

The current PGO instrumentation is based on MST.

Yes, it is based on two parts, selecting MST is one part and selecting src/dst node of each non-MST edge to instrument is another part.

Changing the instrumentation may require changes in how the counts of non-MST-edges are calculated (in PGOUseFunc::setInstrumentedCounts). So maybe adjust the MST to remove the sibling edge?

The change to add entry BB as an instrumented BB is in function getInstrumentBBs which is shared by profile-gen and profile-use, so it will be consistent between profile-gen and profile-use. About adjusting MST to remove sibling edge, I feel it is inconsistent with current goal of MST selection. The goal of selecting MST is to avoid instrumenting the most frequent edges so we can minimize the cost. Removing a successor edge of the entry block is a different goal. Mixing these two goals will make things complicated. I feel it is simpler to add the change in the second part -- selecting between src and dst which node to instrument.
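The first of the two parts described above (keeping the most frequent edges in a spanning tree so that only the remaining edges need counters) can be sketched with Kruskal's algorithm. This is a simplified illustration, not the LLVM implementation: the `Edge` type and `selectNonTreeEdges` are hypothetical names, the fake entry and exit edges are left out, and the second part (choosing the src or dst block for each non-tree edge) is not modeled.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <vector>

// Hypothetical CFG edge with an estimated execution frequency.
struct Edge {
  int Src;
  int Dst;
  uint64_t Weight;
};

// Union-find lookup with path halving.
static int findRoot(std::vector<int> &Parent, int X) {
  while (Parent[X] != X) {
    Parent[X] = Parent[Parent[X]];
    X = Parent[X];
  }
  return X;
}

// Build a maximum spanning tree (heaviest edges first) and return the
// indices of the edges left out of the tree. Those are the edges that
// would need a counter; tree edges can be inferred from them.
std::vector<std::size_t> selectNonTreeEdges(int NumNodes,
                                            const std::vector<Edge> &Edges) {
  assert(NumNodes > 0 && "CFG needs at least one block");
  std::vector<int> Parent(NumNodes);
  std::iota(Parent.begin(), Parent.end(), 0);

  // Visit edges in descending weight order; ties keep their input order.
  std::vector<std::size_t> Order(Edges.size());
  std::iota(Order.begin(), Order.end(), std::size_t{0});
  std::stable_sort(Order.begin(), Order.end(),
                   [&](std::size_t A, std::size_t B) {
                     return Edges[A].Weight > Edges[B].Weight;
                   });

  std::vector<bool> InTree(Edges.size(), false);
  for (std::size_t I : Order) {
    const int A = findRoot(Parent, Edges[I].Src);
    const int B = findRoot(Parent, Edges[I].Dst);
    if (A != B) { // Joining two components cannot create a cycle.
      Parent[A] = B;
      InTree[I] = true;
    }
  }

  std::vector<std::size_t> NonTree;
  for (std::size_t I = 0; I < Edges.size(); ++I)
    if (!InTree[I])
      NonTree.push_back(I);
  return NonTree;
}
```

On a diamond CFG with a hot path 0->1->3 (weight 90) and a cold path 0->2->3 (weight 10), only one cold edge ends up outside the tree, so a single counter on the cold side is enough to recover all four edge counts.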

There is a use case where users check in indexed format profiles for sources that do not change much (e.g. library code), thus the indexed format profile needs to be backward compatible. Raw profile has no such requirement.

For IR PGO, the compatibility requirement is nice to have, but it is probably not a hard requirement as there are other ways to easily break it -- for instance any early inliner changes or CFG cleanup pass changes can make the old profile unusable.

Also, does it suffice to just never select the fake edge to entry in MST?

In D81981#2099825, @wmi wrote:

In D81981#2099769, @davidxl wrote:

The index format needs to be backward compatible, so there needs to be some version specific handling there (can be removed later).

I don't understand this part. Could you elaborate -- why is the index format different from the raw format in backward compatibility, and what is the version specific handling?

For a function entry bb which has multiple successors, the existing algorithm in FuncPGOInstrumentation<Edge, BBInfo>::getInstrBB will insert the counter in all its successors. My current implementation simply adds a counter in the entry block, so in that case it introduces a redundant counter.

I can improve it by selecting one successor and not inserting a counter for it, since its count can be inferred from the counters surrounding it. With that implemented, I expect the profile size will be unchanged.

The current PGO instrumentation is based on MST.

Yes, it is based on two parts, selecting MST is one part and selecting src/dst node of each non-MST edge to instrument is another part.

Changing the instrumentation may require changes in how the counts of non-MST-edges are calculated (in PGOUseFunc::setInstrumentedCounts). So maybe adjust the MST to remove the sibling edge?

The change to add entry BB as an instrumented BB is in function getInstrumentBBs which is shared by profile-gen and profile-use, so it will be consistent between profile-gen and profile-use. About adjusting MST to remove sibling edge, I feel it is inconsistent with current goal of MST selection. The goal of selecting MST is to avoid instrumenting the most frequent edges so we can minimize the cost. Removing a successor edge of the entry block is a different goal. Mixing these two goals will make things complicated. I feel it is simpler to add the change in the second part -- selecting between src and dst which node to instrument.

I see. Yes, it's reasonable to avoid changing MST edges, instead, to change which block to instrument for a given edge. I was thinking special logic may be needed for the edge count calculation as well, since it's related to where the instrumentation happens. This should work if we change both places.

Since only non-MST edges are instrumented, I was wondering alternatively we can remove an edge related to the entry block from MST to force the entry instrumented. I think removing the fake entry edge as David suggested is better than removing an outgoing sibling edge from the entry. Removing the fake entry edge from MST will result in one of the outgoing sibling edges added to MST, which in turn will cause the corresponding successor of the entry not instrumented.

Remove the compiler part since that part will be done in https://reviews.llvm.org/D82123.

Add an option -base-scale-function so people don't have to always compute the scale factor for PGO/SampleFDO profiles themselves. If the user knows that for some function its counter value is proportional to the total count of the execution, then by specifying the function through -base-scale-function, llvm-profdata will compute the scale factor based on the counter values of the function.
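One way such a scale factor could be derived is sketched below. The comment does not say which counter values of the base function are compared, so using the maximum counter on each side is purely an assumption here, and `ScaleFactor`, `computeScale`, and `toInstrScale` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Rational scale factor that maps sample counts onto the instr count scale.
struct ScaleFactor {
  uint64_t Num;
  uint64_t Den;
};

// Assumption: compare the largest counter of the base function in each
// profile. The real option may use a different reference value.
ScaleFactor computeScale(const std::vector<uint64_t> &InstrCounts,
                         const std::vector<uint64_t> &SampleCounts) {
  assert(!InstrCounts.empty() && !SampleCounts.empty());
  const uint64_t InstrMax =
      *std::max_element(InstrCounts.begin(), InstrCounts.end());
  const uint64_t SampleMax =
      *std::max_element(SampleCounts.begin(), SampleCounts.end());
  assert(SampleMax != 0 && "base function must appear in the sample profile");
  return ScaleFactor{InstrMax, SampleMax};
}

// Convert a sample count to the instr scale (overflow ignored in this sketch).
uint64_t toInstrScale(uint64_t SampleCount, ScaleFactor S) {
  return SampleCount * S.Num / S.Den;
}
```

If the base function peaks at 1000 in the instr profile and at 50 in the sample profile, a sample count of 5 elsewhere corresponds to roughly 100 on the instr scale.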

https://reviews.llvm.org/D82123 to always instrument function entry BB has been committed guarded by a flag. https://reviews.llvm.org/D83024 to enable the flag by default is under review.

Can you take another look at the patch?

Can you first split the NFC part (refactoring part such as GetEntryForPercentile) out?

llvm/include/llvm/ProfileData/InstrProf.h
681–682	document the parameters.

refactor GetEntryForPercentile out in https://reviews.llvm.org/D83439

Address David's comment.
Adjust comments, function names and flag names.

Fix a wrong flag name in test.

davidxl added inline comments. Jul 9 2020, 11:03 AM

llvm/tools/llvm-profdata/llvm-profdata.cpp
295	this refactoring can also be committed independently
515	make sample file path as the part of the option, so there is no need to handle the ordering.
530	Are these two weights comparable?
854	Is this flag tested?

wmi marked 5 inline comments as done. Jul 10 2020, 7:20 PM

wmi added inline comments.

llvm/tools/llvm-profdata/llvm-profdata.cpp
295	Done in https://reviews.llvm.org/D83521
515	Indeed that will save the ordering handling logic, but I want to use weighted_input to scale the count in sample profile to be roughly the same as the count in instr profile. Supporting -supplement-instr-with-sample=<weight>,<filename> would be a little weird and increase complexity.
530	Yes, given "-weighted-input=2, instr_profile -weighted-input=3, sample_profile", that means we want to scale the count in sample profile by 3/2 before updating the entry in instr profile.
854	Good point, add tests for this flag and the flag early-inline-size-threshold
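The 3/2 scaling in the reply to line 530 matches the shape of the scale(N, D) change in InstrProf.cpp: multiply by the numerator with saturation, then divide by the denominator. A self-contained sketch follows; `saturatingMul` is a local stand-in written for this example, not LLVM's SaturatingMultiply, and `scaleCount` is a hypothetical name.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Local stand-in for a saturating multiply: clamps to the maximum value
// and reports overflow instead of wrapping around.
uint64_t saturatingMul(uint64_t X, uint64_t Y, bool &Overflowed) {
  const uint64_t Max = std::numeric_limits<uint64_t>::max();
  Overflowed = false;
  if (X != 0 && Y > Max / X) {
    Overflowed = true;
    return Max;
  }
  return X * Y;
}

// Scale Count by N / D. Multiplying before dividing keeps integer
// precision; a saturated product is still divided by D afterwards.
uint64_t scaleCount(uint64_t Count, uint64_t N, uint64_t D, bool &Overflowed) {
  assert(D != 0 && "D cannot be 0");
  return saturatingMul(Count, N, Overflowed) / D;
}
```

With weights 2 (instr) and 3 (sample), a sample count of 100 becomes scaleCount(100, 3, 2, Overflowed), which is 150 with no overflow reported.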

Address David's comments.

I also plan to dump the functions cold in instr profile and hot in sample profile, and sort the output according to hotness in sample profile. That can be used to guide PGO users if they want to improve the representativeness of their loadtest. I leave that part in a separate patch for easier review.

I think this feature should be decoupled from the version change -- since this is an approximation anyway.

One way to do this is to use max count or total count as a reference point and compute the scale factor.

llvm/tools/llvm-profdata/llvm-profdata.cpp
447	when there is no scaling, setting instr count with sample count does not make sense. Perhaps just set it to be above cold threshold.

I think this feature should be decoupled from the version change -- since this is an approximation anyway. One way to do this is to use max count or total count as a reference point and compute the scale factor.

If it is decoupled from the version change, then for functions with non-zero counter values in instr profile, it is ok to scale all the counter values by a scale factor based on max count or total count. For functions with all counter values being 0, we cannot uniformly scale up all the counter values because that will mess up the branch probability inside of the function. We want to set only the entry BB counter to a hot value so the compiler can use static heuristics to compute the branch probability inside of the function. That is why the entry BB counter is needed in this feature.

llvm/tools/llvm-profdata/llvm-profdata.cpp
447	Maybe I can make scalefactor an option, and require the user to provide either -scalefactor or -base-scale-function, so that the user won't accidentally leave scalefactor at 1. In this way, I can make the sample profile the input of the option -supplement-instr-with-sample=, so I can remove the input profile ordering logic.

Address David's comment

If we don't need to handle all zero cases using scaling, we can remove the dependency on the always entry patch.

llvm/tools/llvm-profdata/llvm-profdata.cpp
401	handling this case (all zero case) in this way won't help much -- the Branch Probability pass will set all branch weights to 1, making all branches unbiased -- which is worse than using static heuristics. The right way I think is to remove their profile entries completely. I assume the compiler pass later will treat such functions as unknown and not put them into .text.unlikely.

wmi marked an inline comment as done. Jul 22 2020, 2:42 PM

wmi added inline comments.

llvm/tools/llvm-profdata/llvm-profdata.cpp
401	But I think for a large part of the functions missing in loadtest, they have all zero counter values, so it is better to have a way to handle them. Is it possible for PGO to handle the functions with only entry counter not being zero and with other counters all being zero in a special way -- for those functions, just set the entry count and skip the metadata setting inside of the function? So that those functions can use static profiling inside.

The PGOUse pass can choose not to annotate any branches with total weights == 0. Now the question becomes how do we tell PGOUse pass whether the entry should be set to 0 or leave it not set. There are two ways to do it (to signal it is not really cold, but unknown):

Remove the function from the indexed format profile;
set all counts to some sentinel value such as -1.

Inlining won't be helped unless there is a hot callsite to the all-zero count function -- but this should not exist. I think the major performance hit comes from 1) text.unlikely which may not be mlocked; and 2) all unbiased branches due to zero weights. So making this depend on entry count existence is fine, but we still need to teach PGOUse to drop the body. I think a simpler design would be

At llvm_profdata side:

if the instrumentation cold function has enough internal counts, just scale up the max internal counts to be a multiple of hot threshold

if the cold function has all zero counts or we believe all its internal counts are not trustworthy (basically ignoring step 1 with an option), we can simply discard the function entry completely (to signal this function is actually hot, but we don't know internal counts)

At PGOUse side:

if we don't find counters for a function, set the function's entry value to be above hot threshold (a function statically linked in should always have counts. If there are no counts, it means it is corrected by llvm-profdata).
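The llvm-profdata side of the simpler design proposed above could look roughly like the sketch below. It is an illustration only: `adjustColdFunction`, `HotThreshold`, and `Multiple` are hypothetical names, "enough internal counts" is reduced to "at least one non-zero counter", and overflow is ignored.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// For a function that is cold in the instr profile but hot in the sample
// profile: scale its counters so that the maximum reaches a multiple of the
// hot threshold. Returns false when the record should be discarded instead
// (all counters are zero, so there is nothing trustworthy to scale).
bool adjustColdFunction(std::vector<uint64_t> &Counts, uint64_t HotThreshold,
                        uint64_t Multiple) {
  assert(HotThreshold != 0 && Multiple != 0);
  const uint64_t Max =
      Counts.empty() ? 0 : *std::max_element(Counts.begin(), Counts.end());
  if (Max == 0)
    return false; // Caller drops the record; profile-use then treats it as hot.
  const uint64_t Target = HotThreshold * Multiple;
  if (Max < Target)
    for (uint64_t &C : Counts)
      C = C * Target / Max; // Overflow ignored in this sketch.
  return true;
}
```

Counters {2, 1, 0} with a hot threshold of 100 and a multiple of 3 become {300, 150, 0}, while {0, 0} is reported for removal.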

Address David's comments.

The major change is to remove the dependence on always having an entry counter in the profile. For a function with an all-zero or mostly-zero instr profile, its counters will all be set to -1. All -1 counters indicate that the internal profile for the function is unaccountable and also that the function is hot. PGO profile-use will drop all the internal counters while setting the function entry count to be several times above the hot threshold.

I chose to set all counters to -1 instead of dropping the profile to express the indications above because I am afraid that, in rare cases, a PGO profile may be accidentally used when building an unrelated target. If we set functions to be hot when their profiles cannot be found, we may treat all the functions as hot, and that may bloat up the code and trigger compile-time issues.
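The all -1 convention described in this update can be sketched in a few lines. The names `UnknownCount`, `markUnaccountable`, `isAllUnknown`, `UseDecision`, and `decideProfileUse` are hypothetical, and the multiple of the hot threshold is a parameter here because the comment only says "several times".

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// -1 stored in an unsigned 64-bit counter, i.e. all bits set.
constexpr uint64_t UnknownCount = static_cast<uint64_t>(-1);

// llvm-profdata side: mark the whole record as hot but unaccountable.
void markUnaccountable(std::vector<uint64_t> &Counts) {
  std::fill(Counts.begin(), Counts.end(), UnknownCount);
}

// Profile-use side: detect the marker.
bool isAllUnknown(const std::vector<uint64_t> &Counts) {
  return !Counts.empty() &&
         std::all_of(Counts.begin(), Counts.end(),
                     [](uint64_t C) { return C == UnknownCount; });
}

struct UseDecision {
  bool DropInternalCounters; // Skip branch annotation for this function.
  uint64_t ForcedEntryCount; // Meaningful only when counters are dropped.
};

UseDecision decideProfileUse(const std::vector<uint64_t> &Counts,
                             uint64_t HotThreshold, uint64_t Multiple) {
  assert(!Counts.empty());
  if (isAllUnknown(Counts))
    return UseDecision{true, HotThreshold * Multiple};
  return UseDecision{false, 0}; // Regular record: use the counters as usual.
}
```

Because a missing record and an all -1 record are distinguishable, applying the profile to an unrelated target does not mark every unprofiled function as hot, which is the concern raised above.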

davidxl added inline comments. Jul 27 2020, 9:22 AM

llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp
1251	document the variable.
1256	Is it possible to have some blocks -1?
1679	oh, just move this comment to the variable decl.
llvm/test/tools/llvm-profdata/Inputs/mix_instr.proftext
12	Do we have a test case for all zero case and below the threshold case (considered all zero)?
llvm/tools/llvm-profdata/llvm-profdata.cpp
425	Is it possible to delete the instprof record for the function from the profile?

wmi marked 4 inline comments as done. Jul 27 2020, 10:09 AM

wmi added inline comments.

llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp
1256	I feel having all blocks -1 to indicate an unrepresentative profile for an actually hot function is simpler than having some blocks -1. That is because when we compute profile summary, we want to strip those unrepresentative profiles. If we change some blocks to -1 but keep the rest unchanged, those counters will still be used for computing profile summary.
1679	Ok, will do.
llvm/test/tools/llvm-profdata/Inputs/mix_instr.proftext
12	Yes, foo is intentionally created for that. Line 22 and line 42 in suppl-instr-with-sample.test test the cases where the ratio of zero counters in foo is above or below the threshold.
llvm/tools/llvm-profdata/llvm-profdata.cpp
425	One reason that I use all -1 as the indication that the function is hot and its profile is unrepresentative is: users may build unrelated targets together with the PGO optimized target in the same command. I know a lot of SampleFDO users do that to simplify their release. I imagine there could be PGO users doing that too. Another possibility is building test targets using the profile. If we delete the instprof record and treat all the functions without instprof as hot during prof-use, we may accidentally treat a lot of cold functions as hot if the profile is applied to some unrelated targets or tests (tests may be partially related targets), and that may cause compile-time issues.
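The zero-counter ratio check that these tests exercise (and that the -zero-counter-threshold documentation in this revision describes) can be sketched as follows. The function name `exceedsZeroCounterThreshold` is hypothetical and the strict "above" comparison follows the wording of the documentation text.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Returns true when the fraction of zero counters is above Threshold,
// in which case the function's instr profile would be regarded as
// unrepresentative (and its counters set to -1 in the final design).
bool exceedsZeroCounterThreshold(const std::vector<uint64_t> &Counts,
                                 double Threshold) {
  assert(!Counts.empty() && Threshold >= 0.0 && Threshold <= 1.0);
  const auto Zeros = std::count(Counts.begin(), Counts.end(), uint64_t{0});
  const double Ratio =
      static_cast<double>(Zeros) / static_cast<double>(Counts.size());
  return Ratio > Threshold;
}
```

With a threshold of 0.7, counters {0, 0, 0, 5} (ratio 0.75) exceed it while {0, 1, 2, 3} (ratio 0.25) do not.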

lgtm

This revision is now accepted and ready to land. Jul 27 2020, 10:59 AM

Closed by commit rGa23f62343cb7: Supplement instr profile with sample profile. (authored by wmi). Jul 27 2020, 9:23 PM

This revision was automatically updated to reflect the committed changes.

wmi added a commit: rGa23f62343cb7: Supplement instr profile with sample profile.

Revision Contents

Path                                                              Size
llvm/docs/CommandGuide/llvm-profdata.rst                          24 lines
llvm/include/llvm/ProfileData/InstrProf.h                         12 lines
llvm/include/llvm/ProfileData/InstrProfWriter.h                   2 lines
llvm/lib/ProfileData/InstrProf.cpp                                15 lines
llvm/lib/ProfileData/InstrProfWriter.cpp                          2 lines
llvm/lib/ProfileData/ProfileSummaryBuilder.cpp                    11 lines
llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp        25 lines
llvm/test/Transforms/PGOProfile/Inputs/sample-profile.proftext    12 lines
llvm/test/Transforms/PGOProfile/Inputs/suppl-profile.proftext     15 lines
llvm/test/Transforms/PGOProfile/suppl-profile.ll                  37 lines
llvm/test/tools/llvm-profdata/Inputs/mix_instr.proftext           25 lines
llvm/test/tools/llvm-profdata/Inputs/mix_sample.proftext          17 lines
llvm/test/tools/llvm-profdata/overflow-instr.test                 14 lines
llvm/test/tools/llvm-profdata/suppl-instr-with-sample.test        102 lines
llvm/tools/llvm-profdata/llvm-profdata.cpp                        201 lines

Diff 281104

llvm/docs/CommandGuide/llvm-profdata.rst

... (155 lines not shown) ...
  This option can only be used with sample-based profile in extbinary format.

  .. option:: -gen-partial-profile=[true\|false]

  Mark the profile to be a partial profile which only provides partial profile
  coverage for the optimized target. This option can only be used with
  sample-based profile in extbinary format.

+ .. option:: -supplement-instr-with-sample=path_to_sample_profile
+
+ Supplement an instrumentation profile with sample profile. The sample profile
+ is the input of the flag. Output will be in instrumentation format (only works
+ with -instr).
+
+ .. option:: -zero-counter-threshold=threshold_float_number
+
+ For the function which is cold in instr profile but hot in sample profile, if
+ the ratio of the number of zero counters divided by the the total number of
+ counters is above the threshold, the profile of the function will be regarded
+ as being harmful for performance and will be dropped.
+
+ .. option:: -instr-prof-cold-threshold=threshold_int_number
+
+ User specified cold threshold for instr profile which will override the cold
+ threshold got from profile summary.
+
+ .. option:: -suppl-min-size-threshold=threshold_int_number
+
+ If the size of a function is smaller than the threshold, assume it can be
+ inlined by PGO early inliner and it will not be adjusted based on sample
+ profile.
+
  EXAMPLES
  ^^^^^^^^
  Basic Usage
  +++++++++++
  Merge three profiles:

  ::

... (182 lines not shown) ...

llvm/include/llvm/ProfileData/InstrProf.h

... (672 lines not shown) ... struct InstrProfValueSiteRecord {
    }
    /// Sort ValueData Descending by Count
    inline void sortByCount();

    /// Merge data from another InstrProfValueSiteRecord
    /// Optionally scale merged counts by \p Weight.
    void merge(InstrProfValueSiteRecord &Input, uint64_t Weight,
        function_ref<void(instrprof_error)> Warn);
-   /// Scale up value profile data counts.
-   void scale(uint64_t Weight, function_ref<void(instrprof_error)> Warn);
+   /// Scale up value profile data counts by N (Numerator) / D (Denominator).
+   void scale(uint64_t N, uint64_t D, function_ref<void(instrprof_error)> Warn);
davidxl: document the parameters. (Done)

    /// Compute the overlap b/w this record and Input record.
    void overlap(InstrProfValueSiteRecord &Input, uint32_t ValueKind,
        OverlapStats &Overlap, OverlapStats &FuncLevelOverlap);
  };

  /// Profiling information for a single function.
  struct InstrProfRecord {
... (57 lines not shown) ... void addValueData(uint32_t ValueKind, uint32_t Site,
        InstrProfSymtab *SymTab);

    /// Merge the counts in \p Other into this one.
    /// Optionally scale merged counts by \p Weight.
    void merge(InstrProfRecord &Other, uint64_t Weight,
        function_ref<void(instrprof_error)> Warn);

    /// Scale up profile counts (including value profile data) by
-   /// \p Weight.
-   void scale(uint64_t Weight, function_ref<void(instrprof_error)> Warn);
+   /// a factor of (N / D).
+   void scale(uint64_t N, uint64_t D, function_ref<void(instrprof_error)> Warn);

    /// Sort value profile data (per site) by count.
    void sortValueData() {
      for (uint32_t Kind = IPVK_First; Kind <= IPVK_Last; ++Kind)
        for (auto &SR : getValueSitesForKind(Kind))
          SR.sortByCount();
    }

... (68 lines not shown) ... uint64_t remapValue(uint64_t Value, uint32_t ValueKind,
        InstrProfSymtab *SymTab);

    // Merge Value Profile data from Src record to this record for ValueKind.
    // Scale merged value counts by \p Weight.
    void mergeValueProfData(uint32_t ValkeKind, InstrProfRecord &Src,
        uint64_t Weight,
        function_ref<void(instrprof_error)> Warn);

-   // Scale up value profile data count.
-   void scaleValueProfData(uint32_t ValueKind, uint64_t Weight,
+   // Scale up value profile data count by N (Numerator) / D (Denominator).
+   void scaleValueProfData(uint32_t ValueKind, uint64_t N, uint64_t D,
        function_ref<void(instrprof_error)> Warn);
  };

  struct NamedInstrProfRecord : InstrProfRecord {
    StringRef Name;
    uint64_t Hash;

    // We reserve this bit as the flag for context sensitive profile record.
... (301 lines not shown) ...

llvm/include/llvm/ProfileData/InstrProfWriter.h

... (42 lines not shown) ... private:
    bool InstrEntryBBEnabled;
    // Use raw pointer here for the incomplete type object.
    InstrProfRecordWriterTrait *InfoObj;

  public:
    InstrProfWriter(bool Sparse = false, bool InstrEntryBBEnabled = false);
    ~InstrProfWriter();

+   StringMap<ProfilingData> &getProfileData() { return FunctionData; }
+
    /// Add function counts for the given function. If there are already counts
    /// for this function and the hash and number of counts match, each counter is
    /// summed. Optionally scale counts by \p Weight.
    void addRecord(NamedInstrProfRecord &&I, uint64_t Weight,
        function_ref<void(Error)> Warn);
    void addRecord(NamedInstrProfRecord &&I, function_ref<void(Error)> Warn) {
      addRecord(std::move(I), 1, Warn);
    }
... (62 lines not shown) ...

llvm/lib/ProfileData/InstrProf.cpp

Show First 20 Lines • Show All 619 Lines • ▼ Show 20 Lines	if (I != IE && I->Value == J->Value) {
Warn(instrprof_error::counter_overflow);		Warn(instrprof_error::counter_overflow);
++I;		++I;
continue;		continue;
}		}
ValueData.insert(I, *J);		ValueData.insert(I, *J);
}		}
}		}

void InstrProfValueSiteRecord::scale(uint64_t Weight,		void InstrProfValueSiteRecord::scale(uint64_t N, uint64_t D,
function_ref<void(instrprof_error)> Warn) {		function_ref<void(instrprof_error)> Warn) {
for (auto I = ValueData.begin(), IE = ValueData.end(); I != IE; ++I) {		for (auto I = ValueData.begin(), IE = ValueData.end(); I != IE; ++I) {
bool Overflowed;		bool Overflowed;
I->Count = SaturatingMultiply(I->Count, Weight, &Overflowed);		I->Count = SaturatingMultiply(I->Count, N, &Overflowed) / D;
if (Overflowed)		if (Overflowed)
Warn(instrprof_error::counter_overflow);		Warn(instrprof_error::counter_overflow);
}		}
}		}

// Merge Value Profile data from Src record to this record for ValueKind.		// Merge Value Profile data from Src record to this record for ValueKind.
// Scale merged value counts by \p Weight.		// Scale merged value counts by \p Weight.
void InstrProfRecord::mergeValueProfData(		void InstrProfRecord::mergeValueProfData(
Show All 32 Lines	if (Overflowed)
Warn(instrprof_error::counter_overflow);		Warn(instrprof_error::counter_overflow);
}		}

for (uint32_t Kind = IPVK_First; Kind <= IPVK_Last; ++Kind)		for (uint32_t Kind = IPVK_First; Kind <= IPVK_Last; ++Kind)
mergeValueProfData(Kind, Other, Weight, Warn);		mergeValueProfData(Kind, Other, Weight, Warn);
}		}

void InstrProfRecord::scaleValueProfData(		void InstrProfRecord::scaleValueProfData(
uint32_t ValueKind, uint64_t Weight,		uint32_t ValueKind, uint64_t N, uint64_t D,
function_ref<void(instrprof_error)> Warn) {		function_ref<void(instrprof_error)> Warn) {
for (auto &R : getValueSitesForKind(ValueKind))		for (auto &R : getValueSitesForKind(ValueKind))
R.scale(Weight, Warn);		R.scale(N, D, Warn);
}		}

void InstrProfRecord::scale(uint64_t Weight,		void InstrProfRecord::scale(uint64_t N, uint64_t D,
function_ref<void(instrprof_error)> Warn) {		function_ref<void(instrprof_error)> Warn) {
		assert(D != 0 && "D cannot be 0");
for (auto &Count : this->Counts) {		for (auto &Count : this->Counts) {
bool Overflowed;		bool Overflowed;
Count = SaturatingMultiply(Count, Weight, &Overflowed);		Count = SaturatingMultiply(Count, N, &Overflowed) / D;
if (Overflowed)		if (Overflowed)
Warn(instrprof_error::counter_overflow);		Warn(instrprof_error::counter_overflow);
}		}
for (uint32_t Kind = IPVK_First; Kind <= IPVK_Last; ++Kind)		for (uint32_t Kind = IPVK_First; Kind <= IPVK_Last; ++Kind)
scaleValueProfData(Kind, Weight, Warn);		scaleValueProfData(Kind, N, D, Warn);
}		}
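
The change from a single `Weight` to an `N`/`D` pair lets `scale` apply a rational factor. A minimal standalone sketch of the per-counter arithmetic follows; `saturatingMultiply` and `scaleCount` are hypothetical stand-ins written for illustration, not the LLVM helpers themselves.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical stand-in for llvm::SaturatingMultiply: clamps the product to
// UINT64_MAX and reports overflow through the out-parameter.
inline uint64_t saturatingMultiply(uint64_t X, uint64_t Y, bool &Overflowed) {
  const uint64_t Max = std::numeric_limits<uint64_t>::max();
  Overflowed = (X != 0 && Y > Max / X);
  return Overflowed ? Max : X * Y;
}

// Scale Count by the rational factor N / D: multiply first, then divide,
// mirroring the order used in the diff above.
inline uint64_t scaleCount(uint64_t Count, uint64_t N, uint64_t D,
                           bool &Overflowed) {
  assert(D != 0 && "D cannot be 0");
  return saturatingMultiply(Count, N, Overflowed) / D;
}
```

With `N = 1500` and `D = 13`, a counter of 12 becomes 1384 and a counter of 13 becomes 1500. One consequence of multiplying before dividing is that a saturated product is still divided by `D`, so an overflowed counter ends up at `UINT64_MAX / D` rather than `UINT64_MAX`.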

// Map indirect call target name hash to name string.		// Map indirect call target name hash to name string.
uint64_t InstrProfRecord::remapValue(uint64_t Value, uint32_t ValueKind,		uint64_t InstrProfRecord::remapValue(uint64_t Value, uint32_t ValueKind,
InstrProfSymtab *SymTab) {		InstrProfSymtab *SymTab) {
if (!SymTab)		if (!SymTab)
return Value;		return Value;


llvm/lib/ProfileData/InstrProfWriter.cpp

Show First 20 Lines • Show All 235 Lines • ▼ Show 20 Lines	void InstrProfWriter::addRecord(StringRef Name, uint64_t Hash,
auto MapWarn = [&](instrprof_error E) {		auto MapWarn = [&](instrprof_error E) {
Warn(make_error<InstrProfError>(E));		Warn(make_error<InstrProfError>(E));
};		};

if (NewFunc) {		if (NewFunc) {
// We've never seen a function with this name and hash, add it.		// We've never seen a function with this name and hash, add it.
Dest = std::move(I);		Dest = std::move(I);
if (Weight > 1)		if (Weight > 1)
Dest.scale(Weight, MapWarn);		Dest.scale(Weight, 1, MapWarn);
} else {		} else {
// We're updating a function we've seen before.		// We're updating a function we've seen before.
Dest.merge(I, Weight, MapWarn);		Dest.merge(I, Weight, MapWarn);
}		}

Dest.sortValueData();		Dest.sortValueData();
}		}


llvm/lib/ProfileData/ProfileSummaryBuilder.cpp

	Show First 20 Lines • Show All 113 Lines • ▼ Show 20 Lines
	std::unique_ptr<ProfileSummary> InstrProfSummaryBuilder::getSummary() {			std::unique_ptr<ProfileSummary> InstrProfSummaryBuilder::getSummary() {
	computeDetailedSummary();			computeDetailedSummary();
	return std::make_unique<ProfileSummary>(			return std::make_unique<ProfileSummary>(
	ProfileSummary::PSK_Instr, DetailedSummary, TotalCount, MaxCount,			ProfileSummary::PSK_Instr, DetailedSummary, TotalCount, MaxCount,
	MaxInternalBlockCount, MaxFunctionCount, NumCounts, NumFunctions);			MaxInternalBlockCount, MaxFunctionCount, NumCounts, NumFunctions);
	}			}

	void InstrProfSummaryBuilder::addEntryCount(uint64_t Count) {			void InstrProfSummaryBuilder::addEntryCount(uint64_t Count) {
	addCount(Count);
	NumFunctions++;			NumFunctions++;

				// Skip invalid count.
				if (Count == (uint64_t)-1)
				return;

				addCount(Count);
	if (Count > MaxFunctionCount)			if (Count > MaxFunctionCount)
	MaxFunctionCount = Count;			MaxFunctionCount = Count;
	}			}

	void InstrProfSummaryBuilder::addInternalCount(uint64_t Count) {			void InstrProfSummaryBuilder::addInternalCount(uint64_t Count) {
				// Skip invalid count.
				if (Count == (uint64_t)-1)
				return;

	addCount(Count);			addCount(Count);
	if (Count > MaxInternalBlockCount)			if (Count > MaxInternalBlockCount)
	MaxInternalBlockCount = Count;			MaxInternalBlockCount = Count;
	}			}

llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp

Show First 20 Lines • Show All 768 Lines • ▼ Show 20 Lines	BasicBlock FuncPGOInstrumentation<Edge, BBInfo>::getInstrBB(Edge E) {
BasicBlock SrcBB = const_cast<BasicBlock >(E->SrcBB);		BasicBlock SrcBB = const_cast<BasicBlock >(E->SrcBB);
BasicBlock DestBB = const_cast<BasicBlock >(E->DestBB);		BasicBlock DestBB = const_cast<BasicBlock >(E->DestBB);
// For a fake edge, instrument the real BB.		// For a fake edge, instrument the real BB.
if (SrcBB == nullptr)		if (SrcBB == nullptr)
return DestBB;		return DestBB;
if (DestBB == nullptr)		if (DestBB == nullptr)
return SrcBB;		return SrcBB;

auto canInstrument = [](BasicBlock BB) -> BasicBlock {		auto canInstrument = [](BasicBlock BB) -> BasicBlock {
		hoyFB: Remove the lambda and use the newly added static function instead?
// There are basic blocks (such as catchswitch) cannot be instrumented.		// There are basic blocks (such as catchswitch) cannot be instrumented.
// If the returned first insertion point is the end of BB, skip this BB.		// If the returned first insertion point is the end of BB, skip this BB.
if (BB->getFirstInsertionPt() == BB->end())		if (BB->getFirstInsertionPt() == BB->end())
return nullptr;		return nullptr;
return BB;		return BB;
};		};

// Instrument the SrcBB if it has a single successor,		// Instrument the SrcBB if it has a single successor,
▲ Show 20 Lines • Show All 229 Lines • ▼ Show 20 Lines	PGOUseFunc(Function &Func, Module *Modu, TargetLibraryInfo &TLI,
BranchProbabilityInfo BPI, BlockFrequencyInfo BFIin,		BranchProbabilityInfo BPI, BlockFrequencyInfo BFIin,
ProfileSummaryInfo *PSI, bool IsCS, bool InstrumentFuncEntry)		ProfileSummaryInfo *PSI, bool IsCS, bool InstrumentFuncEntry)
: F(Func), M(Modu), BFI(BFIin), PSI(PSI),		: F(Func), M(Modu), BFI(BFIin), PSI(PSI),
FuncInfo(Func, TLI, ComdatMembers, false, BPI, BFIin, IsCS,		FuncInfo(Func, TLI, ComdatMembers, false, BPI, BFIin, IsCS,
InstrumentFuncEntry),		InstrumentFuncEntry),
FreqAttr(FFA_Normal), IsCS(IsCS) {}		FreqAttr(FFA_Normal), IsCS(IsCS) {}

// Read counts for the instrumented BB from profile.		// Read counts for the instrumented BB from profile.
bool readCounters(IndexedInstrProfReader *PGOReader, bool &AllZeros);		bool readCounters(IndexedInstrProfReader *PGOReader, bool &AllZeros,
		bool &AllMinusOnes);

// Populate the counts for all BBs.		// Populate the counts for all BBs.
void populateCounters();		void populateCounters();

// Set the branch weights based on the count values.		// Set the branch weights based on the count values.
void setBranchWeights();		void setBranchWeights();

// Annotate the value profile call sites for all value kind.		// Annotate the value profile call sites for all value kind.
▲ Show 20 Lines • Show All 166 Lines • ▼ Show 20 Lines	for (auto &E : Edges) {
return;		return;
}		}
llvm_unreachable("Cannot find the unknown count edge");		llvm_unreachable("Cannot find the unknown count edge");
}		}

// Read the profile from ProfileFileName and assign the value to the		// Read the profile from ProfileFileName and assign the value to the
// instrumented BB and the edges. This function also updates ProgramMaxCount.		// instrumented BB and the edges. This function also updates ProgramMaxCount.
// Return true if the profile are successfully read, and false on errors.		// Return true if the profile are successfully read, and false on errors.
bool PGOUseFunc::readCounters(IndexedInstrProfReader *PGOReader, bool &AllZeros) {		bool PGOUseFunc::readCounters(IndexedInstrProfReader *PGOReader, bool &AllZeros,
		bool &AllMinusOnes) {
auto &Ctx = M->getContext();		auto &Ctx = M->getContext();
Expected<InstrProfRecord> Result =		Expected<InstrProfRecord> Result =
PGOReader->getInstrProfRecord(FuncInfo.FuncName, FuncInfo.FunctionHash);		PGOReader->getInstrProfRecord(FuncInfo.FuncName, FuncInfo.FunctionHash);
if (Error E = Result.takeError()) {		if (Error E = Result.takeError()) {
handleAllErrors(std::move(E), [&](const InstrProfError &IPE) {		handleAllErrors(std::move(E), [&](const InstrProfError &IPE) {
auto Err = IPE.get();		auto Err = IPE.get();
bool SkipWarning = false;		bool SkipWarning = false;
LLVM_DEBUG(dbgs() << "Error in reading profile for Func "		LLVM_DEBUG(dbgs() << "Error in reading profile for Func "
Show All 26 Lines	if (Error E = Result.takeError()) {
});		});
return false;		return false;
}		}
ProfileRecord = std::move(Result.get());		ProfileRecord = std::move(Result.get());
std::vector<uint64_t> &CountFromProfile = ProfileRecord.Counts;		std::vector<uint64_t> &CountFromProfile = ProfileRecord.Counts;

IsCS ? NumOfCSPGOFunc++ : NumOfPGOFunc++;		IsCS ? NumOfCSPGOFunc++ : NumOfPGOFunc++;
LLVM_DEBUG(dbgs() << CountFromProfile.size() << " counts\n");		LLVM_DEBUG(dbgs() << CountFromProfile.size() << " counts\n");
		AllMinusOnes = (CountFromProfile.size() > 0);
		davidxl: Document the variable.
uint64_t ValueSum = 0;		uint64_t ValueSum = 0;
for (unsigned I = 0, S = CountFromProfile.size(); I < S; I++) {		for (unsigned I = 0, S = CountFromProfile.size(); I < S; I++) {
LLVM_DEBUG(dbgs() << " " << I << ": " << CountFromProfile[I] << "\n");		LLVM_DEBUG(dbgs() << " " << I << ": " << CountFromProfile[I] << "\n");
ValueSum += CountFromProfile[I];		ValueSum += CountFromProfile[I];
		if (CountFromProfile[I] != (uint64_t)-1)
		davidxl: Is it possible to have some blocks -1?
		wmi (author): I feel having all blocks -1 to indicate an unrepresentative profile for an actually hot function is simpler than having some blocks -1. That is because when we compute the profile summary, we want to strip those unrepresentative profiles. If we change some blocks to -1 but keep the rest unchanged, those counters will still be used for computing the profile summary.
		AllMinusOnes = false;
}		}
AllZeros = (ValueSum == 0);		AllZeros = (ValueSum == 0);

LLVM_DEBUG(dbgs() << "SUM = " << ValueSum << "\n");		LLVM_DEBUG(dbgs() << "SUM = " << ValueSum << "\n");

getBBInfo(nullptr).UnknownCountOutEdge = 2;		getBBInfo(nullptr).UnknownCountOutEdge = 2;
getBBInfo(nullptr).UnknownCountInEdge = 2;		getBBInfo(nullptr).UnknownCountInEdge = 2;

▲ Show 20 Lines • Show All 391 Lines • ▼ Show 20 Lines	for (auto &F : M) {
auto &TLI = LookupTLI(F);		auto &TLI = LookupTLI(F);
auto *BPI = LookupBPI(F);		auto *BPI = LookupBPI(F);
auto *BFI = LookupBFI(F);		auto *BFI = LookupBFI(F);
// Split indirectbr critical edges here before computing the MST rather than		// Split indirectbr critical edges here before computing the MST rather than
// later in getInstrBB() to avoid invalidating it.		// later in getInstrBB() to avoid invalidating it.
SplitIndirectBrCriticalEdges(F, BPI, BFI);		SplitIndirectBrCriticalEdges(F, BPI, BFI);
PGOUseFunc Func(F, &M, TLI, ComdatMembers, BPI, BFI, PSI, IsCS,		PGOUseFunc Func(F, &M, TLI, ComdatMembers, BPI, BFI, PSI, IsCS,
InstrumentFuncEntry);		InstrumentFuncEntry);
		// When AllMinusOnes is true, it means the profile for the function
		// is unrepresentative and this function is actually hot. Set the
		// entry count of the function to be multiple times of hot threshold
		// and drop all its internal counters.
		bool AllMinusOnes = false;
bool AllZeros = false;		bool AllZeros = false;
if (!Func.readCounters(PGOReader.get(), AllZeros))		if (!Func.readCounters(PGOReader.get(), AllZeros, AllMinusOnes))
continue;		continue;
if (AllZeros) {		if (AllZeros) {
F.setEntryCount(ProfileCount(0, Function::PCT_Real));		F.setEntryCount(ProfileCount(0, Function::PCT_Real));
if (Func.getProgramMaxCount() != 0)		if (Func.getProgramMaxCount() != 0)
ColdFunctions.push_back(&F);		ColdFunctions.push_back(&F);
continue;		continue;
}		}
		const unsigned MultiplyFactor = 3;
		davidxl: Oh, just move this comment to the variable decl.
		wmi (author): Ok, will do.
		if (AllMinusOnes) {
		uint64_t HotThreshold = PSI->getHotCountThreshold();
		if (HotThreshold)
		F.setEntryCount(
		ProfileCount(HotThreshold * MultiplyFactor, Function::PCT_Real));
		HotFunctions.push_back(&F);
		continue;
		}
Func.populateCounters();		Func.populateCounters();
Func.setBranchWeights();		Func.setBranchWeights();
Func.annotateValueSites();		Func.annotateValueSites();
Func.annotateIrrLoopHeaderWeights();		Func.annotateIrrLoopHeaderWeights();
PGOUseFunc::FuncFreqAttr FreqAttr = Func.getFuncFreqAttr();		PGOUseFunc::FuncFreqAttr FreqAttr = Func.getFuncFreqAttr();
if (FreqAttr == PGOUseFunc::FFA_Cold)		if (FreqAttr == PGOUseFunc::FFA_Cold)
ColdFunctions.push_back(&F);		ColdFunctions.push_back(&F);
else if (FreqAttr == PGOUseFunc::FFA_Hot)		else if (FreqAttr == PGOUseFunc::FFA_Hot)

llvm/test/Transforms/PGOProfile/Inputs/sample-profile.proftext

This file was added.

				test_simple_for:4000:4000
				1: 1000
				2: 1000
				3: 1000
				4: 1000

				moo:10:10
				1: 2
				2: 2
				3: 2
				4: 2
				5: 2

llvm/test/Transforms/PGOProfile/Inputs/suppl-profile.proftext

This file was added.

				# :ir is the flag to indicate this is IR level profile.
				:ir
				test_simple_for
				34137660316
				2
				0
				0

				foo
				2582734
				4
				1000
				270
				180
				760

llvm/test/Transforms/PGOProfile/suppl-profile.ll

This file was added.

				; Supplement instr profile suppl-profile.proftext with sample profile
				; sample-profile.proftext.
				; RUN: llvm-profdata merge -instr -suppl-min-size-threshold=0 \
				; RUN: -supplement-instr-with-sample=%p/Inputs/sample-profile.proftext \
				; RUN: %S/Inputs/suppl-profile.proftext -o %t.profdata
				; RUN: opt < %s -pgo-instr-use -pgo-test-profile-file=%t.profdata -S \| FileCheck %s
				; RUN: opt < %s -passes=pgo-instr-use -pgo-test-profile-file=%t.profdata -S \| FileCheck %s

				target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
				target triple = "x86_64-unknown-linux-gnu"

				; Check test_simple_for has a non-zero entry count and doesn't have any other
				; prof metadata.
				; CHECK: @test_simple_for(i32 %n) {{.*}} !prof ![[ENTRY_COUNT:[0-9]+]]
				; CHECK-NOT: !prof !
				; CHECK: ![[ENTRY_COUNT]] = !{!"function_entry_count", i64 540}
				define i32 @test_simple_for(i32 %n) {
				entry:
				br label %for.cond

				for.cond:
				%i = phi i32 [ 0, %entry ], [ %inc1, %for.inc ]
				%sum = phi i32 [ 1, %entry ], [ %inc, %for.inc ]
				%cmp = icmp slt i32 %i, %n
				br i1 %cmp, label %for.body, label %for.end

				for.body:
				%inc = add nsw i32 %sum, 1
				br label %for.inc

				for.inc:
				%inc1 = add nsw i32 %i, 1
				br label %for.cond

				for.end:
				ret i32 %sum
				}
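
The expected entry count of 540 in the CHECK line can be traced by hand. The following is a minimal sketch, assuming profile-use takes its hot threshold from the profile summary at a 99% cutoff (that cutoff value is an assumption here, as are the helper names `minCountForCutoff` and `allMinusOnes`). After supplementing, test_simple_for carries only `-1` counters, so the summary is built from foo's counters 1000, 270, 180 and 760.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <functional>
#include <vector>

// Sentinel marking a dropped (unrepresentative) counter.
constexpr uint64_t InvalidCount = ~0ULL;

// Smallest count C such that the counters >= C together cover at least
// Cutoff / 1e6 of the total. Invalid counters are ignored. Wide arithmetic
// is omitted in this sketch, so it is only suitable for small totals.
uint64_t minCountForCutoff(std::vector<uint64_t> Counts, uint64_t Cutoff) {
  assert(Cutoff <= 1000000 && "cutoff is expressed in parts per million");
  Counts.erase(std::remove(Counts.begin(), Counts.end(), InvalidCount),
               Counts.end());
  std::sort(Counts.begin(), Counts.end(), std::greater<uint64_t>());
  uint64_t Total = 0;
  for (uint64_t C : Counts)
    Total += C;
  const uint64_t Desired = Total * Cutoff / 1000000;
  uint64_t Sum = 0, Min = 0;
  for (uint64_t C : Counts) {
    if (Sum >= Desired)
      break;
    Sum += C;
    Min = C;
  }
  return Min;
}

// True when the record is non-empty and every counter is the sentinel.
bool allMinusOnes(const std::vector<uint64_t> &Counts) {
  return !Counts.empty() &&
         std::all_of(Counts.begin(), Counts.end(),
                     [](uint64_t C) { return C == InvalidCount; });
}
```

Under these assumptions the total is 2210, the 99% target is 2187, and the running sum only reaches it once the counter 180 is included, so the hot threshold is 180. Multiplying by the `MultiplyFactor` of 3 gives 540.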

llvm/test/tools/llvm-profdata/Inputs/mix_instr.proftext

This file was added.

				:ir
				foo
				7
				5
				12
				13
				0
				0
				0

				goo
				5
				davidxl: Do we have a test case for the all-zero case and the below-the-threshold case (considered all zero)?
				wmi (author): Yes, foo is intentionally created for that. Lines 22 and 42 in suppl-instr-with-sample.test test the cases where the ratio of zero counters in foo is above or below the threshold.
				3
				0
				0
				0

				moo
				9
				4
				3000
				1000
				2000
				500

llvm/test/tools/llvm-profdata/Inputs/mix_sample.proftext

This file was added.

				foo:2000:2000
				1: 2000
				goo:3000:1500
				1: 1200
				2: 800
				3: 1000
				moo:1000:1000
				1: 1000
				hoo:50:1
				1: 1
				2: 2
				3: 3
				4: 4
				5: 5
				6: 6
				7: 7
				8: 8

llvm/test/tools/llvm-profdata/overflow-instr.test

	Tests for overflow when merging instrumented profiles.			Tests for overflow when merging instrumented profiles.

	1- Merge profile having maximum counts with itself and verify overflow detected and saturation occurred			1- Merge profile having maximum counts with itself and verify overflow detected and saturation occurred
	RUN: llvm-profdata merge -instr %p/Inputs/overflow-instr.proftext %p/Inputs/overflow-instr.proftext -o %t.out 2>&1 \| FileCheck %s -check-prefix=MERGE_OVERFLOW			RUN: llvm-profdata merge -instr %p/Inputs/overflow-instr.proftext %p/Inputs/overflow-instr.proftext -o %t.out 2>&1 \| FileCheck %s -check-prefix=MERGE_OVERFLOW
	RUN: llvm-profdata show -instr %t.out \| FileCheck %s --check-prefix=SHOW_OVERFLOW			RUN: llvm-profdata show -instr -all-functions -counts %t.out \| FileCheck %s --check-prefix=SHOW_OVERFLOW
	MERGE_OVERFLOW: {{.*}}: overflow: Counter overflow			MERGE_OVERFLOW: {{.*}}: overflow: Counter overflow
	SHOW_OVERFLOW: Total functions: 1			SHOW_OVERFLOW: Function count: 18446744073709551615
	SHOW_OVERFLOW-NEXT: Maximum function count: 18446744073709551615			SHOW_OVERFLOW-NEXT: Block counts: [18446744073709551615, 18446744073709551615]
	SHOW_OVERFLOW-NEXT: Maximum internal block count: 18446744073709551615

	2- Merge profile having maximum counts by itself and verify no overflow			2- Merge profile having maximum counts by itself and verify no overflow
	RUN: llvm-profdata merge -instr %p/Inputs/overflow-instr.proftext -o %t.out 2>&1 \| FileCheck %s -check-prefix=MERGE_NO_OVERFLOW -allow-empty			RUN: llvm-profdata merge -instr %p/Inputs/overflow-instr.proftext -o %t.out 2>&1 \| FileCheck %s -check-prefix=MERGE_NO_OVERFLOW -allow-empty
	RUN: llvm-profdata show -instr %t.out \| FileCheck %s --check-prefix=SHOW_NO_OVERFLOW			RUN: llvm-profdata show -instr -all-functions -counts %t.out \| FileCheck %s --check-prefix=SHOW_NO_OVERFLOW
	MERGE_NO_OVERFLOW-NOT: {{.*}}: overflow: Counter overflow			MERGE_NO_OVERFLOW-NOT: {{.*}}: overflow: Counter overflow
	SHOW_NO_OVERFLOW: Total functions: 1			SHOW_NO_OVERFLOW: Function count: 18446744073709551615
	SHOW_NO_OVERFLOW-NEXT: Maximum function count: 18446744073709551615			SHOW_NO_OVERFLOW-NEXT: Block counts: [9223372036854775808, 18446744073709551615]
	SHOW_NO_OVERFLOW-NEXT: Maximum internal block count: 18446744073709551615

llvm/test/tools/llvm-profdata/suppl-instr-with-sample.test

This file was added.

				Some basic tests for supplementing instrumentation profile with sample profile.

				Test all of goo's counters will be set to -1.
				RUN: llvm-profdata merge \
				RUN: -supplement-instr-with-sample=%p/Inputs/mix_sample.proftext \
				RUN: -suppl-min-size-threshold=0 %p/Inputs/mix_instr.proftext -o %t
				RUN: llvm-profdata show %t -all-functions -counts \| FileCheck %s --check-prefix=MIX1

				MIX1: foo:
				MIX1-NEXT: Hash: 0x0000000000000007
				MIX1-NEXT: Counters: 5
				MIX1-NEXT: Block counts: [12, 13, 0, 0, 0]
				MIX1: goo:
				MIX1-NEXT: Hash: 0x0000000000000005
				MIX1-NEXT: Counters: 3
				MIX1-NEXT: Block counts: [18446744073709551615, 18446744073709551615, 18446744073709551615]
				MIX1: moo:
				MIX1-NEXT: Hash: 0x0000000000000009
				MIX1-NEXT: Counters: 4
				MIX1-NEXT: Block counts: [3000, 1000, 2000, 500]

				Test when the zero counter ratio of foo is higher than zero-counter-threshold.
				RUN: llvm-profdata merge \
				RUN: -supplement-instr-with-sample=%p/Inputs/mix_sample.proftext \
				RUN: -suppl-min-size-threshold=0 -zero-counter-threshold=0.5 \
				RUN: -instr-prof-cold-threshold=30 %p/Inputs/mix_instr.proftext -o %t
				RUN: llvm-profdata show %t -all-functions -counts \| FileCheck %s --check-prefix=MIX2

				MIX2: foo:
				MIX2-NEXT: Hash: 0x0000000000000007
				MIX2-NEXT: Counters: 5
				MIX2-NEXT: Block counts: [18446744073709551615, 18446744073709551615, 18446744073709551615, 18446744073709551615, 18446744073709551615]
				MIX2: goo:
				MIX2-NEXT: Hash: 0x0000000000000005
				MIX2-NEXT: Counters: 3
				MIX2-NEXT: Block counts: [18446744073709551615, 18446744073709551615, 18446744073709551615]
				MIX2: moo:
				MIX2-NEXT: Hash: 0x0000000000000009
				MIX2-NEXT: Counters: 4
				MIX2-NEXT: Block counts: [3000, 1000, 2000, 500]

				Test when the zero counter ratio of foo is lower than zero-counter-threshold.
				RUN: llvm-profdata merge \
				RUN: -supplement-instr-with-sample=%p/Inputs/mix_sample.proftext \
				RUN: -suppl-min-size-threshold=0 -zero-counter-threshold=0.7 \
				RUN: -instr-prof-cold-threshold=30 %p/Inputs/mix_instr.proftext -o %t
				RUN: llvm-profdata show %t -all-functions -counts \| FileCheck %s --check-prefix=MIX3

				MIX3: foo:
				MIX3-NEXT: Hash: 0x0000000000000007
				MIX3-NEXT: Counters: 5
				MIX3-NEXT: Block counts: [1384, 1500, 0, 0, 0]
				MIX3: goo:
				MIX3-NEXT: Hash: 0x0000000000000005
				MIX3-NEXT: Counters: 3
				MIX3-NEXT: Block counts: [18446744073709551615, 18446744073709551615, 18446744073709551615]
				MIX3: moo:
				MIX3-NEXT: Hash: 0x0000000000000009
				MIX3-NEXT: Counters: 4
				MIX3-NEXT: Block counts: [3000, 1000, 2000, 500]

				Test foo's profile won't be adjusted because its size is smaller
				than suppl-min-size-threshold.
				RUN: llvm-profdata merge \
				RUN: -supplement-instr-with-sample=%p/Inputs/mix_sample.proftext \
				RUN: -suppl-min-size-threshold=2 -zero-counter-threshold=0.7 \
				RUN: -instr-prof-cold-threshold=30 %p/Inputs/mix_instr.proftext -o %t
				RUN: llvm-profdata show %t -all-functions -counts \| FileCheck %s --check-prefix=MIX4

				MIX4: foo:
				MIX4-NEXT: Hash: 0x0000000000000007
				MIX4-NEXT: Counters: 5
				MIX4-NEXT: Block counts: [12, 13, 0, 0, 0]
				MIX4: goo:
				MIX4-NEXT: Hash: 0x0000000000000005
				MIX4-NEXT: Counters: 3
				MIX4-NEXT: Block counts: [18446744073709551615, 18446744073709551615, 18446744073709551615]
				MIX4: moo:
				MIX4-NEXT: Hash: 0x0000000000000009
				MIX4-NEXT: Counters: 4
				MIX4-NEXT: Block counts: [3000, 1000, 2000, 500]

				Test profile summary won't be affected by -1 counter.
				RUN: llvm-profdata merge \
				RUN: -supplement-instr-with-sample=%p/Inputs/mix_sample.proftext \
				RUN: -suppl-min-size-threshold=0 %p/Inputs/mix_instr.proftext -o %t
				RUN: llvm-profdata show %t -detailed-summary \| FileCheck %s --check-prefix=MIX5

				MIX5: Instrumentation level: IR
				MIX5-NEXT: Total functions: 3
				MIX5-NEXT: Maximum function count: 3000
				MIX5-NEXT: Maximum internal block count: 2000
				MIX5-NEXT: Total number of blocks: 9
				MIX5-NEXT: Total count: 6525
				MIX5-NEXT: Detailed summary:
				MIX5-NEXT: 3 blocks with count >= 1000 account for 80 percentage of the total counts.
				MIX5-NEXT: 3 blocks with count >= 1000 account for 90 percentage of the total counts.
				MIX5-NEXT: 4 blocks with count >= 500 account for 95 percentage of the total counts.
				MIX5-NEXT: 4 blocks with count >= 500 account for 99 percentage of the total counts.
				MIX5-NEXT: 6 blocks with count >= 12 account for 99.9 percentage of the total counts.
				MIX5-NEXT: 6 blocks with count >= 12 account for 99.99 percentage of the total counts.
				MIX5-NEXT: 6 blocks with count >= 12 account for 99.999 percentage of the total counts.

llvm/tools/llvm-profdata/llvm-profdata.cpp

Show First 20 Lines • Show All 286 Lines • ▼ Show 20 Lines	Dst->Writer.mergeRecordsFromWriter(std::move(Src->Writer), [&](Error E) {
instrprof_error IPE = InstrProfError::take(std::move(E));		instrprof_error IPE = InstrProfError::take(std::move(E));
std::unique_lock<std::mutex> ErrGuard{Dst->ErrLock};		std::unique_lock<std::mutex> ErrGuard{Dst->ErrLock};
bool firstTime = Dst->WriterErrorCodes.insert(IPE).second;		bool firstTime = Dst->WriterErrorCodes.insert(IPE).second;
if (firstTime)		if (firstTime)
warn(toString(make_error<InstrProfError>(IPE)));		warn(toString(make_error<InstrProfError>(IPE)));
});		});
}		}

static void writeInstrProfile(StringRef OutputFilename,		static void writeInstrProfile(StringRef OutputFilename,
		davidxl: This refactoring can also be committed independently.
		wmi (author): Done in https://reviews.llvm.org/D83521
ProfileFormat OutputFormat,		ProfileFormat OutputFormat,
InstrProfWriter &Writer) {		InstrProfWriter &Writer) {
std::error_code EC;		std::error_code EC;
raw_fd_ostream Output(OutputFilename.data(), EC, sys::fs::OF_None);		raw_fd_ostream Output(OutputFilename.data(), EC, sys::fs::OF_None);
if (EC)		if (EC)
exitWithErrorCode(EC, OutputFilename);		exitWithErrorCode(EC, OutputFilename);

if (OutputFormat == PF_Text) {		if (OutputFormat == PF_Text) {
▲ Show 20 Lines • Show All 77 Lines • ▼ Show 20 Lines	static void mergeInstrProfile(const WeightedFileVector &Inputs,
}		}
if (NumErrors == Inputs.size() \|\|		if (NumErrors == Inputs.size() \|\|
(NumErrors > 0 && FailMode == failIfAnyAreInvalid))		(NumErrors > 0 && FailMode == failIfAnyAreInvalid))
exitWithError("No profiles could be merged.");		exitWithError("No profiles could be merged.");

writeInstrProfile(OutputFilename, OutputFormat, Contexts[0]->Writer);		writeInstrProfile(OutputFilename, OutputFormat, Contexts[0]->Writer);
}		}

		/// The profile entry for a function in instrumentation profile.
		struct InstrProfileEntry {
		uint64_t MaxCount = 0;
		float ZeroCounterRatio = 0.0;
		InstrProfRecord *ProfRecord;
		InstrProfileEntry(InstrProfRecord *Record);
		InstrProfileEntry() = default;
		};

		InstrProfileEntry::InstrProfileEntry(InstrProfRecord *Record) {
		ProfRecord = Record;
		uint64_t CntNum = Record->Counts.size();
		uint64_t ZeroCntNum = 0;
		davidxl: Handling this case (the all-zero case) in this way won't help much. The branch probability pass will set all branch weights to 1, making all branches unbiased, which is worse than using static heuristics. The right way, I think, is to remove their profile entries completely. I assume the compiler pass later will treat such functions as unknown and not put them into .text.unlikely.
		wmi (author): But I think a large part of the functions missing in loadtest have all-zero counter values, so it is better to have a way to handle them. Is it possible for PGO to handle functions whose entry counter is non-zero and whose other counters are all zero in a special way: for those functions, just set the entry count and skip setting the metadata inside the function? That way those functions can use static profiling inside.
		for (size_t I = 0; I < CntNum; ++I) {
		MaxCount = std::max(MaxCount, Record->Counts[I]);
		ZeroCntNum += !Record->Counts[I];
		}
		ZeroCounterRatio = (float)ZeroCntNum / CntNum;
		}

		/// Either set all the counters in the instr profile entry \p IFE to -1
		/// in order to drop the profile or scale up the counters in \p IFE to
		/// be above hot threshold. We use the ratio of zero counters in the
		/// profile of a function to decide whether the profile is helpful or
		/// harmful for performance, and to choose whether to scale up or drop it.
		static void updateInstrProfileEntry(InstrProfileEntry &IFE,
		uint64_t HotInstrThreshold,
		float ZeroCounterThreshold) {
		InstrProfRecord *ProfRecord = IFE.ProfRecord;
		if (!IFE.MaxCount \|\| IFE.ZeroCounterRatio > ZeroCounterThreshold) {
		// If all or most of the counters of the function are zero, the
		// profile is unaccountable and should be dropped. Reset all the
		// counters to be -1 and PGO profile-use will drop the profile.
		// All counters being -1 also implies that the function is hot so
		// PGO profile-use will also set the entry count metadata to be
		// above hot threshold.
		for (size_t I = 0; I < ProfRecord->Counts.size(); ++I)
		davidxl: Is it possible to delete the instrprof record for the function from the profile?
		wmi (author): One reason that I use all -1 as the indication that the function is hot and its profile is unrepresentative is that a user may build an unrelated target together with the PGO-optimized target in the same command. I know a lot of SampleFDO users do that to simplify their release. I imagine there could be PGO users doing that too. Another possibility is building test targets using the profile. If we delete the instrprof record and treat all the functions without an instrprof record as hot during profile-use, we may accidentally treat a lot of cold functions as hot if the profile is applied to some unrelated targets or tests (tests may be partially related targets), and that may cause compile-time issues.
		ProfRecord->Counts[I] = -1;
		return;
		}

		// Scale up the MaxCount to be multiple times above hot threshold.
		const unsigned MultiplyFactor = 3;
		uint64_t Numerator = HotInstrThreshold * MultiplyFactor;
		uint64_t Denominator = IFE.MaxCount;
		ProfRecord->scale(Numerator, Denominator, [&](instrprof_error E) {
		warn(toString(make_error<InstrProfError>(E)));
		});
		}
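
The drop-or-scale decision can be reproduced on the foo record from mix_instr.proftext. This is a simplified sketch, not the patch itself: `EntrySketch` and `updateEntry` are hypothetical names, overflow handling is omitted, and the hot threshold of 500 is taken from the MIX5 summary line for 99 percent of the total counts.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical stand-in for InstrProfileEntry that owns its counters.
struct EntrySketch {
  std::vector<uint64_t> Counts;
  uint64_t MaxCount = 0;
  float ZeroCounterRatio = 0.0f;

  explicit EntrySketch(std::vector<uint64_t> C) : Counts(std::move(C)) {
    // Assumption of this sketch: a record always has at least one counter,
    // which keeps the ratio below well defined.
    assert(!Counts.empty() && "record must have at least one counter");
    uint64_t Zeros = 0;
    for (uint64_t V : Counts) {
      MaxCount = std::max(MaxCount, V);
      Zeros += (V == 0);
    }
    ZeroCounterRatio = static_cast<float>(Zeros) / Counts.size();
  }
};

// Drop the profile (all counters become -1) when it is all or mostly zero;
// otherwise scale it so that MaxCount lands at 3x the hot threshold.
void updateEntry(EntrySketch &E, uint64_t HotThreshold,
                 float ZeroCounterThreshold) {
  if (E.MaxCount == 0 || E.ZeroCounterRatio > ZeroCounterThreshold) {
    std::fill(E.Counts.begin(), E.Counts.end(), ~0ULL);
    return;
  }
  const uint64_t Numerator = HotThreshold * 3;
  for (uint64_t &C : E.Counts)
    C = C * Numerator / E.MaxCount; // Overflow handling omitted in this sketch.
}
```

foo has three zero counters out of five, a ratio of 0.6. With `-zero-counter-threshold=0.5` the ratio exceeds the threshold and every counter becomes `-1`, as in MIX2. With `0.7` the record is scaled by 1500/13, giving `[1384, 1500, 0, 0, 0]`, as in MIX3.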

		const uint64_t ColdPercentileIdx = 15;
		const uint64_t HotPercentileIdx = 11;

		/// Adjust the instr profile in \p WC based on the sample profile in
		/// \p Reader.
		static void
		adjustInstrProfile(std::unique_ptr<WriterContext> &WC,
		std::unique_ptr<sampleprof::SampleProfileReader> &Reader,
		unsigned SupplMinSizeThreshold, float ZeroCounterThreshold,
		davidxl: When there is no scaling, setting the instr count with the sample count does not make sense. Perhaps just set it to be above the cold threshold.
		wmi (author): Maybe I can make scalefactor an option and require the user to provide either -scalefactor or -base-scale-function, so that the user won't accidentally leave scalefactor at 1. In this way, I can make the sample profile the input of the option -supplement-instr-with-sample=, so I can remove the input profile ordering logic.
                   unsigned InstrProfColdThreshold) {
  // Map each function name to its entry in the instr profile.
  StringMap<InstrProfileEntry> InstrProfileMap;
  InstrProfSummaryBuilder IPBuilder(ProfileSummaryBuilder::DefaultCutoffs);
  for (auto &PD : WC->Writer.getProfileData()) {
    // Populate IPBuilder.
    for (const auto &PDV : PD.getValue()) {
      InstrProfRecord Record = PDV.second;
      IPBuilder.addRecord(Record);
    }

    // If a function has multiple entries in instr profile, skip it.
    if (PD.getValue().size() != 1)
      continue;

    // Initialize InstrProfileMap.
    InstrProfRecord *R = &PD.getValue().begin()->second;
    InstrProfileMap[PD.getKey()] = InstrProfileEntry(R);
  }

  ProfileSummary InstrPS = *IPBuilder.getSummary();
  ProfileSummary SamplePS = Reader->getSummary();

  // Compute cold/hot thresholds for instr profile and sample profile.
  uint64_t ColdSampleThreshold =
      ProfileSummaryBuilder::getEntryForPercentile(
          SamplePS.getDetailedSummary(),
          ProfileSummaryBuilder::DefaultCutoffs[ColdPercentileIdx])
          .MinCount;
  uint64_t HotInstrThreshold =
      ProfileSummaryBuilder::getEntryForPercentile(
          InstrPS.getDetailedSummary(),
          ProfileSummaryBuilder::DefaultCutoffs[HotPercentileIdx])
          .MinCount;
  uint64_t ColdInstrThreshold =
      InstrProfColdThreshold
          ? InstrProfColdThreshold
          : ProfileSummaryBuilder::getEntryForPercentile(
                InstrPS.getDetailedSummary(),
                ProfileSummaryBuilder::DefaultCutoffs[ColdPercentileIdx])
                .MinCount;

  // Find hot/warm functions in sample profile which are cold in instr profile
  // and adjust the profiles of those functions in the instr profile.
  for (const auto &PD : Reader->getProfiles()) {
    StringRef FName = PD.getKey();
    const sampleprof::FunctionSamples &FS = PD.getValue();
    auto It = InstrProfileMap.find(FName);
    if (FS.getHeadSamples() > ColdSampleThreshold &&
        It != InstrProfileMap.end() &&
        It->second.MaxCount <= ColdInstrThreshold &&
        FS.getBodySamples().size() >= SupplMinSizeThreshold) {
      updateInstrProfileEntry(It->second, HotInstrThreshold,
                              ZeroCounterThreshold);
    }
  }
}
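
The selection condition in the loop above can be read as a single predicate over one function. The following is a minimal sketch under stated assumptions: `FunctionView` and `shouldAdjust` are hypothetical names introduced for illustration, and the struct flattens the values that the patch reads from `FunctionSamples` and `InstrProfileEntry`.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical, simplified view of one function as seen in both profiles.
struct FunctionView {
  uint64_t SampleHeadCount;   // entry (head) samples in the sample profile
  std::size_t SampleBodySize; // number of body sample records
  bool InInstrProfile;        // has exactly one record in the instr profile
  uint64_t InstrMaxCount;     // largest counter in the instr profile record
};

// True when the function is above the cold threshold in the sample profile,
// at or below the cold threshold in the instr profile, and at least MinSize
// body sample records large.
bool shouldAdjust(const FunctionView &F, uint64_t ColdSampleThreshold,
                  uint64_t ColdInstrThreshold, unsigned MinSize) {
  return F.SampleHeadCount > ColdSampleThreshold && F.InInstrProfile &&
         F.InstrMaxCount <= ColdInstrThreshold && F.SampleBodySize >= MinSize;
}
```

All four conditions must hold: a function that is missing from the instr profile map, or one too small to pass the size threshold, is left untouched even if it is hot in the sample profile.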

/// The main function to supplement instr profile with sample profile.
/// \p Inputs contains the instr profile. \p SampleFilename specifies the
/// sample profile. \p OutputFilename specifies the output profile name.
/// \p OutputFormat specifies the output profile format. \p OutputSparse
/// specifies whether to generate sparse profile. \p SupplMinSizeThreshold
/// specifies the minimal size for the functions whose profile will be
/// adjusted. \p ZeroCounterThreshold is the threshold to check whether
/// a function contains too many zero counters and whether its profile
/// should be dropped. \p InstrProfColdThreshold is the user specified
/// cold threshold which will override the cold threshold got from the
davidxl (inline comment): Make the sample file path part of the option, so there is no need to handle the ordering.
wmi (author, marked done): Indeed that will save the ordering handling logic, but I want to use weighted_input to scale the count in sample profile to be roughly the same as the count in instr profile. Supporting -supplement-instr-with-sample=<weight>,<filename> would be a little weird and increase complexity.
/// instr profile summary.
static void supplementInstrProfile(
    const WeightedFileVector &Inputs, StringRef SampleFilename,
    StringRef OutputFilename, ProfileFormat OutputFormat, bool OutputSparse,
    unsigned SupplMinSizeThreshold, float ZeroCounterThreshold,
    unsigned InstrProfColdThreshold) {
  if (OutputFilename.compare("-") == 0)
    exitWithError("Cannot write indexed profdata format to stdout.");
  if (Inputs.size() != 1)
    exitWithError("Expect one input to be an instr profile.");
  if (Inputs[0].Weight != 1)
    exitWithError("Expect instr profile doesn't have weight.");

  StringRef InstrFilename = Inputs[0].Filename;

davidxl (inline comment): Are these two weights comparable?
wmi (author, marked done): Yes. Given "-weighted-input=2, instr_profile -weighted-input=3, sample_profile", we want to scale the count in the sample profile by 3/2 before updating the entry in the instr profile.
  // Read sample profile.
  LLVMContext Context;
  auto ReaderOrErr =
      sampleprof::SampleProfileReader::create(SampleFilename.str(), Context);
  if (std::error_code EC = ReaderOrErr.getError())
    exitWithErrorCode(EC, SampleFilename);
  auto Reader = std::move(ReaderOrErr.get());
  if (std::error_code EC = Reader->read())
    exitWithErrorCode(EC, SampleFilename);

  // Read instr profile.
  std::mutex ErrorLock;
  SmallSet<instrprof_error, 4> WriterErrorCodes;
  auto WC = std::make_unique<WriterContext>(OutputSparse, ErrorLock,
                                            WriterErrorCodes);
  loadInput(Inputs[0], nullptr, WC.get());
  if (WC->Errors.size() > 0)
    exitWithError(std::move(WC->Errors[0].first), InstrFilename);

  adjustInstrProfile(WC, Reader, SupplMinSizeThreshold, ZeroCounterThreshold,
                     InstrProfColdThreshold);
  writeInstrProfile(OutputFilename, OutputFormat, WC->Writer);
}
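
The doc comment above describes \p ZeroCounterThreshold as the check for whether a function contains too many zero counters. A minimal sketch of that ratio test follows. It is an assumption-laden illustration rather than the patch's code: `zeroCounterRatio` and `hasTooManyZeroCounters` are hypothetical helpers, and treating an empty counter vector as ratio 0 is a choice made here for the example.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: fraction of counters that are zero. An empty counter
// vector is treated as ratio 0 (an assumption made for this sketch).
float zeroCounterRatio(const std::vector<uint64_t> &Counts) {
  if (Counts.empty())
    return 0.0f;
  std::size_t Zeros = 0;
  for (uint64_t C : Counts)
    if (C == 0)
      ++Zeros;
  return static_cast<float>(Zeros) / static_cast<float>(Counts.size());
}

// Hypothetical predicate: true when the ratio of zero counters to all
// counters is strictly above the threshold.
bool hasTooManyZeroCounters(const std::vector<uint64_t> &Counts,
                            float Threshold) {
  return zeroCounterRatio(Counts) > Threshold;
}
```

For example, a function with counters {0, 0, 0, 5} has a zero ratio of 0.75, which is above a threshold of 0.7, so its profile would be a candidate for dropping under this reading.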

/// Make a copy of the given function samples with all symbol names remapped
/// by the provided symbol remapper.
static sampleprof::FunctionSamples
remapSamples(const sampleprof::FunctionSamples &Samples,
             SymbolRemapper &Remapper, sampleprof_error &Error) {
  sampleprof::FunctionSamples Result;
  Result.setName(Remapper(Samples.getName()));
  Result.addTotalSamples(Samples.getTotalSamples());
[278 lines not shown; context: cl::opt<bool> CompressAllSections(]
               "meaningful for -extbinary)"));
  cl::opt<bool> UseMD5(
      "use-md5", cl::init(false), cl::Hidden,
      cl::desc("Choose to use MD5 to represent string in name table (only "
               "meaningful for -extbinary)"));
  cl::opt<bool> GenPartialProfile(
      "gen-partial-profile", cl::init(false), cl::Hidden,
      cl::desc("Generate a partial profile (only meaningful for -extbinary)"));
  cl::opt<std::string> SupplInstrWithSample(
      "supplement-instr-with-sample", cl::init(""), cl::Hidden,
      cl::desc("Supplement an instr profile with sample profile, to correct "
               "the profile unrepresentativeness issue. The sample "
               "profile is the input of the flag. Output will be in instr "
               "format (The flag only works with -instr)"));
davidxl (inline comment): Is this flag tested?
wmi (author, marked done): Good point, add tests for this flag and the flag early-inline-size-threshold.
  cl::opt<float> ZeroCounterThreshold(
      "zero-counter-threshold", cl::init(0.7), cl::Hidden,
      cl::desc("For the function which is cold in instr profile but hot in "
               "sample profile, if the ratio of the number of zero counters "
               "divided by the total number of counters is above the "
               "threshold, the profile of the function will be regarded as "
               "being harmful for performance and will be dropped. "));
  cl::opt<unsigned> SupplMinSizeThreshold(
      "suppl-min-size-threshold", cl::init(10), cl::Hidden,
      cl::desc("If the size of a function is smaller than the threshold, "
               "assume it can be inlined by PGO early inliner and it won't "
               "be adjusted based on sample profile. "));
  cl::opt<unsigned> InstrProfColdThreshold(
      "instr-prof-cold-threshold", cl::init(0), cl::Hidden,
      cl::desc("User specified cold threshold for instr profile which will "
               "override the cold threshold got from profile summary. "));

  cl::ParseCommandLineOptions(argc, argv, "LLVM profile data merger\n");

  WeightedFileVector WeightedInputs;
  for (StringRef Filename : InputFilenames)
    addWeightedInput(WeightedInputs, {std::string(Filename), 1});
  for (StringRef WeightedFilename : WeightedInputFilenames)
    addWeightedInput(WeightedInputs, parseWeightedFile(WeightedFilename));
[12 lines not shown; context: for (auto &WF : WeightedInputs)]
      outs() << WF.Weight << "," << WF.Filename << "\n";
    return 0;
  }

  std::unique_ptr<SymbolRemapper> Remapper;
  if (!RemappingFile.empty())
    Remapper = SymbolRemapper::create(RemappingFile);

  if (!SupplInstrWithSample.empty()) {
    if (ProfileKind != instr)
      exitWithError(
          "-supplement-instr-with-sample can only work with -instr. ");

    supplementInstrProfile(WeightedInputs, SupplInstrWithSample, OutputFilename,
                           OutputFormat, OutputSparse, SupplMinSizeThreshold,
                           ZeroCounterThreshold, InstrProfColdThreshold);
    return 0;
  }

  if (ProfileKind == instr)
    mergeInstrProfile(WeightedInputs, Remapper.get(), OutputFilename,
                      OutputFormat, OutputSparse, NumThreads, FailureMode);
  else
    mergeSampleProfile(WeightedInputs, Remapper.get(), OutputFilename,
                       OutputFormat, ProfileSymbolListFile, CompressAllSections,
                       UseMD5, GenPartialProfile, FailureMode);

[180 lines not shown; context: for (const auto &Func : *Reader) {]
    }

    assert(Func.Counts.size() > 0 && "function missing entry counter");
    Builder.addRecord(Func);

    uint64_t FuncMax = 0;
    uint64_t FuncSum = 0;
    for (size_t I = 0, E = Func.Counts.size(); I < E; ++I) {
      if (Func.Counts[I] == (uint64_t)-1)
        continue;
      FuncMax = std::max(FuncMax, Func.Counts[I]);
      FuncSum += Func.Counts[I];
    }

    if (FuncMax < ValueCutoff) {
      ++BelowCutoffFunctions;
      if (OnlyListBelow) {
        OS << " " << Func.Name << ": (Max = " << FuncMax
[434 lines not shown]

Revision Contents

Diff 281104

llvm/docs/CommandGuide/llvm-profdata.rst

llvm/include/llvm/ProfileData/InstrProf.h

llvm/include/llvm/ProfileData/InstrProfWriter.h

llvm/lib/ProfileData/InstrProf.cpp

llvm/lib/ProfileData/InstrProfWriter.cpp

llvm/lib/ProfileData/ProfileSummaryBuilder.cpp

llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp

llvm/test/Transforms/PGOProfile/Inputs/sample-profile.proftext

llvm/test/Transforms/PGOProfile/Inputs/suppl-profile.proftext

llvm/test/Transforms/PGOProfile/suppl-profile.ll

llvm/test/tools/llvm-profdata/Inputs/mix_instr.proftext

llvm/test/tools/llvm-profdata/Inputs/mix_sample.proftext

llvm/test/tools/llvm-profdata/overflow-instr.test

llvm/test/tools/llvm-profdata/suppl-instr-with-sample.test

llvm/tools/llvm-profdata/llvm-profdata.cpp
