This is an archive of the discontinued LLVM Phabricator instance.

Differential D81981

[PGO] Supplement PGO profile with Sample profile
ClosedPublic

Authored by wmi on Jun 16 2020, 4:42 PM.

Details

Reviewers

xur
davidxl
wenlei

Commits

rGa23f62343cb7: Supplement instr profile with sample profile.

Summary

PGO profile is usually more precise than sample profile. However, PGO profile needs to be collected from a loadtest, and the loadtest may not be representative enough of the production workload. Sample profile collected from production can be used as a supplement -- especially for functions that are cold in the loadtest but warm/hot in production, we can use the function entry count in the sample profile to scale up the related function in the PGO profile.

The implementation contains changes on the compiler side and the llvm-profdata side. On the compiler side, during the PGO instrumentation and profile-use phases, the patch guarantees that there is a counter in the entry block of each function and that this counter is the first entry in the counter vector. We use llvm-profdata to merge the PGO profile and the sample profile; the output is a new PGO profile with some counters scaled up. If a function is never executed in the PGO profile but is hot in the sample profile, llvm-profdata resets the entry count to the related entry count in the sample profile multiplied by a scale factor, while leaving the rest of the counters as zero. If a function has a non-zero but cold entry count and is hot in the sample profile, all the counters inside the function are scaled up equally.

In the long run, it may be useful to let compiler support using PGO profile and sample profile at the same time, but that requires more careful design and more substantial changes to make two profiles work seamlessly. The patch here serves as a simple intermediate solution.
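The two cases described in the summary can be sketched roughly as follows. This is an illustration only, not the code in the patch: `FuncCounters`, `supplementWithSample`, and the choice to scale a cold function so that its entry count reaches the scaled sample entry count are all assumptions made for the example. The caller is assumed to have already decided that the function is hot in the sample profile.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for one function's instrumentation counters.
// Counts[0] is assumed to be the entry-block counter, as the summary requires.
struct FuncCounters {
  std::vector<uint64_t> Counts;
};

// Supplement one function that is hot in the sample profile.
// SampleEntry is its entry count in the sample profile and ScaleFactor maps
// sample counts onto the instr-profile scale (both supplied by the caller).
inline void supplementWithSample(FuncCounters &F, uint64_t SampleEntry,
                                 double ScaleFactor) {
  assert(ScaleFactor > 0 && "scale factor must be positive");
  if (F.Counts.empty())
    return;
  const uint64_t Target = static_cast<uint64_t>(SampleEntry * ScaleFactor);
  const uint64_t Entry = F.Counts[0];
  if (Entry == 0) {
    // Never executed in the loadtest: only the entry counter is set, and the
    // remaining counters stay zero.
    F.Counts[0] = Target;
    return;
  }
  if (Entry >= Target)
    return; // Already at least as hot as the sample profile suggests.
  // Executed but cold: scale every counter by the same ratio (assumption:
  // the ratio is chosen so that the entry count becomes Target).
  for (uint64_t &C : F.Counts)
    C = static_cast<uint64_t>(static_cast<double>(C) * Target / Entry);
}
```

For example, counters `{10, 4, 6}` with a sample entry count of 100 and a scale factor of 2 become `{200, 80, 120}`, while `{0, 0, 0}` becomes `{200, 0, 0}`.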

Diff Detail

Repository: rL LLVM

Event Timeline

wmi created this revision. Jun 16 2020, 4:42 PM

Herald added a project: Restricted Project. Jun 16 2020, 4:42 PM

Herald added subscribers: hiraditya, kristof.beyls, eraman.

I think it is good to have an entry counter always, so that the profile dump is more readable. Do you have data showing the instrumentation overhead and profile size impact (clang and some large app)?

In D81981#2099070, @davidxl wrote:

I think it is good to have an entry counter always, so that the profile dump is more readable. Do you have data showing the instrumentation overhead and profile size impact (clang and some large app)?

Yes, I tried clang. The instrumentation runtime overhead increases by about 0.8%. The raw profile size increases by 1.8%. The zipped profile size increases by 0.15%.
Right now in the patch, inserting entry counter is guarded by a flag with default value being false.

Why does the profile size increase? I expect the number of instrumented blocks to remain mostly unchanged.

The reason for the question is that if the overhead is low, I think we should make the default to be true.

In D81981#2099452, @davidxl wrote:

Why does the profile size increase? I expect the number of instrumented blocks to remain mostly unchanged.

The reason for the question is that if the overhead is low, I think we should make the default to be true.

For a function entry bb which has multiple successors, the existing algorithm in FuncPGOInstrumentation<Edge, BBInfo>::getInstrBB will insert the counter in all its successors. My current implementation simply adds a counter in the entry block, so in that case it introduces a redundant counter.

I can improve it by selecting a successor to not insert counter for it since it can be inferred from the counters surrounding it. With that implemented, I expect the profile size will be unchanged.

It's an interesting idea to improve the PGO profile quality with sample profiles. Thanks for working on this!

In D81981#2099580, @wmi wrote:

In D81981#2099452, @davidxl wrote:

Why does the profile size increase? I expect the number of instrumented blocks to remain mostly unchanged.

The reason for the question is that if the overhead is low, I think we should make the default to be true.

For a function entry bb which has multiple successors, the existing algorithm in FuncPGOInstrumentation<Edge, BBInfo>::getInstrBB will insert the counter in all its successors. My current implementation simply adds a counter in the entry block, so in that case it introduces a redundant counter.

I can improve it by selecting a successor to not insert counter for it since it can be inferred from the counters surrounding it. With that implemented, I expect the profile size will be unchanged.

The current PGO instrumentation is based on MST. Changing the instrumentation may require changes in how the counts of non-MST edges are calculated (in PGOUseFunc::setInstrumentedCounts). So maybe adjust the MST to remove the sibling edge?

llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp
785 ↗	(On Diff #271239)	Remove the lambda and use the newly added static function instead?

yes -- see Hongtao's reply.

If how we select BB to instrument depends on a switch, we would need instrumentation build and optimizing build to have consistent switch, otherwise counters could mismatch even if CFG checksum matches? I guess that's one reason why it'd be good to avoid different ways of selecting BBs.

Would it be possible to tweak/cheat the edge weights just for MST so entry BB is pinned to be non-MST node hence guaranteed to be instrumented directly?

There should not be an option which makes things complicated as Wenlei described. Instead, once this change is done, there would be a version bump (both raw and index). The index format needs to be backward compatible, so there needs to be some version specific handling there (can be removed later).

In D81981#2099702, @wenlei wrote:

If how we select BB to instrument depends on a switch, we would need instrumentation build and optimizing build to have consistent switch, otherwise counters could mismatch even if CFG checksum matches? I guess that's one reason why it'd be good to avoid different ways of selecting BBs.

PGO has already had a common practice to ensure profile-gen and profile-use to have the same flags. So the flag to enable inserting counter in entry block won't cause too much trouble.

Would it be possible to tweak/cheat the edge weights just for MST so entry BB is pinned to be non-MST node hence guaranteed to be instrumented directly?

I considered that, but I don't know how to do it. From what I currently understand, MST selects some of the edges with the highest frequencies, and pruning those edges won't affect the inference of the profile of all the edges. We can prune edges when selecting the MST, but we cannot guarantee that a node is selected as an instrumented BB during that phase. Deciding which BB to instrument is done in getInstrBB -- by choosing whether to instrument the src node or the dst node for each edge.

I am new to this part, so if you know there is a way to do that, please let me know. That would be much appreciated.

In D81981#2099769, @davidxl wrote:

The index format needs to be backward compatible, so there needs to be some version specific handling there (can be removed later).

I don't understand this part. Could you elaborate -- why is the index format different from the raw format in backward compatibility, and what is the version-specific handling?

For a function entry bb which has multiple successors, the existing algorithm in FuncPGOInstrumentation<Edge, BBInfo>::getInstrBB will insert the counter in all its successors. My current implementation simply adds a counter in the entry block, so in that case it introduces a redundant counter.

I can improve it by selecting a successor to not insert counter for it since it can be inferred from the counters surrounding it. With that implemented, I expect the profile size will be unchanged.

The current PGO instrumentation is based on MST.

Yes, it is based on two parts, selecting MST is one part and selecting src/dst node of each non-MST edge to instrument is another part.

Changing the instrumentation may require changes in how the counts of non-MST edges are calculated (in PGOUseFunc::setInstrumentedCounts). So maybe adjust the MST to remove the sibling edge?

The change to add entry BB as an instrumented BB is in function getInstrumentBBs which is shared by profile-gen and profile-use, so it will be consistent between profile-gen and profile-use. About adjusting MST to remove sibling edge, I feel it is inconsistent with current goal of MST selection. The goal of selecting MST is to avoid instrumenting the most frequent edges so we can minimize the cost. Removing a successor edge of the entry block is a different goal. Mixing these two goals will make things complicated. I feel it is simpler to add the change in the second part -- selecting between src and dst which node to instrument.
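As a rough illustration of the first of the two parts discussed here (selecting the MST), the sketch below builds a maximum spanning tree over weighted CFG edges with Kruskal's algorithm and reports the edges left outside the tree, which are the ones that would need a counter. It is a simplified model, not the code in PGOInstrumentation.cpp: the `Edge` struct and function names are hypothetical, the fake entry/exit edges are omitted, and the second part (choosing the src or dst block of each non-MST edge) is not modelled.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <numeric>
#include <vector>

// Hypothetical CFG edge with an estimated execution weight.
struct Edge {
  int Src;
  int Dst;
  uint64_t Weight;
  bool InMST;
};

// Union-find root lookup with path halving.
inline int findRoot(std::vector<int> &Parent, int X) {
  while (Parent[X] != X) {
    Parent[X] = Parent[Parent[X]];
    X = Parent[X];
  }
  return X;
}

// Kruskal's algorithm on edges sorted by descending weight. The heaviest
// edges end up in the tree and therefore do not need a counter; their counts
// can be inferred from the counters placed for the remaining edges.
// Note: this reorders Edges by descending weight.
inline void selectMST(std::vector<Edge> &Edges, int NumBlocks) {
  assert(NumBlocks > 0 && "need at least one block");
  std::vector<int> Parent(NumBlocks);
  std::iota(Parent.begin(), Parent.end(), 0);
  std::stable_sort(Edges.begin(), Edges.end(),
                   [](const Edge &A, const Edge &B) {
                     return A.Weight > B.Weight;
                   });
  for (Edge &E : Edges) {
    E.InMST = false;
    const int A = findRoot(Parent, E.Src);
    const int B = findRoot(Parent, E.Dst);
    if (A == B)
      continue; // Adding this edge would close a cycle: it needs a counter.
    Parent[A] = B;
    E.InMST = true;
  }
}

// Number of edges that need a counter, i.e. edges outside the tree.
inline int countInstrumented(const std::vector<Edge> &Edges) {
  return static_cast<int>(std::count_if(
      Edges.begin(), Edges.end(), [](const Edge &E) { return !E.InMST; }));
}
```

For a three-block CFG with edges 0->1 (weight 10), 1->2 (weight 5) and 0->2 (weight 1), only the coldest edge 0->2 stays outside the tree and receives a counter.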

There is a use case where users check in an indexed format profile for sources that do not change much (e.g. library code), so the indexed format profile needs to be backward compatible. Raw profile has no such requirement.

For IR PGO, the compatibility requirement is nice to have, but it is probably not a hard requirement as there are other ways to easily break it -- for instance any early inliner changes or CFG cleanup pass changes can make the old profile unusable.

Also, does it suffice to just never select the fake edge to entry in MST?

In D81981#2099825, @wmi wrote:

In D81981#2099769, @davidxl wrote:

The index format needs to be backward compatible, so there needs to be some version specific handling there (can be removed later).

I don't understand this part. Could you elaborate -- why is the index format different from the raw format in backward compatibility, and what is the version-specific handling?

For a function entry bb which has multiple successors, the existing algorithm in FuncPGOInstrumentation<Edge, BBInfo>::getInstrBB will insert the counter in all its successors. My current implementation simply adds a counter in the entry block, so in that case it introduces a redundant counter.

I can improve it by selecting a successor to not insert counter for it since it can be inferred from the counters surrounding it. With that implemented, I expect the profile size will be unchanged.

The current PGO instrumentation is based on MST.

Yes, it is based on two parts, selecting MST is one part and selecting src/dst node of each non-MST edge to instrument is another part.

Changing the instrumentation may require changes in how the counts of non-MST edges are calculated (in PGOUseFunc::setInstrumentedCounts). So maybe adjust the MST to remove the sibling edge?

The change to add entry BB as an instrumented BB is in function getInstrumentBBs which is shared by profile-gen and profile-use, so it will be consistent between profile-gen and profile-use. About adjusting MST to remove sibling edge, I feel it is inconsistent with current goal of MST selection. The goal of selecting MST is to avoid instrumenting the most frequent edges so we can minimize the cost. Removing a successor edge of the entry block is a different goal. Mixing these two goals will make things complicated. I feel it is simpler to add the change in the second part -- selecting between src and dst which node to instrument.

I see. Yes, it's reasonable to avoid changing MST edges, instead, to change which block to instrument for a given edge. I was thinking special logic may be needed for the edge count calculation as well, since it's related to where the instrumentation happens. This should work if we change both places.

Since only non-MST edges are instrumented, I was wondering alternatively we can remove an edge related to the entry block from MST to force the entry instrumented. I think removing the fake entry edge as David suggested is better than removing an outgoing sibling edge from the entry. Removing the fake entry edge from MST will result in one of the outgoing sibling edges added to MST, which in turn will cause the corresponding successor of the entry not instrumented.

Remove the compiler part since that part will be done in https://reviews.llvm.org/D82123.

Add an option -base-scale-function so people don't have to always compute the scale factor for PGO/SampleFDO profiles themselves. If the user knows that for some function the counter value is proportional to the total count of the execution, then by specifying that function through -base-scale-function, llvm-profdata will compute the scale factor based on the counter values of the function.

https://reviews.llvm.org/D82123 to always instrument function entry BB has been committed guarded by a flag. https://reviews.llvm.org/D83024 to enable the flag by default is under review.

Can you take another look at the patch?

Can you first split the NFC part (refactoring part such as GetEntryForPercentile) out?

llvm/include/llvm/ProfileData/InstrProf.h
682	document the parameters.

refactor GetEntryForPercentile out in https://reviews.llvm.org/D83439
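For readers unfamiliar with the helper being refactored: it looks up the first detailed-summary entry whose cutoff is at least the requested percentile. A standalone sketch of that lookup is given below. `SummaryEntry` is a simplified, hypothetical stand-in for the LLVM summary entry type, and throwing `std::out_of_range` is a choice made for this example; the version shown in the diff uses partition_point over the detailed summary and reports a fatal error instead.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Simplified stand-in for one detailed-summary entry. Cutoff is a percentile
// multiplied by 10000 (so 990000 means 99%); MinCount is the smallest count
// needed to reach that percentile.
struct SummaryEntry {
  uint32_t Cutoff;
  uint64_t MinCount;
  uint64_t NumCounts;
};

// Return the first entry whose cutoff is >= Percentile. DS must be sorted by
// ascending Cutoff, which is what makes the binary search valid.
inline const SummaryEntry &
getEntryForPercentile(const std::vector<SummaryEntry> &DS,
                      uint64_t Percentile) {
  auto It = std::partition_point(
      DS.begin(), DS.end(),
      [=](const SummaryEntry &Entry) { return Entry.Cutoff < Percentile; });
  // The requested percentile has to be <= one of the cutoffs in DS.
  if (It == DS.end())
    throw std::out_of_range("Desired percentile exceeds the maximum cutoff");
  return *It;
}
```

A hot threshold can then be read as `getEntryForPercentile(DS, HotCutoff).MinCount`, where `HotCutoff` is whatever cutoff the caller considers hot.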

Address David's comment.
Adjust comments, function names and flag names.

Fix a wrong flag name in test.

davidxl added inline comments. Jul 9 2020, 11:03 AM

llvm/tools/llvm-profdata/llvm-profdata.cpp
293	this refactoring can also be committed independently
542	make sample file path as the part of the option, so there is no need to handle the ordering.
557	Are these two weights comparable?
845	Is this flag tested?

wmi marked 5 inline comments as done. Jul 10 2020, 7:20 PM

wmi added inline comments.

llvm/tools/llvm-profdata/llvm-profdata.cpp
293	Done in https://reviews.llvm.org/D83521
542	Indeed that will save the ordering-handling logic, but I want to use weighted_input to scale the count in the sample profile to be roughly the same as the count in the instr profile. Supporting -supplement-instr-with-sample=<weight>,<filename> would be a little weird and would increase complexity.
557	Yes, given "-weighted-input=2, instr_profile -weighted-input=3, sample_profile", that means we want to scale the count in the sample profile by 3/2 before updating the entry in the instr profile.
845	Good point, add tests for this flag and the flag early-inline-size-threshold
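The 3/2 example above corresponds to multiplying a count by one weight and dividing by the other, which matches the Norm/DeNorm parameter pair that the diff adds to the scale() declarations. A minimal sketch of such a scaling step follows; `scaleCount` is a hypothetical helper, and the conservative saturation on overflow is an assumption of this example rather than a description of the patch.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Scale Count by Norm / DeNorm using integer arithmetic.
// If Count * Norm would overflow 64 bits, the result saturates at UINT64_MAX
// (even in cases where the value after division would have fit).
inline uint64_t scaleCount(uint64_t Count, uint64_t Norm, uint64_t DeNorm) {
  assert(DeNorm != 0 && "DeNorm must be non-zero");
  const uint64_t Max = std::numeric_limits<uint64_t>::max();
  if (Norm != 0 && Count > Max / Norm)
    return Max;
  return Count * Norm / DeNorm;
}
```

With an instr-profile weight of 2 and a sample-profile weight of 3, a sample count of 100 would be scaled as `scaleCount(100, 3, 2)`, giving 150. Integer division truncates, so `scaleCount(7, 3, 2)` gives 10.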

Address David's comments.

I also plan to dump the functions cold in instr profile and hot in sample profile, and sort the output according to hotness in sample profile. That can be used to guide PGO users if they want to improve the representativeness of their loadtest. I leave that part in a separate patch for easier review.

I think this feature should be decoupled from the version change -- since this is an approximation anyway.

One way to do this is to use max count or total count as a reference point and compute the scale factor.

llvm/tools/llvm-profdata/llvm-profdata.cpp
604	when there is no scaling, setting instr count with sample count does not make sense. Perhaps just set it to be above cold threshold.

I think this feature should be decoupled from the version change -- since this is an approximation anyway. One way to do this is to use max count or total count as a reference point and compute the scale factor.

If it is uncoupled from the version change, for function with counter values not being 0 in instr profile, it is ok to scale all the counter values by a scale factor based on max count or total count. For function with all counter values being 0, we cannot uniformly scale up all the counter values because that will mess up the branch probability inside of the function. We want to set the entry BB counter to a hot value only so compiler can use static heuristic to compute the branch probability inside of the function. That is why entry BB counter is needed in this feature.

llvm/tools/llvm-profdata/llvm-profdata.cpp
604	Maybe I can make scalefactor an option and require the user to provide either -scalefactor or -base-scale-function, so that the user won't accidentally leave scalefactor at 1. In this way, I can make the sample profile the input of the option -supplement-instr-with-sample=, so I can remove the input profile ordering logic.

Address David's comment

If we don't need to handle all zero cases using scaling, we can remove the dependency on the always entry patch.

llvm/tools/llvm-profdata/llvm-profdata.cpp
558	Handling this case (the all-zero case) in this way won't help much -- the Branch Probability pass will set all branch weights to 1, making all branches unbiased, which is worse than using static heuristics. The right way, I think, is to remove their profile entries completely. I assume the compiler pass later will treat such functions as unknown and not put them into .text.unlikely.

wmi marked an inline comment as done. Jul 22 2020, 2:42 PM

wmi added inline comments.

llvm/tools/llvm-profdata/llvm-profdata.cpp
558	But I think for a large part of the functions missing in loadtest, they have all zero counter values, so it is better to have a way to handle them. Is it possible for PGO to handle the functions with only entry counter not being zero and with other counters all being zero in a special way -- for those functions, just set the entry count and skip the metadata setting inside of the function? So that those functions can use static profiling inside.

The PGOUse pass can choose not to annotate any branches with total weights == 0. Now the question becomes how we tell the PGOUse pass whether the entry should be set to 0 or left unset. There are two ways to do it (to signal that the function is not really cold, but unknown):

1. Remove the function from the indexed format profile;
2. set all counts to some sentinel value such as -1.

Inlining won't be helped unless there is a hot callsite to the all-zero count function -- but this should not exist. I think the major performance hit comes from 1) text.unlikely which may not be mlocked; and 2) all unbiased branches due to zero weights. So making this depend on the existence of the entry count is fine, but we still need to teach PGOUse to drop the body. I think a simpler design would be

At llvm-profdata side:

1. if the instrumentation cold function has enough internal counts, just scale up the max internal counts to be a multiple of hot threshold

2. if the cold function has all zero counts or we believe all their internal counts are not trustworthy (basically ignore step 1 with an option), we can simply discard the function entry completely (to signal this function is actually hot, but we don't know internal counts)

At PGOUse side:

if we don't find counters for a function, set the function's entry value to be above hot threshold (a function statically linked in should always have counts. If there are no counts, it means it was corrected by llvm-profdata).

Address David's comments.

The major change is to remove the dependence on always having an entry counter in the profile. For a function whose instr profile is all zero or mostly zero, its counters will all be set to -1. All -1 counters indicate that the internal profile of the function is unaccountable and also that the function is hot. PGO profile-use will drop all the internal counters while setting the function entry count to be several times above the hot threshold.

I chose to set all counters to -1 instead of dropping the profile to express the indications above because I am afraid that in rare cases a PGO profile may be accidentally used when building an unrelated target. If we set functions to be hot when their profiles cannot be found, we may treat all the functions as hot, and that may bloat the code and trigger compile-time issues.
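The all -1 convention described in this update can be illustrated with a small sketch covering both sides: marking a record on the llvm-profdata side and interpreting it on the profile-use side. All names here (`kUnknownCount`, `isMostlyZero`, `decideEntry`, the zero-ratio threshold and the multiplier) are hypothetical and chosen for the example; they are not taken from the patch.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Sentinel meaning "hot function, internal counts not trustworthy" (-1).
constexpr uint64_t kUnknownCount = static_cast<uint64_t>(-1);

// Producer side: is the share of zero counters at or above the threshold?
// (Whether the comparison is inclusive is an assumption of this sketch.)
inline bool isMostlyZero(const std::vector<uint64_t> &Counts,
                         double ZeroRatioThreshold) {
  if (Counts.empty())
    return false;
  const auto Zeros = std::count(Counts.begin(), Counts.end(), uint64_t{0});
  return static_cast<double>(Zeros) / Counts.size() >= ZeroRatioThreshold;
}

// Producer side: overwrite every counter with the sentinel.
inline void markHotWithUnknownBody(std::vector<uint64_t> &Counts) {
  std::fill(Counts.begin(), Counts.end(), kUnknownCount);
}

// True only if every counter carries the sentinel.
inline bool isHotWithUnknownBody(const std::vector<uint64_t> &Counts) {
  return !Counts.empty() &&
         std::all_of(Counts.begin(), Counts.end(),
                     [](uint64_t C) { return C == kUnknownCount; });
}

// Consumer side: which entry count to annotate, and whether the internal
// counters may be used for branch weights.
struct EntryDecision {
  uint64_t EntryCount;
  bool UseBodyCounts;
};

inline EntryDecision decideEntry(const std::vector<uint64_t> &Counts,
                                 uint64_t HotThreshold, uint64_t Multiplier) {
  // Marked record: drop the body and make the entry clearly hot.
  // Overflow of HotThreshold * Multiplier is not handled in this sketch.
  if (isHotWithUnknownBody(Counts))
    return {HotThreshold * Multiplier, false};
  return {Counts.empty() ? 0 : Counts[0], true};
}
```

With a hot threshold of 1000 and a multiplier of 3, a marked record yields an entry count of 3000 with the body counters ignored, so static heuristics can decide the branch probabilities inside the function.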

davidxl added inline comments. Jul 27 2020, 9:22 AM

llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp
1234 ↗	(On Diff #280621)	document the variable.
1239 ↗	(On Diff #280621)	Is it possible to have some blocks -1?
1647 ↗	(On Diff #280621)	oh, just move this comment to the variable decl.
llvm/test/tools/llvm-profdata/Inputs/mix_instr.proftext
12	Do we have a test case for the all-zero case and the below-threshold case (considered all zero)?
llvm/tools/llvm-profdata/llvm-profdata.cpp
582	Is it possible to delete the instprof record for the function from the profile?

wmi marked 4 inline comments as done. Jul 27 2020, 10:09 AM

wmi added inline comments.

llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp
1239 ↗	(On Diff #280621)	I feel having all blocks -1 to indicate an unrepresentative profile for an actually hot function is simpler than having some blocks -1. That is because when we compute the profile summary, we want to strip those unrepresentative profiles. If we change some blocks to -1 but keep the rest unchanged, those counters will still be used for computing the profile summary.
1647 ↗	(On Diff #280621)	Ok, will do.
llvm/test/tools/llvm-profdata/Inputs/mix_instr.proftext
12	Yes, foo is intentionally created for that. Line 22 and line 42 in suppl-instr-with-sample.test test the cases where the ratio of zero counters in foo is above or below the threshold.
llvm/tools/llvm-profdata/llvm-profdata.cpp
582	One reason I use all -1 as the indication that the function is hot and its profile is unrepresentative is that a user may build an unrelated target together with the PGO-optimized target in the same command. I know a lot of SampleFDO users do that to simplify their release. I imagine there could be PGO users doing that too. Another possibility is building test targets using the profile. If we delete the instprof record and treat all the functions without instprof as hot during prof-use, we may accidentally treat a lot of cold functions as hot if the profile is applied to some unrelated targets or tests (tests may be partially related targets), and that may cause compile-time issues.
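The point about the profile summary can be made concrete with a short sketch: when a record is marked as a whole, the summary computation can skip it with a single check, whereas a partially marked record would still contribute its remaining counters. The function names and the plain sum used as a "summary" are simplifications assumed for this example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Sentinel for "hot function, internal counts unknown" (-1).
constexpr uint64_t kUnknownCount = static_cast<uint64_t>(-1);

// A record is treated as unrepresentative only if all counters are -1.
inline bool isAllUnknown(const std::vector<uint64_t> &Counts) {
  return !Counts.empty() &&
         std::all_of(Counts.begin(), Counts.end(),
                     [](uint64_t C) { return C == kUnknownCount; });
}

// Sum the counters that should feed the profile summary, skipping records
// that are marked as a whole. A record with only some counters set to -1
// would not be skipped here, which is the complication the all-or-nothing
// convention avoids.
inline uint64_t
totalCountForSummary(const std::vector<std::vector<uint64_t>> &Records) {
  uint64_t Total = 0;
  for (const auto &Counts : Records) {
    if (isAllUnknown(Counts))
      continue;
    for (uint64_t C : Counts)
      Total += C;
  }
  return Total;
}
```

Given three records `{1, 2, 3}`, `{-1, -1}` and `{4}`, only the first and third contribute, for a total of 10.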

lgtm

This revision is now accepted and ready to land. Jul 27 2020, 10:59 AM

Closed by commit rGa23f62343cb7: Supplement instr profile with sample profile. (authored by wmi). Jul 27 2020, 9:23 PM

This revision was automatically updated to reflect the committed changes.

wmi added a commit: rGa23f62343cb7: Supplement instr profile with sample profile.

Revision Contents

Path (Size)

llvm/include/llvm/ProfileData/InstrProf.h (10 lines)
llvm/include/llvm/ProfileData/InstrProfWriter.h (2 lines)
llvm/include/llvm/ProfileData/ProfileCommon.h (4 lines)
llvm/lib/Analysis/ProfileSummaryInfo.cpp (26 lines)
llvm/lib/ProfileData/InstrProf.cpp (15 lines)
llvm/lib/ProfileData/InstrProfWriter.cpp (2 lines)
llvm/lib/ProfileData/ProfileSummaryBuilder.cpp (13 lines)
llvm/test/tools/llvm-profdata/Inputs/mix_instr.proftext (15 lines)
llvm/test/tools/llvm-profdata/Inputs/mix_sample.proftext (15 lines)
llvm/test/tools/llvm-profdata/suppl-instr-with-sample.test (34 lines)
llvm/tools/llvm-profdata/llvm-profdata.cpp (207 lines)

Diff 273545

llvm/include/llvm/ProfileData/InstrProf.h

[673 lines not shown; context: struct InstrProfValueSiteRecord {]
   /// Sort ValueData Descending by Count
   inline void sortByCount();

   /// Merge data from another InstrProfValueSiteRecord
   /// Optionally scale merged counts by \p Weight.
   void merge(InstrProfValueSiteRecord &Input, uint64_t Weight,
              function_ref<void(instrprof_error)> Warn);
   /// Scale up value profile data counts.
-  void scale(uint64_t Weight, function_ref<void(instrprof_error)> Warn);
+  void scale(uint64_t Norm, uint64_t DeNorm,
+             function_ref<void(instrprof_error)> Warn);
   [inline comment on the new scale() declaration, davidxl: document the parameters.]

   /// Compute the overlap b/w this record and Input record.
   void overlap(InstrProfValueSiteRecord &Input, uint32_t ValueKind,
                OverlapStats &Overlap, OverlapStats &FuncLevelOverlap);
 };

 /// Profiling information for a single function.
 struct InstrProfRecord {
[57 lines not shown; context: void addValueData(uint32_t ValueKind, uint32_t Site,]
                     InstrProfSymtab *SymTab);

   /// Merge the counts in \p Other into this one.
   /// Optionally scale merged counts by \p Weight.
   void merge(InstrProfRecord &Other, uint64_t Weight,
              function_ref<void(instrprof_error)> Warn);

   /// Scale up profile counts (including value profile data) by
-  /// \p Weight.
-  void scale(uint64_t Weight, function_ref<void(instrprof_error)> Warn);
+  /// \p Norm then divide the counts by DeNorm.
+  void scale(uint64_t Norm, uint64_t DeNorm,
+             function_ref<void(instrprof_error)> Warn);

   /// Sort value profile data (per site) by count.
   void sortValueData() {
     for (uint32_t Kind = IPVK_First; Kind <= IPVK_Last; ++Kind)
       for (auto &SR : getValueSitesForKind(Kind))
         SR.sortByCount();
   }

[69 lines not shown; context: private:]

   // Merge Value Profile data from Src record to this record for ValueKind.
   // Scale merged value counts by \p Weight.
   void mergeValueProfData(uint32_t ValkeKind, InstrProfRecord &Src,
                           uint64_t Weight,
                           function_ref<void(instrprof_error)> Warn);

   // Scale up value profile data count.
-  void scaleValueProfData(uint32_t ValueKind, uint64_t Weight,
+  void scaleValueProfData(uint32_t ValueKind, uint64_t Norm, uint64_t DeNorm,
                           function_ref<void(instrprof_error)> Warn);
 };

 struct NamedInstrProfRecord : InstrProfRecord {
   StringRef Name;
   uint64_t Hash;

   // We reserve this bit as the flag for context sensitive profile record.
[297 lines not shown]

llvm/include/llvm/ProfileData/InstrProfWriter.h

[41 lines not shown; context: private:]
   ProfKind ProfileKind = PF_Unknown;
   // Use raw pointer here for the incomplete type object.
   InstrProfRecordWriterTrait *InfoObj;

 public:
   InstrProfWriter(bool Sparse = false);
   ~InstrProfWriter();

+  StringMap<ProfilingData> &getProfileData() { return FunctionData; }
+
   /// Add function counts for the given function. If there are already counts
   /// for this function and the hash and number of counts match, each counter is
   /// summed. Optionally scale counts by \p Weight.
   void addRecord(NamedInstrProfRecord &&I, uint64_t Weight,
                  function_ref<void(Error)> Warn);
   void addRecord(NamedInstrProfRecord &&I, function_ref<void(Error)> Warn) {
     addRecord(std::move(I), 1, Warn);
   }
[61 lines not shown]

llvm/include/llvm/ProfileData/ProfileCommon.h

[56 lines not shown; context: protected:]
   ~ProfileSummaryBuilder() = default;

   inline void addCount(uint64_t Count);
   void computeDetailedSummary();

 public:
   /// A vector of useful cutoff values for detailed summary.
   static const ArrayRef<uint32_t> DefaultCutoffs;

+  /// Find the summary entry for a desired percentile of counts.
+  static const ProfileSummaryEntry &
+  getEntryForPercentile(SummaryEntryVector &DS, uint64_t Percentile);
 };

 class InstrProfSummaryBuilder final : public ProfileSummaryBuilder {
   uint64_t MaxInternalBlockCount = 0;

   inline void addEntryCount(uint64_t Count);
   inline void addInternalCount(uint64_t Count);

[30 lines not shown]

llvm/lib/Analysis/ProfileSummaryInfo.cpp

Show All 13 Lines
#include "llvm/Analysis/ProfileSummaryInfo.h"		#include "llvm/Analysis/ProfileSummaryInfo.h"
#include "llvm/Analysis/BlockFrequencyInfo.h"		#include "llvm/Analysis/BlockFrequencyInfo.h"
#include "llvm/IR/BasicBlock.h"		#include "llvm/IR/BasicBlock.h"
#include "llvm/IR/Instructions.h"		#include "llvm/IR/Instructions.h"
#include "llvm/IR/Metadata.h"		#include "llvm/IR/Metadata.h"
#include "llvm/IR/Module.h"		#include "llvm/IR/Module.h"
#include "llvm/IR/ProfileSummary.h"		#include "llvm/IR/ProfileSummary.h"
#include "llvm/InitializePasses.h"		#include "llvm/InitializePasses.h"
		#include "llvm/ProfileData/ProfileCommon.h"
#include "llvm/Support/CommandLine.h"		#include "llvm/Support/CommandLine.h"
using namespace llvm;		using namespace llvm;

// The following two parameters determine the threshold for a count to be		// The following two parameters determine the threshold for a count to be
// considered hot/cold. These two parameters are percentile values (multiplied		// considered hot/cold. These two parameters are percentile values (multiplied
// by 10000). If the counts are sorted in descending order, the minimum count to		// by 10000). If the counts are sorted in descending order, the minimum count to
// reach ProfileSummaryCutoffHot gives the threshold to determine a hot count.		// reach ProfileSummaryCutoffHot gives the threshold to determine a hot count.
// Similarly, the minimum count to reach ProfileSummaryCutoffCold gives the		// Similarly, the minimum count to reach ProfileSummaryCutoffCold gives the
Show All 35 Lines	static cl::opt<int> ProfileSummaryColdCount(
"profile-summary-cold-count", cl::ReallyHidden, cl::ZeroOrMore,		"profile-summary-cold-count", cl::ReallyHidden, cl::ZeroOrMore,
cl::desc("A fixed cold count that overrides the count derived from"		cl::desc("A fixed cold count that overrides the count derived from"
" profile-summary-cutoff-cold"));		" profile-summary-cutoff-cold"));

static cl::opt<bool> PartialProfile(		static cl::opt<bool> PartialProfile(
"partial-profile", cl::Hidden, cl::init(false),		"partial-profile", cl::Hidden, cl::init(false),
cl::desc("Specify the current profile is used as a partial profile."));		cl::desc("Specify the current profile is used as a partial profile."));

// Find the summary entry for a desired percentile of counts.
static const ProfileSummaryEntry &getEntryForPercentile(SummaryEntryVector &DS,
uint64_t Percentile) {
auto It = partition_point(DS, [=](const ProfileSummaryEntry &Entry) {
return Entry.Cutoff < Percentile;
});
// The required percentile has to be <= one of the percentiles in the
// detailed summary.
if (It == DS.end())
report_fatal_error("Desired percentile exceeds the maximum cutoff");
return *It;
}

// The profile summary metadata may be attached either by the frontend or by		// The profile summary metadata may be attached either by the frontend or by
// any backend passes (IR level instrumentation, for example). This method		// any backend passes (IR level instrumentation, for example). This method
// checks if the Summary is null and if so checks if the summary metadata is now		// checks if the Summary is null and if so checks if the summary metadata is now
// available in the module and parses it to get the Summary object. Returns true		// available in the module and parses it to get the Summary object. Returns true
// if a valid Summary is available.		// if a valid Summary is available.
bool ProfileSummaryInfo::computeSummary() {		bool ProfileSummaryInfo::computeSummary() {
if (Summary)		if (Summary)
return true;		return true;
[... 171 lines not shown ...]	bool ProfileSummaryInfo::isFunctionEntryCold(const Function *F) {
return FunctionCount && isColdCount(FunctionCount.getCount());		return FunctionCount && isColdCount(FunctionCount.getCount());
}		}

/// Compute the hot and cold thresholds.		/// Compute the hot and cold thresholds.
void ProfileSummaryInfo::computeThresholds() {		void ProfileSummaryInfo::computeThresholds() {
if (!computeSummary())		if (!computeSummary())
return;		return;
auto &DetailedSummary = Summary->getDetailedSummary();		auto &DetailedSummary = Summary->getDetailedSummary();
auto &HotEntry =		auto &HotEntry = ProfileSummaryBuilder::getEntryForPercentile(
getEntryForPercentile(DetailedSummary, ProfileSummaryCutoffHot);		DetailedSummary, ProfileSummaryCutoffHot);
HotCountThreshold = HotEntry.MinCount;		HotCountThreshold = HotEntry.MinCount;
if (ProfileSummaryHotCount.getNumOccurrences() > 0)		if (ProfileSummaryHotCount.getNumOccurrences() > 0)
HotCountThreshold = ProfileSummaryHotCount;		HotCountThreshold = ProfileSummaryHotCount;
auto &ColdEntry =		auto &ColdEntry = ProfileSummaryBuilder::getEntryForPercentile(
getEntryForPercentile(DetailedSummary, ProfileSummaryCutoffCold);		DetailedSummary, ProfileSummaryCutoffCold);
ColdCountThreshold = ColdEntry.MinCount;		ColdCountThreshold = ColdEntry.MinCount;
if (ProfileSummaryColdCount.getNumOccurrences() > 0)		if (ProfileSummaryColdCount.getNumOccurrences() > 0)
ColdCountThreshold = ProfileSummaryColdCount;		ColdCountThreshold = ProfileSummaryColdCount;
assert(ColdCountThreshold <= HotCountThreshold &&		assert(ColdCountThreshold <= HotCountThreshold &&
"Cold count threshold cannot exceed hot count threshold!");		"Cold count threshold cannot exceed hot count threshold!");
HasHugeWorkingSetSize =		HasHugeWorkingSetSize =
HotEntry.NumCounts > ProfileSummaryHugeWorkingSetSizeThreshold;		HotEntry.NumCounts > ProfileSummaryHugeWorkingSetSizeThreshold;
HasLargeWorkingSetSize =		HasLargeWorkingSetSize =
HotEntry.NumCounts > ProfileSummaryLargeWorkingSetSizeThreshold;		HotEntry.NumCounts > ProfileSummaryLargeWorkingSetSizeThreshold;
}		}

Optional<uint64_t> ProfileSummaryInfo::computeThreshold(int PercentileCutoff) {		Optional<uint64_t> ProfileSummaryInfo::computeThreshold(int PercentileCutoff) {
if (!computeSummary())		if (!computeSummary())
return None;		return None;
auto iter = ThresholdCache.find(PercentileCutoff);		auto iter = ThresholdCache.find(PercentileCutoff);
if (iter != ThresholdCache.end()) {		if (iter != ThresholdCache.end()) {
return iter->second;		return iter->second;
}		}
auto &DetailedSummary = Summary->getDetailedSummary();		auto &DetailedSummary = Summary->getDetailedSummary();
auto &Entry =		auto &Entry = ProfileSummaryBuilder::getEntryForPercentile(DetailedSummary,
getEntryForPercentile(DetailedSummary, PercentileCutoff);		PercentileCutoff);
uint64_t CountThreshold = Entry.MinCount;		uint64_t CountThreshold = Entry.MinCount;
ThresholdCache[PercentileCutoff] = CountThreshold;		ThresholdCache[PercentileCutoff] = CountThreshold;
return CountThreshold;		return CountThreshold;
}		}

bool ProfileSummaryInfo::hasHugeWorkingSetSize() {		bool ProfileSummaryInfo::hasHugeWorkingSetSize() {
if (!HasHugeWorkingSetSize)		if (!HasHugeWorkingSetSize)
computeThresholds();		computeThresholds();
[... 149 lines not shown ...]

llvm/lib/ProfileData/InstrProf.cpp

[... 619 lines not shown ...]	if (I != IE && I->Value == J->Value) {
Warn(instrprof_error::counter_overflow);		Warn(instrprof_error::counter_overflow);
++I;		++I;
continue;		continue;
}		}
ValueData.insert(I, *J);		ValueData.insert(I, *J);
}		}
}		}

void InstrProfValueSiteRecord::scale(uint64_t Weight,		void InstrProfValueSiteRecord::scale(uint64_t Norm, uint64_t DeNorm,
function_ref<void(instrprof_error)> Warn) {		function_ref<void(instrprof_error)> Warn) {
for (auto I = ValueData.begin(), IE = ValueData.end(); I != IE; ++I) {		for (auto I = ValueData.begin(), IE = ValueData.end(); I != IE; ++I) {
bool Overflowed;		bool Overflowed;
I->Count = SaturatingMultiply(I->Count, Weight, &Overflowed);		I->Count = SaturatingMultiply(I->Count, Norm, &Overflowed) / DeNorm;
if (Overflowed)		if (Overflowed)
Warn(instrprof_error::counter_overflow);		Warn(instrprof_error::counter_overflow);
}		}
}		}

// Merge Value Profile data from Src record to this record for ValueKind.		// Merge Value Profile data from Src record to this record for ValueKind.
// Scale merged value counts by \p Weight.		// Scale merged value counts by \p Weight.
void InstrProfRecord::mergeValueProfData(		void InstrProfRecord::mergeValueProfData(
[... 32 lines not shown ...]	if (Overflowed)
Warn(instrprof_error::counter_overflow);		Warn(instrprof_error::counter_overflow);
}		}

for (uint32_t Kind = IPVK_First; Kind <= IPVK_Last; ++Kind)		for (uint32_t Kind = IPVK_First; Kind <= IPVK_Last; ++Kind)
mergeValueProfData(Kind, Other, Weight, Warn);		mergeValueProfData(Kind, Other, Weight, Warn);
}		}

void InstrProfRecord::scaleValueProfData(		void InstrProfRecord::scaleValueProfData(
uint32_t ValueKind, uint64_t Weight,		uint32_t ValueKind, uint64_t Norm, uint64_t DeNorm,
function_ref<void(instrprof_error)> Warn) {		function_ref<void(instrprof_error)> Warn) {
for (auto &R : getValueSitesForKind(ValueKind))		for (auto &R : getValueSitesForKind(ValueKind))
R.scale(Weight, Warn);		R.scale(Norm, DeNorm, Warn);
}		}

void InstrProfRecord::scale(uint64_t Weight,		void InstrProfRecord::scale(uint64_t Norm, uint64_t DeNorm,
function_ref<void(instrprof_error)> Warn) {		function_ref<void(instrprof_error)> Warn) {
		assert(DeNorm != 0 && "DeNorm cannot be 0");
for (auto &Count : this->Counts) {		for (auto &Count : this->Counts) {
bool Overflowed;		bool Overflowed;
Count = SaturatingMultiply(Count, Weight, &Overflowed);		Count = SaturatingMultiply(Count, Norm, &Overflowed) / DeNorm;
if (Overflowed)		if (Overflowed)
Warn(instrprof_error::counter_overflow);		Warn(instrprof_error::counter_overflow);
}		}
for (uint32_t Kind = IPVK_First; Kind <= IPVK_Last; ++Kind)		for (uint32_t Kind = IPVK_First; Kind <= IPVK_Last; ++Kind)
scaleValueProfData(Kind, Weight, Warn);		scaleValueProfData(Kind, Norm, DeNorm, Warn);
}		}
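The new `scale(Norm, DeNorm, ...)` signature applies the ratio `Norm / DeNorm` by multiplying first and dividing second. A minimal standalone sketch of that per-counter arithmetic follows; `saturatingMul` and `scaleCount` are hypothetical names used only for illustration, and the saturation helper is an assumption about how `SaturatingMultiply` behaves (clamp to the maximum and flag overflow), not a copy of the LLVM implementation.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical helper: multiply two unsigned 64-bit values, clamping to
// UINT64_MAX and setting Overflowed when the exact product does not fit.
uint64_t saturatingMul(uint64_t X, uint64_t Y, bool &Overflowed) {
  const uint64_t Max = std::numeric_limits<uint64_t>::max();
  Overflowed = (X != 0 && Y > Max / X);
  return Overflowed ? Max : X * Y;
}

// Scale one counter by Norm / DeNorm. Multiplying before dividing keeps
// small counters from being truncated to zero by integer division.
uint64_t scaleCount(uint64_t Count, uint64_t Norm, uint64_t DeNorm,
                    bool &Overflowed) {
  assert(DeNorm != 0 && "DeNorm cannot be 0");
  return saturatingMul(Count, Norm, Overflowed) / DeNorm;
}
```

One consequence visible in this sketch: when the product saturates, the division is still applied, so the result is the maximum value divided by `DeNorm` rather than the maximum value itself. Passing `DeNorm == 1`, as `InstrProfWriter::addRecord` does, reduces to the old single-weight behaviour.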

// Map indirect call target name hash to name string.		// Map indirect call target name hash to name string.
uint64_t InstrProfRecord::remapValue(uint64_t Value, uint32_t ValueKind,		uint64_t InstrProfRecord::remapValue(uint64_t Value, uint32_t ValueKind,
InstrProfSymtab *SymTab) {		InstrProfSymtab *SymTab) {
if (!SymTab)		if (!SymTab)
return Value;		return Value;

[... 580 lines not shown ...]

llvm/lib/ProfileData/InstrProfWriter.cpp

[... 234 lines not shown ...]	void InstrProfWriter::addRecord(StringRef Name, uint64_t Hash,
auto MapWarn = [&](instrprof_error E) {		auto MapWarn = [&](instrprof_error E) {
Warn(make_error<InstrProfError>(E));		Warn(make_error<InstrProfError>(E));
};		};

if (NewFunc) {		if (NewFunc) {
// We've never seen a function with this name and hash, add it.		// We've never seen a function with this name and hash, add it.
Dest = std::move(I);		Dest = std::move(I);
if (Weight > 1)		if (Weight > 1)
Dest.scale(Weight, MapWarn);		Dest.scale(Weight, 1, MapWarn);
} else {		} else {
// We're updating a function we've seen before.		// We're updating a function we've seen before.
Dest.merge(I, Weight, MapWarn);		Dest.merge(I, Weight, MapWarn);
}		}

Dest.sortValueData();		Dest.sortValueData();
}		}

[... 220 lines not shown ...]

llvm/lib/ProfileData/ProfileSummaryBuilder.cpp

	[... 25 lines not shown ...]
	static const uint32_t DefaultCutoffsData[] = {			static const uint32_t DefaultCutoffsData[] = {
	10000, /* 1% */			10000, /* 1% */
	100000, /* 10% */			100000, /* 10% */
	200000, 300000, 400000, 500000, 600000, 700000, 800000,			200000, 300000, 400000, 500000, 600000, 700000, 800000,
	900000, 950000, 990000, 999000, 999900, 999990, 999999};			900000, 950000, 990000, 999000, 999900, 999990, 999999};
	const ArrayRef<uint32_t> ProfileSummaryBuilder::DefaultCutoffs =			const ArrayRef<uint32_t> ProfileSummaryBuilder::DefaultCutoffs =
	DefaultCutoffsData;			DefaultCutoffsData;

				const ProfileSummaryEntry &
				ProfileSummaryBuilder::getEntryForPercentile(SummaryEntryVector &DS,
				uint64_t Percentile) {
				auto It = partition_point(DS, [=](const ProfileSummaryEntry &Entry) {
				return Entry.Cutoff < Percentile;
				});
				// The required percentile has to be <= one of the percentiles in the
				// detailed summary.
				if (It == DS.end())
				report_fatal_error("Desired percentile exceeds the maximum cutoff");
				return *It;
				}
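For readers unfamiliar with the lookup that this change moves into `ProfileSummaryBuilder`, here is a minimal standalone sketch of the same idea using `std::partition_point`. `SummaryEntry` and `entryForPercentile` are hypothetical stand-ins for `ProfileSummaryEntry` and `getEntryForPercentile`, and the sketch throws an exception where the patch calls `report_fatal_error`.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical stand-in for ProfileSummaryEntry.
struct SummaryEntry {
  uint32_t Cutoff;    // percentile multiplied by 10000 (990000 is 99%)
  uint64_t MinCount;  // minimum count needed to reach this percentile
  uint64_t NumCounts; // number of counters at or above MinCount
};

// Return the first entry whose Cutoff is >= Percentile.
// DS must be sorted by ascending Cutoff for partition_point to be valid.
const SummaryEntry &entryForPercentile(const std::vector<SummaryEntry> &DS,
                                       uint64_t Percentile) {
  auto It = std::partition_point(
      DS.begin(), DS.end(),
      [=](const SummaryEntry &E) { return E.Cutoff < Percentile; });
  // The requested percentile must not exceed the largest recorded cutoff.
  if (It == DS.end())
    throw std::out_of_range("desired percentile exceeds the maximum cutoff");
  return *It;
}
```

A percentile that falls between two recorded cutoffs resolves to the next higher cutoff, so the returned `MinCount` is the threshold for the smallest recorded percentile that covers the request.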

	void InstrProfSummaryBuilder::addRecord(const InstrProfRecord &R) {			void InstrProfSummaryBuilder::addRecord(const InstrProfRecord &R) {
	// The first counter is not necessarily an entry count for IR			// The first counter is not necessarily an entry count for IR
	// instrumentation profiles.			// instrumentation profiles.
	// Eventually MaxFunctionCount will become obsolete and this can be			// Eventually MaxFunctionCount will become obsolete and this can be
	// removed.			// removed.
	addEntryCount(R.Counts[0]);			addEntryCount(R.Counts[0]);
	for (size_t I = 1, E = R.Counts.size(); I < E; ++I)			for (size_t I = 1, E = R.Counts.size(); I < E; ++I)
	addInternalCount(R.Counts[I]);			addInternalCount(R.Counts[I]);
	[... 78 lines not shown ...]

llvm/test/tools/llvm-profdata/Inputs/mix_instr.proftext

This file was added.

				:ir
				foo
				7
				4
				2
				3
				9
				4

				goo
				5
				3
				davidxl: Do we have a test case for the all-zero case and the below-threshold case (considered all zero)?
				wmi (author): Yes, foo is intentionally created for that. Lines 22 and 42 in suppl-instr-with-sample.test test the cases where the ratio of zero counters in foo is above or below the threshold.
				0
				0
				0

llvm/test/tools/llvm-profdata/Inputs/mix_sample.proftext

This file was added.

				foo:2000:2000
				1: 2000
				goo:3000:1500
				1: 1200
				2: 800
				3: 1000
				hoo:50:1
				1: 1
				2: 2
				3: 3
				4: 4
				5: 5
				6: 6
				7: 7
				8: 8

llvm/test/tools/llvm-profdata/suppl-instr-with-sample.test

This file was added.

				Some basic tests for supplementing instrumentation profile with sample profile.

				RUN: llvm-profdata merge -mix-instr-sample-profiles \
				RUN: -early-inline-size-threshold=0 \
				RUN: %p/Inputs/mix_instr.proftext \
				RUN: %p/Inputs/mix_sample.proftext -o %t
				RUN: llvm-profdata show %t -all-functions -counts | FileCheck %s --check-prefix=MIX1

				MIX1: foo:
				MIX1-NEXT: Hash: 0x0000000000000007
				MIX1-NEXT: Counters: 4
				MIX1-NEXT: Block counts: [2000, 3000, 9000, 4000]
				MIX1: goo:
				MIX1-NEXT: Hash: 0x0000000000000005
				MIX1-NEXT: Counters: 3
				MIX1-NEXT: Block counts: [1500, 0, 0]

				Some basic tests for supplementing instrumentation profile with sample profile.

				RUN: llvm-profdata merge -mix-instr-sample-profiles \
				RUN: -early-inline-size-threshold=0 \
				RUN: -weighted-input=2,%p/Inputs/mix_instr.proftext \
				RUN: -weighted-input=3,%p/Inputs/mix_sample.proftext -o %t
				RUN: llvm-profdata show %t -all-functions -counts | FileCheck %s --check-prefix=MIX2

				MIX2: foo:
				MIX2-NEXT: Hash: 0x0000000000000007
				MIX2-NEXT: Counters: 4
				MIX2-NEXT: Block counts: [3000, 4500, 13500, 6000]
				MIX2: goo:
				MIX2-NEXT: Hash: 0x0000000000000005
				MIX2-NEXT: Counters: 3
				MIX2-NEXT: Block counts: [2250, 0, 0]

llvm/tools/llvm-profdata/llvm-profdata.cpp

[... 284 lines not shown ...]	Dst->Writer.mergeRecordsFromWriter(std::move(Src->Writer), [&](Error E) {
instrprof_error IPE = InstrProfError::take(std::move(E));		instrprof_error IPE = InstrProfError::take(std::move(E));
std::unique_lock<std::mutex> ErrGuard{Dst->ErrLock};		std::unique_lock<std::mutex> ErrGuard{Dst->ErrLock};
bool firstTime = Dst->WriterErrorCodes.insert(IPE).second;		bool firstTime = Dst->WriterErrorCodes.insert(IPE).second;
if (firstTime)		if (firstTime)
warn(toString(make_error<InstrProfError>(IPE)));		warn(toString(make_error<InstrProfError>(IPE)));
});		});
}		}

		static void writeInstrProfile(StringRef OutputFilename,
		davidxl: This refactoring can also be committed independently.
		wmi (author): Done in https://reviews.llvm.org/D83521
		ProfileFormat OutputFormat,
		InstrProfWriter &Writer) {
		std::error_code EC;
		raw_fd_ostream Output(OutputFilename.data(), EC, sys::fs::OF_None);
		if (EC)
		exitWithErrorCode(EC, OutputFilename);

		if (OutputFormat == PF_Text) {
		if (Error E = Writer.writeText(Output))
		exitWithError(std::move(E));
		} else {
		Writer.write(Output);
		}
		}

static void mergeInstrProfile(const WeightedFileVector &Inputs,		static void mergeInstrProfile(const WeightedFileVector &Inputs,
SymbolRemapper *Remapper,		SymbolRemapper *Remapper,
StringRef OutputFilename,		StringRef OutputFilename,
ProfileFormat OutputFormat, bool OutputSparse,		ProfileFormat OutputFormat, bool OutputSparse,
unsigned NumThreads, FailureMode FailMode) {		unsigned NumThreads, FailureMode FailMode) {
if (OutputFilename.compare("-") == 0)		if (OutputFilename.compare("-") == 0)
exitWithError("Cannot write indexed profdata format to stdout.");		exitWithError("Cannot write indexed profdata format to stdout.");

[... 59 lines not shown ...]	for (auto &ErrorPair : WC->Errors) {
++NumErrors;		++NumErrors;
warn(toString(std::move(ErrorPair.first)), ErrorPair.second);		warn(toString(std::move(ErrorPair.first)), ErrorPair.second);
}		}
}		}
if (NumErrors == Inputs.size() ||		if (NumErrors == Inputs.size() ||
(NumErrors > 0 && FailMode == failIfAnyAreInvalid))		(NumErrors > 0 && FailMode == failIfAnyAreInvalid))
exitWithError("No profiles could be merged.");		exitWithError("No profiles could be merged.");

std::error_code EC;		writeInstrProfile(OutputFilename, OutputFormat, Contexts[0]->Writer);
raw_fd_ostream Output(OutputFilename.data(), EC, sys::fs::OF_None);		}
if (EC)
exitWithErrorCode(EC, OutputFilename);

InstrProfWriter &Writer = Contexts[0]->Writer;		static bool tryReadInstrProf(std::string Filename,
if (OutputFormat == PF_Text) {		std::unique_ptr<WriterContext> &WC,
if (Error E = Writer.writeText(Output))		bool OutputSparse) {
exitWithError(std::move(E));		std::mutex ErrorLock;
} else {		SmallSet<instrprof_error, 4> WriterErrorCodes;
Writer.write(Output);
		// Initialize the writer contexts.
		WC = std::make_unique<WriterContext>(OutputSparse, ErrorLock,
		WriterErrorCodes);

		loadInput({Filename, 1}, nullptr, WC.get());
		if (WC->Errors.size() > 0)
		exitWithError(std::move(WC->Errors[0].first), Filename);
		return true;
		}

		static bool
		tryReadSampleProf(std::string Filename,
		std::unique_ptr<sampleprof::SampleProfileReader> &Reader,
		bool ExitIfErr) {
		LLVMContext Context;
		auto ReaderOrErr = sampleprof::SampleProfileReader::create(Filename, Context);
		if (std::error_code EC = ReaderOrErr.getError()) {
		if (ExitIfErr)
		exitWithErrorCode(EC, Filename);
		return false;
		}

		Reader = std::move(ReaderOrErr.get());
		if (std::error_code EC = Reader->read()) {
		if (ExitIfErr)
		exitWithErrorCode(EC, Filename);
		return false;
		}
		return true;
}		}

		// The profile entry for a function in instrumentation profile.
		struct InstrProfileEntry {
		uint64_t EntryCount;
		InstrProfRecord *ProfRecord;
		};

		static void updateInstrProfileEntry(InstrProfileEntry &IFE,
		uint64_t SampleEntryCount,
		double ScaleFactor) {
		uint64_t InstrEntryCount = IFE.EntryCount;
		InstrProfRecord *ProfRecord = IFE.ProfRecord;
		if (!InstrEntryCount) {
		// If the function is never executed in instrumentation profile,
		// adjust its entry count using sample profile and leave other
		// counters as 0.
		ProfRecord->Counts[0] = SampleEntryCount * ScaleFactor;
		return;
		}
		// Scale up all the counters in the function equally.
		uint64_t Norm = (uint64_t)(SampleEntryCount * ScaleFactor);
		uint64_t DeNorm = InstrEntryCount;
		// Don't scale down the Instr profile.
		if (Norm <= DeNorm)
		return;
		ProfRecord->scale(Norm, DeNorm, [&](instrprof_error E) {
		warn(toString(make_error<InstrProfError>(E)));
		});
		}
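To make the arithmetic in `updateInstrProfileEntry` concrete, the following is a minimal standalone sketch, not the patch's code. `Record` and `supplementEntry` are hypothetical stand-ins for `InstrProfRecord` and the function above, and overflow saturation is deliberately left out. Fed the counters from mix_instr.proftext and the head samples from mix_sample.proftext, it produces the block counts that the MIX1 and MIX2 checks in suppl-instr-with-sample.test expect.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for InstrProfRecord: just the counter vector,
// with the function entry count stored at index 0.
struct Record {
  std::vector<uint64_t> Counts;
};

// Sketch of the update rule (no overflow handling).
void supplementEntry(Record &R, uint64_t SampleEntryCount,
                     double ScaleFactor) {
  assert(!R.Counts.empty() && "a record needs an entry counter");
  uint64_t Norm = static_cast<uint64_t>(SampleEntryCount * ScaleFactor);
  uint64_t DeNorm = R.Counts[0];
  if (DeNorm == 0) {
    // Never executed in the instrumentation profile: set only the entry
    // count and leave the remaining counters at zero.
    R.Counts[0] = Norm;
    return;
  }
  // Never scale the instrumentation profile down.
  if (Norm <= DeNorm)
    return;
  // Scale every counter by the same ratio Norm / DeNorm.
  for (uint64_t &C : R.Counts)
    C = C * Norm / DeNorm;
}
```

With a scale factor of 1, foo's counters [2, 3, 9, 4] and 2000 head samples give Norm = 2000 and DeNorm = 2, hence [2000, 3000, 9000, 4000]. With weights 2 and 3 the scale factor is 3/2, so Norm = 3000 and the result is [3000, 4500, 13500, 6000]; goo, whose counters are all zero, only receives an entry count of 1500 * 3/2 = 2250.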

		const uint64_t ColdPercentileIdx = 15;

		static void
		findFuncProfilesToFix(std::unique_ptr<WriterContext> &WC,
		std::unique_ptr<sampleprof::SampleProfileReader> &Reader,
		double ScaleFactor, unsigned EarlyInlineSizeThreshold,
		const std::string &BaseScaleFunction) {
		uint64_t BaseInstrCounter = 0;
		uint64_t BaseSampleCounter = 0;
		StringMap<InstrProfileEntry> InstrProfileMap;
		InstrProfSummaryBuilder IPBuilder(ProfileSummaryBuilder::DefaultCutoffs);
		for (auto &PD : WC->Writer.getProfileData()) {
		for (const auto &PDV : PD.getValue()) {
		InstrProfRecord Record = PDV.second;
		IPBuilder.addRecord(Record);
		}
		if (PD.getValue().size() == 1) {
		InstrProfRecord *R = &PD.getValue().begin()->second;
		InstrProfileMap[PD.getKey()] = {R->Counts[0], R};

		if (!BaseScaleFunction.empty() && PD.getKey() == BaseScaleFunction)
		BaseInstrCounter = R->Counts[0];
		}
		}
		for (const auto &PD : Reader->getProfiles()) {
		StringRef FName = PD.getKey();
		const sampleprof::FunctionSamples &FS = PD.getValue();

		if (!BaseScaleFunction.empty() && FName == BaseScaleFunction)
		BaseSampleCounter = FS.getHeadSamples();
		}
		if (BaseInstrCounter != 0 && BaseSampleCounter != 0)
		ScaleFactor = BaseInstrCounter / (double)BaseSampleCounter;

		ProfileSummary InstrPS = *IPBuilder.getSummary();
		ProfileSummary SamplePS = Reader->getSummary();

		uint64_t ColdSampleThreshold =
		ProfileSummaryBuilder::getEntryForPercentile(
		SamplePS.getDetailedSummary(),
		ProfileSummaryBuilder::DefaultCutoffs[ColdPercentileIdx])
		.MinCount;
		uint64_t ColdInstrThreshold =
		ProfileSummaryBuilder::getEntryForPercentile(
		InstrPS.getDetailedSummary(),
		ProfileSummaryBuilder::DefaultCutoffs[ColdPercentileIdx])
		.MinCount;
		for (const auto &PD : Reader->getProfiles()) {
		StringRef FName = PD.getKey();
		const sampleprof::FunctionSamples &FS = PD.getValue();
		auto It = InstrProfileMap.find(FName);
		// Find a hot/warm entry in sample profile which is cold in instr profile.
		if (FS.getHeadSamples() > ColdSampleThreshold &&
		It != InstrProfileMap.end() &&
		It->second.EntryCount <= ColdInstrThreshold &&
		FS.getBodySamples().size() >= EarlyInlineSizeThreshold) {
		updateInstrProfileEntry(It->second, FS.getHeadSamples(), ScaleFactor);
		}
		}
		}
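The selection logic in `findFuncProfilesToFix` boils down to one predicate per function plus a choice of scale factor. The sketch below restates both in isolation; `Candidate`, `shouldSupplement` and `pickScaleFactor` are hypothetical names introduced for illustration, and the thresholds are passed in as plain numbers rather than derived from profile summaries. In the patch, both cold thresholds come from the summary entry for `DefaultCutoffs[15]`, which is 999999 in the array shown in ProfileSummaryBuilder.cpp.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical per-function view combining both profiles.
struct Candidate {
  uint64_t HeadSamples;       // entry count in the sample profile
  uint64_t InstrEntryCount;   // entry count in the instrumentation profile
  std::size_t NumBodySamples; // number of body sample records
};

// A function is supplemented when it is hot/warm in the sample profile,
// cold in the instrumentation profile, and large enough that the PGO
// early inliner is not expected to inline it.
bool shouldSupplement(const Candidate &C, uint64_t ColdSampleThreshold,
                      uint64_t ColdInstrThreshold,
                      unsigned EarlyInlineSizeThreshold) {
  return C.HeadSamples > ColdSampleThreshold &&
         C.InstrEntryCount <= ColdInstrThreshold &&
         C.NumBodySamples >= EarlyInlineSizeThreshold;
}

// If a base function with non-zero entry counts in both profiles is given,
// its ratio overrides the ratio derived from the input weights.
double pickScaleFactor(uint64_t BaseInstrCounter, uint64_t BaseSampleCounter,
                       double WeightRatio) {
  if (BaseInstrCounter != 0 && BaseSampleCounter != 0)
    return BaseInstrCounter / static_cast<double>(BaseSampleCounter);
  return WeightRatio;
}
```

Setting the size threshold to 0, as the tests do with `-early-inline-size-threshold=0`, disables the size condition so that small functions are supplemented too.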

		static void mergeMixedProfile(const WeightedFileVector &Inputs,
		StringRef OutputFilename,
		ProfileFormat OutputFormat, bool OutputSparse,
		unsigned EarlyInlineSizeThreshold,
		const std::string &BaseScaleFunction) {
		if (OutputFilename.compare("-") == 0)
		exitWithError("Cannot write indexed profdata format to stdout.");
		if (Inputs.size() != 2)
		exitWithError("Expect two inputs when merging profiles in mixed mode.");

		std::unique_ptr<sampleprof::SampleProfileReader> Reader;
		std::unique_ptr<WriterContext> WC;
		// Make sure Inputs[i] is the sample profile and Inputs[1 - i] is
		// the instrumentation profile.
		int i = 1;
		if (!tryReadSampleProf(Inputs[i].Filename, Reader, false))
		i = 0;

		tryReadSampleProf(Inputs[i].Filename, Reader, true);
		tryReadInstrProf(Inputs[1 - i].Filename, WC, OutputSparse);

		if (!BaseScaleFunction.empty() &&
		(Inputs[i].Weight != 1 || Inputs[1 - i].Weight != 1))
		exitWithError("Don't use -base-scale-function and weighted inputs "
		"together. ");

		findFuncProfilesToFix(WC, Reader,
		Inputs[i].Weight / (double)Inputs[1 - i].Weight,
		EarlyInlineSizeThreshold, BaseScaleFunction);
		davidxl: Make the sample file path part of the option, so there is no need to handle the ordering.
		wmi (author): Indeed, that would remove the ordering-handling logic, but I want to use weighted_input to scale the counts in the sample profile to be roughly the same as the counts in the instr profile. Supporting -supplement-instr-with-sample=<weight>,<filename> would be a little weird and would increase complexity.
		writeInstrProfile(OutputFilename, OutputFormat, WC->Writer);
}		}

/// Make a copy of the given function samples with all symbol names remapped		/// Make a copy of the given function samples with all symbol names remapped
/// by the provided symbol remapper.		/// by the provided symbol remapper.
static sampleprof::FunctionSamples		static sampleprof::FunctionSamples
remapSamples(const sampleprof::FunctionSamples &Samples,		remapSamples(const sampleprof::FunctionSamples &Samples,
SymbolRemapper &Remapper, sampleprof_error &Error) {		SymbolRemapper &Remapper, sampleprof_error &Error) {
sampleprof::FunctionSamples Result;		sampleprof::FunctionSamples Result;
Result.setName(Remapper(Samples.getName()));		Result.setName(Remapper(Samples.getName()));
Result.addTotalSamples(Samples.getTotalSamples());		Result.addTotalSamples(Samples.getTotalSamples());
Result.addHeadSamples(Samples.getHeadSamples());		Result.addHeadSamples(Samples.getHeadSamples());
for (const auto &BodySample : Samples.getBodySamples()) {		for (const auto &BodySample : Samples.getBodySamples()) {
Result.addBodySamples(BodySample.first.LineOffset,		Result.addBodySamples(BodySample.first.LineOffset,
BodySample.first.Discriminator,		BodySample.first.Discriminator,
		davidxl: Are these two weights comparable?
		wmi (author): Yes. Given "-weighted-input=2,instr_profile -weighted-input=3,sample_profile", we want to scale the counts in the sample profile by 3/2 before updating the entries in the instr profile.
BodySample.second.getSamples());		BodySample.second.getSamples());
		davidxl: Handling this case (the all-zero case) in this way won't help much: the Branch Probability pass will set all branch weights to 1, making all branches unbiased, which is worse than using static heuristics. The right way, I think, is to remove their profile entries completely. I assume later compiler passes will treat such functions as unknown and not put them into .text.unlikely.
		wmi (author): But I think a large part of the functions missing in loadtest have all-zero counter values, so it is better to have a way to handle them. Is it possible for PGO to specially handle functions whose entry counter is non-zero and whose other counters are all zero? For those functions, just set the entry count and skip setting the metadata inside the function, so that they can use static profiling inside.
for (const auto &Target : BodySample.second.getCallTargets()) {		for (const auto &Target : BodySample.second.getCallTargets()) {
Result.addCalledTargetSamples(BodySample.first.LineOffset,		Result.addCalledTargetSamples(BodySample.first.LineOffset,
BodySample.first.Discriminator,		BodySample.first.Discriminator,
Remapper(Target.first()), Target.second);		Remapper(Target.first()), Target.second);
}		}
}		}
for (const auto &CallsiteSamples : Samples.getCallsiteSamples()) {		for (const auto &CallsiteSamples : Samples.getCallsiteSamples()) {
sampleprof::FunctionSamplesMap &Target =		sampleprof::FunctionSamplesMap &Target =
Result.functionSamplesAt(CallsiteSamples.first);		Result.functionSamplesAt(CallsiteSamples.first);
for (const auto &Callsite : CallsiteSamples.second) {		for (const auto &Callsite : CallsiteSamples.second) {
sampleprof::FunctionSamples Remapped =		sampleprof::FunctionSamples Remapped =
remapSamples(Callsite.second, Remapper, Error);		remapSamples(Callsite.second, Remapper, Error);
MergeResult(Error,		MergeResult(Error,
Target[std::string(Remapped.getName())].merge(Remapped));		Target[std::string(Remapped.getName())].merge(Remapped));
}		}
}		}
return Result;		return Result;
}		}

static sampleprof::SampleProfileFormat FormatMap[] = {		static sampleprof::SampleProfileFormat FormatMap[] = {
sampleprof::SPF_None,		sampleprof::SPF_None,
sampleprof::SPF_Text,		sampleprof::SPF_Text,
sampleprof::SPF_Compact_Binary,		sampleprof::SPF_Compact_Binary,
sampleprof::SPF_Ext_Binary,		sampleprof::SPF_Ext_Binary,
		davidxl: Is it possible to delete the instprof record for the function from the profile?
		wmi (author): One reason I use all -1 as the indication that the function is hot and its profile is unrepresentative is that users may build unrelated targets together with the PGO-optimized target in the same command. I know a lot of SampleFDO users do that to simplify their release, and I imagine there could be PGO users doing that too. Another possibility is building test targets using the profile. If we delete the instprof record and treat all functions without an instprof record as hot during prof-use, we may accidentally treat a lot of cold functions as hot when the profile is applied to unrelated targets or tests (tests may be partially related targets), and that may cause compile-time issues.
sampleprof::SPF_GCC,		sampleprof::SPF_GCC,
sampleprof::SPF_Binary};		sampleprof::SPF_Binary};

static std::unique_ptr<MemoryBuffer>		static std::unique_ptr<MemoryBuffer>
getInputFileBuf(const StringRef &InputFile) {		getInputFileBuf(const StringRef &InputFile) {
if (InputFile == "")		if (InputFile == "")
return {};		return {};

auto BufOrError = MemoryBuffer::getFileOrSTDIN(InputFile);		auto BufOrError = MemoryBuffer::getFileOrSTDIN(InputFile);
if (!BufOrError)		if (!BufOrError)
exitWithErrorCode(BufOrError.getError(), InputFile);		exitWithErrorCode(BufOrError.getError(), InputFile);

return std::move(*BufOrError);		return std::move(*BufOrError);
}		}

static void populateProfileSymbolList(MemoryBuffer *Buffer,		static void populateProfileSymbolList(MemoryBuffer *Buffer,
sampleprof::ProfileSymbolList &PSL) {		sampleprof::ProfileSymbolList &PSL) {
if (!Buffer)		if (!Buffer)
return;		return;

SmallVector<StringRef, 32> SymbolVec;		SmallVector<StringRef, 32> SymbolVec;
StringRef Data = Buffer->getBuffer();		StringRef Data = Buffer->getBuffer();
		davidxl: When there is no scaling, setting the instr count to the sample count does not make sense. Perhaps just set it to be above the cold threshold.
		wmi (author): Maybe I can make scalefactor an option and require the user to provide either -scalefactor or -base-scale-function, so that the user won't accidentally leave scalefactor at 1. That way, I can make the sample profile the input of the option -supplement-instr-with-sample=, and remove the input profile ordering logic.
Data.split(SymbolVec, '\n', /*MaxSplit=*/-1, /*KeepEmpty=*/false);		Data.split(SymbolVec, '\n', /*MaxSplit=*/-1, /*KeepEmpty=*/false);

for (StringRef symbol : SymbolVec)		for (StringRef symbol : SymbolVec)
PSL.add(symbol);		PSL.add(symbol);
}		}

static void handleExtBinaryWriter(sampleprof::SampleProfileWriter &Writer,		static void handleExtBinaryWriter(sampleprof::SampleProfileWriter &Writer,
ProfileFormat OutputFormat,		ProfileFormat OutputFormat,
[... 219 lines not shown ...]	cl::opt<bool> CompressAllSections(
"meaningful for -extbinary)"));		"meaningful for -extbinary)"));
cl::opt<bool> UseMD5(		cl::opt<bool> UseMD5(
"use-md5", cl::init(false), cl::Hidden,		"use-md5", cl::init(false), cl::Hidden,
cl::desc("Choose to use MD5 to represent string in name table (only "		cl::desc("Choose to use MD5 to represent string in name table (only "
"meaningful for -extbinary)"));		"meaningful for -extbinary)"));
cl::opt<bool> GenPartialProfile(		cl::opt<bool> GenPartialProfile(
"gen-partial-profile", cl::init(false), cl::Hidden,		"gen-partial-profile", cl::init(false), cl::Hidden,
cl::desc("Generate a partial profile (only meaningful for -extbinary)"));		cl::desc("Generate a partial profile (only meaningful for -extbinary)"));
		cl::opt<bool> MixInstrSampleProfiles(
		"mix-instr-sample-profiles", cl::init(false), cl::Hidden,
		cl::desc("Supplement an instrumentation profile with sample profile, and "
		"output in instrumentation format (only works with -instr)"));
		cl::opt<std::string> BaseScaleFunction(
		"base-scale-function", cl::init(""), cl::Hidden,
		davidxl: Is this flag tested?
		wmi (author): Good point, add tests for this flag and the flag early-inline-size-threshold.
		cl::desc("When supplementing an instrumentation profile with sample "
		"profile, use the input of the flag to compute the "
		"ScaleFactor. "));
		cl::opt<unsigned> EarlyInlineSizeThreshold(
		"early-inline-size-threshold", cl::init(10), cl::Hidden,
		cl::desc("If a function can be inlined by PGO early inliner, don't "
		"scale it up using sample profile. "));

cl::ParseCommandLineOptions(argc, argv, "LLVM profile data merger\n");		cl::ParseCommandLineOptions(argc, argv, "LLVM profile data merger\n");

WeightedFileVector WeightedInputs;		WeightedFileVector WeightedInputs;
for (StringRef Filename : InputFilenames)		for (StringRef Filename : InputFilenames)
addWeightedInput(WeightedInputs, {std::string(Filename), 1});		addWeightedInput(WeightedInputs, {std::string(Filename), 1});
for (StringRef WeightedFilename : WeightedInputFilenames)		for (StringRef WeightedFilename : WeightedInputFilenames)
addWeightedInput(WeightedInputs, parseWeightedFile(WeightedFilename));		addWeightedInput(WeightedInputs, parseWeightedFile(WeightedFilename));
[12 unchanged lines not shown]
for (auto &WF : WeightedInputs)		for (auto &WF : WeightedInputs)
outs() << WF.Weight << "," << WF.Filename << "\n";		outs() << WF.Weight << "," << WF.Filename << "\n";
return 0;		return 0;
}		}

std::unique_ptr<SymbolRemapper> Remapper;		std::unique_ptr<SymbolRemapper> Remapper;
if (!RemappingFile.empty())		if (!RemappingFile.empty())
Remapper = SymbolRemapper::create(RemappingFile);		Remapper = SymbolRemapper::create(RemappingFile);

		if (MixInstrSampleProfiles) {
		if (ProfileKind != instr)
		exitWithError("-mix-instr-sample-profiles can only work with -instr. ");

		mergeMixedProfile(WeightedInputs, OutputFilename, OutputFormat,
		OutputSparse, EarlyInlineSizeThreshold,
		BaseScaleFunction);
		return 0;
		}

if (ProfileKind == instr)		if (ProfileKind == instr)
mergeInstrProfile(WeightedInputs, Remapper.get(), OutputFilename,		mergeInstrProfile(WeightedInputs, Remapper.get(), OutputFilename,
OutputFormat, OutputSparse, NumThreads, FailureMode);		OutputFormat, OutputSparse, NumThreads, FailureMode);
else		else
mergeSampleProfile(WeightedInputs, Remapper.get(), OutputFilename,		mergeSampleProfile(WeightedInputs, Remapper.get(), OutputFilename,
OutputFormat, ProfileSymbolListFile, CompressAllSections,		OutputFormat, ProfileSymbolListFile, CompressAllSections,
UseMD5, GenPartialProfile, FailureMode);		UseMD5, GenPartialProfile, FailureMode);
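
The dispatch above hands mixed inputs to mergeMixedProfile, whose body is not shown in this hunk. The counter adjustment described in the revision summary can be sketched as follows. This is only an illustration under stated assumptions: `FuncCounters`, `supplementWithSample`, and the `IsHotInSample` parameter are hypothetical names, not the helpers added by this patch. The hotness test and the early-inline size check are left out, and the "never scale down" guard is an assumption drawn from the summary's wording ("scaled up").

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical, simplified stand-in for an instrumentation profile record.
// Counts[0] is assumed to be the entry-block counter, as the summary states.
struct FuncCounters {
  std::vector<uint64_t> Counts;
};

// Sketch of the supplementing rule from the summary:
//  - entry count is zero in the instr profile: set only the entry counter to
//    SampleEntryCount * ScaleFactor and leave the other counters untouched;
//  - entry count is non-zero: multiply every counter by the same ratio so the
//    entry counter reaches the scaled sample entry count.
// Functions that are not hot in the sample profile are left unchanged.
void supplementWithSample(FuncCounters &F, uint64_t SampleEntryCount,
                          double ScaleFactor, bool IsHotInSample) {
  if (!IsHotInSample || F.Counts.empty())
    return;
  double Target = static_cast<double>(SampleEntryCount) * ScaleFactor;
  if (F.Counts[0] == 0) {
    F.Counts[0] = static_cast<uint64_t>(Target);
    return;
  }
  double Ratio = Target / static_cast<double>(F.Counts[0]);
  if (Ratio <= 1.0)
    return; // Assumption: counters are only ever scaled up, never down.
  for (uint64_t &C : F.Counts)
    C = static_cast<uint64_t>(static_cast<double>(C) * Ratio);
}
```

With a scale factor of 2.0, a never-executed function whose sample entry count is 100 ends up with an entry counter of 200 and all other counters at zero, while a function with counters {10, 4, 6} and a sample entry count of 50 is scaled uniformly to {100, 40, 60}.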


Diff 273545