This is an archive of the discontinued LLVM Phabricator instance.

Differential D113781

[llvm-profgen] Compute and show profile density
Closed, Public

Authored by wlei on Nov 12 2021, 9:57 AM.

Details

Reviewers

hoy
wenlei

Commits

rGc2e08aba1afd: [llvm-profgen] Compute and show profile density

Summary

AutoFDO performance is sensitive to profile density, i.e., the number of samples in the profile relative to the program size. Profiles with too few samples can be inaccurate because of statistical noise, which in turn hurts AutoFDO performance. A previous investigation showed that AutoFDO performed better on MySQL when more samples were collected. Therefore, we implement a profile-density computation feature that gives users and the compiler hints about profile density.

We define the density of a profile Prof as follows:

For each function A in the profile, density(A) = total_samples(A) / sizeof(A), where sizeof(A) is the size of A's code in bytes.
density(Prof) = min(density(A)) for all functions A that are warm (defined below).

A function is considered warm if its total samples fall within the top N percent of the profile's total samples. For the implementation, we reuse ProfileSummaryBuilder::getHotCountThreshold(..) as the threshold, which can be set either by percentile (--profile-summary-cutoff-hot) or by value (--profile-summary-hot-count).
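A minimal C++ sketch of the definition above, assuming a hypothetical FuncProfile record in place of llvm-profgen's FunctionSamples and BinaryFunction types, and taking the hot-count threshold as an already computed input. Functions without a known size are ignored, and 0.0 signals that no warm function qualified.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical per-function record standing in for llvm-profgen's
// FunctionSamples (total samples) and BinaryFunction (code size).
struct FuncProfile {
  std::string Name;
  uint64_t TotalSamples; // total_samples(A)
  uint64_t SizeInBytes;  // sizeof(A); 0 if the size is unknown
};

// density(Prof) = min over warm functions A of total_samples(A) / sizeof(A),
// where A is warm if total_samples(A) >= HotCountThreshold.
// Returns 0.0 if no warm function with a known, non-zero size exists.
double computeProfileDensity(const std::vector<FuncProfile> &Funcs,
                             uint64_t HotCountThreshold) {
  bool Found = false;
  double MinDensity = 0.0;
  for (const FuncProfile &F : Funcs) {
    // Skip, rather than stop at, cold or unsized functions: the input is not
    // assumed to be sorted by sample count.
    if (F.TotalSamples < HotCountThreshold || F.SizeInBytes == 0)
      continue;
    double D = static_cast<double>(F.TotalSamples) / F.SizeInBytes;
    MinDensity = Found ? std::min(MinDensity, D) : D;
    Found = true;
  }
  return MinDensity;
}
```

Taking the minimum rather than an average means a single sparsely sampled warm function is enough to flag the whole profile.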

We also introduce --hot-function-density-threshold to set the hot-function density threshold; if the profile density is below it, llvm-profgen suggests increasing the number of samples.
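The suggestion logic can be sketched as follows, assuming the density threshold is passed in directly (the diff's --hot-function-density-threshold defaults to 1000). The returned messages are shortened versions of the diff's output, and densitySuggestion is a hypothetical name. A zero density means no warm function with a known size was found, which the diff reports as a possibly misconfigured --profile-summary-cutoff-hot rather than as a sample shortage.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Returns an empty string when the profile is dense enough; otherwise a
// shortened form of the hint printed by llvm-profgen.
std::string densitySuggestion(double Density, double DensityThreshold) {
  assert(Density >= 0.0 && DensityThreshold > 0.0);
  if (Density == 0.0)
    return "The --profile-summary-cutoff-hot option may be set too low.";
  if (Density >= DensityThreshold)
    return "";
  // The suggested sample multiplier is threshold / density, shown with one
  // decimal place like the diff's format("%.1f", ...).
  char Factor[32];
  std::snprintf(Factor, sizeof(Factor), "%.1f", DensityThreshold / Density);
  return "AutoFDO is estimated to optimize better with " + std::string(Factor) +
         "x more samples.";
}
```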

This also applies to CS (context-sensitive) profiles, with all context profiles merged into their base profiles.
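Merging every context profile into its context-less base can be sketched with a hypothetical ContextProfile record (context string, leaf function name, total samples). Only total samples are summed here; FunctionSamples::merge, which the diff uses, also merges body and call-site samples.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical flattened CS profile entry: the calling context (for example
// "main:2 @ foo") plus the leaf function the samples belong to ("foo").
struct ContextProfile {
  std::string Context;
  std::string LeafFunc;
  uint64_t TotalSamples;
};

// Collapse all contexts of the same leaf function into one context-less base
// entry by summing total samples, so density can be computed per function.
std::map<std::string, uint64_t>
mergeIntoBase(const std::vector<ContextProfile> &Profiles) {
  std::map<std::string, uint64_t> Base;
  for (const ContextProfile &P : Profiles) {
    assert(!P.LeafFunc.empty() && "every context ends in a leaf function");
    Base[P.LeafFunc] += P.TotalSamples;
  }
  return Base;
}
```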

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

wlei created this revision. (Nov 12 2021, 9:57 AM)

Herald added subscribers: hoy, wenlei, lxfind. (Nov 12 2021, 9:57 AM)

wlei requested review of this revision. (Nov 12 2021, 9:57 AM)

Herald added a project: Restricted Project. (Nov 12 2021, 9:57 AM)

Herald added a subscriber: llvm-commits.

wlei retitled this revision from [llvm-profgen] density analysis to [llvm-profgen] Compute and show profile density. (Nov 12 2021, 10:31 AM)

wlei edited the summary of this revision.

wlei added reviewers: hoy, wenlei.

Herald added a subscriber: kristof.beyls. (Nov 12 2021, 10:31 AM)

Harbormaster completed remote builds in B133981: Diff 386875. (Nov 12 2021, 10:47 AM)

wenlei added inline comments. (Nov 23 2021, 5:50 PM)

llvm/test/tools/llvm-profgen/inline-cs-pseudoprobe.test
Lines 23–24: I suggest we use a dedicated test case instead of adding feature tests to various existing test cases. That way, tests stay isolated and separate.
llvm/tools/llvm-profgen/ProfileGenerator.cpp
Lines 143–149: Maybe these two messages should be `warning`?
Line 735: Perhaps we can let `calculateAndShowDensity` take a merged profile map; then it doesn't have to be a virtual function and can have its full implementation in `ProfileGeneratorBase`. We can prepare the merged profiles here and pass them into `calculateAndShowDensity`.
llvm/tools/llvm-profgen/ProfileGenerator.h
Line 89: Can we omit `raw_fd_ostream &OS` as a parameter and use `outs()` directly inside? With that, can we also remove the include for `raw_os_ostream.h`?
llvm/tools/llvm-profgen/ProfiledBinary.h
Line 80: nit: we've been using `uint64_t` for sizes. It would be good to be consistent, even though they're the same thing here.

Addressing Wenlei's feedback

llvm/test/tools/llvm-profgen/inline-cs-pseudoprobe.test
Lines 23–24: Sounds good!
llvm/tools/llvm-profgen/ProfileGenerator.cpp
Lines 143–149: Good point!
Line 735: Changed.
llvm/tools/llvm-profgen/ProfileGenerator.h
Line 89: Sounds good!
llvm/tools/llvm-profgen/ProfiledBinary.h
Line 80: Good to know this!

Harbormaster completed remote builds in B136377: Diff 390259. (Nov 29 2021, 12:16 AM)

lgtm, thx.

This revision is now accepted and ready to land. (Nov 29 2021, 11:05 AM)

lgtm, thanks.

Closed by commit rGc2e08aba1afd: [llvm-profgen] Compute and show profile density (authored by wlei). (Nov 30 2021, 12:00 AM)

This revision was automatically updated to reflect the committed changes.

wlei added a commit: rGc2e08aba1afd: [llvm-profgen] Compute and show profile density.

Revision Contents

Path | Size
llvm/test/tools/llvm-profgen/inline-cs-pseudoprobe.test | 6 lines
llvm/test/tools/llvm-profgen/inline-noprobe2.test | 6 lines
llvm/tools/llvm-profgen/ProfileGenerator.h | 26 lines
llvm/tools/llvm-profgen/ProfileGenerator.cpp | 85 lines
llvm/tools/llvm-profgen/ProfiledBinary.h | 15 lines

Diff 386875

llvm/test/tools/llvm-profgen/inline-cs-pseudoprobe.test

 ; RUN: llvm-profgen --format=text --perfscript=%S/Inputs/inline-cs-pseudoprobe.perfscript --binary=%S/Inputs/inline-cs-pseudoprobe.perfbin --output=%t --skip-symbolization --profile-summary-cold-count=0 --use-offset=0
 ; RUN: FileCheck %s --input-file %t --check-prefix=CHECK-UNWINDER
-; RUN: llvm-profgen --format=text --perfscript=%S/Inputs/inline-cs-pseudoprobe.perfscript --binary=%S/Inputs/inline-cs-pseudoprobe.perfbin --output=%t --profile-summary-cold-count=0
+; RUN: llvm-profgen --format=text --perfscript=%S/Inputs/inline-cs-pseudoprobe.perfscript --binary=%S/Inputs/inline-cs-pseudoprobe.perfbin --output=%t --profile-summary-cold-count=0 --show-density &> %t1
 ; RUN: FileCheck %s --input-file %t
+; RUN: FileCheck %s --input-file %t1 --check-prefix=CHECK-DENSITY

 ; CHECK: [main:2 @ foo]:74:0
 ; CHECK-NEXT: 1: 0
 ; CHECK-NEXT: 2: 15
 ; CHECK-NEXT: 3: 15
 ; CHECK-NEXT: 4: 14
 ; CHECK-NEXT: 5: 1
 ; CHECK-NEXT: 6: 15
 ; CHECK-NEXT: 7: 0
 ; CHECK-NEXT: 8: 14 bar:14
 ; CHECK-NEXT: 9: 0
 ; CHECK-NEXT: !CFGChecksum: 563088904013236
 ; CHECK:[main:2 @ foo:8 @ bar]:28:14
 ; CHECK-NEXT: 1: 14
 ; CHECK-NEXT: 4: 14
 ; CHECK-NEXT: !CFGChecksum: 72617220756

+; CHECK-DENSITY: AutoFDO is estimated to optimize better with 1675.7x more samples. Please consider increasing sampling rate or profiling for longer duration to get more samples.
+; CHECK-DENSITY: Minimum profile density for hot functions with top 99.00% total samples: 0.6
Inline comment (wenlei): I suggest we use a dedicated test case instead of adding feature tests to various existing test cases. That way, tests stay isolated and separate.
Reply (wlei, done): Sounds good!

 ; CHECK-UNWINDER: 3
 ; CHECK-UNWINDER-NEXT: 201800-201858:1
 ; CHECK-UNWINDER-NEXT: 20180e-20182b:1
 ; CHECK-UNWINDER-NEXT: 20180e-201858:13
 ; CHECK-UNWINDER-NEXT: 2
 ; CHECK-UNWINDER-NEXT: 20182b->201800:1
 ; CHECK-UNWINDER-NEXT: 201858->20180e:15

[24 unchanged lines not shown]

llvm/test/tools/llvm-profgen/inline-noprobe2.test

 ; RUN: llvm-profgen --format=text --perfscript=%S/Inputs/artificial-branch.perfscript --binary=%S/Inputs/inline-noprobe2.perfbin --output=%t --skip-symbolization --use-offset=0
 ; RUN: FileCheck %s --input-file %t --check-prefix=CHECK-ARTIFICIAL-BRANCH
 ; RUN: llvm-profgen --format=text --perfscript=%S/Inputs/inline-noprobe2.perfscript --binary=%S/Inputs/inline-noprobe2.perfbin --output=%t --skip-symbolization --use-offset=0
 ; RUN: FileCheck %s --input-file %t --check-prefix=CHECK-RAW-PROFILE
-; RUN: llvm-profgen --format=text --unsymbolized-profile=%t --binary=%S/Inputs/inline-noprobe2.perfbin --output=%t1 --use-offset=0
+; RUN: llvm-profgen --format=text --unsymbolized-profile=%t --binary=%S/Inputs/inline-noprobe2.perfbin --output=%t1 --use-offset=0 --show-density -hot-function-density-threshold=1 &> %t2
 ; RUN: FileCheck %s --input-file %t1 --check-prefix=CHECK
+; RUN: FileCheck %s --input-file %t2 --check-prefix=CHECK-DENSITY

 ; RUN: llvm-profgen --format=extbinary --perfscript=%S/Inputs/inline-noprobe2.perfscript --binary=%S/Inputs/inline-noprobe2.perfbin --output=%t --populate-profile-symbol-list=1
 ; RUN: llvm-profdata show -show-prof-sym-list -sample %t | FileCheck %s --check-prefix=CHECK-SYM-LIST

 ; CHECK-ARTIFICIAL-BRANCH: 3
 ; CHECK-ARTIFICIAL-BRANCH: 400540-400540:1
 ; CHECK-ARTIFICIAL-BRANCH: 400870-400870:2
 ; CHECK-ARTIFICIAL-BRANCH: 400875-4008bf:1
[78 unchanged lines not shown]
 ;CHECK-NEXT: 6.1: 12
 ;CHECK-NEXT: 6.3: 10
 ;CHECK-NEXT: 7: 0
 ;CHECK-NEXT: 8: 0 quick_sort:1
 ;CHECK-NEXT: 9: 0
 ;CHECK-NEXT: 11: 0
 ;CHECK-NEXT: 14: 0

+;CHECK-DENSITY: AutoFDO is estimated to optimize better with 4.9x more samples. Please consider increasing sampling rate or profiling for longer duration to get more samples.
+;CHECK-DENSITY: Minimum profile density for hot functions with top 99.00% total samples: 0.2

 ; original code:
 ; clang -O3 -g -fno-optimize-sibling-calls -fdebug-info-for-profiling qsort.c -o a.out
 #include <stdio.h>
 #include <stdlib.h>

 void swap(int *a, int *b) {
 	int t = *a;
 	*a = *b;
[45 unchanged lines not shown]
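In both tests the printed multiplier and the printed density are tied together by multiplier = threshold / density. Assuming only that relation, the hypothetical helper below (not part of llvm-profgen) recovers the density from a printed multiplier: the default threshold of 1000 with 1675.7x gives about 0.597, printed as 0.6, and the threshold of 1 set by -hot-function-density-threshold=1 with 4.9x gives about 0.204, printed as 0.2. Both agree with the CHECK-DENSITY lines up to the one-decimal rounding of the output.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Hypothetical helper: given the density threshold and the "Nx more samples"
// multiplier printed by llvm-profgen, return the implied density formatted
// with one decimal place, like the --show-density line.
std::string impliedDensity(double DensityThreshold, double PrintedFactor) {
  assert(PrintedFactor > 0.0);
  char Buf[32];
  std::snprintf(Buf, sizeof(Buf), "%.1f", DensityThreshold / PrintedFactor);
  return std::string(Buf);
}
```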

llvm/tools/llvm-profgen/ProfileGenerator.h

 //===-- ProfileGenerator.h - Profile Generator -----------------*- C++ -*-===//
 //
 // Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
 // See https://llvm.org/LICENSE.txt for license information.
 // SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
 //
 //===----------------------------------------------------------------------===//

 #ifndef LLVM_TOOLS_LLVM_PROGEN_PROFILEGENERATOR_H
 #define LLVM_TOOLS_LLVM_PROGEN_PROFILEGENERATOR_H
 #include "CSPreInliner.h"
 #include "ErrorHandling.h"
 #include "PerfReader.h"
 #include "ProfiledBinary.h"
 #include "llvm/ProfileData/SampleProfWriter.h"
+#include "llvm/Support/raw_os_ostream.h"
 #include <memory>
 #include <unordered_set>

 using namespace llvm;
 using namespace sampleprof;

 namespace llvm {
 namespace sampleprof {
[46 unchanged lines not shown; context: protected:]
   void findDisjointRanges(RangeSample &DisjointRanges,
                           const RangeSample &Ranges);
   // Helper function for updating body sample for a leaf location in
   // FunctionProfile
   void updateBodySamplesforFunctionProfile(FunctionSamples &FunctionProfile,
                                            const SampleContextFrame &LeafLoc,
                                            uint64_t Count);
   void updateTotalSamples();

   StringRef getCalleeNameForOffset(uint64_t TargetOffset);

+  void computeSummaryAndThreshold();
+
+  virtual void calculateAndShowDensity(const SampleProfileMap &Profiles) = 0;
+
+  double calculateDensity(const SampleProfileMap &Profiles,
+                          uint64_t HotCntThreshold);
+
+  void showDensitySuggestion(double Density, raw_fd_ostream &OS);
Inline comment (wenlei): Can we omit `raw_fd_ostream &OS` as a parameter and use `outs()` directly inside? With that, can we also remove the include for `raw_os_ostream.h`?
Reply (wlei, done): Sounds good!
+
+  // Thresholds from profile summary to answer isHotCount/isColdCount queries.
+  uint64_t HotCountThreshold;
+
+  uint64_t ColdCountThreshold;
+
   // Used by SampleProfileWriter
   SampleProfileMap ProfileMap;

   ProfiledBinary *Binary = nullptr;

   const ContextSampleCounterMap &SampleCounters;
 };

[12 unchanged lines not shown; context: private:]
   // Helper function to get the leaf frame's FunctionProfile by traversing the
   // inline stack and meanwhile it adds the total samples for each frame's
   // function profile.
   FunctionSamples &
   getLeafFrameProfile(const SampleContextFrameVector &FrameVec);
   void populateBodySamplesForAllFunctions(const RangeSample &RangeCounter);
   void
   populateBoundarySamplesForAllFunctions(const BranchSample &BranchCounters);
+  void postProcessProfiles();
+  void calculateAndShowDensity(const SampleProfileMap &Profiles) override;
 };

 using ProbeCounterMap =
     std::unordered_map<const MCDecodedPseudoProbe *, uint64_t>;

 class CSProfileGenerator : public ProfileGeneratorBase {
 public:
   CSProfileGenerator(ProfiledBinary *Binary,
[125 unchanged lines not shown; context: getFunctionProfileForContext(const SampleContextFrameVector &Context,]
                                bool WasLeafInlined = false);
   // For profiled only functions, on-demand compute their inline context
   // function byte size which is used by the pre-inliner.
   void computeSizeForProfiledFunctions();
   // Post processing for profiles before writing out, such as mermining
   // and trimming cold profiles, running preinliner on profiles.
   void postProcessProfiles();

-  void computeSummaryAndThreshold();
-
   void populateBodySamplesForFunction(FunctionSamples &FunctionProfile,
                                       const RangeSample &RangeCounters);
   void populateBoundarySamplesForFunction(SampleContextFrames ContextId,
                                           FunctionSamples &FunctionProfile,
                                           const BranchSample &BranchCounters);
   void populateInferredFunctionSamples();

   void generateProbeBasedProfile();
   // Go through each address from range to extract the top frame probe by
   // looking up in the Address2ProbeMap
   void extractProbesFromRange(const RangeSample &RangeCounter,
                               ProbeCounterMap &ProbeCounter);
   // Fill in function body samples from probes
   void populateBodySamplesWithProbes(const RangeSample &RangeCounter,
                                      SampleContextFrames ContextStack);
   // Fill in boundary samples for a call probe
   void populateBoundarySamplesWithProbes(const BranchSample &BranchCounter,
                                          SampleContextFrames ContextStack);
   // Helper function to get FunctionSamples for the leaf probe
   FunctionSamples &
   getFunctionProfileForLeafProbe(SampleContextFrames ContextStack,
                                  const MCDecodedPseudoProbe *LeafProbe);
-  // Thresholds from profile summary to answer isHotCount/isColdCount queries.
-  uint64_t HotCountThreshold;
-  uint64_t ColdCountThreshold;
+  void calculateAndShowDensity(const SampleProfileMap &Profiles) override;

   // Underlying context table serves for sample profile writer.
   std::unordered_set<SampleContextFrameVector, SampleContextFrameHash> Contexts;

 public:
   // Deduplicate adjacent repeated context sequences up to a given sequence
   // length. -1 means no size limit.
   static int32_t MaxCompressionSize;
   static int MaxContextDepth;
 };

 } // end namespace sampleprof
 } // end namespace llvm

 #endif

llvm/tools/llvm-profgen/ProfileGenerator.cpp

 //===-- ProfileGenerator.cpp - Profile Generator ---------------*- C++ -*-===//
 //
 // Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
 // See https://llvm.org/LICENSE.txt for license information.
 // SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
 //
 //===----------------------------------------------------------------------===//

 #include "ProfileGenerator.h"
 #include "ProfiledBinary.h"
 #include "llvm/ProfileData/ProfileCommon.h"
+#include <float.h>
 #include <unordered_set>

 cl::opt<std::string> OutputFilename("output", cl::value_desc("output"),
                                     cl::Required,
                                     cl::desc("Output profile file"));
 static cl::alias OutputA("o", cl::desc("Alias for --output"),
                          cl::aliasopt(OutputFilename));

[45 unchanged lines not shown; context: cl::desc("Keep the last K contexts while merging cold profile. 1 means the "]
              "context-less base profile"));

 static cl::opt<int, true> CSProfMaxContextDepth(
     "csprof-max-context-depth", cl::ZeroOrMore,
     cl::desc("Keep the last K contexts while merging profile. -1 means no "
              "depth limit."),
     cl::location(llvm::sampleprof::CSProfileGenerator::MaxContextDepth));

-extern cl::opt<int> ProfileSummaryCutoffCold;
+static cl::opt<double> HotFunctionDensityThreshold(
+    "hot-function-density-threshold", llvm::cl::init(1000),
+    llvm::cl::desc(
+        "specify density threshold for hot functions (default: 1000)"),
+    llvm::cl::Optional);
+static cl::opt<bool> ShowDensity("show-density", llvm::cl::init(false),
+                                 llvm::cl::desc("show profile density details"),
+                                 llvm::cl::Optional);
+
+extern cl::opt<int> ProfileSummaryCutoffHot;

 using namespace llvm;
 using namespace sampleprof;

 namespace llvm {
 namespace sampleprof {

 // Initialize the MaxCompressionSize to -1 which means no size limit
[40 unchanged lines not shown; context: if (OutputFormat != SPF_Ext_Binary)]
                               "--format=extbinary to enable it\n";
     else
       WriterOrErr.get()->setUseMD5();
   }

   write(std::move(WriterOrErr.get()), ProfileMap);
 }

+void ProfileGeneratorBase::showDensitySuggestion(double Density,
+                                                 raw_fd_ostream &OS) {
+  if (Density == 0.0)
+    OS << "The --profile-summary-cutoff-hot option may be set too low. Please "
+          "check your command.\n";
+  else if (Density < HotFunctionDensityThreshold)
+    OS << "AutoFDO is estimated to optimize better with "
+       << format("%.1f", HotFunctionDensityThreshold / Density)
+       << "x more samples. Please consider increasing sampling rate or "
+          "profiling for longer duration to get more samples.\n";
Inline comment (wenlei): Maybe these two messages should be `warning`?
Reply (wlei, done): Good point!
+
+  if (ShowDensity)
+    OS << "Minimum profile density for hot functions with top "
+       << format("%.2f",
+                 static_cast<double>(ProfileSummaryCutoffHot.getValue()) /
+                     10000)
+       << "% total samples: " << format("%.1f", Density) << "\n";
+}
+
+double ProfileGeneratorBase::calculateDensity(const SampleProfileMap &Profiles,
+                                              uint64_t HotCntThreshold) {
+  double Density = DBL_MAX;
+  std::vector<const FunctionSamples *> HotFuncs;
+  for (auto &I : Profiles) {
+    auto &FuncSamples = I.second;
+    if (FuncSamples.getTotalSamples() < HotCntThreshold)
+      break;
+    HotFuncs.emplace_back(&FuncSamples);
+  }
+
+  for (auto *FuncSamples : HotFuncs) {
+    auto *Func = Binary->getBinaryFunction(FuncSamples->getName());
+    if (!Func)
+      continue;
+    size_t FuncSize = Func->getFuncSize();
+    if (FuncSize == 0)
+      continue;
+    Density =
+        std::min(Density, static_cast<double>(FuncSamples->getTotalSamples()) /
+                              FuncSize);
+  }
+
+  return Density == DBL_MAX ? 0.0 : Density;
+}
+
 void ProfileGeneratorBase::findDisjointRanges(RangeSample &DisjointRanges,
                                               const RangeSample &Ranges) {

   /*
   Regions may overlap with each other. Using the boundary info, find all
   disjoint ranges and their sample count. BoundaryPoint contains the count
   multiple samples begin/end at this points.

[168 unchanged lines not shown]
 }

 void ProfileGenerator::generateProfile() {
   if (Binary->usePseudoProbes()) {
     // TODO: Support probe based profile generation
   } else {
     generateLineNumBasedProfile();
   }
+  postProcessProfiles();
+}
+
+void ProfileGenerator::postProcessProfiles() {
+  computeSummaryAndThreshold();
+  calculateAndShowDensity(ProfileMap);
 }

 void ProfileGenerator::generateLineNumBasedProfile() {
   assert(SampleCounters.size() == 1 &&
          "Must have one entry for profile generation.");
   const SampleCounter &SC = SampleCounters.begin()->second;
   // Fill in function body samples
   populateBodySamplesForAllFunctions(SC.RangeCounter);
[113 unchanged lines not shown; context: if (!FrameVec.empty()) {]
           CalleeName, Count);
     }
     // Add head samples for callee.
     FunctionSamples &CalleeProfile = getTopLevelFunctionProfile(CalleeName);
     CalleeProfile.addHeadSamples(Count);
   }
 }

+void ProfileGenerator::calculateAndShowDensity(
+    const SampleProfileMap &Profiles) {
+  double Density = calculateDensity(Profiles, HotCountThreshold);
+  showDensitySuggestion(Density, outs());
+}
+
 FunctionSamples &CSProfileGenerator::getFunctionProfileForContext(
     const SampleContextFrameVector &Context, bool WasLeafInlined) {
   auto I = ProfileMap.find(SampleContext(Context));
   if (I == ProfileMap.end()) {
     // Save the new context for future references.
     SampleContextFrames NewContext = *Contexts.insert(Context).first;
     SampleContext FContext(NewContext, RawContext);
     auto Ret = ProfileMap.emplace(FContext, FunctionSamples());
[208 unchanged lines not shown; context: void CSProfileGenerator::postProcessProfiles() {]

   // Trim and merge cold context profile using cold threshold above.
   if (CSProfTrimColdContext || CSProfMergeColdContext) {
     SampleContextTrimmer(ProfileMap)
         .trimAndMergeColdContextProfiles(
             HotCountThreshold, CSProfTrimColdContext, CSProfMergeColdContext,
             CSProfMaxColdContextDepth, EnableCSPreInliner);
   }

+  calculateAndShowDensity(ProfileMap);
Inline comment (wenlei): Perhaps we can let `calculateAndShowDensity` take a merged profile map; then it doesn't have to be a virtual function and can have its full implementation in `ProfileGeneratorBase`. We can prepare the merged profiles here and pass them into `calculateAndShowDensity`.
Reply (wlei, done): Changed.
 }

-void CSProfileGenerator::computeSummaryAndThreshold() {
+void ProfileGeneratorBase::computeSummaryAndThreshold() {
   SampleProfileSummaryBuilder Builder(ProfileSummaryBuilder::DefaultCutoffs);
   auto Summary = Builder.computeSummaryForProfiles(ProfileMap);
   HotCountThreshold = ProfileSummaryBuilder::getHotCountThreshold(
       (Summary->getDetailedSummary()));
   ColdCountThreshold = ProfileSummaryBuilder::getColdCountThreshold(
       (Summary->getDetailedSummary()));
 }

[161 unchanged lines not shown; context: FunctionSamples &CSProfileGenerator::getFunctionProfileForLeafProbe(]
   const auto *FuncDesc = Binary->getFuncDescForGUID(LeafProbe->getGuid());
   bool WasLeafInlined = LeafProbe->getInlineTreeNode()->hasInlineSite();
   FunctionSamples &FunctionProile =
       getFunctionProfileForContext(NewContextStack, WasLeafInlined);
   FunctionProile.setFunctionHash(FuncDesc->FuncHash);
   return FunctionProile;
 }

+void CSProfileGenerator::calculateAndShowDensity(
+    const SampleProfileMap &Profiles) {
+  sampleprof::SampleProfileMap ContextLessProfiles;
+  // Merge function samples for CS profile.
+  for (const auto &I : Profiles) {
+    ContextLessProfiles[I.second.getName()].merge(I.second);
+  }
+
+  double Density = calculateDensity(ContextLessProfiles, HotCountThreshold);
+  showDensitySuggestion(Density, outs());
+}
+
 } // end namespace sampleprof
 } // end namespace llvm
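The percentage in the --show-density line comes from ProfileSummaryCutoffHot, whose value is expressed in parts per million (the tests above print its default, 990000, as 99.00%), hence the division by 10000. A standalone sketch of that conversion, using a hypothetical helper name:

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Profile summary cutoffs are given in parts per million (1000000 == 100%),
// so dividing by 10000 turns a cutoff into a percentage for display.
std::string cutoffAsPercent(int CutoffPPM) {
  assert(CutoffPPM >= 0 && CutoffPPM <= 1000000);
  char Buf[32];
  std::snprintf(Buf, sizeof(Buf), "%.2f",
                static_cast<double>(CutoffPPM) / 10000);
  return std::string(Buf);
}
```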

llvm/tools/llvm-profgen/ProfiledBinary.h

[70 unchanged lines not shown]
 };

 using RangesTy = std::vector<std::pair<uint64_t, uint64_t>>;

 struct BinaryFunction {
   StringRef FuncName;
   // End of range is an exclusive bound.
   RangesTy Ranges;
+
+  size_t getFuncSize() {
Inline comment (wenlei): nit: we've been using `uint64_t` for sizes. It would be good to be consistent, even though they're the same thing here.
Reply (wlei, done): Good to know this!
+    size_t Sum = 0;
+    for (auto &R : Ranges) {
+      Sum += R.second - R.first;
+    }
+    return Sum;
+  }
 };

 // Info about function range. A function can be split into multiple
 // non-continuous ranges, each range corresponds to one FuncRange.
 struct FuncRange {
   uint64_t StartOffset;
   // EndOffset is an exclusive bound.
   uint64_t EndOffset;
[310 unchanged lines not shown; context: RangesTy getRangesForOffset(uint64_t Offset) {]
     return FRange->Func->Ranges;
   }

   const std::unordered_map<std::string, BinaryFunction> &
   getAllBinaryFunctions() {
     return BinaryFunctions;
   }

+  BinaryFunction *getBinaryFunction(StringRef FName) {
+    auto I = BinaryFunctions.find(FName.str());
+    if (I == BinaryFunctions.end())
+      return nullptr;
+    return &I->second;
+  }
+
   uint32_t getFuncSizeForContext(SampleContext &Context) {
     return FuncSizeTracker.getFuncSizeForContext(Context);
   }

   // Load the symbols from debug table and populate into symbol list.
   void populateSymbolListFromDWARF(ProfileSymbolList &SymbolList);

   const SampleContextFrameVector &
[72 unchanged lines not shown]
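A function's size, as used for density, is the sum of its address ranges, each with an exclusive end. Following the review nit, here is a sketch using uint64_t; RangesTy mirrors the alias in ProfiledBinary.h, while the free-function form is an assumption made for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// Same shape as RangesTy in ProfiledBinary.h: [start, end) offset pairs.
using RangesTy = std::vector<std::pair<uint64_t, uint64_t>>;

// A function may be split into several non-contiguous ranges; its size is
// the sum of the range lengths, kept in uint64_t like other sizes.
uint64_t getFuncSize(const RangesTy &Ranges) {
  uint64_t Sum = 0;
  for (const auto &R : Ranges) {
    assert(R.second >= R.first && "range end is an exclusive upper bound");
    Sum += R.second - R.first;
  }
  return Sum;
}
```

Since each end is exclusive, a range's length is simply end minus start, with no off-by-one adjustment.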