This is an archive of the discontinued LLVM Phabricator instance.

Differential D92584

[CSSPGO][llvm-profgen] Refactor to unify hashable interface for trace sample and context-sensitive counter
ClosedPublic

Authored by wlei on Dec 3 2020, 10:15 AM.

Details

Reviewers

wmi
davidxl
hoy
wenlei

Commits

rG414930b91bfd: [CSSPGO][llvm-profgen] Refactor to unify hashable interface for trace sample…

Summary

As we plan to support both CSSPGO and AutoFDO in llvm-profgen, we will have different kinds of perf samples and different kinds of sample counters (cs/non-cs, with/without pseudo probe), both of which need to be aggregated in a hash map. This change implements a hashable interface (Hashable) and unified base classes for them, for better extensibility and reusability.

Currently the perf trace sample and the sample counter with context implement Hashable, and the class hierarchy looks like:

| Hashable  
           | PerfSample
                          | HybridSample
                          | LBRSample
           | ContextKey
                          | StringBasedCtxKey
                          | ProbeBasedCtxKey
                          | CallsiteBasedCtxKey
           | ...

For perf samples, HybridSample includes the call stack together with the LBR stack, while LBRSample includes only the LBR stack.
For the context key used for virtual unwinding counter aggregation, we use a string-based context (StringBasedCtxKey) by default for a good debugging experience. For pseudo probes, we switch to a stack of probe pointers (ProbeBasedCtxKey) to avoid redundant string handling. In the future, we could also speed up StringBasedCtxKey by using the original function frame stack, which is named CallsiteBasedCtxKey here.

Implementation

A class implementing Hashable should provide getHashCode and isEqual. getHashCode is deliberately non-virtual to avoid vtable overhead, so each derived class must calculate its hash code and assign it to the base class's HashCode manually. This also gives the flexibility to calculate the hash code incrementally (like a rolling hash) during frame stack unwinding.
isEqual is a virtual function, which has some performance overhead. In the future, if we design a better hash function, we can skip this comparison or switch to a non-virtual function.
Added PerfSample and ContextKey as base classes for the perf sample and the counter context key, using LLVM-style RTTI.
Added the StringBasedCtxKey class, which extends ContextKey and uses a string as the context id.
Refactored AggregationCounter to take any kind of PerfSample as key.
Refactored ContextSampleCounter to take any kind of ContextKey as key.
Other refactoring work:
- Created a wrapper class SampleCounter to wrap RangeCounter and BranchCounter.
- Hoisted ContextId and FunctionProfile out of populateFunctionBodySamples and populateFunctionBoundarySamples to reuse them in ProfileGenerator.
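
The design described above can be sketched in a few dozen lines. This is a simplified, standalone illustration rather than the patch itself: it replaces `llvm::dyn_cast` with a hand-written kind check, uses `std::hash` as a stand-in hash function, and `CounterMap` and `recordSample` are hypothetical names introduced only for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <memory>
#include <string>
#include <unordered_map>
#include <utility>

// Base class for context keys. getHashCode() is non-virtual: each derived
// class computes its own hash and stores it in HashCode.
struct ContextKey {
  enum ContextKind { CK_StringBased }; // tag for LLVM-style RTTI
  uint64_t HashCode = 0;
  const ContextKind Kind;
  explicit ContextKey(ContextKind K) : Kind(K) {}
  virtual ~ContextKey() = default;
  uint64_t getHashCode() const { return HashCode; }
  ContextKind getKind() const { return Kind; }
  virtual bool isEqual(const ContextKey *K) const {
    return HashCode == K->HashCode;
  }
};

// String based context key.
struct StringBasedCtxKey : ContextKey {
  std::string Context;
  explicit StringBasedCtxKey(std::string C)
      : ContextKey(CK_StringBased), Context(std::move(C)) {
    genHashCode();
  }
  static bool classof(const ContextKey *K) {
    return K->getKind() == CK_StringBased;
  }
  bool isEqual(const ContextKey *K) const override {
    // Stand-in for llvm::dyn_cast: check the kind tag, then downcast.
    if (!classof(K))
      return false;
    return Context == static_cast<const StringBasedCtxKey *>(K)->Context;
  }
  // Assumption: std::hash replaces the hash function used in the patch.
  void genHashCode() { HashCode = std::hash<std::string>{}(Context); }
};

// Wrapper that exposes the hash and equality functors a hash map needs.
template <class T> class Hashable {
public:
  std::shared_ptr<T> Data;
  explicit Hashable(std::shared_ptr<T> D) : Data(std::move(D)) {}
  struct Hash {
    std::size_t operator()(const Hashable<T> &Key) const {
      return Key.Data->getHashCode(); // non-virtual call
    }
  };
  struct Equal {
    bool operator()(const Hashable<T> &LHS, const Hashable<T> &RHS) const {
      return LHS.Data->isEqual(RHS.Data.get()); // virtual call
    }
  };
  T *getPtr() const { return Data.get(); }
};

// One container for every kind of ContextKey (hypothetical alias).
using CounterMap =
    std::unordered_map<Hashable<ContextKey>, uint64_t,
                       Hashable<ContextKey>::Hash, Hashable<ContextKey>::Equal>;

// Hypothetical helper: aggregate Repeat under Context, return the new total.
inline uint64_t recordSample(CounterMap &Map, const std::string &Context,
                             uint64_t Repeat) {
  Hashable<ContextKey> Key(std::make_shared<StringBasedCtxKey>(Context));
  return Map[Key] += Repeat;
}
```

Because the map is keyed on `Hashable<ContextKey>`, a future key type only needs to derive from `ContextKey`, fill in `HashCode`, and override `isEqual`.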

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

wlei created this revision. Dec 3 2020, 10:15 AM

Herald added a project: Restricted Project. Dec 3 2020, 10:15 AM

Herald added subscribers: llvm-commits, hoy, wenlei, lxfind.

wlei requested review of this revision. Dec 3 2020, 10:15 AM

Harbormaster completed remote builds in B80979: Diff 309300. Dec 3 2020, 10:16 AM

wlei retitled this revision from [CSSPGO][llvm-profgen] Rafactor to unify hashable interface for trace sample and context-sensitive counter to [CSSPGO][llvm-profgen] Refactor to unify hashable interface for trace sample and context-sensitive counter. Dec 3 2020, 9:00 PM

wlei edited the summary of this revision.

wlei added reviewers: wmi, davidxl, hoy, wenlei.

add more comments

Harbormaster completed remote builds in B81061: Diff 309451. Dec 3 2020, 9:10 PM

fix a missing genHashCode, also add an assertion for this

Harbormaster completed remote builds in B81063: Diff 309453. Dec 3 2020, 9:33 PM

rebase

wlei added a parent revision: D92334: [CSSPGO][llvm-profgen] Pseudo probe decoding and disassembling. Dec 8 2020, 3:16 PM

Harbormaster completed remote builds in B81538: Diff 310368. Dec 8 2020, 4:15 PM

wlei added a child revision: D92896: [CSSPGO][llvm-profgen] Virtual unwinding with pseudo probe. Dec 8 2020, 5:04 PM

hoy added inline comments. Dec 9 2020, 6:22 PM

llvm/tools/llvm-profgen/PerfReader.h
104	Nit: could this be defined as an overloaded `==` operator?

wlei added inline comments. Dec 10 2020, 1:03 PM

llvm/tools/llvm-profgen/PerfReader.h
104	`==` should work. Using `Equal` here is intentional: I want to distinguish it from `==` to indicate that this is an 'Equal' function for hashing, not for regular comparison. Currently we compare the elements exactly, like `==`, but in the future, if we have a good hash function, we could use the hash code for the comparison. Or we could have different custom `Equal` functions, like `ContextKeyEqual` and `SampleEqual`. Or maybe 'Equal' is not a proper name. Any thoughts on this?

hoy added inline comments. Dec 10 2020, 3:14 PM

llvm/tools/llvm-profgen/PerfReader.h
104	Sounds good. It makes sense to name the comparison more specifically.

hoy added inline comments. Dec 13 2020, 11:23 PM

llvm/tools/llvm-profgen/PerfReader.cpp
76	Name it `getOrCreateSampleCounter`?
llvm/tools/llvm-profgen/PerfReader.h
110	I don't see a consumer of the two APIs. Maybe exclude them from this patch? I'm wondering if the shared pointer should be returned (for reference counting) when there is a need to expose the underlying data.

wmi added inline comments. Dec 14 2020, 12:22 PM

llvm/tools/llvm-profgen/PerfReader.h
263	Can you use Hashable as a base class instead, so you can remove the isEqual virtual function (the CRTP pattern: https://en.wikipedia.org/wiki/Curiously_recurring_template_pattern#Static_polymorphism)? It will also simplify the interface using Hashable: `struct StringBasedCtxKey : public Hashable<StringBasedCtxKey> { ... }`. And other types of key later on: `struct ProbeBasedCtxKey : public Hashable<ProbeBasedCtxKey> { ... }`.

wmi added inline comments. Dec 14 2020, 12:39 PM

llvm/tools/llvm-profgen/PerfReader.h
298	Can we rename it to ContextSampleCounterMap and its object to something like CtxCounterMap?

address reviewers' feedback

Harbormaster completed remote builds in B82496: Diff 311958. Dec 15 2020, 10:20 AM

wlei marked 4 inline comments as done. Dec 15 2020, 10:22 AM

wlei added inline comments.

llvm/tools/llvm-profgen/PerfReader.cpp
76	Renamed.
llvm/tools/llvm-profgen/PerfReader.h
110	Removed `getData()`. `getPtr()` is used when getting the key data, e.g. at PerfReader.cpp:232. Regarding "if the shared pointer should be returned": I'm not sure whether `const shared_ptr<Key> K = dyn_cast<shared_ptr<Key>>(I.first.getPtr());` can work for this, let me try. Also, since it's the hash key, it's only used like `for (auto I : Map) { const Key K = dyn_cast<Key>(I.first.getPtr()); /* use K */ }`, so `K` is always used inside `Map`'s scope.
263	Thanks for your suggestion and detailed example. Here I want to put different types derived from the same base class into the same container, like `struct StringBasedCtxKey : public CtxKey { ... }` and `struct ProbeBasedCtxKey : public CtxKey { ... }`. Then I can use a single container, `unordered_map<CtxKey, ...> ContextSampleCounterMap`. With CRTP, I guess it would need two containers. Any ideas about this?
298	Good point, the value is actually a counter. Renamed.

wmi added inline comments. Dec 15 2020, 12:16 PM

llvm/tools/llvm-profgen/PerfReader.h
263	OK, I don't have a better idea. A possible solution is described in https://en.wikipedia.org/wiki/Curiously_recurring_template_pattern#Pitfalls, but that prevents the Hashable class from being reused by PerfSample without duplication. I am fine with your existing solution if no one else has a better idea.

wlei marked 2 inline comments as done. Dec 15 2020, 1:02 PM

wlei added inline comments.

llvm/tools/llvm-profgen/PerfReader.h
263	Yeah, we want both `PerfSample` and `ContextKey` to implement a unified Hashable. Thank you.
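
To make the trade-off in this thread concrete, here is a rough sketch of the CRTP alternative. It is not code from the patch: `CRTPHashable`, `StringKey`, `ProbeKey`, and the two map aliases are hypothetical names, and the hash functions are placeholders. It shows that CRTP removes the virtual `isEqual`, but each key type then gets its own unrelated base and needs its own container.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <string>
#include <type_traits>
#include <unordered_map>
#include <utility>
#include <vector>

// CRTP flavour: hashing and equality are resolved statically via Derived,
// so no virtual function is involved.
template <class Derived> struct CRTPHashable {
  bool isEqual(const Derived &Other) const {
    return static_cast<const Derived *>(this)->equalImpl(Other);
  }
  std::size_t getHashCode() const {
    return static_cast<const Derived *>(this)->hashImpl();
  }
  struct Hash {
    std::size_t operator()(const Derived &K) const { return K.getHashCode(); }
  };
  struct Equal {
    bool operator()(const Derived &L, const Derived &R) const {
      return L.isEqual(R);
    }
  };
};

struct StringKey : CRTPHashable<StringKey> {
  std::string Context;
  explicit StringKey(std::string C) : Context(std::move(C)) {}
  bool equalImpl(const StringKey &O) const { return Context == O.Context; }
  std::size_t hashImpl() const { return std::hash<std::string>{}(Context); }
};

struct ProbeKey : CRTPHashable<ProbeKey> {
  std::vector<const void *> Probes; // placeholder for a stack of probe pointers
  explicit ProbeKey(std::vector<const void *> P) : Probes(std::move(P)) {}
  bool equalImpl(const ProbeKey &O) const { return Probes == O.Probes; }
  std::size_t hashImpl() const {
    std::size_t H = 5381;
    for (const void *P : Probes)
      H = H * 33 + std::hash<const void *>{}(P);
    return H;
  }
};

// The two bases are distinct types, so there is no common key type that a
// single unordered_map could hold.
static_assert(!std::is_same<CRTPHashable<StringKey>,
                            CRTPHashable<ProbeKey>>::value,
              "each CRTP instantiation is its own type");

using StringCounterMap =
    std::unordered_map<StringKey, uint64_t, StringKey::Hash, StringKey::Equal>;
using ProbeCounterMap =
    std::unordered_map<ProbeKey, uint64_t, ProbeKey::Hash, ProbeKey::Equal>;
```

With the shared-base design kept in the patch, one `unordered_map` keyed on `Hashable<ContextKey>` covers all key kinds, at the cost of one virtual `isEqual` call per comparison.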

hoy accepted this revision. Dec 16 2020, 12:20 AM

hoy added inline comments.

llvm/tools/llvm-profgen/PerfReader.h
110	The current `getPtr()` implementation and its use look good. I was thinking `getPtr()` might leak the underlying data out of the shared pointer's scope when it is destroyed.
263	Both patterns have their specialties. The current solution looks a bit cleaner, though it falls back to virtual methods to resolve hash collisions. Hopefully that can be mitigated by using a more efficient hash code.

This revision is now accepted and ready to land. Dec 16 2020, 12:20 AM

Thanks for the refactoring and cleanup, looks great!

LBRSample will be added when AutoFDO support is moved into llvm-profgen, right? For AutoFDO, we could use the same infrastructure, except that the context will always be empty.

What is CallsiteBasedCtxKey in the description? A stack of call site addresses? A brief comment for each of the leaf classes mentioned in the description (even if not implemented) would be useful.

llvm/tools/llvm-profgen/PerfReader.cpp
216	Does this map also need to own a copy of the context string? nit: perhaps remove the comment below? The determinism mentioned above is good enough. "This is due to a build failure on sanitizer build(asan/msan/ubsan)"
llvm/tools/llvm-profgen/PerfReader.h
171–174	nit: make `hashCombine` a lambda function inside `genHashCode` if there's no other use?
192	nit: rename it `AggregatedCounter`? We have `AggregatedSamples` as names.
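
As an illustration of the `hashCombine` nit, the sketch below folds the DJB2 combine step into `genHashCode` as a local lambda. It is a simplified stand-in for the real `HybridSample`: the `SampleSketch` name is hypothetical, the `Binary` pointer is omitted, and `std::vector` replaces `SmallVector`.

```cpp
#include <cassert>
#include <cstdint>
#include <list>
#include <vector>

struct LBREntry {
  uint64_t Source;
  uint64_t Target;
};

// Hypothetical, reduced version of a hybrid sample.
struct SampleSketch {
  std::list<uint64_t> CallStack;  // leaf to root
  std::vector<LBREntry> LBRStack; // stand-in for SmallVector<LBREntry, 16>
  uint64_t HashCode = 0;

  void genHashCode() {
    // Simple DJB2 step: H * 33 + V, kept local since nothing else uses it.
    auto HashCombine = [](uint64_t H, uint64_t V) {
      return ((H << 5) + H) + V;
    };
    uint64_t Hash = 5381;
    for (uint64_t Address : CallStack)
      Hash = HashCombine(Hash, Address);
    for (const LBREntry &Entry : LBRStack) {
      Hash = HashCombine(Hash, Entry.Source);
      Hash = HashCombine(Hash, Entry.Target);
    }
    HashCode = Hash;
  }
};
```

Since each step only depends on the previous hash and the next value, the same lambda could in principle be applied incrementally while frames are pushed, which is the flexibility the summary mentions.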

Address Wenlei's feedback

LBRSample will be added when AutoFDO support is moved into llvm-profgen, right? For AutoFDO, we could use the same infrastructure except that context will always be empty.

Good question. Yes, my initial thought was to decouple this by using a separate LBRSample when it comes to AutoFDO; perhaps this is good for readability. I guess your suggestion is for performance, since we won't have a virtual call. That's good. If so, and we don't have other kinds of perf sample, we might not need the base class PerfSample and can just rename HybridSample to a more general name (like PerfSample).

What is CallsiteBasedCtxKey in the description? A stack of call site addresses? Brief comment for each of the leaf class that was mentioned in the description (even if not implemented) would be useful.

Yes, it's a vector of the original call site addresses (call stack). I was thinking about whether we could use this to replace the string-based context key in the unwinder stage and convert it to a string later in ProfileGenerator. Just an idea. Will add more comments to the summary.

llvm/tools/llvm-profgen/PerfReader.cpp
216	Good point. Changed to `StringRef` and removed this comment.

In D92584#2458810, @wlei wrote:

LBRSample will be added when AutoFDO support is moved into llvm-profgen, right? For AutoFDO, we could use the same infrastructure except that context will always be empty.

Good question. Yes, my initial thought was to decouple this by using a separate LBRSample when it comes to AutoFDO; perhaps this is good for readability. I guess your suggestion is for performance, since we won't have a virtual call. That's good. If so, and we don't have other kinds of perf sample, we might not need the base class PerfSample and can just rename HybridSample to a more general name (like PerfSample).

Right. An LBR sample can be a hybrid sample with an empty stack, in which case we don't need the hierarchy. But we can refine and deal with that later.

What is CallsiteBasedCtxKey in the description? A stack of call site addresses? Brief comment for each of the leaf class that was mentioned in the description (even if not implemented) would be useful.

Yes, it's a vector of the original call site addresses (call stack). I was thinking about whether we could use this to replace the string-based context key in the unwinder stage and convert it to a string later in ProfileGenerator. Just an idea. Will add more comments to the summary.

Harbormaster completed remote builds in B82685: Diff 312294. Dec 16 2020, 2:33 PM

wlei edited the summary of this revision. Dec 18 2020, 2:35 PM

Kindly ping @wmi,
I have addressed the feedback from Hongtao and Wenlei, and would like to know whether this also looks good to you or whether you have any other questions.
The same goes for the other pseudo probe patches (https://reviews.llvm.org/D92334, https://reviews.llvm.org/D92896, https://reviews.llvm.org/D92998). Thanks!

Sorry for the delay. LGTM.

This revision was landed with ongoing or failed builds. Jan 13 2021, 11:07 AM

Closed by commit rG414930b91bfd: [CSSPGO][llvm-profgen] Refactor to unify hashable interface for trace sample… (authored by wlei).

This revision was automatically updated to reflect the committed changes.

wlei added a commit: rG414930b91bfd: [CSSPGO][llvm-profgen] Refactor to unify hashable interface for trace sample….

Diff 316458

llvm/tools/llvm-profgen/PerfReader.h

//===-- PerfReader.h - perfscript reader ------------------------ C++ --===//		//===-- PerfReader.h - perfscript reader ------------------------ C++ --===//
//		//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.		// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.		// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception		// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

#ifndef LLVM_TOOLS_LLVM_PROFGEN_PERFREADER_H		#ifndef LLVM_TOOLS_LLVM_PROFGEN_PERFREADER_H
#define LLVM_TOOLS_LLVM_PROFGEN_PERFREADER_H		#define LLVM_TOOLS_LLVM_PROFGEN_PERFREADER_H
#include "ErrorHandling.h"		#include "ErrorHandling.h"
#include "ProfiledBinary.h"		#include "ProfiledBinary.h"
		#include "llvm/Support/Casting.h"
#include "llvm/Support/CommandLine.h"		#include "llvm/Support/CommandLine.h"
#include "llvm/Support/Regex.h"		#include "llvm/Support/Regex.h"
#include <fstream>		#include <fstream>
#include <list>		#include <list>
#include <map>		#include <map>
#include <vector>		#include <vector>

using namespace llvm;		using namespace llvm;
(49 unchanged lines not shown; context: struct LBREntry {)
// An artificial branch stands for a series of consecutive branches starting		// An artificial branch stands for a series of consecutive branches starting
// from the current binary with a transition through external code and		// from the current binary with a transition through external code and
// eventually landing back in the current binary.		// eventually landing back in the current binary.
bool IsArtificial = false;		bool IsArtificial = false;
LBREntry(uint64_t S, uint64_t T, bool I)		LBREntry(uint64_t S, uint64_t T, bool I)
: Source(S), Target(T), IsArtificial(I) {}		: Source(S), Target(T), IsArtificial(I) {}
};		};

		// Hash interface for generic data of type T
		// Data should implement a \fn getHashCode and a \fn isEqual
		// Currently getHashCode is non-virtual to avoid the overhead of calling vtable,
		// i.e we explicitly calculate hash of derived class, assign to base class's
		// HashCode. This also provides the flexibility for calculating the hash code
		// incrementally(like rolling hash) during frame stack unwinding since unwinding
		// only changes the leaf of frame stack. \fn isEqual is a virtual function,
		// which will have perf overhead. In the future, if we redesign a better hash
		// function, then we can just skip this or switch to non-virtual function(like
		// just ignoring the comparison if the probability of hash conflicts is low)
		template <class T> class Hashable {
		public:
		std::shared_ptr<T> Data;
		Hashable(const std::shared_ptr<T> &D) : Data(D) {}

		// Hash code generation
		struct Hash {
		uint64_t operator()(const Hashable<T> &Key) const {
		// Don't make it virtual for getHashCode
		assert(Key.Data->getHashCode() && "Should generate HashCode for it!");
		return Key.Data->getHashCode();
		}
		};

		// Hash equal
		struct Equal {
		bool operator()(const Hashable<T> &LHS, const Hashable<T> &RHS) const {
		// Precisely compare the data, vtable will have overhead.
		return LHS.Data->isEqual(RHS.Data.get());
		}
		};

		T *getPtr() const { return Data.get(); }
		};

		// Base class to extend for all types of perf sample
		struct PerfSample {
		uint64_t HashCode = 0;

		virtual ~PerfSample() = default;
		uint64_t getHashCode() const { return HashCode; }
		virtual bool isEqual(const PerfSample *K) const {
		return HashCode == K->HashCode;
		};

		// Utilities for LLVM-style RTTI
		enum PerfKind { PK_HybridSample };
		const PerfKind Kind;
		PerfKind getKind() const { return Kind; }
		PerfSample(PerfKind K) : Kind(K){};
		};

// The parsed hybrid sample including call stack and LBR stack.		// The parsed hybrid sample including call stack and LBR stack.
struct HybridSample {		struct HybridSample : public PerfSample {
// Profiled binary that current frame address belongs to		// Profiled binary that current frame address belongs to
ProfiledBinary *Binary;		ProfiledBinary *Binary;
// Call stack recorded in FILO(leaf to root) order		// Call stack recorded in FILO(leaf to root) order
std::list<uint64_t> CallStack;		std::list<uint64_t> CallStack;
// LBR stack recorded in FIFO order		// LBR stack recorded in FIFO order
SmallVector<LBREntry, 16> LBRStack;		SmallVector<LBREntry, 16> LBRStack;

		HybridSample() : PerfSample(PK_HybridSample){};
		static bool classof(const PerfSample *K) {
		return K->getKind() == PK_HybridSample;
		}

// Used for sample aggregation		// Used for sample aggregation
bool operator==(const HybridSample &Other) const {		bool isEqual(const PerfSample *K) const override {
if (Other.Binary != Binary)		const HybridSample *Other = dyn_cast<HybridSample>(K);
		if (Other->Binary != Binary)
return false;		return false;
const std::list<uint64_t> &OtherCallStack = Other.CallStack;		const std::list<uint64_t> &OtherCallStack = Other->CallStack;
const SmallVector<LBREntry, 16> &OtherLBRStack = Other.LBRStack;		const SmallVector<LBREntry, 16> &OtherLBRStack = Other->LBRStack;

if (CallStack.size() != OtherCallStack.size() ||		if (CallStack.size() != OtherCallStack.size() ||
LBRStack.size() != OtherLBRStack.size())		LBRStack.size() != OtherLBRStack.size())
return false;		return false;

auto Iter = CallStack.begin();		auto Iter = CallStack.begin();
for (auto Address : OtherCallStack) {		for (auto Address : OtherCallStack) {
if (Address != *Iter++)		if (Address != *Iter++)
return false;		return false;
}		}

for (size_t I = 0; I < OtherLBRStack.size(); I++) {		for (size_t I = 0; I < OtherLBRStack.size(); I++) {
if (LBRStack[I].Source != OtherLBRStack[I].Source ||		if (LBRStack[I].Source != OtherLBRStack[I].Source ||
LBRStack[I].Target != OtherLBRStack[I].Target)		LBRStack[I].Target != OtherLBRStack[I].Target)
return false;		return false;
}		}
return true;		return true;
}		}

		void genHashCode() {
		// Use simple DJB2 hash
		auto HashCombine = [](uint64_t H, uint64_t V) {
		return ((H << 5) + H) + V;
		};
		uint64_t Hash = 5381;
		Hash = HashCombine(Hash, reinterpret_cast<uint64_t>(Binary));
		for (const auto &Value : CallStack) {
		Hash = HashCombine(Hash, Value);
		}
		for (const auto &Entry : LBRStack) {
		Hash = HashCombine(Hash, Entry.Source);
		Hash = HashCombine(Hash, Entry.Target);
		}
		HashCode = Hash;
		}
};		};

		// After parsing the sample, we record the samples by aggregating them
		// into this counter. The key stores the sample data and the value is
		// the sample repeat times.
		using AggregatedCounter =
		std::unordered_map<Hashable<PerfSample>, uint64_t,
		Hashable<PerfSample>::Hash, Hashable<PerfSample>::Equal>;

// The state for the unwinder, it doesn't hold the data but only keep the		// The state for the unwinder, it doesn't hold the data but only keep the
// pointer/index of the data, While unwinding, the CallStack is changed		// pointer/index of the data, While unwinding, the CallStack is changed
// dynamically and will be recorded as the context of the sample		// dynamically and will be recorded as the context of the sample
struct UnwindState {		struct UnwindState {
// Profiled binary that current frame address belongs to		// Profiled binary that current frame address belongs to
const ProfiledBinary *Binary;		const ProfiledBinary *Binary;
// TODO: switch to use trie for call stack		// TODO: switch to use trie for call stack
std::list<uint64_t> CallStack;		std::list<uint64_t> CallStack;
// Used to fall through the LBR stack		// Used to fall through the LBR stack
uint32_t LBRIndex = 0;		uint32_t LBRIndex = 0;
// Reference to HybridSample.LBRStack		// Reference to HybridSample.LBRStack
const SmallVector<LBREntry, 16> &LBRStack;		const SmallVector<LBREntry, 16> &LBRStack;
// Used to iterate the address range		// Used to iterate the address range
InstructionPointer InstPtr;		InstructionPointer InstPtr;
UnwindState(const HybridSample &Sample)		UnwindState(const HybridSample *Sample)
: Binary(Sample.Binary), CallStack(Sample.CallStack),		: Binary(Sample->Binary), CallStack(Sample->CallStack),
LBRStack(Sample.LBRStack),		LBRStack(Sample->LBRStack),
InstPtr(Sample.Binary, Sample.CallStack.front()) {}		InstPtr(Sample->Binary, Sample->CallStack.front()) {}

bool validateInitialState() {		bool validateInitialState() {
uint64_t LBRLeaf = LBRStack[LBRIndex].Target;		uint64_t LBRLeaf = LBRStack[LBRIndex].Target;
uint64_t StackLeaf = CallStack.front();		uint64_t StackLeaf = CallStack.front();
// When we take a stack sample, ideally the sampling distance between the		// When we take a stack sample, ideally the sampling distance between the
// leaf IP of stack and the last LBR target shouldn't be very large.		// leaf IP of stack and the last LBR target shouldn't be very large.
// Use a heuristic size (0x100) to filter out broken records.		// Use a heuristic size (0x100) to filter out broken records.
if (StackLeaf < LBRLeaf || StackLeaf >= LBRLeaf + 0x100) {		if (StackLeaf < LBRLeaf || StackLeaf >= LBRLeaf + 0x100) {
(16 unchanged lines not shown; context: struct UnwindState {)
const ProfiledBinary *getBinary() const { return Binary; }		const ProfiledBinary *getBinary() const { return Binary; }
bool hasNextLBR() const { return LBRIndex < LBRStack.size(); }		bool hasNextLBR() const { return LBRIndex < LBRStack.size(); }
uint64_t getCurrentLBRSource() const { return LBRStack[LBRIndex].Source; }		uint64_t getCurrentLBRSource() const { return LBRStack[LBRIndex].Source; }
uint64_t getCurrentLBRTarget() const { return LBRStack[LBRIndex].Target; }		uint64_t getCurrentLBRTarget() const { return LBRStack[LBRIndex].Target; }
const LBREntry &getCurrentLBR() const { return LBRStack[LBRIndex]; }		const LBREntry &getCurrentLBR() const { return LBRStack[LBRIndex]; }
void advanceLBR() { LBRIndex++; }		void advanceLBR() { LBRIndex++; }
};		};

// The counter of branch samples for one function indexed by the branch,		// Base class for sample counter key with context
// which is represented as the source and target offset pair.		struct ContextKey {
using BranchSample = std::map<std::pair<uint64_t, uint64_t>, uint64_t>;		uint64_t HashCode = 0;
// The counter of range samples for one function indexed by the range,		virtual ~ContextKey() = default;
// which is represented as the start and end offset pair.		uint64_t getHashCode() const { return HashCode; }
using RangeSample = std::map<std::pair<uint64_t, uint64_t>, uint64_t>;		virtual bool isEqual(const ContextKey *K) const {
// Range sample counters indexed by the context string		return HashCode == K->HashCode;
using ContextRangeCounter = std::unordered_map<std::string, RangeSample>;		};
// Branch sample counters indexed by the context string
using ContextBranchCounter = std::unordered_map<std::string, BranchSample>;

// For Hybrid sample counters		// Utilities for LLVM-style RTTI
struct ContextSampleCounters {		enum ContextKind { CK_StringBased };
ContextRangeCounter RangeCounter;		const ContextKind Kind;
ContextBranchCounter BranchCounter;		ContextKind getKind() const { return Kind; }
		ContextKey(ContextKind K) : Kind(K){};
		};

void recordRangeCount(std::string &ContextId, uint64_t Start, uint64_t End,		// String based context id
uint64_t Repeat) {		struct StringBasedCtxKey : public ContextKey {
RangeCounter[ContextId][{Start, End}] += Repeat;		std::string Context;
		StringBasedCtxKey() : ContextKey(CK_StringBased){};
		static bool classof(const ContextKey *K) {
		return K->getKind() == CK_StringBased;
}		}
void recordBranchCount(std::string &ContextId, uint64_t Source,
uint64_t Target, uint64_t Repeat) {		bool isEqual(const ContextKey *K) const override {
BranchCounter[ContextId][{Source, Target}] += Repeat;		const StringBasedCtxKey *Other = dyn_cast<StringBasedCtxKey>(K);
		return Context == Other->Context;
}		}

		void genHashCode() { HashCode = hash_value(Context); }
};		};

struct HybridSampleHash {		// The counter of branch samples for one function indexed by the branch,
uint64_t hashCombine(uint64_t Hash, uint64_t Value) const {		// which is represented as the source and target offset pair.
// Simple DJB2 hash		using BranchSample = std::map<std::pair<uint64_t, uint64_t>, uint64_t>;
return ((Hash << 5) + Hash) + Value;		// The counter of range samples for one function indexed by the range,
}		// which is represented as the start and end offset pair.
		using RangeSample = std::map<std::pair<uint64_t, uint64_t>, uint64_t>;
		// Wrapper for sample counters including range counter and branch counter
		struct SampleCounter {
		RangeSample RangeCounter;
		BranchSample BranchCounter;

uint64_t operator()(const HybridSample &Sample) const {		void recordRangeCount(uint64_t Start, uint64_t End, uint64_t Repeat) {
uint64_t Hash = 5381;		RangeCounter[{Start, End}] += Repeat;
Hash = hashCombine(Hash, reinterpret_cast<uint64_t>(Sample.Binary));
for (const auto &Value : Sample.CallStack) {
Hash = hashCombine(Hash, Value);
}
for (const auto &Entry : Sample.LBRStack) {
Hash = hashCombine(Hash, Entry.Source);
Hash = hashCombine(Hash, Entry.Target);
}		}
return Hash;		void recordBranchCount(uint64_t Source, uint64_t Target, uint64_t Repeat) {
		BranchCounter[{Source, Target}] += Repeat;
}		}
};		};

// After parsing the sample, we record the samples by aggregating them		// Sample counter with context to support context-sensitive profile
// into this structure and the value is the sample counter.		using ContextSampleCounterMap =
using AggregationCounter =		std::unordered_map<Hashable<ContextKey>, SampleCounter,
std::unordered_map<HybridSample, uint64_t, HybridSampleHash>;		Hashable<ContextKey>::Hash, Hashable<ContextKey>::Equal>;
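
The map type above keys on a shared-pointer wrapper whose hash is precomputed and whose equality is delegated to the key object. The following is a minimal sketch of that pattern under simplifying assumptions: `CtxKeyBase`, `StrCtxKey`, `KeyHandle`, `CtxCountMap`, and `makeKey` are hypothetical stand-ins rather than the patch's actual `Hashable`/`ContextKey` definitions, and `dynamic_cast` is used in place of LLVM's `dyn_cast`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <memory>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical base key: the hash is stored in the base, so reading it
// needs no virtual call. Derived classes must assign HashCode themselves.
struct CtxKeyBase {
  uint64_t HashCode = 0;
  virtual ~CtxKeyBase() = default;
  uint64_t getHashCode() const { return HashCode; }
  virtual bool isEqual(const CtxKeyBase *Other) const = 0;
};

// Hypothetical string-based key, loosely modeled on StringBasedCtxKey.
struct StrCtxKey : CtxKeyBase {
  std::string Context;
  bool isEqual(const CtxKeyBase *Other) const override {
    const auto *O = dynamic_cast<const StrCtxKey *>(Other);
    return O && O->Context == Context;
  }
  void genHashCode() { HashCode = std::hash<std::string>{}(Context); }
};

// Wrapper that lets std::unordered_map hash and compare through the pointer.
template <class T> class KeyHandle {
  std::shared_ptr<T> Data;

public:
  explicit KeyHandle(std::shared_ptr<T> D) : Data(std::move(D)) {}
  T *getPtr() const { return Data.get(); }
  struct Hash {
    std::size_t operator()(const KeyHandle &K) const {
      return static_cast<std::size_t>(K.Data->getHashCode());
    }
  };
  struct Equal {
    bool operator()(const KeyHandle &L, const KeyHandle &R) const {
      return L.Data->isEqual(R.Data.get());
    }
  };
};

using CtxCountMap =
    std::unordered_map<KeyHandle<CtxKeyBase>, uint64_t,
                       KeyHandle<CtxKeyBase>::Hash,
                       KeyHandle<CtxKeyBase>::Equal>;

// Builds a key and computes its hash before it is used in the map.
inline KeyHandle<CtxKeyBase> makeKey(const std::string &Context) {
  auto Key = std::make_shared<StrCtxKey>();
  Key->Context = Context;
  Key->genHashCode();
  return KeyHandle<CtxKeyBase>(Key);
}
```

Two keys built from the same context string land in the same bucket and compare equal, so their counts aggregate into one entry. Forgetting to call `genHashCode()` would leave every key with hash 0, which still behaves correctly but degrades the map to a single bucket chain.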

/*		/*
As in hybrid sample we have a group of LBRs and the most recent sampling call		As in hybrid sample we have a group of LBRs and the most recent sampling call
stack, we can walk through those LBRs to infer more call stacks which would be		stack, we can walk through those LBRs to infer more call stacks which would be
used as context for profile. VirtualUnwinder is the class to do the call stack		used as context for profile. VirtualUnwinder is the class to do the call stack
unwinding based on LBR state. Two types of unwinding are processed here:		unwinding based on LBR state. Two types of unwinding are processed here:
1) LBR unwinding and 2) linear range unwinding.		1) LBR unwinding and 2) linear range unwinding.
Specifically, for each LBR entry(can be classified into call, return, regular		Specifically, for each LBR entry(can be classified into call, return, regular
branch), LBR unwinding will replay the operation by pushing, popping or		branch), LBR unwinding will replay the operation by pushing, popping or
switching leaf frame towards the call stack and since the initial call stack		switching leaf frame towards the call stack and since the initial call stack
is most recently sampled, the replay should be in anti-execution order, i.e. for		is most recently sampled, the replay should be in anti-execution order, i.e. for
the regular case, pop the call stack when LBR is call, push frame on call stack		the regular case, pop the call stack when LBR is call, push frame on call stack
when LBR is return. After each LBR processed, it also needs to align with the		when LBR is return. After each LBR processed, it also needs to align with the
next LBR by going through instructions from previous LBR's target to current		next LBR by going through instructions from previous LBR's target to current
LBR's source, which is the linear unwinding. As instruction from linear range		LBR's source, which is the linear unwinding. As instruction from linear range
can come from different function by inlining, linear unwinding will do the range		can come from different function by inlining, linear unwinding will do the range
splitting and record counters by the range with same inline context. Over those		splitting and record counters by the range with same inline context. Over those
unwinding process we will record each call stack as context id and LBR/linear		unwinding process we will record each call stack as context id and LBR/linear
range as sample counter for further CS profile generation.		range as sample counter for further CS profile generation.
*/		*/
class VirtualUnwinder {		class VirtualUnwinder {
public:		public:
VirtualUnwinder(ContextSampleCounters *Counters) : SampleCounters(Counters) {}		VirtualUnwinder(ContextSampleCounterMap *Counter) : CtxCounterMap(Counter) {}

bool isCallState(UnwindState &State) const {		bool isCallState(UnwindState &State) const {
// The tail call frame is always missing here in stack sample, we will		// The tail call frame is always missing here in stack sample, we will
// use a specific tail call tracker to infer it.		// use a specific tail call tracker to infer it.
return State.getBinary()->addressIsCall(State.getCurrentLBRSource());		return State.getBinary()->addressIsCall(State.getCurrentLBRSource());
}		}

bool isReturnState(UnwindState &State) const {		bool isReturnState(UnwindState &State) const {
// Simply check addressIsReturn, as ret is always reliable, both for		// Simply check addressIsReturn, as ret is always reliable, both for
// regular call and tail call.		// regular call and tail call.
return State.getBinary()->addressIsReturn(State.getCurrentLBRSource());		return State.getBinary()->addressIsReturn(State.getCurrentLBRSource());
}		}

void unwindCall(UnwindState &State);		void unwindCall(UnwindState &State);
void unwindLinear(UnwindState &State, uint64_t Repeat);		void unwindLinear(UnwindState &State, uint64_t Repeat);
void unwindReturn(UnwindState &State);		void unwindReturn(UnwindState &State);
void unwindBranchWithinFrame(UnwindState &State);		void unwindBranchWithinFrame(UnwindState &State);
bool unwind(const HybridSample &Sample, uint64_t Repeat);		bool unwind(const HybridSample *Sample, uint64_t Repeat);
void recordRangeCount(uint64_t Start, uint64_t End, UnwindState &State,		void recordRangeCount(uint64_t Start, uint64_t End, UnwindState &State,
uint64_t Repeat);		uint64_t Repeat);
void recordBranchCount(const LBREntry &Branch, UnwindState &State,		void recordBranchCount(const LBREntry &Branch, UnwindState &State,
uint64_t Repeat);		uint64_t Repeat);
		SampleCounter &getOrCreateSampleCounter(const ProfiledBinary *Binary,
		std::list<uint64_t> &CallStack);

private:		private:
ContextSampleCounters *SampleCounters;		ContextSampleCounterMap *CtxCounterMap;
};		};
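
The class comment above describes replaying LBR entries in anti-execution order: a call pops the leaf frame and a return pushes one. The sketch below illustrates only that replay step, under stated assumptions. `BranchKind`, `BranchRecord`, and `replayBackwards` are hypothetical names, the branch kind is given directly instead of being derived from the binary, and linear range unwinding, tail-call handling, and counter recording are all omitted.

```cpp
#include <cassert>
#include <cstdint>
#include <list>
#include <vector>

enum class BranchKind { Call, Return, Regular };

// Hypothetical simplified LBR entry with its kind already classified.
struct BranchRecord {
  uint64_t Source;
  uint64_t Target;
  BranchKind Kind;
};

// Stack.front() is the leaf frame; Branches are ordered most recent first.
// Each branch is "undone" to recover the call stack that existed before it.
inline std::list<uint64_t>
replayBackwards(std::list<uint64_t> Stack,
                const std::vector<BranchRecord> &Branches) {
  for (const BranchRecord &B : Branches) {
    if (Stack.empty())
      break;
    switch (B.Kind) {
    case BranchKind::Call:
      // Before the call executed, the callee frame did not exist yet:
      // drop it and place the leaf at the call instruction.
      if (Stack.size() > 1)
        Stack.pop_front();
      Stack.front() = B.Source;
      break;
    case BranchKind::Return:
      // Before the return executed, the callee frame was still live:
      // keep the caller frame and push the returning frame as the new leaf.
      Stack.front() = B.Target;
      Stack.push_front(B.Source);
      break;
    case BranchKind::Regular:
      // An intra-function branch only moves the leaf address.
      Stack.front() = B.Source;
      break;
    }
  }
  return Stack;
}
```

Starting from a one-frame stack at address 100, undoing a return from 205 yields the stack {205, 100}; undoing a later-processed call from 99 collapses it back to a single frame at 99.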

// Filename to binary map		// Filename to binary map
using BinaryMap = StringMap<ProfiledBinary>;		using BinaryMap = StringMap<ProfiledBinary>;
// Address to binary map for fast look-up		// Address to binary map for fast look-up
using AddressBinaryMap = std::map<uint64_t, ProfiledBinary *>;		using AddressBinaryMap = std::map<uint64_t, ProfiledBinary *>;
// Binary to ContextSampleCounters Map to support multiple binary, we may have		// Binary to ContextSampleCounters Map to support multiple binary, we may have
// same binary loaded at different addresses, they should share the same sample		// same binary loaded at different addresses, they should share the same sample
// counter		// counter
using BinarySampleCounterMap =		using BinarySampleCounterMap =
std::unordered_map<ProfiledBinary *, ContextSampleCounters>;		std::unordered_map<ProfiledBinary *, ContextSampleCounterMap>;

// Load binaries and read perf trace to parse the events and samples		// Load binaries and read perf trace to parse the events and samples
class PerfReader {		class PerfReader {

public:		public:
PerfReader(cl::list<std::string> &BinaryFilenames);		PerfReader(cl::list<std::string> &BinaryFilenames);

// Hybrid sample(call stack + LBRs) profile traces are separated by double line		// Hybrid sample(call stack + LBRs) profile traces are separated by double line
[... 59 lines not shown ...]	private:
ProfiledBinary *getBinary(uint64_t Address);		ProfiledBinary *getBinary(uint64_t Address);

BinaryMap BinaryTable;		BinaryMap BinaryTable;
AddressBinaryMap AddrToBinaryMap; // Used by address-based lookup.		AddressBinaryMap AddrToBinaryMap; // Used by address-based lookup.

private:		private:
BinarySampleCounterMap BinarySampleCounters;		BinarySampleCounterMap BinarySampleCounters;
// Samples with the repeating time generated by the perf reader		// Samples with the repeating time generated by the perf reader
AggregationCounter AggregatedSamples;		AggregatedCounter AggregatedSamples;
PerfScriptType PerfType;		PerfScriptType PerfType;
};		};

} // end namespace sampleprof		} // end namespace sampleprof
} // end namespace llvm		} // end namespace llvm

#endif		#endif

llvm/tools/llvm-profgen/PerfReader.cpp

[... 66 lines not shown ...]
void VirtualUnwinder::unwindBranchWithinFrame(UnwindState &State) {		void VirtualUnwinder::unwindBranchWithinFrame(UnwindState &State) {
// TODO: Tolerate tail call for now, as we may see tail call from libraries.		// TODO: Tolerate tail call for now, as we may see tail call from libraries.
// This is only for intra function branches, excluding tail calls.		// This is only for intra function branches, excluding tail calls.
uint64_t Source = State.getCurrentLBRSource();		uint64_t Source = State.getCurrentLBRSource();
State.CallStack.front() = Source;		State.CallStack.front() = Source;
State.InstPtr.update(Source);		State.InstPtr.update(Source);
}		}

		SampleCounter &
		VirtualUnwinder::getOrCreateSampleCounter(const ProfiledBinary *Binary,
		hoy: Name it `getOrCreateSampleCounter`?
		wlei: renamed
		std::list<uint64_t> &CallStack) {
		std::shared_ptr<StringBasedCtxKey> KeyStr =
		std::make_shared<StringBasedCtxKey>();
		KeyStr->Context = Binary->getExpandedContextStr(CallStack);
		KeyStr->genHashCode();
		auto Ret =
		CtxCounterMap->emplace(Hashable<ContextKey>(KeyStr), SampleCounter());
		return Ret.first->second;
		}
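
`getOrCreateSampleCounter` relies on `emplace` leaving an existing entry untouched and returning an iterator to it, so one call covers both lookup and insertion. A minimal sketch of that idiom follows, assuming a plain `std::string` key instead of the patch's hashable key; `RangeCounts`, `CtxMap`, `getOrCreate`, and `countFor` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical reduced counter that tracks only range samples.
struct RangeCounts {
  std::map<std::pair<uint64_t, uint64_t>, uint64_t> Ranges;

  void recordRangeCount(uint64_t Start, uint64_t End, uint64_t Repeat) {
    Ranges[{Start, End}] += Repeat;
  }

  // Read-only query; returns 0 for a range that was never recorded.
  uint64_t countFor(uint64_t Start, uint64_t End) const {
    auto It = Ranges.find({Start, End});
    return It == Ranges.end() ? 0 : It->second;
  }
};

using CtxMap = std::unordered_map<std::string, RangeCounts>;

// emplace inserts a fresh counter only when the context is new; otherwise
// Ret.first points at the counter that is already stored.
inline RangeCounts &getOrCreate(CtxMap &Map, const std::string &Context) {
  auto Ret = Map.emplace(Context, RangeCounts());
  return Ret.first->second;
}
```

Repeated calls with the same context accumulate into one counter, which is the aggregation behavior `recordRangeCount` and `recordBranchCount` depend on.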

void VirtualUnwinder::recordRangeCount(uint64_t Start, uint64_t End,		void VirtualUnwinder::recordRangeCount(uint64_t Start, uint64_t End,
UnwindState &State, uint64_t Repeat) {		UnwindState &State, uint64_t Repeat) {
std::string &&ContextId = State.getExpandedContextStr();
uint64_t StartOffset = State.getBinary()->virtualAddrToOffset(Start);		uint64_t StartOffset = State.getBinary()->virtualAddrToOffset(Start);
uint64_t EndOffset = State.getBinary()->virtualAddrToOffset(End);		uint64_t EndOffset = State.getBinary()->virtualAddrToOffset(End);
SampleCounters->recordRangeCount(ContextId, StartOffset, EndOffset, Repeat);		SampleCounter &SCounter =
		getOrCreateSampleCounter(State.getBinary(), State.CallStack);
		SCounter.recordRangeCount(StartOffset, EndOffset, Repeat);
}		}

void VirtualUnwinder::recordBranchCount(const LBREntry &Branch,		void VirtualUnwinder::recordBranchCount(const LBREntry &Branch,
UnwindState &State, uint64_t Repeat) {		UnwindState &State, uint64_t Repeat) {
if (Branch.IsArtificial)		if (Branch.IsArtificial)
return;		return;
std::string &&ContextId = State.getExpandedContextStr();
uint64_t SourceOffset = State.getBinary()->virtualAddrToOffset(Branch.Source);		uint64_t SourceOffset = State.getBinary()->virtualAddrToOffset(Branch.Source);
uint64_t TargetOffset = State.getBinary()->virtualAddrToOffset(Branch.Target);		uint64_t TargetOffset = State.getBinary()->virtualAddrToOffset(Branch.Target);
SampleCounters->recordBranchCount(ContextId, SourceOffset, TargetOffset,		SampleCounter &SCounter =
Repeat);		getOrCreateSampleCounter(State.getBinary(), State.CallStack);
		SCounter.recordBranchCount(SourceOffset, TargetOffset, Repeat);
}		}

bool VirtualUnwinder::unwind(const HybridSample &Sample, uint64_t Repeat) {		bool VirtualUnwinder::unwind(const HybridSample *Sample, uint64_t Repeat) {
// Capture initial state as starting point for unwinding.		// Capture initial state as starting point for unwinding.
UnwindState State(Sample);		UnwindState State(Sample);

// Sanity check - making sure leaf of LBR aligns with leaf of stack sample		// Sanity check - making sure leaf of LBR aligns with leaf of stack sample
// Stack sample sometimes can be unreliable, so filter out bogus ones.		// Stack sample sometimes can be unreliable, so filter out bogus ones.
if (!State.validateInitialState())		if (!State.validateInitialState())
return false;		return false;

[... 90 lines not shown ...]	ProfiledBinary *PerfReader::getBinary(uint64_t Address) {
if (Iter == AddrToBinaryMap.end() || Iter->first != Address) {		if (Iter == AddrToBinaryMap.end() || Iter->first != Address) {
if (Iter == AddrToBinaryMap.begin())		if (Iter == AddrToBinaryMap.begin())
return nullptr;		return nullptr;
Iter--;		Iter--;
}		}
return Iter->second;		return Iter->second;
}		}
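
The visible tail of `getBinary` is a "greatest key not above the address" lookup in an ordered map. The statement that initializes `Iter` is not shown in this view, so the sketch below assumes it comes from `std::map::lower_bound`. `findByAddress` is a hypothetical name, and `int` values stand in for `ProfiledBinary *`.

```cpp
#include <cassert>
#include <cstdint>
#include <map>

// Returns a pointer to the value whose base address is the greatest key
// that is <= Address, or nullptr when every key is above Address.
inline const int *findByAddress(const std::map<uint64_t, int> &Bases,
                                uint64_t Address) {
  // Assumption: the first key >= Address is the starting point.
  auto Iter = Bases.lower_bound(Address);
  if (Iter == Bases.end() || Iter->first != Address) {
    // No exact match: step back to the previous base, if there is one.
    if (Iter == Bases.begin())
      return nullptr;
    --Iter;
  }
  return &Iter->second;
}
```

Like the code in the diff, this sketch does not check an upper bound, so an address beyond the end of the last mapped region still resolves to that region.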

static void printSampleCounter(ContextRangeCounter &Counter) {
// Use ordered map to make the output deterministic		// Use ordered map to make the output deterministic
std::map<std::string, RangeSample> OrderedCounter(Counter.begin(),		using OrderedCounterForPrint = std::map<StringRef, RangeSample>;
Counter.end());
		wenlei: Does this map also need to own a copy of the context string? nit: perhaps remove this comment below? The determinism mentioned above is good enough. "This is due to a build failure on sanitizer build(asan/msan/ubsan)"
		wlei: Good point. Changed to `StringRef` and removed this comment.
		static void printSampleCounter(OrderedCounterForPrint &OrderedCounter) {
for (auto Range : OrderedCounter) {		for (auto Range : OrderedCounter) {
outs() << Range.first << "\n";		outs() << Range.first << "\n";
for (auto I : Range.second) {		for (auto I : Range.second) {
outs() << " (" << format("%" PRIx64, I.first.first) << ", "		outs() << " (" << format("%" PRIx64, I.first.first) << ", "
<< format("%" PRIx64, I.first.second) << "): " << I.second << "\n";		<< format("%" PRIx64, I.first.second) << "): " << I.second << "\n";
}		}
}		}
}		}

		static void printRangeCounter(ContextSampleCounterMap &Counter) {
		OrderedCounterForPrint OrderedCounter;
		for (auto &CI : Counter) {
		const StringBasedCtxKey *CtxKey =
		dyn_cast<StringBasedCtxKey>(CI.first.getPtr());
		OrderedCounter[CtxKey->Context] = CI.second.RangeCounter;
		}
		printSampleCounter(OrderedCounter);
		}

		static void printBranchCounter(ContextSampleCounterMap &Counter) {
		OrderedCounterForPrint OrderedCounter;
		for (auto &CI : Counter) {
		const StringBasedCtxKey *CtxKey =
		dyn_cast<StringBasedCtxKey>(CI.first.getPtr());
		OrderedCounter[CtxKey->Context] = CI.second.BranchCounter;
		}
		printSampleCounter(OrderedCounter);
		}

void PerfReader::printUnwinderOutput() {		void PerfReader::printUnwinderOutput() {
for (auto I : BinarySampleCounters) {		for (auto I : BinarySampleCounters) {
const ProfiledBinary *Binary = I.first;		const ProfiledBinary *Binary = I.first;
outs() << "Binary(" << Binary->getName().str() << ")'s Range Counter:\n";		outs() << "Binary(" << Binary->getName().str() << ")'s Range Counter:\n";
printSampleCounter(I.second.RangeCounter);		printRangeCounter(I.second);
outs() << "\nBinary(" << Binary->getName().str() << ")'s Branch Counter:\n";		outs() << "\nBinary(" << Binary->getName().str() << ")'s Branch Counter:\n";
printSampleCounter(I.second.BranchCounter);		printBranchCounter(I.second);
}		}
}		}

void PerfReader::unwindSamples() {		void PerfReader::unwindSamples() {
for (const auto &Item : AggregatedSamples) {		for (const auto &Item : AggregatedSamples) {
const HybridSample &Sample = Item.first;		const HybridSample *Sample = dyn_cast<HybridSample>(Item.first.getPtr());
VirtualUnwinder Unwinder(&BinarySampleCounters[Sample.Binary]);		VirtualUnwinder Unwinder(&BinarySampleCounters[Sample->Binary]);
Unwinder.unwind(Sample, Item.second);		Unwinder.unwind(Sample, Item.second);
}		}

if (ShowUnwinderOutput)		if (ShowUnwinderOutput)
printUnwinderOutput();		printUnwinderOutput();
}		}

bool PerfReader::extractLBRStack(TraceStream &TraceIt,		bool PerfReader::extractLBRStack(TraceStream &TraceIt,
[... 125 lines not shown ...]	void PerfReader::parseHybridSample(TraceStream &TraceIt) {
// intermediately by LBR sample		// intermediately by LBR sample
// e.g.		// e.g.
// 4005dc # call stack leaf		// 4005dc # call stack leaf
// 400634		// 400634
// 400684 # call stack root		// 400684 # call stack root
// 0x4005c8/0x4005dc/P/-/-/0 0x40062f/0x4005b0/P/-/-/0 ...		// 0x4005c8/0x4005dc/P/-/-/0 0x40062f/0x4005b0/P/-/-/0 ...
// ... 0x4005c8/0x4005dc/P/-/-/0 # LBR Entries		// ... 0x4005c8/0x4005dc/P/-/-/0 # LBR Entries
//		//
HybridSample Sample;		std::shared_ptr<HybridSample> Sample = std::make_shared<HybridSample>();

// Parsing call stack and populate into HybridSample.CallStack		// Parsing call stack and populate into HybridSample.CallStack
if (!extractCallstack(TraceIt, Sample.CallStack)) {		if (!extractCallstack(TraceIt, Sample->CallStack)) {
// Skip the next LBR line matched current call stack		// Skip the next LBR line matched current call stack
if (!TraceIt.isAtEoF() && TraceIt.getCurrentLine().startswith(" 0x"))		if (!TraceIt.isAtEoF() && TraceIt.getCurrentLine().startswith(" 0x"))
TraceIt.advance();		TraceIt.advance();
return;		return;
}		}
// Set the binary current sample belongs to		// Set the binary current sample belongs to
Sample.Binary = getBinary(Sample.CallStack.front());		Sample->Binary = getBinary(Sample->CallStack.front());

if (!TraceIt.isAtEoF() && TraceIt.getCurrentLine().startswith(" 0x")) {		if (!TraceIt.isAtEoF() && TraceIt.getCurrentLine().startswith(" 0x")) {
// Parsing LBR stack and populate into HybridSample.LBRStack		// Parsing LBR stack and populate into HybridSample.LBRStack
if (extractLBRStack(TraceIt, Sample.LBRStack, Sample.Binary)) {		if (extractLBRStack(TraceIt, Sample->LBRStack, Sample->Binary)) {
// Canonicalize stack leaf to avoid 'random' IP from leaf frame skew LBR		// Canonicalize stack leaf to avoid 'random' IP from leaf frame skew LBR
// ranges		// ranges
Sample.CallStack.front() = Sample.LBRStack[0].Target;		Sample->CallStack.front() = Sample->LBRStack[0].Target;
// Record samples by aggregation		// Record samples by aggregation
AggregatedSamples[Sample]++;		Sample->genHashCode();
		AggregatedSamples[Hashable<PerfSample>(Sample)]++;
}		}
} else {		} else {
// LBR sample is encoded in single line after stack sample		// LBR sample is encoded in single line after stack sample
exitWithError("'Hybrid perf sample is corrupted, No LBR sample line");		exitWithError("'Hybrid perf sample is corrupted, No LBR sample line");
}		}
}		}

void PerfReader::parseMMap2Event(TraceStream &TraceIt) {		void PerfReader::parseMMap2Event(TraceStream &TraceIt) {
[... 114 lines not shown ...]

llvm/tools/llvm-profgen/ProfileGenerator.h

[... 58 lines not shown ...]	class CSProfileGenerator : public ProfileGenerator {
const BinarySampleCounterMap &BinarySampleCounters;		const BinarySampleCounterMap &BinarySampleCounters;

public:		public:
CSProfileGenerator(const BinarySampleCounterMap &Counters)		CSProfileGenerator(const BinarySampleCounterMap &Counters)
: BinarySampleCounters(Counters){};		: BinarySampleCounters(Counters){};

public:		public:
void generateProfile() override {		void generateProfile() override {
// Fill in function body samples		for (const auto &BI : BinarySampleCounters) {
populateFunctionBodySamples();		ProfiledBinary *Binary = BI.first;
		for (const auto &CI : BI.second) {
		const StringBasedCtxKey *CtxKey =
		dyn_cast<StringBasedCtxKey>(CI.first.getPtr());
		StringRef ContextId(CtxKey->Context);
		// Get or create function profile for the range
		FunctionSamples &FunctionProfile =
		getFunctionProfileForContext(ContextId);

		// Fill in function body samples
		populateFunctionBodySamples(FunctionProfile, CI.second.RangeCounter,
		Binary);
// Fill in boundary sample counts as well as call site samples for calls		// Fill in boundary sample counts as well as call site samples for calls
populateFunctionBoundarySamples();		populateFunctionBoundarySamples(ContextId, FunctionProfile,
		CI.second.BranchCounter, Binary);
		}
		}
// Fill in call site value sample for inlined calls and also use context to		// Fill in call site value sample for inlined calls and also use context to
// infer missing samples. Since we don't have call count for inlined		// infer missing samples. Since we don't have call count for inlined
// functions, we estimate it from inlinee's profile using the entry of the		// functions, we estimate it from inlinee's profile using the entry of the
// body sample.		// body sample.
populateInferredFunctionSamples();		populateInferredFunctionSamples();
}		}

private:		private:
// Helper function for updating body sample for a leaf location in		// Helper function for updating body sample for a leaf location in
// FunctionProfile		// FunctionProfile
void updateBodySamplesforFunctionProfile(FunctionSamples &FunctionProfile,		void updateBodySamplesforFunctionProfile(FunctionSamples &FunctionProfile,
const FrameLocation &LeafLoc,		const FrameLocation &LeafLoc,
uint64_t Count);		uint64_t Count);
// Lookup or create FunctionSamples for the context		// Lookup or create FunctionSamples for the context
FunctionSamples &getFunctionProfileForContext(StringRef ContextId);		FunctionSamples &getFunctionProfileForContext(StringRef ContextId);
void populateFunctionBodySamples();		void populateFunctionBodySamples(FunctionSamples &FunctionProfile,
void populateFunctionBoundarySamples();		const RangeSample &RangeCounters,
		ProfiledBinary *Binary);
		void populateFunctionBoundarySamples(StringRef ContextId,
		FunctionSamples &FunctionProfile,
		const BranchSample &BranchCounters,
		ProfiledBinary *Binary);
void populateInferredFunctionSamples();		void populateInferredFunctionSamples();
};		};

} // end namespace sampleprof		} // end namespace sampleprof
} // end namespace llvm		} // end namespace llvm

#endif		#endif

llvm/tools/llvm-profgen/ProfileGenerator.cpp

[... 172 lines not shown ...]	void CSProfileGenerator::updateBodySamplesforFunctionProfile(
if (PreviousCount < Count) {		if (PreviousCount < Count) {
FunctionProfile.addBodySamples(LeafLoc.second.LineOffset,		FunctionProfile.addBodySamples(LeafLoc.second.LineOffset,
LeafLoc.second.Discriminator,		LeafLoc.second.Discriminator,
Count - PreviousCount);		Count - PreviousCount);
FunctionProfile.addTotalSamples(Count - PreviousCount);		FunctionProfile.addTotalSamples(Count - PreviousCount);
}		}
}		}

void CSProfileGenerator::populateFunctionBodySamples() {		void CSProfileGenerator::populateFunctionBodySamples(
for (const auto &BI : BinarySampleCounters) {		FunctionSamples &FunctionProfile, const RangeSample &RangeCounter,
ProfiledBinary *Binary = BI.first;		ProfiledBinary *Binary) {
for (const auto &CI : BI.second.RangeCounter) {
StringRef ContextId(CI.first);
// Get or create function profile for the range
FunctionSamples &FunctionProfile =
getFunctionProfileForContext(ContextId);
// Compute disjoint ranges first, so we can use MAX		// Compute disjoint ranges first, so we can use MAX
// for calculating count for each location.		// for calculating count for each location.
RangeSample Ranges;		RangeSample Ranges;
findDisjointRanges(Ranges, CI.second);		findDisjointRanges(Ranges, RangeCounter);

for (auto Range : Ranges) {		for (auto Range : Ranges) {
uint64_t RangeBegin = Binary->offsetToVirtualAddr(Range.first.first);		uint64_t RangeBegin = Binary->offsetToVirtualAddr(Range.first.first);
uint64_t RangeEnd = Binary->offsetToVirtualAddr(Range.first.second);		uint64_t RangeEnd = Binary->offsetToVirtualAddr(Range.first.second);
uint64_t Count = Range.second;		uint64_t Count = Range.second;
// Disjoint ranges introduce zero-filled gaps that		// Disjoint ranges introduce zero-filled gaps that
// don't belong to the current context; filter them out.		// don't belong to the current context; filter them out.
if (Count == 0)		if (Count == 0)
continue;		continue;

InstructionPointer IP(Binary, RangeBegin, true);		InstructionPointer IP(Binary, RangeBegin, true);

// Disjoint ranges may have range in the middle of two instr,		// Disjoint ranges may have range in the middle of two instr,
// e.g. If Instr1 at Addr1, and Instr2 at Addr2, disjoint range		// e.g. If Instr1 at Addr1, and Instr2 at Addr2, disjoint range
// can be Addr1+1 to Addr2-1. We should ignore such range.		// can be Addr1+1 to Addr2-1. We should ignore such range.
if (IP.Address > RangeEnd)		if (IP.Address > RangeEnd)
continue;		continue;

while (IP.Address <= RangeEnd) {		while (IP.Address <= RangeEnd) {
uint64_t Offset = Binary->virtualAddrToOffset(IP.Address);		uint64_t Offset = Binary->virtualAddrToOffset(IP.Address);
const FrameLocation &LeafLoc = Binary->getInlineLeafFrameLoc(Offset);		const FrameLocation &LeafLoc = Binary->getInlineLeafFrameLoc(Offset);
// Recording body sample for this specific context		// Recording body sample for this specific context
updateBodySamplesforFunctionProfile(FunctionProfile, LeafLoc, Count);		updateBodySamplesforFunctionProfile(FunctionProfile, LeafLoc, Count);
// Move to next IP within the range		// Move to next IP within the range
IP.advance();		IP.advance();
}		}
}		}
}		}
}
}

void CSProfileGenerator::populateFunctionBoundarySamples() {		void CSProfileGenerator::populateFunctionBoundarySamples(
for (const auto &BI : BinarySampleCounters) {		StringRef ContextId, FunctionSamples &FunctionProfile,
ProfiledBinary *Binary = BI.first;		const BranchSample &BranchCounters, ProfiledBinary *Binary) {
for (const auto &CI : BI.second.BranchCounter) {
StringRef ContextId(CI.first);
// Get or create function profile for branch Source
FunctionSamples &FunctionProfile =
getFunctionProfileForContext(ContextId);

for (auto Entry : CI.second) {		for (auto Entry : BranchCounters) {
uint64_t SourceOffset = Entry.first.first;		uint64_t SourceOffset = Entry.first.first;
uint64_t TargetOffset = Entry.first.second;		uint64_t TargetOffset = Entry.first.second;
uint64_t Count = Entry.second;		uint64_t Count = Entry.second;
// Get the callee name by branch target if it's a call branch		// Get the callee name by branch target if it's a call branch
StringRef CalleeName = FunctionSamples::getCanonicalFnName(		StringRef CalleeName = FunctionSamples::getCanonicalFnName(
Binary->getFuncFromStartOffset(TargetOffset));		Binary->getFuncFromStartOffset(TargetOffset));
if (CalleeName.size() == 0)		if (CalleeName.size() == 0)
continue;		continue;

// Record called target sample and its count		// Record called target sample and its count
const FrameLocation &LeafLoc =		const FrameLocation &LeafLoc = Binary->getInlineLeafFrameLoc(SourceOffset);
Binary->getInlineLeafFrameLoc(SourceOffset);

FunctionProfile.addCalledTargetSamples(LeafLoc.second.LineOffset,		FunctionProfile.addCalledTargetSamples(LeafLoc.second.LineOffset,
LeafLoc.second.Discriminator,		LeafLoc.second.Discriminator,
CalleeName, Count);		CalleeName, Count);
FunctionProfile.addTotalSamples(Count);		FunctionProfile.addTotalSamples(Count);

// Record head sample for called target(callee)		// Record head sample for called target(callee)
// TODO: Cleanup ' @ '		// TODO: Cleanup ' @ '
std::string CalleeContextId =		std::string CalleeContextId =
getCallSite(LeafLoc) + " @ " + CalleeName.str();		getCallSite(LeafLoc) + " @ " + CalleeName.str();
if (ContextId.find(" @ ") != StringRef::npos) {		if (ContextId.find(" @ ") != StringRef::npos) {
CalleeContextId =		CalleeContextId =
ContextId.rsplit(" @ ").first.str() + " @ " + CalleeContextId;		ContextId.rsplit(" @ ").first.str() + " @ " + CalleeContextId;
}		}

if (ProfileMap.find(CalleeContextId) != ProfileMap.end()) {		FunctionSamples &CalleeProfile =
FunctionSamples &CalleeProfile = ProfileMap[CalleeContextId];		getFunctionProfileForContext(CalleeContextId);
assert(Count != 0 && "Unexpected zero weight branch");		assert(Count != 0 && "Unexpected zero weight branch");
if (CalleeProfile.getName().size()) {
CalleeProfile.addHeadSamples(Count);		CalleeProfile.addHeadSamples(Count);
}		}
}		}
}
}
}
}
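
The callee context ID built in `populateFunctionBoundarySamples` replaces the leaf of the caller's context with the call site and then appends the callee name. A minimal sketch of that string manipulation follows, using `std::string` in place of `StringRef`. `makeCalleeContext` is a hypothetical helper, and the sample context strings in the usage note are illustrative only.

```cpp
#include <cassert>
#include <string>

// Builds "<caller prefix> @ <call site> @ <callee>", where the caller prefix
// is everything before the last " @ " in ContextId. A top-level ContextId
// (no separator) contributes no prefix.
inline std::string makeCalleeContext(const std::string &ContextId,
                                     const std::string &CallSite,
                                     const std::string &CalleeName) {
  const std::string Sep = " @ ";
  std::string CalleeContext = CallSite + Sep + CalleeName;
  // Equivalent of StringRef::rsplit(" @ ").first when the separator exists.
  std::string::size_type Pos = ContextId.rfind(Sep);
  if (Pos != std::string::npos)
    CalleeContext = ContextId.substr(0, Pos) + Sep + CalleeContext;
  return CalleeContext;
}
```

For example, a caller context `main:3 @ foo` with call site `foo:2` and callee `bar` gives `main:3 @ foo:2 @ bar`, while a top-level context `main` with call site `main:3` and callee `foo` gives `main:3 @ foo`.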

static FrameLocation getCallerContext(StringRef CalleeContext,		static FrameLocation getCallerContext(StringRef CalleeContext,
StringRef &CallerNameWithContext) {		StringRef &CallerNameWithContext) {
StringRef CallerContext = CalleeContext.rsplit(" @ ").first;		StringRef CallerContext = CalleeContext.rsplit(" @ ").first;
CallerNameWithContext = CallerContext.rsplit(':').first;		CallerNameWithContext = CallerContext.rsplit(':').first;
auto ContextSplit = CallerContext.rsplit(" @ ");		auto ContextSplit = CallerContext.rsplit(" @ ");
FrameLocation LeafFrameLoc = {"", {0, 0}};		FrameLocation LeafFrameLoc = {"", {0, 0}};
StringRef Funcname;		StringRef Funcname;
[... 50 lines not shown ...]