This is an archive of the discontinued LLVM Phabricator instance.


Differential D89723

[CSSPGO][llvm-profgen] Context-sensitive profile data generation
ClosedPublic

Authored by wlei on Oct 19 2020, 12:59 PM.


Details

Reviewers

hoy
wenlei
wmi
davidxl

Commits

rG1f05b1a9f527: [CSSPGO][llvm-profgen] Context-sensitive profile data generation

Summary

This stack of changes introduces the llvm-profgen utility, which generates a profile data file from given perf script data files for sample-based PGO. It is part of (but not limited to) the CSSPGO work. Specifically, to support context-sensitive profiles with or without pseudo probes, it implements a series of functionalities including perf trace parsing, instruction symbolization, LBR stack/call frame stack unwinding, pseudo probe decoding, etc. High throughput is achieved by multiple levels of sample aggregation, and a profile in a compatible format is generated in one stop at the end. Please refer to https://groups.google.com/g/llvm-dev/c/1p1rdYbL93s for the CSSPGO RFC.

This change adds context-sensitive profile data generation to llvm-profgen. With simultaneous sampling of the LBR and the call stack, we can identify the leaf of an LBR sample with the calling context from the stack sample. While deriving the fall-through paths from LBR entries, we unwind the LBR by replaying all the calls and returns (including implicit calls/returns due to inlining) backwards on top of the sampled call stack. The state of the call stack as we unwind through the LBR then always represents the calling context of the current fall-through path.

We have two types of virtual unwinding: 1) LBR unwinding and 2) linear range unwinding.
Specifically, each LBR entry can be classified as a call, a return, or a regular branch. LBR unwinding replays the operation by pushing, popping, or switching the leaf frame of the call stack. Since the initial call stack is the most recently sampled state, the replay runs in anti-execution order: in the regular case, pop the call stack when the LBR entry is a call, and push a frame onto the call stack when it is a return. After each LBR entry is processed, the unwinder also needs to align with the next LBR entry by going through the instructions from the previous LBR's target to the current LBR's source, which we call linear unwinding. As instructions in a linear range can come from different functions due to inlining, linear unwinding splits the range and records counters for each sub-range with the same inline context.
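
The anti-execution replay described above can be illustrated with a minimal sketch. It is not the patch's VirtualUnwinder: `BranchKind`, `LBREntry`, and `replayLBR` are hypothetical names, frames are plain function names instead of addresses, and linear unwinding and inlining are left out.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical, simplified model of an LBR entry.
enum class BranchKind { Call, Return, Regular };

struct LBREntry {
  BranchKind Kind;
  std::string SourceFunc; // function containing the branch source
  std::string TargetFunc; // function containing the branch target
};

using CallStack = std::vector<std::string>; // leaf frame at the back

// Walks the LBR entries from most recent to oldest. Before an entry is
// undone, the current stack is the calling context of the fall-through
// range that ran right after that branch, so it is recorded first.
std::vector<CallStack> replayLBR(const std::vector<LBREntry> &LBRs,
                                 CallStack Stack) {
  assert(!Stack.empty() && "expects a non-empty sampled call stack");
  std::vector<CallStack> Contexts;
  for (const LBREntry &E : LBRs) {
    Contexts.push_back(Stack);
    switch (E.Kind) {
    case BranchKind::Call:
      // The call pushed the callee frame at run time; undo it by popping.
      if (!Stack.empty())
        Stack.pop_back();
      break;
    case BranchKind::Return:
      // The return popped the callee frame; undo it by pushing it back.
      Stack.push_back(E.SourceFunc);
      break;
    case BranchKind::Regular:
      // A regular branch only switches the leaf frame.
      if (!Stack.empty())
        Stack.back() = E.SourceFunc;
      break;
    }
  }
  Contexts.push_back(Stack); // context of the oldest fall-through range
  return Contexts;
}
```

For a sample taken in bar after main called foo, foo returned, and main called bar, the recorded contexts are main/bar, main, main/foo, and main, in that order.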

With each fall-through path from LBR unwinding, we aggregate each sample into counters by calling context and eventually generate a fully context-sensitive profile (without relying on inlining) to drive the compiler's PGO/FDO.

A breakdown of noteworthy changes:

Added the HybridSample class as the abstraction of a perf sample, including the LBR stack and the call stack.
Extended PerfReader to auto-detect whether the input perf script output contains a CS profile and then parse it. Multiple HybridSample objects are extracted.
Sped up processing by aggregating HybridSample into AggregatedSamples.
Added VirtualUnwinder, which consumes aggregated HybridSample and implements unwinding of calls, returns, and linear paths that contain implicit calls/returns from inlining. Range and branch counters are aggregated by calling context. Here the calling context is a string: each frame is a pair of function name and callsite location info, and the whole context looks like main:1 @ foo:2 @ bar.
Added ProfileGenerator, which accumulates counters by range unfolding or branch target mapping, then generates a context-sensitive function profile including the function body, the inferred callee head samples, and callsite target samples, and eventually records it into ProfileMap.
Leveraged LLVM's built-in writer (SampleProfWriter) to support different serialization formats in one stop.
Used getCanonicalFnName for callee names and names from the ELF section.
Added regression tests for both unwinding and profile generation.
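
As a rough illustration of the string-typed calling context and the per-context counters mentioned in the breakdown, here is a small sketch. `Frame`, `getContextString`, and `addRangeCount` are hypothetical names chosen for this example and are not taken from the patch; only the `main:1 @ foo:2 @ bar` format comes from the summary.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <utility>
#include <vector>

// One frame of a calling context. The leaf frame carries no callsite.
struct Frame {
  std::string Func;
  std::string Callsite; // e.g. "1" or "3.2"; empty for the leaf frame
};

// Joins frames into the "main:1 @ foo:2 @ bar" form used in the summary.
std::string getContextString(const std::vector<Frame> &Frames) {
  std::string Ctx;
  for (std::size_t I = 0; I < Frames.size(); ++I) {
    if (I != 0)
      Ctx += " @ ";
    Ctx += Frames[I].Func;
    if (!Frames[I].Callsite.empty())
      Ctx += ":" + Frames[I].Callsite;
  }
  return Ctx;
}

using Range = std::pair<uint64_t, uint64_t>;
using RangeCounter = std::map<Range, uint64_t>;
using ContextRangeCounter = std::map<std::string, RangeCounter>;

// Accumulates a range count under the context it was observed in.
void addRangeCount(ContextRangeCounter &Counters,
                   const std::vector<Frame> &Ctx, Range R, uint64_t Count) {
  assert(R.first <= R.second && "range start must not exceed its end");
  Counters[getContextString(Ctx)][R] += Count;
}
```

Keying the outer map by the context string means the same address range reached through two different call paths is counted separately.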

Test Plan:
ninja & ninja check-llvm

Diff Detail

Repository: rG LLVM Github Monorepo

Unit Tests: Failed

	Time	Test
	60 ms	x64 windows > LLVM.CodeGen/XCore::threads.ll
	60 ms	x64 windows > LLVM.tools/llvm-profgen::inline-cs-noprobe.test
	50 ms	x64 windows > LLVM.tools/llvm-profgen::noinline-cs-noprobe.test

Event Timeline

There are a very large number of changes, so older changes are hidden.

hoy added inline comments.Nov 5 2020, 1:50 PM

llvm/tools/llvm-profgen/PerfReader.h
182	The comment could be retired since we have a tail call tracker coming that tracks both in-LBR tail calls and out-of-LBR tail calls universally.
llvm/tools/llvm-profgen/llvm-profgen.cpp
44–54	Perhaps it's better to include the unwinder in the reader since this driver will also handle non-CS profiles in future. The dataflow from the reader to the profile generator may need a flexible definition (currently is `Unwinder.getSampleCounters()`) for future extension.

wenlei added inline comments.Nov 5 2020, 3:06 PM

llvm/tools/llvm-profgen/PerfReader.h
182	I think the comment needs to be updated, but explanation here is still needed because IIUC missing frame inference happens more like a post process (hence somewhat orthogonal), and here `isCallState` decides the unwind operation on the stack sample (not changed by frame inference) which will always miss tail call frame (unless dwarf stack walking is used by perf).
llvm/tools/llvm-profgen/llvm-profgen.cpp
44–54	Agreed that unwinder better be driven by PerfReader since unwinder is something PerfReader depends on directly (vs depending on its output like ProfileGenerator on PerfReader's output).

move unwinder into PerfReader
use a BinarytoSampleCounter map to group sample counters by binary
add PrologEpilog tracker
support to use getCanonicalFnName for ELF Section based symbol name
fix a negative line offset bug
other refactoring work

Harbormaster completed remote builds in B79015: Diff 305608.Nov 16 2020, 3:06 PM

rebase

Harbormaster completed remote builds in B79035: Diff 305634.Nov 16 2020, 5:56 PM

hoy added inline comments.Nov 17 2020, 4:40 PM

llvm/tools/llvm-profgen/PerfReader.cpp
471	`PerfType` should be defined on the `else` branch if it is not initialized anywhere else.
llvm/tools/llvm-profgen/PerfReader.h
283	Should this be rewritten with a stream-based file reader as done in D89707?

Address reviewer's feedback on PerfType definition

Harbormaster completed remote builds in B79342: Diff 306182.Nov 18 2020, 12:07 PM

wlei added inline comments.Nov 18 2020, 12:07 PM

llvm/tools/llvm-profgen/PerfReader.h
283	I guess you mean keeping it consistent with other parts of the code? Here it only reads 4000 bytes of data from the file (`getFileOrSTDIN(FileName, 4000);`), so there shouldn't be a memory issue. Currently the stream-based line reader only supports reading one line at a time, so it needs to search line by line, which would be slower than searching the whole 4k buffer in memory. So which one do you prefer?

hoy added inline comments.Nov 18 2020, 4:24 PM

llvm/tools/llvm-profgen/PerfReader.h
283	I see. The current implementation looks good to me.

[NFC]rebase

Harbormaster completed remote builds in B79396: Diff 306285.Nov 18 2020, 6:56 PM

wmi added inline comments.Nov 19 2020, 11:22 AM

llvm/tools/llvm-profgen/PerfReader.h
214–215	This virtual unwinder is not doing the classic unwinding thing. It walks through the LBR stack of an LBR sample, based on the sample's call stack, and infers the call stack for each address range covered by the LBR sample. The comment can be clearer about it.
llvm/tools/llvm-profgen/ProfileGenerator.cpp
63–71	Why is there a region [A, B]: 300, while B: (0, 100) only has a sample count of 100?
263	Conext --> Context?

wenlei mentioned this in D90125: [CSSPGO] Infrastructure for context-sensitive Sample PGO and Inlining.Nov 20 2020, 9:53 AM

add more comments for unwinder and BoundaryPoint
remove Skylake only LBR duplication filter

Harbormaster completed remote builds in B79630: Diff 306726.Nov 20 2020, 10:03 AM

wlei marked 43 inline comments as done.Nov 20 2020, 10:07 AM

wlei added inline comments.

llvm/tools/llvm-profgen/PerfReader.h
214–215	Thanks for your suggestion, more comments are added.
llvm/tools/llvm-profgen/ProfileGenerator.cpp
63–71	Sorry for the confusion. See the graph below, here B:(0, 100) is the boundary point, 0 means no samples begin at B, 100 means one sample(sample1) ends at B whose count is 100. I changed the explanation in the comment, see whether it's clear or not.
	|<--100-->|           Sample1
	|<------200------>|   Sample2
	A         B       C
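
To make the boundary-point explanation concrete, the following sketch turns overlapping ranges into disjoint ranges with summed counts. It is an illustration under assumptions, not the code under review: `BoundaryPoint` here is a simplified stand-in, `findDisjointRanges` is a hypothetical name, and ranges are treated as inclusive [Start, End] pairs.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <utility>

using Range = std::pair<uint64_t, uint64_t>; // inclusive [Start, End]
using RangeCounter = std::map<Range, uint64_t>;

// Total count of the samples that begin or end at one address.
struct BoundaryPoint {
  uint64_t BeginCount = 0;
  uint64_t EndCount = 0;
};

RangeCounter findDisjointRanges(const RangeCounter &Ranges) {
  // Collect begin/end counts per address, sorted by address.
  std::map<uint64_t, BoundaryPoint> Boundaries;
  for (const auto &KV : Ranges) {
    const Range &R = KV.first;
    assert(R.first <= R.second && "range start must not exceed its end");
    Boundaries[R.first].BeginCount += KV.second;
    Boundaries[R.second].EndCount += KV.second;
  }

  RangeCounter Disjoint;
  uint64_t Begin = 0; // start of the piece currently being built
  uint64_t Live = 0;  // summed count of all ranges covering that piece
  for (const auto &KV : Boundaries) {
    uint64_t Addr = KV.first;
    const BoundaryPoint &P = KV.second;
    if (P.BeginCount) {
      // Close the piece that ends just before the newly started ranges.
      if (Live && Begin < Addr)
        Disjoint[Range(Begin, Addr - 1)] += Live;
      Live += P.BeginCount;
      Begin = Addr;
    }
    if (P.EndCount) {
      // Close the piece that ends exactly at this address.
      Disjoint[Range(Begin, Addr)] += Live;
      Live -= P.EndCount;
      Begin = Addr + 1;
    }
  }
  return Disjoint;
}
```

With Sample1 covering [A, B] at count 100 and Sample2 covering [A, C] at count 200, the boundary points are A:(300, 0), B:(0, 100), and C:(0, 200), and the result is [A, B]: 300 followed by [B+1, C]: 200.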

hoy added inline comments.Nov 20 2020, 10:50 AM

llvm/tools/llvm-profgen/PerfReader.cpp
26	Nit: please add a TODO here to check if `Source` is in prolog/epilog using precise prolog/epilog table.
llvm/tools/llvm-profgen/ProfileGenerator.cpp
48	I'm wondering if a separate profile file should be output for each binary. Since the samples are already separated for binaries via `BinarySampleCounters`, `ProfileMap` can be made like that too.
llvm/tools/llvm-profgen/ProfiledBinary.cpp
138	Nit: remove the check and add it back with the compression work.

wenlei added inline comments.Nov 20 2020, 10:55 AM

llvm/test/tools/llvm-profgen/Inputs/noinline-cs-noprobe.perfscript
3	I think we also need to support cases where PERF_RECORD_MMAP2 event isn't available, in which case we just use preferred load address from ELF header. Can you add a test case that doesn't have PERF_RECORD_MMAP2? Looks like currently we would just proceed with parsing without a base address set?
llvm/tools/llvm-profgen/PerfReader.cpp
503–506	What would be the workflow for (non-CS) AutoFDO with this new implementation? It looks like `parseTrace` is responsible for aggregation only, so even for AutoFDO there will be a post-process after that to get range:count, right? So it looks to me that a unified workflow could be something like this:
	for (auto Filename : PerfTraceFilenames)
	  parseAndAggregateTrace(Filename);
	generateRawProfile();
Inside `generateRawProfile`, we would do simple range overlap computation for AutoFDO, or unwind for CSSPGO. Also see comments on `AggregationCounter`: in addition to unifying the workflow, it would be good to unify the data structure as well if possible. What do you think?
llvm/tools/llvm-profgen/PerfReader.h
211	The idea of aggregation applies to (non-CS) AutoFDO too. It'd be good to put infrastructure in place that can cover both AutoFDO and CSSPGO in a generic way. Perhaps we can treat non-CS AutoFDO profile (or regular LBR perf profile) just like a hybrid profile except stack part is always empty? Is that what you have in mind?
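
The idea in the comment above, treating a regular LBR sample as a hybrid sample whose stack part is empty, can be sketched as follows. This `HybridSample` is a simplified stand-in for the class in the patch, and `aggregate` is a hypothetical helper; the point is only that one container can count both kinds of sample.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <tuple>
#include <utility>
#include <vector>

// Simplified stand-in: LBR entries as (source, target) address pairs plus
// the sampled call stack. The call stack stays empty for LBR-only input.
struct HybridSample {
  std::vector<std::pair<uint64_t, uint64_t>> LBRStack;
  std::vector<uint64_t> CallStack;

  bool operator<(const HybridSample &O) const {
    return std::tie(LBRStack, CallStack) < std::tie(O.LBRStack, O.CallStack);
  }
};

// Identical samples collapse into one entry with a repeat count, so later
// stages process each distinct sample only once.
using AggregatedSamples = std::map<HybridSample, uint64_t>;

void aggregate(AggregatedSamples &Samples, const HybridSample &S) {
  assert(!S.LBRStack.empty() && "a sample needs at least one LBR entry");
  ++Samples[S];
}
```

An ordered map keeps the sketch short; a hash-based container would be an equally valid choice for this kind of deduplication.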

wlei added inline comments.Nov 20 2020, 11:47 AM

llvm/tools/llvm-profgen/ProfileGenerator.cpp
48	Yeah, it's doable, but that needs more command line design. Currently we only support one output file, so we would have to support multiple output files, which also need an exact one-to-one mapping to the binaries. So we could use `OutputFilenames` to receive multiple output files and match them in order on the command line? Or I'm also thinking we just keep this as is, and if the user really needs to separate the output per binary, they could call the tool multiple times with a different input binary. Any suggestions on the command?

wlei added inline comments.Nov 20 2020, 1:14 PM

llvm/test/tools/llvm-profgen/Inputs/noinline-cs-noprobe.perfscript
3	Yeah, currently PERF_RECORD_MMAP2 is required. The problem with using the preferred load address without mmap events is that one perf address might belong to multiple binaries, which would mess up the whole process. We would also need one more scan of the perf trace to confirm there is no mmap2 event before switching to the preferred address. Alternatively, we could have a switch like "--no-mmp2-events" to explicitly tell the tool to use the preferred address, and only support one binary under this switch. Or we need some info in the perf trace that tells which binary an address belongs to (I remember we discussed this internally). Any suggestions on this?
llvm/tools/llvm-profgen/PerfReader.cpp
503–506	Good suggestion! As you mention, we can incorporate everything into the unwinder by treating a non-CS profile as a hybrid sample with an empty call stack. So how about we do that when implementing the non-CS part? For now I will change the code to something like below:
	void generateRawProfile(..) {
	  if (getPerfScriptType() == PERF_LBR) {
	    // range overlap computation for regular AutoFDO
	    ...
	  } else if (getPerfScriptType() == PERF_LBR_STACK) {
	    // Unwind samples if it's a hybrid sample
	    unwindSamples();
	  }
	}
llvm/tools/llvm-profgen/PerfReader.h
211	Yeah, it should not be specific to the unwinder. I will move it to PerfReader to support both AutoFDO and CSSPGO.

hoy added inline comments.Nov 20 2020, 1:53 PM

llvm/test/tools/llvm-profgen/Inputs/noinline-cs-noprobe.perfscript
3	Maybe the binary lookup table can be pre-filled with the preferred load address when the binary is loaded/constructed. Without mmap2 events in the trace file, subsequent processing will just use the preferred addresses.
llvm/tools/llvm-profgen/ProfileGenerator.cpp
48	I see. Let's keep a single output for now.
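
The suggestion above about pre-filling the binary lookup table with the preferred load address could look roughly like this. `BinaryTable` and its methods are hypothetical names invented for the sketch; the real tool keeps this state in its own reader and binary classes.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

// Hypothetical lookup table from binary name to the base address used for
// converting sampled addresses into offsets.
class BinaryTable {
  std::map<std::string, uint64_t> BaseAddresses;

public:
  // Called when a binary is loaded: start from the preferred load address
  // read from its ELF header.
  void addBinary(const std::string &Name, uint64_t PreferredBase) {
    BaseAddresses[Name] = PreferredBase;
  }

  // Called for each mmap2 event in the trace: override the default with the
  // real load address. Returns false for binaries that are not tracked.
  bool onMMapEvent(const std::string &Name, uint64_t LoadAddress) {
    auto It = BaseAddresses.find(Name);
    if (It == BaseAddresses.end())
      return false;
    It->second = LoadAddress;
    return true;
  }

  uint64_t getBaseAddress(const std::string &Name) const {
    auto It = BaseAddresses.find(Name);
    assert(It != BaseAddresses.end() && "binary was never added");
    return It->second;
  }
};
```

If the trace has no mmap2 event, `getBaseAddress` simply keeps returning the preferred address, so no extra scan of the trace is needed to decide which mode to use.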

wenlei added inline comments.Nov 20 2020, 2:24 PM

llvm/tools/llvm-profgen/PerfReader.cpp
503–506	Yes, that looks good for now.

LGTM.

llvm/tools/llvm-profgen/PerfReader.h
214–215	That is helpful. Thanks.
llvm/tools/llvm-profgen/ProfileGenerator.cpp
63–71	It is helpful too. Thanks.

This revision is now accepted and ready to land.Nov 25 2020, 4:32 PM

Herald added a subscriber: lxfind.Nov 25 2020, 4:32 PM

hoy added inline comments.Nov 30 2020, 9:34 AM

llvm/test/tools/llvm-profgen/inline-cs-noprobe.test
28	Can you please add a comment on what compiler command line switches are used to build the source code?
llvm/tools/llvm-profgen/PerfReader.cpp
49	Nit: just use `PrevIP` here instead of using `Start`?
228	Nit: curly braces not needed for single-statement block.
390	Use `exitWithError`?
llvm/tools/llvm-profgen/PerfReader.h
83	Nit: consider using `std::vector` to reduce the number of memory allocations and for better locality.
155	Nit: `const` qualifier for these getters?
319	Nit: `const` qualifier for getters?

wlei added a child revision: D92334: [CSSPGO][llvm-profgen] Pseudo probe decoding and disassembling.Nov 30 2020, 12:24 PM

Address reviewers' feedback: added more comments and some refactoring work

Harbormaster completed remote builds in B80680: Diff 308693.Dec 1 2020, 9:51 AM

wlei marked 11 inline comments as done.Dec 1 2020, 10:17 AM

wlei added inline comments.

llvm/test/tools/llvm-profgen/inline-cs-noprobe.test
28	Good suggestion, comment added
llvm/tools/llvm-profgen/PerfReader.h
83	A list is used here because CallStack has both `push_back` and `push_front` operations; in the future it will switch to a trie.
155	fixed, good suggestion, thanks!

hoy added inline comments.Dec 1 2020, 1:02 PM

llvm/tools/llvm-profgen/PerfReader.h
155	Actually I meant something like:
	ProfiledBinary *getBinary() const { return Binary; }
	bool hasNextLBR() const { return LBRIndex < LBRStack.size(); }
	...
Sorry for the confusion.

add const qualifier for some functions

Harbormaster completed remote builds in B80715: Diff 308758.Dec 1 2020, 1:50 PM

wlei added inline comments.Dec 1 2020, 1:51 PM

llvm/tools/llvm-profgen/PerfReader.h
155	fixed, thanks for clarification!

hoy accepted this revision.Dec 1 2020, 2:05 PM

wenlei added inline comments.Dec 2 2020, 9:56 AM

llvm/test/tools/llvm-profgen/Inputs/noinline-cs-noprobe.perfscript
3	Yeah, what @hoy suggested is what I was thinking about: default to the preferred load address if mmap is absent. We need that, but I think it's fine to deal with it in a separate patch.
llvm/tools/llvm-profgen/PerfReader.h
156	const qualifier here as well?
224	For linear unwinding, some brief explanation for handling of inlining would be helpful too.
llvm/tools/llvm-profgen/ProfileGenerator.cpp
48	What about limiting to a single binary input for now? Error out with a message saying unsupported if multiple binaries are provided. Generating profiles for multiple binaries in a single output file will make the profile summary info inaccurate (e.g. percentile based hot thresholds).

Address wenlei's feedback

Harbormaster completed remote builds in B81021: Diff 309371.Dec 3 2020, 2:38 PM

This looks great. Thanks for working on this and making all the changes!

wlei retitled this revision from [CSSPGO][llvm-profgen]Context-sensitive profile data generation to [CSSPGO][llvm-profgen] Context-sensitive profile data generation.Dec 7 2020, 1:06 PM

wlei edited the summary of this revision.

rebase and update the diff summary

This revision was landed with ongoing or failed builds.Dec 7 2020, 1:54 PM

Closed by commit rG1f05b1a9f527: [CSSPGO][llvm-profgen] Context-sensitive profile data generation (authored by wlei).

This revision was automatically updated to reflect the committed changes.

wlei added a commit: rG1f05b1a9f527: [CSSPGO][llvm-profgen] Context-sensitive profile data generation.

Harbormaster completed remote builds in B81345: Diff 310002.Dec 7 2020, 2:04 PM

fails here http://lab.llvm.org:8011/#/builders/99/builds/1031

FAIL: LLVM :: tools/llvm-profgen/noinline-cs-noprobe.test (68769 of 72066)
******************** TEST 'LLVM :: tools/llvm-profgen/noinline-cs-noprobe.test' FAILED ********************
Script:
--
: 'RUN: at line 1';   llvm-profgen --perfscript=/b/sanitizer-x86_64-linux-bootstrap/build/llvm-project/llvm/test/tools/llvm-profgen/Inputs/noinline-cs-noprobe.perfscript --binary=/b/sanitizer-x86_64-linux-bootstrap/build/llvm-project/llvm/test/tools/llvm-profgen/Inputs/noinline-cs-noprobe.perfbin --output=/b/sanitizer-x86_64-linux-bootstrap/build/llvm_build_asan/test/tools/llvm-profgen/Output/noinline-cs-noprobe.test.tmp --show-unwinder-output | /b/sanitizer-x86_64-linux-bootstrap/build/llvm_build_asan/bin/FileCheck /b/sanitizer-x86_64-linux-bootstrap/build/llvm-project/llvm/test/tools/llvm-profgen/noinline-cs-noprobe.test --check-prefix=CHECK-UNWINDER
: 'RUN: at line 2';   /b/sanitizer-x86_64-linux-bootstrap/build/llvm_build_asan/bin/FileCheck /b/sanitizer-x86_64-linux-bootstrap/build/llvm-project/llvm/test/tools/llvm-profgen/noinline-cs-noprobe.test --input-file /b/sanitizer-x86_64-linux-bootstrap/build/llvm_build_asan/test/tools/llvm-profgen/Output/noinline-cs-noprobe.test.tmp
--
Exit Code: 1
Command Output (stderr):
--
/b/sanitizer-x86_64-linux-bootstrap/build/llvm-project/llvm/test/tools/llvm-profgen/noinline-cs-noprobe.test:20:19: error: CHECK-UNWINDER: expected string not found in input
; CHECK-UNWINDER: (5b0, 5c8): 1
                  ^
<stdin>:14:2: note: scanning from here
 (5c8, 5dc): 2
 ^
Input file: <stdin>
Check file: /b/sanitizer-x86_64-linux-bootstrap/build/llvm-project/llvm/test/tools/llvm-profgen/noinline-cs-noprobe.test
-dump-input=help explains the following input dump.
Input was:
<<<<<<
          .
          .
          .
          9:  (634, 637): 3
         10:  (645, 645): 3
         11: 
         12: Binary(noinline-cs-noprobe.perfbin)'s Branch Counter:
         13: main:1 @ foo:3 @ bar
         14:  (5c8, 5dc): 2
check:20      X~~~~~~~~~~~~ error: no match found
         15:  (5d7, 5e5): 2
check:20     ~~~~~~~~~~~~~~
         16:  (5e9, 634): 3
check:20     ~~~~~~~~~~~~~~
         17: main:1 @ foo
check:20     ~~~~~~~~~~~~
         18:  (62f, 5b0): 3
check:20     ~~~~~~~~~~~~~~
         19:  (637, 645): 3
check:20     ~~~~~~~~~~~~~~
         20:  (645, 5ff): 3
check:20     ~~~~~~~~~~~~~~
>>>>>>
--
********************
Testing:  0.. 10.. 20.. 30.. 40.. 50.. 60.. 70.. 80.. 90
FAIL: LLVM :: tools/llvm-profgen/inline-cs-noprobe.test (68770 of 72066)
******************** TEST 'LLVM :: tools/llvm-profgen/inline-cs-noprobe.test' FAILED ********************
Script:
--
: 'RUN: at line 1';   llvm-profgen --perfscript=/b/sanitizer-x86_64-linux-bootstrap/build/llvm-project/llvm/test/tools/llvm-profgen/Inputs/inline-cs-noprobe.perfscript --binary=/b/sanitizer-x86_64-linux-bootstrap/build/llvm-project/llvm/test/tools/llvm-profgen/Inputs/inline-cs-noprobe.perfbin --output=/b/sanitizer-x86_64-linux-bootstrap/build/llvm_build_asan/test/tools/llvm-profgen/Output/inline-cs-noprobe.test.tmp --show-unwinder-output | /b/sanitizer-x86_64-linux-bootstrap/build/llvm_build_asan/bin/FileCheck /b/sanitizer-x86_64-linux-bootstrap/build/llvm-project/llvm/test/tools/llvm-profgen/inline-cs-noprobe.test --check-prefix=CHECK-UNWINDER
: 'RUN: at line 2';   /b/sanitizer-x86_64-linux-bootstrap/build/llvm_build_asan/bin/FileCheck /b/sanitizer-x86_64-linux-bootstrap/build/llvm-project/llvm/test/tools/llvm-profgen/inline-cs-noprobe.test --input-file /b/sanitizer-x86_64-linux-bootstrap/build/llvm_build_asan/test/tools/llvm-profgen/Output/inline-cs-noprobe.test.tmp
--
Exit Code: 1
Command Output (stderr):
--
/b/sanitizer-x86_64-linux-bootstrap/build/llvm-project/llvm/test/tools/llvm-profgen/inline-cs-noprobe.test:16:19: error: CHECK-UNWINDER: expected string not found in input
; CHECK-UNWINDER: (670, 6ad): 1
                  ^
<stdin>:12:2: note: scanning from here
 (69b, 670): 1
 ^
Input file: <stdin>
Check file: /b/sanitizer-x86_64-linux-bootstrap/build/llvm-project/llvm/test/tools/llvm-profgen/inline-cs-noprobe.test
-dump-input=help explains the following input dump.
Input was:
<<<<<<
          .
          .
          .
          7: main:1 @ foo:3.2 @ bar
          8:  (6af, 6bb): 14
          9: 
         10: Binary(inline-cs-noprobe.perfbin)'s Branch Counter:
         11: main:1 @ foo
         12:  (69b, 670): 1
check:16      X~~~~~~~~~~~~ error: no match found
         13:  (6c8, 67e): 15
check:16     ~~~~~~~~~~~~~~~
>>>>>>
--

Hi @vitalybuka, sorry for the test failure. The fix-up patch (https://reviews.llvm.org/D92816) has already landed; please update the repo. Thanks!

Revision Contents

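Path	Size
llvm/docs/CommandGuide/llvm-profgen.rst	6 lines
llvm/include/llvm/ProfileData/SampleProf.h	28 lines
llvm/lib/ProfileData/SampleProfWriter.cpp	5 lines
llvm/test/tools/llvm-profgen/Inputs/inline-cs-noprobe.perfbin	(no size listed)
llvm/test/tools/llvm-profgen/Inputs/inline-cs-noprobe.perfscript	7 lines
llvm/test/tools/llvm-profgen/Inputs/noinline-cs-noprobe.perfbin	(no size listed)
llvm/test/tools/llvm-profgen/Inputs/noinline-cs-noprobe.perfscript	24 lines
llvm/test/tools/llvm-profgen/inline-cs-noprobe.test	47 lines
llvm/test/tools/llvm-profgen/noinline-cs-noprobe.test	60 lines
llvm/tools/llvm-profgen/CMakeLists.txt	2 lines
llvm/tools/llvm-profgen/PerfReader.h	270 lines
llvm/tools/llvm-profgen/PerfReader.cpp	392 lines
llvm/tools/llvm-profgen/ProfileGenerator.h	96 lines
llvm/tools/llvm-profgen/ProfileGenerator.cpp	329 lines
llvm/tools/llvm-profgen/ProfiledBinary.h	131 lines
llvm/tools/llvm-profgen/ProfiledBinary.cpp	99 lines
llvm/tools/llvm-profgen/llvm-profgen.cpp	12 lines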

Diff 310002

llvm/docs/CommandGuide/llvm-profgen.rst

@@ 30 lines hidden @@
 .. option:: --output=<string>

   Path of the output profile file.

 OPTIONS
 -------
 :program:`llvm-profgen` supports the following options:

+.. option:: --format=[text|binary|extbinary|compbinary|gcc]
+
+  Specify the format of the generated profile. Supported <format> are `text`,
+  `binary`, `extbinary`, `compbinary`, `gcc`, see `llvm-profdata` for more
+  descriptions of the format.
+
 .. option:: --show-mmap-events

   Print mmap events.

 .. option:: --show-disassembly

   Print disassembled code.

 .. option:: --x86-asm-syntax=[att|intel]

   Specify whether to print assembly code in AT&T syntax (the default) or Intel
   syntax.

llvm/include/llvm/ProfileData/SampleProf.h

Show First 20 Lines • Show All 240 Lines • ▼ Show 20 Lines	bool operator<(const LineLocation &O) const {
return LineOffset < O.LineOffset \|\|		return LineOffset < O.LineOffset \|\|
(LineOffset == O.LineOffset && Discriminator < O.Discriminator);		(LineOffset == O.LineOffset && Discriminator < O.Discriminator);
}		}

bool operator==(const LineLocation &O) const {		bool operator==(const LineLocation &O) const {
return LineOffset == O.LineOffset && Discriminator == O.Discriminator;		return LineOffset == O.LineOffset && Discriminator == O.Discriminator;
}		}

		bool operator!=(const LineLocation &O) const {
		return LineOffset != O.LineOffset \|\| Discriminator != O.Discriminator;
		}

uint32_t LineOffset;		uint32_t LineOffset;
uint32_t Discriminator;		uint32_t Discriminator;
};		};

raw_ostream &operator<<(raw_ostream &OS, const LineLocation &Loc);		raw_ostream &operator<<(raw_ostream &OS, const LineLocation &Loc);

/// Representation of a single sample record.		/// Representation of a single sample record.
///		///
▲ Show 20 Lines • Show All 81 Lines • ▼ Show 20 Lines

private:		private:
uint64_t NumSamples = 0;		uint64_t NumSamples = 0;
CallTargetMap CallTargets;		CallTargetMap CallTargets;
};		};

raw_ostream &operator<<(raw_ostream &OS, const SampleRecord &Sample);		raw_ostream &operator<<(raw_ostream &OS, const SampleRecord &Sample);

// State of context associated with FunctionSamples		// State of context associated with FunctionSamples
		wenleiUnsubmitted Done Reply Inline Actions Let's separate CSSPGO changes in SampleProf out from the llvm-profgen changes. We'll send CSSPGO compiler/infrastructure patches separately, and then this llvm-profgen patch can depend on that. wenlei: Let's separate CSSPGO changes in SampleProf out from the llvm-profgen changes. We'll send…
enum ContextStateMask {		enum ContextStateMask {
UnknownContext = 0x0, // Profile without context		UnknownContext = 0x0, // Profile without context
RawContext = 0x1, // Full context profile from input profile		RawContext = 0x1, // Full context profile from input profile
SyntheticContext = 0x2, // Synthetic context created for context promotion		SyntheticContext = 0x2, // Synthetic context created for context promotion
InlinedContext = 0x4, // Profile for context that is inlined into caller		InlinedContext = 0x4, // Profile for context that is inlined into caller
MergedContext = 0x8 // Profile for context merged into base profile		MergedContext = 0x8 // Profile for context merged into base profile
};		};

▲ Show 20 Lines • Show All 159 Lines • ▼ Show 20 Lines	sampleprof_error addCalledTargetSamples(uint32_t LineOffset,
StringRef FName, uint64_t Num,		StringRef FName, uint64_t Num,
uint64_t Weight = 1) {		uint64_t Weight = 1) {
return BodySamples[LineLocation(LineOffset, Discriminator)].addCalledTarget(		return BodySamples[LineLocation(LineOffset, Discriminator)].addCalledTarget(
FName, Num, Weight);		FName, Num, Weight);
}		}

/// Return the number of samples collected at the given location.		/// Return the number of samples collected at the given location.
/// Each location is specified by \p LineOffset and \p Discriminator.		/// Each location is specified by \p LineOffset and \p Discriminator.
/// If the location is not found in profile, return error.		/// If the location is not found in profile, return error.
		wenleiUnsubmitted Done Reply Inline Actions This can now be merged with `getEntrySamples`, with dispatching based on `ProfileIsCS`. wenlei: This can now be merged with `getEntrySamples`, with dispatching based on `ProfileIsCS`.
ErrorOr<uint64_t> findSamplesAt(uint32_t LineOffset,		ErrorOr<uint64_t> findSamplesAt(uint32_t LineOffset,
uint32_t Discriminator) const {		uint32_t Discriminator) const {
const auto &ret = BodySamples.find(LineLocation(LineOffset, Discriminator));		const auto &ret = BodySamples.find(LineLocation(LineOffset, Discriminator));
if (ret == BodySamples.end()) {		if (ret == BodySamples.end()) {
// For CSSPGO, in order to conserve profile size, we no longer write out		// For CSSPGO, in order to conserve profile size, we no longer write out
// locations profile for those not hit during training, so we need to		// locations profile for those not hit during training, so we need to
// treat them as zero instead of error here.		// treat them as zero instead of error here.
if (ProfileIsCS)		if (ProfileIsCS)
		wenleiUnsubmitted Done Reply Inline Actions We probably shouldn't arbitrarily assume live for a general helper function. The logic to assume live can be moved to caller if needed. wenlei: We probably shouldn't arbitrarily assume live for a general helper function. The logic to…
return 0;		return 0;
return std::error_code();		return std::error_code();
		wenleiUnsubmitted Done Reply Inline Actions nit: there's no "average" now with this version. wenlei: nit: there's no "average" now with this version.
} else {		} else {
return ret->second.getSamples();		return ret->second.getSamples();
}		}
}		}

/// Returns the call target map collected at a given location.		/// Returns the call target map collected at a given location.
/// Each location is specified by \p LineOffset and \p Discriminator.		/// Each location is specified by \p LineOffset and \p Discriminator.
/// If the location is not found in profile, return error.		/// If the location is not found in profile, return error.
Show All 39 Lines	public:
/// instruction of the symbol. But as we directly get this info for raw		/// instruction of the symbol. But as we directly get this info for raw
/// profile without referring to potentially inaccurate debug info, this		/// profile without referring to potentially inaccurate debug info, this
/// gives more accurate profile data and is preferred for standalone symbols.		/// gives more accurate profile data and is preferred for standalone symbols.
uint64_t getHeadSamples() const { return TotalHeadSamples; }		uint64_t getHeadSamples() const { return TotalHeadSamples; }

/// Return the sample count of the first instruction of the function.		/// Return the sample count of the first instruction of the function.
/// The function can be either a standalone symbol or an inlined function.		/// The function can be either a standalone symbol or an inlined function.
uint64_t getEntrySamples() const {		uint64_t getEntrySamples() const {
		if (FunctionSamples::ProfileIsCS && getHeadSamples()) {
		// For CS profile, if we already have more accurate head samples
		// counted by branch sample from caller, use them as entry samples.
		return getHeadSamples();
		}
uint64_t Count = 0;		uint64_t Count = 0;
// Use either BodySamples or CallsiteSamples which ever has the smaller		// Use either BodySamples or CallsiteSamples which ever has the smaller
// lineno.		// lineno.
if (!BodySamples.empty() &&		if (!BodySamples.empty() &&
(CallsiteSamples.empty() \|\|		(CallsiteSamples.empty() \|\|
BodySamples.begin()->first < CallsiteSamples.begin()->first))		BodySamples.begin()->first < CallsiteSamples.begin()->first))
Count = BodySamples.begin()->second.getSamples();		Count = BodySamples.begin()->second.getSamples();
else if (!CallsiteSamples.empty()) {		else if (!CallsiteSamples.empty()) {
▲ Show 20 Lines • Show All 79 Lines • ▼ Show 20 Lines	public:
}		}

/// Set the name of the function.		/// Set the name of the function.
void setName(StringRef FunctionName) { Name = FunctionName; }		void setName(StringRef FunctionName) { Name = FunctionName; }

/// Return the function name.		/// Return the function name.
StringRef getName() const { return Name; }		StringRef getName() const { return Name; }

		/// Return function name with context.
		StringRef getNameWithContext() const {
		return FunctionSamples::ProfileIsCS ? Context.getNameWithContext() : Name;
		}

/// Return the original function name.		/// Return the original function name.
StringRef getFuncName() const { return getFuncName(Name); }		StringRef getFuncName() const { return getFuncName(Name); }

/// Return the canonical name for a function, taking into account		/// Return the canonical name for a function, taking into account
/// suffix elision policy attributes.		/// suffix elision policy attributes.
static StringRef getCanonicalFnName(const Function &F) {		static StringRef getCanonicalFnName(const Function &F) {
static const char *knownSuffixes[] = { ".llvm.", ".part." };
auto AttrName = "sample-profile-suffix-elision-policy";		auto AttrName = "sample-profile-suffix-elision-policy";
auto Attr = F.getFnAttribute(AttrName).getValueAsString();		auto Attr = F.getFnAttribute(AttrName).getValueAsString();
		return getCanonicalFnName(F.getName(), Attr);
		}

		static StringRef getCanonicalFnName(StringRef FnName, StringRef Attr = "") {
		static const char *knownSuffixes[] = { ".llvm.", ".part." };
		Lint (pre-merge checks):
		clang-tidy: warning: invalid case style for variable 'knownSuffixes' [readability-identifier-naming]
		clang-format: please reformat the code
		- static const char *knownSuffixes[] = { ".llvm.", ".part." };
		+ static const char *knownSuffixes[] = {".llvm.", ".part."};
if (Attr == "" || Attr == "all") {		if (Attr == "" || Attr == "all") {
return F.getName().split('.').first;		return FnName.split('.').first;
} else if (Attr == "selected") {		} else if (Attr == "selected") {
StringRef Cand(F.getName());		StringRef Cand(FnName);
for (const auto &Suf : knownSuffixes) {		for (const auto &Suf : knownSuffixes) {
StringRef Suffix(Suf);		StringRef Suffix(Suf);
auto It = Cand.rfind(Suffix);		auto It = Cand.rfind(Suffix);
if (It == StringRef::npos)		if (It == StringRef::npos)
return Cand;		return Cand;
auto Dit = Cand.rfind('.');		auto Dit = Cand.rfind('.');
if (Dit == It + Suffix.size() - 1)		if (Dit == It + Suffix.size() - 1)
Cand = Cand.substr(0, It);		Cand = Cand.substr(0, It);
}		}
return Cand;		return Cand;
} else if (Attr == "none") {		} else if (Attr == "none") {
return F.getName();		return FnName;
} else {		} else {
assert(false && "internal error: unknown suffix elision policy");		assert(false && "internal error: unknown suffix elision policy");
}		}
return F.getName();		return FnName;
}		}

/// Translate \p Name into its original name.		/// Translate \p Name into its original name.
/// When profile doesn't use MD5, \p Name needs no translation.		/// When profile doesn't use MD5, \p Name needs no translation.
/// When profile uses MD5, \p Name in current FunctionSamples		/// When profile uses MD5, \p Name in current FunctionSamples
/// is actually GUID of the original function name. getFuncName will		/// is actually GUID of the original function name. getFuncName will
/// translate \p Name in current FunctionSamples into its original name		/// translate \p Name in current FunctionSamples into its original name
/// by looking up in the function map GUIDToFuncNameMap.		/// by looking up in the function map GUIDToFuncNameMap.
▲ Show 20 Lines • Show All 171 Lines • Show Last 20 Lines
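
The suffix elision policies handled by `getCanonicalFnName` in the diff above can be tried out in isolation. The following is a minimal sketch, not the LLVM implementation: `canonicalName` is a hypothetical stand-in that uses `std::string` instead of `StringRef` so it builds without LLVM headers, and it mirrors the control flow shown in the diff (including the early return when a known suffix is absent).

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for FunctionSamples::getCanonicalFnName, written
// against std::string so that it builds without LLVM headers.
std::string canonicalName(const std::string &FnName,
                          const std::string &Policy = "") {
  static const char *const KnownSuffixes[] = {".llvm.", ".part."};
  if (Policy.empty() || Policy == "all")
    // Drop everything from the first '.' onwards.
    return FnName.substr(0, FnName.find('.'));
  if (Policy == "selected") {
    std::string Cand = FnName;
    for (const char *Suf : KnownSuffixes) {
      const std::string Suffix(Suf);
      const auto It = Cand.rfind(Suffix);
      // As in the diff, stop at the first known suffix that is absent.
      if (It == std::string::npos)
        return Cand;
      // Strip only when the suffix holds the last '.' of the name.
      const auto Dit = Cand.rfind('.');
      if (Dit == It + Suffix.size() - 1)
        Cand = Cand.substr(0, It);
    }
    return Cand;
  }
  assert(Policy == "none" && "unknown suffix elision policy");
  return FnName;
}
```

With the "selected" policy, `foo.llvm.123` maps to `foo`, while a name such as `foo.cold` is kept unchanged; with the default policy both map to `foo`.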

llvm/lib/ProfileData/SampleProfWriter.cpp

	Show First 20 Lines • Show All 270 Lines • ▼ Show 20 Lines
	/// Note: it may be tempting to implement this in terms of			/// Note: it may be tempting to implement this in terms of
	/// FunctionSamples::print(). Please don't. The dump functionality is intended			/// FunctionSamples::print(). Please don't. The dump functionality is intended
	/// for debugging and has no specified form.			/// for debugging and has no specified form.
	///			///
	/// The format used here is more structured and deliberate because			/// The format used here is more structured and deliberate because
	/// it needs to be parsed by the SampleProfileReaderText class.			/// it needs to be parsed by the SampleProfileReaderText class.
	std::error_code SampleProfileWriterText::writeSample(const FunctionSamples &S) {			std::error_code SampleProfileWriterText::writeSample(const FunctionSamples &S) {
	auto &OS = *OutputStream;			auto &OS = *OutputStream;
				if (FunctionSamples::ProfileIsCS)
				OS << "[" << S.getNameWithContext() << "]:" << S.getTotalSamples();
				else
	OS << S.getName() << ":" << S.getTotalSamples();			OS << S.getName() << ":" << S.getTotalSamples();
	if (Indent == 0)			if (Indent == 0)
	OS << ":" << S.getHeadSamples();			OS << ":" << S.getHeadSamples();
	OS << "\n";			OS << "\n";

	SampleSorter<LineLocation, SampleRecord> SortedSamples(S.getBodySamples());			SampleSorter<LineLocation, SampleRecord> SortedSamples(S.getBodySamples());
	for (const auto &I : SortedSamples.get()) {			for (const auto &I : SortedSamples.get()) {
	LineLocation Loc = I->first;			LineLocation Loc = I->first;
	const SampleRecord &Sample = I->second;			const SampleRecord &Sample = I->second;
	▲ Show 20 Lines • Show All 372 Lines • Show Last 20 Lines
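
The header line that `writeSample` emits differs between context-sensitive and regular profiles only in the bracketed context name. A minimal sketch of that formatting follows; `formatProfileHeader` is a hypothetical helper that returns a string rather than writing to the writer's output stream, and its `TopLevel` parameter stands in for the `Indent == 0` check in the diff.

```cpp
#include <cstdint>
#include <sstream>
#include <string>

// Hypothetical helper mirroring the header line written by
// SampleProfileWriterText::writeSample in the diff above.
std::string formatProfileHeader(const std::string &Name, uint64_t TotalSamples,
                                uint64_t HeadSamples, bool IsCS,
                                bool TopLevel = true) {
  std::ostringstream OS;
  if (IsCS)
    OS << "[" << Name << "]:" << TotalSamples; // name carries the context
  else
    OS << Name << ":" << TotalSamples;
  // Head samples are only printed for top-level (non-nested) profiles.
  if (TopLevel)
    OS << ":" << HeadSamples;
  return OS.str();
}
```

For example, a context-sensitive profile named `main:1 @ foo:3 @ bar` with 12 total and 3 head samples is formatted as `[main:1 @ foo:3 @ bar]:12:3`, which is the shape the tests in this change check for.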

llvm/test/tools/llvm-profgen/Inputs/inline-cs-noprobe.perfbin

llvm/test/tools/llvm-profgen/Inputs/inline-cs-noprobe.perfscript

This file was added.

				Using perf wrapper that supports hot-text. Try perf.real if you encounter any issues.
				PERF_RECORD_MMAP2 2854748/2854748: [0x400000(0x1000) @ 0 00:1d 123291722 526021]: r-xp /home/inline-cs-noprobe.perfbin


				40067e
				5541f689495641d7
				0x4006c8/0x40067e/P/-/-/0 0x4006c8/0x40067e/P/-/-/0 0x4006c8/0x40067e/P/-/-/0 0x4006c8/0x40067e/P/-/-/0 0x4006c8/0x40067e/P/-/-/0 0x4006c8/0x40067e/P/-/-/0 0x4006c8/0x40067e/P/-/-/0 0x4006c8/0x40067e/P/-/-/0 0x4006c8/0x40067e/P/-/-/0 0x4006c8/0x40067e/P/-/-/0 0x4006c8/0x40067e/P/-/-/0 0x4006c8/0x40067e/P/-/-/0 0x40069b/0x400670/M/-/-/0 0x4006c8/0x40067e/P/-/-/0 0x4006c8/0x40067e/P/-/-/0 0x4006c8/0x40067e/P/-/-/0

llvm/test/tools/llvm-profgen/Inputs/noinline-cs-noprobe.perfbin

llvm/test/tools/llvm-profgen/Inputs/noinline-cs-noprobe.perfscript

This file was added.

				Using perf wrapper that supports hot-text. Try perf.real if you encounter any issues.
				PERF_RECORD_MMAP2 2854748/2854748: [0x400000(0x1000) @ 0 00:1d 123291722 526021]: r-xp /home/noinline-cs-noprobe.perfbin

				wenlei: I think we also need to support cases where the PERF_RECORD_MMAP2 event isn't available, in which case we just use the preferred load address from the ELF header. Can you add a test case that doesn't have PERF_RECORD_MMAP2? It looks like we currently just proceed with parsing without a base address set.
				wlei: Yeah, currently PERF_RECORD_MMAP2 is required. The problem with using the preferred load address when there is no mmap event is that one perf address might belong to multiple binaries, which would mess up the whole process. We would also need one more scan of the perf trace to confirm there is no mmap2 event before switching to the preferred address. Alternatively, we could have a switch like "--no-mmp2-events" to explicitly tell the tool to use the preferred address, supporting only one binary under this switch. Or we need some info in the perf trace to tell which binary an address belongs to (I remember we discussed this internally). Any suggestions on this?
				hoy: Maybe the binary lookup table can be pre-filled with the preferred load address when the binary is loaded/constructed. Without mmap2 events in the trace file, subsequent processing will just use the preferred addresses.
				wenlei: Yeah, what @hoy suggested is what I was thinking about: default to the preferred load address if mmap is absent. We need that, but I think it's fine to deal with it in a separate patch.
				4005dc
				400634
				400684
				7f68c5788793
				0x4005c8/0x4005dc/P/-/-/0 0x40062f/0x4005b0/P/-/-/0 0x400645/0x4005ff/P/-/-/0 0x400637/0x400645/P/-/-/0 0x4005e9/0x400634/P/-/-/0 0x4005d7/0x4005e5/P/-/-/0 0x40062f/0x4005b0/P/-/-/0 0x400645/0x4005ff/P/-/-/0 0x400637/0x400645/P/-/-/0 0x4005e9/0x400634/P/-/-/0 0x4005d7/0x4005e5/P/-/-/0 0x40062f/0x4005b0/P/-/-/0 0x400645/0x4005ff/P/-/-/0 0x400637/0x400645/P/-/-/0 0x4005e9/0x400634/P/-/-/0 0x4005c8/0x4005dc/P/-/-/0

				// Test for leaf frame ending up in prolog
				4005b0
				400684
				7f68c5788793
				0x40062f/0x4005b0/P/-/-/0 0x400645/0x4005ff/P/-/-/0 0x400637/0x400645/P/-/-/0 0x4005e9/0x400634/P/-/-/0 0x4005c8/0x4005dc/P/-/-/0 0x40062f/0x4005b0/P/-/-/0 0x400645/0x4005ff/P/-/-/0 0x400637/0x400645/P/-/-/0 0x4005e9/0x400634/P/-/-/0 0x4005d7/0x4005e5/P/-/-/0 0x40062f/0x4005b0/P/-/-/0 0x400645/0x4005ff/P/-/-/0 0x400637/0x400645/P/-/-/0 0x4005e9/0x400634/P/-/-/0 0x4005d7/0x4005e5/P/-/-/0 0x40062f/0x4005b0/P/-/-/0

				// Call stack:
				// 4005b0 -> start addr of bar
				// 400684 -> address in main
				// LBR Entry: | Source | Target
				// 0x40062f/0x4005b0/P/-/-/0 | callq -132 <bar> | start addr of bar
				// 0x400645/0x4005ff/P/-/-/0 | jmp -75 <foo+0xf> | movl -8(%rbp), %eax
				// 0x400637/0x400645/P/-/-/0 | jmp 9 <foo+0x55> | jmp -75 <foo+0xf>
				// 0x4005e9/0x400634/P/-/-/0 | (bar)retq | next addr of [callq -132 <bar>]
				// 0x4005d7/0x4005e5/P/-/-/0 | jmp 9 <bar+0x35> | movl -4(%rbp), %eax
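
Each LBR entry in the perfscript input above has the shape `source/target/flags...`, with both addresses in hexadecimal. The sketch below shows one way such a token could be split into its two addresses; `BranchAddrs` and `parseLBRToken` are hypothetical names for illustration and are not the parser in `PerfReader.cpp`. The flag fields after the second slash are ignored here.

```cpp
#include <cstdint>
#include <cstdio>
#include <string>

// Hypothetical container for the two addresses of one LBR entry.
struct BranchAddrs {
  uint64_t Source = 0;
  uint64_t Target = 0;
};

// Parse a token such as "0x40062f/0x4005b0/P/-/-/0". Returns false when the
// token does not start with two '/'-separated hexadecimal addresses.
bool parseLBRToken(const std::string &Token, BranchAddrs &Out) {
  unsigned long long Src = 0, Dst = 0;
  // %llx accepts an optional "0x" prefix, like strtoull with base 16.
  if (std::sscanf(Token.c_str(), "%llx/%llx/", &Src, &Dst) != 2)
    return false;
  Out.Source = Src;
  Out.Target = Dst;
  return true;
}
```

Applied to the first entry of the annotated sample, this yields source `0x40062f` (the `callq` to `bar`) and target `0x4005b0` (the start address of `bar`).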

llvm/test/tools/llvm-profgen/inline-cs-noprobe.test

This file was added.

				; RUN: llvm-profgen --perfscript=%S/Inputs/inline-cs-noprobe.perfscript --binary=%S/Inputs/inline-cs-noprobe.perfbin --output=%t --show-unwinder-output | FileCheck %s --check-prefix=CHECK-UNWINDER
				; RUN: FileCheck %s --input-file %t
				wmi: Is it possible to use a small manually crafted perfscript as input? It is easier to know whether the numbers in the output make sense when the perfscript is small. It will also be easier if something in the test needs to be adjusted in the future.
				wlei: Thanks for the feedback. The input is replaced with a small perfscript containing only one or two samples, and a test for the unwinder is also added.

				; CHECK:[main:1 @ foo]:44:0
				; CHECK: 2.2: 14
				; CHECK: 3: 15
				; CHECK: 3.2: 14 bar:14
				; CHECK: 3.4: 1
				; CHECK:[main:1 @ foo:3.2 @ bar]:14:0
				; CHECK: 1: 14

				; CHECK-UNWINDER: Binary(inline-cs-noprobe.perfbin)'s Range Counter:
				; CHECK-UNWINDER: main:1 @ foo:3.2 @ bar
				; CHECK-UNWINDER: (6af, 6bb): 14
				; CHECK-UNWINDER: main:1 @ foo
				; CHECK-UNWINDER: (670, 6ad): 1
				; CHECK-UNWINDER: (67e, 69b): 1
				; CHECK-UNWINDER: (67e, 6ad): 13
				; CHECK-UNWINDER: (6bd, 6c8): 14

				; CHECK-UNWINDER: Binary(inline-cs-noprobe.perfbin)'s Branch Counter:
				; CHECK-UNWINDER: main:1 @ foo
				; CHECK-UNWINDER: (69b, 670): 1
				; CHECK-UNWINDER: (6c8, 67e): 15

				; original code:
				; clang -O3 -g test.c -o a.out
				#include <stdio.h>
				hoy: Can you please add a comment on what compiler command line switches are used to build the source code?
				wlei: Good suggestion, comment added.

				int bar(int x, int y) {
				if (x % 3) {
				return x - y;
				}
				return x + y;
				}

				void foo() {
				int s, i = 0;
				while (i++ < 4000 * 4000)
				if (i % 91) s = bar(i, s); else s += 30;
				printf("sum is %d\n", s);
				}

				int main() {
				foo();
				return 0;
				}

llvm/test/tools/llvm-profgen/noinline-cs-noprobe.test

This file was added.

				; RUN: llvm-profgen --perfscript=%S/Inputs/noinline-cs-noprobe.perfscript --binary=%S/Inputs/noinline-cs-noprobe.perfbin --output=%t --show-unwinder-output | FileCheck %s --check-prefix=CHECK-UNWINDER
				; RUN: FileCheck %s --input-file %t

				; CHECK:[main:1 @ foo:3 @ bar]:12:3
				; CHECK: 0: 3
				; CHECK: 1: 3
				; CHECK: 2: 2
				; CHECK: 4: 1
				wmi: Is it possible for us to tell whether one level in the context is inlined or not? It would make the profile more informative.
				wenlei: Yes, agreed that can be useful, especially for tuning purposes, to see how the CS inline decision differs from the previous build. We wanted to add metadata (similar to `!CFGChecksum` for pseudo-probe profile) to indicate whether a context is inlined or not. Note that in this case, it would only tell whether bar is inlined along `main:1 @ foo:3`, but not whether `foo` is inlined along `main:1`. What do you think? Also, to keep the patch smallish, I think we can add this later separately.
				[main:1 @ foo:3 @ bar]:29103:1745
				...
				...
				!CFGChecksum: ...
				!Flag: Inline
				wmi:
				> Note that in this case, it would only tell whether bar is inlined along main:1 @ foo:3, but not whether foo is inlined along main:1. What do you think?
				What is the main difficulty in keeping the inline information for each context level?
				> Also to keep patch smallish, I think we can add this later separately.
				Sure.
				> [main:1 @ foo:3 @ bar]:29103:1745 !CFGChecksum: ... !Flag: Inline
				Can we use some special sign to mark whether bar is inlined or not, "*" for example? [main:1 @ foo:3 @ bar]:29103:1745
				wenlei:
				> What is the main difficulty to keep the inline information for each Context level?
				Even if we only mark the leaf frame, a middle frame's inline decision can be found in its own leaf context (if it exists). I think it's also doable if we want to embed the inline decision for each frame in a context (either with metadata or header), but since this is mostly for tuning/debugging, we're trying to keep it at a minimum for now, and we can expand later if needed.
				Now as to header vs. metadata as carrier, there are a couple of reasons we don't do it in the header:
				- We thought consistency between inline and non-inline contexts for the main profile and header keeps things clean. In fact, CSSPGO really treats them indifferently.
				- !metadata can be mapped/converted to binary profile with general framework support. Doing it with a special character in the header may require special (not-so-clean) handling for text-binary conversion.
				- There's only so much we can do with a special character in the header, so it's not extensible if we want to encode something else.
				- Initially we had the checksum in the header as well, but later moved it to metadata to keep the header clean, and thought it'd be good to keep all auxiliary data in metadata form.
				wmi:
				> We thought consistency between inline vs non-inline context for main profile and header keep things clean. In fact, CSSPGO really treats them indifferently.
				Yes, I understand CSSPGO treats inlined and non-inlined callsites indifferently, and the special character showing whether a callsite is inlined is just for easy debugging. Since it is only for debugging, you just need to add it when you output the profile as text, and you can strip it once and for all when you read the text. That will be a standalone processing step. CSSPGO already includes a lot more useful information than the current SPGO profile; I just hope the CSSPGO afdo file can show us the inline hierarchy as easily as what we currently have.
				wenlei:
				> I just hope the CSSPGO afdo file can show us the inline hierarchy as easy as what we current have.
				Yeah, I can see how that can make debugging a bit easier. Perhaps a stack of flags in metadata can do it too if needed. On the other hand, the info is all from dwarf, so what we are discussing is just the visualization. Visualizing the inline hierarchy isn't the responsibility of a profile, but afdo happens to be able to visualize the inline hierarchy in a nice way, though it's still more of a side effect of the way the inline profile is represented. I'd argue that cleanness and consistency probably weigh more than trying to reach parity for that nice side effect (and actually even if we use * for inline frames, it's still not as nice as the afdo profile's tree-style inline hierarchy). Perhaps we can leave this one open for now, and see where actual need leads us?
				wmi: Agree. It is not critical. We can leave it open at this moment.
				; CHECK: 5: 3
				; CHECK:[main:1 @ foo]:9:0
				; CHECK: 2: 3
				; CHECK: 3: 3 bar:3

				; CHECK-UNWINDER: Binary(noinline-cs-noprobe.perfbin)'s Range Counter:
				; CHECK-UNWINDER: main:1 @ foo
				; CHECK-UNWINDER: (5ff, 62f): 3
				; CHECK-UNWINDER: (634, 637): 3
				; CHECK-UNWINDER: (645, 645): 3
				; CHECK-UNWINDER: main:1 @ foo:3 @ bar
				; CHECK-UNWINDER: (5b0, 5c8): 1
				; CHECK-UNWINDER: (5b0, 5d7): 2
				; CHECK-UNWINDER: (5dc, 5e9): 1
				; CHECK-UNWINDER: (5e5, 5e9): 2

				; CHECK-UNWINDER: Binary(noinline-cs-noprobe.perfbin)'s Branch Counter:
				; CHECK-UNWINDER: main:1 @ foo
				; CHECK-UNWINDER: (62f, 5b0): 3
				; CHECK-UNWINDER: (637, 645): 3
				; CHECK-UNWINDER: (645, 5ff): 3
				; CHECK-UNWINDER: main:1 @ foo:3 @ bar
				; CHECK-UNWINDER: (5c8, 5dc): 2
				; CHECK-UNWINDER: (5d7, 5e5): 2
				; CHECK-UNWINDER: (5e9, 634): 3





				; original code:
				; clang -O0 -g test.c -o a.out
				#include <stdio.h>

				int bar(int x, int y) {
				if (x % 3) {
				return x - y;
				}
				return x + y;
				}

				void foo() {
				int s, i = 0;
				while (i++ < 4000 * 4000)
				if (i % 91) s = bar(i, s); else s += 30;
				printf("sum is %d\n", s);
				}

				int main() {
				foo();
				return 0;
				}

llvm/tools/llvm-profgen/CMakeLists.txt


	set(LLVM_LINK_COMPONENTS			set(LLVM_LINK_COMPONENTS
	AllTargetsDescs			AllTargetsDescs
	AllTargetsDisassemblers			AllTargetsDisassemblers
	AllTargetsInfos			AllTargetsInfos
	Core			Core
	MC			MC
	MCDisassembler			MCDisassembler
	Object			Object
				ProfileData
	Support			Support
	Symbolize			Symbolize
	)			)

	add_llvm_tool(llvm-profgen			add_llvm_tool(llvm-profgen
	llvm-profgen.cpp			llvm-profgen.cpp
	PerfReader.cpp			PerfReader.cpp
	ProfiledBinary.cpp			ProfiledBinary.cpp
				ProfileGenerator.cpp
	)			)

llvm/tools/llvm-profgen/PerfReader.h

//===-- PerfReader.h - perfscript reader ------------------------ C++ --===//		//===-- PerfReader.h - perfscript reader ------------------------ C++ --===//
		wenlei: Would be good to add more comments for the classes/types defined, especially non-trivial ones.
//		//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.		// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.		// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception		// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

#ifndef LLVM_TOOLS_LLVM_PROFGEN_PERFREADER_H		#ifndef LLVM_TOOLS_LLVM_PROFGEN_PERFREADER_H
Show All 25 Lines	TraceStream(StringRef Filename) : Fin(Filename.str()) {
if (!Fin.good())		if (!Fin.good())
exitWithError("Error read input perf script file", Filename);		exitWithError("Error read input perf script file", Filename);
advance();		advance();
}		}

StringRef getCurrentLine() {		StringRef getCurrentLine() {
assert(!IsAtEoF && "Line iterator reaches the End-of-File!");		assert(!IsAtEoF && "Line iterator reaches the End-of-File!");
return CurrentLine;		return CurrentLine;
}		}
		hoy: An artificial branch stands for a series of consecutive branches starting from the current binary with a transition through external code and eventually landing back in the current binary.

uint64_t getLineNumber() { return LineNumber; }		uint64_t getLineNumber() { return LineNumber; }

bool isAtEoF() { return IsAtEoF; }		bool isAtEoF() { return IsAtEoF; }

// Read the next line		// Read the next line
void advance() {		void advance() {
if (!std::getline(Fin, CurrentLine)) {		if (!std::getline(Fin, CurrentLine)) {
IsAtEoF = true;		IsAtEoF = true;
		wenlei: This is just an encapsulation for synchronized stack and LBR samples, and it's orthogonal to the unwinder. Suggest decoupling it from the unwinder in naming and calling it `HybridSample` instead.
return;		return;
}		}
LineNumber++;		LineNumber++;
		wenlei: nit: I'd replace anti-execution with bottom-up order (or leaf to root).
}		}
		wenlei: Curious why use list for Stack but SmallVector (and Index) for LBRStack? Seems a bit inconsistent for similar use cases.
		wlei: Currently list is used to make it easy to copy data from UnwinderTrace to UnwindState (see line 90, also a list). The data is copied, rather than indexed like LBRStack, because the call stack changes dynamically during unwinding (i.e. it needs push_front). This is kind of a temporary solution; our next step is to use trie nodes to represent the call stack, and at that time I will make it consistent by using SmallVector. So ideally UnwinderTrace only keeps the data and UnwindState only keeps the index/trie node.
		wenlei: Got it, makes sense. Thanks for clarifying.
};		};
		wenlei: nit: anti-execution sounds a bit confusing. I think the canonical term is FIFO order (LBR sampling has two modes, FIFO for trace and FILO for call stack).

		// The type of perfscript
		enum PerfScriptType {
		PERF_INVILID = 0,
		PERF_LBR = 1, // Only LBR sample
		PERF_LBR_STACK = 2, // Hybrid sample including call stack and LBR stack.
		};

		// The parsed LBR sample entry.
		struct LBREntry {
		uint64_t Source = 0;
		uint64_t Target = 0;
		// An artificial branch stands for a series of consecutive branches starting
		// from the current binary with a transition through external code and
		// eventually landing back in the current binary.
		bool IsArtificial = false;
		LBREntry(uint64_t S, uint64_t T, bool I)
		: Source(S), Target(T), IsArtificial(I) {}
		};

		// The parsed hybrid sample including call stack and LBR stack.
		struct HybridSample {
		// Profiled binary that current frame address belongs to
		ProfiledBinary *Binary;
		// Call stack recorded in FILO(leaf to root) order
		std::list<uint64_t> CallStack;
		hoy: Nit: consider using `std::vector` to reduce the number of memory allocations and for better locality.
		wlei: list is used here because CallStack has both `push_back` and `push_front` actions; in the future it will switch to a trie.
		// LBR stack recorded in FIFO order
		SmallVector<LBREntry, 16> LBRStack;

		// Used for sample aggregation
		bool operator==(const HybridSample &Other) const {
		if (Other.Binary != Binary)
		return false;
		wenlei: Can this be a reference as well, just like `LBRStack`, and point to the input Sample? Note the header comment also states "it doesn't hold the data but only keep the pointer/index of the data".
		wlei: Same here: there is pop_front() on CallStack, so I used list. This will be solved by the trie. The comment describes the final ideal state; sorry for the confusion.
		wenlei: I see. So because Trace is the key of TraceAggregationMap and we will need to mutate this list, you have to create a copy of what's in UnwindTrace, correct?
		wlei: Yes, the map key enforces the const type, and because of the mutation at least one copy has to be made into the State even when using a trie node (for the trie initialization).
		const std::list<uint64_t> &OtherCallStack = Other.CallStack;
		const SmallVector<LBREntry, 16> &OtherLBRStack = Other.LBRStack;

		if (CallStack.size() != OtherCallStack.size() ||
		LBRStack.size() != OtherLBRStack.size())
		return false;

		auto Iter = CallStack.begin();
		for (auto Address : OtherCallStack) {
		if (Address != *Iter++)
		return false;
		}

		for (size_t I = 0; I < OtherLBRStack.size(); I++) {
		wmi: What is the rationale behind the condition?
		wenlei: Ideally we want the tip of the LBR target and the tip of the stack leaf to align with the help of PEBS. When we take a stack sample, the leaf IP of the stack could be the last LBR target address + N bytes, and N shouldn't be too large because N is essentially the sampling skid distance. So I had this in my original prototype as a sanity check to filter out broken records. However, the distance chosen was somewhat arbitrary. In reality, I don't think we've seen this firing with PEBS, but it could happen without PEBS, or if cycles is used instead of branch_retired as the triggering event.
		wmi: Thanks for the detailed explanation. Copying it to a comment will be useful.
		if (LBRStack[I].Source != OtherLBRStack[I].Source ||
		LBRStack[I].Target != OtherLBRStack[I].Target)
		return false;
		}
		return true;
		}
		};

		// The state for the unwinder, it doesn't hold the data but only keep the
		// pointer/index of the data, While unwinding, the CallStack is changed
		// dynamicially and will be recorded as the context of the sample
		struct UnwindState {
		// Profiled binary that current frame address belongs to
		const ProfiledBinary *Binary;
		// TODO: switch to use trie for call stack
		std::list<uint64_t> CallStack;
		// Used to fall through the LBR stack
		uint32_t LBRIndex = 0;
		// Reference to HybridSample.LBRStack
		const SmallVector<LBREntry, 16> &LBRStack;
		// Used to iterate the address range
		InstructionPointer InstPtr;
		UnwindState(const HybridSample &Sample)
		wmi: What do the three uint64_t fields in the map represent?
		wlei: Fixed according to wenlei's suggestion below by using `ContextBranchCounter` and `ContextRangeCounter`; also gave more explanation.
		: Binary(Sample.Binary), CallStack(Sample.CallStack),
		wenlei:
		1. To be accurate, this is actually a counter rather than a map.
		2. It can be confusing if this is used for both branch and range, even though branch and range share the bit-wise representation. I think we can typedef `BranchSample` and `RangeSample` both to `std::pair<uint64_t, uint64_t>`, and then typedef `ContextBranchCounter` and `ContextRangeCounter` to `std::unordered_map<std::string, std::map<BranchSample, uint64_t>>`.
		LBRStack(Sample.LBRStack),
		InstPtr(Sample.Binary, Sample.CallStack.front()) {}

		bool validateInitialState() {
		uint64_t LBRLeaf = LBRStack[LBRIndex].Target;
		uint64_t StackLeaf = CallStack.front();
		// When we take a stack sample, ideally the sampling distance between the
		// leaf IP of stack and the last LBR target shouldn't be very large.
		// Use a heuristic size (0x100) to filter out broken records.
		if (StackLeaf < LBRLeaf || StackLeaf >= LBRLeaf + 0x100) {
		WithColor::warning() << "Bogus trace: stack tip = "
		<< format("%#010x", StackLeaf)
		<< ", LBR tip = " << format("%#010x\n", LBRLeaf);
		return false;
		}
		return true;
		}

		void checkStateConsistency() {
		assert(InstPtr.Address == CallStack.front() &&
		"IP should align with context leaf");
		}

		std::string getExpandedContextStr() const {
		return Binary->getExpandedContextStr(CallStack);
		}
		const ProfiledBinary *getBinary() const { return Binary; }
		hoy: Nit: `const` qualifier for these getters?
		wlei: Fixed, good suggestion, thanks!
		hoy: Actually I meant something like:
		ProfiledBinary *getBinary() const { return Binary; }
		bool hasNextLBR() const { return LBRIndex < LBRStack.size(); }
		...
		Sorry for the confusion.
		wlei: Fixed, thanks for the clarification!
		bool hasNextLBR() const { return LBRIndex < LBRStack.size(); }
		wenlei: `const` qualifier here as well?
		uint64_t getCurrentLBRSource() const { return LBRStack[LBRIndex].Source; }
		uint64_t getCurrentLBRTarget() const { return LBRStack[LBRIndex].Target; }
		const LBREntry &getCurrentLBR() const { return LBRStack[LBRIndex]; }
		void advanceLBR() { LBRIndex++; }
		};
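
The check in `validateInitialState` above boils down to a half-open window test on the distance between the stack leaf and the last LBR target. As a minimal sketch, it can be expressed as a free function; `isPlausibleSkid` and its `MaxSkid` parameter are hypothetical names, with the default taken from the heuristic value 0x100 used in the diff.

```cpp
#include <cstdint>

// Hypothetical restatement of the sanity check in validateInitialState:
// the stack leaf must lie in [LBRLeaf, LBRLeaf + MaxSkid).
bool isPlausibleSkid(uint64_t StackLeaf, uint64_t LBRLeaf,
                     uint64_t MaxSkid = 0x100) {
  return StackLeaf >= LBRLeaf && StackLeaf < LBRLeaf + MaxSkid;
}
```

For the first sample in `noinline-cs-noprobe.perfscript`, the stack leaf `0x4005dc` equals the target of the first LBR entry, so the sample passes; a stack leaf below the LBR target, or 0x100 bytes or more past it, would be reported as a bogus trace.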

		// The counter of branch samples for one function indexed by the branch,
		// which is represented as the source and target offset pair.
		using BranchSample = std::map<std::pair<uint64_t, uint64_t>, uint64_t>;
		// The counter of range samples for one function indexed by the range,
		// which is represented as the start and end offset pair.
		using RangeSample = std::map<std::pair<uint64_t, uint64_t>, uint64_t>;
		// Range sample counters indexed by the context string
		using ContextRangeCounter = std::unordered_map<std::string, RangeSample>;
		// Branch sample counters indexed by the context string
		using ContextBranchCounter = std::unordered_map<std::string, BranchSample>;

		// For Hybrid sample counters
		struct ContextSampleCounters {
		ContextRangeCounter RangeCounter;
		ContextBranchCounter BranchCounter;

		void recordRangeCount(std::string &ContextId, uint64_t Start, uint64_t End,
		uint64_t Repeat) {
		RangeCounter[ContextId][{Start, End}] += Repeat;
		}
		hoy (Not Done): The comment could be retired since we have a tail call tracker coming that tracks both in-LBR tail calls and out-of-LBR tail calls universally.
		wenlei (Not Done): I think the comment needs to be updated, but explanation here is still needed because IIUC missing frame inference happens more like a post process (hence somewhat orthogonal), and here `isCallState` decides the unwind operation on the stack sample (not changed by frame inference) which will always miss tail call frame (unless dwarf stack walking is used by perf).
		void recordBranchCount(std::string &ContextId, uint64_t Source,
		uint64_t Target, uint64_t Repeat) {
		BranchCounter[ContextId][{Source, Target}] += Repeat;
		}
		};
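
The counters above are nested maps: context string first, then an offset pair, then a count. A minimal standalone sketch of that aggregation, using only the standard library, might look like the following. The helper name `recordRange` and the context string used in the usage note are made up for illustration and are not part of the patch.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <unordered_map>
#include <utility>

// Same shapes as the aliases in the patch.
using RangeSample = std::map<std::pair<uint64_t, uint64_t>, uint64_t>;
using ContextRangeCounter = std::unordered_map<std::string, RangeSample>;

// Hypothetical free-function version of recordRangeCount.
// operator[] default-constructs missing entries, so a new range starts at 0
// and Repeat is simply added on top.
inline void recordRange(ContextRangeCounter &Counter,
                        const std::string &Context, uint64_t Start,
                        uint64_t End, uint64_t Repeat) {
  Counter[Context][{Start, End}] += Repeat;
}
```

Recording the same range twice under one context (for example with repeat counts 3 and 2) leaves a single entry with count 5, which is what lets aggregated samples be replayed once with a multiplier instead of once per occurrence.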

		struct HybridSampleHash {
		uint64_t hashCombine(uint64_t Hash, uint64_t Value) const {
		// Simple DJB2 hash
		return ((Hash << 5) + Hash) + Value;
		}

		uint64_t operator()(const HybridSample &Sample) const {
		uint64_t Hash = 5381;
		Hash = hashCombine(Hash, reinterpret_cast<uint64_t>(Sample.Binary));
		for (const auto &Value : Sample.CallStack) {
		Hash = hashCombine(Hash, Value);
		}
		for (const auto &Entry : Sample.LBRStack) {
		Hash = hashCombine(Hash, Entry.Source);
		Hash = hashCombine(Hash, Entry.Target);
		}
		return Hash;
		wmi (Not Done): Rename it to 'isCtxSensitivePerfScript'?
		wenlei (Not Done): I think context-sensitivity is a concept that only exists at FDO profile level. Thus for perf script, we used the term hybrid which faithfully represents the fact that both LBR and stack are sampled together.
		}
		};
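
The hash functor above folds every call stack frame and every LBR source/target into a DJB2-style running hash (`Hash * 33 + Value`, seeded with 5381). A self-contained sketch of the same folding is shown below. `MiniSample` is a hypothetical stand-in for `HybridSample` that omits the binary pointer.

```cpp
#include <cassert>
#include <cstdint>
#include <list>
#include <utility>
#include <vector>

// Hypothetical, simplified stand-in for HybridSample.
struct MiniSample {
  std::list<uint64_t> CallStack;                       // leaf first
  std::vector<std::pair<uint64_t, uint64_t>> LBRStack; // (Source, Target)
};

// DJB2-style combine step: (Hash << 5) + Hash equals Hash * 33.
inline uint64_t hashCombine(uint64_t Hash, uint64_t Value) {
  return ((Hash << 5) + Hash) + Value;
}

// Folds all frames, then all LBR pairs, into one 64-bit value.
inline uint64_t hashSample(const MiniSample &S) {
  uint64_t Hash = 5381;
  for (uint64_t Frame : S.CallStack)
    Hash = hashCombine(Hash, Frame);
  for (const auto &Entry : S.LBRStack) {
    Hash = hashCombine(Hash, Entry.first);
    Hash = hashCombine(Hash, Entry.second);
  }
  return Hash;
}
```

Because each step multiplies the running value before adding, the result depends on element order, so two samples with the same frames in a different order hash differently. Using this as an `std::unordered_map` key would also need an equality comparison on the sample type, which is not shown here.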

		// After parsing the sample, we record the samples by aggregating them
		// into this structure and the value is the sample counter.
		using AggregationCounter =
		wenlei (Not Done): The idea of aggregation applies to (non-CS) AutoFDO too. It'd be good to put infrastructure in place that can cover both AutoFDO and CSSPGO in a generic way. Perhaps we can treat non-CS AutoFDO profile (or regular LBR perf profile) just like a hybrid profile except stack part is always empty? Is that what you have in mind?
		wlei (Author, Done): Yeah, it should not specific to unwinder, I will move to PerfReader to support both AutoFDO and CSSPGO
		std::unordered_map<HybridSample, uint64_t, HybridSampleHash>;

		/*
		As in hybrid sample we have a group of LBRs and the most recent sampling call
		wmi (Not Done): This virtual unwinder is not doing the classic unwinding thing. It is walking through the LBR stack of a LBR sample, based on the sample's callstack, and infer the callstack for each address range covered by the LBR sample. The comment can be more clear about it.
		wlei (Author, Done): Thanks for your suggestion, more comments are added.
		wmi (Not Done): That is helpful. Thanks.
		stack, we can walk through those LBRs to infer more call stacks which would be
		used as context for profile. VirtualUnwinder is the class to do the call stack
		unwinding based on LBR state. Two types of unwinding are processed here:
		1) LBR unwinding and 2) linear range unwinding.
		Specifically, for each LBR entry(can be classified into call, return, regular
		branch), LBR unwinding will replay the operation by pushing, popping or
		switching leaf frame towards the call stack and since the initial call stack
		is most recently sampled, the replay should be in anti-execution order, i.e. for
		the regular case, pop the call stack when LBR is call, push frame on call stack
		wenlei (Not Done): For linear unwinding, some brief explanation for handling of inlining would be helpful too.
		when LBR is return. After each LBR processed, it also needs to align with the
		next LBR by going through instructions from previous LBR's target to current
		LBR's source, which is the linear unwinding. As instruction from linear range
		can come from different function by inlining, linear unwinding will do the range
		splitting and record counters by the range with same inline context. Over those
		wenlei (Done): Do we actually use address now? Can we remove all address and probe related stuff and add them properly in later patches?
		unwinding process we will record each call stack as context id and LBR/linear
		range as sample counter for further CS profile generation.
		*/
		class VirtualUnwinder {
		public:
		VirtualUnwinder(ContextSampleCounters *Counters) : SampleCounters(Counters) {}

		wenlei (Done): nit: why name line_iterator Index for this one, and different from others?
		bool isCallState(UnwindState &State) const {
		// The tail call frame is always missing here in stack sample, we will
		wenlei (Done): I suggest let's establish a consistent naming convention here wrt what is a trace and what is an event:
		Trace is a series of perf events. Each perf event can be an mmap event or a sample.
		hybrid trace is a series of perf event: mmap events and hybrid samples.
		hybrid sample is lbr sample plus stack sample.
		With that, we can rename the following:
		void parseHybridTrace(line_iterator &Line); -> void parseHybridSample(line_iterator &Line);
		UnwinderTrace -> HybridSample
		TraceAggregation -> SampleAggregation
		wlei (Author, Done): Thanks for suggesting for a consistent naming convention, did the refactor.
		// use a specific tail call tracker to infer it.
		return State.getBinary()->addressIsCall(State.getCurrentLBRSource());
		}

		bool isReturnState(UnwindState &State) const {
		// Simply check addressIsReturn, as ret is always reliable, both for
		// regular call and tail call.
		return State.getBinary()->addressIsReturn(State.getCurrentLBRSource());
		}

		void unwindCall(UnwindState &State);
		void unwindLinear(UnwindState &State, uint64_t Repeat);
		void unwindReturn(UnwindState &State);
		void unwindBranchWithinFrame(UnwindState &State);
		bool unwind(const HybridSample &Sample, uint64_t Repeat);
		void recordRangeCount(uint64_t Start, uint64_t End, UnwindState &State,
		uint64_t Repeat);
		void recordBranchCount(const LBREntry &Branch, UnwindState &State,
		uint64_t Repeat);

		private:
		ContextSampleCounters *SampleCounters;
		};
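
The call, return, and in-frame branch cases described in the VirtualUnwinder comment can be sketched as a standalone replay loop. This is a simplified illustration under stated assumptions, not the patch's implementation: `MiniLBR`, `BranchKind`, and `replayBackwards` are hypothetical names, each entry carries an explicit kind instead of querying the binary, linear range unwinding between entries is omitted, and call instructions are assumed to be 5 bytes long so that a return address can be mapped back to its call site.

```cpp
#include <cassert>
#include <cstdint>
#include <iterator>
#include <list>
#include <vector>

enum class BranchKind { Call, Return, Other };

// Hypothetical simplified LBR entry with an explicit kind tag.
struct MiniLBR {
  uint64_t Source;
  uint64_t Target;
  BranchKind Kind;
};

// Assumption for illustration only: every call instruction is 5 bytes.
inline uint64_t callAddrFromReturnAddr(uint64_t ReturnAddr) {
  return ReturnAddr - 5;
}

// Replays LBR entries newest first on a leaf-first call stack and returns
// the stack as it looked before the oldest entry executed.
inline std::list<uint64_t> replayBackwards(std::list<uint64_t> Stack,
                                           const std::vector<MiniLBR> &LBRs) {
  for (const MiniLBR &LBR : LBRs) {
    if (Stack.empty())
      break;
    switch (LBR.Kind) {
    case BranchKind::Call: {
      // Undo a call: drop the callee frame if the caller frame is present,
      // otherwise rewrite the leaf to the call site (missing 2nd frame).
      auto Second = std::next(Stack.begin());
      if (Second == Stack.end() || *Second != LBR.Source)
        Stack.front() = LBR.Source;
      else
        Stack.pop_front();
      break;
    }
    case BranchKind::Return:
      // Undo a return: the current leaf becomes the call site and the
      // returning callee is pushed back as the new leaf.
      Stack.front() = callAddrFromReturnAddr(LBR.Target);
      Stack.push_front(LBR.Source);
      break;
    case BranchKind::Other:
      // Same frame: only the leaf address moves.
      Stack.front() = LBR.Source;
      break;
    }
  }
  return Stack;
}
```

Starting from the stack `4005dc, 400634, 400684` and replaying a call from `0x400634`, then a return to `0x400625`, then an in-frame branch from `0x4005e0`, the stack goes from three frames to two and back to three, ending with leaf `0x4005e0`.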

// Filename to binary map		// Filename to binary map
using BinaryMap = StringMap<ProfiledBinary>;		using BinaryMap = StringMap<ProfiledBinary>;
// Address to binary map for fast look-up		// Address to binary map for fast look-up
using AddressBinaryMap = std::map<uint64_t, ProfiledBinary *>;		using AddressBinaryMap = std::map<uint64_t, ProfiledBinary *>;
		// Binary to ContextSampleCounters Map to support multiple binary, we may have
		// same binary loaded at different addresses, they should share the same sample
		// counter
		using BinarySampleCounterMap =
		std::unordered_map<ProfiledBinary *, ContextSampleCounters>;

// Load binaries and read perf trace to parse the events and samples		// Load binaries and read perf trace to parse the events and samples
class PerfReader {		class PerfReader {

BinaryMap BinaryTable;		public:
AddressBinaryMap AddrToBinaryMap; // Used by address-based lookup.		PerfReader(cl::list<std::string> &BinaryFilenames);

		// Hybrid sample (call stack + LBRs) profile traces are separated by a double
		// line break. Search for that within the first 4k characters to avoid going
		// through the whole file.
		wenlei (Done): I suggest we either include them properly or remove them from the patch. We still quite a few things not in upstream patch, and it's not possible to have TODOs for all of them.
		static bool isHybridPerfScript(StringRef FileName) {
		auto BufOrError = MemoryBuffer::getFileOrSTDIN(FileName, 4000);
		hoy (Done): Should this be rewritten with a stream-based file reader as done in D89707?
		wlei (Author, Done): I guess you mean to keep consistent to other part of code? Here you see it only read 4000bytes data from the file(`getFileOrSTDIN(FileName, 4000);`), so there shouldn't have memory issue. Currently stream-based liner only support read one line at a time, it need to search line by line, which would be slower than searching in the whole 4k memory. So which one do you prefer?
		hoy (Not Done): I see. The current implementation looks good to me.
		if (!BufOrError)
		exitWithError(BufOrError.getError(), FileName);
		auto Buffer = std::move(BufOrError.get());
		if (Buffer->getBuffer().find("\n\n") == StringRef::npos)
		return false;
		return true;
		}

// The parsed MMap event		// The parsed MMap event
struct MMapEvent {		struct MMapEvent {
uint64_t PID = 0;		uint64_t PID = 0;
uint64_t BaseAddress = 0;		uint64_t BaseAddress = 0;
uint64_t Size = 0;		uint64_t Size = 0;
uint64_t Offset = 0;		uint64_t Offset = 0;
StringRef BinaryPath;		StringRef BinaryPath;
};		};

/// Load symbols and disassemble the code of a given binary.		/// Load symbols and disassemble the code of a given binary.
/// Also register the binary in the binary table.		/// Also register the binary in the binary table.
///		///
ProfiledBinary &loadBinary(const StringRef BinaryPath,		ProfiledBinary &loadBinary(const StringRef BinaryPath,
bool AllowNameConflict = true);		bool AllowNameConflict = true);
void updateBinaryAddress(const MMapEvent &Event);		void updateBinaryAddress(const MMapEvent &Event);
		PerfScriptType getPerfScriptType() const { return PerfType; }
		// Entry of the reader to parse multiple perf traces
		void parsePerfTraces(cl::list<std::string> &PerfTraceFilenames);
		const BinarySampleCounterMap &getBinarySampleCounters() const {
		return BinarySampleCounters;
		}

public:		private:
PerfReader(cl::list<std::string> &BinaryFilenames);

/// Parse a single line of a PERF_RECORD_MMAP2 event looking for a		/// Parse a single line of a PERF_RECORD_MMAP2 event looking for a
/// mapping between the binary name and its memory layout.		/// mapping between the binary name and its memory layout.
///		///
void parseMMap2Event(TraceStream &TraceIt);		void parseMMap2Event(TraceStream &TraceIt);
void parseEvent(TraceStream &TraceIt);		// Parse perf events/samples and do aggregation
		hoy (Done): Nit: `const` qualifier for getters?
// Parse perf events and samples		void parseAndAggregateTrace(StringRef Filename);
void parseTrace(StringRef Filename);		// Parse either an MMAP event or a perf sample
void parsePerfTraces(cl::list<std::string> &PerfTraceFilenames);		void parseEventOrSample(TraceStream &TraceIt);
		// Parse the hybrid sample including the call and LBR line
		void parseHybridSample(TraceStream &TraceIt);
		// Extract call stack from the perf trace lines
		bool extractCallstack(TraceStream &TraceIt, std::list<uint64_t> &CallStack);
		// Extract LBR stack from one perf trace line
		bool extractLBRStack(TraceStream &TraceIt,
		SmallVector<LBREntry, 16> &LBRStack,
		ProfiledBinary *Binary);
		void checkAndSetPerfType(cl::list<std::string> &PerfTraceFilenames);
		// Post process the profile after trace aggregation, we will do simple range
		// overlap computation for AutoFDO, or unwind for CSSPGO(hybrid sample).
		void generateRawProfile();
		// Unwind the hybrid samples after aggregation
		void unwindSamples();
		void printUnwinderOutput();
		// Helper function for looking up binary in AddressBinaryMap
		ProfiledBinary *getBinary(uint64_t Address);

		BinaryMap BinaryTable;
		AddressBinaryMap AddrToBinaryMap; // Used by address-based lookup.

		private:
		BinarySampleCounterMap BinarySampleCounters;
		// Samples with the repeating time generated by the perf reader
		AggregationCounter AggregatedSamples;
		PerfScriptType PerfType;
};		};

} // end namespace sampleprof		} // end namespace sampleprof
} // end namespace llvm		} // end namespace llvm

#endif		#endif

llvm/tools/llvm-profgen/PerfReader.cpp

//===-- PerfReader.cpp - perfscript reader ---------------------- C++ --===//		//===-- PerfReader.cpp - perfscript reader ---------------------- C++ --===//
//		//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.		// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.		// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception		// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
#include "PerfReader.h"		#include "PerfReader.h"

static cl::opt<bool> ShowMmapEvents("show-mmap-events", cl::ReallyHidden,		static cl::opt<bool> ShowMmapEvents("show-mmap-events", cl::ReallyHidden,
cl::init(false), cl::ZeroOrMore,		cl::init(false), cl::ZeroOrMore,
cl::desc("Print binary load events."));		cl::desc("Print binary load events."));

		static cl::opt<bool> ShowUnwinderOutput("show-unwinder-output",
		cl::ReallyHidden, cl::init(false),
		cl::ZeroOrMore,
		cl::desc("Print unwinder output"));

namespace llvm {		namespace llvm {
namespace sampleprof {		namespace sampleprof {

		void VirtualUnwinder::unwindCall(UnwindState &State) {
		// The 2nd frame after leaf could be missing if stack sample is
		// taken when IP is within prolog/epilog, as frame chain isn't
		// setup yet. Fill in the missing frame in that case.
		// TODO: Currently we just assume all the addr that can't match the
		hoy (Not Done): Nit: please add a TODO here to check if `Source` is in prolog/epilog using precise prolog/epilog table.
		// 2nd frame is in prolog/epilog. In the future, we will switch to
		// pro/epi tracker(Dwarf CFI) for the precise check.
		uint64_t Source = State.getCurrentLBRSource();
		auto Iter = State.CallStack.begin();
		if (State.CallStack.size() == 1 || *(++Iter) != Source) {
		State.CallStack.front() = Source;
		} else {
		State.CallStack.pop_front();
		}
		State.InstPtr.update(Source);
		}

		void VirtualUnwinder::unwindLinear(UnwindState &State, uint64_t Repeat) {
		InstructionPointer &IP = State.InstPtr;
		uint64_t Target = State.getCurrentLBRTarget();
		uint64_t End = IP.Address;
		// Unwind linear execution part
		while (IP.Address >= Target) {
		uint64_t PrevIP = IP.Address;
		IP.backward();
		// Break into segments for implicit call/return due to inlining
		bool SameInlinee =
		State.getBinary()->inlineContextEqual(PrevIP, IP.Address);
		hoy (Done): Nit: just use `PrevIP` here instead of using `Start`?
		if (!SameInlinee || PrevIP == Target) {
		recordRangeCount(PrevIP, End, State, Repeat);
		End = IP.Address;
		}
		State.CallStack.front() = IP.Address;
		}
		}

		void VirtualUnwinder::unwindReturn(UnwindState &State) {
		// Add extra frame as we unwind through the return
		const LBREntry &LBR = State.getCurrentLBR();
		uint64_t CallAddr = State.getBinary()->getCallAddrFromFrameAddr(LBR.Target);
		State.CallStack.front() = CallAddr;
		State.CallStack.push_front(LBR.Source);
		wenlei (Done): Let `getCurrentLBR` return reference? or why would you want a temporary to bind const-reference to?
		wlei (Author, Done): The LBR data is only kept in the UnwinderTrace and UnwindState only keep the index and ref. And because the UnwinderTrace is the key of the TraceAggregationMap, its type is converted to const. If that makes code confusing, I can wrap all the LBRSource/LBRTarget to getCurrentLBRSource()/getCurrentLBRTarget().
		wenlei (Not Done): Ok, not a big deal I guess since LBREntry is small anyways. but getCurrentLBR can still return const reference?
		State.InstPtr.update(LBR.Source);
		}

		void VirtualUnwinder::unwindBranchWithinFrame(UnwindState &State) {
		// TODO: Tolerate tail call for now, as we may see tail call from libraries.
		// This is only for intra function branches, excluding tail calls.
		uint64_t Source = State.getCurrentLBRSource();
		State.CallStack.front() = Source;
		State.InstPtr.update(Source);
		}

		void VirtualUnwinder::recordRangeCount(uint64_t Start, uint64_t End,
		UnwindState &State, uint64_t Repeat) {
		std::string &&ContextId = State.getExpandedContextStr();
		uint64_t StartOffset = State.getBinary()->virtualAddrToOffset(Start);
		uint64_t EndOffset = State.getBinary()->virtualAddrToOffset(End);
		SampleCounters->recordRangeCount(ContextId, StartOffset, EndOffset, Repeat);
		}

		void VirtualUnwinder::recordBranchCount(const LBREntry &Branch,
		UnwindState &State, uint64_t Repeat) {
		if (Branch.IsArtificial)
		return;
		std::string &&ContextId = State.getExpandedContextStr();
		uint64_t SourceOffset = State.getBinary()->virtualAddrToOffset(Branch.Source);
		uint64_t TargetOffset = State.getBinary()->virtualAddrToOffset(Branch.Target);
		SampleCounters->recordBranchCount(ContextId, SourceOffset, TargetOffset,
		Repeat);
		}

		bool VirtualUnwinder::unwind(const HybridSample &Sample, uint64_t Repeat) {
		// Capture initial state as starting point for unwinding.
		UnwindState State(Sample);

		wenlei (Done): Is this still needed now that we normalize traces before aggregation (line 259)?
		// Sanity check - making sure leaf of LBR aligns with leaf of stack sample
		// Stack sample sometimes can be unreliable, so filter out bogus ones.
		if (!State.validateInitialState())
		return false;

		// Also do not attempt linear unwind for the leaf range as it's incomplete.
		bool IsLeaf = true;

		// Now process the LBR samples in parallel with stack sample
		wenlei (Done): Perhaps add a wrapper `State.hasMoreLBRs()` for this?
		wlei (Author, Done): change to `State.hasNextLBR()`
		// Note that we do not reverse the LBR entry order so we can
		// unwind the sample stack as we walk through LBR entries.
		while (State.hasNextLBR()) {
		State.checkStateConsistency();

		// Unwind implicit calls/returns from inlining, along the linear path,
		// break into smaller sub section each with its own calling context.
		if (!IsLeaf) {
		unwindLinear(State, Repeat);
		}
		IsLeaf = false;
		wenlei (Done): Use `getCurrentLBR` instead of `State.LBRStack[State.LBRIndex]`?

		// Save the LBR branch before it gets unwound.
		const LBREntry &Branch = State.getCurrentLBR();

		if (isCallState(State)) {
		// Unwind calls - we know we encountered call if LBR overlaps with
		// transition between leaf and the 2nd frame. Note that for calls that
		// were not in the original stack sample, we should have added the
		// extra frame when processing the return paired with this call.
		unwindCall(State);
		} else if (isReturnState(State)) {
		// Unwind returns - check whether the IP is indeed at a return instruction
		wenlei (Done): The comment is no longer accurate - now we don't check if src/dst crossing function binary, instead we check whether the IP is indeed at a return instruction and tail call is no longer a problem for this particular processing. (My bad I didn't update the comment in the prototype..)
		unwindReturn(State);
		} else {
		// Unwind branches - for regular intra function branches, we only
		// need to record branch with context.
		wmi (Not Done): It is needed or unneeded? Without call/ret, I assume there is no need to push/pop callstack?
		wenlei (Not Done): There's no need to adjust stack for intra-function branches, though conceptually we still need to unwind through the LBRs (updates the `State`). Agreed that the mention of push/pop stack isn't accurate..
		unwindBranchWithinFrame(State);
		}
		State.advanceLBR();
		// Record `branch` with calling context after unwinding.
		recordBranchCount(Branch, State, Repeat);
		}

		return true;
		}

PerfReader::PerfReader(cl::list<std::string> &BinaryFilenames) {		PerfReader::PerfReader(cl::list<std::string> &BinaryFilenames) {
// Load the binaries.		// Load the binaries.
for (auto Filename : BinaryFilenames)		for (auto Filename : BinaryFilenames)
loadBinary(Filename, /*AllowNameConflict*/ false);		loadBinary(Filename, /*AllowNameConflict*/ false);
}		}

ProfiledBinary &PerfReader::loadBinary(const StringRef BinaryPath,		ProfiledBinary &PerfReader::loadBinary(const StringRef BinaryPath,
bool AllowNameConflict) {		bool AllowNameConflict) {
Show All 24 Lines	void PerfReader::updateBinaryAddress(const MMapEvent &Event) {
// or if its image is loaded at the same address		// or if its image is loaded at the same address
if (I == BinaryTable.end() || Event.BaseAddress == I->second.getBaseAddress())		if (I == BinaryTable.end() || Event.BaseAddress == I->second.getBaseAddress())
return;		return;

ProfiledBinary &Binary = I->second;		ProfiledBinary &Binary = I->second;

// A binary image could be unloaded and then reloaded at different		// A binary image could be unloaded and then reloaded at different
// place, so update the address map here		// place, so update the address map here
AddrToBinaryMap.erase(Binary.getBaseAddress());		AddrToBinaryMap.erase(Binary.getBaseAddress());
		wmi (Done): Can you give an example of LBRStack so it is easy to understand what the code is parsing here?
		wlei (Author, Done): example is added
AddrToBinaryMap[Event.BaseAddress] = &Binary;		AddrToBinaryMap[Event.BaseAddress] = &Binary;

// Update binary load address.		// Update binary load address.
Binary.setBaseAddress(Event.BaseAddress);		Binary.setBaseAddress(Event.BaseAddress);
}		}

		ProfiledBinary *PerfReader::getBinary(uint64_t Address) {
		auto Iter = AddrToBinaryMap.lower_bound(Address);
		if (Iter == AddrToBinaryMap.end() || Iter->first != Address) {
		if (Iter == AddrToBinaryMap.begin())
		return nullptr;
		Iter--;
		}
		return Iter->second;
		}
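
The lookup above returns the entry with the greatest base address that is less than or equal to the queried address. A standalone sketch of the same `lower_bound` pattern on a plain `std::map` follows. `findByAddress` is a hypothetical name and the mapped value is a string rather than a `ProfiledBinary *`.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

// Returns the value whose key is the greatest key <= Addr, or nullptr when
// every key is greater than Addr (or the map is empty).
inline const std::string *
findByAddress(const std::map<uint64_t, std::string> &Map, uint64_t Addr) {
  auto Iter = Map.lower_bound(Addr); // first key >= Addr
  if (Iter == Map.end() || Iter->first != Addr) {
    if (Iter == Map.begin())
      return nullptr; // nothing at or below Addr
    --Iter;           // step back to the closest key below Addr
  }
  return &Iter->second;
}
```

Note that, like the code in the diff, this only compares against base addresses. It does not check that the address falls inside the mapped size, so an address past the end of the last image still resolves to that image.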

		static void printSampleCounter(ContextRangeCounter &Counter) {
		for (auto Range : Counter) {
		outs() << Range.first << "\n";
		for (auto I : Range.second) {
		outs() << " (" << format("%" PRIx64, I.first.first) << ", "
		<< format("%" PRIx64, I.first.second) << "): " << I.second << "\n";
		}
		}
		}

		void PerfReader::printUnwinderOutput() {
		for (auto I : BinarySampleCounters) {
		const ProfiledBinary *Binary = I.first;
		outs() << "Binary(" << Binary->getName().str() << ")'s Range Counter:\n";
		printSampleCounter(I.second.RangeCounter);
		outs() << "\nBinary(" << Binary->getName().str() << ")'s Branch Counter:\n";
		printSampleCounter(I.second.BranchCounter);
		}
		}

		void PerfReader::unwindSamples() {
		for (const auto &Item : AggregatedSamples) {
		const HybridSample &Sample = Item.first;
		VirtualUnwinder Unwinder(&BinarySampleCounters[Sample.Binary]);
		Unwinder.unwind(Sample, Item.second);
		}

		if (ShowUnwinderOutput)
		hoy (Done): Nit: curly braces not needed for single-statement block.
		printUnwinderOutput();
		}

		bool PerfReader::extractLBRStack(TraceStream &TraceIt,
		SmallVector<LBREntry, 16> &LBRStack,
		ProfiledBinary *Binary) {
		// The raw format of LBR stack is like:
		// 0x4005c8/0x4005dc/P/-/-/0 0x40062f/0x4005b0/P/-/-/0 ...
		// ... 0x4005c8/0x4005dc/P/-/-/0
		// It's in FIFO order and separated by whitespace.
		wmi (Done): Removing the else if make the code a little easier to read. if (!SrcIsInternal && !DstIsInternal) continue; if (!SrcIsInternal && DstIsInternal) { PrevTrDst = Dst; continue; } if (SrcIsInternal && !DstIsInternal) { if (!PrevTrDst) continue; Dst = PrevTrDst; PrevTrDst = 0; IsArtificial = true; } // Filter out the branch sample ...
		SmallVector<StringRef, 32> Records;
		TraceIt.getCurrentLine().split(Records, " ");

		wenlei (Done): Remove internal task Id T24431811
		// Extract leading instruction pointer if present, use single
		// list to pass out as reference.
		size_t Index = 0;
		if (!Records.empty() && Records[0].find('/') == StringRef::npos) {
		Index = 1;
		}
		// Now extract LBR samples - note that we do not reverse the
		// LBR entry order so we can unwind the sample stack as we walk
		// through LBR entries.
		uint64_t PrevTrDst = 0;

		while (Index < Records.size()) {
		auto &Token = Records[Index++];
		if (Token.size() == 0)
		continue;

		SmallVector<StringRef, 8> Addresses;
		Token.split(Addresses, "/");
		uint64_t Src;
		uint64_t Dst;
		Addresses[0].substr(2).getAsInteger(16, Src);
		Addresses[1].substr(2).getAsInteger(16, Dst);

		bool SrcIsInternal = Binary->addressIsCode(Src);
		wmi (Done): Same as above, better to give an example showing what the function is parsing here.
		bool DstIsInternal = Binary->addressIsCode(Dst);
		bool IsArtificial = false;
		// Ignore branches outside the current binary.
		if (!SrcIsInternal && !DstIsInternal)
		continue;
		if (!SrcIsInternal && DstIsInternal) {
		// For transition from external code (such as dynamic libraries) to
		// the current binary, keep track of the branch target which will be
		// grouped with the Source of the last transition from the current
		// binary.
		PrevTrDst = Dst;
		continue;
		}
		if (SrcIsInternal && !DstIsInternal) {
		// For transition to external code, group the Source with the next
		// available transition target.
		if (!PrevTrDst)
		continue;
		Dst = PrevTrDst;
		PrevTrDst = 0;
		IsArtificial = true;
		}
		// TODO: filter out buggy duplicate branches on Skylake

		LBRStack.emplace_back(LBREntry(Src, Dst, IsArtificial));
		}
		TraceIt.advance();
		return !LBRStack.empty();
		}
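
The raw LBR line format quoted in the comment (`0xSRC/0xDST/P/-/-/0` tokens separated by whitespace, optionally preceded by an instruction pointer) can be tokenized with the standard library alone. The sketch below is a simplified illustration with a hypothetical function name, `parseLBRLine`. It keeps only source and target and omits the internal/external filtering and artificial-branch grouping done in the patch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Parses one LBR line into (Source, Target) pairs, preserving token order.
// Tokens without a '/' (such as a leading instruction pointer) are skipped.
// std::stoull throws if a field is not a valid hex number.
inline std::vector<std::pair<uint64_t, uint64_t>>
parseLBRLine(const std::string &Line) {
  std::vector<std::pair<uint64_t, uint64_t>> Entries;
  std::istringstream Stream(Line);
  std::string Token;
  while (Stream >> Token) {
    std::size_t First = Token.find('/');
    if (First == std::string::npos)
      continue;
    std::size_t Second = Token.find('/', First + 1);
    std::size_t DstLen =
        Second == std::string::npos ? std::string::npos : Second - First - 1;
    std::string Src = Token.substr(0, First);
    std::string Dst = Token.substr(First + 1, DstLen);
    // Base 16 accepts an optional "0x" prefix.
    Entries.emplace_back(std::stoull(Src, nullptr, 16),
                         std::stoull(Dst, nullptr, 16));
  }
  return Entries;
}
```

The entries come back in the order they appear on the line, matching the patch's note that the LBR order is not reversed before unwinding.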

		bool PerfReader::extractCallstack(TraceStream &TraceIt,
		std::list<uint64_t> &CallStack) {
		// The raw format of call stack is like:
		// 4005dc # leaf frame
		// 400634
		// 400684 # root frame
		// It's in bottom-up order with each frame in one line.
		wenlei (Done): This function can take reference to `Trace.CallStack` directly since it's only checking call stack. Actually this function doesn't seem needed, line 290 check non-empty already, and we just need a `return Trace.Binary->addressInPrologEpilog(Trace.CallStack.front())` at line 299.

		// Extract stack frames from sample
		wenlei (Done): remove the blank line?
		ProfiledBinary *Binary = nullptr;
		while (!TraceIt.isAtEoF() && !TraceIt.getCurrentLine().startswith(" 0x")) {
		StringRef FrameStr = TraceIt.getCurrentLine().ltrim();
		// We might get an empty line at the beginning or comments, skip it
		uint64_t FrameAddr = 0;
		if (FrameStr.getAsInteger(16, FrameAddr)) {
		wmi (Done): What sampling event are you using? If br_inst_retired:near_taken is used, the sample won't end up in prolog/epilog.
		hoy (Done): Yes, we are sampling the br_inst_retired:near_taken event. Normally there will be no branch instructions in a prolog. This deals with a weird case where a branch instruction ends up in a shrink-wrapped prolog.
		wmi (Done): I see. Thanks!
		TraceIt.advance();
		break;
		}
		TraceIt.advance();
		if (!Binary) {
		Binary = getBinary(FrameAddr);
		// The address might not match any MMAP event; skip it
		if (!Binary) {
		if (AddrToBinaryMap.size() == 0)
		WithColor::warning() << "No MMAP event in the perfscript, create it "
		"with '--show-mmap-events'\n";
		break;
		}
		}
		// Currently intermixed frame from different binaries is not supported.
		// Ignore bottom frames not from binary of interest.
		if (!Binary->addressIsCode(FrameAddr))
		break;

		// We need to translate return address to call address
		// for non-leaf frames
		if (!CallStack.empty()) {
		FrameAddr = Binary->getCallAddrFromFrameAddr(FrameAddr);
		}

		CallStack.emplace_back(FrameAddr);
		}

		if (CallStack.empty())
		return false;
		// Skip other unrelated lines, find the next valid LBR line
		while (!TraceIt.isAtEoF() && !TraceIt.getCurrentLine().startswith(" 0x")) {
		TraceIt.advance();
		}
		// Filter out broken stack samples. We may not have complete frame info
		// if the sample ends up in a prolog/epilog; the result is a dangling
		// context not connected to the entry point. This should be relatively
		// rare and thus have little impact on overall profile quality. However,
		// we do want to filter such samples out to reduce the number of
		// different calling contexts. One instance of this: when a sample lands
		// in a prolog/epilog, stack walking can break in an unexpected way such
		// that higher frames are missing.
		return !Binary->addressInPrologEpilog(CallStack.front());
		}

		void PerfReader::parseHybridSample(TraceStream &TraceIt) {
		// The raw hybrid sample starts with the call stack in FILO order,
		// followed immediately by the LBR sample
		// e.g.
		// 4005dc # call stack leaf
		// 400634
		// 400684 # call stack root
		// 0x4005c8/0x4005dc/P/-/-/0 0x40062f/0x4005b0/P/-/-/0 ...
		// ... 0x4005c8/0x4005dc/P/-/-/0 # LBR Entries
		//
		HybridSample Sample;

		// Parsing call stack and populate into HybridSample.CallStack
		if (!extractCallstack(TraceIt, Sample.CallStack)) {
		// Skip the next LBR line that matches the current call stack
		if (!TraceIt.isAtEoF() && TraceIt.getCurrentLine().startswith(" 0x"))
		TraceIt.advance();
		return;
		}
		// Set the binary the current sample belongs to
		Sample.Binary = getBinary(Sample.CallStack.front());

		if (!TraceIt.isAtEoF() && TraceIt.getCurrentLine().startswith(" 0x")) {
		// Parsing LBR stack and populate into HybridSample.LBRStack
		if (extractLBRStack(TraceIt, Sample.LBRStack, Sample.Binary)) {
		// Canonicalize the stack leaf to keep a 'random' IP from the leaf frame
		// from skewing LBR ranges
		Sample.CallStack.front() = Sample.LBRStack[0].Target;
		// Record samples by aggregation
		AggregatedSamples[Sample]++;
		}
		} else {
		// The LBR sample is encoded in a single line after the stack sample
		exitWithError("Hybrid perf sample is corrupted, no LBR sample line");
		}
		hoy (Done): Use `exitWithError`?
		}

void PerfReader::parseMMap2Event(TraceStream &TraceIt) {		void PerfReader::parseMMap2Event(TraceStream &TraceIt) {
// Parse a line like:		// Parse a line like:
// PERF_RECORD_MMAP2 2113428/2113428: [0x7fd4efb57000(0x204000) @ 0		// PERF_RECORD_MMAP2 2113428/2113428: [0x7fd4efb57000(0x204000) @ 0
// 08:04 19532229 3585508847]: r-xp /usr/lib64/libdl-2.17.so		// 08:04 19532229 3585508847]: r-xp /usr/lib64/libdl-2.17.so
constexpr static const char *const Pattern =		constexpr static const char *const Pattern =
"PERF_RECORD_MMAP2 ([0-9]+)/[0-9]+: "		"PERF_RECORD_MMAP2 ([0-9]+)/[0-9]+: "
"\\[(0x[a-f0-9]+)\\((0x[a-f0-9]+)\\) @ "		"\\[(0x[a-f0-9]+)\\((0x[a-f0-9]+)\\) @ "
"(0x[a-f0-9]+|0) .*\\]: [-a-z]+ (.*)";		"(0x[a-f0-9]+|0) .*\\]: [-a-z]+ (.*)";
// ... (27 unchanged lines not shown) ...
Fields[MMAPPED_SIZE].getAsInteger(0, Event.Size);		Fields[MMAPPED_SIZE].getAsInteger(0, Event.Size);
Fields[PAGE_OFFSET].getAsInteger(0, Event.Offset);		Fields[PAGE_OFFSET].getAsInteger(0, Event.Offset);
Event.BinaryPath = Fields[BINARY_PATH];		Event.BinaryPath = Fields[BINARY_PATH];
updateBinaryAddress(Event);		updateBinaryAddress(Event);
if (ShowMmapEvents) {		if (ShowMmapEvents) {
outs() << "Mmap: Binary " << Event.BinaryPath << " loaded at "		outs() << "Mmap: Binary " << Event.BinaryPath << " loaded at "
<< format("0x%" PRIx64 ":", Event.BaseAddress) << " \n";		<< format("0x%" PRIx64 ":", Event.BaseAddress) << " \n";
}		}
		TraceIt.advance();
}		}

void PerfReader::parseEvent(TraceStream &TraceIt) {		void PerfReader::parseEventOrSample(TraceStream &TraceIt) {
if (TraceIt.getCurrentLine().startswith("PERF_RECORD_MMAP2"))		if (TraceIt.getCurrentLine().startswith("PERF_RECORD_MMAP2"))
parseMMap2Event(TraceIt);		parseMMap2Event(TraceIt);
		else if (getPerfScriptType() == PERF_LBR_STACK)
		parseHybridSample(TraceIt);
		else {
		// TODO: parse other type sample
TraceIt.advance();		TraceIt.advance();
}		}
		}

void PerfReader::parseTrace(StringRef Filename) {		void PerfReader::parseAndAggregateTrace(StringRef Filename) {
// Trace line iterator		// Trace line iterator
TraceStream TraceIt(Filename);		TraceStream TraceIt(Filename);
while (!TraceIt.isAtEoF()) {		while (!TraceIt.isAtEoF())
parseEvent(TraceIt);		parseEventOrSample(TraceIt);
		}

		void PerfReader::checkAndSetPerfType(
		cl::list<std::string> &PerfTraceFilenames) {
		bool HasHybridPerf = true;
		for (auto FileName : PerfTraceFilenames) {
		if (!isHybridPerfScript(FileName)) {
		HasHybridPerf = false;
		break;
		}
		}

		if (HasHybridPerf) {
		// Set up ProfileIsCS to enable context-sensitive functionalities
		// in SampleProf
		FunctionSamples::ProfileIsCS = true;
		PerfType = PERF_LBR_STACK;
		hoy (Done): `PerfType` should be defined on the `else` branch if it is not initialized anywhere else.

		} else {
		// TODO: Support other type of perf script
		PerfType = PERF_INVILID;
		}

		if (BinaryTable.size() > 1) {
		// TODO: remove this if everything is ready to support multiple binaries.
		exitWithError("Currently only support one input binary, multiple binaries' "
		"profile will be merged in one profile and make profile "
		"summary info inaccurate. Please use `perfdata` to merge "
		"profiles from multiple binaries.");
		}
		}

		void PerfReader::generateRawProfile() {
		if (getPerfScriptType() == PERF_LBR_STACK) {
		// Unwind samples if it's a hybrid sample
		unwindSamples();
		} else if (getPerfScriptType() == PERF_LBR) {
		// TODO: range overlap computation for regular AutoFDO
}		}
}		}

void PerfReader::parsePerfTraces(cl::list<std::string> &PerfTraceFilenames) {		void PerfReader::parsePerfTraces(cl::list<std::string> &PerfTraceFilenames) {
// Parse perf traces.		// Check and set current perfscript type
		checkAndSetPerfType(PerfTraceFilenames);
		// Parse perf traces and do aggregation.
for (auto Filename : PerfTraceFilenames)		for (auto Filename : PerfTraceFilenames)
parseTrace(Filename);		parseAndAggregateTrace(Filename);

		generateRawProfile();
}		}

} // end namespace sampleprof		} // end namespace sampleprof
		wenlei (Done): What would be the workflow for (non-CS) AutoFDO with this new implementation? It looks like `parseTrace` is responsible for aggregation only, so even for AutoFDO there will be a post-process after that to get range:count, right? So it looks to me that a unified workflow could be something like this: `for (auto Filename : PerfTraceFilenames) parseAndAggregateTrace(Filename); generateRawProfile();` Inside `generateRawProfile`, we would do simple range overlap computation for AutoFDO, or unwind for CSSPGO. Also see comments on `AggregationCounter`: in addition to unifying the workflow, it would be good to unify the data structure as well if possible. What do you think?
		wlei (Author, Done): Good suggestion! As you mention, we can incorporate everything into the unwinder by treating a non-CS profile as a hybrid sample with an empty call stack. So how about we do that when implementing the non-CS part, and for now I change the code to something like the below? `void generateRawProfile(..) { if (getPerfScriptType() == PERF_LBR) { /* range overlap computation for regular AutoFDO */ ... } else if (getPerfScriptType() == PERF_LBR_STACK) { /* Unwind samples if it's a hybrid sample */ unwindSamples(); } }`
		wenlei (Not Done): Yes, that looks good for now.
} // end namespace llvm		} // end namespace llvm
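
For readers unfamiliar with the hybrid perf script layout described in the `parseHybridSample` comment above, here is a minimal standalone sketch of splitting one LBR line into source/target pairs. It is not the patch's `extractLBRStack` (which is not shown in this part of the diff); the names `LBREntry`, `parseHexField`, `parseLBRToken` and `parseLBRLine` are hypothetical, and only the first two `/`-separated fields of each token are interpreted.

```cpp
#include <cassert>
#include <cstdint>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for one LBR record: a taken branch Source -> Target.
struct LBREntry {
  uint64_t Source = 0;
  uint64_t Target = 0;
};

// Parse a hex field such as "0x4005c8"; reject fields with trailing junk.
bool parseHexField(const std::string &Field, uint64_t &Value) {
  try {
    size_t Consumed = 0;
    Value = std::stoull(Field, &Consumed, 16);
    return Consumed == Field.size();
  } catch (const std::exception &) {
    return false; // empty or non-numeric field
  }
}

// Parse one token of the form "0xSRC/0xDST/P/-/-/0".
// Only the source and target addresses are extracted here.
bool parseLBRToken(const std::string &Token, LBREntry &Entry) {
  size_t FirstSlash = Token.find('/');
  if (FirstSlash == std::string::npos)
    return false;
  size_t SecondSlash = Token.find('/', FirstSlash + 1);
  if (SecondSlash == std::string::npos)
    return false;
  return parseHexField(Token.substr(0, FirstSlash), Entry.Source) &&
         parseHexField(
             Token.substr(FirstSlash + 1, SecondSlash - FirstSlash - 1),
             Entry.Target);
}

// Split a whole LBR line on whitespace and keep the tokens that parse.
std::vector<LBREntry> parseLBRLine(const std::string &Line) {
  std::vector<LBREntry> Entries;
  std::istringstream Stream(Line);
  std::string Token;
  while (Stream >> Token) {
    LBREntry Entry;
    if (parseLBRToken(Token, Entry))
      Entries.push_back(Entry);
  }
  return Entries;
}
```

With the sample line from the comment, the first entry's target is 0x4005dc, the same address as the call stack leaf. That is the value `parseHybridSample` writes back into `Sample.CallStack.front()` to canonicalize the leaf.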

llvm/tools/llvm-profgen/ProfileGenerator.h

This file was added.

				//===-- ProfileGenerator.h - Profile Generator ------------------ C++ --===//
				//
				// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
				// See https://llvm.org/LICENSE.txt for license information.
				// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
				//
				//===----------------------------------------------------------------------===//

				#ifndef LLVM_TOOLS_LLVM_PROGEN_PROFILEGENERATOR_H
				Lint: Pre-merge checks: clang-tidy: warning: header guard does not follow preferred style [llvm-header-guard]
				#define LLVM_TOOLS_LLVM_PROGEN_PROFILEGENERATOR_H
				#include "ErrorHandling.h"
				#include "PerfReader.h"
				#include "ProfiledBinary.h"
				#include "llvm/ProfileData/SampleProfWriter.h"

				using namespace llvm;
				using namespace sampleprof;

				namespace llvm {
				namespace sampleprof {

				class ProfileGenerator {

				public:
				wmi (Not Done): I thought the tool could also generate a profile for the current debug-info-based non-CS AFDO, but I am not sure. I guess that is a special case handled by CSProfileGenerator. Could you confirm?
				wenlei (Not Done): Yes, eventually llvm-profgen will support both. Our internal implementation for AFDO profile generation is also LLVM-based, but it's somewhat separate from this one, and we wanted to do some refactoring before we merge the two. That said, agreed that the two can share a common interface, though I think we could defer that a bit? We will upstream the AFDO profile generation after the CSSPGO part is cleared.
				ProfileGenerator(){};
				virtual ~ProfileGenerator() = default;
				static std::unique_ptr<ProfileGenerator>
				create(const BinarySampleCounterMap &SampleCounters,
				enum PerfScriptType SampleType);
				virtual void generateProfile() = 0;

				// Use SampleProfileWriter to serialize profile map
				void write();

				protected:
				/*
				For each region boundary point, mark if it is begin or end (or both) of
				the region. Boundary points are inclusive. Log the sample count as well
				so we can use it when we compute the sample count of each disjoint region
				later. Note that there might be multiple ranges with different sample
				count that share same begin/end point. We need to accumulate the sample
				count for the boundary point for such case, because for the example
				below,

				|<--100-->|
				|<------200------>|
				A         B       C

				sample count for disjoint region [A,B] would be 300.
				*/
				void findDisjointRanges(RangeSample &DisjointRanges,
				const RangeSample &Ranges);

				// Used by SampleProfileWriter
				StringMap<FunctionSamples> ProfileMap;
				};

				hoy (Done): This sounds like a method of `CSProfileGenerator`.
				class CSProfileGenerator : public ProfileGenerator {
				const BinarySampleCounterMap &BinarySampleCounters;

				public:
				CSProfileGenerator(const BinarySampleCounterMap &Counters)
				: BinarySampleCounters(Counters){};

				public:
				void generateProfile() override {
				// Fill in function body samples
				populateFunctionBodySamples();

				// Fill in boundary sample counts as well as call site samples for calls
				populateFunctionBoundarySamples();

				// Fill in call site value sample for inlined calls and also use context to
				// infer missing samples. Since we don't have call count for inlined
				// functions, we estimate it from inlinee's profile using the entry of the
				// body sample.
				populateInferredFunctionSamples();
				}

				private:
				// Helper function for updating body sample for a leaf location in
				// FunctionProfile
				void updateBodySamplesforFunctionProfile(FunctionSamples &FunctionProfile,
				hoy (Done): Please make this TODO more clear.
				const FrameLocation &LeafLoc,
				uint64_t Count);
				// Lookup or create FunctionSamples for the context
				FunctionSamples &getFunctionProfileForContext(StringRef ContextId);
				void populateFunctionBodySamples();
				void populateFunctionBoundarySamples();
				void populateInferredFunctionSamples();
				};

				} // end namespace sampleprof
				} // end namespace llvm

				#endif
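
The boundary-point idea documented for `findDisjointRanges` can be tried out in isolation. The sketch below is a simplified standalone version that assumes a range map of the form `std::map<std::pair<uint64_t, uint64_t>, uint64_t>`; the actual definition of `RangeSample` is not shown in this part of the diff, and the names `Range`, `RangeCounts`, `Boundary` and `splitIntoDisjointRanges` are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <utility>

// Assumed shapes: an inclusive [begin, end] range mapped to a sample count.
using Range = std::pair<uint64_t, uint64_t>;
using RangeCounts = std::map<Range, uint64_t>;

// Summed counts of all ranges that begin or end at one address.
struct Boundary {
  uint64_t BeginCount = 0;
  uint64_t EndCount = 0;
};

RangeCounts splitIntoDisjointRanges(const RangeCounts &Ranges) {
  // Step 1: accumulate begin/end counts per boundary address. std::map keeps
  // the addresses sorted, which the sweep below relies on.
  std::map<uint64_t, Boundary> Boundaries;
  for (const auto &Item : Ranges) {
    Boundaries[Item.first.first].BeginCount += Item.second;
    Boundaries[Item.first.second].EndCount += Item.second;
  }

  // Step 2: sweep the boundaries in address order, tracking the running count
  // of ranges that are currently open.
  RangeCounts Disjoint;
  uint64_t BeginAddress = 0; // 0 doubles as "no open region yet"
  uint64_t Count = 0;
  for (const auto &Item : Boundaries) {
    uint64_t Address = Item.first;
    const Boundary &Point = Item.second;
    if (Point.BeginCount) {
      // Close the region that was open before this begin point.
      if (BeginAddress)
        Disjoint[Range(BeginAddress, Address - 1)] = Count;
      Count += Point.BeginCount;
      BeginAddress = Address;
    }
    if (Point.EndCount) {
      // End points are inclusive, so the region closes at Address itself.
      Disjoint[Range(BeginAddress, Address)] = Count;
      Count -= Point.EndCount;
      BeginAddress = Address + 1;
    }
  }
  return Disjoint;
}
```

Feeding it the two ranges from the comment (with A=10, B=20, C=30 chosen arbitrarily) yields [10,20]: 300 and [21,30]: 200. As in the patch, gaps between ranges show up as zero-count entries, which the caller is expected to skip.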

llvm/tools/llvm-profgen/ProfileGenerator.cpp

This file was added.

				//===-- ProfileGenerator.cpp - Profile Generator ---------------- C++ --===//
				//
				// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
				wenlei (Done): nit: all header comments are screwed up by our internal linter.
				// See https://llvm.org/LICENSE.txt for license information.
				// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
				//
				//===----------------------------------------------------------------------===//

				#include "ProfileGenerator.h"

				static cl::opt<std::string> OutputFilename("output", cl::value_desc("output"),
				cl::Required,
				cl::desc("Output profile file"));

				static cl::opt<SampleProfileFormat> OutputFormat(
				"format", cl::desc("Format of output profile"), cl::init(SPF_Text),
				cl::values(
				clEnumValN(SPF_Binary, "binary", "Binary encoding"),
				clEnumValN(SPF_Compact_Binary, "compbinary", "Compact binary encoding"),
				clEnumValN(SPF_Ext_Binary, "extbinary", "Extensible binary encoding"),
				clEnumValN(SPF_Text, "text", "Text encoding (default)"),
				clEnumValN(SPF_GCC, "gcc",
				"GCC encoding (only meaningful for -sample)")));

				using namespace llvm;
				using namespace sampleprof;

				namespace llvm {
				namespace sampleprof {

				std::unique_ptr<ProfileGenerator>
				ProfileGenerator::create(const BinarySampleCounterMap &BinarySampleCounters,
				enum PerfScriptType SampleType) {
				std::unique_ptr<ProfileGenerator> ProfileGenerator;

				if (SampleType == PERF_LBR_STACK) {
				ProfileGenerator.reset(new CSProfileGenerator(BinarySampleCounters));
				} else {
				// TODO:
				llvm_unreachable("Unsupported perfscript!");
				}

				return ProfileGenerator;
				}

				void ProfileGenerator::write() {
				auto WriterOrErr = SampleProfileWriter::create(OutputFilename, OutputFormat);
				if (std::error_code EC = WriterOrErr.getError())
				hoy (Not Done): I'm wondering if a separate profile file should be output for each binary. Since the samples are already separated by binary via `BinarySampleCounters`, `ProfileMap` can be organized like that too.
				wlei (Author, Done): Yeah, it's doable, but that needs more CL design. Currently we only support one output file, so we would have to add support for multiple output files, which also need an exact one-to-one mapping to the binaries. So we could use `OutputFilenames` to receive multiple output files and match them in order on the command line? Or, I'm also thinking we just keep this as is, and if the user really needs to separate the output per binary, they could call the tool multiple times with different input binaries. Any suggestions on the command?
				hoy (Not Done): I see. Let's keep a single output for now.
				wenlei (Not Done): What about limiting to a single binary input for now? Error out with a message saying unsupported if multiple binaries are provided. Generating profiles for multiple binaries in a single output file will make the profile summary info inaccurate (e.g. percentile-based hot thresholds).
				exitWithError(EC, OutputFilename);
				auto Writer = std::move(WriterOrErr.get());
				Writer->write(ProfileMap);
				}

				hoy (Done): typedef or using an alias like `RangeCountMap` for this type?
				void ProfileGenerator::findDisjointRanges(RangeSample &DisjointRanges,
				const RangeSample &Ranges) {

				/*
				Regions may overlap with each other. Using the boundary info, find all
				disjoint ranges and their sample count. A BoundaryPoint holds the summed
				counts of all samples that begin or end at that point.

				|<--100-->|           Sample1
				|<------200------>|   Sample2
				A         B       C

				In the example above,
				Sample1 begins at A, ends at B, its value is 100.
				Sample2 begins at A, ends at C, its value is 200.
				For A, BeginCount is the sum of the samples beginning at A, which is 300,
				and no sample ends at A, so EndCount is 0.
				Then boundary points A, B, and C with begin/end counts are:
				wmi (Not Done): Why is there a region [A, B]: 300, but B: (0, 100) only has a sample count of 100?
				wlei (Author, Done): Sorry for the confusion. See the graph below; here B: (0, 100) is the boundary point: 0 means no samples begin at B, and 100 means one sample (Sample1), whose count is 100, ends at B. I changed the explanation in the comment; see whether it's clear or not. |<--100-->| Sample1 |<------200------>| Sample2 A B C
				wmi (Not Done): It is helpful too. Thanks.
				A: (300, 0)
				B: (0, 100)
				C: (0, 200)
				*/
				struct BoundaryPoint {
				// Sum of sample counts beginning at this point
				uint64_t BeginCount;
				// Sum of sample counts ending at this point
				uint64_t EndCount;

				BoundaryPoint() : BeginCount(0), EndCount(0){};

				void addBeginCount(uint64_t Count) { BeginCount += Count; }

				void addEndCount(uint64_t Count) { EndCount += Count; }
				};

				/*
				For the above example, with these boundary points, the following logic
				finds two disjoint regions:

				[A,B]: 300
				[B+1,C]: 200

				If a boundary point is both a begin and an end point, the point itself
				becomes a separate disjoint region. For example, if we have original
				ranges of

				|<--- 100 --->|
				              |<--- 200 --->|
				A             B             C

				there are three boundary points with their begin/end counts of

				A: (100, 0)
				B: (200, 100)
				C: (0, 200)

				the disjoint ranges would be

				[A, B-1]: 100
				[B, B]: 300
				[B+1, C]: 200.
				*/
				std::map<uint64_t, BoundaryPoint> Boundaries;

				for (auto Item : Ranges) {
				uint64_t Begin = Item.first.first;
				uint64_t End = Item.first.second;
				uint64_t Count = Item.second;
				if (Boundaries.find(Begin) == Boundaries.end())
				Boundaries[Begin] = BoundaryPoint();
				Boundaries[Begin].addBeginCount(Count);

				if (Boundaries.find(End) == Boundaries.end())
				Boundaries[End] = BoundaryPoint();
				Boundaries[End].addEndCount(Count);
				}

				uint64_t BeginAddress = 0;
				uint64_t Count = 0;
				for (auto Item : Boundaries) {
				uint64_t Address = Item.first;
				BoundaryPoint &Point = Item.second;
				if (Point.BeginCount) {
				if (BeginAddress)
				DisjointRanges[{BeginAddress, Address - 1}] = Count;
				Count += Point.BeginCount;
				BeginAddress = Address;
				}
				if (Point.EndCount) {
				assert(BeginAddress && "First boundary point cannot be 'end' point");
				DisjointRanges[{BeginAddress, Address}] = Count;
				Count -= Point.EndCount;
				BeginAddress = Address + 1;
				hoy (Done): Just `return ProfileMap[ContextStr]`?
				}
				}
				}

				FunctionSamples &
				CSProfileGenerator::getFunctionProfileForContext(StringRef ContextStr) {
				auto Ret = ProfileMap.try_emplace(ContextStr, FunctionSamples());
				if (Ret.second) {
				SampleContext FContext(Ret.first->first(), RawContext);
				FunctionSamples &FProfile = Ret.first->second;
				FProfile.setName(FContext.getName());
				FProfile.setContext(FContext);
				}
				return Ret.first->second;
				}

				void CSProfileGenerator::updateBodySamplesforFunctionProfile(
				FunctionSamples &FunctionProfile, const FrameLocation &LeafLoc,
				uint64_t Count) {
				wenlei (Done): This would not be consistent with the definition of total samples. I think we should only add the portion that was added to body samples.
				// Filter out invalid negative(int type) lineOffset
				if (LeafLoc.second.LineOffset & 0x80000000)
				return;
				// Use the maximum count of samples with same line location
				ErrorOr<uint64_t> R = FunctionProfile.findSamplesAt(
				LeafLoc.second.LineOffset, LeafLoc.second.Discriminator);
				uint64_t PreviousCount = R ? R.get() : 0;
				if (PreviousCount < Count) {
				FunctionProfile.addBodySamples(LeafLoc.second.LineOffset,
				LeafLoc.second.Discriminator,
				Count - PreviousCount);
				FunctionProfile.addTotalSamples(Count - PreviousCount);
				}
				}

				void CSProfileGenerator::populateFunctionBodySamples() {
				for (const auto &BI : BinarySampleCounters) {
				ProfiledBinary *Binary = BI.first;
				for (const auto &CI : BI.second.RangeCounter) {
				StringRef ContextId(CI.first);
				// Get or create function profile for the range
				FunctionSamples &FunctionProfile =
				getFunctionProfileForContext(ContextId);
				// Compute disjoint ranges first, so we can use MAX
				// for calculating count for each location.
				RangeSample Ranges;
				findDisjointRanges(Ranges, CI.second);

				for (auto Range : Ranges) {
				uint64_t RangeBegin = Binary->offsetToVirtualAddr(Range.first.first);
				hoy (Done): Can this be moved out of the loop? The string-based lookup is potentially slow.
				uint64_t RangeEnd = Binary->offsetToVirtualAddr(Range.first.second);
				uint64_t Count = Range.second;
				// Disjoint ranges may introduce zero-filled gaps that
				// don't belong to the current context; filter them out.
				if (Count == 0)
				hoy (Done): `Binary` can be obtained via `Reader`, so there is no need to pass it in as an argument.
				continue;

				InstructionPointer IP(Binary, RangeBegin, true);

				// A disjoint range may fall between two instructions, e.g. if
				// Instr1 is at Addr1 and Instr2 is at Addr2, a disjoint range
				// can be Addr1+1 to Addr2-1. We should ignore such ranges.
				if (IP.Address > RangeEnd)
				continue;
				wenlei (Done): This now actually does more than head samples, as the call target value sample is handled here too. Perhaps `populateFunctionBoundarySamples`?

				while (IP.Address <= RangeEnd) {
				uint64_t Offset = Binary->virtualAddrToOffset(IP.Address);
				const FrameLocation &LeafLoc = Binary->getInlineLeafFrameLoc(Offset);
				// Recording body sample for this specific context
				updateBodySamplesforFunctionProfile(FunctionProfile, LeafLoc, Count);
				// Move to next IP within the range
				IP.advance();
				hoy (Not Done): @wmi Looks like here is a place where we use the ELF symbol name as the function name. May need to call `getCanonicalFnName` here.
				}
				}
				}
				}
				}
				wenlei (Done): Same as line 195, this could be hoisted out of the loop?

				void CSProfileGenerator::populateFunctionBoundarySamples() {
				for (const auto &BI : BinarySampleCounters) {
				ProfiledBinary *Binary = BI.first;
				for (const auto &CI : BI.second.BranchCounter) {
				StringRef ContextId(CI.first);
				// Get or create function profile for branch Source
				FunctionSamples &FunctionProfile =
				getFunctionProfileForContext(ContextId);

				for (auto Entry : CI.second) {
				uint64_t SourceOffset = Entry.first.first;
				uint64_t TargetOffset = Entry.first.second;
				uint64_t Count = Entry.second;
				// Get the callee name by branch target if it's a call branch
				StringRef CalleeName = FunctionSamples::getCanonicalFnName(
				Binary->getFuncFromStartOffset(TargetOffset));
				if (CalleeName.size() == 0)
				wenlei (Done): Explicit namespace is not needed here.
				continue;

				// Record called target sample and its count
				const FrameLocation &LeafLoc =
				Binary->getInlineLeafFrameLoc(SourceOffset);

				FunctionProfile.addCalledTargetSamples(LeafLoc.second.LineOffset,
				LeafLoc.second.Discriminator,
				CalleeName, Count);
				FunctionProfile.addTotalSamples(Count);

				// Record head sample for called target(callee)
				// TODO: Cleanup ' @ '
				std::string CalleeContextId =
				getCallSite(LeafLoc) + " @ " + CalleeName.str();
				if (ContextId.find(" @ ") != StringRef::npos) {
				CalleeContextId =
				ContextId.rsplit(" @ ").first.str() + " @ " + CalleeContextId;
				}

				if (ProfileMap.find(CalleeContextId) != ProfileMap.end()) {
				FunctionSamples &CalleeProfile = ProfileMap[CalleeContextId];
				assert(Count != 0 && "Unexpected zero weight branch");
				wmi (Done): Conext --> Context?
				if (CalleeProfile.getName().size()) {
				wenlei (Done): This is now more than just value samples. The difference is that this one uses context to infer missing samples, while the others use ranges and branches to populate samples directly. So name it `populateInferredFunctionSamples`?
				CalleeProfile.addHeadSamples(Count);
				}
				}
				}
				}
				}
				}

				static FrameLocation getCallerContext(StringRef CalleeContext,
				StringRef &CallerNameWithContext) {
				StringRef CallerContext = CalleeContext.rsplit(" @ ").first;
				CallerNameWithContext = CallerContext.rsplit(':').first;
				auto ContextSplit = CallerContext.rsplit(" @ ");
				FrameLocation LeafFrameLoc = {"", {0, 0}};
				StringRef Funcname;
				wenlei (Done): nit: name it `CallerLeafFrameLoc`?
				SampleContext::decodeContextString(ContextSplit.second, Funcname,
				LeafFrameLoc.second);
				LeafFrameLoc.first = Funcname.str();
				return LeafFrameLoc;
				wenlei (Done): `caller_profile` -> `CallerProfile`
				}

				void CSProfileGenerator::populateInferredFunctionSamples() {
				for (const auto &Item : ProfileMap) {
				const StringRef CalleeContext = Item.first();
				const FunctionSamples &CalleeProfile = Item.second;

				// If we already have head sample counts, we must have value profile
				// for call sites added already. Skip to avoid double counting.
				if (CalleeProfile.getHeadSamples())
				continue;
				// If we don't have context, nothing to do for caller's call site.
				// This could happen for entry point function.
				if (CalleeContext.find(" @ ") == StringRef::npos)
				continue;

				// Infer Caller's frame loc and context ID through string splitting
				StringRef CallerContextId;
				FrameLocation &&CallerLeafFrameLoc =
				getCallerContext(CalleeContext, CallerContextId);

				// It's possible that we haven't seen any sample directly in the caller,
				// in which case CallerProfile will not exist. But we can't modify
				// ProfileMap while iterating it.
				// TODO: create function profile for those callers too
				if (ProfileMap.find(CallerContextId) == ProfileMap.end())
				continue;
				FunctionSamples &CallerProfile = ProfileMap[CallerContextId];

				// Since we don't have call count for inlined functions, we
				// estimate it from inlinee's profile using entry body sample.
				uint64_t EstimatedCallCount = CalleeProfile.getEntrySamples();
				// If we don't have samples with location, use 1 to indicate live.
				if (!EstimatedCallCount && !CalleeProfile.getBodySamples().size())
				EstimatedCallCount = 1;
				CallerProfile.addCalledTargetSamples(
				CallerLeafFrameLoc.second.LineOffset,
				CallerLeafFrameLoc.second.Discriminator, CalleeProfile.getName(),
				EstimatedCallCount);
				updateBodySamplesforFunctionProfile(CallerProfile, CallerLeafFrameLoc,
				EstimatedCallCount);
				}
				}

				} // end namespace sampleprof
				} // end namespace llvm
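
The string splitting done by `getCallerContext` can be illustrated with plain `std::string`. This is a rough sketch only: it assumes context IDs of the form `caller:line @ callee`, as built in `populateFunctionBoundarySamples`, and it further assumes that an optional discriminator is written as `line.discriminator`. The real decoding lives in `SampleContext::decodeContextString`, which is not shown in this part of the diff; `CallerInfo` and `splitCallerContext` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>
#include <string>

// Hypothetical result type for the split.
struct CallerInfo {
  std::string CallerContextId; // e.g. "main:3 @ foo"
  std::string CallerName;      // e.g. "foo"
  uint32_t LineOffset = 0;     // call site line offset within the caller
  uint32_t Discriminator = 0;  // assumed "line.discriminator" encoding
};

// Split "main:3 @ foo:2 @ bar" into the caller's context ID ("main:3 @ foo")
// and the call site location (line 2) inside the caller.
// Returns false for a context without a caller, or a malformed location.
bool splitCallerContext(const std::string &CalleeContext, CallerInfo &Info) {
  const std::string Sep = " @ ";
  size_t LastSep = CalleeContext.rfind(Sep);
  if (LastSep == std::string::npos)
    return false; // entry point: nothing to attribute to a caller

  // Everything before the last separator, e.g. "main:3 @ foo:2".
  std::string CallerContext = CalleeContext.substr(0, LastSep);
  size_t Colon = CallerContext.rfind(':');
  if (Colon == std::string::npos)
    return false;

  // Dropping the trailing ":line" gives the caller's own context ID.
  Info.CallerContextId = CallerContext.substr(0, Colon);
  size_t PrevSep = Info.CallerContextId.rfind(Sep);
  Info.CallerName = PrevSep == std::string::npos
                        ? Info.CallerContextId
                        : Info.CallerContextId.substr(PrevSep + Sep.size());

  // Location is "line" or, by assumption, "line.discriminator".
  std::string Loc = CallerContext.substr(Colon + 1);
  size_t Dot = Loc.find('.');
  try {
    Info.LineOffset = static_cast<uint32_t>(std::stoul(Loc.substr(0, Dot)));
    Info.Discriminator =
        Dot == std::string::npos
            ? 0
            : static_cast<uint32_t>(std::stoul(Loc.substr(Dot + 1)));
  } catch (const std::exception &) {
    return false; // non-numeric location
  }
  return true;
}
```

Because the caller's profile is found by context ID, a callee whose caller was never sampled directly has no `ProfileMap` entry to update. That is the case the TODO in `populateInferredFunctionSamples` refers to.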

llvm/tools/llvm-profgen/ProfiledBinary.h

	// ... (18 unchanged lines not shown) ...
	#include "llvm/MC/MCInstPrinter.h"			#include "llvm/MC/MCInstPrinter.h"
	#include "llvm/MC/MCInstrAnalysis.h"			#include "llvm/MC/MCInstrAnalysis.h"
	#include "llvm/MC/MCInstrInfo.h"			#include "llvm/MC/MCInstrInfo.h"
	#include "llvm/MC/MCObjectFileInfo.h"			#include "llvm/MC/MCObjectFileInfo.h"
	#include "llvm/MC/MCRegisterInfo.h"			#include "llvm/MC/MCRegisterInfo.h"
	#include "llvm/MC/MCSubtargetInfo.h"			#include "llvm/MC/MCSubtargetInfo.h"
	#include "llvm/MC/MCTargetOptions.h"			#include "llvm/MC/MCTargetOptions.h"
	#include "llvm/Object/ELFObjectFile.h"			#include "llvm/Object/ELFObjectFile.h"
				#include "llvm/ProfileData/SampleProf.h"
	#include "llvm/Support/Path.h"			#include "llvm/Support/Path.h"
				#include <list>
	#include <set>			#include <set>
				#include <sstream>
	#include <string>			#include <string>
	#include <unordered_map>			#include <unordered_map>
	#include <unordered_set>			#include <unordered_set>
	#include <vector>			#include <vector>

				using namespace llvm;
				using namespace sampleprof;
	using namespace llvm::object;			using namespace llvm::object;

	namespace llvm {			namespace llvm {
	namespace sampleprof {			namespace sampleprof {

	class ProfiledBinary;			class ProfiledBinary;

	struct InstructionPointer {			struct InstructionPointer {
	ProfiledBinary *Binary;			ProfiledBinary *Binary;
	// Offset to the base address of the executable segment of the binary.			union {
	uint64_t Offset;			// Offset of the executable segment of the binary.
				uint64_t Offset = 0;
				// Also used as address in unwinder
				uint64_t Address;
				};
	// Index to the sorted code address array of the binary.			// Index to the sorted code address array of the binary.
	uint64_t Index;			uint64_t Index = 0;
				InstructionPointer(ProfiledBinary *Binary, uint64_t Address,
				bool RoundToNext = false);
				void advance();
				void backward();
				void update(uint64_t Addr);
				};
				wenlei (Done): Are operator++ and operator-- used? These two duplicate advance/backward, and we only need to keep one set?

				// PrologEpilog offset tracker, used to filter out broken stack samples
				// Currently we use a heuristic size (two) to infer prolog and epilog
				// based on the start address and return address. In the future,
				// we will switch to Dwarf CFI based tracker
				struct PrologEpilogTracker {
				// A set of prolog and epilog offsets. Used by virtual unwinding.
				std::unordered_set<uint64_t> PrologEpilogSet;
				ProfiledBinary *Binary;
				PrologEpilogTracker(ProfiledBinary *Bin) : Binary(Bin){};

	InstructionPointer(ProfiledBinary *Binary, uint64_t Offset)			// Take the two addresses from the start of function as prolog
	: Binary(Binary), Offset(Offset) {			void inferPrologOffsets(
	Index = 0;			std::unordered_map<uint64_t, std::string> &FuncStartAddrMap) {
				for (auto I : FuncStartAddrMap) {
				PrologEpilogSet.insert(I.first);
				InstructionPointer IP(Binary, I.first);
				IP.advance();
				PrologEpilogSet.insert(IP.Offset);
				}
				}

				// Take the last two addresses before the return address as epilog
				void inferEpilogOffsets(std::unordered_set<uint64_t> &RetAddrs) {
				for (auto Addr : RetAddrs) {
				PrologEpilogSet.insert(Addr);
				InstructionPointer IP(Binary, Addr);
				IP.backward();
				PrologEpilogSet.insert(IP.Offset);
				}
	}			}
	};			};

	class ProfiledBinary {			class ProfiledBinary {
	// Absolute path of the binary.			// Absolute path of the binary.
	std::string Path;			std::string Path;
	// The target triple.			// The target triple.
	Triple TheTriple;			Triple TheTriple;
	// The runtime base address that the executable sections are loaded at.			// The runtime base address that the executable sections are loaded at.
	mutable uint64_t BaseAddress = 0;			mutable uint64_t BaseAddress = 0;
	// The preferred base address that the executable sections are loaded at.			// The preferred base address that the executable sections are loaded at.
	uint64_t PreferredBaseAddress = 0;			uint64_t PreferredBaseAddress = 0;
	// Mutiple MC component info			// Mutiple MC component info
	std::unique_ptr<const MCRegisterInfo> MRI;			std::unique_ptr<const MCRegisterInfo> MRI;
	std::unique_ptr<const MCAsmInfo> AsmInfo;			std::unique_ptr<const MCAsmInfo> AsmInfo;
	std::unique_ptr<const MCSubtargetInfo> STI;			std::unique_ptr<const MCSubtargetInfo> STI;
	std::unique_ptr<const MCInstrInfo> MII;			std::unique_ptr<const MCInstrInfo> MII;
	std::unique_ptr<MCDisassembler> DisAsm;			std::unique_ptr<MCDisassembler> DisAsm;
	std::unique_ptr<const MCInstrAnalysis> MIA;			std::unique_ptr<const MCInstrAnalysis> MIA;
	std::unique_ptr<MCInstPrinter> IP;			std::unique_ptr<MCInstPrinter> IPrinter;
	// A list of text sections sorted by start RVA and size. Used to check			// A list of text sections sorted by start RVA and size. Used to check
	// if a given RVA is a valid code address.			// if a given RVA is a valid code address.
	std::set<std::pair<uint64_t, uint64_t>> TextSections;			std::set<std::pair<uint64_t, uint64_t>> TextSections;
	// Function offset to name mapping.			// Function offset to name mapping.
	std::unordered_map<uint64_t, std::string> FuncStartAddrMap;			std::unordered_map<uint64_t, std::string> FuncStartAddrMap;
				// Offset to context location map. Used to expand the context.
				std::unordered_map<uint64_t, FrameLocationStack> Offset2LocStackMap;
	// An array of offsets of all instructions sorted in increasing order. The			// An array of offsets of all instructions sorted in increasing order. The
	// sorting is needed to fast advance to the next forward/backward instruction.			// sorting is needed to fast advance to the next forward/backward instruction.
	std::vector<uint64_t> CodeAddrs;			std::vector<uint64_t> CodeAddrs;
	// A set of call instruction offsets. Used by virtual unwinding.			// A set of call instruction offsets. Used by virtual unwinding.
	std::unordered_set<uint64_t> CallAddrs;			std::unordered_set<uint64_t> CallAddrs;
	// A set of return instruction offsets. Used by virtual unwinding.			// A set of return instruction offsets. Used by virtual unwinding.
	std::unordered_set<uint64_t> RetAddrs;			std::unordered_set<uint64_t> RetAddrs;

				PrologEpilogTracker ProEpilogTracker;

	// The symbolizer used to get inline context for an instruction.			// The symbolizer used to get inline context for an instruction.
	std::unique_ptr<symbolize::LLVMSymbolizer> Symbolizer;			std::unique_ptr<symbolize::LLVMSymbolizer> Symbolizer;

	void setPreferredBaseAddress(const ELFObjectFileBase *O);			void setPreferredBaseAddress(const ELFObjectFileBase *O);

	// Set up disassembler and related components.			// Set up disassembler and related components.
	void setUpDisassembler(const ELFObjectFileBase *Obj);			void setUpDisassembler(const ELFObjectFileBase *Obj);
	void setupSymbolizer();			void setupSymbolizer();

	/// Dissassemble the text section and build various address maps.			/// Dissassemble the text section and build various address maps.
	void disassemble(const ELFObjectFileBase *O);			void disassemble(const ELFObjectFileBase *O);

	/// Helper function to dissassemble the symbol and extract info for unwinding			/// Helper function to dissassemble the symbol and extract info for unwinding
	bool dissassembleSymbol(std::size_t SI, ArrayRef<uint8_t> Bytes,			bool dissassembleSymbol(std::size_t SI, ArrayRef<uint8_t> Bytes,
	SectionSymbolsTy &Symbols, const SectionRef &Section);			SectionSymbolsTy &Symbols, const SectionRef &Section);
	/// Symbolize a given instruction pointer and return a full call context.			/// Symbolize a given instruction pointer and return a full call context.
	FrameLocationStack symbolize(const InstructionPointer &I);			FrameLocationStack symbolize(const InstructionPointer &IP,
				bool UseCanonicalFnName = false);

	/// Decode the interesting parts of the binary and build internal data			/// Decode the interesting parts of the binary and build internal data
	/// structures. On high level, the parts of interest are:			/// structures. On high level, the parts of interest are:
	/// 1. Text sections, including the main code section and the PLT			/// 1. Text sections, including the main code section and the PLT
	/// entries that will be used to handle cross-module call transitions.			/// entries that will be used to handle cross-module call transitions.
	/// 2. The .debug_line section, used by Dwarf-based profile generation.			/// 2. The .debug_line section, used by Dwarf-based profile generation.
	/// 3. Pseudo probe related sections, used by probe-based profile			/// 3. Pseudo probe related sections, used by probe-based profile
	/// generation.			/// generation.
	void load();			void load();
				const FrameLocationStack &getFrameLocationStack(uint64_t Offset) const {
				auto I = Offset2LocStackMap.find(Offset);
				assert(I != Offset2LocStackMap.end() &&
				"Can't find location for offset in the binary");
				return I->second;
				}

	public:			public:
	ProfiledBinary(StringRef Path) : Path(Path) {			ProfiledBinary(StringRef Path) : Path(Path), ProEpilogTracker(this) {
	setupSymbolizer();			setupSymbolizer();
	load();			load();
	}			}
				uint64_t virtualAddrToOffset(uint64_t VitualAddress) const {
				return VitualAddress - BaseAddress;
				}
				uint64_t offsetToVirtualAddr(uint64_t Offset) const {
				return Offset + BaseAddress;
				}
	const StringRef getPath() const { return Path; }			const StringRef getPath() const { return Path; }
	const StringRef getName() const { return llvm::sys::path::filename(Path); }			const StringRef getName() const { return llvm::sys::path::filename(Path); }
	uint64_t getBaseAddress() const { return BaseAddress; }			uint64_t getBaseAddress() const { return BaseAddress; }
	void setBaseAddress(uint64_t Address) { BaseAddress = Address; }			void setBaseAddress(uint64_t Address) { BaseAddress = Address; }
				uint64_t getPreferredBaseAddress() const { return PreferredBaseAddress; }

				bool addressIsCode(uint64_t Address) const {
				uint64_t Offset = virtualAddrToOffset(Address);
				return Offset2LocStackMap.find(Offset) != Offset2LocStackMap.end();
				wenlei (Done): return a const ref, or StringRef since FuncStartAddrMap owns the string?
				}
				bool addressIsCall(uint64_t Address) const {
				uint64_t Offset = virtualAddrToOffset(Address);
				return CallAddrs.count(Offset);
				}
				hoy (Done): Can this just return a reference so that you don't need to use the move semantics when it gets called?
				bool addressIsReturn(uint64_t Address) const {
				uint64_t Offset = virtualAddrToOffset(Address);
				return RetAddrs.count(Offset);
				}
				bool addressInPrologEpilog(uint64_t Address) const {
				uint64_t Offset = virtualAddrToOffset(Address);
				return ProEpilogTracker.PrologEpilogSet.count(Offset);
				}

				hoy (Done): Nit: Get the full context string for the given call stack with inline context filled in?
				uint64_t getAddressforIndex(uint64_t Index) const {
				return offsetToVirtualAddr(CodeAddrs[Index]);
				}

				// Get the index in CodeAddrs for the address
				// As we might get an address which is not the code
				// here it would round to the next valid code address by
				// using lower bound operation
				uint32_t getIndexForAddr(uint64_t Address) const {
				uint64_t Offset = virtualAddrToOffset(Address);
				auto Low = std::lower_bound(CodeAddrs.begin(), CodeAddrs.end(), Offset);
				return Low - CodeAddrs.begin();
				}

				uint64_t getCallAddrFromFrameAddr(uint64_t FrameAddr) const {
				return getAddressforIndex(getIndexForAddr(FrameAddr) - 1);
				}

				StringRef getFuncFromStartOffset(uint64_t Offset) {
				return FuncStartAddrMap[Offset];
				}

				const FrameLocation &getInlineLeafFrameLoc(uint64_t Offset,
				bool NameOnly = false) {
				return getFrameLocationStack(Offset).back();
				}

				// Compare two addresses' inline context
				bool inlineContextEqual(uint64_t Add1, uint64_t Add2) const;

				// Get the context string of the current stack with inline context filled in.
				// It will search the disassembling info stored in Offset2LocStackMap. This is
				// used as the key of function sample map
				std::string getExpandedContextStr(const std::list<uint64_t> &stack) const;
				Lint (pre-merge checks): clang-tidy: warning: invalid case style for parameter 'stack' [readability-identifier-naming]
	};			};

	} // end namespace sampleprof			} // end namespace sampleprof
	} // end namespace llvm			} // end namespace llvm

	#endif			#endif
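
The header above relies on two ideas: `getIndexForAddr` rounds an arbitrary offset up to the next instruction start with `std::lower_bound`, and `PrologEpilogTracker` marks two instructions per function start and per return as prolog/epilog. The following is a minimal standalone sketch of both, not the patch's code. `CodeIndex`, `indexFor`, and `inferPrologEpilog` are hypothetical names, and the bounds checks are an added assumption (the `advance()`/`backward()` calls in the diff do not check them).

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_set>
#include <vector>

// Hypothetical stand-in for the sorted CodeAddrs array in ProfiledBinary.
struct CodeIndex {
  std::vector<uint64_t> CodeAddrs; // instruction start offsets, ascending

  // Index of the first instruction whose offset is >= Offset. An offset that
  // falls inside an instruction therefore rounds up to the next instruction.
  // Returns CodeAddrs.size() when Offset is past the last instruction.
  std::size_t indexFor(uint64_t Offset) const {
    assert(std::is_sorted(CodeAddrs.begin(), CodeAddrs.end()));
    auto Low = std::lower_bound(CodeAddrs.begin(), CodeAddrs.end(), Offset);
    return static_cast<std::size_t>(Low - CodeAddrs.begin());
  }
};

// Heuristic of size two: each function start plus the following instruction
// counts as prolog; each return plus the preceding instruction counts as
// epilog. The range checks are an assumption added for this sketch.
std::unordered_set<uint64_t>
inferPrologEpilog(const CodeIndex &Code,
                  const std::vector<uint64_t> &FuncStarts,
                  const std::vector<uint64_t> &Rets) {
  std::unordered_set<uint64_t> Set;
  for (uint64_t Start : FuncStarts) {
    Set.insert(Start);
    std::size_t I = Code.indexFor(Start);
    if (I + 1 < Code.CodeAddrs.size())
      Set.insert(Code.CodeAddrs[I + 1]);
  }
  for (uint64_t Ret : Rets) {
    Set.insert(Ret);
    std::size_t I = Code.indexFor(Ret);
    if (I > 0 && I - 1 < Code.CodeAddrs.size())
      Set.insert(Code.CodeAddrs[I - 1]);
  }
  return Set;
}
```

A stack sample whose leaf address lands in the resulting set would be treated as unreliable, since the frame may not be fully set up or torn down at that point.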

llvm/tools/llvm-profgen/ProfiledBinary.cpp

[12 lines hidden]
#include "llvm/Support/CommandLine.h"		#include "llvm/Support/CommandLine.h"
#include "llvm/Support/Format.h"		#include "llvm/Support/Format.h"
#include "llvm/Support/TargetRegistry.h"		#include "llvm/Support/TargetRegistry.h"
#include "llvm/Support/TargetSelect.h"		#include "llvm/Support/TargetSelect.h"

#define DEBUG_TYPE "load-binary"		#define DEBUG_TYPE "load-binary"

using namespace llvm;		using namespace llvm;
		using namespace sampleprof;

static cl::opt<bool> ShowDisassembly("show-disassembly", cl::ReallyHidden,		static cl::opt<bool> ShowDisassembly("show-disassembly", cl::ReallyHidden,
cl::init(false), cl::ZeroOrMore,		cl::init(false), cl::ZeroOrMore,
cl::desc("Print disassembled code."));		cl::desc("Print disassembled code."));

static cl::opt<bool> ShowSourceLocations("show-source-locations",		static cl::opt<bool> ShowSourceLocations("show-source-locations",
cl::ReallyHidden, cl::init(false),		cl::ReallyHidden, cl::init(false),
cl::ZeroOrMore,		cl::ZeroOrMore,
[61 lines hidden]	void ProfiledBinary::load() {
LLVM_DEBUG(dbgs() << "Loading " << Path << "\n");		LLVM_DEBUG(dbgs() << "Loading " << Path << "\n");

// Find the preferred base address for text sections.		// Find the preferred base address for text sections.
setPreferredBaseAddress(Obj);		setPreferredBaseAddress(Obj);

// Disassemble the text sections.		// Disassemble the text sections.
disassemble(Obj);		disassemble(Obj);

		// Use function start and return address to infer prolog and epilog
		ProEpilogTracker.inferPrologOffsets(FuncStartAddrMap);
		ProEpilogTracker.inferEpilogOffsets(RetAddrs);

// TODO: decode other sections.		// TODO: decode other sections.

return;		return;
}		}

		bool ProfiledBinary::inlineContextEqual(uint64_t Address1,
		uint64_t Address2) const {
		uint64_t Offset1 = virtualAddrToOffset(Address1);
		uint64_t Offset2 = virtualAddrToOffset(Address2);
		const FrameLocationStack &Context1 = getFrameLocationStack(Offset1);
		const FrameLocationStack &Context2 = getFrameLocationStack(Offset2);
		if (Context1.size() != Context2.size())
		return false;

		// The leaf frame contains location within the leaf, and it
		// needs to be remove that as it's not part of the calling context
		return std::equal(Context1.begin(), Context1.begin() + Context1.size() - 1,
		Context2.begin(), Context2.begin() + Context2.size() - 1);
		}

		std::string
		hoy (Done): Nit: `std::equal(Context1.begin(), Context1.begin() + Context1.size() - 1, Context2.begin(), Context2.begin() + Context2.size() - 1)`
		ProfiledBinary::getExpandedContextStr(const std::list<uint64_t> &Stack) const {
		std::string ContextStr;
		SmallVector<std::string, 8> ContextVec;
		// Process from frame root to leaf
		for (auto Iter = Stack.rbegin(); Iter != Stack.rend(); Iter++) {
		uint64_t Offset = virtualAddrToOffset(*Iter);
		const FrameLocationStack &ExpandedContext = getFrameLocationStack(Offset);
		for (const auto &Loc : ExpandedContext) {
		ContextVec.push_back(getCallSite(Loc));
		}
		}

		assert(ContextVec.size() && "Context length should be at least 1");

		std::ostringstream OContextStr;
		hoy (Not Done): Nit: remove the check and add it back with the compression work.
		for (uint32_t I = 0; I < (uint32_t)ContextVec.size(); I++) {
		if (OContextStr.str().size()) {
		OContextStr << " @ ";
		}

		if (I == ContextVec.size() - 1) {
		// Only keep the function name for the leaf frame
		StringRef Ref(ContextVec[I]);
		OContextStr << Ref.split(":").first.str();
		} else {
		OContextStr << ContextVec[I];
		hoy (Done): `RemoveLeaf` should be checked here?
		}
		}

		return OContextStr.str();
		}

void ProfiledBinary::setPreferredBaseAddress(const ELFObjectFileBase *Obj) {		void ProfiledBinary::setPreferredBaseAddress(const ELFObjectFileBase *Obj) {
for (section_iterator SI = Obj->section_begin(), SE = Obj->section_end();		for (section_iterator SI = Obj->section_begin(), SE = Obj->section_end();
SI != SE; ++SI) {		SI != SE; ++SI) {
const SectionRef &Section = *SI;		const SectionRef &Section = *SI;
if (Section.isText()) {		if (Section.isText()) {
PreferredBaseAddress = getELFImageLMAForSec(Section);		PreferredBaseAddress = getELFImageLMAForSec(Section);
return;		return;
}		}
[26 lines hidden]	while (Offset < EndOffset) {
// Disassemble an instruction.		// Disassemble an instruction.
if (!DisAsm->getInstruction(Inst, Size, Bytes.slice(Offset - SectionOffset),		if (!DisAsm->getInstruction(Inst, Size, Bytes.slice(Offset - SectionOffset),
Offset + PreferredBaseAddress, nulls()))		Offset + PreferredBaseAddress, nulls()))
return false;		return false;

if (ShowDisassembly) {		if (ShowDisassembly) {
outs() << format("%8" PRIx64 ":", Offset);		outs() << format("%8" PRIx64 ":", Offset);
size_t Start = outs().tell();		size_t Start = outs().tell();
IP->printInst(&Inst, Offset + Size, "", *STI.get(), outs());		IPrinter->printInst(&Inst, Offset + Size, "", *STI.get(), outs());
if (ShowSourceLocations) {		if (ShowSourceLocations) {
unsigned Cur = outs().tell() - Start;		unsigned Cur = outs().tell() - Start;
if (Cur < 40)		if (Cur < 40)
outs().indent(40 - Cur);		outs().indent(40 - Cur);
InstructionPointer Inst(this, Offset);		InstructionPointer Inst(this, Offset);
outs() << getReversedLocWithContext(symbolize(Inst));		outs() << getReversedLocWithContext(symbolize(Inst));
}		}
outs() << "\n";		outs() << "\n";
}		}

const MCInstrDesc &MCDesc = MII->get(Inst.getOpcode());		const MCInstrDesc &MCDesc = MII->get(Inst.getOpcode());

		// Populate a vector of the symbolized callsite at this location
		InstructionPointer IP(this, Offset);
		Offset2LocStackMap[Offset] = symbolize(IP, true);

// Populate address maps.		// Populate address maps.
CodeAddrs.push_back(Offset);		CodeAddrs.push_back(Offset);
if (MCDesc.isCall())		if (MCDesc.isCall())
CallAddrs.insert(Offset);		CallAddrs.insert(Offset);
else if (MCDesc.isReturn())		else if (MCDesc.isReturn())
RetAddrs.insert(Offset);		RetAddrs.insert(Offset);

Offset += Size;		Offset += Size;
[35 lines hidden]	void ProfiledBinary::setUpDisassembler(const ELFObjectFileBase *Obj) {
MOFI.InitMCObjectFileInfo(Triple(TripleName), false, Ctx);		MOFI.InitMCObjectFileInfo(Triple(TripleName), false, Ctx);
DisAsm.reset(TheTarget->createMCDisassembler(*STI, Ctx));		DisAsm.reset(TheTarget->createMCDisassembler(*STI, Ctx));
if (!DisAsm)		if (!DisAsm)
exitWithError("no disassembler for target " + TripleName, FileName);		exitWithError("no disassembler for target " + TripleName, FileName);

MIA.reset(TheTarget->createMCInstrAnalysis(MII.get()));		MIA.reset(TheTarget->createMCInstrAnalysis(MII.get()));

int AsmPrinterVariant = AsmInfo->getAssemblerDialect();		int AsmPrinterVariant = AsmInfo->getAssemblerDialect();
IP.reset(TheTarget->createMCInstPrinter(Triple(TripleName), AsmPrinterVariant,		IPrinter.reset(TheTarget->createMCInstPrinter(
AsmInfo, MII, *MRI));		Triple(TripleName), AsmPrinterVariant, AsmInfo, MII, *MRI));
IP->setPrintBranchImmAsAddress(true);		IPrinter->setPrintBranchImmAsAddress(true);
}		}

void ProfiledBinary::disassemble(const ELFObjectFileBase *Obj) {		void ProfiledBinary::disassemble(const ELFObjectFileBase *Obj) {
// Set up disassembler and related components.		// Set up disassembler and related components.
setUpDisassembler(Obj);		setUpDisassembler(Obj);

// Create a mapping from virtual address to symbol name. The symbols in text		// Create a mapping from virtual address to symbol name. The symbols in text
// sections are the candidates to dissassemble.		// sections are the candidates to dissassemble.
[42 lines hidden]	ArrayRef<uint8_t> Bytes =
arrayRefFromStringRef(unwrapOrError(Section.getContents(), FileName));		arrayRefFromStringRef(unwrapOrError(Section.getContents(), FileName));

// Get the list of all the symbols in this section.		// Get the list of all the symbols in this section.
SectionSymbolsTy &Symbols = AllSymbols[Section];		SectionSymbolsTy &Symbols = AllSymbols[Section];

// Disassemble symbol by symbol.		// Disassemble symbol by symbol.
for (std::size_t SI = 0, SE = Symbols.size(); SI != SE; ++SI) {		for (std::size_t SI = 0, SE = Symbols.size(); SI != SE; ++SI) {
if (!dissassembleSymbol(SI, Bytes, Symbols, Section))		if (!dissassembleSymbol(SI, Bytes, Symbols, Section))
exitWithError("disassembling error", FileName);		exitWithError("disassembling error", FileName);
		hoy (Not Done): Please comment here and below that the prolog/epilog built here is based on an estimated size. Dwarf decoding is needed for building a precise prolog/epilog. Also, how about separating the code to form a prolog/epilog builder that can be based on `FuncStartAddrMap`? The builder will be based on Dwarf CFI in the future.
		wenlei (Not Done): Agree it's cleaner to decouple from the main disasm loop. For now a separate function `trackPrologEpilog` should be enough. (By the way, if we need to extend to a class in the future, tracker may be a better name than builder.)
}		}
}		}
}		}

void ProfiledBinary::setupSymbolizer() {		void ProfiledBinary::setupSymbolizer() {
symbolize::LLVMSymbolizer::Options SymbolizerOpts;		symbolize::LLVMSymbolizer::Options SymbolizerOpts;
SymbolizerOpts.PrintFunctions =		SymbolizerOpts.PrintFunctions =
DILineInfoSpecifier::FunctionNameKind::LinkageName;		DILineInfoSpecifier::FunctionNameKind::LinkageName;
SymbolizerOpts.Demangle = false;		SymbolizerOpts.Demangle = false;
SymbolizerOpts.DefaultArch = TheTriple.getArchName().str();		SymbolizerOpts.DefaultArch = TheTriple.getArchName().str();
SymbolizerOpts.UseSymbolTable = false;		SymbolizerOpts.UseSymbolTable = false;
SymbolizerOpts.RelativeAddresses = false;		SymbolizerOpts.RelativeAddresses = false;
Symbolizer = std::make_unique<symbolize::LLVMSymbolizer>(SymbolizerOpts);		Symbolizer = std::make_unique<symbolize::LLVMSymbolizer>(SymbolizerOpts);
}		}

FrameLocationStack ProfiledBinary::symbolize(const InstructionPointer &IP) {		FrameLocationStack ProfiledBinary::symbolize(const InstructionPointer &IP,
		bool UseCanonicalFnName) {
assert(this == IP.Binary &&		assert(this == IP.Binary &&
"Binary should only symbolize its own instruction");		"Binary should only symbolize its own instruction");
auto Addr = object::SectionedAddress{IP.Offset + PreferredBaseAddress,		auto Addr = object::SectionedAddress{IP.Offset + PreferredBaseAddress,
object::SectionedAddress::UndefSection};		object::SectionedAddress::UndefSection};
DIInliningInfo InlineStack =		DIInliningInfo InlineStack =
unwrapOrError(Symbolizer->symbolizeInlinedCode(Path, Addr), getName());		unwrapOrError(Symbolizer->symbolizeInlinedCode(Path, Addr), getName());

FrameLocationStack CallStack;		FrameLocationStack CallStack;

for (int32_t I = InlineStack.getNumberOfFrames() - 1; I >= 0; I--) {		for (int32_t I = InlineStack.getNumberOfFrames() - 1; I >= 0; I--) {
const auto &CallerFrame = InlineStack.getFrame(I);		const auto &CallerFrame = InlineStack.getFrame(I);
if (CallerFrame.FunctionName == "<invalid>")		if (CallerFrame.FunctionName == "<invalid>")
break;		break;
		StringRef FunctionName(CallerFrame.FunctionName);
		if (UseCanonicalFnName)
		FunctionName = FunctionSamples::getCanonicalFnName(FunctionName);
LineLocation Line(CallerFrame.Line - CallerFrame.StartLine,		LineLocation Line(CallerFrame.Line - CallerFrame.StartLine,
CallerFrame.Discriminator);		CallerFrame.Discriminator);
FrameLocation Callsite(CallerFrame.FunctionName, Line);		FrameLocation Callsite(FunctionName.str(), Line);
CallStack.push_back(Callsite);		CallStack.push_back(Callsite);
}		}

return CallStack;		return CallStack;
}		}

		InstructionPointer::InstructionPointer(ProfiledBinary *Binary, uint64_t Address,
		bool RoundToNext)
		: Binary(Binary), Address(Address) {
		Index = Binary->getIndexForAddr(Address);
		if (RoundToNext) {
		// we might get address which is not the code
		// it should round to the next valid address
		this->Address = Binary->getAddressforIndex(Index);
		}
		}

		void InstructionPointer::advance() {
		Index++;
		Address = Binary->getAddressforIndex(Index);
		}

		void InstructionPointer::backward() {
		Index--;
		Address = Binary->getAddressforIndex(Index);
		}

		void InstructionPointer::update(uint64_t Addr) {
		Address = Addr;
		Index = Binary->getIndexForAddr(Address);
		}

} // end namespace sampleprof		} // end namespace sampleprof
} // end namespace llvm		} // end namespace llvm
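
`getExpandedContextStr` and `inlineContextEqual` in the diff above both operate on per-address inline frame stacks. The following is a minimal standalone sketch of the same two ideas under simplifying assumptions. `Frame`, `expandedContext`, and `sameInlineContext` are hypothetical names, the `name:line` callsite format is assumed (the diff only shows that the leaf is cut at the first `:`), and the input is already ordered root to leaf, whereas the patch iterates its leaf-first list in reverse.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical simplified frame: function name plus line offset from the
// function's start line.
struct Frame {
  std::string Func;
  uint32_t LineOffset;
};
// Inline stack for one address: outermost caller first, leaf last.
using FrameStack = std::vector<Frame>;

// Join the expanded frames of every call-stack entry with " @ ". Only the
// function name is kept for the final (leaf) frame.
std::string expandedContext(const std::vector<FrameStack> &RootToLeaf) {
  std::vector<std::string> Parts;
  for (const FrameStack &Stack : RootToLeaf)
    for (const Frame &F : Stack)
      Parts.push_back(F.Func + ":" + std::to_string(F.LineOffset));
  assert(!Parts.empty() && "Context length should be at least 1");

  std::string Out;
  for (std::size_t I = 0; I < Parts.size(); ++I) {
    if (I != 0)
      Out += " @ ";
    if (I + 1 == Parts.size())
      Out += Parts[I].substr(0, Parts[I].find(':')); // leaf: name only
    else
      Out += Parts[I];
  }
  return Out;
}

// Two addresses share an inline context when their stacks match in every
// frame except the leaf, whose location is not part of the calling context.
bool sameInlineContext(const FrameStack &A, const FrameStack &B) {
  if (A.size() != B.size())
    return false;
  for (std::size_t I = 0; I + 1 < A.size(); ++I)
    if (A[I].Func != B[I].Func || A[I].LineOffset != B[I].LineOffset)
      return false;
  return true;
}
```

Unlike the iterator arithmetic in the diff, the loop bound `I + 1 < A.size()` also tolerates empty stacks; that is a choice made for this sketch, not something the patch states.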

llvm/tools/llvm-profgen/llvm-profgen.cpp

	//===- llvm-profgen.cpp - LLVM SPGO profile generation tool ---------------===//			//===- llvm-profgen.cpp - LLVM SPGO profile generation tool ------ C++ --===//
	//			//
	// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.			// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
	// See https://llvm.org/LICENSE.txt for license information.			// See https://llvm.org/LICENSE.txt for license information.
	// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception			// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
	//			//
	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//
	//			//
	// llvm-profgen generates SPGO profiles from perf script ouput.			// llvm-profgen generates SPGO profiles from perf script ouput.
	//			//
	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//

	#include "ErrorHandling.h"			#include "ErrorHandling.h"
	#include "PerfReader.h"			#include "PerfReader.h"
				#include "ProfileGenerator.h"
	#include "ProfiledBinary.h"			#include "ProfiledBinary.h"
	#include "llvm/Support/CommandLine.h"			#include "llvm/Support/CommandLine.h"
	#include "llvm/Support/InitLLVM.h"			#include "llvm/Support/InitLLVM.h"
	#include "llvm/Support/TargetSelect.h"			#include "llvm/Support/TargetSelect.h"

	static cl::list<std::string> PerfTraceFilenames(			static cl::list<std::string> PerfTraceFilenames(
	"perfscript", cl::value_desc("perfscript"), cl::OneOrMore,			"perfscript", cl::value_desc("perfscript"), cl::OneOrMore,
	llvm::cl::MiscFlags::CommaSeparated,			llvm::cl::MiscFlags::CommaSeparated,
	cl::desc("Path of perf-script trace created by Linux perf tool with "			cl::desc("Path of perf-script trace created by Linux perf tool with "
	"`script` command(the raw perf.data should be profiled with -b)"));			"`script` command(the raw perf.data should be profiled with -b)"));

	static cl::list<std::string>			static cl::list<std::string>
	BinaryFilenames("binary", cl::value_desc("binary"), cl::OneOrMore,			BinaryFilenames("binary", cl::value_desc("binary"), cl::OneOrMore,
	llvm::cl::MiscFlags::CommaSeparated,			llvm::cl::MiscFlags::CommaSeparated,
	cl::desc("Path of profiled binary files"));			cl::desc("Path of profiled binary files"));

	static cl::opt<std::string> OutputFilename("output", cl::value_desc("output"),
	cl::Required,
	cl::desc("Output profile file"));

	using namespace llvm;			using namespace llvm;
	using namespace sampleprof;			using namespace sampleprof;

	int main(int argc, const char *argv[]) {			int main(int argc, const char *argv[]) {
	InitLLVM X(argc, argv);			InitLLVM X(argc, argv);

	cl::ParseCommandLineOptions(argc, argv, "llvm SPGO profile generator\n");			cl::ParseCommandLineOptions(argc, argv, "llvm SPGO profile generator\n");

	// Initialize targets and assembly printers/parsers.			// Initialize targets and assembly printers/parsers.
	InitializeAllTargetInfos();			InitializeAllTargetInfos();
	InitializeAllTargetMCs();			InitializeAllTargetMCs();
	InitializeAllDisassemblers();			InitializeAllDisassemblers();

	// Load binaries and parse perf events and samples			// Load binaries and parse perf events and samples
	PerfReader Reader(BinaryFilenames);			PerfReader Reader(BinaryFilenames);
	Reader.parsePerfTraces(PerfTraceFilenames);			Reader.parsePerfTraces(PerfTraceFilenames);

				std::unique_ptr<ProfileGenerator> Generator = ProfileGenerator::create(
				Reader.getBinarySampleCounters(), Reader.getPerfScriptType());
				Generator->generateProfile();
				Generator->write();

	return EXIT_SUCCESS;			return EXIT_SUCCESS;
				wenlei (Done): If we let ProfileGenerator be the driver, I think we should also let ProfileGenerator initiate the perf loading (line 38); otherwise, if we intend to decouple them and let PerfReader read the profile outside of ProfileGenerator, then it's better to pass only the loaded profile to ProfileGenerator for cleaner separation.
				wlei (author, Done): Good suggestion. Changed to not include PerfReader in ProfileGenerator, and I also decoupled the unwinder from the reader. For the unwinder, the input is the aggregated hybrid sample and the output is the sample counters, which are later forwarded to the generator.
				hoy (Done): Perhaps it's better to include the unwinder in the reader since this driver will also handle non-CS profiles in the future. The dataflow from the reader to the profile generator may need a flexible definition (currently it is `Unwinder.getSampleCounters()`) for future extension.
				wenlei (Done): Agreed that the unwinder had better be driven by PerfReader, since the unwinder is something PerfReader depends on directly (vs. depending on its output, like ProfileGenerator on PerfReader's output).
	}			}

Unit Tests: Failed

Revision Contents

Diff 310002

llvm/docs/CommandGuide/llvm-profgen.rst

llvm/include/llvm/ProfileData/SampleProf.h

llvm/lib/ProfileData/SampleProfWriter.cpp

llvm/test/tools/llvm-profgen/Inputs/inline-cs-noprobe.perfbin

llvm/test/tools/llvm-profgen/Inputs/inline-cs-noprobe.perfscript

llvm/test/tools/llvm-profgen/Inputs/noinline-cs-noprobe.perfbin

llvm/test/tools/llvm-profgen/Inputs/noinline-cs-noprobe.perfscript

llvm/test/tools/llvm-profgen/inline-cs-noprobe.test

llvm/test/tools/llvm-profgen/noinline-cs-noprobe.test

llvm/tools/llvm-profgen/CMakeLists.txt

llvm/tools/llvm-profgen/PerfReader.h

llvm/tools/llvm-profgen/PerfReader.cpp

llvm/tools/llvm-profgen/ProfileGenerator.h

llvm/tools/llvm-profgen/ProfileGenerator.cpp

llvm/tools/llvm-profgen/ProfiledBinary.h

llvm/tools/llvm-profgen/ProfiledBinary.cpp

llvm/tools/llvm-profgen/llvm-profgen.cpp
