This is an archive of the discontinued LLVM Phabricator instance.

Paths

Table of Contents

llvm/
  docs/CommandGuide/
    llvm-profgen.rst
  include/llvm/ProfileData/
    SampleProf.h
  lib/ProfileData/
    SampleProf.cpp
    SampleProfWriter.cpp
  test/tools/llvm-profgen/
    Inputs/
      inline-cs-noprobe.perfbin
      inline-cs-noprobe.perfscript
      noinline-cs-noprobe.perfbin
      noinline-cs-noprobe.perfscript
    inline-cs-noprobe.test
    mmapEvent.test
    noinline-cs-noprobe.test
  tools/llvm-profgen/
    CMakeLists.txt
    CallContext.h
    LLVMBuild.txt
    PerfReader.h
    PerfReader.cpp
    ProfileGenerator.h
    ProfileGenerator.cpp
    ProfiledBinary.h
    ProfiledBinary.cpp
    llvm-profgen.cpp

Differential D89723

[CSSPGO][llvm-profgen] Context-sensitive profile data generation
ClosedPublic

Authored by wlei on Oct 19 2020, 12:59 PM.


Details

Reviewers

hoy
wenlei
wmi
davidxl

Commits

rG1f05b1a9f527: [CSSPGO][llvm-profgen] Context-sensitive profile data generation

Summary

This stack of changes introduces the llvm-profgen utility, which generates a profile data file from given perf script data files for sample-based PGO. It is part of (but not limited to) the CSSPGO work. Specifically, to support context-sensitive profiles with or without pseudo probes, it implements a series of functionalities including perf trace parsing, instruction symbolization, LBR stack/call frame stack unwinding, pseudo probe decoding, etc. High throughput is achieved by multiple levels of sample aggregation, and a compatible output format is generated in one stop at the end. Please refer to https://groups.google.com/g/llvm-dev/c/1p1rdYbL93s for the CSSPGO RFC.

This change adds support for context-sensitive profile data generation to llvm-profgen. With simultaneous sampling of the LBR and the call stack, we can identify the leaf of an LBR sample with the calling context from the stack sample. While deriving fall-through paths from LBR entries, we unwind the LBR by replaying all the calls and returns (including implicit calls/returns due to inlining) backwards on top of the sampled call stack. The state of the call stack as we unwind through the LBR then always represents the calling context of the current fall-through path.

We have two types of virtual unwinding: 1) LBR unwinding and 2) linear range unwinding.
Specifically, each LBR entry can be classified as a call, a return, or a regular branch. LBR unwinding replays the operation by pushing, popping, or switching the leaf frame of the call stack. Since the initial call stack is the most recently sampled state, the replay must run in anti-execution order: in the regular case, pop the call stack when the LBR entry is a call, and push a frame onto the call stack when the LBR entry is a return. After each LBR entry is processed, the unwinder also needs to align with the next LBR entry by going through the instructions from the previous LBR's target to the current LBR's source, which we call linear unwinding. As instructions in a linear range can come from different functions due to inlining, linear unwinding splits the range and records counters over each sub-range that shares the same inline context.
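
As a rough illustration of the replay described above, here is a minimal sketch. It is not the patch's VirtualUnwinder: the names `LBREntry`, `RangeSample`, and `unwindSample` are hypothetical, frames are plain addresses, and inlining (implicit calls/returns inside a linear range) is ignored.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical, simplified types; the real tool's classes differ.
enum class BranchKind { Call, Return, Branch };

struct LBREntry {
  uint64_t Source;
  uint64_t Target;
  BranchKind Kind;
};

struct RangeSample {
  std::vector<uint64_t> Callers; // calling context, outermost frame first
  uint64_t Begin;
  uint64_t End;
};

// Stack: sampled call stack, outermost frame first, leaf IP last.
// LBRs:  branch records, most recent first (anti-execution order).
std::vector<RangeSample> unwindSample(std::vector<uint64_t> Stack,
                                      const std::vector<LBREntry> &LBRs) {
  assert(!Stack.empty() && "need at least the sampled leaf IP");
  std::vector<RangeSample> Ranges;
  for (const LBREntry &B : LBRs) {
    // Linear range: code ran from this branch's target up to the leaf IP,
    // under the callers currently on the stack.
    std::vector<uint64_t> Callers(Stack.begin(), Stack.end() - 1);
    Ranges.push_back({Callers, B.Target, Stack.back()});

    switch (B.Kind) {
    case BranchKind::Call:
      // Undo a call: drop the callee frame; the caller becomes the leaf,
      // positioned at the call instruction.
      Stack.pop_back();
      if (Stack.empty())
        Stack.push_back(B.Source);
      else
        Stack.back() = B.Source;
      break;
    case BranchKind::Return:
      // Undo a return: the caller sits at the return target, and the callee
      // frame is pushed back with the return instruction as leaf IP.
      Stack.back() = B.Target;
      Stack.push_back(B.Source);
      break;
    case BranchKind::Branch:
      // Regular branch: only the leaf IP moves.
      Stack.back() = B.Source;
      break;
    }
  }
  return Ranges;
}
```

For a stack `{0x110, 0x250}` and LBR entries (most recent first) branch `0x230 -> 0x240`, call `0x10c -> 0x200`, return `0x320 -> 0x108`, the sketch yields the ranges `[0x240, 0x250]` and `[0x200, 0x230]` under caller `0x110`, then `[0x108, 0x10c]` with no caller.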

With each fall-through path from LBR unwinding, we aggregate each sample into counters keyed by calling context and eventually generate a full context-sensitive profile (without relying on inlining) to drive the compiler's PGO/FDO.

A breakdown of noteworthy changes:

Added the HybridSample class as the abstraction of a perf sample, including the LBR stack and the call stack.
Extended PerfReader to auto-detect whether the input perf script output contains a CS profile and then parse it. Multiple HybridSample objects are extracted.
Sped up processing by aggregating HybridSample into AggregatedSamples.
Added VirtualUnwinder, which consumes aggregated HybridSample and implements unwinding of calls, returns, and linear paths that contain implicit calls/returns from inlining. Range and branch counters are aggregated by calling context. Here the calling context is a string; each frame is a pair of function name and callsite location info, and the whole context looks like main:1 @ foo:2 @ bar.
Added ProfileGenerator, which accumulates counters by range unfolding or branch target mapping, then generates a context-sensitive function profile including the function body, the inferred callee head samples, and the callsite target samples, and eventually records it into ProfileMap.
Leveraged LLVM's built-in writer (SampleProfWriter) to support different serialization formats with no stop.
Used getCanonicalFnName for callee names and names from the ELF section.
Added regression tests for both unwinding and profile generation.
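
To make the string-typed calling context and the per-context aggregation in the list above more concrete, here is a small sketch. `Frame`, `buildContext`, and `ContextCounters` are hypothetical names chosen for illustration; the patch's own data structures may look different.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical frame record: function name plus the callsite location
// (line offset) at which it calls the next frame. Unused for the leaf.
struct Frame {
  std::string Func;
  uint32_t CallsiteLine;
};

// Frames are ordered caller first, leaf last, e.g. "main:1 @ foo:2 @ bar".
std::string buildContext(const std::vector<Frame> &Frames) {
  std::string Context;
  for (std::size_t I = 0; I < Frames.size(); ++I) {
    if (I != 0)
      Context += " @ ";
    Context += Frames[I].Func;
    // Only non-leaf frames carry a callsite location.
    if (I + 1 != Frames.size())
      Context += ":" + std::to_string(Frames[I].CallsiteLine);
  }
  return Context;
}

using AddressRange = std::pair<uint64_t, uint64_t>;
using RangeCounter = std::map<AddressRange, uint64_t>;

// Range counters grouped by calling-context string.
struct ContextCounters {
  std::map<std::string, RangeCounter> Ranges;

  void addRange(const std::string &Context, uint64_t Begin, uint64_t End,
                uint64_t Count) {
    assert(Begin <= End && "range must not be inverted");
    Ranges[Context][AddressRange(Begin, End)] += Count;
  }
};
```

Keying the outer map by the context string means identical ranges reached through different call paths are counted separately, which is the point of a context-sensitive profile.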

Test Plan:
ninja & ninja check-llvm

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline


hoy added inline comments. Nov 5 2020, 1:50 PM

llvm/tools/llvm-profgen/PerfReader.h
182	The comment could be retired since we have a tail call tracker coming that tracks both in-LBR tail calls and out-of-LBR tail calls universally.
llvm/tools/llvm-profgen/llvm-profgen.cpp
45–60	Perhaps it's better to include the unwinder in the reader since this driver will also handle non-CS profiles in future. The dataflow from the reader to the profile generator may need a flexible definition (currently is `Unwinder.getSampleCounters()`) for future extension.

wenlei added inline comments. Nov 5 2020, 3:06 PM

llvm/tools/llvm-profgen/PerfReader.h
182	I think the comment needs to be updated, but explanation here is still needed because IIUC missing frame inference happens more like a post process (hence somewhat orthogonal), and here `isCallState` decides the unwind operation on the stack sample (not changed by frame inference) which will always miss tail call frame (unless dwarf stack walking is used by perf).
llvm/tools/llvm-profgen/llvm-profgen.cpp
45–60	Agreed that unwinder better be driven by PerfReader since unwinder is something PerfReader depends on directly (vs depending on its output like ProfileGenerator on PerfReader's output).

move unwinder into PerfReader
use a BinarytoSampleCounter map to group sample counters by binary
add PrologEpilog tracker
support to use getCanonicalFnName for ELF Section based symbol name
fix a negative line offset bug
other refactoring work

Harbormaster completed remote builds in B79015: Diff 305608. Nov 16 2020, 3:06 PM

rebase

Harbormaster completed remote builds in B79035: Diff 305634. Nov 16 2020, 5:56 PM

hoy added inline comments. Nov 17 2020, 4:40 PM

llvm/tools/llvm-profgen/PerfReader.cpp
486	`PerfType` should be defined on the `else` branch if it is not initialized anywhere else.
llvm/tools/llvm-profgen/PerfReader.h
265	Should this be rewritten with a stream-based file reader as done in D89707?

Address reviewer's feedback on PerfType definition

Harbormaster completed remote builds in B79342: Diff 306182. Nov 18 2020, 12:07 PM

wlei added inline comments. Nov 18 2020, 12:07 PM

llvm/tools/llvm-profgen/PerfReader.h
265	I guess you mean keeping this consistent with other parts of the code? As you can see, it only reads 4000 bytes of data from the file (`getFileOrSTDIN(FileName, 4000);`), so there shouldn't be a memory issue. Currently the stream-based liner only supports reading one line at a time, so it needs to search line by line, which would be slower than searching the whole 4k buffer in memory. So which one do you prefer?

hoy added inline comments. Nov 18 2020, 4:24 PM

llvm/tools/llvm-profgen/PerfReader.h
265	I see. The current implementation looks good to me.

[NFC]rebase

Harbormaster completed remote builds in B79396: Diff 306285. Nov 18 2020, 6:56 PM

wmi added inline comments. Nov 19 2020, 11:22 AM

llvm/tools/llvm-profgen/PerfReader.h
214–215	This virtual unwinder is not doing the classic unwinding thing. It walks through the LBR stack of an LBR sample, based on the sample's callstack, and infers the callstack for each address range covered by the LBR sample. The comment can be clearer about it.
llvm/tools/llvm-profgen/ProfileGenerator.cpp
63–71	Why is there a region [A, B]: 300, while B: (0, 100) only has a sample count of 100?
263	Conext --> Context?

wenlei mentioned this in D90125: [CSSPGO] Infrastructure for context-sensitive Sample PGO and Inlining. Nov 20 2020, 9:53 AM

add more comments for unwinder and BoundaryPoint
remove Skylake only LBR duplication filter

Harbormaster completed remote builds in B79630: Diff 306726. Nov 20 2020, 10:03 AM

wlei marked 43 inline comments as done. Nov 20 2020, 10:07 AM

wlei added inline comments.

llvm/tools/llvm-profgen/PerfReader.h
214–215	Thanks for your suggestion, more comments are added.
llvm/tools/llvm-profgen/ProfileGenerator.cpp
63–71	Sorry for the confusion. See the graph below: here B:(0, 100) is the boundary point; 0 means no samples begin at B, and 100 means one sample (Sample1), whose count is 100, ends at B. I changed the explanation in the comment; see whether it's clear now.
	|<--100-->|            Sample1
	|<------200------>|    Sample2
	A         B       C
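
The boundary-point idea in the reply above can be sketched as follows. This is an illustrative reconstruction under stated assumptions, not the code in ProfileGenerator.cpp: `Range` and `findDisjointRanges` are hypothetical names, ranges are inclusive on both ends, and every input range satisfies `Begin <= End`.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <vector>

// Hypothetical inclusive address range [Begin, End] with a sample count.
struct Range {
  uint64_t Begin;
  uint64_t End;
  uint64_t Count;
};

// Splits overlapping ranges into disjoint ones, summing the counts of all
// input ranges that cover each piece.
std::vector<Range> findDisjointRanges(const std::vector<Range> &Samples) {
  // Per address: total count of ranges beginning here and ending here.
  struct BoundaryPoint {
    uint64_t BeginCount = 0;
    uint64_t EndCount = 0;
  };
  std::map<uint64_t, BoundaryPoint> Boundaries;
  for (const Range &S : Samples) {
    assert(S.Begin <= S.End && "range must not be inverted");
    Boundaries[S.Begin].BeginCount += S.Count;
    Boundaries[S.End].EndCount += S.Count;
  }

  std::vector<Range> Result;
  uint64_t Count = 0;    // count covering the currently open segment
  uint64_t SegBegin = 0; // start address of the currently open segment
  for (const auto &Item : Boundaries) {
    uint64_t Addr = Item.first;
    const BoundaryPoint &P = Item.second;
    if (P.BeginCount) {
      // Close the segment that ran up to just before this new begin point.
      if (Count > 0 && SegBegin < Addr)
        Result.push_back({SegBegin, Addr - 1, Count});
      Count += P.BeginCount;
      SegBegin = Addr;
    }
    if (P.EndCount) {
      // Close the segment that ends exactly at this address.
      Result.push_back({SegBegin, Addr, Count});
      Count -= P.EndCount;
      SegBegin = Addr + 1;
    }
  }
  return Result;
}
```

With Sample1 = [A, B] count 100 and Sample2 = [A, C] count 200, the boundary points are A:(300, 0), B:(0, 100), and C:(0, 200), so the disjoint ranges come out as [A, B]: 300 and [B+1, C]: 200.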

hoy added inline comments. Nov 20 2020, 10:50 AM

llvm/tools/llvm-profgen/PerfReader.cpp
26	Nit: please add a TODO here to check if `Source` is in prolog/epilog using precise prolog/epilog table.
llvm/tools/llvm-profgen/ProfileGenerator.cpp
48	I'm wondering if a separate profile file should be output for each binary. Since the samples are already separated for binaries via `BinarySampleCounters`, `ProfileMap` can be made like that too.
llvm/tools/llvm-profgen/ProfiledBinary.cpp
183	Nit: remove the check and add it back with the compression work.

wenlei added inline comments. Nov 20 2020, 10:55 AM

llvm/test/tools/llvm-profgen/Inputs/noinline-cs-noprobe.perfscript
3	I think we also need to support cases where PERF_RECORD_MMAP2 event isn't available, in which case we just use preferred load address from ELF header. Can you add a test case that doesn't have PERF_RECORD_MMAP2? Looks like currently we would just proceed with parsing without a base address set?
llvm/tools/llvm-profgen/PerfReader.cpp
484–487	What would be the workflow for (non-CS) AutoFDO with this new implementation? It looks like `parseTrace` is responsible for aggregation only, so even for AutoFDO there'll be a post-process after that to get range:count, right? So it looks to me that a unified workflow could be something like this:
	for (auto Filename : PerfTraceFilenames)
	  parseAndAggregateTrace(Filename);
	generateRawProfile();
Inside `generateRawProfile`, we would do simple range overlap computation for AutoFDO, or unwind for CSSPGO. Also see comments on `AggregationCounter`: in addition to unifying the workflow, it would be good to unify the data structure as well if possible. What do you think?
llvm/tools/llvm-profgen/PerfReader.h
211	The idea of aggregation applies to (non-CS) AutoFDO too. It'd be good to put infrastructure in place that can cover both AutoFDO and CSSPGO in a generic way. Perhaps we can treat non-CS AutoFDO profile (or regular LBR perf profile) just like a hybrid profile except stack part is always empty? Is that what you have in mind?

wlei added inline comments. Nov 20 2020, 11:47 AM

llvm/tools/llvm-profgen/ProfileGenerator.cpp
48	Yeah, it's doable, but it needs more command-line design. Currently we only support one output file, so we would have to add support for multiple output files, which also need an exact one-to-one mapping to the binaries. So could we use `OutputFilenames` to receive multiple output files and match them in order on the command line? Or I'm also thinking we just keep this as is, and if users really need to separate the output per binary, they could call the tool multiple times with different input binaries. Any suggestions on the command?

wlei added inline comments. Nov 20 2020, 1:14 PM

llvm/test/tools/llvm-profgen/Inputs/noinline-cs-noprobe.perfscript
3	Yeah, currently PERF_RECORD_MMAP2 is required. The problem with using the preferred load address when there is no mmap event is that one perf address might belong to multiple binaries, which would mess up the whole process. Also, we would need one more scan of the perf trace to confirm there is no mmap2 event so that we can switch to the preferred address. Or we can have a switch like "--no-mmp2-events" to explicitly tell the tool to use the preferred address, and only support one binary under this switch. Or we need some info in the perf trace telling which binary an address belongs to (I remember we discussed this internally). Any suggestions on this?
llvm/tools/llvm-profgen/PerfReader.cpp
484–487	Good suggestion! As you mention, we can incorporate everything into the unwinder by treating a non-CS profile as a hybrid sample with an empty call stack. So how about we do that when implementing the non-CS part? For now I will change the code to something like below:
	void generateRawProfile(..) {
	  if (getPerfScriptType() == PERF_LBR) {
	    // range overlap computation for regular AutoFDO
	    ...
	  } else if (getPerfScriptType() == PERF_LBR_STACK) {
	    // Unwind samples if it's a hybrid sample
	    unwindSamples();
	  }
	}
llvm/tools/llvm-profgen/PerfReader.h
211	Yeah, it should not be specific to the unwinder. I will move it to PerfReader to support both AutoFDO and CSSPGO.

hoy added inline comments. Nov 20 2020, 1:53 PM

llvm/test/tools/llvm-profgen/Inputs/noinline-cs-noprobe.perfscript
3	Maybe the binary lookup table can be pre-filled with the preferred load address when the binary is loaded/constructed. Without mmap2 events in the trace file, subsequent processing will just use the preferred addresses.
llvm/tools/llvm-profgen/ProfileGenerator.cpp
48	I see. Let's keep a single output for now.
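
The pre-fill idea from the first comment above (default to the preferred load address, let an mmap2 event override it) could look roughly like this. `BinaryBaseTable` and its methods are hypothetical names for illustration only; nothing here is taken from the patch.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

// Hypothetical lookup table; names are illustrative, not the tool's API.
class BinaryBaseTable {
  std::map<std::string, uint64_t> BaseAddress;

public:
  // Pre-fill with the preferred load address (e.g. read from the ELF header)
  // when the binary is loaded.
  void addBinary(const std::string &Path, uint64_t PreferredBase) {
    BaseAddress[Path] = PreferredBase;
  }

  // An mmap2 event overrides the default. Returns false for binaries that
  // were never registered.
  bool onMMap2Event(const std::string &Path, uint64_t LoadAddress) {
    auto It = BaseAddress.find(Path);
    if (It == BaseAddress.end())
      return false;
    It->second = LoadAddress;
    return true;
  }

  // Base address used by later processing: the mmap2 address if one was
  // seen, otherwise the preferred address.
  uint64_t baseOf(const std::string &Path) const {
    auto It = BaseAddress.find(Path);
    assert(It != BaseAddress.end() && "binary was never registered");
    return It->second;
  }
};
```

With this shape no extra scan of the trace is needed: if no mmap2 event ever arrives, `baseOf` simply keeps returning the preferred address.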

wenlei added inline comments. Nov 20 2020, 2:24 PM

llvm/tools/llvm-profgen/PerfReader.cpp
484–487	Yes, that looks good for now.

LGTM.

llvm/tools/llvm-profgen/PerfReader.h
214–215	That is helpful. Thanks.
llvm/tools/llvm-profgen/ProfileGenerator.cpp
63–71	It is helpful too. Thanks.

This revision is now accepted and ready to land. Nov 25 2020, 4:32 PM

Herald added a subscriber: lxfind. Nov 25 2020, 4:32 PM

hoy added inline comments. Nov 30 2020, 9:34 AM

llvm/test/tools/llvm-profgen/inline-cs-noprobe.test
28	Can you please add a comment on what compiler command line switches are used to build the source code?
llvm/tools/llvm-profgen/PerfReader.cpp
49	Nit: just use `PrevIP` here instead of using `Start`?
226	Nit: curly braces not needed for single-statement block.
388	Use `exitWithError`?
llvm/tools/llvm-profgen/PerfReader.h
83	Nit: consider using `std::vector` to reduce the number of memory allocations and for better locality.
155	Nit: `const` qualifier for these getters?
309	Nit: `const` qualifier for getters?

wlei added a child revision: D92334: [CSSPGO][llvm-profgen] Pseudo probe decoding and disassembling. Nov 30 2020, 12:24 PM

Address reviewers' feedback: added more comments and some refactoring work

Harbormaster completed remote builds in B80680: Diff 308693. Dec 1 2020, 9:51 AM

wlei marked 11 inline comments as done. Dec 1 2020, 10:17 AM

wlei added inline comments.

llvm/test/tools/llvm-profgen/inline-cs-noprobe.test
28	Good suggestion, comment added
llvm/tools/llvm-profgen/PerfReader.h
83	A list is used here because CallStack has both `push_back` and `push_front` actions; in the future it will switch to a trie.
155	fixed, good suggestion, thanks!

hoy added inline comments. Dec 1 2020, 1:02 PM

llvm/tools/llvm-profgen/PerfReader.h
155	Actually I meant something like:
	ProfiledBinary *getBinary() const { return Binary; }
	bool hasNextLBR() const { return LBRIndex < LBRStack.size(); }
	...
Sorry for the confusion.

add const qualifier for some functions

Harbormaster completed remote builds in B80715: Diff 308758. Dec 1 2020, 1:50 PM

wlei added inline comments. Dec 1 2020, 1:51 PM

llvm/tools/llvm-profgen/PerfReader.h
155	fixed, thanks for clarification!

hoy accepted this revision. Dec 1 2020, 2:05 PM

wenlei added inline comments. Dec 2 2020, 9:56 AM

llvm/test/tools/llvm-profgen/Inputs/noinline-cs-noprobe.perfscript
3	Yeah, what @hoy suggested is what I was thinking about: default to the preferred load address if mmap is absent. We need that, but I think it's fine to deal with it in a separate patch.
llvm/tools/llvm-profgen/PerfReader.h
156	const qualifier here as well?
224	For linear unwinding, some brief explanation for handling of inlining would be helpful too.
llvm/tools/llvm-profgen/ProfileGenerator.cpp
48	What about limiting to a single binary input for now? Error out with a message saying unsupported if multiple binaries are provided. Generating profiles for multiple binaries in a single output file will make the profile summary info inaccurate (e.g. percentile-based hot thresholds).

Address wenlei's feedback

Harbormaster completed remote builds in B81021: Diff 309371. Dec 3 2020, 2:38 PM

This looks great. Thanks for working on this and making all the changes!

wlei retitled this revision from [CSSPGO][llvm-profgen]Context-sensitive profile data generation to [CSSPGO][llvm-profgen] Context-sensitive profile data generation. Dec 7 2020, 1:06 PM

wlei edited the summary of this revision.

rebase and update the diff summary

This revision was landed with ongoing or failed builds. Dec 7 2020, 1:54 PM

Closed by commit rG1f05b1a9f527: [CSSPGO][llvm-profgen] Context-sensitive profile data generation (authored by wlei).

This revision was automatically updated to reflect the committed changes.

wlei added a commit: rG1f05b1a9f527: [CSSPGO][llvm-profgen] Context-sensitive profile data generation.

Harbormaster completed remote builds in B81345: Diff 310002. Dec 7 2020, 2:04 PM

fails here http://lab.llvm.org:8011/#/builders/99/builds/1031

FAIL: LLVM :: tools/llvm-profgen/noinline-cs-noprobe.test (68769 of 72066)
******************** TEST 'LLVM :: tools/llvm-profgen/noinline-cs-noprobe.test' FAILED ********************
Script:
--
: 'RUN: at line 1';   llvm-profgen --perfscript=/b/sanitizer-x86_64-linux-bootstrap/build/llvm-project/llvm/test/tools/llvm-profgen/Inputs/noinline-cs-noprobe.perfscript --binary=/b/sanitizer-x86_64-linux-bootstrap/build/llvm-project/llvm/test/tools/llvm-profgen/Inputs/noinline-cs-noprobe.perfbin --output=/b/sanitizer-x86_64-linux-bootstrap/build/llvm_build_asan/test/tools/llvm-profgen/Output/noinline-cs-noprobe.test.tmp --show-unwinder-output | /b/sanitizer-x86_64-linux-bootstrap/build/llvm_build_asan/bin/FileCheck /b/sanitizer-x86_64-linux-bootstrap/build/llvm-project/llvm/test/tools/llvm-profgen/noinline-cs-noprobe.test --check-prefix=CHECK-UNWINDER
: 'RUN: at line 2';   /b/sanitizer-x86_64-linux-bootstrap/build/llvm_build_asan/bin/FileCheck /b/sanitizer-x86_64-linux-bootstrap/build/llvm-project/llvm/test/tools/llvm-profgen/noinline-cs-noprobe.test --input-file /b/sanitizer-x86_64-linux-bootstrap/build/llvm_build_asan/test/tools/llvm-profgen/Output/noinline-cs-noprobe.test.tmp
--
Exit Code: 1
Command Output (stderr):
--
/b/sanitizer-x86_64-linux-bootstrap/build/llvm-project/llvm/test/tools/llvm-profgen/noinline-cs-noprobe.test:20:19: error: CHECK-UNWINDER: expected string not found in input
; CHECK-UNWINDER: (5b0, 5c8): 1
                  ^
<stdin>:14:2: note: scanning from here
 (5c8, 5dc): 2
 ^
Input file: <stdin>
Check file: /b/sanitizer-x86_64-linux-bootstrap/build/llvm-project/llvm/test/tools/llvm-profgen/noinline-cs-noprobe.test
-dump-input=help explains the following input dump.
Input was:
<<<<<<
          .
          .
          .
          9:  (634, 637): 3
         10:  (645, 645): 3
         11: 
         12: Binary(noinline-cs-noprobe.perfbin)'s Branch Counter:
         13: main:1 @ foo:3 @ bar
         14:  (5c8, 5dc): 2
check:20      X~~~~~~~~~~~~ error: no match found
         15:  (5d7, 5e5): 2
check:20     ~~~~~~~~~~~~~~
         16:  (5e9, 634): 3
check:20     ~~~~~~~~~~~~~~
         17: main:1 @ foo
check:20     ~~~~~~~~~~~~
         18:  (62f, 5b0): 3
check:20     ~~~~~~~~~~~~~~
         19:  (637, 645): 3
check:20     ~~~~~~~~~~~~~~
         20:  (645, 5ff): 3
check:20     ~~~~~~~~~~~~~~
>>>>>>
--
********************
Testing:  0.. 10.. 20.. 30.. 40.. 50.. 60.. 70.. 80.. 90
FAIL: LLVM :: tools/llvm-profgen/inline-cs-noprobe.test (68770 of 72066)
******************** TEST 'LLVM :: tools/llvm-profgen/inline-cs-noprobe.test' FAILED ********************
Script:
--
: 'RUN: at line 1';   llvm-profgen --perfscript=/b/sanitizer-x86_64-linux-bootstrap/build/llvm-project/llvm/test/tools/llvm-profgen/Inputs/inline-cs-noprobe.perfscript --binary=/b/sanitizer-x86_64-linux-bootstrap/build/llvm-project/llvm/test/tools/llvm-profgen/Inputs/inline-cs-noprobe.perfbin --output=/b/sanitizer-x86_64-linux-bootstrap/build/llvm_build_asan/test/tools/llvm-profgen/Output/inline-cs-noprobe.test.tmp --show-unwinder-output | /b/sanitizer-x86_64-linux-bootstrap/build/llvm_build_asan/bin/FileCheck /b/sanitizer-x86_64-linux-bootstrap/build/llvm-project/llvm/test/tools/llvm-profgen/inline-cs-noprobe.test --check-prefix=CHECK-UNWINDER
: 'RUN: at line 2';   /b/sanitizer-x86_64-linux-bootstrap/build/llvm_build_asan/bin/FileCheck /b/sanitizer-x86_64-linux-bootstrap/build/llvm-project/llvm/test/tools/llvm-profgen/inline-cs-noprobe.test --input-file /b/sanitizer-x86_64-linux-bootstrap/build/llvm_build_asan/test/tools/llvm-profgen/Output/inline-cs-noprobe.test.tmp
--
Exit Code: 1
Command Output (stderr):
--
/b/sanitizer-x86_64-linux-bootstrap/build/llvm-project/llvm/test/tools/llvm-profgen/inline-cs-noprobe.test:16:19: error: CHECK-UNWINDER: expected string not found in input
; CHECK-UNWINDER: (670, 6ad): 1
                  ^
<stdin>:12:2: note: scanning from here
 (69b, 670): 1
 ^
Input file: <stdin>
Check file: /b/sanitizer-x86_64-linux-bootstrap/build/llvm-project/llvm/test/tools/llvm-profgen/inline-cs-noprobe.test
-dump-input=help explains the following input dump.
Input was:
<<<<<<
          .
          .
          .
          7: main:1 @ foo:3.2 @ bar
          8:  (6af, 6bb): 14
          9: 
         10: Binary(inline-cs-noprobe.perfbin)'s Branch Counter:
         11: main:1 @ foo
         12:  (69b, 670): 1
check:16      X~~~~~~~~~~~~ error: no match found
         13:  (6c8, 67e): 15
check:16     ~~~~~~~~~~~~~~~
>>>>>>
--

Hi @vitalybuka, sorry for the test failure. The fix-up patch (https://reviews.llvm.org/D92816) has already landed; please update the repo. Thanks!

Revision Contents

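Path	Size
llvm/docs/CommandGuide/llvm-profgen.rst	12 lines
llvm/include/llvm/ProfileData/SampleProf.h	127 lines
llvm/lib/ProfileData/SampleProf.cpp	1 line
llvm/lib/ProfileData/SampleProfWriter.cpp	6 lines
llvm/test/tools/llvm-profgen/Inputs/inline-cs-noprobe.perfbin
llvm/test/tools/llvm-profgen/Inputs/inline-cs-noprobe.perfscript	7 lines
llvm/test/tools/llvm-profgen/Inputs/noinline-cs-noprobe.perfbin
llvm/test/tools/llvm-profgen/Inputs/noinline-cs-noprobe.perfscript	24 lines
llvm/test/tools/llvm-profgen/inline-cs-noprobe.test	47 lines
llvm/test/tools/llvm-profgen/mmapEvent.test	9 lines
llvm/test/tools/llvm-profgen/noinline-cs-noprobe.test	59 lines
llvm/tools/llvm-profgen/CMakeLists.txt	3 lines
llvm/tools/llvm-profgen/CallContext.h	3 lines
llvm/tools/llvm-profgen/LLVMBuild.txt	2 lines
llvm/tools/llvm-profgen/PerfReader.h	301 lines
llvm/tools/llvm-profgen/PerfReader.cpp	496 lines
llvm/tools/llvm-profgen/ProfileGenerator.h	101 lines
llvm/tools/llvm-profgen/ProfileGenerator.cpp	309 lines
llvm/tools/llvm-profgen/ProfiledBinary.h	87 lines
llvm/tools/llvm-profgen/ProfiledBinary.cpp	98 lines
llvm/tools/llvm-profgen/llvm-profgen.cpp	199 lines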

Diff 302315

llvm/docs/CommandGuide/llvm-profgen.rst

  llvm-profgen - LLVM SPGO profile generation tool
  =================================

  .. program:: llvm-profgen

  SYNOPSIS
  --------

  :program:`llvm-profgen` [commands] [options]

  DESCRIPTION
  -----------

  The :program:`llvm-profgen` utility generates a profile data file
  from given perf script data files for sampling-based profile-guided
  optimization(SPGO).

  COMMANDS
  --------
  At least one of the following commands are required:

  .. option:: --perfscript=<string[,string,...]>

  Path of perf-script trace created by Linux perf tool with `script`
  command(the raw perf.data should be profiled with -b).

  .. option:: --binary=<string[,string,...]>

  Path of the input profiled binary files.

  .. option:: --output=<string>

  Path of the output profile file.

  OPTIONS
  -------
  :program:`llvm-profgen` supports the following options:

+ .. option:: --format=[text|binary|extbinary|compbinary|gcc]
+
+ Specify the format of the generated profile. Supported <format> are `text`,
+ `binary`, `extbinary`, `compbinary`, `gcc`, see `llvm-profdata` for more
+ descriptions of the format.

  .. option:: --show-mmap-events

  Print mmap events.

  .. option:: --show-disassembly

  Print disassembled code.

  .. option:: --x86-asm-syntax=[att|intel]

  Specify whether to print assembly code in AT&T syntax (the default) or Intel
  syntax.

llvm/include/llvm/ProfileData/SampleProf.h

  ... (236 lines hidden; enclosing context: struct LineLocation {) ...
    void print(raw_ostream &OS) const;
    void dump() const;

    bool operator<(const LineLocation &O) const {
      return LineOffset < O.LineOffset ||
             (LineOffset == O.LineOffset && Discriminator < O.Discriminator);
    }

+   bool operator==(const LineLocation &O) const {
+     return LineOffset == O.LineOffset && Discriminator == O.Discriminator;
+   }

+   bool operator!=(const LineLocation &O) const {
+     return LineOffset != O.LineOffset || Discriminator != O.Discriminator;
+   }

    uint32_t LineOffset;
    uint32_t Discriminator;
  };

  raw_ostream &operator<<(raw_ostream &OS, const LineLocation &Loc);

  /// Representation of a single sample record.
  ///
  ... (81 lines hidden) ...

  private:
    uint64_t NumSamples = 0;
    CallTargetMap CallTargets;
  };

  raw_ostream &operator<<(raw_ostream &OS, const SampleRecord &Sample);

+ // State of context associated with FunctionSamples
  [wenlei, inline comment (done): Let's separate CSSPGO changes in SampleProf out from the llvm-profgen changes. We'll send CSSPGO compiler/infrastructure patches separately, and then this llvm-profgen patch can depend on that.]
+ enum ContextState {
+   UnknownContext = 0x0,   // Profile without context
+   RawContext = 0x1,       // Full context profile from input profile
+   SyntheticContext = 0x2, // Synthetic context created for context promotion
+   InlinedContext = 0x4,   // Profile for context that is inlined into caller
+   MergedContext = 0x8     // Profile for context merged into base profile
+ };

+ // Calling context for FunctionSamples
+ class SampleContext {
+ public:
+   SampleContext() : State(UnknownContext) {}
+   SampleContext(StringRef ContextStr,
+                 ContextState CState = UnknownContext) {
+     setContext(ContextStr, CState);
+   }

+   void promoteOnPath(StringRef ContextStrToRemove) {
+     assert(FullContext.startswith(ContextStrToRemove));

+     // Remove leading context and frame separator " @ ".
+     FullContext = FullContext.substr(ContextStrToRemove.size() + 3);
+     CallingContext = CallingContext.substr(ContextStrToRemove.size() + 3);
+   }

+   static std::pair<StringRef, StringRef>
+   SplitContextString(StringRef ContextStr) {
+     return ContextStr.split(" @ ");
+   }

+   static void DecodeContextString(StringRef ContextStr, StringRef &FName,
+                                   LineLocation &LineLoc) {
+     // Get function name
+     auto EntrySplit = ContextStr.split(':');
+     FName = EntrySplit.first;

+     LineLoc = {0, 0};
+     if (!EntrySplit.second.empty()) {
+       // Get line offset
+       auto LocSplit = EntrySplit.second.split('.');
+       // Use signed int for getAsInteger so string will be parsed as signed
+       int LineOffset = 0;
+       LocSplit.first.getAsInteger(10, LineOffset);
+       LineLoc.LineOffset = LineOffset;

+       // Get discriminator
+       if (!LocSplit.second.empty())
+         LocSplit.second.getAsInteger(10, LineLoc.Discriminator);
+     }
+   }

+   operator StringRef() const { return FullContext; }
+   bool hasState(ContextState state) { return State & (uint32_t)state; }
+   void setState(ContextState state) { State |= (uint32_t)state; }
+   void clearState(ContextState state) { State &= (uint32_t)~state; }
+   bool hasContext() const { return State != UnknownContext; }
+   bool isBaseContext() const { return CallingContext.empty(); }
+   StringRef getName() const { return Name; }
+   StringRef getCallingContext() const { return CallingContext; }
+   StringRef getNameWithContext() const { return FullContext; }

+ private:
+   void setContext(StringRef ContextStr, ContextState CState) {
+     assert(!ContextStr.empty());
+     bool HasContext = ContextStr.startswith("[");
+     if (!HasContext && CState == UnknownContext) {
+       State = UnknownContext;
+       Name = FullContext = ContextStr;
+     } else {
+       // Assume raw context profile if unspecified
+       if (CState == UnknownContext)
+         State = RawContext;
+       else
+         State = CState;

+       // Remove encapsulating '[' and ']' if any
+       if (HasContext)
+         FullContext = ContextStr.substr(1, ContextStr.size() - 2);
+       else
+         FullContext = ContextStr;

+       // Caller is to the left of callee in context string
+       auto NameContext = FullContext.rsplit(" @ ");
+       if (NameContext.second.empty()) {
+         Name = NameContext.first;
+         CallingContext = NameContext.second;
+       } else {
+         Name = NameContext.second;
+         CallingContext = NameContext.first;
+       }
+     }
+   }

+   StringRef FullContext;
+   StringRef Name;
+   StringRef CallingContext;
+   uint32_t State;
+ };

class FunctionSamples;		class FunctionSamples;
class SampleProfileReaderItaniumRemapper;		class SampleProfileReaderItaniumRemapper;

using BodySampleMap = std::map<LineLocation, SampleRecord>;		using BodySampleMap = std::map<LineLocation, SampleRecord>;
// NOTE: Using a StringMap here makes parsed profiles consume around 17% more		// NOTE: Using a StringMap here makes parsed profiles consume around 17% more
// memory, which is very significant for large profiles.		// memory, which is very significant for large profiles.
using FunctionSamplesMap = std::map<std::string, FunctionSamples, std::less<>>;		using FunctionSamplesMap = std::map<std::string, FunctionSamples, std::less<>>;
using CallsiteSampleMap = std::map<LineLocation, FunctionSamplesMap>;		using CallsiteSampleMap = std::map<LineLocation, FunctionSamplesMap>;
Show All 37 Lines	sampleprof_error addCalledTargetSamples(uint32_t LineOffset,
StringRef FName, uint64_t Num,		StringRef FName, uint64_t Num,
uint64_t Weight = 1) {		uint64_t Weight = 1) {
return BodySamples[LineLocation(LineOffset, Discriminator)].addCalledTarget(		return BodySamples[LineLocation(LineOffset, Discriminator)].addCalledTarget(
FName, Num, Weight);		FName, Num, Weight);
}		}

/// Return the number of samples collected at the given location.		/// Return the number of samples collected at the given location.
/// Each location is specified by \p LineOffset and \p Discriminator.		/// Each location is specified by \p LineOffset and \p Discriminator.
/// If the location is not found in profile, return error.		/// If the location is not found in profile, return error.
		wenleiUnsubmitted Done Reply Inline Actions This can now be merged with `getEntrySamples`, with dispatching based on `ProfileIsCS`. wenlei: This can now be merged with `getEntrySamples`, with dispatching based on `ProfileIsCS`.
ErrorOr<uint64_t> findSamplesAt(uint32_t LineOffset,		ErrorOr<uint64_t> findSamplesAt(uint32_t LineOffset,
uint32_t Discriminator) const {		uint32_t Discriminator) const {
const auto &ret = BodySamples.find(LineLocation(LineOffset, Discriminator));		const auto &ret = BodySamples.find(LineLocation(LineOffset, Discriminator));
if (ret == BodySamples.end())		if (ret == BodySamples.end())
return std::error_code();		return std::error_code();
else		else
return ret->second.getSamples();		return ret->second.getSamples();
}		}
		wenlei (Done): We probably shouldn't arbitrarily assume live for a general helper function. The logic to assume live can be moved to the caller if needed.

/// Returns the call target map collected at a given location.		/// Returns the call target map collected at a given location.
		wenlei (Done): nit: there's no "average" now with this version.
/// Each location is specified by \p LineOffset and \p Discriminator.		/// Each location is specified by \p LineOffset and \p Discriminator.
/// If the location is not found in profile, return error.		/// If the location is not found in profile, return error.
ErrorOr<SampleRecord::CallTargetMap>		ErrorOr<SampleRecord::CallTargetMap>
findCallTargetMapAt(uint32_t LineOffset, uint32_t Discriminator) const {		findCallTargetMapAt(uint32_t LineOffset, uint32_t Discriminator) const {
const auto &ret = BodySamples.find(LineLocation(LineOffset, Discriminator));		const auto &ret = BodySamples.find(LineLocation(LineOffset, Discriminator));
if (ret == BodySamples.end())		if (ret == BodySamples.end())
return std::error_code();		return std::error_code();
return ret->second.getCallTargets();		return ret->second.getCallTargets();
Show All 33 Lines	public:
/// instruction of the symbol. But as we directly get this info for raw		/// instruction of the symbol. But as we directly get this info for raw
/// profile without referring to potentially inaccurate debug info, this		/// profile without referring to potentially inaccurate debug info, this
/// gives more accurate profile data and is preferred for standalone symbols.		/// gives more accurate profile data and is preferred for standalone symbols.
uint64_t getHeadSamples() const { return TotalHeadSamples; }		uint64_t getHeadSamples() const { return TotalHeadSamples; }

/// Return the sample count of the first instruction of the function.		/// Return the sample count of the first instruction of the function.
/// The function can be either a standalone symbol or an inlined function.		/// The function can be either a standalone symbol or an inlined function.
uint64_t getEntrySamples() const {		uint64_t getEntrySamples() const {
		if (FunctionSamples::ProfileIsCS && getHeadSamples()) {
		// For CS profile, if we already have more accurate head samples
		// counted by branch sample from caller, use them as entry samples.
		return getHeadSamples();
		}
uint64_t Count = 0;		uint64_t Count = 0;
// Use either BodySamples or CallsiteSamples which ever has the smaller		// Use either BodySamples or CallsiteSamples which ever has the smaller
// lineno.		// lineno.
if (!BodySamples.empty() &&		if (!BodySamples.empty() &&
(CallsiteSamples.empty() ||		(CallsiteSamples.empty() ||
BodySamples.begin()->first < CallsiteSamples.begin()->first))		BodySamples.begin()->first < CallsiteSamples.begin()->first))
Count = BodySamples.begin()->second.getSamples();		Count = BodySamples.begin()->second.getSamples();
else if (!CallsiteSamples.empty()) {		else if (!CallsiteSamples.empty()) {
▲ Show 20 Lines • Show All 79 Lines • ▼ Show 20 Lines	public:
}		}

/// Set the name of the function.		/// Set the name of the function.
void setName(StringRef FunctionName) { Name = FunctionName; }		void setName(StringRef FunctionName) { Name = FunctionName; }

/// Return the function name.		/// Return the function name.
StringRef getName() const { return Name; }		StringRef getName() const { return Name; }

		/// Return function name with context.
		StringRef getNameWithContext() const {
		return FunctionSamples::ProfileIsCS ? Context.getNameWithContext() : Name;
		}

/// Return the original function name.		/// Return the original function name.
StringRef getFuncName() const { return getFuncName(Name); }		StringRef getFuncName() const { return getFuncName(Name); }

/// Return the canonical name for a function, taking into account		/// Return the canonical name for a function, taking into account
/// suffix elision policy attributes.		/// suffix elision policy attributes.
static StringRef getCanonicalFnName(const Function &F) {		static StringRef getCanonicalFnName(const Function &F) {
static const char *knownSuffixes[] = { ".llvm.", ".part." };		static const char *knownSuffixes[] = { ".llvm.", ".part." };
auto AttrName = "sample-profile-suffix-elision-policy";		auto AttrName = "sample-profile-suffix-elision-policy";
▲ Show 20 Lines • Show All 52 Lines • ▼ Show 20 Lines	public:
///		///
/// \returns the FunctionSamples pointer to the inlined instance.		/// \returns the FunctionSamples pointer to the inlined instance.
/// If \p Remapper is not nullptr, it will be used to find matching		/// If \p Remapper is not nullptr, it will be used to find matching
/// FunctionSamples with not exactly the same but equivalent name.		/// FunctionSamples with not exactly the same but equivalent name.
const FunctionSamples *findFunctionSamples(		const FunctionSamples *findFunctionSamples(
const DILocation *DIL,		const DILocation *DIL,
SampleProfileReaderItaniumRemapper *Remapper = nullptr) const;		SampleProfileReaderItaniumRemapper *Remapper = nullptr) const;

		static bool ProfileIsCS;

		SampleContext &getContext() const { return Context; }

		void setContext(const SampleContext &FContext) { Context = FContext; }

static SampleProfileFormat Format;		static SampleProfileFormat Format;

/// Whether the profile uses MD5 to represent string.		/// Whether the profile uses MD5 to represent string.
static bool UseMD5;		static bool UseMD5;

/// GUIDToFuncNameMap saves the mapping from GUID to the symbol name, for		/// GUIDToFuncNameMap saves the mapping from GUID to the symbol name, for
/// all the function symbols defined or declared in current module.		/// all the function symbols defined or declared in current module.
DenseMap<uint64_t, StringRef> *GUIDToFuncNameMap = nullptr;		DenseMap<uint64_t, StringRef> *GUIDToFuncNameMap = nullptr;

// Assume the input \p Name is a name coming from FunctionSamples itself.		// Assume the input \p Name is a name coming from FunctionSamples itself.
// If UseMD5 is true, the name is already a GUID and we		// If UseMD5 is true, the name is already a GUID and we
// don't want to return the GUID of GUID.		// don't want to return the GUID of GUID.
static uint64_t getGUID(StringRef Name) {		static uint64_t getGUID(StringRef Name) {
return UseMD5 ? std::stoull(Name.data()) : Function::getGUID(Name);		return UseMD5 ? std::stoull(Name.data()) : Function::getGUID(Name);
}		}

// Find all the names in the current FunctionSamples including names in		// Find all the names in the current FunctionSamples including names in
// all the inline instances and names of call targets.		// all the inline instances and names of call targets.
void findAllNames(DenseSet<StringRef> &NameSet) const;		void findAllNames(DenseSet<StringRef> &NameSet) const;

private:		private:
/// Mangled name of the function.		/// Mangled name of the function.
StringRef Name;		StringRef Name;

		/// Calling context for function profile
		mutable SampleContext Context;

/// Total number of samples collected inside this function.		/// Total number of samples collected inside this function.
///		///
/// Samples are cumulative, they include all the samples collected		/// Samples are cumulative, they include all the samples collected
/// inside this function and all its inlined callees.		/// inside this function and all its inlined callees.
uint64_t TotalSamples = 0;		uint64_t TotalSamples = 0;

/// Total number of samples collected at the head of the function.		/// Total number of samples collected at the head of the function.
/// This is an approximation of the number of calls made to this function		/// This is an approximation of the number of calls made to this function
▲ Show 20 Lines • Show All 99 Lines • Show Last 20 Lines

llvm/lib/ProfileData/SampleProf.cpp

	Show All 25 Lines
	#include <system_error>			#include <system_error>

	using namespace llvm;			using namespace llvm;
	using namespace sampleprof;			using namespace sampleprof;

	namespace llvm {			namespace llvm {
	namespace sampleprof {			namespace sampleprof {
	SampleProfileFormat FunctionSamples::Format;			SampleProfileFormat FunctionSamples::Format;
				bool FunctionSamples::ProfileIsCS = false;
	bool FunctionSamples::UseMD5;			bool FunctionSamples::UseMD5;
	} // namespace sampleprof			} // namespace sampleprof
	} // namespace llvm			} // namespace llvm

	namespace {			namespace {

	// FIXME: This class is only here to support the transition to llvm::Error. It			// FIXME: This class is only here to support the transition to llvm::Error. It
	// will be removed once this transition is complete. Clients should prefer to			// will be removed once this transition is complete. Clients should prefer to
	▲ Show 20 Lines • Show All 249 Lines • Show Last 20 Lines

llvm/lib/ProfileData/SampleProfWriter.cpp

	Show First 20 Lines • Show All 240 Lines • ▼ Show 20 Lines
	/// Note: it may be tempting to implement this in terms of			/// Note: it may be tempting to implement this in terms of
	/// FunctionSamples::print(). Please don't. The dump functionality is intended			/// FunctionSamples::print(). Please don't. The dump functionality is intended
	/// for debugging and has no specified form.			/// for debugging and has no specified form.
	///			///
	/// The format used here is more structured and deliberate because			/// The format used here is more structured and deliberate because
	/// it needs to be parsed by the SampleProfileReaderText class.			/// it needs to be parsed by the SampleProfileReaderText class.
	std::error_code SampleProfileWriterText::writeSample(const FunctionSamples &S) {			std::error_code SampleProfileWriterText::writeSample(const FunctionSamples &S) {
	auto &OS = *OutputStream;			auto &OS = *OutputStream;
				if (FunctionSamples::ProfileIsCS)
				OS << "[" << S.getNameWithContext() << "]:" << S.getTotalSamples();
				else
	OS << S.getName() << ":" << S.getTotalSamples();			OS << S.getName() << ":" << S.getTotalSamples();

	if (Indent == 0)			if (Indent == 0)
	OS << ":" << S.getHeadSamples();			OS << ":" << S.getHeadSamples();
	OS << "\n";			OS << "\n";

	SampleSorter<LineLocation, SampleRecord> SortedSamples(S.getBodySamples());			SampleSorter<LineLocation, SampleRecord> SortedSamples(S.getBodySamples());
	for (const auto &I : SortedSamples.get()) {			for (const auto &I : SortedSamples.get()) {
	LineLocation Loc = I->first;			LineLocation Loc = I->first;
	const SampleRecord &Sample = I->second;			const SampleRecord &Sample = I->second;
	▲ Show 20 Lines • Show All 372 Lines • Show Last 20 Lines
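
The `writeSample` change above wraps the name in square brackets for context-sensitive profiles and otherwise prints the plain function name. A rough standalone sketch of just that header line, using `std::ostringstream` instead of the writer's output stream; `formatProfileHeader` and its `IsTopLevel` parameter (standing in for the `Indent == 0` check) are hypothetical names for illustration, and the trailing newline is omitted:

```cpp
#include <cstdint>
#include <sstream>
#include <string>

// Hypothetical helper (not from the patch): builds the first line of a
// text profile record. CS profiles use "[context]:total:head"; regular
// profiles use "name:total:head". Head samples are appended only for
// top-level records, mirroring the Indent == 0 check in the diff.
std::string formatProfileHeader(const std::string &Name,
                                uint64_t TotalSamples, uint64_t HeadSamples,
                                bool ProfileIsCS, bool IsTopLevel = true) {
  std::ostringstream OS;
  if (ProfileIsCS)
    OS << "[" << Name << "]:" << TotalSamples;
  else
    OS << Name << ":" << TotalSamples;
  if (IsTopLevel)
    OS << ":" << HeadSamples;
  return OS.str();
}
```

With the values checked in `noinline-cs-noprobe.test`, `formatProfileHeader("main:1 @ foo:3 @ bar", 50, 3, true)` produces `[main:1 @ foo:3 @ bar]:50:3`.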

llvm/test/tools/llvm-profgen/Inputs/inline-cs-noprobe.perfbin

llvm/test/tools/llvm-profgen/Inputs/inline-cs-noprobe.perfscript

This file was added.

				Using perf wrapper that supports hot-text. Try perf.real if you encounter any issues.
				PERF_RECORD_MMAP2 2854748/2854748: [0x400000(0x1000) @ 0 00:1d 123291722 526021]: r-xp /home/inline-cs-noprobe.perfbin


				40067e
				5541f689495641d7
				0x4006c8/0x40067e/P/-/-/0 0x4006c8/0x40067e/P/-/-/0 0x4006c8/0x40067e/P/-/-/0 0x4006c8/0x40067e/P/-/-/0 0x4006c8/0x40067e/P/-/-/0 0x4006c8/0x40067e/P/-/-/0 0x4006c8/0x40067e/P/-/-/0 0x4006c8/0x40067e/P/-/-/0 0x4006c8/0x40067e/P/-/-/0 0x4006c8/0x40067e/P/-/-/0 0x4006c8/0x40067e/P/-/-/0 0x4006c8/0x40067e/P/-/-/0 0x40069b/0x400670/M/-/-/0 0x4006c8/0x40067e/P/-/-/0 0x4006c8/0x40067e/P/-/-/0 0x4006c8/0x40067e/P/-/-/0

llvm/test/tools/llvm-profgen/Inputs/noinline-cs-noprobe.perfbin

llvm/test/tools/llvm-profgen/Inputs/noinline-cs-noprobe.perfscript

This file was added.

				Using perf wrapper that supports hot-text. Try perf.real if you encounter any issues.
				PERF_RECORD_MMAP2 2854748/2854748: [0x400000(0x1000) @ 0 00:1d 123291722 526021]: r-xp /home/noinline-cs-noprobe.perfbin

				wenlei (Not Done): I think we also need to support cases where the PERF_RECORD_MMAP2 event isn't available, in which case we just use the preferred load address from the ELF header. Can you add a test case that doesn't have PERF_RECORD_MMAP2? Looks like currently we would just proceed with parsing without a base address set?
				wlei (Author, Done): Yeah, currently PERF_RECORD_MMAP2 is required. The problem with using the preferred load address when there is no mmap event is that one perf address might belong to multiple binaries, which would mess up the whole process. We would also need one more scan of the perf trace to confirm there is no mmap2 event before switching to the preferred address. Alternatively, we could have a switch like "--no-mmp2-events" to explicitly tell the tool to use the preferred address, and support only one binary under this switch. Or we need some info in the perf trace that tells which binary an address belongs to (I remember we discussed this internally). Any suggestions on this?
				hoy (Not Done): Maybe the binary lookup table can be pre-filled with the preferred load address when the binary is loaded/constructed. Without mmap2 events in the trace file, subsequent processing will just use the preferred addresses.
				wenlei (Not Done): Yeah, what @hoy suggested is what I was thinking about: default to the preferred load address if mmap is absent. We need that, but I think it's fine to deal with it in a separate patch.
				4005dc
				400634
				400684
				7f68c5788793
				0x4005c8/0x4005dc/P/-/-/0 0x40062f/0x4005b0/P/-/-/0 0x400645/0x4005ff/P/-/-/0 0x400637/0x400645/P/-/-/0 0x4005e9/0x400634/P/-/-/0 0x4005d7/0x4005e5/P/-/-/0 0x40062f/0x4005b0/P/-/-/0 0x400645/0x4005ff/P/-/-/0 0x400637/0x400645/P/-/-/0 0x4005e9/0x400634/P/-/-/0 0x4005d7/0x4005e5/P/-/-/0 0x40062f/0x4005b0/P/-/-/0 0x400645/0x4005ff/P/-/-/0 0x400637/0x400645/P/-/-/0 0x4005e9/0x400634/P/-/-/0 0x4005c8/0x4005dc/P/-/-/0

				// Test for leaf frame ending up in prolog
				4005b0
				400684
				7f68c5788793
				0x40062f/0x4005b0/P/-/-/0 0x400645/0x4005ff/P/-/-/0 0x400637/0x400645/P/-/-/0 0x4005e9/0x400634/P/-/-/0 0x4005c8/0x4005dc/P/-/-/0 0x40062f/0x4005b0/P/-/-/0 0x400645/0x4005ff/P/-/-/0 0x400637/0x400645/P/-/-/0 0x4005e9/0x400634/P/-/-/0 0x4005d7/0x4005e5/P/-/-/0 0x40062f/0x4005b0/P/-/-/0 0x400645/0x4005ff/P/-/-/0 0x400637/0x400645/P/-/-/0 0x4005e9/0x400634/P/-/-/0 0x4005d7/0x4005e5/P/-/-/0 0x40062f/0x4005b0/P/-/-/0

				// Call stack:
				// 4005b0 -> start addr of bar
				// 400684 -> address in main
				// LBR Entry: | Source | Target
				// 0x40062f/0x4005b0/P/-/-/0 | callq -132 <bar> | start addr of bar
				// 0x400645/0x4005ff/P/-/-/0 | jmp -75 <foo+0xf> | movl -8(%rbp), %eax
				// 0x400637/0x400645/P/-/-/0 | jmp 9 <foo+0x55> | jmp -75 <foo+0xf>
				// 0x4005e9/0x400634/P/-/-/0 | (bar)retq | next addr of [callq -132 <bar>]
				// 0x4005d7/0x4005e5/P/-/-/0 | jmp 9 <bar+0x35> | movl -4(%rbp), %eax
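
Each LBR entry in the perfscript input above has the form `source/target/...`, with the branch source and target as hex addresses. A minimal sketch of extracting those two addresses from one line of entries, assuming whitespace-separated tokens and ignoring the remaining fields; `LBREntry`, `parseLBREntry`, and `parseLBRLine` are hypothetical names, not the parser in `PerfReader.cpp`:

```cpp
#include <cstddef>
#include <cstdint>
#include <exception>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical record for one branch: where it jumped from and to.
struct LBREntry {
  uint64_t Source = 0;
  uint64_t Target = 0;
};

// Parses one token such as "0x40062f/0x4005b0/P/-/-/0". Only the first
// two '/'-separated fields are read. Returns false on malformed input.
bool parseLBREntry(const std::string &Token, LBREntry &Entry) {
  std::size_t First = Token.find('/');
  if (First == std::string::npos)
    return false;
  std::size_t Second = Token.find('/', First + 1);
  if (Second == std::string::npos)
    return false;
  try {
    // Base 16 accepts an optional "0x" prefix.
    Entry.Source = std::stoull(Token.substr(0, First), nullptr, 16);
    Entry.Target =
        std::stoull(Token.substr(First + 1, Second - First - 1), nullptr, 16);
  } catch (const std::exception &) {
    return false; // not a hex number, or out of range
  }
  return true;
}

// Splits a line on whitespace and keeps every token that parses.
std::vector<LBREntry> parseLBRLine(const std::string &Line) {
  std::vector<LBREntry> Entries;
  std::istringstream In(Line);
  std::string Token;
  while (In >> Token) {
    LBREntry Entry;
    if (parseLBREntry(Token, Entry))
      Entries.push_back(Entry);
  }
  return Entries;
}
```

Calling `parseLBRLine` on the first two entries from the table above gives two records, the first with source `0x40062f` and target `0x4005b0`.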

llvm/test/tools/llvm-profgen/inline-cs-noprobe.test

This file was added.

				; RUN: llvm-profgen --perfscript=%S/Inputs/inline-cs-noprobe.perfscript --binary=%S/Inputs/inline-cs-noprobe.perfbin --output=%t --show-unwinder-output | FileCheck %s --check-prefix=CHECK-UNWINDER
				; RUN: FileCheck %s --input-file %t
				wmi (Done): Is it possible to use a small manually crafted perfscript as input? It is easier to know whether the numbers in the output make sense when the perfscript is small. It will also be easier if something in the test needs to be adjusted in the future.
				wlei (Author, Done): Thanks for your feedback. The input is replaced with a small perfscript containing only one or two samples. A test for the unwinder is also added.

				; CHECK:[main:1 @ foo]:267:0
				; CHECK: 2.2: 12
				; CHECK: 3: 13
				; CHECK: 3.2: 12 bar:12
				; CHECK: 3.4: 1
				; CHECK: 4294967286: 12
				; CHECK:[main:1 @ foo:3.2 @ bar]:72:0
				; CHECK: 1: 12

				; CHECK-UNWINDER: Range Counter:
				; CHECK-UNWINDER: main:1 @ foo:3.2 @ bar
				; CHECK-UNWINDER: (4006af, 4006bb): 12
				; CHECK-UNWINDER: main:1 @ foo
				; CHECK-UNWINDER: (400670, 4006ad): 1
				; CHECK-UNWINDER: (40067e, 40069b): 1
				; CHECK-UNWINDER: (40067e, 4006ad): 11
				; CHECK-UNWINDER: (4006bd, 4006c8): 12

				; CHECK-UNWINDER: Branch Counter:
				; CHECK-UNWINDER: main:1 @ foo
				; CHECK-UNWINDER: (40069b, 400670): 1
				; CHECK-UNWINDER: (4006c8, 40067e): 13

				; original code:
				#include <stdio.h>
				hoy (Done): Can you please add a comment on what compiler command line switches are used to build the source code?
				wlei (Author, Done): Good suggestion, comment added.

				int bar(int x, int y) {
				if (x % 3) {
				return x - y;
				}
				return x + y;
				}

				void foo() {
				int s, i = 0;
				while (i++ < 4000 * 4000)
				if (i % 91) s = bar(i, s); else s += 30;
				printf("sum is %d\n", s);
				}

				int main() {
				foo();
				return 0;
				}

llvm/test/tools/llvm-profgen/mmapEvent.test

	; REQUIRES: x86-registered-target			; REQUIRES: x86-registered-target
	; RUN: llvm-mc -filetype=obj -triple=x86_64 %S/disassemble.s -o %t			; RUN: llvm-mc -filetype=obj -triple=x86_64 %S/disassemble.s -o %t
	; RUN: llvm-profgen --perfscript=%s --binary=%t --output=%t --show-mmap-events | FileCheck %s			; RUN: llvm-profgen --perfscript=%s --binary=%t --output=%t --show-mmap-events | FileCheck %s

	PERF_RECORD_MMAP2 2580483/2580483: [0x400000(0x1000) @ 0 103:01 539973862 1972407324]: r-xp /home/a.out			PERF_RECORD_MMAP2 2580483/2580483: [0x400000(0x1000) @ 0 103:01 539973862 1972407324]: r-xp /home/a.out
	PERF_RECORD_MMAP2 2580483/2580483: [0x7f2505b40000(0x224000) @ 0 08:04 19532214 4169021329]: r-xp /usr/lib64/ld-2.17.so			PERF_RECORD_MMAP2 2580483/2580483: [0x7f2505b40000(0x224000) @ 0 08:04 19532214 4169021329]: r-xp /usr/lib64/ld-2.17.so
	PERF_RECORD_MMAP2 2580483/2580483: [0x7ffe88097000(0x1000) @ 0 00:00 0 0]: r-xp [vdso]			PERF_RECORD_MMAP2 2580483/2580483: [0x7ffe88097000(0x1000) @ 0 00:00 0 0]: r-xp [vdso]
	PERF_RECORD_MMAP2 2580483/2580483: [0x7f2505d56000(0xa000) @ 0 08:04 19530021 4190740662]: r-xp /usr/lib64/perf_fopen_hook.so			PERF_RECORD_MMAP2 2580483/2580483: [0x7f2505d56000(0xa000) @ 0 08:04 19530021 4190740662]: r-xp /usr/lib64/perf_fopen_hook.so
	PERF_RECORD_MMAP2 2580483/2580483: [0x7f250593c000(0x204000) @ 0 08:04 19532229 3585508847]: r-xp /usr/lib64/libdl-2.17.so			PERF_RECORD_MMAP2 2580483/2580483: [0x7f250593c000(0x204000) @ 0 08:04 19532229 3585508847]: r-xp /usr/lib64/libdl-2.17.so
	PERF_RECORD_MMAP2 2580483/2580483: [0x7f250556e000(0x3ce000) @ 0 08:04 19532221 4003737677]: r-xp /usr/lib64/libc-2.17.so			PERF_RECORD_MMAP2 2580483/2580483: [0x7f250556e000(0x3ce000) @ 0 08:04 19532221 4003737677]: r-xp /usr/lib64/libc-2.17.so
	PERF_RECORD_MMAP2 2580483/2580483: [0x7f2505358000(0x216000) @ 0 08:04 19534595 2609212015]: r-xp /usr/lib64/libz.so.1.2.7			PERF_RECORD_MMAP2 2580483/2580483: [0x7f2505358000(0x216000) @ 0 08:04 19534595 2609212015]: r-xp /usr/lib64/libz.so.1.2.7
	7f2505b49811 0x7f2505b49811/0x7f2505b509f0/P/-/-/0 0x7f2505b4974c/0x7f2505b4975b/P/-/-/0 0x7f2505b49837/0x7f2505b49720/P/-/-/0 0x7f2505b50a5a/0x7f2505b49816/P/-/-/0 0x7f2505b50a27/0x7f2505b50a50/P/-/-/0 0x7f2505b50a36/0x7f2505b50a20/P/-/-/0 0x7f2505b59dd0/0x7f2505b50a34/P/-/-/0 0x7f2505b59db4/0x7f2505b59dc3/P/-/-/0 0x7f2505b50a2f/0x7f2505b59db0/P/-/-/0 0x7f2505b50a15/0x7f2505b50a29/P/-/-/0 0x7f2505b59dd0/0x7f2505b50a05/P/-/-/0 0x7f2505b59db4/0x7f2505b59dc3/P/-/-/0 0x7f2505b50a00/0x7f2505b59db0/P/-/-/0 0x7f2505b49811/0x7f2505b509f0/P/-/-/0 0x7f2505b4974c/0x7f2505b4975b/P/-/-/0 0x7f2505b4a08a/0x7f2505b496a0/P/-/-/0			7f2505b49811
				0x7f2505b49811/0x7f2505b509f0/P/-/-/0 0x7f2505b4974c/0x7f2505b4975b/P/-/-/0 0x7f2505b49837/0x7f2505b49720/P/-/-/0 0x7f2505b50a5a/0x7f2505b49816/P/-/-/0 0x7f2505b50a27/0x7f2505b50a50/P/-/-/0 0x7f2505b50a36/0x7f2505b50a20/P/-/-/0 0x7f2505b59dd0/0x7f2505b50a34/P/-/-/0 0x7f2505b59db4/0x7f2505b59dc3/P/-/-/0 0x7f2505b50a2f/0x7f2505b59db0/P/-/-/0 0x7f2505b50a15/0x7f2505b50a29/P/-/-/0 0x7f2505b59dd0/0x7f2505b50a05/P/-/-/0 0x7f2505b59db4/0x7f2505b59dc3/P/-/-/0 0x7f2505b50a00/0x7f2505b59db0/P/-/-/0 0x7f2505b49811/0x7f2505b509f0/P/-/-/0 0x7f2505b4974c/0x7f2505b4975b/P/-/-/0 0x7f2505b4a08a/0x7f2505b496a0/P/-/-/0
	PERF_RECORD_MMAP2 2580483/2580483: [0x7f2505d56000(0x8000) @ 0 08:04 19530021 4190740662]: r-xp /usr/lib64/perf_fopen_hook.so			PERF_RECORD_MMAP2 2580483/2580483: [0x7f2505d56000(0x8000) @ 0 08:04 19530021 4190740662]: r-xp /usr/lib64/perf_fopen_hook.so
	4006b1 0x4006b1/0x4006a0/P/-/-/0 0x4006b1/0x4006a0/P/-/-/0 0x4006b1/0x4006a0/P/-/-/0 0x4006b1/0x4006a0/P/-/-/0 0x4006b1/0x4006a0/P/-/-/0 0x4006b1/0x4006a0/P/-/-/0 0x4006b1/0x4006a0/P/-/-/0 0x4006b1/0x4006a0/P/-/-/0 0x4006b1/0x4006a0/P/-/-/0 0x4006b1/0x4006a0/P/-/-/0 0x4006b1/0x4006a0/P/-/-/0 0x4006b1/0x4006a0/P/-/-/0 0x4006b1/0x4006a0/P/-/-/0 0x4006b1/0x4006a0/P/-/-/0 0x4006b1/0x4006a0/P/-/-/0 0x4006b1/0x4006a0/P/-/-/0			4006b1
				0x4006b1/0x4006a0/P/-/-/0 0x4006b1/0x4006a0/P/-/-/0 0x4006b1/0x4006a0/P/-/-/0 0x4006b1/0x4006a0/P/-/-/0 0x4006b1/0x4006a0/P/-/-/0 0x4006b1/0x4006a0/P/-/-/0 0x4006b1/0x4006a0/P/-/-/0 0x4006b1/0x4006a0/P/-/-/0 0x4006b1/0x4006a0/P/-/-/0 0x4006b1/0x4006a0/P/-/-/0 0x4006b1/0x4006a0/P/-/-/0 0x4006b1/0x4006a0/P/-/-/0 0x4006b1/0x4006a0/P/-/-/0 0x4006b1/0x4006a0/P/-/-/0 0x4006b1/0x4006a0/P/-/-/0 0x4006b1/0x4006a0/P/-/-/0
	PERF_RECORD_MMAP2 2580483/2580483: [0x7f2505156000(0x202000) @ 0 103:01 539962022 734061270]: r-xp /home/hoy/test/dlopen/helper.so			PERF_RECORD_MMAP2 2580483/2580483: [0x7f2505156000(0x202000) @ 0 103:01 539962022 734061270]: r-xp /home/hoy/test/dlopen/helper.so
	4006b1 0x4006b1/0x4006a0/P/-/-/0 0x4006b1/0x4006a0/P/-/-/0 0x4006b1/0x4006a0/P/-/-/0 0x4006b1/0x4006a0/P/-/-/0 0x4006b1/0x4006a0/P/-/-/0 0x4006b1/0x4006a0/P/-/-/0 0x4006b1/0x4006a0/P/-/-/0 0x4006b1/0x4006a0/P/-/-/0 0x4006b1/0x4006a0/P/-/-/0 0x4006b1/0x4006a0/P/-/-/0 0x4006b1/0x4006a0/P/-/-/0 0x4006b1/0x4006a0/P/-/-/0 0x4006b1/0x4006a0/P/-/-/0 0x4006b1/0x4006a0/P/-/-/0 0x4006b1/0x4006a0/P/-/-/0 0x4006b1/0x4006a0/P/-/-/0			4006b1
				0x4006b1/0x4006a0/P/-/-/0 0x4006b1/0x4006a0/P/-/-/0 0x4006b1/0x4006a0/P/-/-/0 0x4006b1/0x4006a0/P/-/-/0 0x4006b1/0x4006a0/P/-/-/0 0x4006b1/0x4006a0/P/-/-/0 0x4006b1/0x4006a0/P/-/-/0 0x4006b1/0x4006a0/P/-/-/0 0x4006b1/0x4006a0/P/-/-/0 0x4006b1/0x4006a0/P/-/-/0 0x4006b1/0x4006a0/P/-/-/0 0x4006b1/0x4006a0/P/-/-/0 0x4006b1/0x4006a0/P/-/-/0 0x4006b1/0x4006a0/P/-/-/0 0x4006b1/0x4006a0/P/-/-/0 0x4006b1/0x4006a0/P/-/-/0
	PERF_RECORD_MMAP2 2580483/2580483: [0x7f2505156000(0x202000) @ 0 103:01 539962022 734061270]: r-xp /home/hoy/test/dlopen/helper.so			PERF_RECORD_MMAP2 2580483/2580483: [0x7f2505156000(0x202000) @ 0 103:01 539962022 734061270]: r-xp /home/hoy/test/dlopen/helper.so


	; CHECK: Mmap: Binary /home/a.out loaded at 0x400000			; CHECK: Mmap: Binary /home/a.out loaded at 0x400000
	; CHECK: Mmap: Binary /usr/lib64/ld-2.17.so loaded at 0x7f2505b40000			; CHECK: Mmap: Binary /usr/lib64/ld-2.17.so loaded at 0x7f2505b40000
	; CHECK: Mmap: Binary [vdso] loaded at 0x7ffe88097000			; CHECK: Mmap: Binary [vdso] loaded at 0x7ffe88097000
	; CHECK: Mmap: Binary /usr/lib64/perf_fopen_hook.so loaded at 0x7f2505d56000			; CHECK: Mmap: Binary /usr/lib64/perf_fopen_hook.so loaded at 0x7f2505d56000
	; CHECK: Mmap: Binary /usr/lib64/libdl-2.17.so loaded at 0x7f250593c000			; CHECK: Mmap: Binary /usr/lib64/libdl-2.17.so loaded at 0x7f250593c000
	; CHECK: Mmap: Binary /usr/lib64/libc-2.17.so loaded at 0x7f250556e000			; CHECK: Mmap: Binary /usr/lib64/libc-2.17.so loaded at 0x7f250556e000
	; CHECK: Mmap: Binary /usr/lib64/libz.so.1.2.7 loaded at 0x7f2505358000			; CHECK: Mmap: Binary /usr/lib64/libz.so.1.2.7 loaded at 0x7f2505358000
	; CHECK: Mmap: Binary /usr/lib64/perf_fopen_hook.so loaded at 0x7f2505d56000			; CHECK: Mmap: Binary /usr/lib64/perf_fopen_hook.so loaded at 0x7f2505d56000
	; CHECK: Mmap: Binary /home/hoy/test/dlopen/helper.so loaded at 0x7f2505156000			; CHECK: Mmap: Binary /home/hoy/test/dlopen/helper.so loaded at 0x7f2505156000
	; CHECK: Mmap: Binary /home/hoy/test/dlopen/helper.so loaded at 0x7f2505156000			; CHECK: Mmap: Binary /home/hoy/test/dlopen/helper.so loaded at 0x7f2505156000
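
The CHECK lines above pair each `PERF_RECORD_MMAP2` line with the binary path and the load address taken from the bracketed `[base(size) @ ...]` field. A minimal sketch of pulling those fields out of one event line, assuming the layout shown in this test and a path without spaces; `MMapEvent` and `parseMMap2Event` are hypothetical names, and this is not necessarily how `PerfReader.cpp` parses the line:

```cpp
#include <cstddef>
#include <cstdint>
#include <exception>
#include <string>

// Hypothetical record for one mmap event.
struct MMapEvent {
  uint64_t BaseAddress = 0;
  uint64_t Size = 0;
  std::string BinaryPath;
};

// Parses a line such as
//   PERF_RECORD_MMAP2 1/1: [0x400000(0x1000) @ 0 00:1d 1 2]: r-xp /home/a.out
// Returns false for any line that does not match this layout.
bool parseMMap2Event(const std::string &Line, MMapEvent &Event) {
  if (Line.find("PERF_RECORD_MMAP2") == std::string::npos)
    return false;
  std::size_t Open = Line.find('[');
  if (Open == std::string::npos)
    return false;
  std::size_t Paren = Line.find('(', Open);
  if (Paren == std::string::npos)
    return false;
  std::size_t Close = Line.find(')', Paren);
  std::size_t LastSpace = Line.rfind(' ');
  if (Close == std::string::npos || LastSpace == std::string::npos ||
      LastSpace < Close)
    return false;
  try {
    // Both numbers are hex with a "0x" prefix, which base 16 accepts.
    Event.BaseAddress =
        std::stoull(Line.substr(Open + 1, Paren - Open - 1), nullptr, 16);
    Event.Size =
        std::stoull(Line.substr(Paren + 1, Close - Paren - 1), nullptr, 16);
  } catch (const std::exception &) {
    return false;
  }
  // Assumption: the path is the last space-separated field.
  Event.BinaryPath = Line.substr(LastSpace + 1);
  return true;
}
```

Sample lines such as `4006b1` are rejected by the first check, so the same loop can feed every line of the trace through this function and treat a `false` result as "not an mmap event".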

llvm/test/tools/llvm-profgen/noinline-cs-noprobe.test

This file was added.

				; RUN: llvm-profgen --perfscript=%S/Inputs/noinline-cs-noprobe.perfscript --binary=%S/Inputs/noinline-cs-noprobe.perfbin --output=%t --show-unwinder-output | FileCheck %s --check-prefix=CHECK-UNWINDER
				; RUN: FileCheck %s --input-file %t

				; CHECK:[main:1 @ foo]:57:0
				; CHECK: 2: 3
				; CHECK: 3: 3 bar:3
				; CHECK:[main:1 @ foo:3 @ bar]:50:3
				; CHECK: 0: 3
				wmi (Not Done): Is it possible for us to tell whether one level in the context is inlined or not? It will make the profile more informative.
				wenlei (Not Done): Yes, agree that can be useful, especially for tuning purposes, to see how the CS inline decision differs from the previous build. We wanted to add a metadata (similar to `!CFGChecksum` for pseudo-probe profile) to indicate whether a context is inlined or not. Note that in this case, it would only tell whether bar is inlined along `main:1 @ foo:3`, but not whether `foo` is inlined along `main:1`. What do you think? Also, to keep the patch smallish, I think we can add this later separately.
				[main:1 @ foo:3 @ bar]:29103:1745
				...
				...
				!CFGChecksum: ...
				!Flag: Inline
				wmi (Not Done):
				> Note that in this case, it would only tell whether bar is inlined along main:1 @ foo:3, but not whether foo is inlined along main:1. What do you think?
				What is the main difficulty to keep the inline information for each Context level?
				> Also to keep patch smallish, I think we can add this later separately.
				Sure.
				> [main:1 @ foo:3 @ bar]:29103:1745 !CFGChecksum: ... !Flag: Inline
				Can we use some special sign to mark whether bar is inlined or not, "*" for example? [main:1 @ foo:3 @ bar]:29103:1745
				wenlei (Not Done):
				> What is the main difficulty to keep the inline information for each Context level?
				Even if we only mark the leaf frame, a middle frame's inline decision could be found in its own leaf context (if it exists). I think it's also doable if we want to embed the inline decision for each frame in a context (either with metadata or header), but since this is mostly for tuning/debugging, we're trying to keep it at a minimum for now, and we can expand later if needed.
				Now as to header vs. metadata as the carrier. There are a couple of reasons we don't do it in the header. We thought consistency between inline vs non-inline context for main profile and header keeps things clean. In fact, CSSPGO really treats them indifferently. !metadata can be mapped/converted to binary profile with general framework support. Doing it with a special character in the header may require special (not-so-clean) handling for text-binary conversion. There's only so much we can do with a special character in the header, so it's not extensible if we want to encode something else. Initially we had the checksum in the header as well, but later moved it to metadata to keep the header clean, and thought it'd be good to keep all auxiliary data in metadata form.
				wmi (Not Done):
				> We thought consistency between inline vs non-inline context for main profile and header keep things clean. In fact, CSSPGO really treats them indifferently.
				Yes, I understand CSSPGO treats inlined and non-inlined callsites indifferently, and the special character showing whether a callsite is inlined is just for easy debugging. Since it is only for debugging, you just need to add it when you output the profile as text, and you can strip it once and for all when you read the text. That will be a standalone processing step. CSSPGO already includes a lot more useful information than the current SPGO profile; I just hope the CSSPGO afdo file can show us the inline hierarchy as easily as what we currently have.
				wenlei (Not Done):
				> I just hope the CSSPGO afdo file can show us the inline hierarchy as easy as what we current have.
				Yeah, I can see how that can make debugging a bit easier. Perhaps a stack of flags in metadata can do it too if needed. On the other hand, the info is all from dwarf, so what we are discussing is just the visualization. Visualizing the inline hierarchy isn't the responsibility of a profile, but afdo happens to be able to visualize the inline hierarchy in a nice way, though it's still more of a side effect of the way the inline profile is represented. I'd argue that cleanness and consistency probably weigh more than trying to reach parity for that nice side effect (and actually even if we use * for inline frames, it's still not as nice as the afdo profile's tree-style inline hierarchy). Perhaps we can leave this one open for now, and see where actual need leads us?
				wmi (Not Done): Agree. It is not critical. We can leave it open at this moment.
				; CHECK: 1: 3
				; CHECK: 2: 2
				; CHECK: 4: 1
				; CHECK: 5: 3

				; CHECK-UNWINDER: Range Counter:
				; CHECK-UNWINDER: main:1 @ foo
				; CHECK-UNWINDER: (4005ff, 40062f): 3
				; CHECK-UNWINDER: (400634, 400637): 3
				; CHECK-UNWINDER: (400645, 400645): 3
				; CHECK-UNWINDER: main:1 @ foo:3 @ bar
				; CHECK-UNWINDER: (4005b0, 4005c8): 1
				; CHECK-UNWINDER: (4005b0, 4005d7): 2
				; CHECK-UNWINDER: (4005dc, 4005e9): 1
				; CHECK-UNWINDER: (4005e5, 4005e9): 2

				; CHECK-UNWINDER: Branch Counter:
				; CHECK-UNWINDER: main:1 @ foo
				; CHECK-UNWINDER: (40062f, 4005b0): 3
				; CHECK-UNWINDER: (400637, 400645): 3
				; CHECK-UNWINDER: (400645, 4005ff): 3
				; CHECK-UNWINDER: main:1 @ foo:3 @ bar
				; CHECK-UNWINDER: (4005c8, 4005dc): 2
				; CHECK-UNWINDER: (4005d7, 4005e5): 2
				; CHECK-UNWINDER: (4005e9, 400634): 3





				; original code:
				#include <stdio.h>

				int bar(int x, int y) {
				if (x % 3) {
				return x - y;
				}
				return x + y;
				}

				void foo() {
				int s, i = 0;
				while (i++ < 4000 * 4000)
				if (i % 91) s = bar(i, s); else s += 30;
				printf("sum is %d\n", s);
				}

				int main() {
				foo();
				return 0;
				}

llvm/tools/llvm-profgen/CMakeLists.txt

	include_directories(			include_directories(
	${LLVM_MAIN_SRC_DIR}/lib/Target/X86			${LLVM_MAIN_SRC_DIR}/lib/Target/X86
	${LLVM_BINARY_DIR}/lib/Target/X86			${LLVM_BINARY_DIR}/lib/Target/X86
	)			)
	set(LLVM_LINK_COMPONENTS			set(LLVM_LINK_COMPONENTS
	AllTargetsDescs			AllTargetsDescs
	AllTargetsDisassemblers			AllTargetsDisassemblers
	Core			Core
	MC			MC
	MCDisassembler			MCDisassembler
	Object			Object
				ProfileData
	Support			Support
	Symbolize			Symbolize
	)			)

	add_llvm_tool(llvm-profgen			add_llvm_tool(llvm-profgen
	llvm-profgen.cpp			llvm-profgen.cpp
	ProfiledBinary.cpp			ProfiledBinary.cpp
				PerfReader.cpp
				ProfileGenerator.cpp
	)			)

llvm/tools/llvm-profgen/CallContext.h

	//===-- CallContext.h - Call Context Handler -----------------------*- C++			//===-- CallContext.h - Call Context Handler --------------------- C++ --===//
	//-*-===//
	//			//
	// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.			// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
	// See https://llvm.org/LICENSE.txt for license information.			// See https://llvm.org/LICENSE.txt for license information.
	// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception			// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
	//			//
	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//

	#ifndef LLVM_TOOLS_LLVM_PROGEN_CALLCONTEXT_H			#ifndef LLVM_TOOLS_LLVM_PROGEN_CALLCONTEXT_H

llvm/tools/llvm-profgen/LLVMBuild.txt

	; http://llvm.org/docs/LLVMBuild.html			; http://llvm.org/docs/LLVMBuild.html
	;			;
	;===------------------------------------------------------------------------===;			;===------------------------------------------------------------------------===;

	[component_0]			[component_0]
	type = Tool			type = Tool
	name = llvm-profgen			name = llvm-profgen
	parent = Tools			parent = Tools
	required_libraries = DebugInfoDWARF MC MCDisassembler MCParser Object all-targets Demangle Support			required_libraries = DebugInfoDWARF MC MCDisassembler MCParser Object all-targets Demangle ProfileData Support

llvm/tools/llvm-profgen/PerfReader.h

This file was added.

				//===-- PerfReader.h - perfscript reader ------------------------ C++ --===//
				//
				wenlei: Would be good to add more comments for the classes/types defined, especially non-trivial ones.
				// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
				// See https://llvm.org/LICENSE.txt for license information.
				// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
				//
				//===----------------------------------------------------------------------===//

				#ifndef LLVM_TOOLS_LLVM_PROGEN_PROFILEREADER_H
				#define LLVM_TOOLS_LLVM_PROGEN_PROFILEREADER_H
				#include "ErrorHandling.h"
				#include "ProfiledBinary.h"
				#include "llvm/Support/CommandLine.h"
				#include "llvm/Support/LineIterator.h"
				#include "llvm/Support/Regex.h"
				#include <list>
				#include <map>
				#include <vector>

				using namespace llvm;
				using namespace llvm::sampleprof;

				namespace llvm {
				namespace sampleprof {

				// The type of perfscript
				enum PerfScriptType {
				PERF_INVILID = 0,
				PERF_LBR = 1, // Only LBR sample
				PERF_LBR_STACK = 2, // Hybrid sample including call stack and LBR stack.
				};

				// The parsed LBR sample entry.
				struct LBREntry {
				uint64_t Source = 0;
				uint64_t Target = 0;
				// An artificial branch stands for a series of consecutive branches starting
				// from the current binary with a transition through external code and
				// eventually landing back in the current binary.
				bool IsArtificial = false;
				LBREntry(uint64_t S, uint64_t T, bool I)
				: Source(S), Target(T), IsArtificial(I) {}
				};

				hoy: An artificial branch stands for a series of consecutive branches starting from the current binary with a transition through external code and eventually landing back in the current binary.
				// The parsed hybrid sample including call stack and LBR stack.
				struct HybridSample {
				// Profiled binary that current frame address belongs to
				ProfiledBinary *Binary;
				// Call stack recorded in FILO(leaf to root) order
				std::list<uint64_t> CallStack;
				// LBR stack recorded in FIFO order
				SmallVector<LBREntry, 16> LBRStack;

				wenlei: This is just an encapsulation for synchronized stack and lbr sample, and it's orthogonal to unwinder. Suggest decouple it from unwinder in naming, instead call it `HybridSample`.
				// Used for sample aggregation
				bool operator==(const HybridSample &Other) const {

				wenlei: nit: I'd replace anti-execution with bottom-up order (or leaf to root).
				const std::list<uint64_t> &OtherCallStack = Other.CallStack;
				wenlei: Curious why use list for Stack but SmallVector (and Index) for LBRStack? Seem a bit inconsistent for similar use case.
				wlei (author): Currently using list is for easy copy data from UnwinderTrace to UnwindState (see line 90 also list). Why copying the data but not using the Index like LBRStack is because Callstack is changed dynamically during the unwinding (aka need push_front). This is kind of a temporary solution, considering our next step is to use trie node to represent callstack, at that time I will make it consistent to use SmallVector. So ideally UnwinderTrace only keep the data and UnwindState only keep the index/trie node.
				wenlei: Got it, makes sense. Thanks for clarifying.
				const SmallVector<LBREntry, 16> &OtherLBRStack = Other.LBRStack;
				wenlei: nit: anti-execution sounds a bit confusing. I think the canonical term is FIFO order (LBR sampling has two modes, FIFO for trace and FILO for call stack).

				if (CallStack.size() != OtherCallStack.size() ||
				LBRStack.size() != OtherLBRStack.size())
				return false;

				auto Iter = CallStack.begin();
				for (auto Address : OtherCallStack) {
				if (Address != *Iter++)
				return false;
				}

				for (size_t I = 0; I < OtherLBRStack.size(); I++) {
				if (LBRStack[I].Source != OtherLBRStack[I].Source \|\|
				LBRStack[I].Target != OtherLBRStack[I].Target)
				return false;
				}
				return true;
				}
				};

				// The state for the unwinder. It doesn't hold the data but only keeps the
				// pointer/index of the data. While unwinding, the call stack is changed
				// dynamically and will be recorded as the context of the sample.
				struct UnwindState {
				// Profiled binary that current frame address belongs to
				hoy: Nit: consider using `std::vector` to reduce the number of memory allocations and for better locality.
				wlei (author): Here using list is because CallStack has both `push_back` and `push_front` action, in the future it will switch to trie.
				ProfiledBinary *Binary;
				// TODO: switch to use trie for call stack
				std::list<uint64_t> CallStack;
				// Used to fall through the LBR stack
				uint32_t LBRIndex = 0;
				// Reference to HybridSample.LBRStack
				const SmallVector<LBREntry, 16> &LBRStack;
				// Used to iterate the address range
				wenlei: Can this be a reference as well just like `LBRStack` and point to the input Sample? Note header comment also states "it doesn't hold the data but only keep the pointer/index of the data".
				wlei (author): Same here, there is pop_front() with CallStack so that I used list. This will be solved by trie. The comments is the final ideal one..sorry for the confusing.
				wenlei: I see. So because Trace is the key of TraceAggregationMap and we will need to mutate this list, you have to create a copy of what's in UnwindTrace, correct?
				wlei (author): Yes, the map key enforces the const type and it at least copy one to the State even using trie node (for the tie initialization) because of the mutation.
				InstructionPointer InstPtr;
				UnwindState(const HybridSample &Sample)
				: Binary(Sample.Binary), CallStack(Sample.CallStack),
				LBRStack(Sample.LBRStack),
				InstPtr(Sample.Binary, Sample.CallStack.front()) {}

				bool validateInitialState() {
				uint64_t LBRLeaf = LBRStack[LBRIndex].Target;
				uint64_t StackLeaf = CallStack.front();
				if (StackLeaf < LBRLeaf || StackLeaf >= LBRLeaf + 0x100) {
				WithColor::warning() << "Bogus trace: stack tip = "
				<< format("%#010x", StackLeaf)
				<< ", LBR tip = " << format("%#010x\n", LBRLeaf);
				return false;
				wmi: What is the rationale behind the condition?
				wenlei: Ideally we want tip of LBR target and tip of stack leaf to align with the help of PEBS. When we take a stack sample, the leaf IP of stack could be last LBR target address +N bytes, and N shouldn't be too large because N is essentially the sampling skid distance. So I had this in my original prototype as a sanity check to filter out broken records. However the distance chosen was somewhat arbitrary.. In reality, I don't think we've seen this firing with PEBS, but it could happen without PEBS, or if cycles is instead of branch_retired as triggering event.
				wmi: Thanks for the detailed explanation. Copying it to comment will be useful.
				}
				return true;
				}

				void checkStateConsistency() {
				assert(InstPtr.Address == CallStack.front() &&
				"IP should align with context leaf");
				}

				std::string getExpandedContextStr() {
				return Binary->getExpandedContextStr(CallStack, true);
				}
				ProfiledBinary *getBinary() { return Binary; }
				bool hasNextLBR() { return LBRIndex < LBRStack.size(); }
				uint64_t getCurrentLBRSource() { return LBRStack[LBRIndex].Source; }
				uint64_t getCurrentLBRTarget() { return LBRStack[LBRIndex].Target; }
				const LBREntry &getCurrentLBR() { return LBRStack[LBRIndex]; }
				void advanceLBR() { LBRIndex++; }
				};
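
The sanity check in `validateInitialState` can be read as a bounded-skid test: the stack leaf is expected at, or shortly after, the most recent LBR target. A minimal standalone sketch of that predicate follows. `leafWithinSkid` and `MaxSkid` are hypothetical names used only for illustration, and the 0x100 default simply mirrors the constant in the patch, which the review thread describes as somewhat arbitrary.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper, not part of the patch. Returns true when StackLeaf
// lies in the half-open window [LBRLeaf, LBRLeaf + MaxSkid).
// The subtraction form avoids overflow when LBRLeaf is near UINT64_MAX.
bool leafWithinSkid(uint64_t StackLeaf, uint64_t LBRLeaf,
                    uint64_t MaxSkid = 0x100) {
  return StackLeaf >= LBRLeaf && StackLeaf - LBRLeaf < MaxSkid;
}
```

A sample whose stack leaf falls outside the window would be treated as bogus and skipped, which is what the `return false` path in `validateInitialState` does.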

				// The counter of branch samples for one function indexed by the branch,
				// which is represented as the source and target address pair.
				using BranchSample = std::map<std::pair<uint64_t, uint64_t>, uint64_t>;
				wmi: What do the three uint64_t fields in the map represent?
				wlei (author): fixed according wenlei's suggestion below by using `ContextBranchCounter` and `ContextRangeCounter`, also give more explanation
				// The counter of range samples for one function indexed by the range,
				wenlei: 1. To be accurate, this is actually a counter rather than a map. 2. It can be confusing if this is used for both branch and range, even though branch and range share the bit-wise representation. I think we can typedef `BranchSample` and `RangeSample` both to `std::pair<uint64_t, uint64_t>`, and then typedef `ContextBranchCounter` and `ContextRangeCounter` to `std::unordered_map<std::string, std::map<BranchSample, uint64_t>`, ..
				// which is represented as the start and end address pair.
				using RangeSample = std::map<std::pair<uint64_t, uint64_t>, uint64_t>;
				// Range sample counters indexed by the context string
				using ContextRangeCounter = std::unordered_map<std::string, RangeSample>;
				// Branch sample counters indexed by the context string
				using ContextBranchCounter = std::unordered_map<std::string, BranchSample>;

				// For Hybrid sample
				struct ContextSampleCounters {
				ContextRangeCounter RangeCounter;
				ContextBranchCounter BranchCounter;

				void recordRangeCount(uint64_t Start, uint64_t End, UnwindState &State,
				uint64_t Repeat);
				void recordBranchCount(const LBREntry &Branch, UnwindState &State,
				uint64_t Repeat);
				};
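
As a rough illustration of how these nested counters accumulate, here is a small standalone sketch using only the standard library. The alias names `AddressPair`, `SampleCounter`, `ContextCounter` and the free function `record` are assumptions made for this example; they are not the types declared in the patch, although the shape (context string, then address pair, then count) follows the typedefs above.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <unordered_map>
#include <utility>

// Illustrative aliases only (hypothetical names).
using AddressPair = std::pair<uint64_t, uint64_t>;
using SampleCounter = std::map<AddressPair, uint64_t>;
using ContextCounter = std::unordered_map<std::string, SampleCounter>;

// Adds Repeat to the count for (First, Second) under the given context.
// Missing entries are value-initialized to zero by operator[].
void record(ContextCounter &Counter, const std::string &Context,
            uint64_t First, uint64_t Second, uint64_t Repeat) {
  Counter[Context][AddressPair(First, Second)] += Repeat;
}
```

The same structure serves both cases: for a range the pair is (start, end), and for a branch it is (source, target).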

				struct HybridSampleHash {
				uint64_t hashCombine(uint64_t Hash, uint64_t Value) const {
				// Simple DJB2 hash
				return ((Hash << 5) + Hash) + Value;
				}

				uint64_t operator()(const HybridSample &Sample) const {
				uint64_t Hash = 5381;
				hoy: Nit: `const` qualifier for these getters?
				wlei (author): fixed, good suggestion, thanks!
				hoy: Actually I meant something like: `ProfiledBinary *getBinary() const { return Binary; }` `bool hasNextLBR() const { return LBRIndex < LBRStack.size(); }` ... Sorry for the confusion.
				wlei (author): fixed, thanks for clarification!
				for (const auto &Value : Sample.CallStack) {
				wenlei: const qualifier here as well?
				Hash = hashCombine(Hash, Value);
				}
				for (const auto &Entry : Sample.LBRStack) {
				Hash = hashCombine(Hash, Entry.Source);
				Hash = hashCombine(Hash, Entry.Target);
				}
				return Hash;
				}
				};

				// After parsing the sample, we record the samples by aggregating them
				// into this structure and the value is the sample counter.
				using AggregationCounter =
				std::unordered_map<HybridSample, uint64_t, HybridSampleHash>;

				// Class for sample unwinding
				class VirtualUnwinder {
				public:
				VirtualUnwinder(AggregationCounter &Counter) : AggregatedSamples(Counter) {}

				bool isCallState(UnwindState &State) const {
				// TODO: Once stack sample supports tail call (e.g. dwarf-based), we
				// need to detect tail call jump here as well. This can be done by
				// checking whether non-inline leaf has changed and it's not return.

				// Since tail call frame is missing in stack sample, we intentionally
				hoy: The comment could be retired since we have a tail call tracker coming that tracks both in-LBR tail calls and out-of-LBR tail calls universally.
				wenlei: I think the comment needs to be updated, but explanation here is still needed because IIUC missing frame inference happens more like a post process (hence somewhat orthogonal), and here `isCallState` decides the unwind operation on the stack sample (not changed by frame inference) which will always miss tail call frame (unless dwarf stack walking is used by perf).
				// don't want to detect tail call now, otherwise we would be trying to
				// unwind non-existing tail call frame.
				return State.getBinary()->addressIsCall(State.getCurrentLBRSource());
				}

				bool isReturnState(UnwindState &State) const {
				// Simply check address_is_ret, as ret is always reliable, both for
				// regular call and tail call.
				return State.getBinary()->addressIsReturn(State.getCurrentLBRSource());
				}

				void unwindCall(UnwindState &State);
				void unwindLinear(UnwindState &State, uint64_t Repeat);
				void unwindReturn(UnwindState &State);
				void unwindBranchWithinFrame(UnwindState &State);

				void unwindSamples();
				bool unwindOneSample(const HybridSample &Sample, uint64_t Repeat);
				void printUnwinderOutput();
				ContextSampleCounters &getSampleCounters() { return SampleCounters; }

				private:
				ContextSampleCounters SampleCounters;
				AggregationCounter &AggregatedSamples;
				wmi: Rename it to 'isCtxSensitivePerfScript'?
				wenlei: I think context-sensitivity is a concept that only exists at FDO profile level. Thus for perf script, we used the term hybrid which faithfully represents the fact that both LBR and stack are sampled together.
				};
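
To make the idea behind `VirtualUnwinder` more concrete, the following is a deliberately simplified toy model of replaying LBR entries backwards on top of a sampled call stack. It is a sketch under strong assumptions: frames are function names rather than addresses, every LBR entry is already classified as call, return, or intra-frame jump, and inlining, missing frames, and tail calls are ignored. `BranchKind`, `ToyBranch`, `contextString`, and `replayBackwards` are hypothetical names and do not exist in llvm-profgen.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy model only; none of these names come from the patch.
enum class BranchKind { Call, Return, Jump };

struct ToyBranch {
  BranchKind Kind;
  std::string SourceFunc; // function containing the branch instruction
};

// Joins frames root-to-leaf in the "main @ foo @ bar" style.
std::string contextString(const std::vector<std::string> &Stack) {
  std::string Result;
  for (const std::string &Frame : Stack) {
    if (!Result.empty())
      Result += " @ ";
    Result += Frame;
  }
  return Result;
}

// Stack is ordered root to leaf; LBR is ordered most recent branch first.
// Returns the context in effect for each linear range, newest range first.
std::vector<std::string> replayBackwards(std::vector<std::string> Stack,
                                         const std::vector<ToyBranch> &LBR) {
  std::vector<std::string> Contexts;
  Contexts.push_back(contextString(Stack));
  for (const ToyBranch &Branch : LBR) {
    if (Branch.Kind == BranchKind::Call) {
      // Undo a call: the callee frame did not exist before the branch.
      if (Stack.size() > 1)
        Stack.pop_back();
    } else if (Branch.Kind == BranchKind::Return) {
      // Undo a return: the returning frame was still live before it.
      Stack.push_back(Branch.SourceFunc);
    }
    // A jump within a frame leaves the calling context unchanged.
    Contexts.push_back(contextString(Stack));
  }
  return Contexts;
}
```

For a sample taken in `foo` (stack `main`, `foo`) whose LBR shows, newest first, a return from `bar`, a jump inside `bar`, and the call from `foo` into `bar`, the replay assigns the middle ranges to the context `main @ foo @ bar`, even though `bar` is no longer on the sampled stack.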

				// Filename to binary map
				using BinaryMap = StringMap<ProfiledBinary>;
				// Address to binary map for fast look-up
				wenlei: The idea of aggregation applies to (non-CS) AutoFDO too. It'd be good to put infrastructure in place that can cover both AutoFDO and CSSPGO in a generic way. Perhaps we can treat non-CS AutoFDO profile (or regular LBR perf profile) just like a hybrid profile except stack part is always empty? Is that what you have in mind?
				wlei (author): Yeah, it should not specific to unwinder, I will move to PerfReader to support both AutoFDO and CSSPGO
				using AddressBinaryMap = std::map<uint64_t, ProfiledBinary *>;

				// Load binaries and read perf trace to parse the events and samples
				class PerfReader {
				wmi: This virtual unwinder is not doing the classic unwinding thing. It is walking through the LBR stack of a LBR sample, based on the sample's callstack, and infer the callstack for each address range covered by the LBR sample. The comment can be more clear about it.
				wlei (author): Thanks for your suggestion, more comments are added.
				wmi: That is helpful. Thanks.

				public:
				PerfReader(cl::list<std::string> &BinaryFilenames);

				/// Prepare a memory buffer for the contents of \p Filename.
				///
				static std::unique_ptr<MemoryBuffer> setupMemoryBuffer(StringRef Filename) {
				auto BufferOrErr = MemoryBuffer::getFileOrSTDIN(Filename);
				if (std::error_code EC = BufferOrErr.getError())
				wenlei: For linear unwinding, some brief explanation for handling of inlining would be helpful too.
				exitWithError(EC, Filename);

				auto Buffer = std::move(BufferOrErr.get());
				if (uint64_t(Buffer->getBufferSize()) >
				std::numeric_limits<uint32_t>::max())
				exitWithError("file too large", Filename);
				wenlei: Do we actually use address now? Can we remove all address and probe related stuff and add them properly in later patches?

				return Buffer;
				}

				// Hybrid sample (call stack + LBRs) profile traces are separated by a double
				// line break; search for that within the first 4k characters to avoid going
				// through the whole file.
				wenlei: nit: why name line_iterator Index for this one, and different from others?
				static bool isHybridPerfScript(StringRef FileName) {
				auto BufOrError = MemoryBuffer::getFileOrSTDIN(FileName, 4000);
				wenlei: I suggest let's establish a consistent naming convention here wrt what is a trace and what is an event:
				- Trace is a series of perf events.
				- Each perf event can be an mmap event or a sample.
				- hybrid trace is a series of perf event: mmap events and hybrid samples.
				- hybrid sample is lbr sample plus stack sample.
				With that, we can rename the following:
				- `void parseHybridTrace(line_iterator &Line);` -> `void parseHybridSample(line_iterator &Line);`
				- `UnwinderTrace` -> `HybridSample`
				- `TraceAggregation` -> `SampleAggregation`
				wlei (author): Thanks for suggesting for a consistent naming convention, did the refactor.
				if (!BufOrError)
				exitWithError(BufOrError.getError(), FileName);
				auto Buffer = std::move(BufOrError.get());
				if (Buffer->getBuffer().find("\n\n") == StringRef::npos)
				return false;
				return true;
				}

				// The parsed MMap event
				struct MMapEvent {
				pid_t PID = 0;
				uint64_t BaseAddress = 0;
				uint64_t Size = 0;
				uint64_t Offset = 0;
				StringRef BinaryPath;
				};

				/// Load symbols and disassemble the code of a given binary.
				/// Also register the binary in the binary table.
				///
				ProfiledBinary &loadBinary(const StringRef BinaryPath,
				bool AllowNameConflict = true);

				// Helper function for looking up binary in AddressBinaryMap
				static ProfiledBinary *getBinary(AddressBinaryMap &AddrToBinaryMap,
				uint64_t Address);
				hoy: Should this be rewritten with a stream-based file reader as done in D89707?
				wlei (author): I guess you mean to keep consistent to other part of code? Here you see it only read 4000bytes data from the file (`getFileOrSTDIN(FileName, 4000);`), so there shouldn't have memory issue. Currently stream-based liner only support read one line at a time, it need to search line by line, which would be slower than searching in the whole 4k memory. So which one do you prefer?
				hoy: I see. The current implementation looks good to me.

				void updateBinaryAddress(const MMapEvent &Event);
				PerfScriptType getPerfScriptType() { return PerfType; }
				// Entry of the reader to parse multiple perf traces
				void parsePerfTraces(cl::list<std::string> &PerfTraceFilenames);
				AggregationCounter &getSamples() { return AggregatedSamples; }
				AddressBinaryMap &getAddrToBinaryMap() { return AddrToBinaryMap; };

				private:
				/// Parse a single line of a PERF_RECORD_MMAP2 event looking for a
				/// mapping between the binary name and its memory layout.
				///
				void parseMMap2Event(line_iterator &Line);
				void parseTrace(StringRef Filename);
				// Parse either an MMAP event or a perf sample
				void parseEventOrSample(line_iterator &Line);
				// Parse the hybrid sample including the call and LBR line
				wenlei: I suggest we either include them properly or remove them from the patch. We still quite a few things not in upstream patch, and it's not possible to have TODOs for all of them.
				void parseHybridSample(line_iterator &Line);
				// Extract call stack from the perfscript line
				bool extractCallstack(line_iterator &Line, std::list<uint64_t> &CallStack);
				// Extract LBR stack from the perfscript line
				bool extractLBRStack(line_iterator &Line, SmallVector<LBREntry, 16> &LBRStack,
				ProfiledBinary *Binary);
				void checkAndSetPerfType(cl::list<std::string> &PerfTraceFilenames);

				BinaryMap BinaryTable;
				AddressBinaryMap AddrToBinaryMap; // Used by address-based lookup.
				// Samples with the repeating time generated by the perf reader
				AggregationCounter AggregatedSamples;
				PerfScriptType PerfType;
				};

				} // end namespace sampleprof
				} // end namespace llvm

				#endif
				hoy: Nit: `const` qualifier for getters?
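
The `isHybridPerfScript` check in PerfReader.h only inspects a bounded prefix of the file for a blank line, which is the trade-off discussed in the thread about stream-based reading. A minimal standard-library sketch of the same idea is shown below. `prefixHasBlankLine` is a hypothetical helper that operates on text already in memory, so it stands in for the search step only and not for the `MemoryBuffer::getFileOrSTDIN` call used in the patch.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: reports whether a double line break occurs within
// the first MaxBytes characters of Text. Anything past the prefix is ignored.
bool prefixHasBlankLine(const std::string &Text, std::size_t MaxBytes = 4000) {
  const std::string Prefix = Text.substr(0, MaxBytes);
  return Prefix.find("\n\n") != std::string::npos;
}
```

One consequence of bounding the search is that a hybrid script whose first sample separator appears after the prefix would be classified as non-hybrid; the 4000-byte limit in the patch assumes that does not happen in practice.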

llvm/tools/llvm-profgen/PerfReader.cpp

This file was added.

				//===-- PerfReader.cpp - perfscript reader ---------------------- C++ --===//
				//
				// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
				// See https://llvm.org/LICENSE.txt for license information.
				// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
				//
				//===----------------------------------------------------------------------===//
				#include "PerfReader.h"

				static cl::opt<bool> ShowMmapEvents("show-mmap-events", cl::ReallyHidden,
				cl::init(false), cl::ZeroOrMore,
				cl::desc("Print binary load events."));

				static cl::opt<bool> SHowUnwinderOutput("show-unwinder-output",
				cl::ReallyHidden, cl::init(false),
				cl::ZeroOrMore,
				cl::desc("Print unwinder output"));

				namespace llvm {
				namespace sampleprof {

				void VirtualUnwinder::unwindCall(UnwindState &State) {
				// The 2nd frame after leaf could be missing if stack sample is
				// taken when IP is within prolog/epilog, as frame chain isn't
				// setup yet. Fill in the missing frame in that case.
				uint64_t Source = State.getCurrentLBRSource();
				hoy: Nit: please add a TODO here to check if `Source` is in prolog/epilog using precise prolog/epilog table.
				auto Iter = State.CallStack.begin();
				if (State.CallStack.size() == 1 || *(++Iter) != Source) {
				State.CallStack.front() = Source;
				} else {
				State.CallStack.pop_front();
				}
				State.InstPtr.update(Source);
				}

				void VirtualUnwinder::unwindLinear(UnwindState &State, uint64_t Repeat) {
				InstructionPointer &IP = State.InstPtr;
				uint64_t Target = State.getCurrentLBRTarget();
				uint64_t End = IP.Address;
				// Unwind linear execution part
				while (IP.Address >= Target) {
				uint64_t PrevIP = IP.Address;
				IP.backward();
				// Break into segments for implicit call/return due to inlining
				bool SameInlinee =
				State.getBinary()->inlineContextEqual(PrevIP, IP.Address);
				if (!SameInlinee || PrevIP == Target) {
				uint64_t Start = PrevIP;
				SampleCounters.recordRangeCount(Start, End, State, Repeat);
				hoy: Nit: just use `PrevIP` here instead of using `Start`?
				End = IP.Address;
				}
				State.CallStack.front() = IP.Address;
				}
				}

				void VirtualUnwinder::unwindReturn(UnwindState &State) {
				// Add extra frame as we unwind through the return
				const LBREntry &LBR = State.getCurrentLBR();
				uint64_t CallAddr = State.getBinary()->getCallAddrFromFrameAddr(LBR.Target);
				State.CallStack.front() = CallAddr;
				State.CallStack.push_front(LBR.Source);
				State.InstPtr.update(LBR.Source);
				}

				wenlei: Let `getCurrentLBR` return reference? or why would you want a temporary to bind const-reference to?
				wlei (author): The LBR data is only kept in the UnwinderTrace and UnwindState only keep the index and ref. And because the UnwinderTrace is the key of the TraceAggregationMap, its type is converted to const. If that makes code confusing, I can wrap all the LBRSource/LBRTarget to getCurrentLBRSource()/getCurrentLBRTarget().
				wenlei: Ok, not a big deal I guess since LBREntry is small anyways. but getCurrentLBR can still return const reference?
				void VirtualUnwinder::unwindBranchWithinFrame(UnwindState &State) {
				// TODO: Tolerate tail call for now, as we may see tail call from libraries.
				// This is only for intra function branches, excluding tail calls.
				uint64_t Source = State.getCurrentLBRSource();
				State.CallStack.front() = Source;
				State.InstPtr.update(Source);
				}

				void ContextSampleCounters::recordRangeCount(uint64_t Start, uint64_t End,
				UnwindState &State,
				uint64_t Repeat) {
				std::string &&ContextId = State.getExpandedContextStr();
				RangeCounter[ContextId][{Start, End}] += Repeat;
				}

				void ContextSampleCounters::recordBranchCount(const LBREntry &Branch,
				UnwindState &State,
				uint64_t Repeat) {
				if (Branch.IsArtificial)
				return;
				std::string &&ContextId = State.getExpandedContextStr();
				BranchCounter[ContextId][{Branch.Source, Branch.Target}] += Repeat;
				}

				using ContextToSampleCounter =
				std::unordered_map<std::string,
				std::map<std::pair<uint64_t, uint64_t>, uint64_t>>;
				static void printSampleCounter(ContextToSampleCounter &Counter) {
				for (auto Range : Counter) {
				outs() << Range.first << "\n";
				for (auto I : Range.second) {
				outs() << " (" << format("%" PRIx64, I.first.first) << ", "
				<< format("%" PRIx64, I.first.second) << "): " << I.second << "\n";
				}
				wenlei: Is this still needed now that we normalize traces before aggregation (line 259)?
				}
				}

				void VirtualUnwinder::printUnwinderOutput() {
				outs() << "Range Counter:\n";
				printSampleCounter(SampleCounters.RangeCounter);
				outs() << "\nBranch Counter:\n";
				printSampleCounter(SampleCounters.BranchCounter);
				}
				wenlei: Perhaps add a wrapper `State.hasMoreLBRs()` for this?
				wlei (author): Changed to `State.hasNextLBR()`.

				void VirtualUnwinder::unwindSamples() {
				for (const auto &Item : AggregatedSamples) {
				const HybridSample &Sample = Item.first;
				unwindOneSample(Sample, Item.second);
				}

				if (ShowUnwinderOutput) {
				printUnwinderOutput();
				}
				}
				wenlei: Use `getCurrentLBR` instead of `State.LBRStack[State.LBRIndex]`?

				bool VirtualUnwinder::unwindOneSample(const HybridSample &Sample,
				uint64_t Repeat) {
				// Capture initial state as starting point for unwinding.
				UnwindState State(Sample);

				// Sanity check - making sure leaf of LBR aligns with leaf of stack sample
				// Stack sample sometimes can be unreliable, so filter out bogus ones.
				if (!State.validateInitialState())
				return false;

				// Also do not attempt linear unwind for the leaf range as it's incomplete.
				wenlei: The comment is no longer accurate. We no longer check whether src/dst cross a function boundary; instead we check whether the IP is indeed at a return instruction, so tail calls are no longer a problem for this particular processing. (My bad, I didn't update the comment in the prototype.)
				bool IsLeaf = true;

				// Now process the LBR samples in parallel with stack sample
				// Note that we do not reverse the LBR entry order so we can
				wmi: Is it needed or unneeded? Without call/ret, I assume there is no need to push/pop the call stack?
				wenlei: There's no need to adjust the stack for intra-function branches, though conceptually we still need to unwind through the LBRs (which updates the `State`). Agreed that the mention of push/pop stack isn't accurate.
				// unwind the sample stack as we walk through LBR entries.
				while (State.hasNextLBR()) {
				State.checkStateConsistency();

				// Unwind implicit calls/returns from inlining, along the linear path,
				// break into smaller sub section each with its own calling context.
				if (!IsLeaf) {
				unwindLinear(State, Repeat);
				}
				IsLeaf = false;

				// Save the LBR branch before it gets unwound.
				const LBREntry &Branch = State.getCurrentLBR();

				if (isCallState(State)) {
				// Unwind calls - we know we encountered call if LBR overlaps with
				// transition between leaf and the 2nd frame. Note that for calls that
				// were not in the original stack sample, we should have added the
				// extra frame when processing the return paired with this call.
				unwindCall(State);
				} else if (isReturnState(State)) {
				// Unwind returns - check whether the IP is indeed at a return instruction
				unwindReturn(State);
				} else {
				// Unwind branches - for regular intra function branches, we only
				// need to record branch with context.
				unwindBranchWithinFrame(State);
				}
				State.advanceLBR();
				// Record `branch` with calling context after unwinding.
				SampleCounters.recordBranchCount(Branch, State, Repeat);
				}

				return true;
				}

				PerfReader::PerfReader(cl::list<std::string> &BinaryFilenames) {
				// Load the binaries.
				for (auto Filename : BinaryFilenames)
				loadBinary(Filename, /*AllowNameConflict*/ false);
				}

				ProfiledBinary &PerfReader::loadBinary(const StringRef BinaryPath,
				bool AllowNameConflict) {
				// The binary table is currently indexed by the binary name not the full
				// binary path. This is because the user-given path may not match the one
				// that was actually executed.
				StringRef BinaryName = llvm::sys::path::filename(BinaryPath);

				// Call to load the binary in the ctor of ProfiledBinary.
				auto Ret = BinaryTable.insert({BinaryName, ProfiledBinary(BinaryPath)});
				wmi: Can you give an example of LBRStack so it is easy to understand what the code is parsing here?
				wlei (author): Example is added.

				if (!Ret.second && !AllowNameConflict) {
				std::string ErrorMsg = "Binary name conflict: " + BinaryPath.str() +
				" and " + Ret.first->second.getPath().str() + " \n";
				exitWithError(ErrorMsg);
				}

				return Ret.first->second;
				}

				void PerfReader::updateBinaryAddress(const MMapEvent &Event) {
				// Load the binary.
				StringRef BinaryPath = Event.BinaryPath;
				StringRef BinaryName = llvm::sys::path::filename(BinaryPath);
				auto I = BinaryTable.find(BinaryName);
				// Drop the event which doesn't belong to user-provided binaries
				// or if its image is loaded at the same address
				if (I == BinaryTable.end() || Event.BaseAddress == I->second.getBaseAddress())
				return;

				ProfiledBinary &Binary = I->second;

				// A binary image could be unloaded and then reloaded at a different
				// place, so update the address map here
				AddrToBinaryMap.erase(Binary.getBaseAddress());
				AddrToBinaryMap[Event.BaseAddress] = &Binary;

				// Update binary load address.
				Binary.setBaseAddress(Event.BaseAddress);
				}

				ProfiledBinary *PerfReader::getBinary(AddressBinaryMap &AddrToBinaryMap,
				uint64_t Address) {
				auto Iter = AddrToBinaryMap.lower_bound(Address);
				if (Iter == AddrToBinaryMap.end() || Iter->first != Address) {
				if (Iter == AddrToBinaryMap.begin())
				return nullptr;
				Iter--;
				}
				return Iter->second;
				}
				hoy: Nit: curly braces not needed for single-statement block.

				bool PerfReader::extractLBRStack(line_iterator &LBRLine,
				SmallVector<LBREntry, 16> &LBRStack,
				ProfiledBinary *Binary) {
				// The raw format of LBR stack is like:
				// 0x4005c8/0x4005dc/P/-/-/0 0x40062f/0x4005b0/P/-/-/0 ...
				// ... 0x4005c8/0x4005dc/P/-/-/0
				// It's in FIFO order and separated by whitespace.
				SmallVector<StringRef, 32> Records;
				LBRLine->split(Records, " ");
				++LBRLine;
				// Extract leading instruction pointer if present, use single
				// list to pass out as reference.
				wmi: Removing the `else if` makes the code a little easier to read: if (!SrcIsInternal && !DstIsInternal) continue; if (!SrcIsInternal && DstIsInternal) { PrevTrDst = Dst; continue; } if (SrcIsInternal && !DstIsInternal) { if (!PrevTrDst) continue; Dst = PrevTrDst; PrevTrDst = 0; IsArtificial = true; } // Filter out the branch sample ...
				size_t Index = 0;
				if (!Records.empty() && Records[0].find('/') == StringRef::npos) {
				uint64_t StackIP;
				wenlei: Remove internal task Id T24431811
				Records[0].getAsInteger(16, StackIP);
				Index = 1;
				}
				// Now extract LBR samples - note that we do not reverse the
				// LBR entry order so we can unwind the sample stack as we walk
				// through LBR entries.
				std::pair<uint64_t, uint64_t> PrevBr;
				uint64_t PrevTrDst = 0;
				bool DuplicateFiltered = false;

				while (Index < Records.size()) {
				auto &Token = Records[Index++];
				if (Token.size() == 0)
				continue;

				SmallVector<StringRef, 8> Addresses;
				Token.split(Addresses, "/");
				uint64_t Src;
				uint64_t Dst;
				Addresses[0].substr(2).getAsInteger(16, Src);
				Addresses[1].substr(2).getAsInteger(16, Dst);

				bool SrcIsInternal = Binary->addressIsCode(Src);
				bool DstIsInternal = Binary->addressIsCode(Dst);
				wmi: Same as above, better to give an example showing what the function is parsing here.
				bool IsArtificial = false;

				// Ignore branches outside the current binary.
				if (!SrcIsInternal && !DstIsInternal)
				continue;
				if (!SrcIsInternal && DstIsInternal) {
				// For transition from external code (such as dynamic libraries) to
				// the current binary, keep track of the branch target which will be
				// grouped with the Source of the last transition from the current
				// binary.
				PrevTrDst = Dst;
				continue;
				}
				if (SrcIsInternal && !DstIsInternal) {
				// For transition to external code, group the Source with the next
				// available transition target.
				if (!PrevTrDst)
				continue;
				Dst = PrevTrDst;
				PrevTrDst = 0;
				IsArtificial = true;
				}
				// Filter out the branch sample if it is identical with the previous
				// one. This is due to a bug in LBR stack on Skylake.
				// Sometimes the LBR stack contains duplicate entries, which largely
				// messes up the region count computed from the LBR data. This logic
				// filters out one duplicate branch, note that it may remove valid
				// loop back edge, leading to n-1 trip count.
				if (Src != PrevBr.first || Dst != PrevBr.second) {
				DuplicateFiltered = false;
				} else if (!DuplicateFiltered) {
				DuplicateFiltered = true;
				continue;
				}

				LBRStack.emplace_back(LBREntry(Src, Dst, IsArtificial));
				PrevBr = {Src, Dst};
				wenlei: This function can take a reference to `Trace.CallStack` directly since it's only checking the call stack. Actually this function doesn't seem needed: line 290 checks non-empty already, and we just need a `return Trace.Binary->addressInPrologEpilog(Trace.CallStack.front())` at line 299.
				}
				return !LBRStack.empty();
				wenlei: Remove the blank line?
				}

				bool PerfReader::extractCallstack(line_iterator &Line,
				std::list<uint64_t> &CallStack) {
				// The raw format of call stack is like:
				// 4005dc # leaf frame
				wmi: What sampling event are you using? If br_inst_retired:near_taken is used, the sample won't end up in prolog/epilog.
				hoy: Yes, we are sampling the br_inst_retired:near_taken event. Normally there will be no branch instructions in a prolog. This deals with a weird case where a branch instruction ends up in a shrink-wrapped prolog.
				wmi: I see. Thanks!
				// 400634
				// 400684 # root frame
				// It's in bottom-up order with each frame in one line.

				// Extract stack frames from sample
				ProfiledBinary *Binary = nullptr;
				while (!Line.is_at_eof() && !Line->startswith(" 0x")) {
				StringRef FrameStr = Line->ltrim();
				++Line;
				// We might get an empty line at the beginning, skip it
				uint64_t FrameAddr = 0;
				if (FrameStr.getAsInteger(16, FrameAddr))
				break;
				if (!Binary) {
				Binary = getBinary(AddrToBinaryMap, FrameAddr);
				// we might have addr not match the MMAP, skip it
				if (!Binary) {
				if (AddrToBinaryMap.size() == 0)
				WithColor::warning() << "No MMAP event in the perfscript, create it "
				"with '--show-mmap-events'\n";
				break;
				}
				}
				// Currently intermixed frame from different binaries is not supported.
				// Ignore bottom frames not from binary of interest.
				if (!Binary->addressIsCode(FrameAddr))
				break;

				// We need to translate return address to call address
				// for non-leaf frames
				if (!CallStack.empty()) {
				FrameAddr = Binary->getCallAddrFromFrameAddr(FrameAddr);
				}

				CallStack.emplace_back(FrameAddr);
				}

				if (CallStack.empty())
				return false;
				// Skip other unrelated line, find the next valid LBR line
				while (!Line.is_at_eof() && !Line->startswith(" 0x")) {
				++Line;
				}
				// Filter out broken stack samples. We may not have complete frame info
				// if a sample ends up in prolog/epilog; the result is a dangling context
				// not connected to the entry point. This should be relatively rare and
				// thus has little impact on overall profile quality. However, we do want
				// to filter them out to reduce the number of different calling contexts.
				// One instance of such a case: when a sample lands in prolog/epilog,
				// stack walking can break in an unexpected way such that higher frames
				// are missing.
				return !Binary->addressInPrologEpilog(CallStack.front());
				}

				void PerfReader::parseHybridSample(line_iterator &Line) {
				// The raw hybrid sample starts with the call stack in FILO order, followed
				// immediately by the LBR sample
				// e.g.
				// 4005dc # call stack leaf
				// 400634
				// 400684 # call stack root
				// 0x4005c8/0x4005dc/P/-/-/0 0x40062f/0x4005b0/P/-/-/0 ...
				// ... 0x4005c8/0x4005dc/P/-/-/0 # LBR Entries
				//
				HybridSample Sample;

				// Parsing call stack and populate into HybridSample
				if (!extractCallstack(Line, Sample.CallStack)) {
				// Skip the next LBR line matched current call stack
				if (!Line.is_at_eof() && Line->startswith(" 0x"))
				Line++;
				return;
				}
				// Set the binary current sample belongs to
				Sample.Binary = getBinary(AddrToBinaryMap, Sample.CallStack.front());

				if (!Line.is_at_eof() && Line->startswith(" 0x")) {
				hoy: Use `exitWithError`?
				// Parsing LBR stack and populate into HybridSample
				if (extractLBRStack(Line, Sample.LBRStack, Sample.Binary)) {
				// Canonicalize stack leaf to avoid 'random' IP from leaf frame skew LBR
				// ranges
				Sample.CallStack.front() = Sample.LBRStack[0].Target;
				// record samples by aggregation
				AggregatedSamples[Sample]++;
				}
				} else {
				// LBR sample is encoded in single line after stack sample
				llvm_unreachable("Hybrid perf sample is corrupted, no LBR sample line");
				}
				}

				void PerfReader::parseEventOrSample(line_iterator &Line) {
				if (Line->startswith("PERF_RECORD_MMAP2"))
				parseMMap2Event(Line);
				else if (getPerfScriptType() == PERF_LBR_STACK)
				parseHybridSample(Line);
				else {
				// TODO: parse other type sample
				Line++;
				}
				}

				void PerfReader::parseTrace(StringRef Filename) {
				auto Buffer = setupMemoryBuffer(Filename);
				line_iterator LineIt(*Buffer, /*SkipBlanks=*/false);
				while (!LineIt.is_at_eof()) {
				parseEventOrSample(LineIt);
				}
				}

				void PerfReader::parseMMap2Event(line_iterator &Line) {
				// Parse a line like:
				// PERF_RECORD_MMAP2 2113428/2113428: [0x7fd4efb57000(0x204000) @ 0
				// 08:04 19532229 3585508847]: r-xp /usr/lib64/libdl-2.17.so
				constexpr static const char *const Pattern =
				"PERF_RECORD_MMAP2 ([0-9]+)/[0-9]+: "
				"\\[(0x[a-f0-9]+)\\((0x[a-f0-9]+)\\) @ "
				"(0x[a-f0-9]+|0) .*\\]: [-a-z]+ (.*)";
				// Field 0 - whole line
				// Field 1 - PID
				// Field 2 - base address
				// Field 3 - mmapped size
				// Field 4 - page offset
				// Field 5 - binary path
				enum EventIndex {
				WHOLE_LINE = 0,
				PID = 1,
				BASE_ADDRESS = 2,
				MMAPPED_SIZE = 3,
				PAGE_OFFSET = 4,
				BINARY_PATH = 5
				};

				Regex RegMmap2(Pattern);
				SmallVector<StringRef, 6> Fields;
				if (RegMmap2.match(*Line, &Fields)) {
				MMapEvent Event;
				Fields[PID].getAsInteger(10, Event.PID);
				Fields[BASE_ADDRESS].getAsInteger(0, Event.BaseAddress);
				Fields[MMAPPED_SIZE].getAsInteger(0, Event.Size);
				Fields[PAGE_OFFSET].getAsInteger(0, Event.Offset);
				Event.BinaryPath = Fields[BINARY_PATH];
				updateBinaryAddress(Event);
				if (ShowMmapEvents) {
				outs() << "Mmap: Binary " << Event.BinaryPath << " loaded at "
				<< format("0x%" PRIx64 ":", Event.BaseAddress) << " \n";
				}
				} else {
				std::string ErrorMsg = "Cannot parse mmap event: Line " +
				Twine(Line.line_number()).str() + ": " +
				Line->str() + " \n";
				exitWithError(ErrorMsg);
				}
				++Line;
				}

				void PerfReader::checkAndSetPerfType(
				cl::list<std::string> &PerfTraceFilenames) {
				bool HasHybridPerf = true;
				for (auto FileName : PerfTraceFilenames) {
				if (!isHybridPerfScript(FileName)) {
				HasHybridPerf = false;
				break;
				}
				}

				if (HasHybridPerf) {
				// Set up ProfileIsCS to enable context-sensitive functionalities in
				// SampleProf
				FunctionSamples::ProfileIsCS = true;
				PerfType = PERF_LBR_STACK;
				}
				}

				void PerfReader::parsePerfTraces(cl::list<std::string> &PerfTraceFilenames) {
				hoy: `PerfType` should be defined on the `else` branch if it is not initialized anywhere else.
				// check and set current perfscript type
				wenlei: What would be the workflow for (non-CS) AutoFDO with this new implementation? It looks like `parseTrace` is responsible for aggregation only, so even for AutoFDO there will be a post-process after that to get range:count, right? So it looks to me that a unified workflow could be something like this: for (auto Filename : PerfTraceFilenames) parseAndAggregateTrace(Filename); generateRawProfile(); Inside `generateRawProfile`, we would do simple range overlap computation for AutoFDO, or unwind for CSSPGO. Also see comments on `AggregationCounter`: in addition to unifying the workflow, it would be good to unify the data structure as well if possible. What do you think?
				wlei (author): Good suggestion! As you mention, we can incorporate everything into the unwinder by treating a non-CS profile as a hybrid sample with an empty call stack. How about we do that when implementing the non-CS part, and for now I change the code as below? void generateRawProfile (..) { if(getPerfScriptType() == PERF_LBR) { // range overlap computation for regular AutoFDO ... } else if (getPerfScriptType() == PERF_LBR_STACK) { // Unwind samples if it's hybrid sample unwindSamples(); } }
				wenlei: Yes, that looks good for now.
				checkAndSetPerfType(PerfTraceFilenames);
				// Parse perf traces.
				for (auto Filename : PerfTraceFilenames)
				parseTrace(Filename);

				}

				} // namespace sampleprof
				} // namespace llvm
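
The LBR filtering in `extractLBRStack` above can be hard to follow inside the diff. Below is a minimal standalone sketch of the same three steps (drop external-only branches, stitch an internal-to-external branch onto the previously recorded external-to-internal target, and filter one adjacent duplicate), written against the standard library only. `LBREntrySketch`, `parseLBRLine`, and the `IsCode` predicate are hypothetical names for illustration; `IsCode` stands in for `ProfiledBinary::addressIsCode`, and malformed tokens are not handled beyond skipping those without a `/`.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in for the patch's LBREntry.
struct LBREntrySketch {
  uint64_t Source;
  uint64_t Target;
  bool IsArtificial;
};

// Parses whitespace-separated "0xSRC/0xDST/..." tokens in the order given.
// IsCode is an assumed predicate: true if the address is in the binary.
std::vector<LBREntrySketch>
parseLBRLine(const std::string &Line,
             const std::function<bool(uint64_t)> &IsCode) {
  std::vector<LBREntrySketch> Stack;
  std::istringstream In(Line);
  std::string Token;
  uint64_t PrevSrc = 0, PrevDst = 0; // last recorded branch
  uint64_t PrevTrDst = 0;            // pending external->internal target
  bool DuplicateFiltered = false;

  while (In >> Token) {
    size_t FirstSlash = Token.find('/');
    if (FirstSlash == std::string::npos)
      continue; // not a branch record, e.g. a leading instruction pointer
    size_t SecondSlash = Token.find('/', FirstSlash + 1);
    // std::stoull with base 16 accepts the "0x" prefix.
    uint64_t Src = std::stoull(Token.substr(0, FirstSlash), nullptr, 16);
    uint64_t Dst = std::stoull(
        Token.substr(FirstSlash + 1, SecondSlash - FirstSlash - 1), nullptr,
        16);

    bool SrcIsInternal = IsCode(Src);
    bool DstIsInternal = IsCode(Dst);
    bool IsArtificial = false;

    // Ignore branches entirely outside the binary.
    if (!SrcIsInternal && !DstIsInternal)
      continue;
    // External -> internal: remember the target for later stitching.
    if (!SrcIsInternal && DstIsInternal) {
      PrevTrDst = Dst;
      continue;
    }
    // Internal -> external: pair the source with the remembered target.
    if (SrcIsInternal && !DstIsInternal) {
      if (!PrevTrDst)
        continue;
      Dst = PrevTrDst;
      PrevTrDst = 0;
      IsArtificial = true;
    }
    // Drop one adjacent duplicate of the previously recorded branch.
    if (Src != PrevSrc || Dst != PrevDst) {
      DuplicateFiltered = false;
    } else if (!DuplicateFiltered) {
      DuplicateFiltered = true;
      continue;
    }
    Stack.push_back({Src, Dst, IsArtificial});
    PrevSrc = Src;
    PrevDst = Dst;
  }
  return Stack;
}
```

As in the patch, only the first repeat of a branch is dropped, so a third identical entry in a row would be kept.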

llvm/tools/llvm-profgen/ProfileGenerator.h

This file was added.

				//===-- ProfileGenerator.h - Profile Generator -------------------*- C++ -*-===//
				//
				// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
				// See https://llvm.org/LICENSE.txt for license information.
				// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
				//
				//===----------------------------------------------------------------------===//

				#ifndef LLVM_TOOLS_LLVM_PROGEN_PROFILEGENERATOR_H
				#define LLVM_TOOLS_LLVM_PROGEN_PROFILEGENERATOR_H
				#include "ErrorHandling.h"
				#include "PerfReader.h"
				#include "ProfiledBinary.h"
				#include "llvm/ProfileData/SampleProfWriter.h"

				using namespace llvm;
				using namespace llvm::sampleprof;

				namespace llvm {
				namespace sampleprof {

				using AddressBinaryMap = std::map<uint64_t, ProfiledBinary *>;

				class ProfileGenerator {
				wmi: I thought the tool can also generate profiles for the current debug-info-based non-CS AFDO, but I am not sure. I guess that is a special case handled by CSProfileGenerator. Could you confirm?
				wenlei: Yes, eventually llvm-profgen will support both. Our internal implementation for AFDO profile generation is also LLVM based, but it's somewhat separated from this one, and we wanted to do some refactoring before we merge the two. That said, agreed that the two can share a common interface, though I think we could defer that a bit? We will upstream the AFDO profile generation after the CSSPGO part is cleared.

				public:
				ProfileGenerator(AddressBinaryMap &Addr2BinaryMap)
				: AddrToBinaryMap(Addr2BinaryMap){};

				virtual ~ProfileGenerator() = default;
				static std::unique_ptr<ProfileGenerator>
				create(ContextSampleCounters &SampleCounters, enum PerfScriptType SampleType,
				AddressBinaryMap &Addr2BinaryMap);
				virtual void generateProfile() = 0;

				// Use SampleProfileWriter to serialize profile map
				void write();

				protected:
				/*
				For each region boundary point, mark if it is begin or end (or both) of
				the region. Boundary points are inclusive. Log the sample count as well
				so we can use it when we compute the sample count of each disjoint region
				later. Note that there might be multiple ranges with different sample
				count that share same begin/end point. We need to accumulate the sample
				count for the boundary point for such case, because for the example
				below,

				|<--100-->|
				|<------200------>|
				A         B       C

				sample count for disjoint region [A,B] would be 300.
				*/
				void findDisjointRanges(
				std::map<std::pair<uint64_t, uint64_t>, uint64_t> &DisjointRanges,
				const std::map<std::pair<uint64_t, uint64_t>, uint64_t> &Ranges);
				hoy: This sounds like a method of `CSProfileGenerator`.

				// Used by SampleProfileWriter
				StringMap<FunctionSamples> ProfileMap;
				AddressBinaryMap &AddrToBinaryMap;
				};

				class CSProfileGenerator : public ProfileGenerator {
				ContextSampleCounters &SampleCounters;

				public:
				CSProfileGenerator(ContextSampleCounters &Counter,
				AddressBinaryMap &Addr2BinaryMap)
				: ProfileGenerator(Addr2BinaryMap), SampleCounters(Counter){};

				public:
				void generateProfile() override {
				// Fill in function body samples
				populateFunctionBodySamples();

				// Fill in head sample counts as well as value samples for calls
				populateFunctionBoundarySamples();

				// Fill in call site value sample for inlined calls and also use context to
				// infer missing samples. Since we don't have call count for inlined
				// functions, we estimate it from inlinee's profile using the average of
				// first/last body sample.
				populateInferredFunctionSamples();
				hoy: Please make this TODO more clear.
				}

				private:
				void updateBodySamplesforFunctionProfile(uint64_t Address,
				FunctionSamples &FunctionProfile,
				ProfiledBinary *Binary,
				uint64_t Count);
				// Lookup or create FunctionSamples for the context
				FunctionSamples &getFunctionProfileForContext(StringRef ContextId);
				void populateFunctionBodySamples();
				void populateFunctionBoundarySamples();
				void populateInferredFunctionSamples();
				};

				} // end namespace sampleprof
				} // end namespace llvm

				#endif
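
The boundary-point comment on `findDisjointRanges` in this header is easier to check against a runnable version. The following is a minimal sketch of that computation using only the standard library; `RangeCountMap` and `findDisjointRangesSketch` are hypothetical names, and, like the patch, the sketch assumes address 0 never appears as a range start because it uses 0 as the "no open range yet" sentinel.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <utility>

// Hypothetical alias: inclusive [begin, end] address range -> sample count.
using RangeCountMap = std::map<std::pair<uint64_t, uint64_t>, uint64_t>;

RangeCountMap findDisjointRangesSketch(const RangeCountMap &Ranges) {
  // Per-address totals of the ranges that begin or end at that address.
  struct BoundaryPoint {
    uint64_t BeginCount = 0;
    uint64_t EndCount = 0;
  };
  std::map<uint64_t, BoundaryPoint> Boundaries;
  for (const auto &Item : Ranges) {
    Boundaries[Item.first.first].BeginCount += Item.second;
    Boundaries[Item.first.second].EndCount += Item.second;
  }

  RangeCountMap Disjoint;
  uint64_t BeginAddress = 0; // 0 means no region is open yet (assumption)
  uint64_t Count = 0;        // sum of counts of ranges covering the cursor
  for (const auto &Item : Boundaries) {
    uint64_t Address = Item.first;
    const BoundaryPoint &Point = Item.second;
    if (Point.BeginCount) {
      // Close the region that ran up to just before this begin point.
      if (BeginAddress)
        Disjoint[{BeginAddress, Address - 1}] = Count;
      Count += Point.BeginCount;
      BeginAddress = Address;
    }
    if (Point.EndCount) {
      assert(BeginAddress && "First boundary point cannot be an end point");
      // End points are inclusive, so the region includes Address itself.
      Disjoint[{BeginAddress, Address}] = Count;
      Count -= Point.EndCount;
      BeginAddress = Address + 1;
    }
  }
  return Disjoint;
}
```

With A = 10, B = 20, C = 30, the input `{[10,20]: 100, [10,30]: 200}` yields `[10,20]: 300` and `[21,30]: 200`, which is the `[A,B]` and `[B+1,C]` split described in the comment.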

llvm/tools/llvm-profgen/ProfileGenerator.cpp

This file was added.

				//===-- ProfileGenerator.cpp - Profile Generator -----------------*- C++ -*-===//
				//
				// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
				wenlei: Nit: all header comments are screwed up by our internal linter.
				// See https://llvm.org/LICENSE.txt for license information.
				// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
				//
				//===----------------------------------------------------------------------===//

				#include "ProfileGenerator.h"

				static cl::opt<std::string> OutputFilename("output", cl::value_desc("output"),
				cl::Required,
				cl::desc("Output profile file"));

				static cl::opt<SampleProfileFormat> OutputFormat(
				"format", cl::desc("Format of output profile"), cl::init(SPF_Text),
				cl::values(
				clEnumValN(SPF_Binary, "binary", "Binary encoding"),
				clEnumValN(SPF_Compact_Binary, "compbinary", "Compact binary encoding"),
				clEnumValN(SPF_Ext_Binary, "extbinary", "Extensible binary encoding"),
				clEnumValN(SPF_Text, "text", "Text encoding (default)"),
				clEnumValN(SPF_GCC, "gcc",
				"GCC encoding (only meaningful for -sample)")));

				using namespace llvm;

				namespace llvm {
				namespace sampleprof {

				std::unique_ptr<ProfileGenerator>
				ProfileGenerator::create(ContextSampleCounters &SampleCounters,
				enum PerfScriptType SampleType,
				AddressBinaryMap &Addr2BinaryMap) {
				std::unique_ptr<ProfileGenerator> ProfileGenerator;

				if (SampleType == PERF_LBR_STACK) {
				ProfileGenerator.reset(
				new CSProfileGenerator(SampleCounters, Addr2BinaryMap));
				} else {
				// TODO: Support other perf script types, e.g. LBR-only samples for
				// regular (non-CS) AutoFDO.
				exitWithError("Unsupported perfscript!");
				}

				return ProfileGenerator;
				}

				void ProfileGenerator::write() {
				auto WriterOrErr = SampleProfileWriter::create(OutputFilename, OutputFormat);
				hoy: I'm wondering if a separate profile file should be output for each binary. Since the samples are already separated per binary via `BinarySampleCounters`, `ProfileMap` can be made like that too.
				wlei (author): Yeah, it's doable, but it needs more command-line design. Currently we only support one output file, so we would have to support multiple output files, which also need an exact one-to-one mapping to the binaries. We could use `OutputFilenames` to receive multiple output files and match them in order on the command line? Alternatively, we keep this as is, and if users really need separate output per binary, they can call the tool multiple times with a different input binary. Any suggestions on the command?
				hoy: I see. Let's keep a single output for now.
				wenlei: What about limiting to a single binary input for now? Error out with a message saying unsupported if multiple binaries are provided. Generating profiles for multiple binaries in a single output file will make the profile summary info inaccurate (e.g. percentile-based hot thresholds).
				if (std::error_code EC = WriterOrErr.getError())
				exitWithError(EC, OutputFilename);
				auto Writer = std::move(WriterOrErr.get());
				Writer->write(ProfileMap);
				}
				hoy: typedef or using an alias like `RangeCountMap` for this type?

				void ProfileGenerator::findDisjointRanges(RangeSample &DisjointRanges,
				const RangeSample &Ranges) {

				/*
				Regions may overlap with each other. Using the boundary info, find all
				disjoint ranges and their sample count. In the example above, there are
				three boundary points A, B, and C, whose begin/end counts are

				A: (300, 0)
				B: (0, 100)
				C: (0, 200)

				respectively. With these points, the following logic finds two disjoint
				regions of

				[A,B]: 300
				[B+1,C]: 200
				wmi: Why is there a region [A, B]: 300, but B: (0, 100) only has a sample count of 100?
				wlei (author): Sorry for the confusion. Here B: (0, 100) is the boundary point: 0 means no samples begin at B, and 100 means one sample (Sample1, spanning A to B) with count 100 ends at B. Sample2, with count 200, spans A to C. I changed the explanation in the comment; see whether it's clear or not.
				wmi: It is helpful too. Thanks.

				If there is a boundary point that is both a begin and an end, the point
				itself becomes a separate disjoint region. For example, if we have original
				ranges of

				|<--- 100 --->|
				              |<--- 200 --->|
				A             B             C

				, there are three boundary points with their begin/end counts of

				A: (100, 0)
				B: (200, 100)
				C: (0, 200)

				, and the disjoint ranges would be

				[A, B-1]: 100
				[B, B]: 300
				[B+1, C]: 200.
				*/

				struct BoundaryPoint {
				uint64_t BeginCount;
				uint64_t EndCount;

				BoundaryPoint() : BeginCount(0), EndCount(0){};

				void addBeginCount(uint64_t Count) { BeginCount += Count; }

				void addEndCount(uint64_t Count) { EndCount += Count; }
				};

				std::map<uint64_t, BoundaryPoint> Boundaries;

				for (auto Item : Ranges) {
				uint64_t Begin = Item.first.first;
				uint64_t End = Item.first.second;
				uint64_t Count = Item.second;
				if (Boundaries.find(Begin) == Boundaries.end())
				Boundaries[Begin] = BoundaryPoint();
				Boundaries[Begin].addBeginCount(Count);

				if (Boundaries.find(End) == Boundaries.end())
				Boundaries[End] = BoundaryPoint();
				Boundaries[End].addEndCount(Count);
				}

				uint64_t BeginAddress = 0;
				uint64_t Count = 0;
				for (auto Item : Boundaries) {
				uint64_t Address = Item.first;
				BoundaryPoint &Point = Item.second;
				if (Point.BeginCount) {
				if (BeginAddress)
				DisjointRanges[{BeginAddress, Address - 1}] = Count;
				Count += Point.BeginCount;
				BeginAddress = Address;
				}
				if (Point.EndCount) {
				assert(BeginAddress && "First boundary point cannot be 'end' point");
				DisjointRanges[{BeginAddress, Address}] = Count;
				Count -= Point.EndCount;
				BeginAddress = Address + 1;
				}
				}
				}

				FunctionSamples &
				CSProfileGenerator::getFunctionProfileForContext(StringRef ContextStr) {
				if (ProfileMap.find(ContextStr) == ProfileMap.end()) {
				SampleContext FContext(ContextStr, RawContext);
				FunctionSamples &FProfile = ProfileMap[ContextStr];
				FProfile.setName(FContext.getName());
				FProfile.setContext(FContext);
				hoy: Just `return ProfileMap[ContextStr]`?
				}
				return ProfileMap[ContextStr];
				}

				void CSProfileGenerator::updateBodySamplesforFunctionProfile(
				uint64_t Address, FunctionSamples &FunctionProfile, ProfiledBinary *Binary,
				uint64_t Count) {
				// Use the maximum count of samples with same line location
				const SourceLocation &LeafLoc = Binary->getInlineLeafFrameLoc(Address);
				ErrorOr<uint64_t> R = FunctionProfile.findSamplesAt(
				LeafLoc.second.LineOffset, LeafLoc.second.Discriminator);
				if (std::error_code EC = R.getError())
				exitWithError(EC);
				uint64_t PreviousCount = R.get();
				if (PreviousCount < Count) {
				FunctionProfile.addBodySamples(LeafLoc.second.LineOffset,
				LeafLoc.second.Discriminator,
				Count - PreviousCount);
				}
				wenlei (Done): This would not be consistent with the definition of total samples. I think we should only add the portion that was added to body samples.
				FunctionProfile.addTotalSamples(Count);
				}
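
The "maximum count per location" rule can be illustrated with a small sketch. It assumes a simplified, hypothetical `SimpleProfile` in place of `FunctionSamples`, and it follows the variant suggested in the inline comment (the total grows only by the portion added to body samples), whereas the diff as posted adds the full `Count` to the total.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <utility>

// Hypothetical, simplified stand-in for FunctionSamples.
struct SimpleProfile {
  // (line offset, discriminator) -> body sample count
  std::map<std::pair<uint32_t, uint32_t>, uint64_t> BodySamples;
  uint64_t TotalSamples = 0;
};

// Keep the maximum count seen for a source location. Several instructions
// can map to the same (line offset, discriminator), so summing their counts
// would overcount. Returns the amount actually added.
uint64_t updateBodySampleWithMax(SimpleProfile &Profile, uint32_t LineOffset,
                                 uint32_t Discriminator, uint64_t Count) {
  uint64_t &Slot = Profile.BodySamples[{LineOffset, Discriminator}];
  if (Count <= Slot)
    return 0;
  uint64_t Delta = Count - Slot;
  Slot = Count;
  // Reviewer-suggested variant: grow the total only by the delta.
  Profile.TotalSamples += Delta;
  return Delta;
}
```

With this variant the total always equals the sum of the body samples recorded through this function, which is the consistency the reviewer asks for.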

				void CSProfileGenerator::populateFunctionBodySamples() {
				for (const auto &Item : SampleCounters.RangeCounter) {
				StringRef ContextId(Item.first);
				// Get or create function profile for the range
				FunctionSamples &FunctionProfile = getFunctionProfileForContext(ContextId);
				auto Binary = PerfReader::getBinary(AddrToBinaryMap,
				Item.second.begin()->first.first);
				// Compute disjoint ranges first, so we can use MAX
				// for calculating count for each location.
				RangeSample Ranges;
				findDisjointRanges(Ranges, Item.second);

				for (auto Range : Ranges) {
				uint64_t RangeBegin = Range.first.first;
				uint64_t RangeEnd = Range.first.second;
				uint64_t Count = Range.second;
				// Disjoint ranges may introduce zero-filled gaps that
				// don't belong to the current context; filter them out.
				if (Count == 0)
				continue;

				InstructionPointer IP(Binary, RangeBegin, true);

				// A disjoint range may fall entirely between two instructions,
				// e.g. if Instr1 is at Addr1 and Instr2 is at Addr2, a disjoint
				// range can be Addr1+1 to Addr2-1. We should ignore such ranges.
				if (IP.Address > RangeEnd)
				hoy (Done): Can this be moved out of the loop? The string-based look up is potentially slow.
				continue;

				while (IP.Address <= RangeEnd) {
				// Recording body sample for this specific context
				updateBodySamplesforFunctionProfile(IP.Address, FunctionProfile, Binary,
				hoy (Done): `Binary` can be achieved via `Reader` so no need to pass in as an argument.
				Count);
				// Move to next IP within the range
				IP.advance();
				}
				}
				}
				}

				void CSProfileGenerator::populateFunctionBoundarySamples() {
				wenlei (Done): This now actually does more than head samples as call target value sample is handled here too. Perhaps `populateFunctionBoundarySamples`?
				for (const auto &Item : SampleCounters.BranchCounter) {
				StringRef ContextId(Item.first);
				auto Binary = PerfReader::getBinary(AddrToBinaryMap,
				Item.second.begin()->first.first);
				// Get or create function profile for branch Source
				FunctionSamples &FunctionProfile = getFunctionProfileForContext(ContextId);

				for (auto Entry : Item.second) {
				hoy (Not Done): @wmi Looks like here is a place we use ELF symbol name as function name. May need to call `getCanonicalFnName` here.
				uint64_t Source = Entry.first.first;
				uint64_t Target = Entry.first.second;
				uint64_t Count = Entry.second;
				StringRef TargetName = Binary->getFuncFromStartAddr(Target);
				if (TargetName.size() == 0)
				wenlei (Done): Same as line 195, this could be hoisted out of the loop?
				continue;

				// Record value sample and its count
				const SourceLocation &LeafLoc = Binary->getInlineLeafFrameLoc(Source);

				FunctionProfile.addCalledTargetSamples(LeafLoc.second.LineOffset,
				LeafLoc.second.Discriminator,
				TargetName, Count);
				FunctionProfile.addTotalSamples(Count);

				// Record head sample for call Target
				// TODO: Cleanup ' @ '
				std::string ContextForTarget =
				getCallSite(LeafLoc) + " @ " + TargetName.str();
				if (ContextId.find(" @ ") != StringRef::npos) {
				ContextForTarget =
				ContextId.rsplit(" @ ").first.str() + " @ " + ContextForTarget;
				}
				wenlei (Done): explicit namespace is not needed here.

				if (ProfileMap.find(ContextForTarget) != ProfileMap.end()) {
				FunctionSamples &TargetProfile = ProfileMap[ContextForTarget];
				assert(Count != 0 && "Unexpected zero weight branch");
				if (TargetProfile.getName().size()) {
				TargetProfile.addHeadSamples(Count);
				}
				}
				}
				}
				}

				static SourceLocation getCallerConext(StringRef CalleeContext,
				StringRef &CallerNameWithContext) {
				StringRef CallerContext = CalleeContext.rsplit(" @ ").first;
				CallerNameWithContext = CallerContext.rsplit(':').first;
				auto ContextSplit = CallerContext.rsplit(" @ ");
				SourceLocation LeafFrameLoc = {"", {0, 0}};
				StringRef Funcname;
				SampleContext::DecodeContextString(ContextSplit.second, Funcname,
				LeafFrameLoc.second);
				LeafFrameLoc.first = Funcname.str();
				return LeafFrameLoc;
				wmi (Done): Conext --> Context?
				}
				wenlei (Done): This is now more than just value samples. The difference is this one uses context to infer missing samples, but others use range and branch to populate samples directly. So name it `populateInferredFunctionSamples`?

				void CSProfileGenerator::populateInferredFunctionSamples() {
				for (const auto &Item : ProfileMap) {
				const StringRef CalleeContext = Item.first();
				const FunctionSamples &CalleeProfile = Item.second;

				// If we already have head sample counts, we must have value profile
				// for call sites added already. Skip to avoid double counting.
				if (CalleeProfile.getHeadSamples())
				continue;
				// If we don't have context, nothing to do for caller's call site.
				// This could happen for entry point function.
				if (CalleeContext.find(" @ ") == StringRef::npos)
				continue;

				wenlei (Done): nit: name it `CallerLeafFrameLoc`?
				StringRef NameWithContext;
				SourceLocation &&CallerLeafFrameLoc =
				getCallerConext(CalleeContext, NameWithContext);

				wenlei (Done): `caller_profile` -> `CallerProfile`
				// It's possible that we haven't seen any sample directly in the caller,
				// in which case CallerProfile will not exist. But we can't modify
				// ProfileMap while iterating it.
				// TODO: create function profiles for those callers too
				if (ProfileMap.find(NameWithContext) == ProfileMap.end())
				continue;
				FunctionSamples &CallerProfile = ProfileMap[NameWithContext];

				// Since we don't have call count for inlined functions, we
				// estimate it from inlinee's profile using entry body sample.
				uint64_t EstimatedCallCount = CalleeProfile.getEntrySamples();
				// If we don't have samples with location, use 1 to indicate live.
				if (!EstimatedCallCount && !CalleeProfile.getBodySamples().size())
				EstimatedCallCount = 1;
				CallerProfile.addCalledTargetSamples(
				CallerLeafFrameLoc.second.LineOffset,
				CallerLeafFrameLoc.second.Discriminator, CalleeProfile.getName(), EstimatedCallCount);
				CallerProfile.addBodySamples(CallerLeafFrameLoc.second.LineOffset,
				CallerLeafFrameLoc.second.Discriminator,
				EstimatedCallCount);
				CallerProfile.addTotalSamples(EstimatedCallCount);
				}
				}
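
The caller inference above relies on splitting the callee's context string. The following is a minimal sketch of that split using `std::string` rather than `StringRef`; `CallerInfo` and `splitCallerFromCallee` are hypothetical names, and the frame encoding `name:line[.discriminator]` joined by " @ " is assumed from the code above. Unlike the helper in the diff, the sketch falls back to the whole caller context when it has only one frame, which is an assumption about the intended behavior.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical result type; the patch uses SourceLocation and StringRef.
struct CallerInfo {
  std::string CallerNameWithContext; // key of the caller's profile
  std::string CallerName;            // function containing the call site
  uint32_t LineOffset = 0;           // call site line offset in the caller
  uint32_t Discriminator = 0;        // optional ".N" suffix, 0 if absent
};

// Derive the caller profile key and call site from a callee context such as
// "main:3 @ foo:2.1 @ bar". Assumes CalleeContext contains at least one " @ ".
CallerInfo splitCallerFromCallee(const std::string &CalleeContext) {
  const std::string Sep = " @ ";
  CallerInfo Info;
  // Drop the leaf: "main:3 @ foo:2.1 @ bar" -> "main:3 @ foo:2.1".
  std::string CallerContext =
      CalleeContext.substr(0, CalleeContext.rfind(Sep));
  // Drop the call site from the last frame: -> "main:3 @ foo".
  Info.CallerNameWithContext =
      CallerContext.substr(0, CallerContext.rfind(':'));
  // Isolate the last frame ("foo:2.1"); with a single frame, use it whole.
  auto Pos = CallerContext.rfind(Sep);
  std::string Frame = Pos == std::string::npos
                          ? CallerContext
                          : CallerContext.substr(Pos + Sep.size());
  auto Colon = Frame.rfind(':');
  Info.CallerName = Frame.substr(0, Colon);
  if (Colon != std::string::npos) {
    std::string Loc = Frame.substr(Colon + 1); // "2.1" or "2"
    auto Dot = Loc.find('.');
    Info.LineOffset = static_cast<uint32_t>(std::stoul(Loc.substr(0, Dot)));
    if (Dot != std::string::npos)
      Info.Discriminator =
          static_cast<uint32_t>(std::stoul(Loc.substr(Dot + 1)));
  }
  return Info;
}
```

For "main:3 @ foo:2.1 @ bar" this yields the caller key "main:3 @ foo" and the call site foo at line offset 2, discriminator 1, which is where the estimated call count would be attributed.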

				} // namespace sampleprof
				} // end namespace llvm

llvm/tools/llvm-profgen/ProfiledBinary.h

//===-- ProfiledBinary.h - Binary decoder ------------------------ C++ --===//		//===-- ProfiledBinary.h - Binary decoder ------------------------ C++ --===//
//		//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.		// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.		// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception		// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

#ifndef LLVM_TOOLS_LLVM_PROGEN_PROFILEDBINARY_H		#ifndef LLVM_TOOLS_LLVM_PROGEN_PROFILEDBINARY_H
#define LLVM_TOOLS_LLVM_PROGEN_PROFILEDBINARY_H		#define LLVM_TOOLS_LLVM_PROGEN_PROFILEDBINARY_H

#include "CallContext.h"		#include "CallContext.h"
#include "llvm/ADT/StringRef.h"		#include "llvm/ADT/StringRef.h"
#include "llvm/DebugInfo/Symbolize/Symbolize.h"		#include "llvm/DebugInfo/Symbolize/Symbolize.h"
#include "llvm/Object/ELFObjectFile.h"		#include "llvm/Object/ELFObjectFile.h"
		#include "llvm/ProfileData/SampleProf.h"
#include "llvm/Support/Path.h"		#include "llvm/Support/Path.h"
		#include <list>
#include <set>		#include <set>
#include <string>		#include <string>
#include <unordered_map>		#include <unordered_map>
#include <unordered_set>		#include <unordered_set>
#include <vector>		#include <vector>

using namespace llvm::object;		using namespace llvm::object;

namespace llvm {		namespace llvm {
namespace sampleprof {		namespace sampleprof {

class ProfiledBinary;		class ProfiledBinary;

struct InstructionPointer {		struct InstructionPointer {
ProfiledBinary *Binary;		ProfiledBinary *Binary;
// Offset to the base address of the executable segment of the binary.		union {
uint64_t Offset;		// Offset of the executable segment of the binary.
		uint64_t Offset = 0;
		// Also used as address in unwinder
		uint64_t Address;
		};
// Index to the sorted code address array of the binary.		// Index to the sorted code address array of the binary.
uint64_t Index;		uint64_t Index = 0;
		InstructionPointer(ProfiledBinary *Binary, uint64_t Address,
InstructionPointer(ProfiledBinary *Binary, uint64_t Offset)		bool RoundToNext = false);
: Binary(Binary), Offset(Offset) {		void advance();
Index = 0;		void backward();
}		void update(uint64_t Addr);
		wenlei (Done): Is operator++ and operator-- used? these two are duplicated with advance/backward, and we only need to keep one set?
};		};

		using AddressBinaryMap = std::map<uint64_t, ProfiledBinary *>;

class ProfiledBinary {		class ProfiledBinary {
// Absolute path of the binary.		// Absolute path of the binary.
std::string Path;		std::string Path;
// The target triple.		// The target triple.
Triple TheTriple;		Triple TheTriple;
// The runtime base address that the executable sections are loaded at.		// The runtime base address that the executable sections are loaded at.
mutable uint64_t BaseAddress = 0;		mutable uint64_t BaseAddress = 0;
// The preferred base address that the executable sections are loaded at.		// The preferred base address that the executable sections are loaded at.
uint64_t PreferredBaseAddress = 0;		uint64_t PreferredBaseAddress = 0;
// A list of text sections sorted by start RVA and size. Used to check		// A list of text sections sorted by start RVA and size. Used to check
// if a given RVA is a valid code address.		// if a given RVA is a valid code address.
std::set<std::pair<uint64_t, uint64_t>> TextSections;		std::set<std::pair<uint64_t, uint64_t>> TextSections;
// Function offset to name mapping.		// Function offset to name mapping.
std::unordered_map<uint64_t, std::string> FuncStartAddrMap;		std::unordered_map<uint64_t, std::string> FuncStartAddrMap;
		// Address to context location map. Used to expand the context.
		std::unordered_map<uint64_t, SourceLocationVec> AddrToLocMap;
// An array of offsets of all instructions sorted in increasing order. The		// An array of offsets of all instructions sorted in increasing order. The
// sorting is needed to fast advance to the next forward/backward instruction.		// sorting is needed to fast advance to the next forward/backward instruction.
std::vector<uint64_t> CodeAddrs;		std::vector<uint64_t> CodeAddrs;
// A set of call instruction offsets. Used by virtual unwinding.		// A set of call instruction offsets. Used by virtual unwinding.
std::unordered_set<uint64_t> CallAddrs;		std::unordered_set<uint64_t> CallAddrs;
// A set of return instruction offsets. Used by virtual unwinding.		// A set of return instruction offsets. Used by virtual unwinding.
std::unordered_set<uint64_t> RetAddrs;		std::unordered_set<uint64_t> RetAddrs;
		// A set of prolog and epilog offsets. Used by virtual unwinding.
		std::unordered_set<uint64_t> PrologEpilogSet;

// The symbolizer used to get inline context for an instruction.		// The symbolizer used to get inline context for an instruction.
std::unique_ptr<symbolize::LLVMSymbolizer> Symbolizer;		std::unique_ptr<symbolize::LLVMSymbolizer> Symbolizer;

void setPreferredBaseAddress(const ELFObjectFileBase *O);		void setPreferredBaseAddress(const ELFObjectFileBase *O);

void setupSymbolizer();		void setupSymbolizer();

[... 12 unchanged lines not shown; context: class ProfiledBinary { ...]
/// generation.		/// generation.
void load();		void load();

public:		public:
ProfiledBinary(StringRef Path) : Path(Path) {		ProfiledBinary(StringRef Path) : Path(Path) {
setupSymbolizer();		setupSymbolizer();
load();		load();
}		}
		uint64_t virtualAddrToOffset(uint64_t VirtualAddress) {
		return VirtualAddress - BaseAddress;
		}
		uint64_t offsetToVirtualAddr(uint64_t offset) { return offset + BaseAddress; }
const StringRef getPath() const { return Path; }		const StringRef getPath() const { return Path; }
const StringRef getName() const { return llvm::sys::path::filename(Path); }		const StringRef getName() const { return llvm::sys::path::filename(Path); }
uint64_t getBaseAddress() const { return BaseAddress; }		uint64_t getBaseAddress() const { return BaseAddress; }
void setBaseAddress(uint64_t Address) { BaseAddress = Address; }		void setBaseAddress(uint64_t Address) { BaseAddress = Address; }
		uint64_t getPreferredBaseAddress() { return PreferredBaseAddress; }
		bool addressIsCode(uint64_t Address) {
		uint64_t Offset = virtualAddrToOffset(Address);
		return AddrToLocMap.find(Offset) != AddrToLocMap.end();
		}
		bool addressIsCall(uint64_t Address) {
		uint64_t Offset = virtualAddrToOffset(Address);
		return CallAddrs.count(Offset);
		}
		bool addressIsReturn(uint64_t Address) {
		uint64_t Offset = virtualAddrToOffset(Address);
		return RetAddrs.count(Offset);
		}
		bool addressInPrologEpilog(uint64_t Address) {
		uint64_t Offset = virtualAddrToOffset(Address);
		return PrologEpilogSet.count(Offset);
		}

		uint64_t getAddressforIndex(uint64_t idx) {
		return offsetToVirtualAddr(CodeAddrs[idx]);
		}

		// Get the index in CodeAddrs for the address. The address may not
		// be the start of an instruction, in which case lower_bound rounds
		// it up to the next valid code address.
		uint32_t getIndexForAddr(uint64_t Address) {
		uint64_t Offset = virtualAddrToOffset(Address);
		auto Low = std::lower_bound(CodeAddrs.begin(), CodeAddrs.end(), Offset);
		return Low - CodeAddrs.begin();
		}

		uint64_t getCallAddrFromFrameAddr(uint64_t FrameAddr) {
		return getAddressforIndex(getIndexForAddr(FrameAddr) - 1);
		}

		StringRef getFuncFromStartAddr(uint64_t StartAddr) {
		wenlei (Done): return a const ref, or StringRef since FuncStartAddrMap owns the string?
		uint64_t Offset = virtualAddrToOffset(StartAddr);
		return FuncStartAddrMap[Offset];
		}

		const SourceLocation &getInlineLeafFrameLoc(uint64_t Address,
		hoy (Done): Can this just return a reference so that you don't need to use the move semantics when it gets called?
		bool NameOnly = false) {
		uint64_t Offset = virtualAddrToOffset(Address);
		return AddrToLocMap[Offset].back();
		}

		// compare two addresses' inline context
		bool inlineContextEqual(uint64_t add1, uint64_t add2);

		// Get the context string of the current stack with inline context filled in.
		hoy (Done): Nit: Get the full context string for the given call stack with inline context filled in?
		// It will search the disassembling info stored in AddrToLocMap. This is used
		// as the key of function sample map
		std::string getExpandedContextStr(std::list<uint64_t> &stack,
		bool compressRecursion);
};		};

} // namespace sampleprof		} // namespace sampleprof
} // end namespace llvm		} // end namespace llvm

#endif		#endif
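
The address rounding done by `getIndexForAddr` and `getCallAddrFromFrameAddr` in the header above can be sketched on a plain sorted vector. This is a minimal sketch: `indexForOffset` and `previousInstruction` are hypothetical free functions working on offsets, and the bounds handling is an assumption, since the header does not show any.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Return the index of the first instruction offset that is >= Offset.
// CodeOffsets must be sorted in increasing order. If Offset is past the last
// instruction, the result equals CodeOffsets.size() and must not be used as
// an index.
std::size_t indexForOffset(const std::vector<uint64_t> &CodeOffsets,
                           uint64_t Offset) {
  auto Low = std::lower_bound(CodeOffsets.begin(), CodeOffsets.end(), Offset);
  return static_cast<std::size_t>(Low - CodeOffsets.begin());
}

// Offset of the instruction preceding the one at (or after) FrameOffset,
// e.g. the call that precedes a return address. Assumes a predecessor exists.
uint64_t previousInstruction(const std::vector<uint64_t> &CodeOffsets,
                             uint64_t FrameOffset) {
  std::size_t Index = indexForOffset(CodeOffsets, FrameOffset);
  assert(Index > 0 && "no instruction precedes this offset");
  return CodeOffsets[Index - 1];
}
```

Because x86 instructions have variable length, stepping back one entry in the sorted offset array is a simple way to find the call instruction from a frame's return address without decoding backwards.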

llvm/tools/llvm-profgen/ProfiledBinary.cpp

[... 144 unchanged lines not shown; context: if (auto Obj = dyn_cast<ELFObjectFileBase>(&Binary)) { ...]
// TODO: decode other sections.		// TODO: decode other sections.

return;		return;
}		}

exitWithError("not a valid Elf image", Path);		exitWithError("not a valid Elf image", Path);
}		}

		bool ProfiledBinary::inlineContextEqual(uint64_t Address1, uint64_t Address2) {
		uint64_t Offset1 = virtualAddrToOffset(Address1);
		uint64_t Offset2 = virtualAddrToOffset(Address2);
		const SourceLocationVec &Context1 = AddrToLocMap[Offset1];
		const SourceLocationVec &Context2 = AddrToLocMap[Offset2];
		if (Context1.size() != Context2.size())
		return false;

		// The leaf frame contains the location within the leaf function, so
		// exclude it: it is not part of the calling context.
		return std::equal(Context1.begin(), Context1.begin() + Context1.size() - 1,
		Context2.begin(), Context2.begin() + Context2.size() - 1);
		}

		std::string ProfiledBinary::getExpandedContextStr(std::list<uint64_t> &Stack,
		bool compressRecursion) {
		hoy (Done): Nit: `std::equal(Context1.begin(), Context1.begin() + Context1.size() - 1, Context2.begin(), Context2.begin() + Context2.size() - 1)`
		std::string ContextStr;
		SmallVector<std::string, 8> ContextVec;
		// Process from entry to leaf
		for (auto Iter = Stack.rbegin(); Iter != Stack.rend(); Iter++) {
		uint64_t Offset = virtualAddrToOffset(*Iter);
		const SourceLocationVec &ExpandedContext = AddrToLocMap[Offset];
		for (const auto &Loc : ExpandedContext) {
		ContextVec.push_back(getCallSite(Loc));
		}
		}

		assert(ContextVec.size() && "Context length should be at least 1");

		// TODO: compress for recursive context

		hoy (Not Done): Nit: remove the check and add it back with the compression work.
		for (uint32_t I = 0; I < (uint32_t)ContextVec.size(); I++) {
		if (ContextStr.size()) {
		ContextStr += " @ ";
		}

		if (I == ContextVec.size() - 1) {
		// Only keep the function name for the leaf frame
		StringRef Ref(ContextVec[I]);
		ContextStr += Ref.split(":").first.str();
		} else {
		ContextStr += ContextVec[I];
		hoy (Done): `RemoveLeaf` should be checked here?
		}
		}

		return ContextStr;
		}
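
How the expanded context string is assembled can be shown with a short sketch. It assumes each frame has already been rendered as `name:line[.discriminator]` (the role `getCallSite` plays above) and uses a hypothetical `joinContext` helper on standard containers.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Join call-site frames, ordered from entry to leaf, into a context string.
// Each frame is assumed to be encoded as "name:line[.discriminator]"; the
// leaf keeps only its function name.
std::string joinContext(const std::vector<std::string> &Frames) {
  assert(!Frames.empty() && "context length should be at least 1");
  std::string Context;
  for (std::size_t I = 0; I < Frames.size(); ++I) {
    if (!Context.empty())
      Context += " @ ";
    if (I + 1 == Frames.size())
      Context += Frames[I].substr(0, Frames[I].find(':')); // leaf: name only
    else
      Context += Frames[I];
  }
  return Context;
}
```

Dropping the location from the leaf means every sample inside the same function under the same calling context maps to one profile key, while the line offsets are recorded as body samples instead.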

void ProfiledBinary::setPreferredBaseAddress(const ELFObjectFileBase *Obj) {		void ProfiledBinary::setPreferredBaseAddress(const ELFObjectFileBase *Obj) {
for (section_iterator SI = Obj->section_begin(), SE = Obj->section_end();		for (section_iterator SI = Obj->section_begin(), SE = Obj->section_end();
SI != SE; ++SI) {		SI != SE; ++SI) {
const SectionRef &Section = *SI;		const SectionRef &Section = *SI;
if (Section.isText()) {		if (Section.isText()) {
PreferredBaseAddress = getELFImageLMAForSec(Section);		PreferredBaseAddress = getELFImageLMAForSec(Section);
return;		return;
}		}
[... 101 unchanged lines not shown; context: for (unsigned SI = 0, SE = Symbols.size(); SI != SE; ++SI) { ...]
if (Start >= End)		if (Start >= End)
continue;		continue;

std::string SymbolName = Symbols[SI].Name.str();		std::string SymbolName = Symbols[SI].Name.str();
if (ShowDisassembly)		if (ShowDisassembly)
outs() << '<' << SymbolName << ">:\n";		outs() << '<' << SymbolName << ">:\n";

uint64_t Offset = Start;		uint64_t Offset = Start;
		unsigned InstIdx = 0;
		uint64_t PreOffset = 0;
while (Offset < End) {		while (Offset < End) {
MCInst Inst;		MCInst Inst;
uint64_t Size;		uint64_t Size;
// Disassemble an instruction.		// Disassemble an instruction.
bool Disassembled = DisAsm->getInstruction(		bool Disassembled = DisAsm->getInstruction(
Inst, Size, Bytes.slice(Offset - SectionOffset),		Inst, Size, Bytes.slice(Offset - SectionOffset),
Offset + ImageLoadAddr, nulls());		Offset + ImageLoadAddr, nulls());
if (Disassembled) {		if (Disassembled) {
if (ShowDisassembly) {		if (ShowDisassembly) {
outs() << format("%8" PRIx64 ":", Offset);		outs() << format("%8" PRIx64 ":", Offset);
size_t Start = outs().tell();		size_t Start = outs().tell();
IP->printInst(&Inst, Offset + Size, "", *STI.get(), outs());		IP->printInst(&Inst, Offset + Size, "", *STI.get(), outs());
if (ShowSourceLocations) {		if (ShowSourceLocations) {
unsigned Cur = outs().tell() - Start;		unsigned Cur = outs().tell() - Start;
if (Cur < 40)		if (Cur < 40)
outs().indent(40 - Cur);		outs().indent(40 - Cur);
InstructionPointer Inst(this, Offset);		InstructionPointer Inst(this, Offset);
outs() << getReversedLocWithContext(symbolize(Inst));		outs() << getReversedLocWithContext(symbolize(Inst));
}		}
outs() << "\n";		outs() << "\n";
}		}
// Populate address maps.		// Populate a vector of the symbolized callsite at this location
		InstructionPointer IP(this, Offset);
		AddrToLocMap[Offset] = symbolize(IP);
CodeAddrs.push_back(Offset);		CodeAddrs.push_back(Offset);
if (InstIsCall(Inst, TheTriple))
		if (InstIsCall(Inst, TheTriple)) {
CallAddrs.insert(Offset);		CallAddrs.insert(Offset);
else if (InstIsReturn(Inst, TheTriple))		} else if (InstIsReturn(Inst, TheTriple)) {
RetAddrs.insert(Offset);		RetAddrs.insert(Offset);
		PrologEpilogSet.insert(Offset);
		if (PreOffset != 0) {
		PrologEpilogSet.insert(PreOffset);
		hoy (Not Done): Please comment here and below that the prolog/epilog built here is based on an estimated size. Dwarf decoding is needed for building a precise prolog/epilog. Also how about separating the code to form a prolog/epilog builder that can be based on `FuncStartAddrMap`? The builder will be based on Dwarf CFI in the future.
		wenlei (Not Done): Agree it's cleaner to decouple from the main disasm loop. For now a separate function `trackPrologEpilog` should be enough. (btw, if we need to extend to a class in the future, tracker may be a better name than builder.)
		}
		}
} else {		} else {
exitWithError("disassembling error", FileName);		exitWithError("disassembling error", FileName);
}		}

		if (InstIdx <= 1) {
		PrologEpilogSet.insert(Offset);
		}
		InstIdx++;
		PreOffset = Offset;
Offset += Size;		Offset += Size;
}		}

if (ShowDisassembly)		if (ShowDisassembly)
outs() << "\n";		outs() << "\n";

FuncStartAddrMap[Start] = Symbols[SI].Name.str();		FuncStartAddrMap[Start] = Symbols[SI].Name.str();
}		}
}		}
}		}
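
The reviewers suggest moving the prolog/epilog estimate out of the disassembly loop into a separate `trackPrologEpilog` function. One possible shape is sketched below. The signature and containers are hypothetical, and the rule is the same size-based guess as in the loop above (first two instructions, plus each return and the instruction before it), not a DWARF CFI decode. One small difference: the sketch checks the instruction index rather than `PreOffset != 0`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_set>
#include <vector>

// Estimate prolog/epilog offsets for one function. InstOffsets lists the
// function's instruction offsets in increasing order; RetOffsets holds the
// offsets of its return instructions.
void trackPrologEpilog(const std::vector<uint64_t> &InstOffsets,
                       const std::unordered_set<uint64_t> &RetOffsets,
                       std::unordered_set<uint64_t> &PrologEpilogSet) {
  for (std::size_t I = 0; I < InstOffsets.size(); ++I) {
    // Treat the first two instructions as the prolog.
    if (I <= 1)
      PrologEpilogSet.insert(InstOffsets[I]);
    // Treat each return and the instruction before it as the epilog.
    if (RetOffsets.count(InstOffsets[I])) {
      PrologEpilogSet.insert(InstOffsets[I]);
      if (I > 0)
        PrologEpilogSet.insert(InstOffsets[I - 1]);
    }
  }
}
```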

void ProfiledBinary::setupSymbolizer() {		void ProfiledBinary::setupSymbolizer() {
symbolize::LLVMSymbolizer::Options SymbolizerOpts;		symbolize::LLVMSymbolizer::Options SymbolizerOpts;
SymbolizerOpts.PrintFunctions =		SymbolizerOpts.PrintFunctions =
DILineInfoSpecifier::FunctionNameKind::LinkageName;		DILineInfoSpecifier::FunctionNameKind::LinkageName;
SymbolizerOpts.Demangle = false;		SymbolizerOpts.Demangle = false;
SymbolizerOpts.DefaultArch = TheTriple.getArchName().str();		SymbolizerOpts.DefaultArch = TheTriple.getArchName().str();
SymbolizerOpts.UseSymbolTable = false;		SymbolizerOpts.UseSymbolTable = false;
SymbolizerOpts.RelativeAddresses = false;		SymbolizerOpts.RelativeAddresses = false;
Symbolizer = std::make_unique<symbolize::LLVMSymbolizer>(SymbolizerOpts);		Symbolizer = std::make_unique<symbolize::LLVMSymbolizer>(SymbolizerOpts);
}		}

SourceLocationVec ProfiledBinary::symbolize(const InstructionPointer &I) {		SourceLocationVec ProfiledBinary::symbolize(const InstructionPointer &I) {
assert(this == I.Binary);		assert(this == I.Binary);
auto Addr = object::SectionedAddress{I.Offset + PreferredBaseAddress,		auto Addr = object::SectionedAddress{I.Address + PreferredBaseAddress,
object::SectionedAddress::UndefSection};		object::SectionedAddress::UndefSection};
DIInliningInfo InlineStack =		DIInliningInfo InlineStack =
unwrapOrError(Symbolizer->symbolizeInlinedCode(Path, Addr), getName());		unwrapOrError(Symbolizer->symbolizeInlinedCode(Path, Addr), getName());

SourceLocationVec CallStack;		SourceLocationVec CallStack;

for (int i = InlineStack.getNumberOfFrames() - 1; i >= 0; i--) {		for (int i = InlineStack.getNumberOfFrames() - 1; i >= 0; i--) {
const auto &CallerFrame = InlineStack.getFrame(i);		const auto &CallerFrame = InlineStack.getFrame(i);
if (CallerFrame.FunctionName == "<invalid>")		if (CallerFrame.FunctionName == "<invalid>")
break;		break;
LineLocation Line(CallerFrame.Line - CallerFrame.StartLine,		LineLocation Line(CallerFrame.Line - CallerFrame.StartLine,
CallerFrame.Discriminator);		CallerFrame.Discriminator);
SourceLocation Callsite(CallerFrame.FunctionName, Line);		SourceLocation Callsite(CallerFrame.FunctionName, Line);
CallStack.push_back(Callsite);		CallStack.push_back(Callsite);
}		}

return CallStack;		return CallStack;
}		}

		InstructionPointer::InstructionPointer(ProfiledBinary *Binary, uint64_t Address,
		bool RoundToNext)
		: Binary(Binary), Address(Address) {
		Index = Binary->getIndexForAddr(Address);
		if (RoundToNext) {
		// The address may not be the start of an instruction;
		// round it up to the next valid code address.
		this->Address = Binary->getAddressforIndex(Index);
		}
		}

		void InstructionPointer::advance() {
		Index++;
		Address = Binary->getAddressforIndex(Index);
		}

		void InstructionPointer::backward() {
		Index--;
		Address = Binary->getAddressforIndex(Index);
		}

		void InstructionPointer::update(uint64_t Addr) {
		Address = Addr;
		Index = Binary->getIndexForAddr(Address);
		}

} // namespace sampleprof		} // namespace sampleprof
} // end namespace llvm		} // end namespace llvm

llvm/tools/llvm-profgen/llvm-profgen.cpp

	//===- llvm-profgen.cpp - LLVM SPGO profile generation tool ---------------===//			//===- llvm-profgen.cpp - LLVM SPGO profile generation tool ------ C++ --===//
	//			//
	// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.			// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
	// See https://llvm.org/LICENSE.txt for license information.			// See https://llvm.org/LICENSE.txt for license information.
	// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception			// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
	//			//
	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//
	//			//
	// llvm-profgen generates SPGO profiles from perf script output.			// llvm-profgen generates SPGO profiles from perf script output.
	//			//
	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//

	#include "ErrorHandling.h"			#include "ErrorHandling.h"
				#include "PerfReader.h"
				#include "ProfileGenerator.h"
	#include "ProfiledBinary.h"			#include "ProfiledBinary.h"
	#include "llvm/Support/CommandLine.h"			#include "llvm/Support/CommandLine.h"
	#include "llvm/Support/InitLLVM.h"			#include "llvm/Support/InitLLVM.h"
	#include "llvm/Support/LineIterator.h"			#include "llvm/Support/LineIterator.h"
	#include "llvm/Support/MemoryBuffer.h"			#include "llvm/Support/MemoryBuffer.h"
	#include "llvm/Support/Path.h"			#include "llvm/Support/Path.h"
	#include "llvm/Support/Regex.h"
	#include "llvm/Support/TargetSelect.h"			#include "llvm/Support/TargetSelect.h"
	#include <iostream>
	#include <list>
	#include <map>

	using namespace llvm;			using namespace llvm;
				using namespace llvm::sampleprof;

	static cl::list<std::string> PerfTraceFilenames(			static cl::list<std::string> PerfTraceFilenames(
	"perfscript", cl::value_desc("perfscript"), cl::OneOrMore,			"perfscript", cl::value_desc("perfscript"), cl::OneOrMore,
	llvm::cl::MiscFlags::CommaSeparated,			llvm::cl::MiscFlags::CommaSeparated,
	cl::desc("Path of perf-script trace created by Linux perf tool with "			cl::desc("Input Linux perf script output (should be profiled with -b)"));
	"`script` command(the raw perf.data should be profiled with -b)"));

	static cl::list<std::string>			static cl::list<std::string>
	BinaryFilenames("binary", cl::value_desc("binary"), cl::OneOrMore,			BinaryFilenames("binary", cl::value_desc("binary"), cl::OneOrMore,
	llvm::cl::MiscFlags::CommaSeparated,			llvm::cl::MiscFlags::CommaSeparated,
	cl::desc("Path of profiled binary files"));			cl::desc("Input profiled binary files"));

	static cl::opt<std::string> OutputFilename("output", cl::value_desc("output"),
	cl::Required,
	cl::desc("Output profile file"));

	static cl::opt<bool> ShowMmapEvents("show-mmap-events", cl::ReallyHidden,
	cl::init(false), cl::ZeroOrMore,
	cl::desc("Print binary load events."));

	namespace llvm {
	namespace sampleprof {

	using BinaryMap = StringMap<ProfiledBinary>;
	using AddressBinaryMap = std::map<uint64_t, ProfiledBinary *>;

	struct MMapEvent {
	pid_t PID = 0;
	uint64_t BaseAddress = 0;
	uint64_t Size = 0;
	uint64_t Offset = 0;
	StringRef BinaryPath;
	};

	class PerfReader {

	BinaryMap BinaryTable;
	AddressBinaryMap AddrToBinaryMap; // Used by address-based lookup.

	/// Prepare a memory buffer for the contents of \p Filename.
	///
	static std::unique_ptr<MemoryBuffer> setupMemoryBuffer(StringRef Filename) {
	auto BufferOrErr = MemoryBuffer::getFileOrSTDIN(Filename);
	if (std::error_code EC = BufferOrErr.getError())
	exitWithError(EC, Filename);

	auto Buffer = std::move(BufferOrErr.get());
	if (Buffer->getBufferSize() >
	static_cast<size_t>(std::numeric_limits<uint32_t>::max()))
	exitWithError("file too large", Filename);

	return Buffer;
	}

	/// Load symbols and disassemble the code of a given binary.
	/// Also register the binary in the binary table.
	///
	ProfiledBinary &loadBinary(const StringRef BinaryPath,
	bool AllowNameConflict = true) {
	// The binary table is currently indexed by the binary name not the full
	// binary path. This is because the user-given path may not match the one
	// that was actually executed.
	StringRef BinaryName = llvm::sys::path::filename(BinaryPath);

	// Call to load the binary in the ctor of ProfiledBinary.
	auto Ret = BinaryTable.insert({BinaryName, ProfiledBinary(BinaryPath)});

	if (!Ret.second && !AllowNameConflict) {
	std::string ErrorMsg = "Binary name conflict: " + BinaryPath.str() +
	" and " + Ret.first->second.getPath().str() +
	" \n";
	exitWithError(ErrorMsg);
	}

	return Ret.first->second;
	}

	void updateBinaryAddress(const MMapEvent &Event) {
	// Load the binary.
	StringRef BinaryPath = Event.BinaryPath;
	StringRef BinaryName = llvm::sys::path::filename(BinaryPath);

	auto I = BinaryTable.find(BinaryName);
	// Drop the event which doesn't belong to user-provided binaries
	// or if its image is loaded at the same address
	if (I == BinaryTable.end() ||
	Event.BaseAddress == I->second.getBaseAddress())
	return;

	ProfiledBinary &Binary = I->second;

	// A binary image could be unloaded and then reloaded at a different
	// place, so update the address map here.
	AddrToBinaryMap.erase(Binary.getBaseAddress());
	AddrToBinaryMap[Event.BaseAddress] = &Binary;

	// Update binary load address.
	Binary.setBaseAddress(Event.BaseAddress);
	}

	public:
	PerfReader() {}

	/// Parse a single line of a PERF_RECORD_MMAP2 event looking for a
	/// mapping between the binary name and its memory layout.
	///
	void parseMMap2Event(const line_iterator Line) {
	// Parse a line like:
	// PERF_RECORD_MMAP2 2113428/2113428: [0x7fd4efb57000(0x204000) @ 0
	// 08:04 19532229 3585508847]: r-xp /usr/lib64/libdl-2.17.so
	constexpr static const char *const Pattern =
	"PERF_RECORD_MMAP2 ([0-9]+)/[0-9]+: "
	"\\[(0x[a-f0-9]+)\\((0x[a-f0-9]+)\\) @ "
	"(0x[a-f0-9]+|0) .*\\]: [-a-z]+ (.*)";
	// Field 0 - whole line
	// Field 1 - PID
	// Field 2 - base address
	// Field 3 - mmapped size
	// Field 4 - page offset
	// Field 5 - binary path
	enum EventIndex {
	WHOLE_LINE = 0,
	PID = 1,
	BASE_ADDRESS = 2,
	MMAPPED_SIZE = 3,
	PAGE_OFFSET = 4,
	BINARY_PATH = 5
	};

	Regex RegMmap2(Pattern);
	SmallVector<StringRef, 6> Fields;
	if (RegMmap2.match(*Line, &Fields)) {
	MMapEvent Event;
	Fields[PID].getAsInteger(10, Event.PID);
	Fields[BASE_ADDRESS].getAsInteger(0, Event.BaseAddress);
	Fields[MMAPPED_SIZE].getAsInteger(0, Event.Size);
	Fields[PAGE_OFFSET].getAsInteger(0, Event.Offset);
	Event.BinaryPath = Fields[BINARY_PATH];
	updateBinaryAddress(Event);
	if (ShowMmapEvents) {
	outs() << "Mmap: Binary " << Event.BinaryPath << " loaded at "
	<< format("0x%" PRIx64 ":", Event.BaseAddress) << " \n";
	}
	} else {
	std::string ErrorMsg = "Cannot parse mmap event: Line " +
	Twine(Line.line_number()).str() + ": " +
	Line->str() + " \n";
	exitWithError(ErrorMsg);
	}
	}

	void parseEvent(line_iterator &Index) {
	if (Index->startswith("PERF_RECORD_MMAP2"))
	parseMMap2Event(Index);
	++Index;
	}

	void parseTrace(StringRef Filename) {
	auto Buffer = setupMemoryBuffer(Filename);
	line_iterator LineIt(*Buffer, /*SkipBlanks=*/false);
	while (!LineIt.is_at_eof()) {
	parseEvent(LineIt);
	}
	}

	void run() {
	// Load the binaries.
	for (auto Filename : BinaryFilenames)
	loadBinary(Filename, /*AllowNameConflict*/ false);

	// Parse perf traces.
	for (auto Filename : PerfTraceFilenames)
	parseTrace(Filename);
	}
	};

	} // end namespace sampleprof
	} // end namespace llvm

	using namespace sampleprof;

	int main(int argc, const char *argv[]) {			int main(int argc, const char *argv[]) {
	InitLLVM X(argc, argv);			InitLLVM X(argc, argv);

	cl::ParseCommandLineOptions(argc, argv, "llvm SPGO profile generator\n");			cl::ParseCommandLineOptions(argc, argv, "llvm SPGO profile generator\n");

	// Initialize targets and assembly printers/parsers.			// Initialize targets and assembly printers/parsers.
	InitializeAllTargetInfos();			InitializeAllTargetInfos();
	InitializeAllTargetMCs();			InitializeAllTargetMCs();
	InitializeAllDisassemblers();			InitializeAllDisassemblers();

	PerfReader Reader;			// Load binaries and parse perf events and samples
	Reader.run();			PerfReader Reader(BinaryFilenames);
				Reader.parsePerfTraces(PerfTraceFilenames);

				// Unwind parsed samples
				VirtualUnwinder Unwinder(Reader.getSamples());
				Unwinder.unwindSamples();

				std::unique_ptr<ProfileGenerator> Generator = ProfileGenerator::create(
				Unwinder.getSampleCounters(), Reader.getPerfScriptType(),
				Reader.getAddrToBinaryMap());
				Generator->generateProfile();
				Generator->write();

	return EXIT_SUCCESS;			return EXIT_SUCCESS;
				wenlei: If we let ProfileGenerator be the driver, I think we should also let ProfileGenerator initiate the perf loading (line 38); otherwise, if we intend to decouple them and let PerfReader read the profile outside of ProfileGenerator, then it's better to pass only the loaded profile to ProfileGenerator for cleaner separation.
				wlei (author): Good suggestion. Changed to not include PerfReader in ProfileGenerator, and I also decoupled the unwinder from the reader. For the unwinder, the input is the aggregated hybrid sample and the output is the sample counters, which are later forwarded to the generator.
				hoy: Perhaps it's better to include the unwinder in the reader, since this driver will also handle non-CS profiles in the future. The dataflow from the reader to the profile generator may need a flexible definition (currently it is `Unwinder.getSampleCounters()`) for future extension.
				wenlei: Agreed that the unwinder is better driven by PerfReader, since the unwinder is something PerfReader depends on directly (vs. depending on its output, as ProfileGenerator does on PerfReader's output).
	}			}
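
For readers following the mmap handling on the left-hand side of this diff, here is a minimal standalone sketch of the same two steps: matching a PERF_RECORD_MMAP2 line and re-keying an address map when a binary is reloaded. It is not the code from this revision. It assumes `std::regex` in place of `llvm::Regex`, and the names `MMapEventSketch`, `BinarySketch`, `parseMMap2Line`, and the free function `updateBinaryAddress` are hypothetical stand-ins introduced only for illustration.

```cpp
#include <cstdint>
#include <map>
#include <regex>
#include <string>

// Hypothetical stand-in for the MMapEvent struct shown in the diff.
struct MMapEventSketch {
  long PID = 0;
  uint64_t BaseAddress = 0;
  uint64_t Size = 0;
  uint64_t Offset = 0;
  std::string BinaryPath;
};

// Hypothetical stand-in for ProfiledBinary: it only tracks a load address.
struct BinarySketch {
  uint64_t BaseAddress = 0;
};

// Try to parse one PERF_RECORD_MMAP2 line into Event.
// Returns false (leaving Event untouched) when the line does not match.
bool parseMMap2Line(const std::string &Line, MMapEventSketch &Event) {
  // Capture groups: 1 = PID, 2 = base address, 3 = mmapped size,
  // 4 = page offset, 5 = binary path.
  static const std::regex Pattern(
      "PERF_RECORD_MMAP2 ([0-9]+)/[0-9]+: "
      "\\[(0x[a-f0-9]+)\\((0x[a-f0-9]+)\\) @ "
      "(0x[a-f0-9]+|0) .*\\]: [-a-z]+ (.*)");
  std::smatch Fields;
  if (!std::regex_search(Line, Fields, Pattern))
    return false;
  Event.PID = std::stol(Fields[1].str());
  // Base 0 lets stoull accept both "0x..." and the plain "0" offset.
  Event.BaseAddress = std::stoull(Fields[2].str(), nullptr, 0);
  Event.Size = std::stoull(Fields[3].str(), nullptr, 0);
  Event.Offset = std::stoull(Fields[4].str(), nullptr, 0);
  Event.BinaryPath = Fields[5].str();
  return true;
}

// Re-key the address map when a binary shows up at a new base address.
// Events that report the address already recorded are ignored.
void updateBinaryAddress(std::map<uint64_t, BinarySketch *> &AddrToBinary,
                         BinarySketch &Binary, uint64_t NewBase) {
  if (Binary.BaseAddress == NewBase)
    return;
  AddrToBinary.erase(Binary.BaseAddress); // drop the stale key, if any
  AddrToBinary[NewBase] = &Binary;
  Binary.BaseAddress = NewBase;
}
```

Feeding the example line from the comment in `parseMMap2Event` (joined onto a single line) to `parseMMap2Line` should yield the PID, base address, size, offset, and path that the field comments describe. The sketch returns `false` on a mismatch rather than exiting, which is a choice made here to keep it testable and not the behavior of the code under review.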
