This is an archive of the discontinued LLVM Phabricator instance.

[CSSPGO][llvm-profgen] Parse mmap events from perf script
ClosedPublic

Authored by wlei on Oct 19 2020, 9:06 AM.

Details

Summary

This stack of changes introduces llvm-profgen utility which generates a profile data file from given perf script data files for sample-based PGO. It’s part of(not only) the CSSPGO work. Specifically to support context-sensitive with/without pseudo probe profile, it implements a series of functionalities including perf trace parsing, instruction symbolization, LBR stack/call frame stack unwinding, pseudo probe decoding, etc. Also high throughput is achieved by multiple levels of sample aggregation and compatible format with one stop is generated at the end. Please refer to: https://groups.google.com/g/llvm-dev/c/1p1rdYbL93s for the CSSPGO RFC.

As a starter, this change sets up an entry point by introducing PerfReader to load profiled binaries and perf traces(including perf events and perf samples). For the event, here it parses the mmap2 events from perf script to build the loader snaps, which is used to retrieve the image load address in the subsequent perf tracing parsing.

As described in llvm-profgen.rst, the tool being built aims to support multiple input perf data (preprocessed by perf script) as well as multiple input binary images. It should also support dynamic reload/unload shared objects by leveraging the loader snaps being built by this change

Diff Detail

Event Timeline

wlei created this revision.Oct 19 2020, 9:06 AM
Herald added a project: Restricted Project. · View Herald TranscriptOct 19 2020, 9:06 AM
wlei requested review of this revision.Oct 19 2020, 9:06 AM
wlei edited the summary of this revision. (Show Details)Oct 19 2020, 1:12 PM
wlei added reviewers: hoy, wenlei, wmi, davidxl.
wmi added a comment.Oct 20 2020, 10:50 AM

Lei, thanks for the patch.

wmi added inline comments.Oct 20 2020, 10:50 AM
llvm/docs/CommandGuide/llvm-profgen.rst
15

Please explain what SPGO represents here.

llvm/tools/llvm-profgen/llvm-profgen.cpp
72

It is a map from address to ProfiledBinary, how about using AddressBinaryMap?

117

Here it is may or may not?

129

Maybe use BinaryTable.insert instead of BinaryTable.find above and use the return value to check whether the binary has been loaded before, then you don't have to search the table another time here.

144–145

Here BinaryAddrMap needs to erase an entry before inserting a new one. I guess it is to support the case that a load image is unloaded and then reloaded at a different place. If that is correct, please add some comment to make it clear.

shenhan added inline comments.
llvm/docs/CommandGuide/llvm-profgen.rst
36

Sometimes, the profiled binary path as recorded in mmaps is different from the local binary's path. I understand then this will only use the "name" part to match.
But sometimes the name part is also different, then the match cannot proceed.

One possible improvement is when the binary provided has embed "build-id", this can query perfdata file to get the absolute path of the file with matching build id, and use that absolute path to match mmap events and this would be the most accurate match, what do you think?

hoy added inline comments.Oct 21 2020, 3:46 PM
llvm/docs/CommandGuide/llvm-profgen.rst
36

Thanks for the suggestion. A build-id list can be made as an additional input to the tool for an accurate lookup of the mmap events.

mtrofin added inline comments.
llvm/tools/llvm-profgen/llvm-profgen.cpp
28

(flyby comment) is it a "perfscript" or a "perftrace" - or maybe "perfdata" rather.

Also, probably "perf tool" not "perf script" in the description.

35

Path to profiled binary?

50

Nit: to help maintainability down the road, perhaps ensure primitive fields are initialized here - I realize they are init-ed in the ctor, but it's even easier to maintain things when def and init on the same line. Less typing for the ctor, too.

67

should this set IsLoaded to false?

also, why not assert(!IsLoaded); or rename to "ensureLoaded";

or - is there value in a ProfiledBinary that's not loadable? If not, how about:

  • private ctor
  • fields are const; no need for IsLoaded
  • public static factory method returning a std::unique_ptr<ProfildBinary> that's null if loading fails.

that way, a reader knows they don't need to bother worrying about ProfiledBinaries that don't work - simpler overall state. wdyt?

75

best to initialize struct fields to avoid undefined values.

95

should this rather be Buffer->getBufferSize() > static_cast<size_t>(std::numeric_limits<uint32_t>::max())?

106

could P+1 be out of bounds?

138

I guess this can happen because the Event could reference other binaries than the interesting ones, correct? Could you add a comment here about this - i.e. that the event is intentionally dropped here.

141

why call load - why not just fetch the ProfiledBinary entry? Unless I'm missing something, the expected behavior is that first the user-provided binaries are loaded; and if that goes well, then the events for them are loaded. So at this point the ProfiledBinary should be there?

144–145

Naive question: that assumes no interesting events were collected when the image was loaded at the old address?

175

consider using an enum like:

enum EventIndex {

WholeLine = 0,
PID = 1,
BaseAddress = 2... (etc)

}

so then dereferencing Fields is more readable.

207

perhaps instead of 'run' something more specific, like "loadBinariesAndEvents".

223

probably can be done later just fine, but should the main be in its own file, so then this becomes a reusable library?

wlei updated this revision to Diff 300034.Oct 22 2020, 10:14 AM

fix according to reviewers' suggestions

llvm/docs/CommandGuide/llvm-profgen.rst
15

@wmi Thanks for the helpful list of feedbacks. More description is added here

36

@shenhan Thanks for your suggestion. Do you mean I need to do parsing the binary to get the "build-id"(if it has), then take it as a key to query the perfdata in which it bonds the absolute patch with the build-id? If my understanding is right, I will try to do it later change.

llvm/tools/llvm-profgen/llvm-profgen.cpp
28

@mtrofin Thanks for your helpful list of suggestions!

(flyby comment) is it a "perfscript" or a "perftrace" - or maybe "perfdata" rather.

here we intentionally use "perfscript" but not "perfdata" because it's not the raw perfdata created by the perf record but the output by the perf script -i perf.data, this can avoid to call the cmd inside the code.

Also, probably "perf tool" not "perf script" in the description.

Here change the description of it

35

you mean to change the desc here? "Path" is added

50

Fixed!

72

Good suggestion

75

fixed!

95

Good catch

106

Yeah, it seems it's not used here, delete this!

117

Good catch, fixed

129

Good to know this, fixed!

138

Yes, comments added

144–145

yes , comments added

144–145

Not sure whether it will remain the same address, but here I think it will cover this case. or maybe you want me to check if the base address doesn't change then we can just drop this event?

175

fixed, thanks for the suggestion

207

Agree it's better to have a specific name. My concern is we do not only load binaries and mmp events here, we must also do parsing the sample line simultaneously to extract info for the LBR unwinder(see its children changes), also do check the perfscript type, so I just use a general name as an entry of those misc. Any thoughts about this?

223

Yeah, we separate PerfReader to other files in the following commit, please see its children changes.

wlei retitled this revision from [AutoFDO][llvm-profgen] Parse mmap events from perf script to [CSSPGO][llvm-profgen] Parse mmap events from perf script.Oct 22 2020, 10:15 AM
wlei edited the summary of this revision. (Show Details)
hoy added inline comments.Oct 22 2020, 10:16 AM
llvm/tools/llvm-profgen/llvm-profgen.cpp
106

The underlying MemoryBuffer ensures there is always trailing \0 at the end.

mtrofin added inline comments.Oct 22 2020, 10:29 AM
llvm/tools/llvm-profgen/llvm-profgen.cpp
144–145

Sorry, what I meant was: as the events list gets traversed, say that for a while events corresponding to the current base address are captured; then the base address changes. What happens to the captured events?

hoy added inline comments.Oct 22 2020, 10:39 AM
llvm/tools/llvm-profgen/llvm-profgen.cpp
67

Asserting the function is only run once per binary is reasonable since all provided binaries are loaded as soon as the tool starts. The initial design allowed for a load on demand, i.e, when binaries are not provided via the command line option, the profiled binaries record by mmap events will be automatically loaded while processing the events. IsLoaded was introduced as a signal for that. However that complicated the tool and was not continued.

A load failure would result the tool to error and exit instead of returning a null object. Please see subsequent patch D89712.

mtrofin added inline comments.Oct 22 2020, 10:45 AM
llvm/tools/llvm-profgen/llvm-profgen.cpp
67

So then could the loading happen in the ctor, since failure to load => fast fail? Then the ProfiledBinary is known to be always loaded, and the only mutable field is the base address - to be clear, my suggestion is for simpler maintainability.

hoy added inline comments.Oct 22 2020, 10:45 AM
llvm/tools/llvm-profgen/llvm-profgen.cpp
144–145

The events come in order of time. The mmap events serve as a time stamp for all LBR and call stack events. Once a mmap event is processed, the binary load loaded address recorded will be updated and all subsequent LBR events will be processed against the updated base address.

mtrofin added inline comments.Oct 22 2020, 10:49 AM
llvm/tools/llvm-profgen/llvm-profgen.cpp
144–145

I see - thanks!

hoy added inline comments.Oct 22 2020, 10:58 AM
llvm/tools/llvm-profgen/llvm-profgen.cpp
67

I see your point now. Thanks for the suggestion. There could be a case that user-specified binary never participated in profiling, which means they are not included in the mmap events, and they should not be loaded and if they are, their load failure should not block the processing. However, that sounds a user responsibility. Moving the load into ctor time is reasonable to me. It is a cleaner design.

wlei updated this revision to Diff 300122.Oct 22 2020, 4:31 PM

move load() to ctor of ProfiledBianry

wlei updated this revision to Diff 300759.Oct 26 2020, 12:19 PM

delete useless ProfiledBinary ctor

shenhan added inline comments.Oct 29 2020, 9:50 AM
llvm/tools/llvm-profgen/llvm-profgen.cpp
205

One thought on using MemoryBuffer::getFileOrSTDIN (or other similar MemroyBuffer based file reading):

  • there could be scalability issues when processing huge (10G+) files, because the way MemoryBuffer::getFileOrSTDIN reads files is it reads the whole content into memory (or do a mmap) in oneshot, this imposes a huge burden on memory io.
  • for perf script output parsing, it is mostly line based - each line is processed and discarded, this suggests that a stream based processing is more suitable and could be much more efficient. A straightforward way (with lower level io operations) could be like this: std::ifstream fin(perf_script_filename); if (!fin.good()) { /* error */ } for (std::string line; std::getline(input, line); ) { parseEvent(line); }
  • this way, the memory consumption is almost constant regardless of the inpurt perf script file.

What do you think?

hoy added inline comments.Oct 29 2020, 10:58 AM
llvm/tools/llvm-profgen/llvm-profgen.cpp
205

This sounds a good solution to me. The perf file easies goes very large for large application and long profiling runs where reducing memory footprint will be very helpful.

wlei added inline comments.Oct 29 2020, 11:02 AM
llvm/tools/llvm-profgen/llvm-profgen.cpp
205

@shenhan Thanks for your suggestion, having a constant memory consumption is great, let me change the code.

wenlei added inline comments.Oct 30 2020, 11:00 AM
llvm/docs/CommandGuide/llvm-profgen.rst
15

small nit: searching through the code base, seems like the canonical name is "sample-based profile guided optimizations", most notably from the help message of fprofile-sample-use, so let's be consistent.

llvm/tools/llvm-profgen/llvm-profgen.cpp
150

const line_iterator& ?

211

What about we pass in BinaryFilenames and PerfTraceFilenames as parameters to this function (or to ctor), instead of letting PerfReader coupled with command-line options directly. Then perhaps name it readFromInput(BinaryFilenames, PerfTraceFilenames).

wlei updated this revision to Diff 303974.Nov 9 2020, 1:27 PM

Refactor to put PerfReader and ProfiledBinary into seperated files
Use stream based trace reader(TraceStream class)
some function renaming

wlei edited the summary of this revision. (Show Details)Nov 9 2020, 1:42 PM
wlei marked 32 inline comments as done.Nov 9 2020, 4:04 PM
wmi accepted this revision.Nov 13 2020, 4:10 PM

LGTM.

llvm/tools/llvm-profgen/ErrorHandling.h
10

Please fix the clang-tidy warning.

23

Like the clang-tidy suggests, better change it to "const Twine&"

This revision is now accepted and ready to land.Nov 13 2020, 4:10 PM
wlei updated this revision to Diff 305301.Nov 13 2020, 10:41 PM

fix clang-tidy warning.

wenlei accepted this revision.Nov 20 2020, 9:58 AM

LGTM.

This revision was landed with ongoing or failed builds.Nov 20 2020, 2:27 PM
This revision was automatically updated to reflect the committed changes.
thakis added a subscriber: thakis.Apr 21 2021, 7:17 AM
thakis added inline comments.
llvm/test/tools/llvm-profgen/lit.local.cfg
7

Out of interest, why is this here? llvm-profgen is part of the llvm build and should always exist, right (…assuming you add it to LLVM_TEST_DEPENDS in llvm/test/CmakeLists.txt – is there a reason to not add it there?)? No other tool test has a check like this as far as I know.

thakis added inline comments.Apr 21 2021, 8:50 AM
llvm/test/tools/llvm-profgen/lit.local.cfg
7

Looks like it's in LLVM_TEST_DEPENDS nowadays. Maybe these two lines can go now?

hoy added inline comments.Apr 21 2021, 2:35 PM
llvm/test/tools/llvm-profgen/lit.local.cfg
7

Good point. Yes, with llvm-profgen included in LLVM_TEST_DEPENDS, it should always be rebuilt and available at test time. The two lines are unnecessary.