This is an archive of the discontinued LLVM Phabricator instance.

[llvm-profgen] Support LBR only perf script
ClosedPublic

Authored by wlei on Aug 16 2021, 12:02 PM.

Details

Summary

This change aims at supporting LBR only sample perf script which is used for regular(Non-CS) profile generation. A LBR perf script includes a batch of LBR sample which starts with a frame pointer and a group of 32 LBR entries is followed. The FROM/TO LBR pair and the range between two consecutive entries (the former entry's TO and the latter entry's FROM) will be used to infer function profile info.

An example of LBR perf script(created by perf script -F ip,brstack -i perf.data)

40062f 0x40062f/0x4005b0/P/-/-/9  0x400645/0x4005ff/P/-/-/1  0x400637/0x400645/P/-/-/1 ...
4005d7 0x4005d7/0x4005e5/P/-/-/8  0x40062f/0x4005b0/P/-/-/6  0x400645/0x4005ff/P/-/-/1 ...
...

For implementation:

  • Extended a new child class LBRPerfReader for the sample parsing, reused all the functionalities in extractLBRStack except for an extension to parsing leading instruction pointer.
  • HybridSample is reused(just leave the call stack empty) and the parsed samples is still aggregated in AggregatedSamples. After that, range samples, branch sample, address samples are computed and recorded.
  • Reused ContextSampleCounterMap to store the raw profile, since it's no need to aggregation by context, here it just registered one sample counter with a fake context key.
  • Unified to use show-raw-profile instead of show-unwinder-output to dump the intermediate raw profile, see the comments of the format of the raw profile. For CS profile, it remains to output the unwinder output.

Profile generation part will come soon.

Diff Detail

Event Timeline

wlei created this revision.Aug 16 2021, 12:02 PM
wlei requested review of this revision.Aug 16 2021, 12:02 PM
Herald added a project: Restricted Project. · View Herald TranscriptAug 16 2021, 12:02 PM
wlei updated this revision to Diff 366712.Aug 16 2021, 12:27 PM

add test for mmap event perf script

wlei edited the summary of this revision. (Show Details)Aug 16 2021, 2:44 PM
wlei added reviewers: hoy, wenlei, wmi.
hoy added inline comments.Aug 17 2021, 12:08 PM
llvm/tools/llvm-profgen/PerfReader.cpp
657

I'm thinking of separating LBR samples from HybridSample and making HybridSample inherit that. This way they are technically separate and could reduce confusion. What do you think?

llvm/tools/llvm-profgen/PerfReader.h
373

Do we really need this? RangeCounter should cover addresses?

521

Nit: splitting once is enough, instead of unlimited times (-1)?

544

Why need this? Can the count be just skipped since it does not reflect a call stack sample and LBR sample?

wlei added inline comments.Aug 18 2021, 6:25 PM
llvm/tools/llvm-profgen/PerfReader.cpp
657

Sounds good, changed!

llvm/tools/llvm-profgen/PerfReader.h
373

Good catch. I did check our internal tool, the addressCounter is actually used only without LBR sample, so here we indeed don't need this, thanks!

521

-1(maximum number of times split) is the default value, here is for the 4th parameter which means to not keep the empty string, otherwise there are many empty record, it's not direct to get the first leading pointer. Seems C++ can't skip the middle default value.

544

Here it's in fact to skip it. To distinguish LBR and Hybrid sample, it used Count to indicate call stack exists, >0 means hybrid sample. When adding the aggregated count, it will also increase Count and treat LBR sample as hybrid sample incorrectly.

wlei updated this revision to Diff 367378.Aug 18 2021, 6:25 PM
wlei edited the summary of this revision. (Show Details)

addressing Hongtao's feedback

wlei updated this revision to Diff 367379.Aug 18 2021, 6:30 PM

fix typo

Unified to use show-raw-profile instead of show-unwinder-output to dump the intermediate raw profile, see the comments of the format of the raw profile. For CS profile, it remains to output the unwinder output

If these are considered actual profile (as the name show-raw-profile suggests), it's probably better to write to output file instead of dumping to stdout. How about just have a switch skip-symbolization, which writes the raw, unsymbolized profile to output file directly, and then skip all later processing.

The range representation between cs and non-cs profile is slightly different, perhaps they can also be unified.

llvm/tools/llvm-profgen/PerfReader.cpp
657

I actually find the inheritance isn't really necessary now. The hashing specialization in LBRSample does not add value because the general version for HybridSample can handle that too, and a separate hash for LBRSample just adds duplication. On the contrary, how about we rename HybridSample to PerfSample and remove the inheritance structure all together?

Also I don't think HybridSample being subclass of LBRSample is good because this is more about composition. What about if we have a perf sample that only has call stack but not LBR? inheritance wouldn't work well for that (and we don't want virtual inheritance). This is similar to why we don't make subclass for aggregation count.

llvm/tools/llvm-profgen/PerfReader.h
521

Line.split(Records, " ", 1, false); instead?

688

Would an empty string (zero length) work?

llvm/tools/llvm-profgen/llvm-profgen.cpp
95

As an intermediate state, print a warning?

wlei added inline comments.Aug 19 2021, 10:33 AM
llvm/tools/llvm-profgen/PerfReader.h
521

Sorry, I misunderstood. I tired Line.split(Records, " ", 1, false); but it showed it doesn't work, there are several leading empty space, splitting once will only separate one empty space and the remaining still start with empty space.

hoy added inline comments.Aug 19 2021, 11:59 AM
llvm/tools/llvm-profgen/PerfReader.cpp
657

Removing the inheritance has a cost of stacking fields that are not necessarily used at the same time together. For example, the SmallVector<uint64_t, 16> CallStack field will be there for LBRSamples and in member functions the unused fields have to be checked. But we don't have virtual method overhead as a result of removing inheritance, which is a cons of OOP.

Multi-inheritance can be use to a separate callstack samples.

I still prefer OOP here from memory and code maintenance and extension point of view. Let me know your thoughts.

llvm/tools/llvm-profgen/PerfReader.h
544

I see. Do we always have an aggregated count for both hybrid and LBR samples? If not, a decimal number can also be a valid call frame addr in which case we may not be able to tell LBR samples apart from hybrid samples.

wlei updated this revision to Diff 368137.Aug 23 2021, 10:12 AM

addressing Wenlei's feedback

wlei added a comment.Aug 23 2021, 10:14 AM

If these are considered actual profile (as the name show-raw-profile suggests), it's probably better to write to output file instead of dumping to stdout. How about just have a switch skip-symbolization, which writes the raw, unsymbolized profile to output file directly, and then skip all later processing.

Sounds good to write to the output file, I indeed found the raw profile outputs are mixed with other warning logs which is inconvenient for other tools to use the raw profile. Changed!

The range representation between cs and non-cs profile is slightly different, perhaps they can also be unified.

Can we use a separate patch(will come soon) for the unified format? just for easy review(there should be several tests to be fixed).

llvm/tools/llvm-profgen/PerfReader.h
688

Good, catch! Removed!

llvm/tools/llvm-profgen/llvm-profgen.cpp
95

Sounds good

wlei updated this revision to Diff 369482.Aug 30 2021, 10:15 AM

Remove HybridSample and LBRSample, use one PerfSample instead

hoy accepted this revision.Aug 30 2021, 11:18 AM

lgtm.

This revision is now accepted and ready to land.Aug 30 2021, 11:18 AM
wenlei accepted this revision.Aug 30 2021, 12:37 PM

lgtm with minor comments, thanks.

llvm/tools/llvm-profgen/PerfReader.h
521

not a big deal, but this could be faster as it avoids scanning and splitting the entire string:

Line = Line.trim(" ");
Line.split(Records, " ", 1, false);
613–614

nit: now that this is output to a file instead of stdout, rename it as writeRawProfile?

wlei updated this revision to Diff 369557.Aug 30 2021, 2:34 PM

addressing feedbacks.

wlei added inline comments.Aug 30 2021, 2:38 PM
llvm/tools/llvm-profgen/PerfReader.h
521

Sounds good, I actually found there are two space strings between each LBR entry, so I need to change to split 2 times :)

wenlei added inline comments.Aug 30 2021, 6:30 PM
llvm/tools/llvm-profgen/PerfReader.h
521

Or Line.split(Records, " ", 1, false);? two space instead of one as the separator. Works either way.

This revision was automatically updated to reflect the committed changes.