This is an archive of the discontinued LLVM Phabricator instance.

[CSSPGO][llvm-profgen] Disassemble text sections
ClosedPublic

Authored by wlei on Oct 19 2020, 10:03 AM.

Details

Summary

This stack of changes introduces llvm-profgen utility which generates a profile data file from given perf script data files for sample-based PGO. It’s part of(not only) the CSSPGO work. Specifically to support context-sensitive with/without pseudo probe profile, it implements a series of functionalities including perf trace parsing, instruction symbolization, LBR stack/call frame stack unwinding, pseudo probe decoding, etc. Also high throughput is achieved by multiple levels of sample aggregation and compatible format with one stop is generated at the end. Please refer to: https://groups.google.com/g/llvm-dev/c/1p1rdYbL93s for the CSSPGO RFC.

This change enables disassembling the text sections to build various address maps that are potentially used by the virtual unwinder. A switch --show-disassembly is being added to print the disassembly code.

Like the llvm-objdump tool, this change leverages existing LLVM components to parse and disassemble ELF binary files. So far X86 is supported.

Test Plan:

ninja check-llvm

Diff Detail

Event Timeline

wlei created this revision.Oct 19 2020, 10:03 AM
Herald added a project: Restricted Project. · View Herald TranscriptOct 19 2020, 10:03 AM
wlei requested review of this revision.Oct 19 2020, 10:03 AM
wlei edited the summary of this revision. (Show Details)Oct 19 2020, 1:16 PM
wlei added reviewers: wmi, hoy, davidxl, wenlei.
wlei updated this revision to Diff 300767.Oct 26 2020, 12:56 PM
wlei edited the summary of this revision. (Show Details)

rebase

wlei retitled this revision from [AutoFDO][llvm-profgen] Disassemble text sections to [CSSPGO][llvm-profgen] Disassemble text sections.Oct 26 2020, 12:57 PM
wlei edited the summary of this revision. (Show Details)
wmi added inline comments.Oct 26 2020, 5:37 PM
llvm/test/tools/llvm-profgen/disassemble.s
24

Can we have a test with multiple .text sections and some of the sections have suffixes like ".text.hot" or ".text.unlikely"?

llvm/tools/llvm-profgen/ProfiledBinary.cpp
53

Can we use MCInstrDesc::isCall()?

73

Can we use MCInstrDesc::isReturn()?

127

Can we reverse the condition here?
auto Obj = dyn_cast<ELFObjectFileBase>(&Binary)
if (!Obj)

exitWithError("not a valid Elf image", Path);

...

261–292

Can we wrap it in a function like disassembleSymbol, to make the disassemble function smaller?

wlei updated this revision to Diff 301520.Oct 28 2020, 11:26 PM
wlei marked 4 inline comments as done.

address reviewer's comments

wlei added inline comments.Oct 28 2020, 11:27 PM
llvm/test/tools/llvm-profgen/disassemble.s
24

@wmi Thanks for your helpful suggestions.
Modified this test with multiple .text sections and ".text.hot" and ".text.unlikely"

llvm/tools/llvm-profgen/ProfiledBinary.cpp
53

fixed, thanks!

73

fixed!

127

Good catch, fixed!

261–292

Yeah, It seems there are so many arguments after the refactoring, so I changed the scope of those MC ptrs, move into the class scope, also wrap them into setUpDisassembler function.

cc: @hoy

hoy added inline comments.Oct 29 2020, 12:01 AM
llvm/tools/llvm-profgen/ProfiledBinary.cpp
115

No need to pass this in as a parameter. Can just use the member field PreferredBaseAddress instead.

117

StringRef type doesn't need to be passed by reference as a convention.

117

The function can return false upon a disassemble error and let the caller handle the error so that FileName is not needed here.

wlei updated this revision to Diff 301671.Oct 29 2020, 10:12 AM

address reviewer's comments

wlei marked 3 inline comments as done.Oct 29 2020, 10:13 AM
wlei added inline comments.
llvm/tools/llvm-profgen/ProfiledBinary.cpp
115

Thanks for your suggestions, all fixed

wenlei added inline comments.Oct 30 2020, 11:03 AM
llvm/tools/llvm-profgen/ProfiledBinary.cpp
60

Can we sink the 4 getELFImageLMAForSec manually to avoid duplication and make it more readable?

If it's not too much work, I'd suggest squash the refactoring of your later patch (RVA->Offset etc from https://reviews.llvm.org/D89715) into this one. Upstream patch does not have to map 1:1 to internal patch - squashing, minimizing code churn and grouping by functionality helps make it easier for other reviewers. Thanks.

wlei updated this revision to Diff 303306.Nov 5 2020, 5:30 PM
wlei marked an inline comment as done.

move some renaming work into this from the child diff

wlei added inline comments.Nov 5 2020, 6:52 PM
llvm/tools/llvm-profgen/ProfiledBinary.cpp
60

Seems we can't merge those IF together by "||", I tried and got error.
you can see in llvm/lib/DebugInfo/Symbolize/Symbolize.cpp, they also use the same style.

Optional<ArrayRef<uint8_t>> getBuildID(const ELFObjectFileBase *Obj) {
  Optional<ArrayRef<uint8_t>> BuildID;
  if (auto *O = dyn_cast<ELFObjectFile<ELF32LE>>(Obj))
    BuildID = getBuildID(O->getELFFile());
  else if (auto *O = dyn_cast<ELFObjectFile<ELF32BE>>(Obj))
    BuildID = getBuildID(O->getELFFile());
  else if (auto *O = dyn_cast<ELFObjectFile<ELF64LE>>(Obj))
    BuildID = getBuildID(O->getELFFile());
  else if (auto *O = dyn_cast<ELFObjectFile<ELF64BE>>(Obj))
    BuildID = getBuildID(O->getELFFile());
  else
    llvm_unreachable("unsupported file format");
  return BuildID;
}

(https://github.com/llvm/llvm-project/blob/master/llvm/lib/DebugInfo/Symbolize/Symbolize.cpp)

wmi added inline comments.Nov 13 2020, 5:14 PM
llvm/tools/llvm-profgen/ProfiledBinary.cpp
233

Can we set PreferredBaseAddress here and don't have to iterate all the sections in setPreferredBaseAddress function?
PreferredBaseAddress = getELFImageLMAForSec(Section);

wlei marked an inline comment as done.Nov 16 2020, 2:18 PM
wlei added inline comments.
llvm/tools/llvm-profgen/ProfiledBinary.cpp
233

Thanks for your suggestion. Considering the section number should not be very large, we intend to decouple them to make disassemble function focus on disassembling things and PreferredBaseAddress would also be used in other functions so we used an explicit set-up for better readability.

wlei updated this revision to Diff 305596.Nov 16 2020, 2:18 PM

fix clang-tidy warning

wmi accepted this revision.Nov 16 2020, 2:20 PM

LGTM.

llvm/tools/llvm-profgen/ProfiledBinary.cpp
233

Ok, thanks.

This revision is now accepted and ready to land.Nov 16 2020, 2:20 PM
wenlei added inline comments.Nov 20 2020, 10:19 AM
llvm/tools/llvm-profgen/ProfiledBinary.cpp
60

What I was thinking about was to hoist getELFImageLMAForSec only. But not a big deal, current one is fine too.

auto *ELFObj = dyn_cast<ELF32LEObjectFile>(Sec.getObject());
if (!ELFObj)
  ELFObj = dyn_cast<ELF32BEObjectFile>(Sec.getObject());
if (!ELFObj)
  ELFObj = dyn_cast<ELF64BEObjectFile>(Sec.getObject());
if (!ELFObj)
  ELFObj = dyn_cast<ELF64BEObjectFile>(Sec.getObject());

return getELFImageLMAForSec(ELFObj->getELFFile(), Sec, ELFObj->getFileName());
wenlei accepted this revision.Nov 20 2020, 10:39 AM
wlei edited the summary of this revision. (Show Details)Nov 20 2020, 1:35 PM
This revision was landed with ongoing or failed builds.Nov 20 2020, 2:27 PM
This revision was automatically updated to reflect the committed changes.