Page MenuHomePhabricator

[llvm-objdump][CallGraphSection] Extract call graph information from binary
AbandonedPublic

Authored by necipfazil on Jul 13 2021, 10:42 AM.

Details

Summary

Introduce –call-graph-info option to llvm-objdump to dump call graph
information. Output includes information for type identifiers for indirect
calls and targets from call graph section, if available.

Original RFC: https://lists.llvm.org/pipermail/llvm-dev/2021-June/151044.html
Updated RFC: https://lists.llvm.org/pipermail/llvm-dev/2021-July/151739.html

Diff Detail

Event Timeline

necipfazil created this revision.Jul 13 2021, 10:42 AM
necipfazil requested review of this revision.Jul 13 2021, 10:42 AM
Herald added a project: Restricted Project. · View Herald TranscriptJul 13 2021, 10:42 AM

Adapt to the new call graph section layout, refactor

  • Parse the call graph section based on the new section layout
  • Refactor code for readability and efficiency
  • Account for functions that call graph section has no info for, and them as indirect target with unknown type id.
  • Add additional warning messages (for functions that call graph section has no info for)
morehouse added inline comments.Jul 20 2021, 9:16 AM
llvm/tools/llvm-objdump/llvm-objdump.cpp
1093
1471

Is MIA->isIndirectBranch() && MIA->isCall() sufficient to detect an indirect call?

2156–2158
2220–2226
2228–2242

Nit: the above two loops could be combined.

2250–2266

Nit: the above two loops could be combined.

lattner resigned from this revision.Jul 20 2021, 9:44 PM

Fix small nits

necipfazil marked 6 inline comments as done.Jul 21 2021, 3:34 PM
necipfazil added inline comments.
llvm/tools/llvm-objdump/llvm-objdump.cpp
1471

MIA->isIndirectBranch() evaluates to false for any call instruction.

This revision is now accepted and ready to land.Jul 22 2021, 8:41 AM
jhenderson requested changes to this revision.Jul 26 2021, 12:42 AM

Please don't commit this yet. there's a lot of code, and I'd like to spend quite some time reviewing it - a first glance through raises some initial concerns, but I need to understand the whole thing to be able to properly consider them and possible suggestions.

Some initial high-level comments/questions:

  1. Is it possible to break this patch down into smaller parts, perhaps implementing some subset of the functionality per part? That'll make things easier to review.
  2. Why is this functionality being added to llvm-objdump rather than llvm-readobj? (There may be a perfectly valid reason to do so, but llvm-readobj tends to be the place we put things to interpret arbitrary sections).
  3. Can we avoid the canned binary in the test, please? Use an input generated at runtime using either llvm-mc or yaml2obj (the latter is preferable, but likely requires adding functionality to yaml2obj for the new section type).
llvm/tools/llvm-objdump/ObjdumpOpts.td
36

Please place this in alphabetical order in relation to other options (i.e. before demangle).

This revision now requires changes to proceed.Jul 26 2021, 12:42 AM
necipfazil marked 2 inline comments as done.EditedJul 28 2021, 10:08 PM

Thanks for your feedback.

Please don't commit this yet. there's a lot of code, and I'd like to spend quite some time reviewing it - a first glance through raises some initial concerns, but I need to understand the whole thing to be able to properly consider them and possible suggestions.

Some initial high-level comments/questions:

  1. Is it possible to break this patch down into smaller parts, perhaps implementing some subset of the functionality per part? That'll make things easier to review.

This patch is now split to 6 smaller patches: D107028, D107029, D107030, D107031, D107032, D107033. We can now abandon this revision.

  1. Why is this functionality being added to llvm-objdump rather than llvm-readobj? (There may be a perfectly valid reason to do so, but llvm-readobj tends to be the place we put things to interpret arbitrary sections).

We would like to dissasemble the object for information on functions and call sites. The first attempt was to use llvm-readobj but llvm-objdump seems to provide a large amount of code reuse for our case. Especially see D107029 for functions and D107030 for call sites.

  1. Can we avoid the canned binary in the test, please? Use an input generated at runtime using either llvm-mc or yaml2obj (the latter is preferable, but likely requires adding functionality to yaml2obj for the new section type).

I added a bunch of additional (and smaller) llvm-mc/yaml2obj tests with the new set of patches.

  • Edit: not -> now

If I get time, I will look at the patch series next week. Before you go ahead with landing anything though, I think you need to demonstrate that this feature is both useful and desired by the community: this is quite a large amount of code, and on my first glance, neither RFC attracted any feedback, which may suggest people aren't interested in this functionality.

If I get time, I will look at the patch series next week. Before you go ahead with landing anything though, I think you need to demonstrate that this feature is both useful and desired by the community: this is quite a large amount of code, and on my first glance, neither RFC attracted any feedback, which may suggest people aren't interested in this functionality.

We expect this feature to be useful for sanitizers, especially hardware-supported memory tagging, as it allows us to compress allocation/deallocation stack traces and reconstruct them offline. This will reduce memory overhead from stack traces by 16x and make production deployment feasible in more scenarios. I'd argue that this alone makes the added complexity worthwhile.

I also imagine the call graph would be useful for guided fuzzing (e.g., funcA calls funcB, and we want coverage of funcB, so we mutate inputs that touch funcA).

necipfazil abandoned this revision.Jul 29 2021, 3:10 PM

Abandoning this revision as it is split.