Page MenuHomePhabricator

[CallGraphSection] Add call graph section options and documentation
AcceptedPublic

Authored by necipfazil on Jul 13 2021, 10:06 AM.

Details

Summary

This is the first of the patch series that adds the support for
computing, storing, and restoring call graphs with LLVM. This adds the
options and the design documentation for computing and storing the call
graphs.

Inferring indirect call targets from a binary is challenging without
source-level information. Hence, the reconstruction of a fine-grained
call graph from the binary is unfeasible for indirect/virtual calls.
To address this, designed solution is to collect the necessary information
to construct the call graph while the source information is present, and
store it in a non-code section of the binary. To enable, use
-fcall-graph-section for Clang, or --call-graph-section for LLVM.

Original RFC: https://lists.llvm.org/pipermail/llvm-dev/2021-June/151044.html
Updated RFC: https://lists.llvm.org/pipermail/llvm-dev/2021-July/151739.html

Diff Detail

Unit TestsFailed

TimeTest
1,390 msx64 debian > MemProfiler-x86_64-linux.TestCases::test_new_load_store.cpp
Script: -- : 'RUN: at line 5'; /var/lib/buildkite-agent/builds/llvm-project/build/./bin/clang --driver-mode=g++ -fmemory-profile -mno-omit-leaf-frame-pointer -fno-omit-frame-pointer -fno-optimize-sibling-calls -gline-tables-only -m64 -O0 /var/lib/buildkite-agent/builds/llvm-project/compiler-rt/test/memprof/TestCases/test_new_load_store.cpp -o /var/lib/buildkite-agent/builds/llvm-project/build/projects/compiler-rt/test/memprof/X86_64LinuxConfig/TestCases/Output/test_new_load_store.cpp.tmp
2,870 msx64 debian > libarcher.barrier::barrier.c
Script: -- : 'RUN: at line 14'; /var/lib/buildkite-agent/builds/llvm-project/build/./bin/clang -fopenmp -pthread -fno-experimental-isel -g -O1 -fsanitize=thread -I /var/lib/buildkite-agent/builds/llvm-project/openmp/tools/archer/tests -I /var/lib/buildkite-agent/builds/llvm-project/build/projects/openmp/runtime/src -L /var/lib/buildkite-agent/builds/llvm-project/build/lib -Wl,-rpath,/var/lib/buildkite-agent/builds/llvm-project/build/lib /var/lib/buildkite-agent/builds/llvm-project/openmp/tools/archer/tests/barrier/barrier.c -o /var/lib/buildkite-agent/builds/llvm-project/build/projects/openmp/tools/archer/tests/barrier/Output/barrier.c.tmp -latomic && env TSAN_OPTIONS='ignore_noninstrumented_modules=0:ignore_noninstrumented_modules=1' /var/lib/buildkite-agent/builds/llvm-project/build/projects/openmp/tools/archer/tests/barrier/Output/barrier.c.tmp 2>&1 | tee /var/lib/buildkite-agent/builds/llvm-project/build/projects/openmp/tools/archer/tests/barrier/Output/barrier.c.tmp.log | /var/lib/buildkite-agent/builds/llvm-project/build/./bin/FileCheck /var/lib/buildkite-agent/builds/llvm-project/openmp/tools/archer/tests/barrier/barrier.c
2,880 msx64 debian > libarcher.critical::critical.c
Script: -- : 'RUN: at line 15'; /var/lib/buildkite-agent/builds/llvm-project/build/./bin/clang -fopenmp -pthread -fno-experimental-isel -g -O1 -fsanitize=thread -I /var/lib/buildkite-agent/builds/llvm-project/openmp/tools/archer/tests -I /var/lib/buildkite-agent/builds/llvm-project/build/projects/openmp/runtime/src -L /var/lib/buildkite-agent/builds/llvm-project/build/lib -Wl,-rpath,/var/lib/buildkite-agent/builds/llvm-project/build/lib /var/lib/buildkite-agent/builds/llvm-project/openmp/tools/archer/tests/critical/critical.c -o /var/lib/buildkite-agent/builds/llvm-project/build/projects/openmp/tools/archer/tests/critical/Output/critical.c.tmp -latomic && env TSAN_OPTIONS='ignore_noninstrumented_modules=0:ignore_noninstrumented_modules=1' /var/lib/buildkite-agent/builds/llvm-project/build/projects/openmp/tools/archer/tests/critical/Output/critical.c.tmp 2>&1 | tee /var/lib/buildkite-agent/builds/llvm-project/build/projects/openmp/tools/archer/tests/critical/Output/critical.c.tmp.log | /var/lib/buildkite-agent/builds/llvm-project/build/./bin/FileCheck /var/lib/buildkite-agent/builds/llvm-project/openmp/tools/archer/tests/critical/critical.c
2,860 msx64 debian > libarcher.parallel::parallel-simple.c
Script: -- : 'RUN: at line 15'; /var/lib/buildkite-agent/builds/llvm-project/build/./bin/clang -fopenmp -pthread -fno-experimental-isel -g -O1 -fsanitize=thread -I /var/lib/buildkite-agent/builds/llvm-project/openmp/tools/archer/tests -I /var/lib/buildkite-agent/builds/llvm-project/build/projects/openmp/runtime/src -L /var/lib/buildkite-agent/builds/llvm-project/build/lib -Wl,-rpath,/var/lib/buildkite-agent/builds/llvm-project/build/lib /var/lib/buildkite-agent/builds/llvm-project/openmp/tools/archer/tests/parallel/parallel-simple.c -o /var/lib/buildkite-agent/builds/llvm-project/build/projects/openmp/tools/archer/tests/parallel/Output/parallel-simple.c.tmp -latomic && env OMP_TOOL_VERBOSE_INIT=stderr env TSAN_OPTIONS='ignore_noninstrumented_modules=0:ignore_noninstrumented_modules=1' /var/lib/buildkite-agent/builds/llvm-project/build/projects/openmp/tools/archer/tests/parallel/Output/parallel-simple.c.tmp 2>&1 | tee /var/lib/buildkite-agent/builds/llvm-project/build/projects/openmp/tools/archer/tests/parallel/Output/parallel-simple.c.tmp.log 2>&1 | /var/lib/buildkite-agent/builds/llvm-project/build/./bin/FileCheck /var/lib/buildkite-agent/builds/llvm-project/openmp/tools/archer/tests/parallel/parallel-simple.c --check-prefixes CHECK,TSAN_ON
2,930 msx64 debian > libarcher.parallel::parallel-simple2.c
Script: -- : 'RUN: at line 15'; /var/lib/buildkite-agent/builds/llvm-project/build/./bin/clang -fopenmp -pthread -fno-experimental-isel -g -O1 -fsanitize=thread -I /var/lib/buildkite-agent/builds/llvm-project/openmp/tools/archer/tests -I /var/lib/buildkite-agent/builds/llvm-project/build/projects/openmp/runtime/src -L /var/lib/buildkite-agent/builds/llvm-project/build/lib -Wl,-rpath,/var/lib/buildkite-agent/builds/llvm-project/build/lib /var/lib/buildkite-agent/builds/llvm-project/openmp/tools/archer/tests/parallel/parallel-simple2.c -o /var/lib/buildkite-agent/builds/llvm-project/build/projects/openmp/tools/archer/tests/parallel/Output/parallel-simple2.c.tmp -latomic && env TSAN_OPTIONS='ignore_noninstrumented_modules=0:ignore_noninstrumented_modules=1' /var/lib/buildkite-agent/builds/llvm-project/build/projects/openmp/tools/archer/tests/parallel/Output/parallel-simple2.c.tmp 2>&1 | tee /var/lib/buildkite-agent/builds/llvm-project/build/projects/openmp/tools/archer/tests/parallel/Output/parallel-simple2.c.tmp.log | /var/lib/buildkite-agent/builds/llvm-project/build/./bin/FileCheck /var/lib/buildkite-agent/builds/llvm-project/openmp/tools/archer/tests/parallel/parallel-simple2.c
View Full Test Results (18 Failed)

Event Timeline

necipfazil created this revision.Jul 13 2021, 10:06 AM
necipfazil requested review of this revision.Jul 13 2021, 10:06 AM
Herald added projects: Restricted Project, Restricted Project. · View Herald TranscriptJul 13 2021, 10:06 AM
morehouse added inline comments.Jul 13 2021, 12:46 PM
clang/docs/CallGraphSection.rst
59

Why would a type ID be repeated?

62
70
149

Why quotes around these table headers/entries?

161

Why is this?

175

Do we need to list functions that don't match any callsites (e.g., main)?

184

So this section is useful for constructing the call graph, but we don't really need it for stack trace reconstruction, right?

necipfazil added inline comments.Jul 13 2021, 1:42 PM
clang/docs/CallGraphSection.rst
59

Current implementation [1] creates a call graph section per function comdat group, to which the assembly printer writes the call graph entries related to the function. This is done to enable dead-stripping of call graph entries. Consequently, the callsites from two functions may share the same type ID but appear as distinct entries as they will be written to distinct sections. Although they are merged to a single section by the linker, the type ID repetition persists since the linker only concatenates.

Eliminating this to ensure that type IDs are not repeated should only decrease the binary size overhead.

[1] https://reviews.llvm.org/D105916 , see MCObjectFileInfo::getCallGraphSection()

149

Quotes are placed to espace commas (actually, there is only one) in the table cells. They are not visible in the output html.

161

Please see the response above.

175

They may be needed for dynamically loaded objects whose functions are called indirectly outside the object.

184

For stack trace reconstruction, we need the full reverse call graph, thus, this section. Without this section, which function makes the indirect call is unknown.

During stack trace reconstruction, let's say we end up in a state where an indirect call is seen. To take another step, we need to know which function made this indirect call, so that we can continue the search through the callers of that function.

rebase, fix nits in docs

We should also tests for the new flags in clang/test/Driver/clang_f_opts.c.

clang/docs/CallGraphSection.rst
59

Indeed, this is a tricky problem. The current solution is probably OK, but maybe we can simplify the callgraph section format in light of this solution. e.g.,

Format version, Type id, IsFuncEntry, PC
  0, NumericTypeId(foo), 1, FuncEntryPc(foo)
  0, NumericTypeId(foo), 0, CallSitePc(fp_foo())
  0, NumericTypeId(main), 1, FuncEntryPc(main)

cc @MaskRay

necipfazil marked 2 inline comments as done.
  • test new clang flags
  • rebase
morehouse added inline comments.Jul 16 2021, 1:09 PM
clang/test/Driver/clang_f_opts.c
603 ↗(On Diff #359010)

For completeness, we should also test

-fcall-graph-section -fno-call-graph-section

and

-fno-call-graph-section -fcall-graph-section
MaskRay added inline comments.Jul 16 2021, 1:25 PM
clang/test/Driver/clang_f_opts.c
603 ↗(On Diff #359010)

I find it usually a bit wasteful to have too many RUN lines for a boolean option.

Testing just -fcall-graph-section and -fcall-graph-section -fno-call-graph-section will look good enough to me. See coverage.c for an example.

I think nowadays the practice is to have boolean options tested in separate files, instead of the monolithic clang_f_opts.c

(I saw you CCed me. I have applied the patch series locally. I need to experiment it a bit before commenting.)

Change call graph section layout

  • Changed the call graph section layout
  • Extended the example
  • Updated the documentation

In the previous version, the call graph section layout was optimized for
entries with unique type ids in which to list any indirect calls and targets
with such type id. However, the implementation requirements prevented sharing
information from different functions in the same entry. The main reason is
the need for separate entries per functions to allow dead-stripping.

As the previous layout was suboptimal for per-function call graph section
entries, a new layout is created. Please see the updated documentation for the
new layout.

The new layout has the following advantages:

  • The previous layout allowed multiple indirect targets to be listed in a single entry. However, there was no use since each entry was created for a single function. With the new layout, an entry is said to belong to a function, and lists info on the function and its indirect callsites. Consequently, the layout is designed for per-function entry approach.
  • The previous layout listed callsites per type id as (TypeId, CallSitesWithTypeIdCount, [CallSite1WithTypeId, ...]). However, now that the callsites are listed per function, a small number of callsites are expected for each type id. Therefore, they are now listed as (TypeId, CallSite) pairs to avoid wasting space on count values that are usually 1.
  • In the previous layout, if a function was not an indirect target, its entry PC was not included. This prevented making the distinction between
    1. the non-indirect target functions and 2) those functions that we had no information about (e.g., functions from linked objects with no call graph section). With the new layout, the function entry PC is always listed. It is marked as non-indirect target if so. For completeness, the user still needs to take any function with missing information as receiver to any indirect calls. However, the user can avoid doing the same for the non-indirect targets as their entry PCs are now listed, which improves the precision for the resulting call graph.
necipfazil marked 6 inline comments as done.Jul 17 2021, 9:05 PM

High-level comments:

I'd use a bottom-up patch order. llvm patches are before clang patches.
The clang driver patch is the last - having the user-facing option requires the functionality to be fully available.
llvm-objdump patch is orthogonal to clang patches, so its order is flexible.

If the section only contains indirect call edges, perhaps it should have indirect in the option name.

clang/docs/CallGraphSection.rst
10

object file

21

missing

22

due to having unneeded edges

30

`--call-graph-info`

clang/include/clang/Basic/CodeGenOptions.def
425

make sense to move it near EmitCallSiteInfo

clang/include/clang/Driver/Options.td
1258

make sense to move it near stacksizessection

clang/lib/CodeGen/BackendUtil.cpp
571

make sense to move it near EmitCallSiteInfo

clang/lib/Driver/ToolChains/Clang.cpp
6420

make sense to move it near stacksizessection

morehouse accepted this revision.Jul 19 2021, 9:26 AM

New .callgraph section layout LGTM.

Please also address any further feedback from Fangrui.

This revision is now accepted and ready to land.Jul 19 2021, 9:26 AM
necipfazil marked 10 inline comments as done.

Fix nits in documentation and tests

My previous feeling still applies. It seems to me that the clang driver patch should be the last.

Looks like the RFC threads didn't have any discussion - I think that'll be necessary before moving forward with/committing any work here, for what it's worth. Perhaps some Apple folks would have some interest since they use ARM and sanitizers a lot & probably care about performance of those things together too?

Might be worth looking at the overall feature this is part of - efficient call stack retrieval/rebuilding, by the sounds of it/from what I understand of the original proposal.