This is an archive of the discontinued LLVM Phabricator instance.

[DebugInfo] Add support for DWARF5 call site-related attributes
ClosedPublic

Authored by vsk on Jul 26 2018, 4:28 PM.

Details

Summary

DWARF v5 introduces DW_AT_call_all_calls, a subprogram attribute which
indicates that all calls (both regular and tail) within the subprogram
have call site entries. The information within these call site entries
can be used by a debugger to populate backtraces with synthetic tail
call frames.

Tail calling frames go missing in backtraces because the frame of the
caller is reused by the callee. Call site entries allow a debugger to
reconstruct a sequence of (tail) calls which led from one function to
another. This improves backtrace quality. There are limitations: tail
recursion isn't handled, variables within synthetic frames may not
survive to be inspected, etc. This approach is not novel, see:

https://gcc.gnu.org/wiki/summit2010?action=AttachFile&do=get&target=jelinek.pdf

This patch adds an IR-level flag (DIFlagAllCallsDescribed) which lowers
to DW_AT_call_all_calls. It adds the minimal amount of DWARF generation
support needed to emit standards-compliant call site entries. For easier
deployment, when the debugger tuning is LLDB, the DWARF requirement is
adjusted to v4.

Testing: Apart from check-{llvm, clang}, I built a stage2 RelWithDebInfo
clang binary. Its dSYM passed verification and grew by 1.4% compared to
the baseline. 151,879 call site entries were added.

rdar://42001377

Diff Detail

Event Timeline

vsk created this revision.Jul 26 2018, 4:28 PM
vsk updated this revision to Diff 157610.Jul 26 2018, 4:56 PM
  • In constructCallSiteEntryDIE, handle the fact that there may not be a DIE representing the local scope.

Do you really want to implement all_calls? It seems like that's going to be difficult and expensive (in size). Looks like you're focussed mostly on the tail call case, so why not aim for all_tail_calls?

& why the special case of only describing the first call site for any given function - I mean, it's a space saving, but is that sustainable (if so, are you thinking about proposing a DWARF change to support this implementation choice?)? If not, what's the benefit in having that feature/quirk in the short term?

(also, in general: what's your use case for this feature? It does seem like a rather expensive feature (in terms of debug info size) & I've not, myself, encountered particular uses/need for it as yet)

vsk added a comment.Jul 27 2018, 9:48 AM

Do you really want to implement all_calls? It seems like that's going to be difficult and expensive (in size). Looks like you're focussed mostly on the tail call case, so why not aim for all_tail_calls?

Without all_calls, it's not possible to infer a backtrace for certain chains of calls. Consider stopping inside of bar() in this example:

volatile int sink;

void bar() __attribute__((noinline)) {
  sink++; //< foo() does not show up in the backtrace.
}

void foo() __attribute__((noinline)) {
  bar(); //< Tail call.
}

void quux() __attribute__((noinline)) {
  bar(); //< Tail call.
}

int main() __attribute__((disable_tail_calls)) {
  foo(); //< Not a tail call.
  return 0;
}

The physical backtrace looks like:

#0 bar
#1 main

Without a TAG_call_site for 'foo' within 'main', it's not possible to complete the backtrace. The synthetic frame between 'bar' and 'main' could either be 'foo' or 'quux'.

& why the special case of only describing the first call site for any given function - I mean, it's a space saving, but is that sustainable (if so, are you thinking about proposing a DWARF change to support this implementation choice?)? If not, what's the benefit in having that feature/quirk in the short term?

Describing only the first call site for a given callee isn't a long-term solution. The basic problem a debugger needs to solve (as I understand it :) is to find a unique path from one frame to another. To solve this problem in the general case, you'd need a TAG_call_site with both the call target and the return PC (address of the instruction after the call) for every call in the program. The return PC is the extra piece of information that allows a debugger to disambiguate between the multiple call chains which could complete a physical backtrace. That's because a return address is stored in a known location on the stack and can be probed for.

The benefit of having limited TAG_call_site information initially is that it's enough to prototype basic debugger support. The algorithm is: do a DFS on the call graph to find a path from one frame to another, and refuse to create synthetic tail call frames if there's more than one path. This has limitations. For example, this algorithm wouldn't be able to complete the backtrace in the example above. However, I think it'd be useful as a proof-of-concept which we can use to lower-bound debug info bloat and to evaluate backtrace quality improvements.

(also, in general: what's your use case for this feature? It does seem like a rather expensive feature (in terms of debug info size) & I've not, myself, encountered particular uses/need for it as yet)

We have a lot of crash logs from optimized programs to symbolicate. I want this feature to make it easier to read these backtraces, much like inline frames do.

vsk added a comment.Jul 27 2018, 9:52 AM
In D49887#1178345, @vsk wrote:

The algorithm is: do a DFS on the call graph to find a path from one frame to another, and refuse to create synthetic tail call frames if there's more than one path. This has limitations. For example, this algorithm wouldn't be able to complete the backtrace in the example above.

Oops! Sorry. It would be able to complete the backtrace in the example I shared.

If, OTOH, there were a call to 'quux' within 'main', *then* it would fail to complete the backtrace.

As a partial and somewhat experimental feature, this should be controlled by a command-line option and off by default.
Even when it's complete it should probably still be under at least a tuning option, unless you can demonstrate the size cost is small.

vsk updated this revision to Diff 157810.Jul 27 2018, 5:04 PM
  • Gate call site info on -mllvm -callsite-debuginfo-experimental (off by default).
  • Add verifier checks for TAG_call_site. Resolve verifier issues found during stage2 testing.

Here are the dSYM sizes I see in RelWithDebInfo (-O2 -g) stage2 builds of clang:

BuildTotal dSYM SizeSubprogram CountCall Site Entries
Baseline1,053,275,396 bytes336,4110
-callsite-debuginfo-experimental=true1,066,904,289343,83882,399

There's a 1.3% increase in the overall size of the dSYM (13MB). 94% of that increase (12.3MB) is due to a size increase in the .debug_info section. The other 6% is distributed across other debugging sections.

It looks like the bulk of the size increase comes from forward-declarations of subprograms in DW_AT_call_origin. One take-away is that it might be more efficient (albeit non-conformant) to use mangled names inside of DW_AT_call_origin.

I experimented with emitting one TAG_call_site per callsite and found that this roughly doubles the number of entries (to 176,087). If we were to add return PC information to each entry, on net this would just add ~5MB to the final dSYM. This is a lot less than I expected.

aprantl added inline comments.Jul 30 2018, 7:41 AM
llvm/include/llvm/IR/DebugInfoFlags.def
51

Since this patch is adding a new flag, there should also be a bitcode roundtrip test (llvm-as - | llvm-dis - | llvm-as - | llvm-dis -).

In D49887#1179141, @vsk wrote:
  • Gate call site info on -mllvm -callsite-debuginfo-experimental (off by default).
  • Add verifier checks for TAG_call_site. Resolve verifier issues found during stage2 testing.

Here are the dSYM sizes I see in RelWithDebInfo (-O2 -g) stage2 builds of clang:

BuildTotal dSYM SizeSubprogram CountCall Site Entries
Baseline1,053,275,396 bytes336,4110
-callsite-debuginfo-experimental=true1,066,904,289343,83882,399

There's a 1.3% increase in the overall size of the dSYM (13MB). 94% of that increase (12.3MB) is due to a size increase in the .debug_info section. The other 6% is distributed across other debugging sections.

It looks like the bulk of the size increase comes from forward-declarations of subprograms in DW_AT_call_origin. One take-away is that it might be more efficient (albeit non-conformant) to use mangled names inside of DW_AT_call_origin.

I experimented with emitting one TAG_call_site per callsite and found that this roughly doubles the number of entries (to 176,087). If we were to add return PC information to each entry, on net this would just add ~5MB to the final dSYM. This is a lot less than I expected.

Sounds like it'd be reasonable to implement a standards-conformant mode, then? & if you're going for call_all_calls - the implementation seems simpler (no need for the extra "already used" map) & not too expensive.

vsk planned changes to this revision.Jul 31 2018, 9:44 AM

I think it makes sense to implement the standards-conformant solution first. I'll update this patch once I have a proof-of-concept on the lldb side.

vsk updated this revision to Diff 160012.Aug 9 2018, 2:46 PM
vsk retitled this revision from [DebugInfo] Add basic support for DWARF5 call site-related attributes to [DebugInfo] Add support for DWARF5 call site-related attributes.
vsk edited the summary of this revision. (Show Details)
  • Add verifier & bitcode tests, rebase.
  • Add complete DWARF generation support for call site entries (i.e, one call site entry per call, DW_AT_call_tail_call emission, and DW_AT_call_return_pc emission).
vsk planned changes to this revision.Aug 9 2018, 6:54 PM

Dsymutil will need to learn to resolve addresses in DW_AT_call_return_pc. Currently, I think it's skipping over them, which means that lldb can't use them.

probably leave this a bit more to @aprantl and @JDevlieghere (figure they can speak more to the LLDB needs/tradeoffs/etc that this is targeting) - but there's a first pass.

clang/lib/CodeGen/CGDebugInfo.cpp
51–55

Notice there aren't many cl::opts in clang - this should instead probably be added as a cc1 option, probably a '-f' option (see if GCC has a flag name for this already you can reuse?). Once it's more than experimental, promoted up to a full driver option that forwards down to a cc1 option, etc.

4371

Rather than taking and returning the flags - maybe just return the flags here and users can OR them with their other flags?

4376–4380

I don't think "full" debug info is the right property here - the difference between "full"/standalone and "limited" is that limited assumes the whole program is built with debug info - so it can expect to find debug info in another translation unit/compile unit to cover needed types, etc.

This doesn't fit under that umbrella - I think it probably needs a separate -f flag and it can default to on when tuning for LLDB potentially.

clang/test/CodeGenCXX/dbg-info-all-calls-described.cpp
44

I take it this is intended to test that non-defining declarations do /not/ have the DIFlagAllCallsDescribed added? But it looks like this check line doesn't test that it's /not/ present. Perhaps adding a comma at the end to ensure no other flags are being passed in the flags field?

48–67

Are these all interesting? Looks like they're all handled by the same code/aren't uniquely interesting situations, perhaps?

llvm/lib/DebugInfo/DWARF/DWARFVerifier.cpp
222

isSubroutineDIE checks for both DW_TAG_subprogram and DW_TAG_inlined_subroutine. Given this code specifically only wants to check for the latter - perhaps it should do that directly?

239

Not sure how this compares to other parts of the verifier - but I imagine this would be rather noisy if there was debug info with all correct DW_TAG_call_sites, but missing DW_AT_call attribute - a function would have many DW_TAG_call_sites, all generating errors, rather than just the first one in each function?

(but maybe that's not a huge deal? Dunno)

vsk updated this revision to Diff 160200.EditedAug 10 2018, 4:00 PM
vsk marked 4 inline comments as done.

Thanks for your feedback @dblaikie! I've addressed some of it and responded to the rest inline.

  • Add dsymutil support for relocating the contents of DW_AT_call_return_pc. Now, these addresses are useful to lldb. @friss & @JDevlieghere I've never looked at dsymutil before, would appreciate any comments you may have here.
  • Rebase & address some review comments.
clang/lib/CodeGen/CGDebugInfo.cpp
51–55

I think it'd make sense to add a -cc1/driver option if we wanted users to be able to toggle this functionality on/off. But, I don't think we do.

FWIW, GCC just turns call site info on by default (see https://godbolt.org/g/BrLCTd, note that they use .uleb128 0x2117 for DW_AT_GNU_all_call_sites but it's the same thing). From what I can make of GCC's '-###' output, they have no frontend toggle. I think the view here (which I share) is that call site info is just part of generic debugging support.

It seems prudent/sufficient to gate the llvm support on a temporary cl::opt. We'd remove the cl::opt in short order when https://reviews.llvm.org/D50478 lands.

Wdyt? Is there some use case for having a driver-level I'm just missing?

4371

Sounds good.

4376–4380

Thanks for catching this! Hm, I'm still not sure that we'd need a flag to toggle call site info. ISTM that it'd be appropriate to turn it on under DebugLineTablesOnly, LimitedDebugInfo, and FullDebugInfo. FWIW, GCC enables it at -g{1,2,3}.

clang/test/CodeGenCXX/dbg-info-all-calls-described.cpp
44

Yes, that's the intent behind the test. I assumed that this would be covered by the -implicit-check-not=DIFlagAllCallsDescribed? Yep:

$ echo "foo foo" | ./bin/FileCheck <(echo "CHECK: foo") -implicit-check-not=foo
command line:1:22: error: CHECK-NOT: excluded string found in input
-implicit-check-not='foo'
                     ^
<stdin>:1:5: note: found here
foo foo
48–67

Good point, this can be trimmed down.

llvm/lib/DebugInfo/DWARF/DWARFVerifier.cpp
239

That's a good point, but I'm not sure it'd be worth doing the extra bookkeeping to prevent that from happening.

aprantl added inline comments.Aug 13 2018, 9:25 AM
clang/lib/CodeGen/CGDebugInfo.cpp
51–55

Why would someone want to turn this feature off? Does it increase the size by a lot?

vsk added inline comments.Aug 13 2018, 10:08 AM
clang/lib/CodeGen/CGDebugInfo.cpp
51–55

I measured a 1.4% increase in the size of a RelWithDebInfo stage2 build of clang. This seemed like a reasonable increase to me.

aprantl added inline comments.Aug 13 2018, 10:17 AM
clang/lib/CodeGen/CGDebugInfo.cpp
51–55

Since this is purely additive I wouldn't even bother with an option for this feature and just enable it by default.

51–55

(if the emitted DWARF version is high enough)

friss added a comment.Aug 13 2018, 2:37 PM

The dsymutil part LGTM

vsk updated this revision to Diff 162432.Aug 24 2018, 11:25 AM
  • Per Adrian and Paul's feedback, remove the experimental "gating" cl::opt because the size overhead of call site info is low.
  • Rebase.

Hi Vedant, apologies for the delay in reviewing this. I have a few nits inline but otherwise this LGTM!

clang/lib/CodeGen/CGDebugInfo.cpp
4374

Any reason you have extra parenthesis here?

4384

nit: Let's switch those around so we gain short-circuiting for the already computed boolean.

llvm/lib/CodeGen/AsmPrinter/DwarfDebug.cpp
555

nit: seems like you could simplify this with a ternary operator.

1353

nit: you could merge these two.

llvm/lib/DebugInfo/DWARF/DWARFVerifier.cpp
216

We tend to return an unsigned for the verifiers. It doesn't really matter much but it's nice for consistency and makes it easy to change if there's ever more than one failures.

vsk updated this revision to Diff 162925.Aug 28 2018, 11:52 AM
vsk marked 5 inline comments as done.
vsk edited the summary of this revision. (Show Details)

@JDevlieghere thanks for the feedback, and no worries!

vsk updated this revision to Diff 167039.Sep 25 2018, 7:36 PM
  • Rebase.
  • Back out the dsymutil change, as the correct section slide offset can only be applied by the debugger (after things like dylibs are mapped into a process).
vsk added a comment.Oct 2 2018, 12:09 PM

Does this look good to land?

Looks mostly good from my end.

llvm/test/DebugInfo/X86/dwarfdump-callsite-invalid-1.s
2

This test should go into test/tools/llvm-dwarfdump.

vsk updated this revision to Diff 168156.Oct 3 2018, 12:24 PM
vsk marked an inline comment as done.
  • Move a dwarfdump test into test/tools/llvm-dwarfdump.
aprantl accepted this revision.Oct 5 2018, 12:55 PM
This revision is now accepted and ready to land.Oct 5 2018, 12:55 PM
This revision was automatically updated to reflect the committed changes.
aheejin added inline comments.
llvm/trunk/lib/CodeGen/AsmPrinter/DwarfDebug.cpp
536 ↗(On Diff #168523)

Sorry for the late question, but is the first operand always the callee in a call instruction? That's not true for WebAssembly, but I'm wondering if WebAssembly is an exception and this holds for all other targets.

Herald added a project: Restricted Project. · View Herald Transcript
djtodoro added inline comments.
llvm/trunk/lib/CodeGen/AsmPrinter/DwarfDebug.cpp
536 ↗(On Diff #168523)

Sorry for the late question, but is the first operand always the callee in a call instruction?

AFAIK, it is true for Intel, Arm, MIPS...

That's not true for WebAssembly, but I'm wondering if WebAssembly is an exception and this holds for all other targets.

If that is the case, we can make a TargetInstr method, e.g. getCallee(), that returns the first operand in general, and override it in the WebAssembly case, right?

aheejin added inline comments.May 10 2021, 9:33 PM
llvm/trunk/lib/CodeGen/AsmPrinter/DwarfDebug.cpp
536 ↗(On Diff #168523)

Thanks for the reply! We have multiple call instructions and the location of the callee operand differs even among them, so we have getCalleeOp utility method. Adding a general getCallee can be good, but if WebAssembly is the only target that does not satisfy the condition that the callee is the 0th operand, maybe we can add a small hacky if-else here..?

djtodoro added inline comments.May 11 2021, 1:36 AM
llvm/trunk/lib/CodeGen/AsmPrinter/DwarfDebug.cpp
536 ↗(On Diff #168523)

I see... I think we should keep this code target independent (as far as possible), so I still vote for a TargetInstr method here.

(side note: In addition, there might be some downstream targets that could also have the callee operand at non-0th position, so this will indicate that they need to update that code in order to have proper debug info about call-sites.)

aheejin added inline comments.May 23 2021, 1:18 PM
llvm/trunk/lib/CodeGen/AsmPrinter/DwarfDebug.cpp
536 ↗(On Diff #168523)

Thanks for the suggestion! I tried that in D102978.