This is an archive of the discontinued LLVM Phabricator instance.

Improve DWARF parsing and accessing by 1% to 2%
ClosedPublic

Authored by clayborg on May 29 2019, 4:14 PM.

Details

Summary

When LLDB first started we didn't have our mmap of the DWARF data done correctly and if the backing file would change we would get live changes as the file changed and it would cause problems. We now mmap correctly and do not run into these issues. There was legacy code in DWARFDebugInfoEntry::GetAbbreviationDeclarationPtr(...) that would always extract the abbrev index each time the function was called to verify that DWARF data hadn't changed and a warning was emitted if it did. We no longer need this and the code was removed. The other thing this function did when it parsed the abbrev index was give us the offset of the first attribute bytes by adding the LEB128 size to the offset. This required an extra parameter to DWARFDebugInfoEntry::GetAbbreviationDeclarationPtr(...) which is now removed. I added "lldb::offset_t DWARFDebugInfoEntry::GetFirstAttributeOffset() const" which calculates this when we need it and modified all sites that need the offset to call it.

Now that we aren't decoding and verifying the abbrev index, it speeds up DWARF access by 1% to 2%.

Diff Detail

Event Timeline

clayborg created this revision.May 29 2019, 4:14 PM
aprantl accepted this revision.May 29 2019, 4:28 PM

Out of curiosity, what was that change in the mmap call that fixed this?

This revision is now accepted and ready to land.May 29 2019, 4:28 PM
labath accepted this revision.May 30 2019, 12:52 AM
labath added inline comments.
source/Plugins/SymbolFile/DWARF/DWARFDebugInfoEntry.cpp
724

It would be better to say auto *abbrevDecl here. The extra char doesn't cost much, and it saves one from wondering whether there is anything fancy going on behind the scenes http://llvm.org/docs/CodingStandards.html#beware-unnecessary-copies-with-auto

BTW, how do you measure these things?

When I tried to benchmark something I got a difference of at least 5% on what should be identical runs. I'd have to make hundreds of runs in order for a 1% difference in average to show up as statistically significant..

BTW, how do you measure these things?

When I tried to benchmark something I got a difference of at least 5% on what should be identical runs. I'd have to make hundreds of runs in order for a 1% difference in average to show up as statistically significant..

I quit all programs on my macOS machine and run the same script 5 times in a row. I get quite consistent results that are plus or minus 0.1 seconds usually. So I try to ensure nothing else is running. I have a dwarf script that loads 100 or so files with debug info (all .so files from a directory) and then sets a breakpoint by name and by file and line where the file is a header file, to ensure all line tables get checked.

$ ~/Documents/src/lldb/svn/lldb/build/Release/lldb -o 'command script import dwarfperf.py' -o 'dwarfperf -d ~/Documents/143642775' -o q

This will load all files from "~/Documents/143642775" and then set breakpoints and it returns a time. These are all files with no accelerator tables to ensure the DWARF must be manually indexed.

Out of curiosity, what was that change in the mmap call that fixed this?

We switched to mapping things private and shared. This might be been before open sourcing happened so there might not be a revision for it. Previously we weren't mapping private so we got any changes made to the file. I have never seen the error fire since we fixed this.

clayborg closed this revision.May 30 2019, 8:39 AM

r362103 | gclayton | 2019-05-30 08:21:23 -0700 (Thu, 30 May 2019) | 9 lines

Improve DWARF parsing and accessing by 1% to 2%

When LLDB first started we didn't have our mmap of the DWARF data done correctly and if the backing file would change we would get live changes as the file changed and it would cause problems. We now mmap correctly and do not run into these issues. There was legacy code in DWARFDebugInfoEntry::GetAbbreviationDeclarationPtr(...) that would always extract the abbrev index each time the function was called to verify that DWARF data hadn't changed and a warning was emitted if it did. We no longer need this and the code was removed. The other thing this function did when it parsed the abbrev index was give us the offset of the first attribute bytes by adding the LEB128 size to the offset. This required an extra parameter to DWARFDebugInfoEntry::GetAbbreviationDeclarationPtr(...) which is now removed. I added "lldb::offset_t DWARFDebugInfoEntry::GetFirstAttributeOffset() const" which calculates this when we need it and modified all sites that need the offset to call it.

Now that we aren't decoding and verifying the abbrev index, it speeds up DWARF access by 1% to 2%.

Differential Revision: https://reviews.llvm.org/D62634

On Fedora 29 x86_64 it broke:

FAIL: LLDB (/home/jkratoch/redhat/llvm-monorepo-clangassert/bin/clang-9-x86_64) :: test_enable_dwo (TestBreakpointLocations.BreakpointLocationsTestCase)
FAIL: LLDB (/home/jkratoch/redhat/llvm-monorepo-clangassert/bin/clang-9-x86_64) :: test_shadowed_command_options_dwo (TestBreakpointLocations.BreakpointLocationsTestCase)
FAIL: LLDB (/home/jkratoch/redhat/llvm-monorepo-clangassert/bin/clang-9-x86_64) :: test_shadowed_cond_options_dwo (TestBreakpointLocations.BreakpointLocationsTestCase)

That is the dwo part of TestBreakpointLocations.

On Fedora 29 x86_64 it broke:

FAIL: LLDB (/home/jkratoch/redhat/llvm-monorepo-clangassert/bin/clang-9-x86_64) :: test_enable_dwo (TestBreakpointLocations.BreakpointLocationsTestCase)
FAIL: LLDB (/home/jkratoch/redhat/llvm-monorepo-clangassert/bin/clang-9-x86_64) :: test_shadowed_command_options_dwo (TestBreakpointLocations.BreakpointLocationsTestCase)
FAIL: LLDB (/home/jkratoch/redhat/llvm-monorepo-clangassert/bin/clang-9-x86_64) :: test_shadowed_cond_options_dwo (TestBreakpointLocations.BreakpointLocationsTestCase)

That is the dwo part of TestBreakpointLocations.

This really shouldn't be the cause. No functional change here.

labath added a subscriber: dblaikie.Jun 3 2019, 8:29 AM

Actually, there was a subtle change of behavior here. Before this change, if we tried to parse a DIE using an abbreviation table from a different file, we would most likely fail immediately because of the fail-safe in GetAbbreviationDeclarationPtr. Now, we just do a blind read and return bogus data.

Now, normally this shouldn't matter, but in case of split-dwarf, things start to get interesting. Lldb has this assumption that when we are reading a debug info entry (which is not the CU DIE), we are actually reading it from the dwo compile unit and not the skeleton unit in the main file. This means it uses the dwo abbreviation list and everything. Now, as far as I can tell, LLDB is kind of right here. The DWARF5 (in v4 split dwarf was a non-standard extension) spec says "A skeleton compilation unit has no children." (3.1.2 Skeleton Compilation Unit Entries). And indeed, gcc produces compilation units with no children. Clang, on the other hand, seems to be putting some DIEs into the main CU, if the compilation results in functions being inlined. And it seems clang has been doing this since at least 7.0 (I haven't tried older versions).

So it seems we have two problems here:

  1. We need to figure out whether there's a bug in clang and fix it. + @dblaikie for help with that
  2. Depending on the outcome of the first item, we need to do something in lldb to handle this. If this is a clang bug (and my feeling is that it is), then the best solution might be to just start ignoring any non-CU dies from the main file in split-dwarf scenario

@dblaikie: Do you agree with my assessment above? If this is indeed a clang bug, any chance you could help with debugging the clang/llvm side of things?

The simplest reproducer for this would be something like:

inline __attribute__((always_inline)) int foo(int x) { return x; }

int main(int argc) { return foo(argc); }

$ clang++ a.cc -c -o a.o -g -gsplit-dwarf

Actually, there was a subtle change of behavior here. Before this change, if we tried to parse a DIE using an abbreviation table from a different file, we would most likely fail immediately because of the fail-safe in GetAbbreviationDeclarationPtr. Now, we just do a blind read and return bogus data.

Now, normally this shouldn't matter, but in case of split-dwarf, things start to get interesting. Lldb has this assumption that when we are reading a debug info entry (which is not the CU DIE), we are actually reading it from the dwo compile unit and not the skeleton unit in the main file. This means it uses the dwo abbreviation list and everything. Now, as far as I can tell, LLDB is kind of right here. The DWARF5 (in v4 split dwarf was a non-standard extension) spec says "A skeleton compilation unit has no children." (3.1.2 Skeleton Compilation Unit Entries). And indeed, gcc produces compilation units with no children. Clang, on the other hand, seems to be putting some DIEs into the main CU, if the compilation results in functions being inlined. And it seems clang has been doing this since at least 7.0 (I haven't tried older versions).

So it seems we have two problems here:

  1. We need to figure out whether there's a bug in clang and fix it. + @dblaikie for help with that

This is intentional behavior in clang (controllable under the -f[no-]split-dwarf-inlining flag, and modified by the -f[no-]debug-info-for-profiling flag).

This extra debug info is used for online symbolication (in the absence of .dwo files) - such as for the sanitizers (accurate symbolication is necessary for correctness in msan, due to msan's necessary use of blacklisting to avoid certain false positives) and some forms of sample based profiling collection.

In the default mode, clang includes, as you noted, just the subprograms that have inlined subroutines in them in this split-dwarf-inlining data.
In -fdebug-info-for-profiling all subprograms are described in the skeleton CU with some minimal attributes (they don't need parameters, local variables/scopes, etc) necessary to do certain profile things I don't know lots about.

Chances are it's probably best for a debugger to ignore these entries.

  1. Depending on the outcome of the first item, we need to do something in lldb to handle this. If this is a clang bug (and my feeling is that it is), then the best solution might be to just start ignoring any non-CU dies from the main file in split-dwarf scenario

@dblaikie: Do you agree with my assessment above? If this is indeed a clang bug, any chance you could help with debugging the clang/llvm side of things?

The simplest reproducer for this would be something like:

inline __attribute__((always_inline)) int foo(int x) { return x; }

int main(int argc) { return foo(argc); }

$ clang++ a.cc -c -o a.o -g -gsplit-dwarf

This is intentional behavior in clang (controllable under the -f[no-]split-dwarf-inlining flag, and modified by the -f[no-]debug-info-for-profiling flag).

This extra debug info is used for online symbolication (in the absence of .dwo files) - such as for the sanitizers (accurate symbolication is necessary for correctness in msan, due to msan's necessary use of blacklisting to avoid certain false positives) and some forms of sample based profiling collection.

In the default mode, clang includes, as you noted, just the subprograms that have inlined subroutines in them in this split-dwarf-inlining data.
In -fdebug-info-for-profiling all subprograms are described in the skeleton CU with some minimal attributes (they don't need parameters, local variables/scopes, etc) necessary to do certain profile things I don't know lots about.

I think we have to tolerate the msan part. However, the profile feedback workflow could (should) surely be taught to read a .dwo file. The point of -fdebug-info-for-profiling was so you don't have to emit limited/full debug info in the .o file just to make profiling work; but if you're using split DWARF then you're doing limited/full anyway, and the feedback loop happens in the context of a build environment so the .dwo can be asserted to be available. So in split DWARF mode, we should not be emitting debug-info-for-profiling into the skeleton.

Chances are it's probably best for a debugger to ignore these entries.

This stuff is going to show up in the wild, so yeah. You know you're looking at a skeleton, so don't bother looking at any children.

This is intentional behavior in clang (controllable under the -f[no-]split-dwarf-inlining flag, and modified by the -f[no-]debug-info-for-profiling flag).

This extra debug info is used for online symbolication (in the absence of .dwo files) - such as for the sanitizers (accurate symbolication is necessary for correctness in msan, due to msan's necessary use of blacklisting to avoid certain false positives) and some forms of sample based profiling collection.

In the default mode, clang includes, as you noted, just the subprograms that have inlined subroutines in them in this split-dwarf-inlining data.
In -fdebug-info-for-profiling all subprograms are described in the skeleton CU with some minimal attributes (they don't need parameters, local variables/scopes, etc) necessary to do certain profile things I don't know lots about.

I think we have to tolerate the msan part. However, the profile feedback workflow could (should) surely be taught to read a .dwo file. The point of -fdebug-info-for-profiling was so you don't have to emit limited/full debug info in the .o file just to make profiling work; but if you're using split DWARF then you're doing limited/full anyway, and the feedback loop happens in the context of a build environment so the .dwo can be asserted to be available. So in split DWARF mode, we should not be emitting debug-info-for-profiling into the skeleton.

Ah, hmm, looks like debug-info-for-profiling+split+-dwarf-inling ends up in a slighty hybrid state. DIFP does add the linkage name, decl line and decl file to subprograms, beyond just the simple name that gmlt would normally provide. But one of DIFP's other features doesn't trigger on the split-dwarf-inlining: it doesn't cause descriptions of functions without any inlining to be described.

Chances are it's probably best for a debugger to ignore these entries.

This stuff is going to show up in the wild, so yeah. You know you're looking at a skeleton, so don't bother looking at any children.

labath added a comment.Jun 4 2019, 4:14 AM
  1. We need to figure out whether there's a bug in clang and fix it. + @dblaikie for help with that

This is intentional behavior in clang (controllable under the -f[no-]split-dwarf-inlining flag, and modified by the -f[no-]debug-info-for-profiling flag).

This extra debug info is used for online symbolication (in the absence of .dwo files) - such as for the sanitizers (accurate symbolication is necessary for correctness in msan, due to msan's necessary use of blacklisting to avoid certain false positives) and some forms of sample based profiling collection.

In the default mode, clang includes, as you noted, just the subprograms that have inlined subroutines in them in this split-dwarf-inlining data.
In -fdebug-info-for-profiling all subprograms are described in the skeleton CU with some minimal attributes (they don't need parameters, local variables/scopes, etc) necessary to do certain profile things I don't know lots about.

Chances are it's probably best for a debugger to ignore these entries.

Thanks for explaining David. It sounds like we can safely ignore these entries if we have successfully loaded the dwo file, as it will contain a superset of information. If we cannot find the dwo file for any reason, then this information might be useful as a rough approximation of debug info. However, in this case, there's no dwo file around, so there isn't a possibility for confusion. I'll prepare a patch doing something like that.

Over email, @clayborg wrote:

Actually, there was a subtle change of behavior here. Before this change, if we tried to parse a DIE using an abbreviation table from a different file, we would most likely fail immediately because of the fail-safe in GetAbbreviationDeclarationPtr. Now, we just do a blind read and return bogus data.

This seems just like a straight up bug we need to fix in LLDB. It shouldn't affect or require any changes from any compilers. Seems like a lldbassert that maybe verifies that the abbreviation we use is valid for the current CU in debug builds would be nice so we can trigger this bug when we run the test suite or locally against an example program would be good. Depending on that fact that we have a mismatch in the abbrev index seems fragile. Most .debug_abbrev tables start out with a DW_TAG_compile_unit. So if the abbrev index magically matches, that still doesn't mean the we will be decoding the right data even if the index matches.

Yes, this part is a bug, but in order to figure out how to fix it, I had to know whether the compiler behavior is intentional, and whether we need to access that data. If it turned out we did, the fix would be more involved, because we'd also need to change the UID encoding so that we can reference these DIEs correctly, etc. Given that it seems we can just ignore these debug info entries, the fix will be relatively simple.