This is an archive of the discontinued LLVM Phabricator instance.

Calculate indexes of last child of each DWARF entry once during tryExtractDIEsIfNeeded.
Needs ReviewPublic

Authored by simon.giesecke on May 17 2021, 9:11 AM.

Details

Summary

This ensures that the last child indexes are calculated in linear time and
can later be queried in constant time by getLastChild.

The baseline situation was that individual calls to getLastChild were linear in the
size of DieArray. Calling getLastChild once for every DWARFDebugInfoEntry was
amortized quadratic in the size of DieArray.

Running "llvm-gsymutil --convert llvm-gsymutil --quiet" using a RelWithDebInfo build
of llvm-gsymutil (after recent other optimizations) and gathering a profile using
perf showed that llvm::DWARFUnit::getLastChild is the no. 1 function, accounting
for 9.9% of the CPU time, and llvm::DWARFUnit::getSibiling is the no. 4 function,
accounting for 4.76% of the CPU time.

Diff Detail

Event Timeline

simon.giesecke created this revision.May 17 2021, 9:11 AM
simon.giesecke requested review of this revision.May 17 2021, 9:12 AM
Herald added a project: Restricted Project. · View Herald TranscriptMay 17 2021, 9:12 AM

One thing I don't know here is whether all relevant uses of DWARFUnit will end up iterating the children. If that's not the case, then this be done only optionally. We can't do it on-demand on the first call to getLastChild easily though, because then, e.g. GsymCreator, would need to synchronize accesses from multiple threads. But maybe this could be passed as an option to tryExtractDIEsIfNeeded?

If this is worthwhile, a similar thing could be done for the siblings.

simon.giesecke edited the summary of this revision. (Show Details)May 17 2021, 9:14 AM
clayborg requested changes to this revision.May 17 2021, 7:42 PM
clayborg added reviewers: aprantl, JDevlieghere.

You should probably get Phabricator working: https://llvm.org/docs/Phabricator.html

This will ensure your patches have context. If you submitting patches manually, you need to specify more context lines:

git diff -U999999

The DWARFDebugInfoEntry in LLDB contains more data. See inlined comment on improving performance for the LLVM DWARF reader. But if we are going to fix perf issues with DWARFDie navigation, we should improve the DWARFDebugInfoEntry class to contain the information needed to navigate better. FYI: be very careful when trying to port any code from the LLDB DWARF parser as it actually removes the NULL terminator DIEs from its DWARFUnit vector of DWARFDebugInfoEntry objects as this saves a ton of memory and they aren't needed if you create your DWARFDebugInfoEntry with enough info.

llvm/lib/DebugInfo/DWARF/DWARFUnit.cpp
833–842

This is a side affect of how the DWARFDebugInfoEntry class is created. LLDB has a more efficient way of doing things as each DWARFDebugInfoEntry knows its sibling index. Since all of the DIEs are contained in an std::vector inside of the DWARFUnit, we can just store the parent index, the sibling index. See the file lldb/source/Plugins/SymbolFile/DWARF/DWARFDebugInfoEntry.h.

uint32_t m_parent_idx; // How many to subtract from "this" to get the parent.
uint32_t m_sibling_idx; // How many to add to "this" to get the sibling.

The current DWARFDebugInfoEntry just contains a offset and a depth and an abbrev pointer. This info doesn't make it easy to navigate the DWARFDie objects sibling, and parent. If the LLVM DWARF parser adopts this parent index and sibling index, then navigation can happen much quicker and this function can simply get the sibling index and subtract 1 as that will always the the NULL tag that terminates the previous DIE child chain.

This revision now requires changes to proceed.May 17 2021, 7:44 PM

You should probably get Phabricator working: https://llvm.org/docs/Phabricator.html

This will ensure your patches have context. If you submitting patches manually, you need to specify more context lines:

git diff -U999999

I have set up arc now.

The DWARFDebugInfoEntry in LLDB contains more data. See inlined comment on improving performance for the LLVM DWARF reader. But if we are going to fix perf issues with DWARFDie navigation, we should improve the DWARFDebugInfoEntry class to contain the information needed to navigate better.

Ok, I added this separately first, because I wasn't sure if it should be optional, and not take up memory for use cases that might not need it. But from your comments I now understand that the extra memory usage should not be an issue.

FYI: be very careful when trying to port any code from the LLDB DWARF parser as it actually removes the NULL terminator DIEs from its DWARFUnit vector of DWARFDebugInfoEntry objects as this saves a ton of memory and they aren't needed if you create your DWARFDebugInfoEntry with enough info.

Should we remove the NULL terminator DIEs here as well, eventually? But I guess even if that should be done, this requires a lot of further adaptations. However, we would probably save more memory than we're adding through the extra sibling and parent index entries.

simon.giesecke retitled this revision from Calculate indexes of last child of each DWARF entry once during tryExtractDIEsIfNeeded. to Avoid calculating the string hash twice in GsymCreator::insertString..
simon.giesecke edited the summary of this revision. (Show Details)

Update using arc

simon.giesecke retitled this revision from Avoid calculating the string hash twice in GsymCreator::insertString. to Calculate indexes of last child of each DWARF entry once during tryExtractDIEsIfNeeded..
simon.giesecke edited the summary of this revision. (Show Details)

Update using arc

What's your use case such that this performance concern has come up for you?

avl added a subscriber: avl.May 18 2021, 11:42 AM
avl added inline comments.
llvm/lib/DebugInfo/DWARF/DWARFUnit.cpp
833–842

uint32_t m_parent_idx; How many to subtract from "this" to get the parent.
uint32_t m_sibling_idx;
How many to add to "this" to get the sibling.

The current DWARFDebugInfoEntry just contains a offset and a depth and an abbrev pointer. This info doesn't make it easy to navigate the DWARFDie objects sibling, and parent. If the LLVM DWARF parser adopts this parent index and sibling index, then navigation can happen much quicker and this function can simply get the sibling index and subtract 1 as that will always the the NULL tag that terminates the previous DIE child chain.

If the LLVM DWARF parser adopts above parent index and sibling index solution then it would also be useful for dsymutil. Currently it creates links to parents by separate pass.

What's your use case such that this performance concern has come up for you?

I think anyone who goes to any DIE that contains a bunch of children and uses the DWARFDie::iterator (like for a CU Die) and iterates over its children is probably the worst case.

So any users of DWARFDie::end() like DWARFDie::children(). This is used in many places like in llvm/lib/CodeGen/AsmPrinter/AsmPrinterDwarf.cpp, llvm/lib/CodeGen/AsmPrinter/DIEHash.cpp, llvm/lib/DebugInfo/GSYM/DwarfTransformer.cpp, llvm/lib/DWARFLinker/DWARFLinker.cpp and llvm/unittests/DebugInfo/DWARF/DWARFDebugInfoTest.cpp. There may be more and I just searched for "Die.children()".

Any of the DWARFUnit::getSibling(...), DWARFUnit::getParent(...), DWARFUnit::getLastChild(...) would be greatly sped up if we use m_parent_idx and m_sibling_idx in DWARFDebugInfoEntry.

The DWARFDebugInfoEntry in LLDB contains more data. See inlined comment on improving performance for the LLVM DWARF reader. But if we are going to fix perf issues with DWARFDie navigation, we should improve the DWARFDebugInfoEntry class to contain the information needed to navigate better.

Ok, I added this separately first, because I wasn't sure if it should be optional, and not take up memory for use cases that might not need it. But from your comments I now understand that the extra memory usage should not be an issue.

The LLVM DWARFDebugInfoEntry contains:

class DWARFDebugInfoEntry {
  /// Offset within the .debug_info of the start of this entry.
  uint64_t Offset = 0;

  /// The integer depth of this DIE within the compile unit DIEs where the
  /// compile/type unit DIE has a depth of zero.
  uint32_t Depth = 0;

  const DWARFAbbreviationDeclaration *AbbrevDecl = nullptr;
};

This will end up being 24 bits with 4 bytes of padding after the Depth.

If we modified the LLDB DWARFDebugInfoEntry for LLVM it could contain:

class DWARFDebugInfoEntry {
  uint64_t Offset; // Offset within the .debug_info/.debug_types
  uint32_t ParentIdx; // How many to subtract from "this" to get the parent. If zero this die has no parent
  uint32_t SiblingIdx; // How many to add to "this" to get the sibling.
  const DWARFAbbreviationDeclaration *AbbrevDecl = nullptr;
};

This would be the same 24 byte size as the current version and it would improve many of the DWARFUnit::getXXX() calls that get siblings, parents, etc.

So this wouldn't take up any more memory.

FYI: be very careful when trying to port any code from the LLDB DWARF parser as it actually removes the NULL terminator DIEs from its DWARFUnit vector of DWARFDebugInfoEntry objects as this saves a ton of memory and they aren't needed if you create your DWARFDebugInfoEntry with enough info.

Should we remove the NULL terminator DIEs here as well, eventually? But I guess even if that should be done, this requires a lot of further adaptations. However, we would probably save more memory than we're adding through the extra sibling and parent index entries.

Again, there will be no memory differences between the old and new due to the padding that exists in the older DWARFDebugInfoEntry structs. I would avoid removing the NULL tags for now because DWARF dumping code relies on these being in there for DWARF printing. This is something we could do later though.

What's your use case such that this performance concern has come up for you?

It's a hotspot when running llvm-gsymutil --convert. I can provide some numbers if you want.

What's your use case such that this performance concern has come up for you?

It's a hotspot when running llvm-gsymutil --convert. I can provide some numbers if you want.

If you happen to have them, might be nice to have in the patch description/commit message.

(@clayborg's suggestion of replacing "Depth" with two smaller fields seems roughly plausible to me/reasonable thing to try, for what it's worth)

simon.giesecke edited the summary of this revision. (Show Details)

Updated patch description with numbers from perf profile.

avl added inline comments.Sep 6 2021, 5:44 AM
llvm/lib/DebugInfo/DWARF/DWARFUnit.cpp
833–842

This is a side affect of how the DWARFDebugInfoEntry class is created. LLDB has a more efficient way of doing things as each DWARFDebugInfoEntry knows its sibling index. Since all of the DIEs are contained in an std::vector inside of the DWARFUnit, we can just store the parent index, the sibling index. See the file lldb/source/Plugins/SymbolFile/DWARF/DWARFDebugInfoEntry.h.

uint32_t m_parent_idx; // How many to subtract from "this" to get the parent.
uint32_t m_sibling_idx; // How many to add to "this" to get the sibling.

The current DWARFDebugInfoEntry just contains a offset and a depth and an abbrev pointer. This info doesn't make it easy to navigate the DWARFDie objects sibling, and parent. If the LLVM DWARF parser adopts this parent index and sibling index, then navigation can happen much quicker and this function can simply get the sibling index and subtract 1 as that will always the the NULL tag that terminates the previous DIE child chain.

Since it was mentioned in this review - Is somebody going to implementing that idea(replacing "Depth" with two smaller fields)?

simon.giesecke added inline comments.Sep 6 2021, 6:12 AM
llvm/lib/DebugInfo/DWARF/DWARFUnit.cpp
833–842

Since it was mentioned in this review - Is somebody going to implementing that idea(replacing "Depth" with two smaller fields)?

I would consider that when working on this again. However, I am not sure when I will be able to get back to this. If someone else wants to pick it up, that would be great.

I at least added a comment to DWARFDebugInfoEntry to reflect the possible directions (such as adding parent/sibling index) and reference this review here: 821954f97c6b978cca72cb412e98d35caee4cac3 - so if/whenever someone feels like poking at this they might have a good chance of coming across some of this history/context.

avl added inline comments.Sep 16 2021, 1:51 PM
llvm/lib/DebugInfo/DWARF/DWARFUnit.cpp
833–842

Since it was mentioned in this review - Is somebody going to implementing that idea(replacing "Depth" with two smaller fields)?

I would consider that when working on this again. However, I am not sure when I will be able to get back to this. If someone else wants to pick it up, that would be great.

I am interested in picking up this. Will do this. Thanks.