This is an archive of the discontinued LLVM Phabricator instance.

[DWARF] Store compile unit IDs in the DIERef class
AbandonedPublic

Authored by labath on May 3 2019, 2:33 AM.

Details

Summary

The offset of a compile unit is enough to identify in a section, but
things start to get a lot more complicated when you have multiple object
files, each with potentially several debug-info-carrying sections
floating around.

We were already pushing this concept to its limits split-dwarf, which
required juggling offsets of several units to get things to work.
However, this really become a problem when we tried to introduce
debug_types support, which resulted in a long series of attempts to
"concatenate" various debug info sections so that the unit offset
remains a valid identifier.

Instead of attempting to assign unique offsets to all units that we care
about, this patch does something more fundamental. We just admit that
using offsets as unique identifiers does not scale to all debug info
formats we want to support, and uses a different ID instead.

Decoupling the ID from the offset means the offset can be free to point
to the actual offset of the compile unit in the relevant section,
without any changes to low-level parsing code needed. And using a
surrogate ID means its much easier to map the compile units into a
single address space. At least two strategies are possible after this
patch:
a) map all relevant units into a single continuous sequence of integers
(as if the were all stored in one vector)
b) steal some bits from the cu_idx field and use it as some sort of a
discriminator (essentially creating multiple address spaces)

Additionally, this makes us better prepared for the future, where the
total size of debug info exceeds 4GB. It also simplifies code, as we can
treat different debug info flavours more uniformly.

This patch:

  • renames the cu_offset field of DIERef to cu_idx and fixes all uses to treat it as such
  • removes DWARFUnit::GetBaseObjOffset as it no longer serves any useful purpose. Instead the DWO unit is made to share the ID of the "base" unit, as they logically represent a single compilation.
  • changes SymbolFileDWARFDwo to use the index of the contained unit as its ID (this is the same approach taken by SymbolFileDWARFDebugMap).
  • has a longer commit message than the code it changes. :)

Event Timeline

labath created this revision.May 3 2019, 2:33 AM
labath updated this revision to Diff 197936.May 3 2019, 2:42 AM

Fix two more things I noticed when looking at the patch in phab:

  • incorrect use of unit offset in DIERef constructor in the DebugMap case (I'm guessing this didn't cause any test failures because both offsets and IDs are generally zero there).
  • incorrect local variable name in ManualDWARFIndex (NFC)
labath added a comment.May 3 2019, 7:50 AM

Hm... looking at @jankratochvil's patches, I realized that I completely omitted HashedNameToDIE from this patch (it used emplace_back to construct DIERefs, so it snuck past me). Thinking about it, I guess the best way to handle that would be to set the cu_idx to the invalid value for DIERefs created there, as the offset is sufficient to identify the die in those cases (which I guess is also the reason why all tests passed). That shouldn't impact the performance in any way as one way or the other one has to do a binary search to convert the offset into a DWARFUnit*. I'll try to update that on monday...

labath planned changes to this revision.May 7 2019, 9:45 AM

My understanding of the DIERef class has evolved. I'm going to revise this approach a bit.

labath abandoned this revision.May 16 2019, 8:46 AM

Obsoleted by D61908.