Page MenuHomePhabricator

DWARF: Add "dwo_num" field to the DIERef class

Authored by labath on Jun 17 2019, 6:22 AM.



When dwo support was introduced, it used a trick where debug info
entries were referenced by the offset of the compile unit in the main
file, but the die offset was relative to the dwo file. Although there
was some elegance to it, this representation was starting to reach its
breaking point:

  • the fact that the skeleton compile unit owned the DWO file meant that it was impossible (or at least hard and unintuitive) to support DWO files containing more than one compile unit. These kinds of files are produced by LTO for example.
  • it made it impossible to reference any DIEs in the skeleton compile unit (although the skeleton units are generally empty, clang still puts some info into them with -fsplit-dwarf-inlining).
  • (current motivation) it made it very hard to support type units placed in DWO files, as type units don't have any skeleton units which could be referenced in the main file

This patch addresses this problem by introducing an additional
"dwo_num" field to the DIERef class, whose purpose is to identify the
dwo file. It's kind of similar to the dwo_id field in DWARF5 unit
headers, but while this is a 64bit hash whose main purpose is to catch
file mismatches, this is just a smaller integer used to indentify a
loaded dwo file. Currently, this is based on the index of the skeleton
compile unit which owns the dwo file, but it is intended to be
eventually independent of that (to support the LTO use case).

Simultaneously the unit_offset field is changed to reference the compile
unit containing the referenced die instead of the main unit. This means
we can remove the "BaseObjectOffset" field from the DWARFUnit class. It
also means we can remove some of the workarounds put in place to support
the skeleton-unit+dwo-die combo. More work is needed to remove all of
them, which is out of scope of this patch.

Diff Detail


Event Timeline

labath created this revision.Jun 17 2019, 6:22 AM

I am concerned with increasing the size of DIERef objects. If we go to 12 bytes per DIERef, and we mostly store these in NameToDIE in "lldb_private::UniqueCStringMap<DIERef> m_map;" maps, which contain a std::vector of UniqueCStringMap::Entry objects which are:

struct Entry {
  ConstString cstring;
  T value;

Where T is DIERef. This will align to 24 bytes. Before we would be at 16 when DIERef was 8 bytes. I know we already exceeded the 8 byte size of DIERef with the added "section" field in DIERef with SVN revision 360872, so we already have 24 byte alignment, but the memory used by LLDB for large debug sessions will have increased significantly since a month ago.

If DIERef becomes the same size as DWARFDIE objects, we still need DIERef as when we index the DWARF we end up with DIERef objects and then we can unload the actual DWARF for the compile unit (so we can't switch over to using DWARFDIE since they have pointers to the loaded DWARF DIEs, but we will need to watch for code that might have been using a DIERef in lieu of DWARFDIE objects due to the DIERef size being smaller, but now that they are closer or will usually align to the same size, some clients might be able to switch over to use DWARFDIE objects now that there is really no size win.

So to address my concern with increase in DIERef size is wether we can use the "lldb::user_id_t" in the NameToDie map if we can always change a user_id_t into a DIERef and get back a size win in the process? Possibly look at anyone else storing DIERef objects and see if they can use lldb::user_id_t values too?

59–62 ↗(On Diff #205060)

I'm a bit concerned about increasing the size by 4 bytes here. All of our indexes are using maps of name to DIERef objects right?

64 ↗(On Diff #205060)

Insert a message here?

245–246 ↗(On Diff #205060)

Make a DWARFDIE accessor function?:

DIERef DWARFDIE::GetDIERef() const;
1494 ↗(On Diff #205060)

Can a DWO file be missing and ever return NULL here?

1495 ↗(On Diff #205060)

NULL can never be returned here?

labath marked 7 inline comments as done.Jun 18 2019, 6:16 AM

I've been thinking about the DIERef class a lot while doing this, and the more I thought about it, the more I became convinced that forcing everything to go through DIERefs is not a good idea. It is kind of useful to have it as a sort of exchange currency between various components, but that means it forces an particular access pattern for the debug info, one which may not be always optimal. For example, there's no reason why the manual index would need to use offsets of anything. Offsets require binary search, but the manual index has already gone through the debug info once, so it can easily remember the indexes of everything. Indexes don't need binary search, and they can be stored more compactly. Even for the debug_names index, which does use offsets, the DIERef representation is not the ideal form. That's because it uses unit-relative die indexes while DIERef stores the absolute index. This means it still has to look up the DWO unit in order to correctly construct an absolute die offset.

So, I was thinking that instead of the indexes handing vectors of DIERef, they could provide an iterator api that would return DWARFDIEs directly. This way each index would be completely free to use whatever internal representation it wants. All it would need to do is to be able to convert it to a DWARFDIE eventually. The reason I've put putting it off for now is that the iterators, when combined with the polymorphism of the index class can get a bit messy, and so I wasn't sure if this is worth that effort. However, I agree that the size of the manual index is important, so I am going to post a dumbed-down version of this idea, which uses a compact representation internally, but then still converts it to an array of DIERefs.

59–62 ↗(On Diff #205060)

I would say only the manual index is affected by this. All the indexes *return* results as a DIERef, but I don't think that's a problem because they don't store them this way internally.

64 ↗(On Diff #205060)

What would you have me say? I can't think of anything that wouldn't already be obvious from the assert condition. I think that's one of the reasons why c++>=14 makes the message optional..

245–246 ↗(On Diff #205060)

Actually, there is one already, I just forgot to use it.

1494 ↗(On Diff #205060)

Given that the DWO numbers are completely made up and handed out by SymbolFileDWARFDwos, the only way you can get a valid dwo number is if you already had a SymbolFileDWARFDwo to give it to you. That means we can assume that the unit at the given index will be a skeleton unit and that it will contain a valid dwo symbol file.

labath updated this revision to Diff 205319.Jun 18 2019, 6:16 AM
labath marked 2 inline comments as done.
  • use the existing function for DWARFDIE->DIERef conversion
  • fix a bug where we were ignoring the dwo_num and section fields when searching for all entries in a given unit. Technically, this was incorrect even when we introduced the "section" field, but there it did not matter because this function is only used for searching for global variables, and those don't appear in a type unit. Add a test.
labath updated this revision to Diff 205527.Jun 19 2019, 1:59 AM

Remove the cu_offset field from DIERef, bringing the total size back down to "8", as dicussed here, and on D63491.

clayborg requested changes to this revision.Jun 19 2019, 3:09 PM
clayborg added inline comments.
30 ↗(On Diff #205527)

maybe make "llvm::Optional<uint32_t> num" the last arg and give it a llvm::None default argument value? Rename to dwo_num as well.

49–50 ↗(On Diff #205527)

Can we use "uint32_t" here? Explicitly specify the size would be nice.

Also we can avoid using invalid_dwo_num in this class if we do:

uint32_t m_dwo_num : 30;
uint32_t m_dwo_valid : 1;
uint32_t m_section : 1;
19 ↗(On Diff #205527)

can/should we still pass "const DIERef &ref" by value like we used to now that it is smaller?

59 ↗(On Diff #205527)

can/should we still pass "const DIERef &ref" by value like we used to now that it is smaller?

1224–1230 ↗(On Diff #205527)

Can we move the lldb::user_id_t code into DIERef now that it is really simple to decode? Too many details are leaking out of DIERef like its invalid dwo num enocding, where the section is in the user_id_t...

This code could be:

return DecodedUID{*this, DIERef(uid)};
This revision now requires changes to proceed.Jun 19 2019, 3:09 PM
labath marked 8 inline comments as done.Jun 20 2019, 2:35 AM
labath added inline comments.
30 ↗(On Diff #205527)

I'd rather not do that. There aren't that many places that are constructing DIERefs -- I counted just four, and only one (the one in apple index) of them is passing llvm::None as the argument. I think it's better to have the person writing the code stop and think about what is the right value here, instead of just hoping that the default argument will be correct (which it won't be, in most cases).

Also having the arguments ordered this way makes it a nice logical progression. First you select the file, then the section in that file, and finally you give the offset in that section.

59 ↗(On Diff #205527)

Yeah, that's a good point.

1224–1230 ↗(On Diff #205527)

I don't think that's a good idea for two reasons:

  • decoding is fairly simply in the non-debug-map case, but with a debug-map, the decoding is still relatively complicated, as you need to consult the debug map in order to find the right symbol file. If we don't move that part into the DIERef class, then we'll end up with a constructor that only works in some cases, which will be confusing. If we do move it, then we're letting information about SymbolFileDWARF and it's relationship to SymbolFileDWARFDebugMap leak into DIERef class.
  • I wouldn't even say this is leaking information out of the DIERef class. SymbolFileDWARF is responsible for the conversion from a DIERef into a user_id_t (in SymbolFileDWARF::GetUID), and so it should also covert it back. For that to work, it obviously needs to be aware of the fields in the DIERef and their value ranges, but it is free to encode that information into a user_id_t any way it sees fit. (However, this reminds me that I should update GetUID to make use of the dwo_num abstraction. Right now, it works because the dwo_num is derived from the SymbolFile ID, but it would be cleaner to use dwo_num explicitly.)
labath updated this revision to Diff 205760.Jun 20 2019, 2:35 AM
labath marked an inline comment as done.
  • use uint32_t for bitfields
  • pass DIERefs by value
  • implement GetUID so that it is structurally similar to the decoding code in DecodeUID
clayborg accepted this revision.Jun 20 2019, 7:04 AM
clayborg added inline comments.
35 ↗(On Diff #205760)

Sounds good

This revision is now accepted and ready to land.Jun 20 2019, 7:04 AM
This revision was automatically updated to reflect the committed changes.
Herald added a project: Restricted Project. · View Herald TranscriptJun 21 2019, 12:53 AM