This is an archive of the discontinued LLVM Phabricator instance.

DebugInfo: Filter DWARFv5 TUs out of the debug_info unit list when CUs requested
ClosedPublic

Authored by dblaikie on Sep 18 2020, 1:11 PM.

Details

Summary

Since DWARFv5 places TUs in debug_info, some of DWARFContext's APIs have
become a bit erroneous, including TUs in the CU list by accident.

Implementation: Use a filtering iterator when walking the debug_info units looking for CUs.

Drawback: The range type is now different from other unit queries, and the iteration may be slightly slower due to the filtering.

Trying to pick between this solution and D87937

Diff Detail

Event Timeline

dblaikie created this revision.Sep 18 2020, 1:11 PM
Herald added a project: Restricted Project. · View Herald TranscriptSep 18 2020, 1:11 PM
dblaikie requested review of this revision.Sep 18 2020, 1:11 PM
dblaikie edited the summary of this revision. (Show Details)Sep 18 2020, 1:23 PM

My instinct is that I prefer a third approach - simply keep both combined and separate lists - the combined list (could be a vector or an ordered map) is the offset-ordered list of all units, whilst there are then separate lists for compile and type units, with members being pointers to the units stored in the primary list. I think this has the advantage of being very explicit about things, with the drawback being that there are multiple members to keep in sync (possibly alleviated by having a wrapper class around the containers).

If I had to pick between the two existing options, I think I have a marginal preference for this one, since it requires less code and is less surprising to someone who might not be too familiar with the code - I would expect a list of units to be in offset order naturally, and not expect to have to go through some sort of sorting procedure in the future.

My instinct is that I prefer a third approach - simply keep both combined and separate lists - the combined list (could be a vector or an ordered map) is the offset-ordered list of all units, whilst there are then separate lists for compile and type units, with members being pointers to the units stored in the primary list. I think this has the advantage of being very explicit about things, with the drawback being that there are multiple members to keep in sync (possibly alleviated by having a wrapper class around the containers).

My concern there is higher memory usage/processing time - something I'm currently trying to combat. (& in fact, considering how a broader refactoring of libDebugInfoDWARF might look - to avoid parsing things a given client isn't using - I guess by that token, it might be nice to be able to read just the compile units and skip over the type units entirely - though, then if someone does ask for the debug_info units you'd need to go back and parse them to build a second list). I guess if you're OK with the lists being independently lazily built (well, not entirely independent - they would share any elements that overlap... - that'd make ownershp difficult, though), maybe it'd make sense that way. I guess alternatively I could start experimenting with a fully lazy parsing that doesn't keep a list at all - just walks the list, building one at a time... though that's going to be way more involved than I can get into right now (cross-CU referencing, etc).

If I had to pick between the two existing options, I think I have a marginal preference for this one, since it requires less code and is less surprising to someone who might not be too familiar with the code - I would expect a list of units to be in offset order naturally, and not expect to have to go through some sort of sorting procedure in the future.

Fair - appreciate the perspective. One other wrinkle here is that if we support the "type_units()" API (I don't think anyone's actually using it, so I'd probably delete it for now) it'll need to use a fancy iterator solution too - concat_iterator + filter_iterator to take the v5 type units out of the debug_info units, and concat that with the v4 type units from debug_types.

If I had to pick between the two existing options, I think I have a marginal preference for this one, since it requires less code and is less surprising to someone who might not be too familiar with the code - I would expect a list of units to be in offset order naturally, and not expect to have to go through some sort of sorting procedure in the future.

Fair - appreciate the perspective. One other wrinkle here is that if we support the "type_units()" API (I don't think anyone's actually using it, so I'd probably delete it for now) it'll need to use a fancy iterator solution too - concat_iterator + filter_iterator to take the v5 type units out of the debug_info units, and concat that with the v4 type units from debug_types.

I'm happy for that little extra complexity. I think it intuitively makes sense, even if it does mean messing about a little in the code.

llvm/include/llvm/DebugInfo/DWARF/DWARFContext.h
166

Fixes the clang-tidy warning, and also more in keeping with isTypeUnit. (Also isCU would be acceptable).

This revision was not accepted when it landed; it landed in state Needs Review.Sep 23 2020, 10:16 PM
This revision was landed with ongoing or failed builds.
This revision was automatically updated to reflect the committed changes.

If I had to pick between the two existing options, I think I have a marginal preference for this one, since it requires less code and is less surprising to someone who might not be too familiar with the code - I would expect a list of units to be in offset order naturally, and not expect to have to go through some sort of sorting procedure in the future.

Fair - appreciate the perspective. One other wrinkle here is that if we support the "type_units()" API (I don't think anyone's actually using it, so I'd probably delete it for now) it'll need to use a fancy iterator solution too - concat_iterator + filter_iterator to take the v5 type units out of the debug_info units, and concat that with the v4 type units from debug_types.

I'm happy for that little extra complexity. I think it intuitively makes sense, even if it does mean messing about a little in the code.

Fair enough. It's adequate for needs right now - though I'd like to revisit/keep in our mind the idea that maybe these APIs should all be a bit more lightweight/only-as-needed if we can head in that direction. (I'm concerned that to get there may not be readily made incremental, but we'll see - maybe we can pick away at it)

Appreciate the help considering the alternatives here!

llvm/include/llvm/DebugInfo/DWARF/DWARFContext.h
166

Ah, sure thing - (& had a weird bit of leftover where I had two versions of this function (one here and one in DWARFUnit.h) & things were working anyway... which is vaguely unsettling - anyway, removed that duplication in the process of fixing the rename)