This is an archive of the discontinued LLVM Phabricator instance.

[DWARF v5] Refactoring range list dumping to fold DWARF v4 functionality into v5 handling (almost NFC).
ClosedPublic

Authored by wolfgangp on Aug 21 2018, 6:17 PM.

Details

Summary

This patch is motivated by the following observations:

  • DWARF 4 rangelists are essentially a subset of DWARF 5 rangelists.
  • We implemented DWARF5 largely independently (the consumer side, i.e. mostly llvm-dwarfdump). The DWARF 5 implementations are in DWARFDebugRnglists.*, the DWARF 4 implementations in DWARFDebugRangelists.* with a fair amount of duplication.

So I thought it should be possible to fold the DWARF 4 functionality into the newly implemented DWARF 5 functionality so that, at the expense of a bit more version-dependent code in dump and extract routines, we should be able to eliminate the DWARF4-only code (i.e. DWARFDebugRangeList.*) entirely.

This patch does attempt this in the following way:

  • DWARF5 has range list tables, which of course don't exist in DWARF4, so for DWARF 4 (i.e. when we are extracting range lists from the .debug_ranges section) we create a fake table so we can use the same mechanisms as in DWARF 5.
  • The extract and dump routines become version-aware so they can do what's expected both for DWARF4 and DWARF5.

The code in DWARFContext.cpp, DWARFUnit.cpp that arbiters between the different versions now just needs to set up the extractors correctly, set the correct section and pass the DWARF version, and should otherwise be version-oblivious to deal with ranges.

This is also prep for location list dumping, where it's possible to do the same thing, i.e. express DWARF4 functionality as a special case of DWARF5. This is why two routines in DWARFUnit.cpp that deal with the setup of range list tables are now templatized, so they can be used for location list tables as well.

One thing to note is that this change is almost, but not quite NFC. The difference is the following: In DWARF4 the range list table in .debug_ranges is dumped with the list offset preceding the entries:

00000000 00000000 0000abcd
00000000 ffffffff 0000deaf
00000000 00000010 00000020

etc. with the leading '00000000' being the list's section offset, which is repeated for every entry.
I didn't find this particularly useful, so the refactor now displays the list with the correct section offset for each entry, with the list offset simply being the first entry's offset.

00000000 00000000 0000abcd
00000008 ffffffff 0000deaf
00000010 00000010 00000020

Apologies for the size of the patch, but this will make the changes for location list dumping smaller.

Diff Detail

Repository
rL LLVM

Event Timeline

wolfgangp created this revision.Aug 21 2018, 6:17 PM
dblaikie added inline comments.
include/llvm/DebugInfo/DWARF/DWARFListTable.h
52 ↗(On Diff #161862)

Why this new special case - when would the range list contain an empty range?

71 ↗(On Diff #161862)

This appears to be unused in this patch? (I see no other mention of 'addEntry') - delete it as dead code?

72–77 ↗(On Diff #161862)

This seems to be dead in this patch - maybe remove it from here and add it in whichever patch adds its use?

87–90 ↗(On Diff #161862)

That sounds like in all cases we know the version from the input, right? So when/why would this assumption be needed?

lib/DebugInfo/DWARF/DWARFDebugRnglists.cpp
30–33 ↗(On Diff #161862)

Would it be simpler to model DWARFv4 range list entries as RLE_offset_pair, rather than as start_end with an offset? It looks like then it'd fall out without a special case in getAbsoluteRanges (& that function may not need to be touched at all - it wouldn't need to change behavior based on version & maybe the assertions aren't worth the extra API change?)

84–92 ↗(On Diff #161862)

Suspect it might be fine to rephrase the DWARFv5 message ("start/end encoding" (or "offset pair at offset ..." ) - so it applies to the 4 and 5 case equally) to be more general & avoid the 4/5 special casing here. (or in offset_pair if the code moves there based on the previous comment)

117–127 ↗(On Diff #161862)

Perhaps this logic should move into the relevant (start_end or offset_pair) case above rather than here - since it doesn't/can't apply to other cases in the switch?

lib/DebugInfo/DWARF/DWARFUnit.cpp
318–319 ↗(On Diff #161862)

Maybe rather than propagating DWO information here, the Base could be set to Table.getHeaderSize() if there is no Base specified? (using Base == 0 or Base == -1 or something as a flagging value to communicate "no Base has been specified, fill one in")?

(though side rant: I don't understand why these offsets ever point past the header - that seems like it breaks the purpose of the header to provide section versioning, etc - it means the header format can never change and the parser would have to walk backwards from the base offset to parse the version to know how to parse the rest of the section anyway... @probinson @aprantl - though perhaps I've ranted sufficiently about this before)

493 ↗(On Diff #161862)

I think the ">= 5" phrasing might make more sense to readers "supported since 5" is sort of what I'm thinking when I read it.

test/DebugInfo/dwarfdump-ranges.test
18–23 ↗(On Diff #161862)

If you're updating all these test cases anyway - is it worth revisiting what the best format for dumping debug_ranges is? I'm not sure printing those offsets is especially meaningful (either in the old behavior or the new behavior) rather than just printing the starting offset, then printing the pairs.

& printing it in a way that's more similar between ranges and rnglistsn wouldn't hurt either - not sure if there's something in common that'd suit everyone though.

probinson added inline comments.Aug 22 2018, 2:46 PM
include/llvm/DebugInfo/DWARF/DWARFListTable.h
52 ↗(On Diff #161862)

It's explicitly legal to have a list that consists solely of the end-of-list marker. Not saying any producer would actually emit it this way, but it's one way to describe an empty range. The parser should handle this case smoothly.

dblaikie added inline comments.Aug 22 2018, 4:18 PM
include/llvm/DebugInfo/DWARF/DWARFListTable.h
52 ↗(On Diff #161862)

Good point - I guess I could narrow/clarify my comment somewhat: Why is the emptiness of a range list important? I'd figure the printing algorithm would work the same either way.

vleschuk added inline comments.Aug 23 2018, 2:32 AM
include/llvm/DebugInfo/DWARF/DWARFDebugRnglists.h
67 ↗(On Diff #161862)

I suggest that DWARFListType::extract() saves Version in class data, and thus we don't need to specify it here explicitly. Same suggestion regarding RangeListEntry and other classes.

include/llvm/DebugInfo/DWARF/DWARFListTable.h
68 ↗(On Diff #161862)

Why does dump() method need Version argument? We set the version when we are extracting the data. Maybe save it somewhere during extraction?

240 ↗(On Diff #161862)

To make naming consistent maybe we should rename it to getLength()?

lib/DebugInfo/DWARF/DWARFDebugRnglists.cpp
86 ↗(On Diff #161862)

Maybe illegal_byte_sequence would be more suitable here?

wolfgangp updated this revision to Diff 162310.Aug 23 2018, 5:45 PM
wolfgangp marked 9 inline comments as done.

Addressed David's review comments:

  • Changed the dumping of DWARF v4 range lists to resemble the new format and updated the tests. More detailed comments are inline.
  • Use DW_RLE_offset_pair to model the DWARF 4 range list entries.
  • Removed some dead code that will not be used for this patch.
  • Answered other miscellaneous comments.

Addressed Victor's comments:

  • renamed length() to getLength() for consistency.
  • not sure if I want to store the DWARF version in the list entries.
include/llvm/DebugInfo/DWARF/DWARFDebugRnglists.h
67 ↗(On Diff #161862)

I was trying to avoid having to store the version in each list and entry, since it's always available from the context (i.e. the table during dumping and extracting). Based on David's suggestion I eliminated it from this function. It would remove a parameter from the extract and dump routines, however. I'm not sure. I guess I can see it stored in the list, but perhaps not in each individual entry.

include/llvm/DebugInfo/DWARF/DWARFListTable.h
52 ↗(On Diff #161862)

Also , the DWARF5 rnglist implementation keeps the end-of-list entry in the list (unlike the DWARF4 implementation), which makes display a bit more straightforward.

52 ↗(On Diff #161862)

Good point - I guess I could narrow/clarify my comment somewhat: Why is the emptiness of a > range list important? I'd figure the printing algorithm would work the same either way.

This was motivated by the code in DwarfLinker.cpp (dsymutil) that needs to check for the emptiness of a list during range patching. With the end-of-list entry as part of the Entries vector it seemed prudent to provide this method.

68 ↗(On Diff #161862)

See my comment above. I'm not sure if I want the version to be stored in each individual entry, given that they are usually processed altogether and not passed around individually anyway. I could be persuaded, though. It would make the interface simpler.

71 ↗(On Diff #161862)

I will be using this with location lists in an upcoming patch. Ah, I see, it's bad form to add it before it's actually used. OK I'll delete it.

72–77 ↗(On Diff #161862)

Actually, it's used further down in DWARFListTableBase<DWARFListType>::dump

87–90 ↗(On Diff #161862)

During extraction we're using getTableLength() (I've renamed it from just 'length()' in the first version of this review) for validation and also to determine if we can recover from an error and keep parsing the section. With the creation of an artificial table for DWARF 4 range lists the calculation of the table length is now dependent on the version, since for DWARF 5 we need to add the size of the length field to the length we extracted. In DWARF 4 the length is simply the size of the section as is.
If we find an error in the table header before we get to extract the version, the version would obviously not be set to anything meaningful and would confuse our recovery in DWARFContext.cpp. Also, presetting the version allows us to use getTableLength() before we actually extract the version for validation.
But it's perhaps more appropriate to preset the version during extraction, so I'm doing it there now.

240 ↗(On Diff #161862)

I've renamed the DWARFListTableBase::length() to getLength() and
DWARFListTableHeader::length() to getTableLength().
DWARFListTableHeader::getLength() simply extracts the length field from the header data.

lib/DebugInfo/DWARF/DWARFDebugRnglists.cpp
30–33 ↗(On Diff #161862)

I tried this and it seems it would leave getAbsoluteRanges unchanged. The only negative IMO is that offset_pair does not expect relocated addresses but only the extractor has to worry about that.

84–92 ↗(On Diff #161862)

The error handling in offset_pair is different b/c DWARF 4 range lists take relocated addresses, whereas DWARF 5 offset_pair takes ULEBs. We have to validate after extracting in the DWARF 5 case. This makes the code for the 2 versions fairly different anyway. Is is still worth reconciling the error messages?

lib/DebugInfo/DWARF/DWARFUnit.cpp
318–319 ↗(On Diff #161862)

We're going to need DWO information later anyway during extraction of the (range or location) lists to decide which entries are valid, e.g. start_end, start_length, base_address are only valid in non-split CUs.

test/DebugInfo/dwarfdump-ranges.test
18–23 ↗(On Diff #161862)

If we're not too worried about anybody depending on the existing format, we could make the same distinction in the old mode between verbose and terse mode:

Verbose mode: section offsets + raw values of entries + final values of ranges (i.e. adjusted by base) displayed as intervals (between '[' and ')').

Terse mode: just the final ranges displayed as intervals.

The only difference between old and new would be no encoding strings in the old verbose mode.

18–23 ↗(On Diff #161862)

I changed the formatting to roughly what we do in the new version, so the table looks like this in verbose mode:

.debug_ranges contents:
0x00000000: 0x0000000000000000, 0x0000000000000001 => [0x0000000000000000, 0x0000000000000001)
0x00000010: 0x0000000000000003, 0x0000000000000006 => [0x0000000000000003,  0x0000000000000006)
0x00000020: 0xffffffffffffffff, 0x0000000000000000
0x00000030: 0x0000000000000001, 0x0000000000000002 => [0x0000000000000001, 0x0000000000000002)
0x00000040: 0x0000000000000000, 0x0000000000000000

and like this in terse mode:

.debug_ranges contents:
[0x0000000000000000, 0x0000000000000001)
[0x0000000000000003, 0x0000000000000006)
[0x0000000000000001, 0x0000000000000002)
<End of list>

Does this make sense?

vleschuk added inline comments.Aug 24 2018, 7:25 AM
include/llvm/DebugInfo/DWARF/DWARFDebugRnglists.h
67 ↗(On Diff #161862)

Agree, I think every entry would be overkill.

vleschuk added inline comments.Aug 26 2018, 10:37 PM
lib/DebugInfo/DWARF/DWARFDebugRnglists.cpp
86 ↗(On Diff #161862)

Why is this marked as done?

wolfgangp added inline comments.Aug 27 2018, 10:10 AM
lib/DebugInfo/DWARF/DWARFDebugRnglists.cpp
86 ↗(On Diff #161862)

It's because the relevant code was moved to the RLE_offset_pair clause. I made the change there Sorry I should have pointed this out.

This revision is now accepted and ready to land.Aug 27 2018, 12:33 PM
JDevlieghere accepted this revision.Aug 28 2018, 2:31 AM

A few nits, but otherwise this LGTM

include/llvm/DebugInfo/DWARF/DWARFListTable.h
175 ↗(On Diff #162310)

Spurious /// or missing comment?

305 ↗(On Diff #162310)

nit: no braces

lib/DebugInfo/DWARF/DWARFContext.cpp
337 ↗(On Diff #162310)

Maybe we can just refactor getMaxVersion to take an optional default version?

dblaikie added inline comments.Aug 28 2018, 10:20 AM
include/llvm/DebugInfo/DWARF/DWARFListTable.h
52 ↗(On Diff #161862)

Could this be addressed by keeping the DWARF4 end-of-list entry too, for consistency?

Though I'm still a bit confused - I'm not sure why DwarfLinker.cpp would need to worry about whether the list is empty or not? Maybe as an optimization, but otherwise I'd figure it would just iterate over the list & if it's empty, it would iterate over zero things.

& why does this change you're making cause this extra case to be added - I thought this code (DWARFListtype) was for DWARF5 rnglists already & now it was getting DWARF4 support - but it sounds like you had to add a case that is for the DWARF5 case... so I'm a bit confused?

72–77 ↗(On Diff #161862)

I guess perhaps I was thinking: this function seems to be a no-op & should be removed from this patch - and added back in when it's used/needed?

87–90 ↗(On Diff #161862)

I'm pretty confused by this - how does having a hardcoded (possibly incorrect) version help with error recovery? I guess it's correct based on the section being extracted from (though in the future it could be incorrect when we get to DWARFv6)?

What sort of recovery would be done when the version couldn't be read from the input?

wolfgangp updated this revision to Diff 162997.Aug 28 2018, 5:44 PM
wolfgangp marked 4 inline comments as done.

Addressed review comments:

  • refactored getMaxVersion() in DWARFUnit.h to take an optional default argument.
  • Removed some code that is a noop for this patch
  • Extracting the initial length field of a list table to a local instead of straight into the header data. This aids error recovery when we find an unsupported DWARF64 length field.
  • added a missing comment
  • corrected one coding guides violation.
include/llvm/DebugInfo/DWARF/DWARFListTable.h
52 ↗(On Diff #161862)

Could this be addressed by keeping the DWARF4 end-of-list entry too, for consistency?

Actually, that is what's happening with this change. I should have said "unlike the original DWARF4 implementation" in my previous comment.

Though I'm still a bit confused - I'm not sure why DwarfLinker.cpp would need to worry
about whether the list is empty or not? Maybe as an optimization, but otherwise I'd figure it
would just iterate over the list & if it's empty, it would iterate over zero things.

Looking at the code it seems that in the context in question it performs a remapping of ranges and only seems to look at the first entry of a range list to determine whether it can validly remap the entire range list. It is not iterating over the entries, so I guess it just wants to check whether there is a valid first entry in the first place.

& why does this change you're making cause this extra case to be added - I thought this
code (DWARFListtype) was for DWARF5 rnglists already & now it was getting DWARF4
support - but it sounds like you had to add a case that is for the DWARF5 case

Actually, this is dsymutil handling the .debug_ranges section, so basically DWARF 4 rangelists.
With this patch DWARFDebugRnglist is now handling both DWARF 4 and DWARF 5 ranges, and since we're keeping the end-of-list entry in the list the extra case had to be added, since we didn't do this previously in the DWARF4 code.

As for dsymutil handling DWARF5 rnglists (i.e. the .debug_rnglists section), that is not supported at this point.

72–77 ↗(On Diff #161862)

OK. It will be added back in later, because the corresponding function for the location lists will do something more meaningful.

87–90 ↗(On Diff #161862)

Error recovery (in dumpListSection() in DWARFContext.cpp) relies on the correct interpretation of the length field in the header. In DWARF 5 we need to add sizeof(uint32_t) to the extracted length, in DWARF 4 the length field has simply been set to the section size and already represents the length.

If we find errors before we extract the version, e.g. while validating the extracted length against section size etc., we return without knowing the version, and getLength( ) doesn't know what to do. Hence I wanted to preset the version until we extract it from the data.

The other relevant scenario is when we extract an invalid version from the data. In that case we report the error but need to set the version to something valid in order to have the length interpreted correctly.

I suppose we could handle this differently by providing a different interface to get the table length for recovery purposes. We always know the version in the context, so we would never have to preset the table version during extraction. Do you have any other suggestions?

dblaikie added inline comments.Aug 29 2018, 5:03 PM
include/llvm/DebugInfo/DWARF/DWARFListTable.h
87–90 ↗(On Diff #161862)

I think I'm just generally confused about this still.

(looks like, for now, the Version field is no longer preset? So maybe what I'm asking about has already been addressed? (I Was concerned the default value in the Version field might be misleading or otherwise cause problems later))

Why does the error recovery in dumpListSection only apply to DWARF4 and earlier?

It seems like the contract for "extract" is a bit subtle if you can still access some fields(like length) but not others. Maybe it'd be better to have extract modify an in/out parameter of the offset - rather than relying on reading the length after extract fails? Not sure - and that doesn't address the issue of printing partially extracted lists... - maybe that's not worth supporting?

wolfgangp added inline comments.Aug 29 2018, 6:07 PM
include/llvm/DebugInfo/DWARF/DWARFListTable.h
87–90 ↗(On Diff #161862)

Why does the error recovery in dumpListSection only apply to DWARF4 and earlier?

Actually, error recovery really only affects DWARF5. In DWARF4 we just have one large array of entries without table headers etc. so reading more tables after an error is not an issue. Printing partially extracted lists in DWARF4 was just me being paranoid about preserving existing behavior, but perhaps that's not necessary, as you say.

It seems like the contract for "extract" is a bit subtle if you can still access some fields(like length) but not others. Maybe it'd be better to have extract modify an in/out parameter of the offset - rather than relying on reading the length after extract fails?

That's a good point. After an error the entire table should be considered invalid, and accessing the length field is inconsistent with that. I'll look into that.

vleschuk added inline comments.Aug 29 2018, 8:41 PM
include/llvm/DebugInfo/DWARF/DWARFListTable.h
87–90 ↗(On Diff #161862)

For DebugAddr I have added two methods: bool hasValidLength() and void invalidateLength(). If an unrecoverable error occurs extract() calls invalidateLength() method and calling code checks hasValidLength(). This hides the implementation details from calling code (e.g. we don't need to compare Length to zero directly). I think we can go this way here too. (Actually I was going to suggest a patch for this after this revision is landed in order not to mix everything, as this patch already has a lot of changes).

wolfgangp updated this revision to Diff 163449.Aug 30 2018, 4:53 PM

When we find an error during table extraction, ListTableBase::extract is expected to set Offset to either 0 if the error is not recoverable, or to past the end of the table if it is. This removes the need to access the extracted length in the caller in the error case and simplifies recovery.

@vleschuk: Let me know what you think about this. I understand your suggestion but this seems a bit simpler to me.

wolfgangp added inline comments.Aug 30 2018, 4:58 PM
lib/DebugInfo/DWARF/DWARFContext.cpp
285 ↗(On Diff #163449)

Here we're no longer accessing the table length.

lib/DebugInfo/DWARF/DWARFListTable.cpp
45 ↗(On Diff #163449)

This indicates a recoverable error and the caller can continue with a presumably correct Offset.

@vleschuk: Let me know what you think about this. I understand your suggestion but this seems a bit simpler to me.

Looks good and consistent with other code in this area.

dblaikie accepted this revision.Sep 4 2018, 6:41 PM
dblaikie added inline comments.
test/DebugInfo/Generic/cu-ranges.ll
17–20 ↗(On Diff #163449)

Not sure if this is necessary/improving the check, perhaps? (if this test was going to check for the address ranges - it could've been doing that already, I'm not sure the recent changes to dumping make this more viable/appropriate? Not sure though)

wolfgangp updated this revision to Diff 164558.Sep 7 2018, 5:54 PM
wolfgangp marked an inline comment as done.

Added some checks to cu-ranges.ll to capture the intent of the test better.

test/DebugInfo/Generic/cu-ranges.ll
17–20 ↗(On Diff #163449)

Perhaps this change would capture the intent of the test better, if we want to verify that ranges for 2 different sections are created, i.e. more relying on the ability to display the section.

dblaikie accepted this revision.Sep 7 2018, 6:02 PM
dblaikie added inline comments.
test/DebugInfo/Generic/cu-ranges.ll
17–20 ↗(On Diff #163449)

At that point, we can probably drop the .debug_ranges part of this test entirely.

And the range checking in the debug_info section can probably skip the address ranges and just be:

CHECK:  DW_AT_ranges
CHECK-NEXT: "__TEXT,__foo"
CHECK-NEXT: ".text")+

Since the addresses aren't checked and this isn't a test for llvm-dwarfdump's syntactic output, but a test for LLVM's ranegs output, assuming dwarfdump's output is as expected.

But up to you what you reckon's more readable there.

@wolfgangp do you need any help with this revision before you commit it?

@wolfgangp do you need any help with this revision before you commit it?

Sorry for the delay. Other work got in the way a bit. I'll commit shortly.

This revision was automatically updated to reflect the committed changes.
llvm/trunk/lib/DebugInfo/DWARF/DWARFContext.cpp