This is an archive of the discontinued LLVM Phabricator instance.

[DebugInfo] Add basic support for DWARF 5 .debug_rnglists dumping
ClosedPublic

Authored by jhenderson on Jan 24 2018, 7:31 AM.

Details

Summary

This change adds support to llvm-dwarfdump for dumping a DWARF5 .debug_rnglists section in a regular ELF file. It is not complete, in that several DW_RLE_* encodings are not currently supported, but does dump the header and the basic ranges for DW_RLE_start_length and DW_RLE_start_end encodings.

Obvious next steps are to add verbose dumping that dumps the raw encodings, rather than the interpreted contents, to add -verify support of the section (e.g. to show that the correct number of offsets is included), add dumping of .debug_rnglists.dwo, and to add support for other encodings.

Diff Detail

Repository
rL LLVM

Event Timeline

jhenderson created this revision.Jan 24 2018, 7:31 AM

I'm guessing several of these invalid tests could be in one file - only the "length too long" should need to be last in the list, right? ("length too short" is undetectable, isn't it? it'd just appear as though the remaining bytes outside the length were the next entry and that entry would be incorrect in some way (its length, version, etc))

Also, it might be that we want more fidelity in the dumping, so that we can test the differences in encoding between start+length, start+end, offset+offset, etc?

More or less like I was dabbling with for loc lists here: https://reviews.llvm.org/D36300 you can see how different loc list ranges are dumped differently, explaining how the loc list is structured not just what it means semantically.

I'm guessing several of these invalid tests could be in one file - only the "length too long" should need to be last in the list, right? ("length too short" is undetectable, isn't it? it'd just appear as though the remaining bytes outside the length were the next entry and that entry would be incorrect in some way (its length, version, etc))

The "length too short" test is not undetectable in this instance, because it leaves the .debug_rnglists without an end of list marker, which is invalid. In other cases, you are right, a short length may be undetectable e.g. if the length missed out an entire list of ranges, or the entire payload. The invalid_offset_entries test can't be combined with the too long test, because both effectively result in trying to read more than there is available, in the current code (we don't bother to validate the number of offset entries against the section size, although that is something that could be done - I considered this unnecessary because the DataExtractor class will simply read 0s if it runs off the end of the section). I could combine the invalid header case, the short case, and one of the long length or invalid offset cases, but as that leaves one separate case, I feel like the clarity of distinct test cases is better at that point.

Also, it might be that we want more fidelity in the dumping, so that we can test the differences in encoding between start+length, start+end, offset+offset, etc?

More or less like I was dabbling with for loc lists here: https://reviews.llvm.org/D36300 you can see how different loc list ranges are dumped differently, explaining how the loc list is structured not just what it means semantically.

I agree that this would be useful to have, but I felt like this belonged under the "verbose" switch in llvm-dwarfdump. This would be similar to .debug_line, where -debug-line dumps just the interpreted line table, whilst -debug-line -verbose dumps the encoding as well as the interpretation of it.

JDevlieghere added inline comments.Jan 25 2018, 6:41 AM
include/llvm/DebugInfo/DWARF/DWARFDebugRnglists.h
23

As this is so similar to DWARFAddressRange it would be nice if we could reuse that in some way. We already have infrastructure to verify these entries for things like overlap that we might want to reuse in the verifier.

lib/DebugInfo/DWARF/DWARFContext.cpp
488

Set?

lib/DebugInfo/DWARF/DWARFDebugRnglists.cpp
23

Let's return an LLVM error instead of a bool so we can convey a useful error message to the caller.

109

LLVM Coding standard says no braces around single line if/for

117

Same here and on the next line.

I'm guessing several of these invalid tests could be in one file - only the "length too long" should need to be last in the list, right? ("length too short" is undetectable, isn't it? it'd just appear as though the remaining bytes outside the length were the next entry and that entry would be incorrect in some way (its length, version, etc))

The "length too short" test is not undetectable in this instance, because it leaves the .debug_rnglists without an end of list marker, which is invalid.

Sure, but it would be detected as "missing end of list marker" rather than "length too short". I would think we should probably be respecting the length to find the boundaries - rather than assuming the length is wrong and walking further to find the beginning of the next contribution.

In other cases, you are right, a short length may be undetectable e.g. if the length missed out an entire list of ranges, or the entire payload. The invalid_offset_entries test can't be combined with the too long test, because both effectively result in trying to read more than there is available, in the current code (we don't bother to validate the number of offset entries against the section size, although that is something that could be done - I considered this unnecessary because the DataExtractor class will simply read 0s if it runs off the end of the section).

As developer tools, it might be good for llvm-dwarfdump to provide more precise error messages to ensure debugging is as easy as possible (if it says "hey, there's a zero here" and there's no zero - it just reached the end of the contribution - that could cause some painful mislead debugging)

I could combine the invalid header case, the short case, and one of the long length or invalid offset cases, but as that leaves one separate case, I feel like the clarity of distinct test cases is better at that point.

Also, it might be that we want more fidelity in the dumping, so that we can test the differences in encoding between start+length, start+end, offset+offset, etc?

More or less like I was dabbling with for loc lists here: https://reviews.llvm.org/D36300 you can see how different loc list ranges are dumped differently, explaining how the loc list is structured not just what it means semantically.

I agree that this would be useful to have, but I felt like this belonged under the "verbose" switch in llvm-dwarfdump. This would be similar to .debug_line, where -debug-line dumps just the interpreted line table, whilst -debug-line -verbose dumps the encoding as well as the interpretation of it.

Fair enough

jhenderson marked 5 inline comments as done.

Addressed review comments.

In switching to using Errors, I realised I could add several more errors producing finer grained error messages. To go along with that, I've also rewritten the tests quite a bit, mostly to combine the error message tests into a single test, and to test a couple of corner cases.

With this change, the behaviour is to try to keep going and parse other tables in the same section, as much as possible. In keeping with what happens with other debug sections, when there is an error with the .debug_rnglists parsing, llvm-dwarfdump still does not exit with a failure.

include/llvm/DebugInfo/DWARF/DWARFDebugRnglists.h
23

Good idea. I wonder if DWARFAddressRange should be moved into its own file, since it's not specific to the DWARFDebugRangeList class?

lib/DebugInfo/DWARF/DWARFContext.cpp
488

Oops, that was leftover from an earlier version when the class was called something to do with sets.

lib/DebugInfo/DWARF/DWARFDebugRnglists.cpp
23

I like this idea. It allows for more fine-grained behaviour than simply not printing the section.

117

There are two lines in the outer loop!

dblaikie added inline comments.Jan 31 2018, 10:44 AM
lib/DebugInfo/DWARF/DWARFContext.cpp
488

Move this variable into the loop - since it's not intended to preserve state between iterations of the loop? I realize it means some memory reuse by keeping it outside the loop, but I'd probably classify that as premature optimization?

lib/DebugInfo/DWARF/DWARFDebugRnglists.cpp
20

Could this be written as "HeaderData = {}" ?

25–37

These seem a bit weirdly special cased to handle certain string layouts.

Could you instead use the LLVM string format library as you've used elsewhere?

104

Probably more appropriate to use createError here - since it's certainly reachable by user-defined inputs. But up to you if you reckon this'll be around for a really short time.

test/tools/llvm-dwarfdump/X86/debug_rnglists_invalid.s
9–27 ↗(On Diff #132142)

Looks like this should move to the some_invalid.s file? Alternatively, could that assembly be moved into this file?

JDevlieghere added inline comments.Feb 1 2018, 2:16 AM
include/llvm/DebugInfo/DWARF/DWARFDebugRnglists.h
23

Sounds like a good idea!

lib/DebugInfo/DWARF/DWARFDebugRnglists.cpp
117

Right, missed the last OS << ... underneath :-)

jhenderson updated this revision to Diff 132383.Feb 1 2018, 6:32 AM
jhenderson marked 7 inline comments as done.

Addressed review comments.

dblaikie accepted this revision.Feb 1 2018, 11:40 AM

Looks good to me

lib/DebugInfo/DWARF/DWARFDebugRnglists.cpp
104

Test cases for these, then? (the same test cases can be updated once there's better dumping support to test the behavior there)

This revision is now accepted and ready to land.Feb 1 2018, 11:40 AM
This revision was automatically updated to reflect the committed changes.