llvm-dwarfdump was silent even when the format of DWARF was invalid and/or llvm-dwarfdump did not understand/support some of the constructs. This can be pretty confusing as llvm-dwarfdump is a tool for DWARF producers+consumers development.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
llvm/lib/DebugInfo/DWARF/DWARFDebugInfoEntry.cpp | ||
---|---|---|
37–39 | Could probably include the bounds in the message? Maybe something like: "DWARF compile unit extends beyond its bounds [x, y) to z"? | |
52–55 | Maybe include some details about the offset to the debug_abbrev contribution, and the range of valid abbreviation values? (I forget if the abbrev table is necessarily contiguous - if it isn't, then maybe that's too complicated) | |
llvm/lib/DebugInfo/DWARF/DWARFUnit.cpp | ||
387–406 | What was the motivation to move this code and add the Depth check in? I guess this didn't actually work/was untested, maybe? Could you explain why it didn't work, etc? |
llvm/test/tools/llvm-dwarfdump/X86/format-warnings.s | ||
---|---|---|
1 | The test deserves an introductory commetn describing what it is testing. Additionally, the name is too generic and possibly slightly misleading "format-warnings.s" suggests that the main aim of the test is to test the formatting of warning messages in general, whereas this test is more about invalid debug abbrev/debug info, so perhaps it could just be debug-entry-invalid.s or something like that. I have a personal preference to not use stdin to drive llvm-dwarfdump, and instead create an object file on disk with llvm-mc. This makes it easier to debug the test should something go wrong, since you can inspect the object without needing to change the test code. | |
3 | It would probably be a good idea if you used a non-zero value for the cu index. That will help flag up any conversion issues (e.g. because you used a 32-bit print format for a 64-bit number). Same goes for the remaining messages. Here and below, you can drop the "CHECK-" bit of the check prefixes, to make them more concise. | |
21 | Indentation here is a little inconsistent. | |
31 | Here and elsewhere, consider lining up your comments vertically. They're a bit all over the place currently. |
llvm/test/tools/llvm-dwarfdump/X86/format-warnings.s | ||
---|---|---|
1 | There is somewhat of a convention to prefer piping, because it means the input file name (whatever it happens to be on the buildbot/local filesystem) does not appear in the output - this reduces the chance that FileCheck commands might accidentally match on some part of the input file name - making the test more hermetic/reliable. |
To satisfy all the required detailed reporting I had to extend the patch far more than I originally inteded. Is it OK this way?
llvm/lib/DebugInfo/DWARF/DWARFUnit.cpp | ||
---|---|---|
387–406 | That DIE.extractFast just exited on both errors and successful end of DIEs. As it did not report any errors the return code (false) was the same. So this one specific error was handled outside. But now when we do all the detailed error reporting we need to do it from inside DIE.extractFast as that has all the info available. Therefore this error has moved inside DIE.extractFast (if (Offset >= UEndOffset) there). | |
llvm/test/tools/llvm-dwarfdump/X86/format-warnings.s | ||
3 |
Done.
That would not show anything more as on 64-bit platforms variadic function extends all parameters to (at least) 64 bits. Therefore even 32-bit format will still read 64-bit variadic parameter. |
Not had time to review the test cases yet, but will do that next week.
llvm/include/llvm/DebugInfo/DWARF/DWARFContext.h | ||
---|---|---|
372–375 ↗ | (On Diff #352847) | I'd rename this function to getSupportedAddressSizes(), which reads slightly better. Also, it can be simplified, as shown in line. String literals don't have lifetime issues, so there's no need for the static local variable. |
llvm/lib/DebugInfo/DWARF/DWARFDebugAbbrev.cpp | ||
74 ↗ | (On Diff #352847) | Lambdas follow variable naming style, so flush -> Flush. |
103 ↗ | (On Diff #352845) | I'm not entirely convinced we need to print all valid abbrev values, even in ranged form, since typically, the abbrevs will go from 1-max in a contiguous set. That being said, I'm not opposed to it. I think the code could be simplified anyway. Something like the following is more readable to me. It also handles adjacent codes being non-contiguous within the set of abbrevs: std::string DWARFAbbreviationDeclarationSet::getCodeRange() const { // Create a sorted list of all abbrev codes. std::vector<uint32_t> Codes; Codes.reserve(Decls.size()); std::transform(Decls.begin(), Decls.end(), std::back_inserter(Codes), [](const DWARFAbbreviationDeclaration &Decl){ return Decl.getCode(); }); std::sort(Codes.begin(), Codes.end()); std::string Buffer = ""; raw_string_ostream Stream(Buffer); // Each iteration through this look represents a single contiguous range in the set of codes. for(auto Current = Codes.begin(), End = Codes.end(); Current != End;) { uint32_t RangeStart = *Current; // Add the current range start. Stream << *Current; uint32_t RangeEnd = RangeStart; // Find the end of the current range. while(++Current != End && *Current == RangeEnd + 1) ++RangeEnd; // If there is more than one value in the range, add the range end too. if (RangeStart != RangeEnd) Stream << "-" << RangeEnd; // If there is at least one more range, add a separator. if (Current != End) Stream << ", "; } return Buffer; } |
llvm/lib/DebugInfo/DWARF/DWARFDebugInfoEntry.cpp | ||
37 | It's not going to be clear to the end user that these two values represent offsets. I'd be more explicit: "DWARF unit from offset x to offset y ..." Same applies below. | |
56 | It's not clear to me (when reading the message without the code context) what these last two offsets represent. I suspect that the last offset is actually unnecessary, since the OffsetPtr location for this case is going to be fixed within the unit header, if I'm not mistaken. | |
llvm/lib/DebugInfo/DWARF/DWARFUnit.cpp | ||
364 | I think this is a little clearer. | |
llvm/test/tools/llvm-dwarfdump/X86/debug-entry-invalid.s | ||
1 ↗ | (On Diff #352847) | Nit: I'm trying to encourage new tests to use '##' for comments, to help distinguish them from lit and FileCheck directives. |
llvm/test/tools/llvm-dwarfdump/X86/format-warnings.s | ||
3 | Right, but not all supported platforms are 64-bit. I've actually seen bugs precisely because of this sort of issue in similar code. |
llvm/lib/DebugInfo/DWARF/DWARFDebugAbbrev.cpp | ||
---|---|---|
103 ↗ | (On Diff #352845) | Yep, all this complexity would fall under the clause I mentioned in the original feedback: " (I forget if the abbrev table is necessarily contiguous - if it isn't, then maybe that's too complicated)" - so the table isn't contiguous, and maybe this is too complicated to be worth it? I really don't mind either way, at this point. @jhenderson's version seems nice, if we're going to do this. |
llvm/include/llvm/DebugInfo/DWARF/DWARFContext.h | ||
---|---|---|
372–375 ↗ | (On Diff #352847) |
True, I am stupid. |
llvm/lib/DebugInfo/DWARF/DWARFDebugAbbrev.cpp | ||
103 ↗ | (On Diff #352845) | TBH I did not sort the Abbrevs intentionally. This debugging output is for DWARF developers, not for end users. By sorting it the message gets too disconnected from what is really written in the DWARF. Moreover when usually the Abbrevs are sorted. But I have accepted your version. |
llvm/lib/DebugInfo/DWARF/DWARFDebugInfoEntry.cpp | ||
37 | I haven't changed this yet. I disagree with "from offset x to offset y" as that would need to be rather "from offset x incl. to offset y excl." which already looks to me too talkative. Primarily as this message is for DWARF developers, not for end users. | |
56 |
I agree, thanks. | |
llvm/test/tools/llvm-dwarfdump/X86/format-warnings.s | ||
3 | It is true 32-bit buildbots would catch it. |
llvm/lib/DebugInfo/DWARF/DWARFDebugAbbrev.cpp | ||
---|---|---|
103 ↗ | (On Diff #352845) | I don't have a strong opinion about whether they should be sorted or not, and if you would prefer dropping the sorting, that's fine (I think the rest of the code will just work, but am not 100% certain without spending more time than I care to thinking about it). I'm also happy if you'd prefer to drop the entire thing. |
81 ↗ | (On Diff #353241) | |
llvm/lib/DebugInfo/DWARF/DWARFDebugInfoEntry.cpp | ||
37 | With at least offsets, I would never read "from offset X to offset Y" as inclusive at both ends, because of the nature of what an offset represents. But maybe it is a real issue. You could achieve the same meaning, without the ambiguity risk by saying "with length X at offset Y" instead, for example. The length is actually the thing that's encoded in the DWARF after all. You could make it slightly less talkative like this: "DWARF unit (offset 0x1234, length 0x4321) tries to ...". I don't think the two error messages are quite equivalent. In the DataExtractor one, the message talks about reading the range, and therefore it's somewhat clearer that you're dealing with an offset, whereas here the numbers in the range aren't things that are mentioned as being read or similar, so you lose that context. | |
54 | In this case, you don't need the end offset of the unit, as it has no impact here - only the start offset is actually important, so you can identify the unit that is being read. | |
66 | AbbrCode is a uint64_t. The test will fail on some platforms due to either bitness or endianness issues. | |
80 | dwarf::Form is specified to be a uint16_t in its declaration. | |
llvm/lib/DebugInfo/DWARF/DWARFUnit.cpp | ||
283 | I wonder if this error needs putting earlier, in case the header is truncated? | |
287 | debug_info.size() returns a size_t, so may not always be 64 bits. | |
288 | As this is for DWARF developers, maybe it would be best to use the actual field name as defined by the DWARF standard (specifically "type_offset"), for something like: DWARF type unit (offset 0x1234, length 0x4321) type_offset 0x1111 points inside the header or past the unit end". (I also included my suggestions from above, and a couple of other wording suggestions - I'm not a massive fan of specifying "relative boundary" because it's not clear to me what the boundary is relative to). | |
290 | Size is a uint8_t. | |
291 | I'd put this before the type_offset check, as a different DWARF version might not have the type offset field at all etc. | |
295 | getVersion() returns a uint16_t. | |
303 | As above, I don't think you need the end offset here, as it isn't relevant for this message. | |
304 | getAddressByteSize() returns a uint8_t. | |
llvm/test/tools/llvm-dwarfdump/X86/debug-entry-invalid.s | ||
25 ↗ | (On Diff #353241) | I'd test both the cases where the offset points to within the header and past the end of the unit. |
93 ↗ | (On Diff #353241) | Nit: all the other comments use upper-case for their first letter. |
llvm/test/tools/llvm-dwarfdump/X86/format-warnings.s | ||
3 |
| |
3 | I've gone through and highlighted where else I see a type mismatch in your print formats versus the type being used. I think it would be a good idea to match the types, as whilst the implementation eventually calls a function that uses variadic arguments, I don't think there's any strict requirement for it to do so. Plus, the integer promotion is subtle, and unnecessary code subtlety harms maintainability. Finally, some of those might actually result in bugs. From my understanding, integer promotion of variadic arguments is only as far as int/unsigned int. The size of int is implementation defined and not necessarily the same as uint32_t (though admittedly I don't know of any cases where it isn't currently), nor anything necessarily to do with the host system bitness. Thus, when using uint*_t types, you should use their corresponding macros for printing. |
llvm/lib/DebugInfo/DWARF/DWARFDebugInfoEntry.cpp | ||
---|---|---|
37 |
That is definitely ambiguous as it can mean length b-a of [a, b) or it can mean length stored in the binary which is b-a-4 (for DWARF32). I used the offsets with "incl." and "excl." to move forward.
So you did mean the b-a-4. |
Changes the messages to "incl." and "excl." as I hope that can be acceptable for both of us.
I have newly used joinErrors there which I intended originally and then I forgot about it. It creates a two-line error:
warning: DWARF unit at 0x0000002c cannot be parsed: warning: unexpected end of data at offset 0x2d while reading [0x2c, 0x30)
LGTM, with one possible suggestion, but also happy if this is committed as-is. You might want to wait for @dblaikie too.
llvm/include/llvm/DebugInfo/DWARF/DWARFContext.h | ||
---|---|---|
372–375 ↗ | (On Diff #353364) | This is probably absolutely fine, but I was thinking about it and wondering whether it would make some sense to factor out the commonality into some sort of container, that the string function can iterate over to generate a string, and the bool function can just compare values against. What do you think? |
llvm/lib/DebugInfo/DWARF/DWARFDebugInfoEntry.cpp | ||
37 | Fair point. Let's stick with your latest version. |
llvm/include/llvm/DebugInfo/DWARF/DWARFContext.h | ||
---|---|---|
372–375 ↗ | (On Diff #353364) | I did not want to code it that myself first as it looked overengineered to me and it still looks so. But why not if there is an agreement upon it. I agree there is no information duplication then. |
llvm/lib/DebugInfo/DWARF/DWARFDebugAbbrev.cpp | ||
75 ↗ | (On Diff #353934) | I have put there a simple iterator instead of llvm::transform as it is really shorter and easier to read. |
It's not going to be clear to the end user that these two values represent offsets. I'd be more explicit: "DWARF unit from offset x to offset y ..."
Same applies below.